2016-04-15

sdcc, crc and queues

Hello,

this week i was busy working on my Z80 project. Though it always looks like moving in small circles, i made some progress.

In this post:

  • sdcc created broken code
  • Stager Electrics' programmer failed to program an eeprom
  • There is crc-16, crc-16 and crc-16
  • Stager Electrics' programmer failed to program an eeprom
  • Simple design of queues and how long can it take to spot an error
  • Stager Electrics' programmer failed to program an eeprom
  • sdcc could produce really fast code. could…

Stager Electrics

The code to detect whether i'm running on an e- or eeprom also write protects the eeprom (called SDP = software data protection) so that it cannot be overwritten when the program crashes. For that you just write certain bytes into certain addresses. Next i wrote a test message into the eeprom. This is done by writing the SDP sequence and then the bytes to program. Crashed at first as the destination buffer was calculated too small `:-) but worked on the second try.

Next i tried to overwrite an eeprom with the Stager Electrics programmer. Off course this did not work. It took 11 minutes to write the eeprom, and after that verify failed. The programmer knows the eeprom by name and manufacturer but cannot deactivate software data protection in the eeprom. And it can't erase the device as a whole. Luckily i have more than one of these eeproms.

In a later iteration of the rom i added code which, before doing anything else, tests whether an eeprom is inserted in the ram socket (you remember: they are pin-compatible); And if so, it disables software data protection and happily bails out with a blink code. Even later i added an option to disable SDP in the current eeprom. Now that Stager thing can program the eeproms again.

sdcc

I probably spent one full day (after work) on tracking down a not so reproducible crash when trying to read from the i2c eeprom on the SIO board. Finally i could prove it's an error in the C compiler. In a __critical function, which means that it is executed with interrupts disabled, on entry the state of the interrupt enable flip flop is pushed on the stack so the interrupts can be re-enable or not re-enabled on return. The generated code ignored this additional word on the stack and read everything from wrong local variables. By chance the address of the destination buffer was falsely taken from the i2c eeprom reading start address, which was 0, and so reading the eeprom overwrote ram from address 0x0000 onwards. Clearly not a good idea.

Stager Electrics

I forgot to disable SDP in the eeprom and had to add another 11 minutes after eeprom verification failed…

crc-16

I want to download new rom images to the Z80 system so that the system can reprogram itself, which takes 5 seconds (at most) and not 11 minutes. The current speed on the serial port is 9600 Baud, which means 960 bytes can be transmitted per second which means after roughly 17 seconds 16 kB are transmitted (which is the current rom size) or at most 34 seconds to overwrite the whole 32 kB of the eeprom. For the "protocol" i decided after some pros and cons to just wrap the rom image with a 2-byte start and stop prefix/postfix and to add a crc checksum for error detection.

I already have a CCITT crc-16 implementation in C at hand and googled for a Z80 version which was quickly found. Then i did some tests to compare the result and found ... nothing in common.

Ok, there are crc-16 and crc-16 and crc-16 and they are all different.

Let's look at the c implementation:

uint crc16_ccitt( uint8 const* q, uint count, uint crc )
{
   while(count--)
   {
      for( uint c = 0x0100/*stopper*/ + *q++; c>1; c >>= 1 )
      {
         crc = (crc^c) & 1  ?  (crc >> 1) ^ 0x8408  :  (crc >> 1);
      }
   }
   return crc;
}

And the Z80 version converted to a c function for easy understanding:

uint crc16_z80( uint8 const* q, uint count, uint hl )
{
  while(count--)
   {
      uint8 b;        
      hl ^= *q++ << 8;
      for(b=0; b<8; b++)
      {
         if((signed int)hl < 0) hl = (hl<<1) ^ 0x1021;
         else                   hl = (hl<<1);
      }
   }
   return hl;
}

First chance to make a difference is the input value for the crc. This must be 0xffff for the CCITT version and then the function may be called repeatedly to update the CRC as bytes arrive. Off course i called them both with the same starting value. Check.

Next you see that both functions use different polynomials: 0x8408 and 0x1021. Off course they must be the same to produce the same result, and they _ARE_ the same: The c function shifts bits from left to right, the z80 version from right to left, so they just work bit-reversed. Check.

Ok, they work bit-reversed when compared to each other. So the result must be bit reversed. But even when reverting one result the CRCs are completely different.

So what's the difference?

The bytes read from the data buffer must be bit reversed as well (in any one function) to make all data bit-reversed, then the result (of any one function) can be bit reversed and then they will be actually identical!

The fully bit-reversed version of the first function looked like this:

#define  R1(N) ((N<<7)&0x80)+((N<<5)&0x40)+((N<<3)&0x20)+((N<<1)&0x10) + \
               ((N>>7)&0x01)+((N>>5)&0x02)+((N>>3)&0x04)+((N>>1)&0x08)
#define R4(N)  R1(N),R1((N+1)),R1((N+2)),R1((N+3))
#define R16(N) R4(N),R4((N+4)),R4((N+8)),R4((N+12))
#define R64(N) R16(N),R16((N+16)),R16((N+32)),R16((N+48))
uint8 rev[256] = { R64(0), R64(0x40), R64(0x80), R64(0xC0) };

uint16 crc16r( uint8 const* q, uint count )
{
  uint crc = 0xffff;
  while(count--)
  {
    for( uint c = 0x0100 + rev[*q++]; c>1; c >>= 1 )
    {
      crc = (crc^c) & 1 ? (crc >> 1) ^ 0x8408 : (crc >> 1);
    }
  }
  return rev[crc>>8] + (rev[crc&0xff]<<8);
}

Now i have a C and a Z80 implementation for a CRC-16 checksum which work identical. `:-)

Note: To calculate the CCITT CRC-16 checksum with the first function, calculation must be started with CRC = 0xFFFF and the final CRC must be complemented. Then all sources say that you must swap the low and high byte. But that's not true, or, that's not the point. Whether you must swap the bytes depends on how you read the CRC from the data stream and what byte order your computer uses. I believe that the low byte is transmitted first. (to be tested somehow & somewhen…)

The Z80 version calculates the CRC-16 used in the XMODEM file transmission protocol. Here the CRC must be initialized with 0x0000, the final CRC must not be complemented and the high byte is sent first.

Stager Electrics

I forgot to disable SDP in the eeprom and after programming eeprom verification failed and i thought it was defective now…

Queues

I use a nice design for queues (in the sio implementation) which avoids the need for locks (or mutexes).

#define busize 64  // 2^N
#define bumask busize-1

uint8 bu[busize];
uint  ri;          // read_index
uint  wi;          // write_index

Normally writing to a queue works like this:
(I'll only describe writing, reading is similar.)

bu[wi++] = mybyte;
wi &= bumask;

Drawback:

You cannot distinguish between a full and an empty buffer, so you fill it up to at most busize-1 bytes.

This can be helped:

bu[wi++ & bumask];

Now the buffer is empty if wi==ri and full if (wi-ri)==busize.
ri and wi will at some time overflow but the integer arithmetics remain valid.

As not obvious, this implementation needs locking: wi is incremented before the byte is written and the buffer reader could interrupt between wi++ and writing the byte into the buffer, and read the not yet written byte. But this can be remedied like this:

bu[wi & bumask]; wi++;

Now the byte is stored first and then the write pointer is incremented, "releasing the semaphore".

how long can it take to spot an error?

These are the data structs containing the data for each channel:

struct SioData 
{ 
  bool  hw_handshake; 
  uint8 sw_handshake;   // bit.0: enabled  
  uint8 clk_handshake;  // bit.0: emit TX clock
  uint8 device;         // select mask
  uint8 channel;        // 0 = channel A; 1 = channel B
  uint8 baudrate;       // baudrate / 2400

  uint8 ibuwi;          // input  buffer write index
  uint8 iburi;          // input  buffer read index
  uint8 obuwi;          // output buffer write index
  uint8 oburi;          // output buffer read index

  uint8 ibu[ibusz];     // input  buffer
  uint8 obu[obusz];     // output buffer
};

These are two actual implementations in in my sio source:

uint sio_avail_in(struct SioData* channel)  
{ 
  return channel->ibuwi - channel->iburi; 
}
uint sio_avail_out(struct SioData* channel) 
{
  return obusz - (channel->obuwi - channel->oburi); 
}

Nice! :-)

And both wrong. :-?

When i tested transmission of data from my Mac to the Z80 system, i only got transmission errors. The Z80 system received all data when CoolTerm was at 50 .. 80%. I suspected CoolTerm. I suspected the USB-RS232 driver software. (Which actually _IS_ pretty buggy.) I suspected sdcc. I scrutinized the Z80 assembler interrupt routine. I examined the test routine itself. (A common place. Actually i started here… ;-)) I examined gets(…), which receives all available data into a buffer and which is written in C. I examined sio_avail_in(…). Not only once … My source and what sdcc compiled. And sio_avail_in(…) was buggy. But it took me hours to see the error. Do you spot the error? C'mon, it's only one line of code. A single subtraction of two values…

sdcc could produce really fast code. could…

I have written several versions of the CRC routine, two (similar versions) in Z80 and some in C. I timed them and i got interesting results.

CRC-16 ZMODEM of rom (asm1) dt=1180 ms
CRC-16 ZMODEM of rom (asm2) dt=1430 ms
CRC-16 CCITT  of rom (c)    dt=9500 ms
CRC-16 ZMODEM of rom (c)    dt=3020 ms

The C functions to calculate the XMODEM CRC is much faster than the function to calculate the CCITT CRC, though they both contain equivalent source.

That was with sdcc 3.4.

Due to the __critical error mentioned at the beginning of this post i looked for the latest version of sdcc. I thought, if i send in a bug report they'll surely complain that it's for version 3.4, which is 2 years old.

So i looked for the latest version: Version 3.5, which is 10 months old. (sigh).

It still had the __critical bug but i found the bug tracker and an entry for this bug: Fixed in 9'2015. Version 3.5 is from 6'2015. sigh…

So i searched and found the beta versions (more like nightly builds) and the latest OSX version was 13 minutes old. :-) It no longer has the __critical bug (tested), needs some other includes (copied) and produces slightly larger code. And i ran the CRC test again: (rom now slightly bigger)

CRC-16 ZMODEM of rom (asm1) dt=1200 ms
CRC-16 ZMODEM of rom (c)    dt=8500 ms

The C routine is now nearly 3 times slower?

So i reverted to sdcc 3.4 and reinstalled my workaround for the __critical bug…

    ... Kio !

p.s.: @ Google: The editor is crap. could you please fix it?

---- SPOILER WARNING ----

p.p.s.: the read and write indexes in the sio struct are (unsigned) bytes.
When they are subtracted in sio_avail_*(…) they are extended to 2-byte values.
If the write index has already overflowed and the read index not, then the difference is not limited to 8 bits as expected but the high byte of the result is 0xFF.


No comments:

Post a Comment