2016-04-10

Z80 Microcomputer with SRAM and K1-Bus

After suspending the project for a while, i'm now back to it. This is an update to the current state.

Hardware

There were some errors in the circuit, which i could fix. The V2.0 Eagle file on my website already contains these fixes.

2016-04-08 fixed board
Blue wires: A6 and A7 are used (beside A5) to select the target of an i/o operation. One of these is the access to the i2c bus on the k1 bus, where also my debugging LEDs are attached. When i2c is selected in an i/o operation, then A6 and A7 are used to set the i2c data and clock lines. – But wait, A6/7 are used to select i2c operation AND to select something within the i2c operation? Merde… So i rerouted the i2c lines to use A3 and A4 instead.
Yellow wires: As you can see by a look at the ram/rom address decoder in the last post, ram and rom selection is exchanged. First i fixed this by inserting eeprom and ram into each other's socket, which is possible with the eeprom, but not with the eprom.
For my (e)eproms i use a programmer from Stager Electric, Shenzhen, China. If you ever see something made by Stager Electric: run as fast as you can! It takes ~11 minutes to write a few bytes into an eeprom. The application has an option to "disinterest blanck" and eventually in the next version (which i never saw) it even worked… So it always programs full 32k and, since even this can be done in less than 32768/64*10ms = 5.12 seconds, and it actually takes 11 minutes, which is more than 100 times longer, i presume the eeprom is programmed byte by byte, making sure that it's write endurance of 10000 is in reachable distance... So i wanted to use eproms and fixed the circuit. Programming eproms is even faster as well.
Component side: A minor glitch is on the component side: I have carefully engraved "E" and "EE" for the eprom/eeprom selection pin header into the copper layer, and again, did it wrong: exchanged, as always…

Software

I was playing a little with my c-style compiler to add a Z80 target, and found: the Z80 is really bad suited to implement anything a compiler might try to create. Too few registers which frequently have special features. Deploying the second register set is near impossible. Using index registers is painfully slow. (you already knew that) Local variables on the stack are a pain to access.

Basically you have the choice to generate real machine code, which is not only slow but bloated as well, and some kind of Forth-style virtual code, which is short but even slower.

I finally came to the "fastest possible Forth-style" code model, which i will pursue later: It uses a jump table and opcodes which are 1-byte index into this table; which is faster (and shorter) than using 2-byte addresses in the program as Forth implementations typically do. Drawback: i need the table and the table can contain only ~256/3 entries. So there must be "prefix" opcodes which then are slower.

The jump table looks like this:

vector: macro $NAME
$NAME:: equ $ - vtable 
jp _$NAME 
endm                          

; ------------------------------------

vector RESET  ;( -- ) 
vector SHELL  ;( -- ) 
vector NATIVE ;( -- ) 
vector ABORT  ;( uint -- )   

vector MODs   ;( n n -- n ) 
vector DIVs   ;( n n -- n ) 
vector MODu   ;( n n -- n ) 
vector DIVu   ;( n n -- n ) 
vector MUL    ;( n n -- n )  

vector JP1    ;( n $dest -- )
vector JP0    ;( n $dest -- )
vector JP     ;( $dest -- )  

and so on. You see, each entry is a JP opcode (by virtue of the macro), but "inline" code in the table is sometimes possible as well, e.g. if a variant of an opcode just needs a short mockup of the arguments, it's code can be put directly in the table before the other opcode, where it simply runs into. It's a trade-off of used space and gained speed.

A typical "word" looks like this:

_SUB: ;( n1 n2 -- n )           

pop hl     ; hl=n1 
and a      ; de=n2 
sbc hl,de  ; hl=result
ex de,hl   ; de=result
next                      

where next is a macro:

; fetch next virtual opcode and jump to handler
; 
next: MACRO 
ld h,hi(vtable) 
ld a,(bc) 
inc bc 
ld l,a 
jp hl 
ENDM                 

An alternative is to jump to any implementation of macro next, which is slightly slower (10 cc for the jump) but also shorter (just 3 bytes). If it can be done in a relative jump, then it's even shorter (2 bytes) and even slower as well…

As you can see i use register pair BC for the virtual program counter and DE as result register, which frees HL so that machine coded sub routines can pop the return address into HL, do some work, e.g. pop arguments, and finally return via JP HL, which is not possible if you use HL as result register.

If an opcode implementation does not modify the h register, then it does not need to reload h with the high byte of the vtable address. There are actually some (few) opcodes which can exploit this additional speed boost. :-)

As you can see, the interpreter reads just one byte from the program and jumps into the vtable which contains jumps to the actual implementation of the virtual opcodes. This is faster than reading 2 bytes from the program, the program is shorter, but i need the tables and implementations for all opcodes.

The alternative i'm currently working with – because the Z80 backend of my compiler is not yet completed – is sdcc, the "Small Devices C Compiler", which has a Z80 backend. I can really tell that the generated code is bloated, and sometimes suboptimal, the syntax of the generated code is "unusual" and sometimes the compiler even crashes for me. Especially when i use the "<<" operator.

Here an example of what sdcc generates:

;/Firmware-Sdcc/sio.c:394: if(this->clk_handshake)
8520: DD7EFE   ld  a,-2 (ix)
8523: DD77FB   ld  -5 (ix),a
8526: DD7EFF   ld  a,-1 (ix)
8529: DD77FC   ld  -4 (ix),a
852C: DD6EFB   ld  l,-5 (ix)
852F: DD66FC   ld  h,-4 (ix)
8532: 23       inc hl       
8533: 23       inc hl       
8534: 6E       ld  l,(hl)   
8535: 7D       ld  a,l      
8536: B7       or  a, a     
8537: 2814     jr  Z,00106$ 

The first line (the comment) is the compiled source line. As you can see the compiled code reads a word from (ix-2), which seems to be 'this' ( a valid variable name in C ;-) ) and stores it at (ix-5) which seems to be a scratch cell and immediately reads it back into HL. Then it reads the desired value into l and immediately moves it into a for testing. A wonder of elegance. (note: the scratch value is not used anywhere later, l is used later, but while the value in a is still valid too.) 

Current State of the Project

Current setup
Last and this weekend i refitted all hardware, which is the CPU board, as SIO board and a (not yet tested) IDE board, as can be seen to the left, and hooked it up to a regulated power supply. Current consumption is pretty low, as it's all CMOS: only 50 to 80 mA (depending on how many LEDs are lit) for all three boards, including a 96MB IDE flash rom (hiding between the IDE and the SIO board) and a 2.5GB compact flash size hard drive (sticking out from the rear side so you can't see it as well).
Slowly iterating from one broken software step to the next, regularly erasing and reusing my eproms and finally even testing some steps in the emulator (erm, yes, i have written an emulator for the system too, using my Z80 emulation from zxsp and the SIO and a LCD display emulation from my K1 CPU project) i finally got the first text message from the board. I have attached the SIO port A to a RS232-to-USB converter and use CoolTerm on my Mac to receive the messages. I stepped back to an old version of CoolTerm, as the current versions very quickly use 100% of one CPU core. 
For the SIO software i use a simplified approach: The SIO ports are polled on the system timer interrupt (which is generated by the UART as well) which is currently 100 Hz. The UART has 16 byte fifo queues, so 100 Hz is way enough for 9600 Baud. But i'll probably go up to 200 Hz for 19200 Baud at least. As sending data works, the system interrupt works as well. 
Idle CPU usage for this interrupt is approx. 2% (calculated), and will be ~4% with 200 Hz, if i don't find a better solution. On the photo above you can see that the red LED in front is lit. This LED indicates WAIT state and currently the CPU waits approx. 98% of the time. (Or 96%, as it's also sending some text through some ugly compiled c code…) So i can say, this LED works as well. :-)
Today i have tested writing of data into the eeprom. (actually only detecting whether it's an eeprom or not, but that's quite similar.) This works too. 

Next steps: 
  • Actually write some bytes into the eeprom    ➞ done 2016-04-11
  • test reading of the i2c eprom on the SIO board
  • test writing to the i2c eeprom
  • receive data from the SIO port
  • receive program data from the sio port and write it into eeprom.
  • lock away the Stager Electrics programmer. ;-)
Final question is: what should i do with the board? hm hm…

Stay tuned. 

p.s.: Today i wrote a test message into the eeprom. Off course it did not work right from the start – it crashed because the destination space was too short, and behind that in the eeprom was the sio interrupt handler which was then partly overwritten.
I also tried to overwrite the eeprom with the Stager Electrics programmer, – which could not overwrite it. It took 11 minutes to write, and after that verify failed. I had expected this: Off course the programmer cannot deactivate software data protection in the eeprom. And it can't erase the device as a whole. Luckily i have more than one of these eeproms. And i already have a plan to make them writable again (else they'd be nice ceramic bricks): I can insert them into the ram socket and write a short eprom which does the job…

No comments:

Post a Comment