Kio's Hardware Projects

Hit the Ceiling – Going Virtual

2016-05-29T20:10:00.000+02:00

At the end of April i hit the ceiling. I'm very tall, but that's not the reason – the code size for the z80 system reached 32 kBytes.
I was working on the file system and when it was in a state where it compiled – just half way done – the resulting rom size was only a tiny amount below 32 kB.

So what could i do?

I could remove all test code, and then i could use z88dk which allegedly creates slightly smaller code, but that would probably not really help: I'd just die later.

I could write everything in assembler.

Or i could finish the z80 backend for Vcc, my 'virtual code compiler'.

I couldn't decide on whether to create real z80 code or virtual code for a Forth-style interpreter. So i implemented them both, mostly. While programming i made some measurements.

Compare virtual code with native z80 code, running the opcode test program:

                virtual code    z80 code
total rom size  10935           11455 bytes
= code blob     5811            4428 bytes  
+ test code     5128            7027 bytes
time            3.630s          2.376s

In this non-representative program the z80 test code is 37% bigger than the virtual test code and runs in 35% less time. (The 'code blob' is the support library; 'test code' is what grows.)

I have also compiled my serial driver in various versions.

sdcc                1974 bytes
Vcc z80 code        1520 bytes
Vcc virtual code    1169 bytes

The z80 code generated by Vcc is 23% shorter than that of sdcc, and the virtual code is even 40% shorter than sdcc z80 code or 23% shorter than Vcc z80 code.

When i worked on the z80 backend, i was a little frustrated by how little code size reduction i could achieve, though i used all 'illegal' tricks, e.g. i use the RST opcodes for the most frequent building blocks to reduce code size.

The code shrink of approx. 25% is simply not enough, because it does not take into account the size of the static support code blob. This is currently at 4428 bytes and i expect a final size of around 8 kB, after adding all int32 code and if i leave out floating point. That is 25% of the rom size of 32 kB. So before i actually save space, the code size must be reduced by at least 25%. And this looks like the maximum i can achieve with my z80 backend. (though 'nothing saved' is only true for code in rom. Any program loaded into ram will see the full size reduction. And i neglect that sdcc pulls in some library code as well…)

The code shrink of 40% of the virtual code version looks much better, though it will have a slightly larger support code blob. And it will come at a price: Speed…

Before i go into details here a comparison of the compiler outputs of a simple function:

Vcc:

uint8 avail_out(SerialDevice¢ channel) 
{ 
    return obusz - (channel.obuwi-channel.oburi); 
}

C:

uint8 sio_avail_out(SerialDevice* channel) 
{ 
    return obusz - (channel->obuwi - channel->oburi); 
}

This function determines how much free space is left in a sio output buffer. The Vcc function is a member function. 'channel' is a struct, 'obuwi' = output buffer write index, 'oburi' = output buffer read index, 'obusz' = output buffer size. I hope you get it.

sdcc: In the case of such a short function, sdcc creates very good code. But don't be fooled: if it can no longer keep everything in registers, the code becomes ugly… So this is actually not a representative example for sdcc. [25 bytes total]

_sio_avail_out::
    pop    de            ; return address
    pop    bc            ; 'channel'
    push   bc            ; everything back:
    push   de            ;     caller is responsible for cleaning up the stack…
    push   bc
    pop    iy            ; iy = 'channel'
    ld     e,15 (iy)
    ld     l, c          ; superfluous
    ld     h, b          ; superfluous
    ld     bc, #0x0010   ; load into hl instead
    add    hl, bc
    ld     c,(hl)
    ld     a,e
    sub    a, c
    ld     c,a
    ld     a,#0x40
    sub    a, c
    ld     l,a
    ret

hand-coded assembler: This is for the Vcc memory model with 'handles', so i must dereference a pointer to a pointer to the struct data. And as i see by the last instruction, it's for the virtual code machine: [19 bytes total]

sio_avail_out::          ; in: de -> -> channel    
    ex     hl,de         ; hl -> -> channel
    ld     e,(hl)
    inc    hl
    ld     d,(hl)        ; de -> channel
    ld     a,obusz       ; a=obusz
    ld     hl,obuwi
    add    hl,de         ; hl -> channel.obuwi
    sub    a,(hl)        ; a=obusz-obuwi
    inc    hl            ; hl -> oburi
    add    a,(hl)        ; a=obusz-obuwi+oburi
    ld     e,a
    ld     d,0           ; out: de = return value
    jp     next          ; jump to next opcode

Z80 code created by Vcc. It's an early state and there are some optimizations left. It looks poor when compared with the sdcc generated code, but as already said, things become different for functions with more than one line of code. Then this code is still representative but sdcc looks poor too. The first line is a program label, though a little bit longish. :-) But if you compare it with the function's signature then it hopefully makes sense. [total 36 bytes]

SerialDevice.avail_out__12SerialDeviceC_5uint8:
    pop     hl           ; move the return address to the VM's return stack
    call    pushr_hl    
    rst     ivalu8       ; push obusz: 'ivalu8' = immediate uint8 value
    db      64
    push    de
    ld      l,2+2        ; get local variable 'channel'
    rst     lget         ;    'lget' = get local variable
    ld      hl,15        ; get item 'obuwi' at offset 15
    rst     igetu8       ;    'igetu8' = get uint8 struct item
    push    de        
    ld      l,4+2        ; get local variable 'channel'
    rst     lget    
    ld      hl,16        ; get item 'oburi' at offset 16
    rst     igetu8
    pop     hl        
    and     a            ; subtract obuwi - oburi
    sbc     hl,de
    ex      hl,de
    pop     hl
    and     a            ; subtract obusz - (obuwi - oburi)
    sbc     hl,de
    ex      hl,de    
    pop     af           ; discard 2nd value on stack (the 'channel') 
    jp      return       ; get back the return address and return

Virtual code created by Vcc with minimum optimization: [29 bytes total]

SerialDevice.avail_out__12SerialDeviceC_5uint8:
    rst  p_enter         ; the proc is entered in z80 code: switch to virtual code
    dw   IVAL, 64        ; push obusz
    dw   LGET            ; get local variable 'channel'
    db   2
    dw   IGETu8          ; get item 'obuwi' at offset 15
    db   15
    dw   LGET            ; get local variable 'channel'
    db   4
    dw   IGETu8          ; get item 'oburi' at offset 16
    db   16
    dw   SUB             ; subtract obuwi - oburi
    dw   SUB             ; subtract obusz - (obuwi - oburi)
    dw   TOR             ; nip 2nd value on stack (the 'channel') 
    dw   DROP            ;     by temporarily moving the top value to the return stack
    dw   FROMR           ;     and dropping the 'channel'
    dw   RETURN

Virtual code created by Vcc after proper optimization: [20 bytes total]

SerialDevice.avail_out__12SerialDeviceC_5uint8:
    rst  p_enter
    dw   IVALu8          ; uint8 opcode with 1-byte argument
    db   64
    dw   OVER            ; instead of LGET 2
    dw   IGETu8        
    db   15
    dw   OVER2           ; instead of LGET 4
    dw   IGETu8
    db   16
    dw   SUB
    dw   SUB
    dw   NIP0RETURN - 1  ; nip one value (the 'channel') and return

One astonishing difference between z80 code and virtual code is: optimization.

When you optimize z80 code, the following equation is true:

codesize = speed

The bigger your code, the higher the speed. Every effort to increase speed results in bigger code.

When you optimize virtual code, this equation is true:

codesize = 1 / speed

Whenever you reduce code size, the speed goes up. This is because the standard method to optimize virtual code is to create 'combi opcodes' for frequently occurring opcode pairs, which eliminates one opcode fetch. As a result it is much more fun to optimize virtual code because you are rewarded twice. :-) Though caveat: the size of the support code blob grows! :-(
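The effect of a 'combi opcode' can be sketched with a toy stack machine in C. This is only an illustration under stated assumptions: the opcode names (OP_IVAL, OP_SUB, OP_SUB_SUB, OP_RETURN), the byte-wide opcodes and the fetch counter are hypothetical and are not the real Vcc opcode set. Every loop iteration stands for one opcode fetch, so fusing two SUBs into one opcode makes the program both one byte shorter and one fetch faster.

```c
#include <stdint.h>

/* Hypothetical opcode set for illustration only; not the real Vcc opcodes. */
enum { OP_IVAL, OP_SUB, OP_SUB_SUB, OP_RETURN };

typedef struct {
    int      result;    /* top of stack when RETURN is reached */
    unsigned fetches;   /* number of opcode fetches executed   */
} VmRun;

/* Run a tiny stack machine; every loop iteration is one opcode fetch. */
VmRun vm_run(const uint8_t *code)
{
    int stack[16];
    int sp = 0;                       /* index of the next free slot */
    VmRun run = { 0, 0 };

    for (;;) {
        uint8_t op = *code++;         /* opcode fetch and dispatch */
        run.fetches++;
        switch (op) {
        case OP_IVAL:                 /* push a 1-byte immediate */
            stack[sp++] = *code++;
            break;
        case OP_SUB:                  /* a b -- a-b */
            sp--;
            stack[sp - 1] -= stack[sp];
            break;
        case OP_SUB_SUB:              /* combi opcode: a b c -- a-(b-c) */
            sp -= 2;
            stack[sp - 1] -= stack[sp] - stack[sp + 1];
            break;
        case OP_RETURN:
        default:                      /* unknown opcodes also end the run */
            run.result = sp > 0 ? stack[sp - 1] : 0;
            return run;
        }
    }
}

/* 64 - (20 - 5): once with two SUBs, once with the combi opcode. */
const uint8_t prog_plain[] = { OP_IVAL, 64, OP_IVAL, 20, OP_IVAL, 5, OP_SUB, OP_SUB, OP_RETURN };
const uint8_t prog_combi[] = { OP_IVAL, 64, OP_IVAL, 20, OP_IVAL, 5, OP_SUB_SUB, OP_RETURN };
```

Both programs compute the same value; the price is the extra case in the interpreter, which is the growing support code blob mentioned above.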

One of the most useless features in C

2016-04-27T21:29:00.001+02:00

While writing code for a "file descriptor" which involves an array of function pointers, i stumbled over a "problem" which i first thought was an error in sdcc. In order to report the error i simplified the source until it only consisted of these 4 lines:

typedef int (*MyFPtr)(struct Data*);
struct Data { int a; };
extern int bar(struct Data* f);
MyFPtr foo = bar;

➜ I make a typedef for a function pointer, because function pointers are so awkward in c (though they are pretty compared to function pointers in c++ ... which resulted in the invention of the data type 'auto' ...)
Then at some point i actually define the struct.
Later i declare a function which matches the typedef.
Finally i try to assign this function to a function pointer variable.

Compiling this source resulted in an error for the last line:

/foo-1.c:4: error 78: incompatible types
from type 'unsigned-int function ( struct Data generic* fixed) __reentrant fixed'
  to type 'unsigned-int function ( struct Data generic* fixed) __reentrant fixed'
/foo-1.c:5: error 78: incompatible types
from type 'unsigned-int function ( struct Data generic* fixed) __reentrant fixed'
  to type 'unsigned-int function ( struct Data generic* fixed) __reentrant fixed'

btw.: ignore the double error. Error messages in sdcc are always a little bit suboptimal.

This looked as if the compiler had a problem seeing that two identical types are identical.

And indeed they aren't.

As i learned from my bug report, the first line implicitly declares a local data type. Local to – yes, i don't know exactly to what. But it's local. And so it's different to the later and globally defined struct.

One suggested solution was:

typedef unsigned int (*T)(struct Data*);
extern unsigned int foo(struct Data* f);
T bar = foo;

which compiled without error. But this now was actually an error in sdcc: This source is wrong too:
Line 1 and 2 both declare local data types which by that are different. Line 3 shouldn't work. That there was an error could be proved by actually trying to use the function pointer typedef:

typedef unsigned int (*T)(struct Data*); // local data type
extern unsigned int foo(struct Data* f); // local data type
T bar = foo;                             // works in sdcc but shouldn't
struct Data { unsigned int a; };
int main() { struct Data d = {0}; foo(&d); } // rejected

This led me to the question:

What is the implicit declaration of a local data type in a function's argument list good for, anyway? I can't think of a real use case. It's nearly impossible to call such a function. You have to cast the function to a function which accepts the data type you actually have, because you cannot even cast your data to the local data type...

Second, it's just a pitfall: If you declared the data type before the typedef and before the function declaration or definition, then the global data type is used. If you didn't, then a local data type is used. Imagine a local variable a in a function body was local only if there was no global variable a defined before…
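For comparison, a minimal sketch of the usual way around the pitfall: mention the struct tag at file scope before it first appears in a parameter list. In standard C a tag that is first seen inside a prototype's parameter list only has function prototype scope, which is the "local" scope in question. In this sketch bar is given a body and call_foo is a hypothetical helper, both added only so that the fragment links and can be tried out.

```c
struct Data;                             /* file-scope declaration comes first */

typedef int (*MyFPtr)(struct Data*);     /* now refers to the file-scope struct Data */
struct Data { int a; };                  /* completes that same type */

/* bar gets a body here (an addition for this sketch) so the example links. */
int bar(struct Data* f) { return f->a + 1; }

MyFPtr foo = bar;                        /* both sides use the same struct Data */

/* Hypothetical helper: call through the function pointer. */
int call_foo(int a)
{
    struct Data d = { a };
    return foo(&d);
}
```

With the forward declaration in place the order of the remaining lines no longer matters for type identity.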

Firmware Download and Access to IDE Board

2016-04-20T19:55:00.001+02:00

Two topics in this post:

Firmware download
Access IDE board, IDE and CF devices

Firmware download

After my explorations into CRC generation, i worked on the firmware download code. This code has to run in RAM, because the eeprom can't be read while it is busy writing a block of data into its cells.

I tried to keep things simple, and so the program flow looks like this:
write_eeprom.s

1. wait for SIO output to become empty
2. disable interrupts
3. receive magic header bytes
   if they are wrong: bail out
4. copy code from rom into ram
   jump to 6.

in ram:
5. Retry: receive bytes until magic header detected
6. in a loop:
7.    receive 64 bytes of data (last block may be shorter)
      update the crc after each byte
8.    write block of data into eeprom
9.    wait while eeprom busy
10. loop
11. receive and check crc:
    error: print a message, flush input, wait for a key and retry at 5
    ok:    print a message, flush input, wait for a key and reset

Already complicated enough.

ad 1: I wait for the SIO output to become empty, because there may be (and typically are) some bytes left in the output buffer, and as soon as i disable interrupts they will never be sent. This resulted in truncated "last messages".

ad 7: The program receives blocks of 64 bytes, which is the eeprom's block size, writes them into the eeprom and reuses the receive buffer for the next block. It does not keep the bytes around until all bytes are received, so the whole eeprom can be reprogrammed, though i have only 32k of RAM minus approx. 1k for code and buffers available.

The program actively polls the UART and retrieves the bytes as soon as they pop up in the UART queue. Then it immediately updates the CRC, so no extra loop is needed for CRC calculation.

The data is written into the eeprom although the CRC has not yet been checked; the check can only be done at the end of the transmission.

ad 9: Waiting for the eeprom takes up to 10 ms (acc. to Atmel docs). During this time i do not poll the UART, for simplicity. And i do not use any flow control. Currently the UART is programmed to 9600 Baud, which means 960 characters per second or 9.6 characters per 10 ms. The UART has an input queue of 16 bytes: The UART is doing my job! :-)

ad 11: Up to now i never had a CRC error.
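Steps 6 to 10 of the program flow can be sketched in C as a host-side simulation. This is not the Z80 code from write_eeprom.s: the "UART" is a plain byte array, eeprom_write_block is a stand-in that just copies into an array instead of programming a page and waiting, and the MSB-first CRC-16 with polynomial 0x1021 is an assumption, since this post does not say which CRC variant the loader uses.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

#define BLOCK_SIZE 64u   /* eeprom block size, as stated in the post */

/* MSB-first CRC-16 update, polynomial 0x1021 (assumed variant). */
static uint16_t crc16_update(uint16_t crc, uint8_t byte)
{
    crc ^= (uint16_t)(byte << 8);
    for (int i = 0; i < 8; i++)
        crc = (crc & 0x8000) ? (uint16_t)((crc << 1) ^ 0x1021) : (uint16_t)(crc << 1);
    return crc;
}

/* Stand-in for the real page write plus busy-wait: here just a copy. */
static void eeprom_write_block(uint8_t *eeprom, size_t addr, const uint8_t *buf, size_t n)
{
    memcpy(eeprom + addr, buf, n);
}

/* Consume len payload bytes from rx, program them block by block
   and return the running CRC for the final comparison. */
uint16_t download_image(const uint8_t *rx, size_t len, uint8_t *eeprom, uint16_t crc)
{
    uint8_t buf[BLOCK_SIZE];                 /* reused for every block */
    size_t  addr = 0;

    while (addr < len) {
        size_t n = len - addr < BLOCK_SIZE ? len - addr : BLOCK_SIZE;
        for (size_t i = 0; i < n; i++) {     /* step 7: receive, update CRC per byte */
            buf[i] = rx[addr + i];
            crc = crc16_update(crc, buf[i]);
        }
        eeprom_write_block(eeprom, addr, buf, n);   /* steps 8 and 9 */
        addr += n;
    }
    return crc;                              /* step 11: caller compares with the received CRC */
}
```

Because the buffer is reused, the RAM needed stays at one block regardless of the image size, which is the point made in "ad 7".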

Overall workflow is now like this:

1. Work on the source, compile & assemble the rom.
2. Launch the new rom in the emulator
   The emulator silently creates a download file from the bare rom file.
3. Connect to the board (most times CoolTerm _is_ connected…)   
4. Select "download firmware" from the menu presented by the board
5. upload a "Textfile" in CoolTerm
6. type a letter to reboot

The upload including writing to the eeprom now takes exactly as long as uploading the rom itself: 1 second per 960 bytes. Currently roughly 20 seconds.
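The serial timing figures used in this post can be reproduced with a few lines of integer arithmetic. A small sketch, assuming 10 bits on the wire per character (start bit, 8 data bits, stop bit), which matches the 960 characters per second quoted for 9600 Baud; the function names are made up for this example.

```c
/* Characters per second on a serial line, assuming 10 bits per character. */
unsigned long chars_per_second(unsigned long baud)
{
    return baud / 10;
}

/* Whole seconds needed to send `bytes` bytes, rounded up. */
unsigned long upload_seconds(unsigned long bytes, unsigned long baud)
{
    unsigned long cps = chars_per_second(baud);
    return (bytes + cps - 1) / cps;
}

/* Tenths of a character that arrive while the eeprom is busy for busy_ms milliseconds. */
unsigned long chars_x10_during(unsigned long baud, unsigned long busy_ms)
{
    return baud * busy_ms / 1000;
}
```

Calling chars_x10_during(9600, 10) gives the 9.6 characters per 10 ms (as tenths) that have to fit into the UART's 16-byte input queue.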

Kio: "See Stager, that's how it works!"

Next i made a modification to the rom, uploaded it and found that it no longer booted.

Stager: "See Kio, you still need me…" :-(

Access the IDE board

Now that turn-around cycles are much faster and less painful, i started first tests to access the IDE board.

I talked to the i²c eeprom on the board... and it answered! :-)

Then i collected all my knowledge about IDE, which mostly centers around the DivIDE emulation in zxsp, and made up my mind on what to send to and read from the IDE device at first.

The test program looks roughly like this:

1. disable interrupts (in c: you remember the __critical bug?)
2. select the IDE board
3. read status and error register (from master, which is selected after reset)
4. print something
5. wait for ready and !busy in the status register
6. check that the device is not unexpectedly waiting for data
7. finally: issue command "IDENTIFY"
8. wait for data request in the status register, bail out on unexpected state
9. read 256 words into a sector buffer
10. read status and error register
11. print the sector data in hex
12. inspect and print various fields in the sector data

ad 1: Currently i'm back to the sdcc 3.6 nightly build, though it produced so much slower code than version 3.4.

ad 2: As you may know, if you have followed this blog for the last years B-) boards attached to the K1 bus must be "selected" and from that on any i/o goes to that board. This really is a nice method and i'm really happy with it. I just must make sure that i'm not interrupted while i'm working with a device, as the interrupt (sic!) will probably leave some other board selected…

ad 9: I really read 256 words, not 512 bytes. (Though, in essence, of course i do read 512 bytes.) The K1 bus is a 16 bit bus and the Z80 board contains two 8-bit buffers for sending and receiving the high byte. Then reading 16 bit values works like this:

Implementation of a function for c:

;  uint16 in_w( uint8 addr ) __z88dk_fastcall;
;
_in_w::
    ld    a,l            ; a = register address
    or    a,k1_rd_data   ; add bits to access the bus
    ld    c,a            
    in    l,(c)          ; read the low byte
    ld    c,k1_rd_hi     ; access the high-byte register
    in    h,(c)          ; read the high byte from the high-byte register
    ret

The function is marked as __z88dk_fastcall, which is really funny. z88dk is the (only?) competitor of sdcc in the field of Z80. __z88dk_fastcall means, that the argument to the function, which must have exactly one argument, is passed in l, hl or hlde, depending on size, and not on the stack. In my opinion this should be the default.

PQI DOM, CF cards and Seagate ST1

For some reason it first didn't work, but then, all of a sudden, i got good looking data from a device. The only thing i did to make it happen was to scrutinize the circuit diagram for errors. And as soon as i could prove there was no error, reality was modified to match my expectations. Check. ✓

Of course the ascii texts of model name etc. were byte-swapped in the first version. Probably most people fall into this pitfall. The 4-byte values were calculated wrong in one place but correctly in another. After a few iterations the output from the "built-in" PQI DiskOnModule looked like this:

$00: 045A 02EE 0000 0008 0000 0210 0020 0002 EE00 0000 2020 2020 2020 2020 2020 2020
$10: 2020 2020 2020 2020 0002 0002 0004 6462 3031 2E32 3061 5051 4920 4944 4520 4469
$20: 736B 4F6E 4D6F 6475 6C65 2020 2020 2020 2020 2020 2020 2020 2020 2020 2020 0001
$30: 0000 0200 0000 0200 0000 0001 02EE 0008 0020 EE00 0002 0100 EE00 0002 0000 0000
$40: 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
     ...
$F0: 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000

model name = PQI IDE DiskOnModule                    
serial number =                     
firmware revision = db01.20a
fixed disk
ATA version = 0
LBA supported
Default capacity (sectors) = 192000
default C/H/S = 750/8/32
default capacity (sectors) = 192000
current C/H/S = 750/8/32
current capacity (sectors) = 192000
device supports PIO mode 3 or DMA mode 1 or above
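The two pitfalls mentioned above, byte-swapped text and 4-byte values, can be sketched in C. This is an assumption-labelled illustration, not the author's code: the helper names are made up, and the word offsets used in the usage note (27 for the model name, 60 for the LBA capacity) are the ones the dump above is consistent with. In each 16-bit word of a text field the first character sits in the high byte; a 32-bit value is stored as two words, low word first.

```c
#include <stdint.h>
#include <stddef.h>

/* Copy an ASCII field of nwords 16-bit words into out (needs 2*nwords+1 bytes).
   The first character of each pair is in the high byte of the word. */
void identify_string(const uint16_t *words, size_t first, size_t nwords, char *out)
{
    for (size_t i = 0; i < nwords; i++) {
        out[2 * i]     = (char)(words[first + i] >> 8);
        out[2 * i + 1] = (char)(words[first + i] & 0xFF);
    }
    out[2 * nwords] = '\0';
}

/* Read a 32-bit value stored as two consecutive words, low word first. */
uint32_t identify_u32(const uint16_t *words, size_t first)
{
    return (uint32_t)words[first] | ((uint32_t)words[first + 1] << 16);
}
```

With the PQI dump, identify_string(words, 27, 20, name) starts with the words 5051 4920 4944 4520 and so yields "PQI IDE ...", and identify_u32(words, 60) combines EE00 and 0002 into 0x0002EE00 = 192000 sectors.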

Next reading from the slave, a Seagate ST1 hard disk in the CF card slot, did not work. The ST1 always reported !ready.

I thought there could be a severe problem with the pin assignment of the CF card slot, but after double checking, it was ok. Next i suspected a missing pull-up on the /CS line, which discriminates between master and slave, but these inputs have an internal pull-up.
Then i checked the ST1: with the help of a USB-CF-Card adapter i attached it to my Mac and it spun up. I watched some very cool short videos which i had saved on the drive a few years ago:

Do ya know any of them?

Then i rummaged around for some Compact "Flash" cards and came up with a 256 MB and a 16 MB one (the latter one i actually don't own. Hi Axel, do you miss your 16 MB card? I just found it… :-))

I tested them.

And they worked:

$00: 848A 02B7 0000 000F 0000 0200 0030 0007 A2B0 0000 5830 3130 3220 3230 3033 3130
$10: 3237 3032 3536 3137 0002 0002 0004 5265 7620 332E 3030 4869 7461 6368 6920 5858
$20: 4D32 2E33 2E30 2020 2020 2020 2020 2020 2020 2020 2020 2020 2020 2020 2020 0001
$30: 0000 0200 0000 0100 0000 0001 02B7 000F 0030 A2B0 0007 0100 A2B0 0007 0000 0000
     ...
$F0: 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000

model name = Hitachi XXM2.3.0                        
serial number = X0102 20031027025617
firmware revision = Rev 3.00
removable medium
ATA version = 0
LBA supported
Default capacity (sectors) = 500400
default C/H/S = 695/15/48
default capacity (sectors) = 500400
current C/H/S = 695/15/48
current capacity (sectors) = 500400
device supports PIO mode 3 or DMA mode 1 or above

and

$00: 848A 00F4 0000 0004 4000 0200 0020 0000 7A00 0000 3932 3130 3336 3230 3130 3938
$10: 3939 3039 3132 3831 0002 0002 0004 5631 2E30 3220 2020 4C45 5841 5220 4154 4120
$20: 464C 4153 4820 2020 2020 2020 2020 2020 2020 2020 2020 2020 2020 2020 2020 0001
$30: 0000 0200 0000 0200 0000 0003 00F4 0004 0020 7A00 0000 0100 7A00 0000 0000 0000
     ...
$70: 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
$80: 0000 75D0 0075 A800 75B8 0078 0174 55F6 08F4 B800 FA08 F4F5 F0E6 B5F0 10F4 F5F0
$90: 08B8 00F5 7801 B455 E674 0001 A700 7455 01A7 90C0 5974 01F0 E4F0 90C0 2D8D E6A6
$A0: 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
     ...
$F0: 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000

model name = LEXAR ATA FLASH                         
serial number = 92103620109899091281
firmware revision = V1.02   
removable medium
ATA version = 0
LBA supported
Default capacity (sectors) = 31232
default C/H/S = 244/4/32
default capacity (sectors) = 31232
current C/H/S = 244/4/32
current capacity (sectors) = 31232
device supports PIO mode 3 or DMA mode 1 or above
device supports Ultra DMA

But the ST1 refused to become ready.

I played around with the master/slave setting and found, that the "jumper" on the PQI module probably enforced master for the device. This had to be taken into account when changing the master/slave jumper on the IDE board. (I first thought it had something to do with power supply, because this is an IDE module and the IDE bus normally provides no power, but i was wrong.)

Finally i jumpered the CF card adapter as master, pulled the "master" jumper on the PQI module, and the CF card and the PQI module still answered, with roles swapped, and, unbelievable, the Seagate ST1 answered too! So, to make the ST1 work, i need to set the CF card adapter – the "removable" medium – to master and the "fixed" PQI module to slave? I tried with an empty CF card slot and the PQI module still answered - as slave. Is this IDE standard? (Actually i know very little about CF, IDE and so on, the official documents cost money for download or membership. So i get what can be found with the help of aunt Google.)

$00: 848A 12ED 0000 0010 7E00 0200 003F 004A 8530 0000 2020 2020 2020 2020 2020 344D
$10: 4430 3433 4B53 2020 0003 0100 0004 332E 3034 2020 2020 5354 3632 3532 3131 4346
$20: 2020 2020 2020 2020 2020 2020 2020 2020 2020 2020 2020 2020 2020 2020 2020 8010
$30: 0000 0B00 0000 0200 0000 0007 12ED 0010 003F 8530 004A 0100 8530 004A 0000 0407
$40: 0003 0078 0078 0078 0078 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
$50: 0000 0000 7069 500C 4000 7069 100C 4000 0007 0000 0000 4040 0000 400D 8080 0000
$60: 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
     ...
$90: 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
$A0: 814A 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
$B0: 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
     ...
$F0: 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000

model name = ST625211CF                              
serial number =           4MD043KS  
firmware revision = 3.04    
removable medium
ATA version = 0
command sets supported = $7069 500C
command sets enabled   = $7069 100C
LBA supported
Default capacity (sectors) = 4883760
default C/H/S = 4845/16/63
default capacity (sectors) = 4883760
current C/H/S = 4845/16/63
current capacity (sectors) = 4883760
device supports PIO mode 3 or DMA mode 1 or above
device supports Ultra DMA
max sectors per R/W MULTIPLE = 16
CFA power mode = 33098

Data transfer to and from the IDE disks is not really fast with a Z80 processor; the best i can get is:

    ld      c,k1_wr_data
    ...
    ; send 1 word / 2 bytes in a loop 
    ; or unroll as long as you like    
    ld      e,(hl)          ; fetch the low byte
    inc     hl
    ld      a,(hl)          ; fetch the high byte
    inc     hl
    out     (k1_wr_hi),a    ; store byte in the high-byte register
    out     (c),e           ; send 1 word / 2 bytes to the device

This takes 25 cc per byte, or, if inc hl can be replaced with inc l only, 21 cc; plus loop and setup overhead. The system has a clock frequency of 6 MHz, which means i can transfer 32 kB (the whole RAM) in 32k * 25 cc = 819200 cc or, at 6 MHz, in 1/7 sec. Ok, the CPU is slow, but the RAM is small as well. :-)

A single sector can be transferred in 512 * 25 cc = 12800 cc or, at 6 MHz, in 1/450 sec. This is also very fine because this means that i can disable interrupts for a whole sector i/o without losing timer and SIO interrupts, as the timer interrupt is at 100 Hz currently. I could also increase the timer frequency to 300 or 400 Hz to allow a SIO speed of up to 38.4 kBaud, if i desired, but CPU time consumption will go up as well and with 100 Hz it's already around 2% with an idle SIO. And with very intelligent use of INI or OUTI a few cycles for at least one IN or OUT could be saved.

It should be noted that the CF cards can be operated in a byte-wide mode. Then INIR and OTIR should be usable and one byte transferred in 16 cc.
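The transfer-time estimates above follow from one small formula: bytes times clock cycles per byte, divided by the clock frequency. A sketch in C, with a made-up function name, that returns the time in microseconds so it can be compared directly with the 10000 µs period of a 100 Hz timer interrupt; loop and setup overhead are ignored, as in the text.

```c
/* Microseconds needed to move `bytes` bytes at cc_per_byte clock cycles
   each on a CPU running at clock_hz (result rounded down). */
unsigned long transfer_us(unsigned long bytes, unsigned long cc_per_byte, unsigned long clock_hz)
{
    unsigned long long cycles = (unsigned long long)bytes * cc_per_byte;
    return (unsigned long)(cycles * 1000000ULL / clock_hz);
}
```

For example transfer_us(512, 25, 6000000) is the time for one sector with the word-wide loop, and transfer_us(512, 16, 6000000) the estimate for the byte-wide mode.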

Conclusion

All boards work and now everything left to do is software.

p.s.: Still not yet written to the IDE devices. Surprise ahead? (Shiver…)

sdcc, crc and queues

2016-04-15T23:19:00.000+02:00

Hello,

this week i was busy working on my Z80 project. Though it always looks like moving in small circles, i made some progress.

In this post:

sdcc created broken code
Stager Electrics' programmer failed to program an eeprom
There is crc-16, crc-16 and crc-16
Stager Electrics' programmer failed to program an eeprom
Simple design of queues and how long can it take to spot an error
Stager Electrics' programmer failed to program an eeprom
sdcc could produce really fast code. could…

Stager Electrics

The code to detect whether i'm running on an e- or eeprom also write protects the eeprom (called SDP = software data protection) so that it cannot be overwritten when the program crashes. For that you just write certain bytes into certain addresses. Next i wrote a test message into the eeprom. This is done by writing the SDP sequence and then the bytes to program. Crashed at first as the destination buffer was calculated too small `:-) but worked on the second try.

Next i tried to overwrite an eeprom with the Stager Electrics programmer. Of course this did not work. It took 11 minutes to write the eeprom, and after that verify failed. The programmer knows the eeprom by name and manufacturer but cannot deactivate software data protection in the eeprom. And it can't erase the device as a whole. Luckily i have more than one of these eeproms.

In a later iteration of the rom i added code which, before doing anything else, tests whether an eeprom is inserted in the ram socket (you remember: they are pin-compatible); and if so, it disables software data protection and happily bails out with a blink code. Even later i added an option to disable SDP in the current eeprom. Now that Stager thing can program the eeproms again.

sdcc

I probably spent one full day (after work) on tracking down a not so reproducible crash when trying to read from the i2c eeprom on the SIO board. Finally i could prove it's an error in the C compiler. In a __critical function, which means that it is executed with interrupts disabled, on entry the state of the interrupt enable flip flop is pushed on the stack so the interrupts can be re-enabled or not re-enabled on return. The generated code ignored this additional word on the stack and read everything from wrong local variables. By chance the address of the destination buffer was falsely taken from the i2c eeprom reading start address, which was 0, and so reading the eeprom overwrote ram from address 0x0000 onwards. Clearly not a good idea.

Stager Electrics

I forgot to disable SDP in the eeprom and had to add another 11 minutes after eeprom verification failed…

crc-16

I want to download new rom images to the Z80 system so that the system can reprogram itself, which takes 5 seconds (at most) and not 11 minutes. The current speed on the serial port is 9600 Baud, which means 960 bytes can be transmitted per second which means after roughly 17 seconds 16 kB are transmitted (which is the current rom size) or at most 34 seconds to overwrite the whole 32 kB of the eeprom. For the "protocol" i decided after some pros and cons to just wrap the rom image with a 2-byte start and stop prefix/postfix and to add a crc checksum for error detection.

I already have a CCITT crc-16 implementation in C at hand and googled for a Z80 version which was quickly found. Then i did some tests to compare the result and found ... nothing in common.

Ok, there are crc-16 and crc-16 and crc-16 and they are all different.

Let's look at the c implementation:

uint crc16_ccitt( uint8 const* q, uint count, uint crc )
{
   while(count--)
   {
      for( uint c = 0x0100/*stopper*/ + *q++; c>1; c >>= 1 )
      {
         crc = (crc^c) & 1  ?  (crc >> 1) ^ 0x8408  :  (crc >> 1);
      }
   }
   return crc;
}

And the Z80 version converted to a c function for easy understanding:

uint crc16_z80( uint8 const* q, uint count, uint hl )
{
  while(count--)
   {
      uint8 b;        
      hl ^= *q++ << 8;
      for(b=0; b<8; b++)
      {
         if((signed int)hl < 0) hl = (hl<<1) ^ 0x1021;
         else                   hl = (hl<<1);
      }
   }
   return hl;
}

The first chance to make a difference is the input value for the crc. This must be 0xffff for the CCITT version and then the function may be called repeatedly to update the CRC as bytes arrive. Of course i called them both with the same starting value. Check. ✓

Next you see that both functions use different polynomials: 0x8408 and 0x1021. Of course they must be the same to produce the same result, and they _ARE_ the same: The c function shifts bits from left to right, the z80 version from right to left, so they just work bit-reversed. Check. ✓

Ok, they work bit-reversed when compared to each other. So the result must be bit-reversed. But even when reversing one result the CRCs are completely different.

So what's the difference?

The bytes read from the data buffer must be bit reversed as well (in any one function) to make all data bit-reversed, then the result (of any one function) can be bit reversed and then they will be actually identical!

The fully bit-reversed version of the first function looked like this:

#define  R1(N) ((N<<7)&0x80)+((N<<5)&0x40)+((N<<3)&0x20)+((N<<1)&0x10) + \
               ((N>>7)&0x01)+((N>>5)&0x02)+((N>>3)&0x04)+((N>>1)&0x08)
#define R4(N)  R1(N),R1((N+1)),R1((N+2)),R1((N+3))
#define R16(N) R4(N),R4((N+4)),R4((N+8)),R4((N+12))
#define R64(N) R16(N),R16((N+16)),R16((N+32)),R16((N+48))
uint8 rev[256] = { R64(0), R64(0x40), R64(0x80), R64(0xC0) };

uint16 crc16r( uint8 const* q, uint count )
{
  uint crc = 0xffff;
  while(count--)
  {
    for( uint c = 0x0100 + rev[*q++]; c>1; c >>= 1 )
    {
      crc = (crc^c) & 1 ? (crc >> 1) ^ 0x8408 : (crc >> 1);
    }
  }
  return rev[crc>>8] + (rev[crc&0xff]<<8);
}

Now i have a C and a Z80 implementation for a CRC-16 checksum which work identically. `:-)
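The equivalence can be checked on a PC with a short C sketch. It is written with fixed-width types so that it does not depend on a 16-bit int, uses the more common "xor the byte in first" form of the loops, and reverses bits with a small loop instead of the lookup table; the function names are made up for this example. crc16_lsb_reversed feeds bit-reversed bytes into the LSB-first routine and bit-reverses the start value and the result, which should give exactly what the MSB-first routine computes.

```c
#include <stdint.h>
#include <stddef.h>

/* LSB-first CRC-16, polynomial 0x8408 (the bit-reversed form of 0x1021). */
uint16_t crc16_lsb(const uint8_t *q, size_t count, uint16_t crc)
{
    while (count--) {
        crc ^= *q++;
        for (int i = 0; i < 8; i++)
            crc = (crc & 1) ? (uint16_t)((crc >> 1) ^ 0x8408) : (uint16_t)(crc >> 1);
    }
    return crc;
}

/* MSB-first CRC-16, polynomial 0x1021. */
uint16_t crc16_msb(const uint8_t *q, size_t count, uint16_t crc)
{
    while (count--) {
        crc ^= (uint16_t)(*q++ << 8);
        for (int i = 0; i < 8; i++)
            crc = (crc & 0x8000) ? (uint16_t)((crc << 1) ^ 0x1021) : (uint16_t)(crc << 1);
    }
    return crc;
}

/* Reverse the bit order of an 8-bit value. */
uint8_t rev8(uint8_t b)
{
    uint8_t r = 0;
    for (int i = 0; i < 8; i++) {
        r = (uint8_t)((r << 1) | (b & 1));
        b >>= 1;
    }
    return r;
}

/* Reverse the bit order of a 16-bit value. */
uint16_t rev16(uint16_t w)
{
    return (uint16_t)((rev8((uint8_t)(w & 0xFF)) << 8) | rev8((uint8_t)(w >> 8)));
}

/* LSB-first CRC over bit-reversed input bytes, with the start value
   and the result bit-reversed as well. */
uint16_t crc16_lsb_reversed(const uint8_t *q, size_t count, uint16_t crc)
{
    crc = rev16(crc);
    while (count--) {
        uint8_t b = rev8(*q++);
        crc = crc16_lsb(&b, 1, crc);
    }
    return rev16(crc);
}
```

For any buffer and start value, crc16_lsb_reversed and crc16_msb should agree, while plain crc16_lsb gives a different checksum for the same data.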

Note: To calculate the CCITT CRC-16 checksum with the first function, calculation must be started with CRC = 0xFFFF and the final CRC must be complemented. Then all sources say that you must swap the low and high byte. But that's not true, or, that's not the point. Whether you must swap the bytes depends on how you read the CRC from the data stream and what byte order your computer uses. I believe that the low byte is transmitted first. (to be tested somehow & somewhen…)

The Z80 version calculates the CRC-16 used in the XMODEM file transmission protocol. Here the CRC must be initialized with 0x0000, the final CRC must not be complemented and the high byte is sent first.

Stager Electrics

I forgot to disable SDP in the eeprom and after programming eeprom verification failed and i thought it was defective now…

Queues

I use a nice design for queues (in the sio implementation) which avoids the need for locks (or mutexes).

#define busize 64  // 2^N
#define bumask (busize-1)

uint8 bu[busize];
uint  ri;          // read_index
uint  wi;          // write_index

Normally writing to a queue works like this:
(I'll only describe writing, reading is similar.)

bu[wi++] = mybyte;
wi &= bumask;

Drawback:

You cannot distinguish between a full and an empty buffer, so you fill it up to at most busize-1 bytes.

This can be helped:

bu[wi++ & bumask] = mybyte;

Now the buffer is empty if wi==ri and full if (wi-ri)==busize.
ri and wi will overflow at some time, but the unsigned integer arithmetic remains valid.

It is not obvious, but this implementation needs locking: wi is incremented before the byte is written, so the buffer reader could interrupt between wi++ and the store into the buffer, and read the not yet written byte. But this can be remedied like this:

bu[wi & bumask] = mybyte; wi++;

Now the byte is stored first and then the write pointer is incremented, "releasing the semaphore".
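Put together, the queue could be sketched like this. This is a minimal illustration, not my sio source: the type Queue and the functions q_avail, q_put and q_get are hypothetical names, and the sketch assumes exactly one writer and one reader (e.g. main program and interrupt routine) and that each side sees the other's index update as a whole.

```c
#include <stdint.h>
#include <stdbool.h>

#define QSIZE 64u                 /* must be a power of two */
#define QMASK (QSIZE - 1u)

typedef struct {
    uint8_t           buf[QSIZE];
    volatile unsigned ri;         /* read index, only modified by the reader  */
    volatile unsigned wi;         /* write index, only modified by the writer */
} Queue;

/* Number of bytes waiting in the queue. The indexes run freely and
   are only masked when used as an array index. */
static unsigned q_avail(const Queue *q)
{
    return q->wi - q->ri;
}

/* Writer side: returns false if the queue is full. */
static bool q_put(Queue *q, uint8_t byte)
{
    if (q_avail(q) == QSIZE) return false;
    q->buf[q->wi & QMASK] = byte;   /* store the byte first ...        */
    q->wi++;                        /* ... then publish it to the reader */
    return true;
}

/* Reader side: returns false if the queue is empty. */
static bool q_get(Queue *q, uint8_t *byte)
{
    if (q_avail(q) == 0) return false;
    *byte = q->buf[q->ri & QMASK];  /* fetch the byte first ...          */
    q->ri++;                        /* ... then release the slot         */
    return true;
}
```

The reader follows the same rule as the writer: it touches the buffer first and moves its own index last, so the writer never reuses a slot that is still being read.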

how long can it take to spot an error?

These are the data structs containing the data for each channel:

struct SioData 
{ 
  bool  hw_handshake; 
  uint8 sw_handshake;   // bit.0: enabled  
  uint8 clk_handshake;  // bit.0: emit TX clock
  uint8 device;         // select mask
  uint8 channel;        // 0 = channel A; 1 = channel B
  uint8 baudrate;       // baudrate / 2400

  uint8 ibuwi;          // input  buffer write index
  uint8 iburi;          // input  buffer read index
  uint8 obuwi;          // output buffer write index
  uint8 oburi;          // output buffer read index

  uint8 ibu[ibusz];     // input  buffer
  uint8 obu[obusz];     // output buffer
};

These are two actual implementations in my sio source:

uint sio_avail_in(struct SioData* channel)  
{ 
  return channel->ibuwi - channel->iburi; 
}
uint sio_avail_out(struct SioData* channel) 
{
  return obusz - (channel->obuwi - channel->oburi); 
}

Nice! :-)

And both wrong. :-?

When i tested transmission of data from my Mac to the Z80 system, i only got transmission errors. The Z80 system received all data when CoolTerm was at 50 .. 80%. I suspected CoolTerm. I suspected the USB-RS232 driver software. (Which actually _IS_ pretty buggy.) I suspected sdcc. I scrutinized the Z80 assembler interrupt routine. I examined the test routine itself. (A common place. Actually i started here… ;-)) I examined gets(…), which receives all available data into a buffer and which is written in C. I examined sio_avail_in(…). Not only once … My source and what sdcc compiled. And sio_avail_in(…) was buggy. But it took me hours to see the error. Do you spot the error? C'mon, it's only one line of code. A single subtraction of two values…

sdcc could produce really fast code. could…

I have written several versions of the CRC routine, two (similar versions) in Z80 and some in C. I timed them and i got interesting results.

CRC-16 ZMODEM of rom (asm1) dt=1180 ms
CRC-16 ZMODEM of rom (asm2) dt=1430 ms
CRC-16 CCITT  of rom (c)    dt=9500 ms
CRC-16 ZMODEM of rom (c)    dt=3020 ms

The C function to calculate the XMODEM CRC is much faster than the function to calculate the CCITT CRC, though they both contain equivalent source.

That was with sdcc 3.4.

Due to the __critical error mentioned at the beginning of this post i looked for the latest version of sdcc. I thought, if i send in a bug report they'll surely complain that it's for version 3.4, which is 2 years old.

So i looked for the latest version: Version 3.5, which is 10 months old. (sigh).

It still had the __critical bug but i found the bug tracker and an entry for this bug: Fixed in 9'2015. Version 3.5 is from 6'2015. sigh…

So i searched and found the beta versions (more like nightly builds) and the latest OSX version was 13 minutes old. :-) It no longer has the __critical bug (tested), needs some other includes (copied) and produces slightly larger code. And i ran the CRC test again: (rom now slightly bigger)

CRC-16 ZMODEM of rom (asm1) dt=1200 ms
CRC-16 ZMODEM of rom (c)    dt=8500 ms

The C routine is now nearly 3 times slower?

So i reverted to sdcc 3.4 and reinstalled my workaround for the __critical bug…

... Kio !

p.s.: @ Google: The editor is crap. could you please fix it?

---- SPOILER WARNING ----

p.p.s.: the read and write indexes in the sio struct are (unsigned) bytes.
When they are subtracted in sio_avail_*(…) they are extended to 2-byte values.
If the write index has already overflowed and the read index not, then the difference is not limited to 8 bits as expected but the high byte of the result is 0xFF.
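The effect can be reproduced on any C compiler, because the operands of the subtraction are promoted to int before they are subtracted. The following sketch uses a stripped-down, hypothetical struct Indexes instead of the real SioData; the cast in avail_fixed is one possible remedy, not necessarily the one i ended up with.

```c
#include <stdint.h>

typedef struct {
    uint8_t wi;   /* write index, free running, wraps at 256 */
    uint8_t ri;   /* read index,  free running, wraps at 256 */
} Indexes;

/* Buggy: wi and ri are promoted to int, so the difference becomes
   negative once wi has wrapped around and ri has not. */
static unsigned avail_buggy(const Indexes *x)
{
    return x->wi - x->ri;
}

/* Fixed: truncating the difference to 8 bits restores the
   modulo-256 arithmetic the free-running indexes rely on. */
static unsigned avail_fixed(const Indexes *x)
{
    return (uint8_t)(x->wi - x->ri);
}
```

With wi = 2 and ri = 250 there are 8 bytes in the buffer. avail_fixed returns 8, while avail_buggy returns -248 converted to unsigned, which is 0xFF08 where int is 16 bits wide, as it is with sdcc.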

Z80 Microcomputer with SRAM and K1-Bus

2016-04-10T20:13:00.000+02:00

After suspending the project for a while, i'm now back to it. This is an update to the current state.

Hardware

There were some errors in the circuit, which i could fix. The V2.0 Eagle file on my website already contains these fixes.

2016-04-08 fixed board

Blue wires: A6 and A7 are used (beside A5) to select the target of an i/o operation. One of these is the access to the i2c bus on the k1 bus, where also my debugging LEDs are attached. When i2c is selected in an i/o operation, then A6 and A7 are used to set the i2c data and clock lines. – But wait, A6/7 are used to select i2c operation AND to select something within the i2c operation? Merde… So i rerouted the i2c lines to use A3 and A4 instead.
Yellow wires: As you can see by a look at the ram/rom address decoder in the last post, ram and rom selection is exchanged. First i fixed this by inserting eeprom and ram into each other's socket, which is possible with the eeprom, but not with the eprom.
For my (e)eproms i use a programmer from Stager Electric, Shenzhen, China. If you ever see something made by Stager Electric: run as fast as you can! It takes ~11 minutes to write a few bytes into an eeprom. The application has an option to "disinterest blanck" and maybe in the next version (which i never saw) it even worked… So it always programs the full 32k. Even this could be done in less than 32768/64*10ms = 5.12 seconds, but it actually takes 11 minutes, which is more than 100 times longer, so i presume the eeprom is programmed byte by byte, making sure that its write endurance of 10000 is in reachable distance... So i wanted to use eproms and fixed the circuit. Programming eproms is faster as well.
Component side: A minor glitch is on the component side: I have carefully engraved "E" and "EE" for the eprom/eeprom selection pin header into the copper layer, and again, did it wrong: exchanged, as always…

Software

I was playing a little with my c-style compiler to add a Z80 target, and found: the Z80 is really badly suited to implement anything a compiler might try to create. Too few registers which frequently have special features. Deploying the second register set is near impossible. Using index registers is painfully slow. (you already knew that) Local variables on the stack are a pain to access.

Basically you have the choice to generate real machine code, which is not only slow but bloated as well, or some kind of Forth-style virtual code, which is short but even slower.

I finally came to the "fastest possible Forth-style" code model, which i will pursue later: It uses a jump table and opcodes which are 1-byte index into this table; which is faster (and shorter) than using 2-byte addresses in the program as Forth implementations typically do. Drawback: i need the table and the table can contain only ~256/3 entries. So there must be "prefix" opcodes which then are slower.

The jump table looks like this:

vector: macro $NAME
$NAME:: equ $ - vtable 
jp _$NAME 
endm                          

; ------------------------------------

vector RESET  ;( -- ) 
vector SHELL  ;( -- ) 
vector NATIVE ;( -- ) 
vector ABORT  ;( uint -- )   

vector MODs   ;( n n -- n ) 
vector DIVs   ;( n n -- n ) 
vector MODu   ;( n n -- n ) 
vector DIVu   ;( n n -- n ) 
vector MUL    ;( n n -- n )  

vector JP1    ;( n $dest -- )
vector JP0    ;( n $dest -- )
vector JP     ;( $dest -- )

and so on. You see, each entry is a JP opcode (by virtue of the macro), but "inline" code in the table is sometimes possible as well, e.g. if a variant of an opcode just needs a short mockup of the arguments, its code can be put directly in the table before the other opcode, where it simply runs into. It's a trade-off of used space and gained speed.

A typical "word" looks like this:

_SUB: ;( n1 n2 -- n )           

pop hl     ; hl=n1 
and a      ; de=n2 
sbc hl,de  ; hl=result
ex de,hl   ; de=result
next

where next is a macro:

; fetch next virtual opcode and jump to handler
; 
next: MACRO 
ld h,hi(vtable) 
ld a,(bc) 
inc bc 
ld l,a 
jp hl 
ENDM

An alternative is to jump to any implementation of macro next, which is slightly slower (10 cc for the jump) but also shorter (just 3 bytes). If it can be done in a relative jump, then it's even shorter (2 bytes) and even slower as well…

As you can see i use register pair BC for the virtual program counter and DE as result register, which frees HL so that machine coded sub routines can pop the return address into HL, do some work, e.g. pop arguments, and finally return via JP HL, which is not possible if you use HL as result register.

If an opcode implementation does not modify the h register, then it does not need to reload h with the high byte of the vtable address. There are actually some (few) opcodes which can exploit this additional speed boost. :-)

As you can see, the interpreter reads just one byte from the program and jumps into the vtable which contains jumps to the actual implementation of the virtual opcodes. This is faster than reading 2 bytes from the program, the program is shorter, but i need the tables and implementations for all opcodes.
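For readers who don't speak Z80: the same dispatch scheme can be sketched in C. This is only an illustration of the idea (1-byte opcodes used as index into a table of handlers), not a translation of my interpreter. The opcode set, the VM struct and all names are made up for the example, and there are no checks for stack overflow or invalid opcodes.

```c
#include <stdint.h>

enum { OP_HALT, OP_LIT8, OP_ADD, OP_SUB, OP_COUNT };

typedef struct {
    const uint8_t *pc;        /* virtual program counter (BC in the Z80 version) */
    int16_t        stack[16];
    int            sp;        /* number of values on the stack */
    int            running;
} VM;

typedef void (*Handler)(VM *);

static void op_halt(VM *vm) { vm->running = 0; }

/* push the byte following the opcode */
static void op_lit8(VM *vm) { vm->stack[vm->sp++] = *vm->pc++; }

/* ( n1 n2 -- n1+n2 ) */
static void op_add(VM *vm)
{
    int16_t n2 = vm->stack[--vm->sp];
    vm->stack[vm->sp - 1] = (int16_t)(vm->stack[vm->sp - 1] + n2);
}

/* ( n1 n2 -- n1-n2 ) */
static void op_sub(VM *vm)
{
    int16_t n2 = vm->stack[--vm->sp];
    vm->stack[vm->sp - 1] = (int16_t)(vm->stack[vm->sp - 1] - n2);
}

/* The table plays the role of vtable: the opcode is the index. */
static const Handler vtable[OP_COUNT] = { op_halt, op_lit8, op_add, op_sub };

/* Run a program and return the top of stack (0 if the stack is empty). */
static int16_t run(const uint8_t *program)
{
    VM vm = { program, {0}, 0, 1 };
    while (vm.running)
        vtable[*vm.pc++](&vm);   /* "next": fetch one byte, dispatch via table */
    return vm.sp ? vm.stack[vm.sp - 1] : 0;
}
```

A program is then just a byte string, e.g. { OP_LIT8, 10, OP_LIT8, 3, OP_SUB, OP_HALT } leaves 7 on the stack. In the Z80 version there is no central loop: every handler ends with the next macro and jumps directly to the following handler.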

The alternative i'm currently working with – because the Z80 backend of my compiler is not yet completed – is sdcc, the "Small Devices C Compiler", which has a Z80 backend. I can really tell that the generated code is bloated, and sometimes suboptimal, the syntax of the generated code is "unusual" and sometimes the compiler even crashes for me. Especially when i use the "<<" operator.

Here an example of what sdcc generates:

;/Firmware-Sdcc/sio.c:394: if(this->clk_handshake)
8520: DD7EFE   ld  a,-2 (ix)
8523: DD77FB   ld  -5 (ix),a
8526: DD7EFF   ld  a,-1 (ix)
8529: DD77FC   ld  -4 (ix),a
852C: DD6EFB   ld  l,-5 (ix)
852F: DD66FC   ld  h,-4 (ix)
8532: 23       inc hl       
8533: 23       inc hl       
8534: 6E       ld  l,(hl)   
8535: 7D       ld  a,l      
8536: B7       or  a, a     
8537: 2814     jr  Z,00106$

The first line (the comment) is the compiled source line. As you can see the compiled code reads a word from (ix-2), which seems to be 'this' ( a valid variable name in C ;-) ) and stores it at (ix-5) which seems to be a scratch cell and immediately reads it back into HL. Then it reads the desired value into l and immediately moves it into a for testing. A wonder of elegance. (note: the scratch value is not used anywhere later, l is used later, but while the value in a is still valid too.)

Current State of the Project

Current setup

Last and this weekend i refitted all hardware, which is the CPU board, a SIO board and a (not yet tested) IDE board, as can be seen to the left, and hooked it up to a regulated power supply. Current consumption is pretty low, as it's all CMOS: only 50 to 80 mA (depending on how many LEDs are lit) for all three boards, including a 96MB IDE flash rom (hiding between the IDE and the SIO board) and a 2.5GB compact flash size hard drive (sticking out from the rear side so you can't see it as well).

Slowly iterating from one broken software step to the next, regularly erasing and reusing my eproms and finally even testing some steps in the emulator (erm, yes, i have written an emulator for the system too, using my Z80 emulation from zxsp and the SIO and a LCD display emulation from my K1 CPU project) i finally got the first text message from the board. I have attached the SIO port A to a RS232-to-USB converter and use CoolTerm on my Mac to receive the messages. I stepped back to an old version of CoolTerm, as the current versions very quickly use 100% of one CPU core.

For the SIO software i use a simplified approach: The SIO ports are polled on the system timer interrupt (which is generated by the UART as well) which is currently 100 Hz. The UART has 16 byte fifo queues, so 100 Hz is way enough for 9600 Baud. But i'll probably go up to 200 Hz for 19200 Baud at least. As sending data works, the system interrupt works as well.
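The reasoning behind these numbers can be written down as a small check. This is a back-of-the-envelope sketch which assumes 10 bits per byte on the line (1 start bit, 8 data bits, 1 stop bit) and ignores interrupt latency; the function name fifo_suffices is made up.

```c
#define BITS_PER_BYTE 10u   /* assumption: 8N1 framing, start + 8 data + stop */

/* Returns 1 if a FIFO of fifo_depth bytes cannot overflow between
   two polls, i.e. if (baud / BITS_PER_BYTE) / poll_hz <= fifo_depth. */
static int fifo_suffices(unsigned long baud, unsigned poll_hz, unsigned fifo_depth)
{
    return baud <= (unsigned long)fifo_depth * poll_hz * BITS_PER_BYTE;
}
```

At 9600 Baud and 100 Hz at most 9.6 bytes arrive per poll, which fits into the 16 byte fifo. At 19200 Baud it would be 19.2 bytes, which does not fit, and at 200 Hz it is 9.6 bytes again.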

Idle CPU usage for this interrupt is approx. 2% (calculated), and will be ~4% with 200 Hz, if i don't find a better solution. On the photo above you can see that the red LED in front is lit. This LED indicates WAIT state and currently the CPU waits approx. 98% of the time. (Or 96%, as it's also sending some text through some ugly compiled c code…) So i can say, this LED works as well. :-)

Today i have tested writing of data into the eeprom. (actually only detecting whether it's an eeprom or not, but that's quite similar.) This works too.

Next steps:

~~Actually write some bytes into the eeprom~~ ➞ done 2016-04-11
test reading of the i2c eprom on the SIO board
test writing to the i2c eeprom
receive data from the SIO port
receive program data from the sio port and write it into eeprom.
lock away the Stager Electrics programmer. ;-)

Final question is: what should i do with the board? hm hm…

Stay tuned.

p.s.: Today i wrote a test message into the eeprom. Of course it did not work right from the start – it crashed because the destination space was too short, and behind that in the eeprom was the sio interrupt handler which was then partly overwritten.
I also tried to overwrite the eeprom with the Stager Electrics programmer, – which could not overwrite it. It took 11 minutes to write, and after that verify failed. I had expected this: Of course the programmer cannot deactivate software data protection in the eeprom. And it can't erase the device as a whole. Luckily i have more than one of these eeproms. And i already have a plan to make them writable again (else they'd be nice ceramic bricks): I can insert them into the ram socket and write a short eprom which does the job…

68008 SRAM Microcomputer - Circuit Description V1.1

2014-08-15T12:52:00.003+02:00

While waiting for the Z80 project for co-production, the 68008 board circuit has evolved. A severe bug has been removed (connecting one input of a NAND gate to ground doesn't transform it into an inverter) and a lot of timings have been scrutinized by hand, resulting in some modifications; e.g. only using address lines to select the strobe signal for the K1-bus strobe decoder or swapping /CE and /OE of the EPROM. This will be covered in another article.

This post is about the logic of the circuit. It is very simple and covered in great detail, so i hope it's possible to really understand how the board works. The project web page contains some additional material, and reading the 68000 data sheet may provide deeper insight.

This is a simple 68008 CPU board for the K1-bus. It uses a 681000-70 128kB SRAM and one 27C010 128kB EPROM or one 27C512 64kB EPROM or similar. The board does not contain any i/o circuitry except for an unbuffered K1-bus. Even a system timer must be provided this way.

Main Circuit

Main Circuit with CPU, RAM and ROM

The connection of RAM and ROM is very straight forward. Data bus lines and address bus lines are connected 1:1. Only two standard logic chips are used to interface the control signals: one 74HCT00 quad NAND and one 74HCT139 dual 2-to-4 decoder.

CPU control outputs used are /AS (controlling all bus cycles) and /WR (discriminating between read and write cycles). CPU control inputs used are /DTACK (terminate a bus cycle), /VPA (terminate a slow bus cycle) and /IPL1 (interrupt request).

Let's start with the signals provided and received by the 68008 CPU:

CLK: a clock signal of 10 MHz is generated by a DIL clock generator.

FC0 to FC2: These outputs provide information about each bus cycle: Whether it's program or data and whether it's user mode or supervisor and whether it's an interrupt acknowledge cycle. I was thinking of using this to page-in ROM at address 0x000000 after reset but found it did not save me components and would make software more complicated. So these outputs are not used.

/BERR: signals a bus error to the CPU. Not used on this board. There will never be a bus error detected. All bus cycles terminate with /DTACK or /VPA.

/BR and /BG: Used for multi bus master control. Not used on this board. There is only one bus master: the CPU.

/RESET and /HALT: After power-up the CPU must be reset for 0.1 seconds (really!) by pulling these lines low. Both pins are outputs as well: /RESET can be asserted by software and will in return activate /HALT: this will halt the CPU after the current bus cycle, by which time /RESET is already released again, so it should not cause problems to the CPU. Or /HALT can be asserted and will reset the CPU. /HALT is asserted if double bus errors are detected (and possibly other conditions) and this can probably never happen with this board. To be tested when built :-) This signal is directly connected to the K1-bus /RESET line.

/AS: During a bus cycle /AS is activated while the address is valid. Basically /AS going low starts a bus cycle and /AS going high terminates it.

/DS is similar to /AS but in write cycles it is asserted slightly later than /AS. I thought that it was not asserted at all in 6800 cycles, because the signal is not shown in the 6800 timing chart, but it is, as can be seen in the autovector timing chart. This signal is not used.

/WR indicates that this bus cycle is a write cycle. The opposite signal /RD does not exist and is generated by an inverter.

/DTACK: Data Acknowledge is asserted by external circuitry to tell the CPU that it can finish the current bus cycle. It is possible to keep this signal low all the time so that the CPU will run without wait cycles. (Though the timing charts all tell you that this signal has to go up and down.) On this board /DTACK is asserted all the time except if /WAIT on the K1-bus is activated or a slow bus cycle is performed which will be terminated by /VPA.

/VPA is connected to a signal called /SLOW_IO: A bus cycle of the 68008 can be terminated normally in two ways: by /DTACK or by /VPA. /VPA means "valid peripheral address" and is used to interface old (very old!) 6800 peripherals. The 68008 then does an extremely slow bus cycle. I use this to do very slow i/o cycles on the K1-bus. /VPA has a second meaning: If an interrupt acknowledge cycle is terminated with /VPA, then the CPU does not read the interrupt vector from the data bus but generates an 'autovector' internally. This board does not use /VPA for interrupt acknowledge cycles. Instead it uses vectored interrupts and uses /DTACK to terminate interrupt acknowledge cycles.

IPL0/2 and IPL1: Interrupt inputs. The 68000 has 3 interrupt lines, the 68008 has only 2 due to pin shortage in the dip package and connects both IPL0 and IPL2 to one pin. Pulling low any combination of these lines forces an interrupt in the 68008. Pulling low all interrupt lines generates a non-maskable interrupt, pulling low fewer lines results in an interrupt of level 1 to 6. In the 68008 normal interrupts of level 2 (IPL1) and 5 (IPL0/2) can be generated. This board uses only interrupt IPL1, which is directly connected to the K1-bus interrupt line.

The Control Signals of the RAM and ROM:

/CE: This enables the memory chip. While enabled, it can be read or written. The RAM also has a positive CE2 input, but this is not used. /CE is enabled by a 2-to-4 address decoder. See below.

/OE: If the memory chip is enabled, asserting /OE enables its output drivers and the currently addressed byte is put on the data bus, where it can be read by the CPU. /OE is connected to the inverted /WR signal of the CPU: so at any time either /OE or /WR is enabled.

/WE: The RAM can also be written: if /WE is asserted while the RAM is enabled, then it will read the byte from the data bus and write it into the currently addressed memory cell. /WE is connected directly to the CPU's /WR output.

/CE and /OE of the EPROM are not connected as expected but swapped: the result is the same – the EPROM puts data on the data bus if both signals are enabled – but memory access after /CE is much slower than after /OE and enabling the chip with /CE would always require one wait state, at least if i use one of my (well stocked) 27C512-250 EPROMs with 250 nanoseconds access time from /CE. But the EPROM's access time from /OE is only 100 ns, and therefore i connect /OE to the timing critical output of the 2-to-4 address decoder and /CE to the less timing critical /RD signal and can access the EPROM with no wait cycles.

Glue Logics

Only two standard logic chips are used to interface the control signals: one 74HCT00 quad NAND and one 74HCT139 dual 2-to-4 decoder.

Bringing it all together

The 74HCT00 quad NAND provides one NAND, one inverter and one flip flop, as can be seen to the left.

The first gate generates /DTACK to terminate most bus cycles. /DTACK is permanently low except if /SLOW_IO or /K1_WAIT is low. /K1_WAIT originates from the K1-bus and /SLOW_IO is activated when access to a special address range is decoded.

The 2nd gate is used to invert the /WR signal of the CPU to generate the /RD signal required by the RAM and ROM.

Gate 3 and 4 construct a flip flop. After reset the CPU reads reset vectors from address 0x000000 which can only be provided by reading from ROM. But at runtime we'd like to have RAM at address 0x000000 to allow the running program to modify the vector table. To achieve this, this flip flop is set by the /RESET signal and cleared by the first access to an i/o address which activates signal /CLEAR_INIT. While set, the INIT output is used to temporarily map ROM at address 0x000000.

The two 2-to-4 decoders of the 74HCT139 are used to generate the RAM and ROM /CE signals, to clear the init_FF and to generate the /SLOW_IO signal.

The 1 MB address space of the 68008 CPU is divided into 4 regions: RAM, ROM, fast i/o and slow i/o.

The first decoder generates the RAM and ROM /CE signals. As explained above it actually generates /OE for the EPROM. For this it decodes the highest address lines A19 and A18: A19 must be low for both and A18 discriminates between RAM and ROM. This results in a memory map as follows:

0x000000 .. 0x03ffff  max. 256kB SRAM (actual size: 128kB)
0x040000 .. 0x07ffff  max. 256kB ROM  (actual size: 64kB or 128kB)

As explained above we need ROM at address 0x000000 after reset. Therefore signal INIT from the init_FF is used to pull A0 of the address decoder high, so that memory accesses in the RAM address range will activate the EPROM instead. Technically the resistor and the diode construct an OR gate.

The decoder is strobed with /AS from the CPU so that there are no spikes on the outputs when the address toggles between bus cycles. This is important, so that the RAM does not erroneously write some void data into random cells and that /CLEAR_INIT is not activated too early, e.g. immediately after reset before the CPU even read the first byte of the reset vector.

/CLEAR_INIT is generated when A19 is high and either A18 or INIT is high, and will clear the init_FF. A19 high means this is an i/o address.

The second decoder generates the /SLOW_IO signal, which is connected to the CPU's /VPA input and to the first NAND gate to suppress /DTACK. If this signal is activated, the CPU will perform a very slow 6800 peripherals bus cycle. The signal is activated when A19 is high and A14 is low:

A19 must be high for any i/o access, because A19 low is used for memory access, either RAM or ROM.

A14 is used to discriminate between fast and slow i/o cycles. If we'd use A18 instead then the first decoder would have done the job. But A14 is used because it is in the low word of the address.

The 68000 uses 32 bit addresses but has a short addressing mode, where a 16 bit address is sign-extended to 32 bits. This saves program space and execution time, as only 2 address bytes must be read from memory.

If we want to use short addressing for i/o, then all address bits from A15 to A31 must be the same. And as A19 is high, they must all be high. (which besides means negative addresses for i/o). This makes A18 unusable for this task. On the other hand we cannot use A14 to discriminate between RAM and ROM as well, because that would break memory into 16 kB chunks. `:-)

Slow i/o is deliberately chosen to be activated when A14 is low:

When the CPU performs an interrupt acknowledge cycle, it puts the acknowledged interrupt level (which is always 2 on this board) on A1 to A3 and pulls all other address bits high. So during an interrupt acknowledge cycle A14 and A19 will be high and /SLOW_IO, and consequently /VPA at the CPU, will not be activated and the CPU will do a fast bus cycle and will read a vector number from the data bus. This is explained in more detail below.
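The address decoding described in this section can be summarized in a few lines of C. This is a software model for illustration only; the enum Region and the functions decode and short_address are hypothetical names, and the model ignores timing and the /AS strobe. It only looks at A19, A18, A14 and the INIT flip flop.

```c
#include <stdint.h>
#include <stdbool.h>

typedef enum { REGION_RAM, REGION_ROM, REGION_FAST_IO, REGION_SLOW_IO } Region;

/* Which region does a bus cycle with this address select?
   init is the state of the init flip flop (set after reset). */
static Region decode(uint32_t addr, bool init)
{
    bool a19 = (addr >> 19) & 1;
    bool a18 = (addr >> 18) & 1;
    bool a14 = (addr >> 14) & 1;

    if (!a19)                                   /* A19 low: memory access  */
        return (a18 || init) ? REGION_ROM : REGION_RAM;
    return a14 ? REGION_FAST_IO : REGION_SLOW_IO;   /* A19 high: i/o access */
}

/* Address seen on the 20 address pins of the 68008 when the program
   uses a 16-bit short address, which the CPU sign-extends to 32 bits. */
static uint32_t short_address(uint16_t a)
{
    uint32_t extended = (a & 0x8000u) ? (0xFFFF0000u | a) : a;
    return extended & 0xFFFFFu;
}
```

With this model short addresses 0x8000 to 0xBFFF land in slow i/o and 0xC000 to 0xFFFF in fast i/o, while short addresses 0x0000 to 0x7FFF reach the lowest 32 kB of RAM (or ROM, as long as INIT is set).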

K1-Bus I/O Circuit

This board uses a K1-bus for all peripherals. This is a hobbyist-grade 16-bit peripherals bus for CMOS devices which can be used unbuffered in small systems, as is done with this CPU board. The core K1-bus uses 16 data lines (8 may be sufficient for most cards), 6 address lines (4 mandatory) and 5 control lines, aka 'strobe lines'. It has one interrupt line, a wait request line and a reset line.

A unique feature of this bus is, that cards are not selected implicitly by the address in an i/o operation but must be selected beforehand instead. The address lines are only used to select registers inside the currently selected card.

Another unique feature of this bus is, that cards have an assigned data line which is used as their address. This data line is selectable with jumpers on the i/o cards. This data line is used in conjunction with 3 of the strobe signals: /SELECT, /RD_IRPT and /WR_IRPT. The other 2 strobe signals /WR_DATA and /RD_DATA control data transfer to and from the card. This CPU board can only use data lines D0 to D6.

As can be seen in the circuit above, there are 3 more signals used: /RD_I2C accesses an I²C bus on the K1-bus, which is used to attach I²C EEPROMs which are used to detect cards automatically and which can provide driver code. Implementing the I²C bus interface is optional but recommended. /RD_HI and /WR_HI are used to control 2 data registers which pass data from the 8 bit data bus of the 68008 CPU to the high data byte of the 16-bit K1-bus. This is optional.

A 74HCT138 3-to-8 decoder is used to generate all these 8 strobe signals. A strobe signal is enabled when A19 is high, which on this board means an i/o access. A0 to A2 select which strobe signal to activate.

/AS is used to control output enable as well (to strobe the decoder outputs) so that strobe signals are only generated when the address is valid. Otherwise there would be spurious spikes on the strobe lines. The exact timing is tricky, due to wide timing windows in the 68008 timing charts, which i will discuss somewhere else.

There are 4 strobes for read cycles and 4 strobes for write cycles. They are assigned to the decoder so that the CPU's /WR signal could have been used instead of A2. And originally it was. But the bus cycle timings for the 68000 are so lousy that the risk for spikes on the decoder outputs was too high and i decided to use A2 instead. Now the program must take care to access i/o addresses with bit A2 properly set, or there will be bus collisions when writing to a read address.

The /RD_IRPT strobe is connected to output number 5 to allow vectored interrupts:

The strobe signal decoder is also activated in an interrupt acknowledge cycle, because /AS is activated as in any bus cycle and A19 is high. A1 to A3 encode the acknowledged interrupt level, which is always 2 on this board.

So the address on the bus is %1……111110101, where bits A3 to A1 (%010) indicate the acknowledged interrupt level. So bits A0 to A2 are %101 which selects output number 5. This instructs all attached cards to put their interrupt state on their assigned data line: '1' if inactive and '0' if active. Additionally, the data bus has pull-up resistors (initially for helping the 68008 with pulling them up for the K1-bus) which will make all unconnected data bits read '1' as well.

So the vector read by the CPU is 0xff – x, where x is the active or potentially a combination of multiple active interrupts. If we never choose D7 for the assigned data line of a card, the vector will always be in range 0x80 to 0xFF, never conflicting with any other predefined vector. The vector table has to be filled with matching interrupt vectors. Wherever more than one bit is low in the vector address the program can decide which vector to store, e.g. always the vector of the interrupt with higher data bit number, thus implementing interrupt priorities.
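How the vector number comes about, and how a priority scheme could be derived from it, is sketched below. The function names are made up for this example, and the rule "higher data bit wins" is just the example priority mentioned above, not a fixed property of the bus.

```c
#include <stdint.h>

/* Vector number read by the CPU in the interrupt acknowledge cycle.
   Bit n of active_cards is set if the card assigned to data line Dn
   has an active interrupt. Active cards pull their line low, all
   other lines read '1' because of the pull-up resistors. */
static uint8_t vector_number(uint8_t active_cards)
{
    return (uint8_t)(0xFF & ~active_cards);
}

/* Data line number of the card which should be served for a given
   vector number if the card with the higher data bit has priority.
   Only D0 to D6 are considered. Returns -1 if no line is low. */
static int highest_priority_card(uint8_t vector)
{
    for (int n = 6; n >= 0; n--)
        if (!(vector & (1u << n)))
            return n;
    return -1;
}
```

When the vector table is set up, entry v for each v in 0x80 to 0xFE would get the handler of card highest_priority_card(v), so that e.g. vector 0xFA (D0 and D2 low) is served by the handler of the card on D2.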

The CPU's A0 is used for the strobe decoder's A0 to enable word write instructions. The 68008 first writes the high byte to the even address and then the low byte to the odd address. If writing to the right address, this automatically first stores the high byte in the low-to-high data bus latch and then writes both bytes in a 16-bit K1-bus /WR_DATA cycle. Unluckily the same magic does not work for read cycles: the word will be read byte-swapped and must be swapped programmatically, which is a little bit awkward, because the 68000 CPU has no opcode for this.

The low-to-high and the high-to-low data registers

Data is stored in the low-to-high data register when the /WR_HI strobe signal is active, which means that the CPU writes to an appropriate i/o address, and it puts its contents on the high byte of the data bus when /WR_DATA is active, while the CPU supplies the low byte.

There is a jumper option on the board to use /WR instead of /WR_DATA to enable the low-to-high latch's outputs. This is to solve potential timing problems.

The high-to-low data register reads data from the high byte of the data bus when /RD_DATA is active while the CPU reads the low byte, and it puts its contents on the low data bus when /RD_HI is active, which means that the CPU is reading from an appropriate i/o address.

The I²C bus connection is centered around a 74HC367 2+4 bit driver IC. The circuit is described on my K1-bus page. A6 and A7 are chosen for data output to the I²C data and clock line. D7 is used to read the state of the I²C data line. All I²C signals and timings are generated by the CPU under pure software control.

The I²C bus is used to attach I²C EEPROMs which are used to automatically detect cards and which can provide driver code.

In addition, 2 LEDs are connected to the I²C driver for debugging the board. I spent some time to find a place to connect some lights, as this CPU board does not contain any i/o port pins.

A8 to A13 of the CPU are connected to A0 to A5 of the K1-bus. Also, there are pull-up resistors on A7 to A13. This is because the 68008 CPU uses TTL levels for all signals and has only very poor high-driving capability, while the K1-bus is defined for symmetrical signals, as used by 74HCxx or 74ACxx series ICs.

The reset circuit was already discussed in great detail in January:

After power-up the capacitor is empty and current flows through it and through the base of transistor T1 which opens and pulls the /RESET line low. While /RESET is low T2 is closed. When C1 fills up the current decreases and at some point T1 does no longer drain all current from the /RESET line: the voltage rises and T2 will open and drain the remaining base current of T1 which will rapidly close: /RESET goes high and the system starts.

Z80 Microcomputer with SRAM and K1-Bus

2014-08-06T21:30:00.002+02:00

My 68008 system is still waiting for production because it uses only half of a Euro board, and i prefer to order full Euro boards because this size is the cheapest. So i finally finished a Z80 board for the K1-bus as well.

The system consists of a CMOS Z80 CPU running at 6 MHz, one 32kB SRAM and one 32kB EPROM or EEPROM. It has no I/O except a K1-bus connector. The K1-Bus allows attachment of 16 bit peripherals. K1-bus card selection is restricted to D0 .. D7.

Memory Map

The Z80 uses separate instructions for I/O and memory access. Therefore the I/O address space is not part of the memory address space.

RAM is mapped to $0000 .. $7FFF and ROM is mapped to $8000 .. $FFFF. This allows modifying the RST vectors at run time.

After reset the ROM is mapped to the whole address space because the CPU starts execution at address $0000 and needs to find code there.

Pin header JP3 allows using an EPROM or an EEPROM. The EEPROM is writable by the Z80.

RAM / ROM select circuit

The RD and WR outputs of the CPU are directly connected to the RAM and ROM's corresponding inputs. Activation of the memory chips is done using their CE input.

RAM is enabled when IORQ is low and A15 is low,
ROM is enabled when IORQ is low and A15 is high.

There is an 'init state FF' (flip-flop) which is set by RESET and cleared by any I/O operation. Thus the FF is set after reset, will be cleared by the first I/O instruction and remain cleared for the rest of its life. While it is set it forces A15 high in the RAM/ROM select circuit so that the CPU will always read from ROM.
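The effect of the init state FF on chip selection can be sketched as a small truth table. This is only a model of the behaviour described above for memory cycles, not a description of the actual gates; the function name and the 0/1 encoding of the signals are assumptions made for illustration.

```python
def selected_chip(a15: int, init_ff: int) -> str:
    """Return which memory chip a memory cycle would enable.

    a15     : level of CPU address line A15 (0 or 1)
    init_ff : 1 after RESET, 0 once the first I/O operation has happened
    """
    effective_a15 = a15 | init_ff  # while set, the FF forces A15 high
    return "ROM" if effective_a15 else "RAM"


if __name__ == "__main__":
    for init_ff in (1, 0):
        for a15 in (0, 1):
            chip = selected_chip(a15, init_ff)
            print(f"init_ff={init_ff} A15={a15} -> {chip}")
```

With the FF set, both halves of the address space read from ROM, so the CPU finds code at $0000 after reset; once the FF is cleared the lower half belongs to RAM.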

Reset circuit

The reset circuit was elaborated in great detail in my hardware blog. It is a little bit load dependent but should work up to 5mA pull-up current, which is much more than is to be expected, unless you attach an LED here. ;-)

With the selected value of the timing capacitor the reset pulse will be approx. 2 ms.

I/O

The CPU board does not contain any peripherals, not even a timer interrupt. All peripherals are expected to be connected to the K1-bus.

The K1-bus provides an I²C bus to attach EEPROMs with driver code and is 16 bit wide. Therefore two data latches are required for buffering data from the Z80's 8 bit bus to the upper half of the K1-bus.
I/O therefore is divided into I²C bus access, high-to-low and low-to-high data latch access and actual bus access.

The K1-bus is specified for symmetrical signals (HC or AC, not HCT or TTL), therefore some signals may not reach the required levels without help.

K1-bus IRPT is directly connected to the IRPT input of the CPU. The interrupt source is determined in software.

K1-bus WAIT is directly connected to the WAIT input of the CPU. Any peripheral board which can issue WAIT fast enough can be used with this CPU board. Since the Z80 is pretty slow this probably means all peripheral boards.

The K1-bus address lines A0 to A5 are directly connected to the corresponding CPU address lines. They are pulled up with 3.3kΩ resistors to help the CPU to pull them up: The CPU has very little drive capability, and it drives high even more weakly than low:

    IOL = 2.0mA     @0.4V
    IOH = -1.6mA    @2.4V (which is too low for HC inputs)
    IOH = -250µA    @4.2V

Possibly 4.7kΩ is a better choice. To be tested.

The K1-bus data lines D0 to D7 are directly connected to the CPU as well, which is an allowed design for very small systems. Expect problems with the third card added! They are pulled up with 3.3kΩ resistors to help the CPU to pull them up for the same reason as above. The K1-bus data lines D8 to D15 don't need pull-ups because they are driven by the 74HCT574 which has symmetrical outputs (though TTL inputs) and slightly higher driving capabilities as well.

K1-bus select and strobe generation

Any I/O instruction addresses the K1-bus. This is centered around a 74HCT138 3-to-8 decoder. It is enabled by IORQ=0 && M1=1. It then activates one of 8 strobe lines, depending on A6, A7 and WR. Pin header JP4 allows using A6, A7 and A8 instead, and IORQ has a "strong" pull-up resistor. The reasons for these circuit options are explained below.

The following strobe signals are generated by the 74HCT138:

WR_SEL
Signal on the K1-bus to select a card for subsequent I/O operations. I/O cycles on the K1-bus do not contain the address of the talked-to card, instead a card must be 'selected'.

RD_IRPT
Signal on the K1-bus to read in the interrupt states of all cards to detect which one actually activates the IRPT line.

WR_IRPT
Signal on the K1-bus to mask off interrupts on some or all cards. Can be used to implement interrupts with priority levels.

RD_DATA
Read data from a K1-bus card. The lower data byte is read by the CPU directly while the upper data byte (if present) is latched into the high-to-low data latch.

WR_DATA
Write data to a K1-bus card. The lower data byte is supplied by the CPU directly while the upper data byte (if required) is supplied by the low-to-high data latch.

WR_HI
Write one byte of data into the low-to-high data latch for use in the next (probably immediately following) WR_DATA cycle.

RD_HI
Read the upper data byte from the last RD_DATA cycle from the high-to-low data latch.

RD_I2C
Read or write to the I2C bus. Both are done by reading. The I2C data and I2C clock lines are set by A6 and A7, the value from the I2C data line is read on D7. The circuit is taken directly from my K1-bus page and was the only idea i had to use only one IC.

The two LEDs are for debugging. It took me some time to find a place for some debugging lights on a CPU board without any i/o circuitry (except K1-bus) on it. :-)

74HCT138 enable:

A Z80 I/O bus cycle is strobed with IORQ by the CPU, so this is the first signal used to enable the signal decoder. But the CPU issues this signal for interrupt acknowledge cycles as well. These can be distinguished by the M1 signal, which is activated for interrupt acknowledge but not for I/O cycles, so M1 must be high to detect an I/O cycle. M1 starts 2.5 cycles (that is: long) before IORQ and remains active approx. 10ns longer (M1: 80ns max. and IORQ 70ns max. after T3↑. These are marked with reference number 20 and 52 in the timing chart below) so it should be possible to use M1 directly to mask the IORQ signal.

Next is a timing problem: WR (and RD) and IORQ go up simultaneously: 70ns max. after T3↓ as can be seen in the I/O timing chart below. This may result in spurious pulses at arbitrary outputs of the 74HCT138.
note: T3 in an I/O cycle is one cycle earlier than T3 in an int ack cycle: int ack inserts 2 automatic wait cycles before T3 and I/O only one.

There are some ideas to overcome this:

Make the IORQ signal end earlier:
Problem: edges can be delayed and not be brought forward. So some very clever circuitry would be needed to do this.
Make the WR signal longer:
Adding a delay to the signal would do the job. But it would also delay the starting edge of the signal: One HCT gate adds approx. 10ns, so the front edge would at least move from 60ns to 70ns, which is now after the 65ns of the IORQ signal. Whatever we do with the WR signal, it will always at least add one gate latency to the front edge as well! So this is no easy solution either.
Delay both signals to have time for front and end shaping:
Unfortunately we don't have any time available for delaying the end of the IORQ signal: in an output cycle data is only guaranteed to be stable for 30ns after IORQ goes up (time #35) which is just enough to propagate IORQ through the '138 to the strobe signal (HCT: typ. 19ns, 40ns max, HC: typ. 17ns, 30ns max)
Use an address line instead of WR:
This is what i do in the Z80 reference design on my K1-bus webpage. This is safe in respect to no spikes on the strobe lines but has a small risk of bus collisions, e.g. if a program crashes. Since all lower 8 address lines are currently used up, we'd need A8 as well, which will make block I/O opcodes impossible to use. But this restriction could be limited to K1-bus cards which actually use all 6 address lines if we reassign the address lines, so that the rarely used K1-bus A5 will be driven by Z80 A8. I will use this design as a fallback for safety. This is what pin-header JP4 in the "IO strobe signal decoder" image above is for.
Use a "strong" pull-up:
A method i have already used on my K1 CPU is "signal shaping" with "strong" pull-ups: If you add a low pull-up or pull-down resistor which draws "high" current, it will aid signal flipping into the pull-up's direction and delay signal flipping in the other. Applying a "strong" pull-up to the IORQ signal will slightly delay the starting edge of the pulse and slightly bring forward the ending edge. So what is "strong" here? The CMOS Z80 can only sink 2.0mA @ 0.4V, so a pull-up which supplies just this will be a reasonable choice: This is approx. 2.5kΩ. Disadvantage: The amount of time which can be shifted this way is tiny: some few ns only, but they may just be enough here.
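The resistor estimate above is plain Ohm's law and can be checked with a short sketch. The 5V supply is an assumption (the text does not state it), and the function names are made up for illustration. The same helper also gives the current a 3.3kΩ or 4.7kΩ pull-up feeds into an output that is driving low.

```python
VCC = 5.0  # volts; assumed supply voltage, not stated in the text


def pullup_for_sink_current(i_sink_a: float, v_low: float, vcc: float = VCC) -> float:
    """Resistance (ohms) of a pull-up that draws i_sink_a when the line sits at v_low."""
    return (vcc - v_low) / i_sink_a


def pullup_current(r_ohm: float, v_low: float = 0.0, vcc: float = VCC) -> float:
    """Current (amps) a pull-up of r_ohm feeds into a line held at v_low."""
    return (vcc - v_low) / r_ohm


if __name__ == "__main__":
    # "Strong" pull-up matching the Z80's 2.0mA @ 0.4V sink rating
    print(f"{pullup_for_sink_current(2.0e-3, 0.4):.0f} ohms")
    # Load presented by the regular bus pull-ups when a line is pulled to 0V
    for r in (3300.0, 4700.0):
        print(f"{r:.0f} ohms -> {pullup_current(r) * 1e3:.2f} mA")
```

Under these assumptions the "strong" value comes out a little below the 2.5kΩ quoted above, which is in the same range; the next standard resistor value would be a practical choice.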

Besides WR (or A8, if nothing else works) A6 and A7 are chosen for strobe selection, because they are in the low address half, making all I/O instructions usable and they are the only unused address lines because A0 to A5 are already used for addressing K1-bus card registers.

The board

This is a view of the component placement on the board. The board will be produced within the next few weeks in one go with the 68008 board because they can be placed on one Euro board (160x100mm). More material can be found on the project's web page on my web site.

Adding value to my Mac...

2014-04-27T16:54:00.006+02:00

Hi all,

i just wanted to write some words about my new system disk and since this is kind of hardware...
ok, here it goes.

I bought an OCZ Vector 150 480GB SSD for my Mac.

SSD
+ Speed
– Price

Pro model
+ write endurance: 5 years warranty x 50GB/day vs. 3 years warranty x 20GB/day
– Price

And now Apple enters the Scene...

I have a "late 2009 iMac" which means:
1. Apple does not like me to open it
2. There's a proprietary thermal sensor built in the current drive which is not documented but required or the fans spin up to maximum
3. Apple deliberately does not support TRIM for non-Apple SSDs

ad 0: Copy data to new drive: I copied all data with Carbon Copy Cloner from bombich.com beforehand from the internal HDD to the SSD which i had attached to the USB interface of my external backup drive. This took exceptionally long, probably because i have some GB of hard linked files on it.

ad 1: Open the iMac: find a pair of suction cups and a dust-free working place.

ad 2: Thermal sensor problem: I found sources which said it's a 2N3904 transistor which sounds reasonable but there were signs for doubt: Shorting these 2 pins should silence the fan as well though shorting would move the measured voltage in the wrong direction. On the other hand in the mid 2009 model it is a transistor. So it might be. I consulted some data sheets and prepared to attach a BC337 as an external sensor.

When my iMac was open i measured the diode voltage at the two pins of the HDD and found in one direction slightly more than 1V and in the other direction 0.36V which says: no, it's not a transistor. Nevertheless i attached the prepared transistor in the most likely direction. I can only guess now that it's in fact a 1-wire sensor.

After restarting my Mac the fans sped up. For that "worst case" i had already downloaded SSD Fan Control from exirion.net. I installed it.

Unluckily the OCZ Vector does not have an internal temperature sensor, so the "intelligent" mode of SSD fan control does not work. I had to set a fixed value.

ad 3: TRIM support: As in its worst days when they were nearly broke, Apple tests whether the found drive is one Apple provides itself and disables TRIM support for others. TRIM support informs the SSD which blocks are deleted in the file system and from now on do not need to be updated whenever neighbor blocks are written, thus speeding up writing and reducing write wear on the SSD. So i ran the already downloaded TrimEnabler from cindori.org. This patches a kernel extension, which means: it overwrites the test for the found model.

Unexpected Problem

When i wanted to fit the SSD in its 3"5 adapter into the iMac i found that OCZ uses metric screws whereas Apple is still stuck with 'imperial' units.

Cast in Order of Appearance:

The Backup took exceptionally long: 3:30 hours for 232GB.

The SSD with the included 3"5 adapter

The SSD attached to the USB interface of my external backup HDD.

The sensor which did not work. Also additional wire would not have been required.

When lifting the front glass there was immediately lots of dust sucked in and settled on the display.
No chance to avoid this. (the screw is only to help auto focus)

My iMac with the old HDD, a Seagate Barracuda.

The SSD with the (not-working) sensor on its 3"5 adapter fitted to the iMac's mounting bracket.
I had to find a way around that metric/imperial problem and placed the mounting bracket between adapter and SSD and pinched it with two screws.

The SSD placed in my iMac.

Conclusion

Especially program start is now noticeably faster. I think it was worth it.

68008 SRAM Microcomputer - Software Toolchain

2014-01-31T21:46:00.001+01:00

Hello,

this is not nice. It's a PITA but everyone uses it: The Gnu Compiler Collection; here with a backend to produce m68k object code. I suffered for some hours, but then i had success: The first test program compiled! Let's see how it works. (and leave out most of whatever did not work.)

Compiling the GCC is not a trivial task: you need a GCC and lots of other tools, they should match each other, find each other and work on my desktop machine.

Mac Ports (the port system i use on my Mac to port 'alien' stuff) was no help though right now i can see they provide a (very outdated) version of the rtems tool chain. Yesterday i installed a toolchain for ARM and the Mac Ports ARM GCC for Linux is so old that it even fails to install (it's from 2005, today is 2014!)

OSdev.org is a great site but still a little bit over-complicated. (Ok, actually it's the GCC which is over-complicated.)

Next was rtems.org, a project for real-time embedded micro systems. The project supports the 68k CPUs, but it's not the bare GCC: it comes with their added extras. Maybe i'll cherry-pick some sources from them... :-) At first i didn't realize that this project already comes with a whole OS, otherwise i might have skipped it. So i downloaded the whole stuff from rtems.

Starting from their start page it was only three clicks to rtems.org/ftp/pub/rtems/people/chrisj/source-builder/source-builder.html.

Following the instructions i did:

$ cd
$ mkdir -p development/rtems/src
$ cd development/rtems/src
$ git clone git://git.rtems.org/rtems-source-builder.git
$ cd rtems-source-builder
$ source-builder/sb-check
$ cd rtems
$ ../source-builder/sb-set-builder --list-bsets
$ ../source-builder/sb-set-builder --log=l-m68k.txt \
  --prefix=$HOME/development/rtems/4.11 4.11/rtems-m68k

(Actually i installed into a different directory, but that doesn't make much difference.) After waiting roughly one hour a couple of new GCC instances entered this world.

The rtems people have configured their installation scripts so that the resulting tool names all start with "m68k-rtems4.11-" which is different to "m68k-elf-" which is normally used. Probably there is a good reason for that.

Next i stumbled over the bitsnbikes blogger site. J. Silva started a 68008 project in 2010 and described the sources for his first "project". He had borrowed from others and i borrowed from him:

For the first successful test i needed 4 files and two commands:
(You can find all files in my first project folder at little-bat.de)

• "main.c" the main source file with function main()
• "crt0.S" an assembler file with the bootstrapping code
• "ldscript.ld" a configure script for the loader
• "Makefile" ah well, a Makefile

$ export PATH=$HOME/development/rtems/4.11/bin:$PATH
$ make

This actually produced an S-Record file which i can download into an Eprom. Fine!

S00B0000746573742E53313949
S1130000000800000008000841F900000400203C3A
S113001000000400B1C06704421860F841F9000808
S1130020004E43F900000400203C00000400B089A5
S1130030670412D860F842A742A742A74EB9000845
S1110040004C4FEF000C6000FFFE00004E75F8
S9030000FC

I don't know whether this will work because i haven't even built the hardware. ;-)
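At least the integrity of the S-Record file can be checked without hardware. The following is a minimal sketch of the standard Motorola S-record checksum rule (the ones' complement of the low byte of the sum of the count, address and data bytes); the function name is made up for illustration, and the three sample records are taken from the listing above.

```python
def srec_checksum_ok(record: str) -> bool:
    """Check the checksum of one Motorola S-record line."""
    record = record.strip()
    if len(record) < 4 or record[0] != "S":
        return False
    try:
        # Everything after the 2-character type field is hex:
        # count byte, address bytes, data bytes, checksum byte
        payload = bytes.fromhex(record[2:])
    except ValueError:
        return False
    count, body, checksum = payload[0], payload[1:-1], payload[-1]
    if count != len(body) + 1:  # count covers address + data + checksum
        return False
    return (~(count + sum(body)) & 0xFF) == checksum


if __name__ == "__main__":
    for line in (
        "S00B0000746573742E53313949",
        "S1110040004C4FEF000C6000FFFE00004E75F8",
        "S9030000FC",
    ):
        print(line, "ok" if srec_checksum_ok(line) else "BAD")
```

The data bytes of the S0 header record are the ASCII text "test.S19".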

The software toolchain for the 68008 board is up and running! :-)

... Kio !

68008 SRAM Microcomputer – K1-Bus Circuit v0.4

2014-01-27T19:28:00.000+01:00

The design phase is coming to an end now. If no problems show up during the verification phase later, this will be the K1-Bus I/O circuit:

K1-Bus 68008 CPU board with SRAM – K1-Bus I/O Circuit

Short summary of the K1-Bus

The ➧ K1-Bus is a 16 bit peripherals bus designed for simplicity. The basic unique concept of this bus is that the peripheral board accessed is not encoded in the address of each I/O opcode but the board to talk to must be selected. During selection the boards are addressed with a low level on their assigned data line.
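Board selection as described above can be sketched as building a mask word with exactly one low bit. This is an illustration only: the function name is hypothetical, and it assumes that all boards which are not addressed see a high level on their data line.

```python
def select_mask(board: int, width: int = 8) -> int:
    """Mask word that selects one board: its data line low, all others high.

    board : index of the data line assigned to the board (0 = D0)
    width : number of data lines usable for selection
    """
    if not 0 <= board < width:
        raise ValueError(f"board must be in 0..{width - 1}")
    all_high = (1 << width) - 1
    return all_high & ~(1 << board)


if __name__ == "__main__":
    for board in range(8):
        print(f"board on D{board}: mask = 0x{select_mask(board):02X}")
```

The default width of 8 reflects a CPU board which can only use D0..D7 for selection; a full 16 bit bus master would pass width=16.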

Interrupts are signaled on the !K1_IRPT collector line and during interrupt processing the CPU queries the interrupt state of all devices. The CPU can enable and disable interrupts for each device individually by writing a mask word on the data bus. This way all devices may be assigned an individual interrupt priority if the programmer desires.

There are 5 control lines (strobe signals):

!SELECT – The CPU writes a mask word on the bus to select a peripheral board for subsequent data transfer.
!WR_IRPT – The CPU writes a mask word on the bus to enable or disable interrupts on the attached boards.
!RD_IRPT – The CPU reads a mask word with the interrupt state of all attached boards.
!WR_DATA – Send data to the selected device.
!RD_DATA – Read data from the selected device.

The circuits

The 74HCT138 3-to-8 line decoder generates the strobe signals for the bus when A19 is '1' during a bus cycle when !AS is low. A0, A5 and the !WR signal are used to determine which signal to activate. For the reason for choosing A0 see below.

IC 74HC367 with the surrounding resistors and diode implement an i2c interface. Every K1-Bus extension board can provide an i2c EEprom with software driver for the board. Also, the i2c EEprom is used to detect the presence of the board. The funny circuit design is directly taken from my ➧ K1-Bus homepage. It was the simplest circuit i could come up with if no port pins are available to be used for the i2c bus.

The i2c bus is always accessed by reading from the K1-Bus and one of the select lines of the 74HCT138 3-to-8 line decoder is used to address the 74HC367. A6 and A7 are forwarded to the i2c data and clock line while the i2c data line is also read and returned in D7. When the i2c bus is selected, the upper 4 drivers of the 74HC367 are enabled. Then A6 and A7 are forwarded into the lower two drivers. When the upper drivers are disabled the resistors R1 and R2 feed back the output of the lower drivers to their inputs thus making them state keepers. So the low or high state of the i2c data and clock outputs are preserved until the next access is made to the 74HC367. The timing and protocol of the i2c bus must be implemented in software. The i2c bus is optional but recommended unless you want to build a one-task computer which cannot be extended later.

The two LEDs D4 and D5 may be attached here during debugging. As the 68008 CPU board does not have any I/O pins whatsoever, i have spent some time searching for a place to attach some status lights for initial testing. The LEDs should be removed when an I/O board is attached, because the LEDs may disturb operation of the i2c bus.
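The "write by reading" scheme of the i2c port described above can be sketched as address arithmetic: the desired line states are encoded in the address of a read access, and the byte read back carries the data line in D7. Several things here are assumptions and not taken from the circuit: the base address is a placeholder, A6 is taken as the data line and A7 as the clock line, and the function names are invented.

```python
I2C_BASE = 0x0000  # hypothetical: address bits which make the decoder pick the i2c strobe
SDA_BIT = 6        # assumption: A6 drives the i2c data line
SCL_BIT = 7        # assumption: A7 drives the i2c clock line


def i2c_read_address(sda: int, scl: int, base: int = I2C_BASE) -> int:
    """Address to read from in order to set the data and clock outputs."""
    return base | ((sda & 1) << SDA_BIT) | ((scl & 1) << SCL_BIT)


def sda_from_read(value: int) -> int:
    """Extract the state of the i2c data line (D7) from the byte that was read."""
    return (value >> 7) & 1


if __name__ == "__main__":
    # Address offsets for the four possible output states
    for scl in (0, 1):
        for sda in (0, 1):
            print(f"SDA={sda} SCL={scl}: read from 0x{i2c_read_address(sda, scl):02X}")
```

A software i2c driver would string such reads together to produce start conditions, clock pulses and data bits, releasing the data line (SDA=1) whenever it wants to sample what the EEPROM drives.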

The two 74HCT574 data latches expand the 8-bit 68008 data bus to 16 bit for the K1-Bus. This is optional and only required if 16 bit extensions are actually attached. A serial card will probably be only 8 bit wide, but an IDE interface will most likely use the full bus width.

When 16 bit data is written to the bus this is done in two stages: First the upper byte is written into the 74HCT574 low-to-high data latch. One output of the 74HCT138 3-to-8 line decoder provides the !WR_HI strobe. Next the low byte is written to the !WR_DATA address as usual. While the CPU provides the low byte the latch outputs are enabled simultaneously and put their data on the upper half of the bus.

Note that only A0 discriminates between !WR_HI and !WR_DATA. This way the program can perform a 16 bit write opcode to the even base address: The high byte will be written first to the even address (A0=0) and the low byte thereafter to the odd address (A0=1). That's really sweet! :-)
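The two-stage write can be modelled in a few lines. This is a behavioural sketch only; the class and function names are invented, and the model ignores timing and the output-enable details of the 74HCT574.

```python
class LowToHighLatch:
    """Behavioural model of the low-to-high data latch."""

    def __init__(self) -> None:
        self.high = 0

    def wr_hi(self, byte: int) -> None:
        """!WR_HI cycle: store the upper byte in the latch."""
        self.high = byte & 0xFF

    def wr_data(self, byte: int) -> int:
        """!WR_DATA cycle: CPU drives the low byte, latch drives the high byte.

        Returns the 16 bit word that appears on the K1-Bus.
        """
        return (self.high << 8) | (byte & 0xFF)


def write_word(latch: LowToHighLatch, word: int) -> int:
    """What a big-endian 16 bit write to the even base address does."""
    high, low = (word >> 8) & 0xFF, word & 0xFF
    latch.wr_hi(high)          # first byte cycle: even address, A0=0
    return latch.wr_data(low)  # second byte cycle: odd address, A0=1


if __name__ == "__main__":
    print(hex(write_word(LowToHighLatch(), 0x1234)))
```

Because the 68008 puts the high byte of a word at the even address and transfers it first, the word arrives on the bus in the right order without any extra software.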

When 16 bit data is read from the bus this is also done in two stages: First the low byte is read using !RD_DATA as usual. This will also load the high byte from the K1-Bus into the high-to-low data latch. Then the high byte can be read from the latch. For this the 74HCT138 3-to-8 line decoder provides the !RD_HI strobe.

Reading a 16 bit word is not as convenient as writing. Again !RD_DATA and !RD_HI are chosen to differ only in A0 and the word can be read with one 16 bit read opcode, but the result will be byte-swapped. This cannot be avoided except with some more latches and drivers. So the program has to swap them thereafter and as the 68000 does not provide a byte-swap opcode (though it provides one for word swapping) the best is probably to do an 8-fold rotate left or right on the word data which is a 24 clock cycles time consuming operation.
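That an 8-fold rotate undoes the byte swap can be shown with a short sketch. The helper below models a 16 bit rotate left in the spirit of the CPU's rotate instruction; the function names are made up, and the swapped read is modelled under the assumption that the low byte arrives in the first (even address) byte cycle.

```python
def rol16(word: int, count: int) -> int:
    """Rotate a 16 bit word left by count bits."""
    count %= 16
    word &= 0xFFFF
    return ((word << count) | (word >> (16 - count))) & 0xFFFF


def read_word_swapped(bus_word: int) -> int:
    """What a 16 bit read opcode returns: low byte first, then the latched high byte."""
    low, high = bus_word & 0xFF, (bus_word >> 8) & 0xFF
    return (low << 8) | high


if __name__ == "__main__":
    raw = read_word_swapped(0x1234)
    print(hex(raw), "->", hex(rol16(raw, 8)))
```

Rotating left or right by 8 gives the same result for a 16 bit word, so either direction works.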

There are some pull-up resistors and networks related to the K1-Bus:

The !K1_IRPT collector line has a pull-up because it must be driven with open drain outputs.
!K1_RESET line has a pull-up for the same reason.
!K1_WAIT has a pull-up in case no peripheral board is selected. Whenever a board which uses the wait line is selected it must drive the wait line with a tri-state driver which actively pulls high and low. When a board is not selected it must not drive the wait line.

D0..D7 are pulled high with 3.3kΩ resistors. The data bus of the CPU is connected without bus drivers to the K1-Bus. This is possible for very small projects because the K1-Bus is defined to attach CMOS devices which draw only very little current, though switching the level on a line draws some current for the line capacitance which increases with line length and every device attached.

The 68008 CPU outputs TTL levels with very poor high-driving capability: 0.4mA only. The K1-Bus is defined for symmetrical signals as used by 74HC or 74AC types. This is in general no problem, only some timing parameters will shift a little. But as the 68008 can only drive 0.4mA high (and still guaranteeing only 2.4V, so effectively driving even less than 0.4mA) these resistors are there to help the CPU with the '1' bits. 3.3kΩ result in 1.5mA when the outputs are low which is approx. 1/2 of what the CPU can drive low. Peripheral drivers also must provide this additional current for low bits, but 1.5mA should be ok with almost any IC.

A0..A5 (on the K1 Bus) are pulled high for the same reason.

A6 and A7 are pulled high for the same reason because they are attached to the 74HC367 which is not a 74HCT367. I'm too lazy to modify the circuit for a 74HCT367 and i have HCs in stock, HCTs none.

More random notes:

The 74HCT574 high-to-low and low-to-high data latches were chosen to use HCT because when the CPU can operate the bus with TTL levels then the latches can do this as well. No need to read some data lines with HC circuits – keep it consistent.

This CPU board can attach 16 bit K1-Bus extension boards. This means, it can read and write 16 bit data. But it cannot use the upper byte of the data bus for board selection: !SELECT, !RD_IRPT and !WR_IRPT all do not trigger a read or write of the high-to-low / low-to-high data latches. Therefore peripheral cards are limited to D0..D7 for selection.

68008 SRAM Microcomputer – Main Circuit v0.4

2014-01-26T13:43:00.000+01:00

The design phase is coming to an end now. If no problems show up during the verification phase later, this will be the main circuit:

K1-Bus 68008 CPU board with SRAM – Main Circuit

Connection of the static Ram and Eprom are straight forward. Ram is mapped starting at address 0x00000 and the Eprom is mapped starting at 0x40000 by use of the 2-to-4 line decoder IC2A.

After power-up the Eprom is forced into the address range of the Ram so that the reset vectors can be read from Rom. This is done by diode D2 which pulls the input A0 of IC2A high while the two-NAND-gate flip flop is in the power-up state.

The address decoder IC2B either activates !DTACK or !VPA to terminate a bus cycle. !VPA is activated for addresses with A19=1 and A14=1 which will perform a slow 6800-style memory cycle, for all other addresses !DTACK is activated. K1-Bus peripherals may add wait states by pulling !K1_WAIT low. By choice of A14 to discriminate between fast and slow I/O all peripherals can be accessed using short addressing with A15 to A31 all '1'.
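The decoding rule above fits in one small function. This sketch only models which signal terminates a bus cycle for a given address; the function name is invented, and wait states via !K1_WAIT are ignored.

```python
def cycle_terminator(address: int) -> str:
    """Return which signal terminates the bus cycle for this address."""
    a19 = (address >> 19) & 1
    a14 = (address >> 14) & 1
    return "VPA" if (a19 and a14) else "DTACK"


if __name__ == "__main__":
    # The last two addresses are reachable with short (sign-extended 16 bit) addressing
    for address in (0x00000, 0x40000, 0x80000, 0xFFFF8000, 0xFFFFC000):
        print(f"0x{address:08X}: {cycle_terminator(address)}")
```

The two short-addressable examples differ only in A14, which is what lets a program pick a fast or a slow cycle while still using short addressing.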

The reset circuit has been described in all details in blog post 68008 SRAM Microcomputer – Reset circuit.

Multiple bus masters and single-stepping using the !HALT input are not supported. Bus errors are not detected. All bus cycles will terminate, unless someone pulls !K1_WAIT low. The CPU board does not have its own timer interrupt and no I/O pins of its own. These must be provided by at least one attached K1-Bus peripheral board.

For PCB images and Eagle CAD schematics you can view the project page at k1.spdns.de/Develop/….

68008 SRAM Microcomputer – Windows-in-a-Box

2014-01-25T15:20:00.000+01:00

Now to something completely different: Software.

As i expect my 68008 computer not to work with an empty Eprom, i will probably have to program it. For this i bought a Genius NSP universal programmer from Shenzhen Stagger Electric some years ago. No need to remember this name, they have probably folded. For good reason.

I had installed the software on some kind of old Pentium Desktop running Windows 98 which i kept for this purpose only. But it's sitting in the Loft and i'd need to find a free space to work with it. Frankly, i want to get rid of it.

The idea is to install a Windows in Virtual Box on my Mac. I already have a Linux Mint running in Virtual Box (every now and then) and i know there are free pre-built images of this or that operating system available on the net. I thought i'd pick an XP image, get an XP license key from an unused pre-scrapped computer from my employer, install the image, register Windows and try whether i can attach the programmer through a USB-to-Serial connection.

Good news: Microsoft offers pre-built Windows images on its web site for testing the IE and maybe non-commercial use (some websites say). They are not registered but they are freshly installed and the registration period has not yet started. Also, i got a Windows XP Home Edition key from my employer.

I downloaded one of the XP images available, figured out how to combine the split .rar archive which consisted of one .rar file and one .sfx file (how-to: make the .sfx file executable and execute it) and imported the resulting .ova Virtual Box Appliance into Virtual Box. Made a snapshot. Started it.

Step One: It was unregistered as expected and i tried to register my key. This did not work. It seems that the XP instance is a Windows XP Professional and probably this is the reason why the key is rejected. Ok, no problem, i have 30 days ahead.

Second step is to install an anti virus software. I downloaded it with OSX, dropped it in a shared folder and installed it in XP. Yeah!

Third step is to install Firefox. :-)

Fourth step is to install this piece of crap, erm, the software for the Genius NSP programmer. I was afraid it could be on floppy disk, but it was on CD, and, yes, important, on a full-sized CD. I have a slot-in CD drive in my Mac and this matters. Installation worked within few seconds. My Mac is SO fast! :-)

I attached the programmer with a USB-to-Serial adapter to my Mac and started the software. As expected the software did not find the programmer. Ok, let's figure out why.

Windows does not see the adapter. I investigated the settings for the virtual machine and found a place to enable this specific USB device which instantly bothered me with the next problem: USB 2.0 controller enabled – you need the VirtualBox Extension Pack. Googled for it, downloaded it from Oracle, it automatically installed into Virtual Box – fine!

After starting XP again and waiting for some seconds, it found 'new hardware'. Do i have a driver disk? hmm, no. Shall i search online for a driver? – yes. Wait. wait. It didn't find a driver. Crap. (Did it ever find a driver this way?)

I figured out what is inside the USB adapter – a Prolific Technology Inc. USB-Serial Controller C, product ID 0x2303. So i downloaded the PL2303_Prolific_DriverInstaller_v1_9_0 driver from the Prolific website from OSX, moved it into the shared folder and installed it in XP.

Ok installed. What's next? The Stagger software still doesn't find it. Hmm, given my experience with Windows i restarted it. And yes, now there is a Prolific USB-to-Serial Comm Port in the Device Manager and – ta ta – the NSP software can talk to the programmer! That's not a matter of course, because even if the serial communication works the software may do some weird things with the serial port, like bit-banging, and that is likely to fail with a USB adapter.

I tried reading an Eprom and it looked like this:

Crappy NSP software reading my Eprom

Only partly translated, very poorly translated, window decoration around a transparent window. Buggy chip data base. Outdated now. >:-) But it works. Somehow.

Ok, i can throw away the old Windows 98 computer. Fine. :-)

68008 SRAM Microcomputer – Free Run

2014-01-23T18:06:00.000+01:00

Hello,

yesterday i searched for some of the parts and put the 68008 on the bread board. According to other projects it is possible to do a "free run" by pulling all data lines to low. Though some people say it's the opcode of NOP it actually is some kind of ORI.

I wanted to check some prerequisites of the project:

Does the CPU work?
Can !DTACK (data acknowledge) be held permanently low?
Can !VPA (valid peripheral address) be used for instruction fetch cycles?

ad 1: Yes, the CPU works. It cycles through its address space and toggles A19 at 2 Hz.
ad 2: Yes, as i could tell the CPU works. :-)
ad 3: For curiosity: Yes. I believe that bus cycles and internal logic are completely separated and !VPA can be used for any bus cycle.

Next interesting question: Can i put videos in my blog? It seems i can, but for a final verification i probably have to publish this page.

Update: They are just converted into poor animated GIFs. I'll have to find something better... ok, uploaded them to youtube and embedded. back to the roots...

Free Run using !DTACK-terminated bus cycles

To the left is the 68008 on my bread board, wired up to use !DTACK to terminate bus cycles. !DTACK is permanently low (active) and the CPU runs as fast as it can: At 8 MHz it does 2 opcode fetches per µsec or 2,000,000 opcode fetches per second. As the whole address space of the 68008 is 1 MB only, it cycles through its address space about 2 times a second. The most significant address bit A19 should blink at 2 Hz. A19 is the leftmost LED in the video and i hope you can verify that it blinks at 2 Hz. Thanks.

An important result is that the 68008 actually works without deactivating !DTACK after each bus cycle. Though in all timing diagrams bus cycles start with !DTACK high it is actually possible to keep it low the whole time.

Free Run using !VPA-terminated bus cycles

In the second video i used !VPA to terminate the bus cycles instead. This mode is intended to access old (really old!) 6800 peripherals but it seems true that you can terminate any bus cycle with !VPA, even an opcode fetch cycle. It's just slower. I was curious how slow actually, if every bus cycle uses !VPA, because the 68008 data sheet says it can be from 11 to 18 clock cycles long.

Actually the M68000 8-/16-/32-Bit Microprocessors User’s Manual Ninth Edition says 10 to 19 cycles, while the M68000 Family Reference Manual – MC68008 Technical Summary says 11 to 18 cycles.

Buggy timing diagram for the 'best case' !VPA-terminated bus cycle

The latter puzzled me. Of course i started with the 68008 documentation, because that's the CPU i'm using, and i was wondering how fast the 68008 could access the bus back to back in !VPA mode, as 11 cycles is longer than the period of the free running E signal to which !VPA bus cycles are synchronized. But 10 to 19 makes sense (while 11 to 18 does not), and i found an unnamed cycle in the 68008 manual's 'best case' chart (between the last 'w' cycle and 'S5'). I believe that someone reviewed the charts, found that the 'worst case' chart was only 18 cycles instead of 19 cycles long, demanded a correction, and the missing cycle was added ... to the wrong chart. That's how the real world works.

In the second video one bus cycle takes 10 clock cycles instead of 4 and therefore A19 should blink at 2 Hz * 4 / 10 = 0.8 Hz instead. I think this is approximately true.
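The blink rates for both free-run videos follow from the clock, the bus cycle length and the size of the address space. Here is a minimal sketch of that arithmetic, assuming an 8 MHz clock and that every bus cycle advances the address by one byte (the function name is made up for illustration):

```python
def a19_blink_hz(clock_hz, clocks_per_bus_cycle, address_space_bytes=1 << 20):
    """Frequency of A19 when the CPU free-runs through its address space.

    A19 completes one low/high period per pass through the address space,
    assuming one byte is fetched per bus cycle.
    """
    bus_cycles_per_second = clock_hz / clocks_per_bus_cycle
    return bus_cycles_per_second / address_space_bytes


dtack_hz = a19_blink_hz(8_000_000, 4)    # !DTACK-terminated: 4 clocks per bus cycle
vpa_hz = a19_blink_hz(8_000_000, 10)     # !VPA-terminated best case: 10 clocks

print(f"!DTACK free run: {dtack_hz:.2f} Hz")
print(f"!VPA free run:   {vpa_hz:.2f} Hz")
```

The computed values come out slightly below 2 Hz and 0.8 Hz because 1 MB is 1,048,576 bytes and not 1,000,000; the ratio between the two modes is exactly 4/10.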

In my project the !VPA bus cycle is used to access slow peripherals on the K1 bus. But it is also used during interrupt acknowledge, in order to use an auto vectored interrupt. Now the interesting question is: Does the CPU actually perform a !VPA controlled bus cycle here or a dummy cycle, as it ignores the byte read?

!VPA used in interrupt vector read cycle

My guess was, that it actually does a !VPA controlled slow bus cycle if you activate !VPA, making interrupts approximately 10 clock cycles slower. And finally i found this chart on the last (!) page of the M68000 8-/16-/32-Bit Microprocessors User’s Manual Ninth Edition. The last pages are appendix B which is about interfacing 6800 devices and which are pasted into the document as bitmaps only. :-)

The bus interface performs a slow memory cycle in the (dummy) interrupt vector read cycle if !VPA is activated to request an auto vector interrupt.

68008 SRAM Microcomputer – Unused 2-to-4 Line Decoder Got a Job!

2014-01-19T13:05:00.000+01:00

browsing through some other 68008 projects on the web

daveho.github.io (unfinished)
wandel.ca (documentation of older project)
kiwi that's a really adorable project!

i was reminded of the fact that the 68000 has something called short addressing: Instead of supplying a 4-byte long address you only supply a 2-byte short address which is sign-extended to 4 bytes. This saves space in program code and – more important – up to 8 CPU clock cycles. So i took a look at my current address layout:

v0.2 address decoder

This allows the first 32k of RAM to be accessed with short addressing as well as the slow I/O address range, but not the fast I/O range:

%xxxxxxxx,xxxx00xx,xxxxxxxx,xxxxxxxx selects RAM and
%00000000,00000000,0xxxxxxx,xxxxxxxx is a possible subset of this which fits in a signed word.
%xxxxxxxx,xxxx01xx,xxxxxxxx,xxxxxxxx selects ROM and can never be accessed with short addressing.
%xxxxxxxx,xxxx11xx,xxxxxxxx,xxxxxxxx selects slow I/O and
%11111111,11111111,1xxxxxxx,xxxxxxxx is a possible subset of this which fits in a signed word as a negative value.
%xxxxxxxx,xxxx10xx,xxxxxxxx,xxxxxxxx selects fast I/O and can never be accessed with short addressing.

In order to make all I/O short addressable, all I/O must have A31 .. A15 high. A18 cannot be used to select between slow and fast I/O. The first Address line which can be used for that is A14:

v0.3 address decoder

Now the memory map is as follows:

%xxxxxxxx,xxxx00xx,xxxxxxxx,xxxxxxxx selects RAM and
%00000000,00000000,0xxxxxxx,xxxxxxxx is a short addressable subset.
%xxxxxxxx,xxxx01xx,xxxxxxxx,xxxxxxxx selects ROM (no short addressable subset).
%xxxxxxxx,xxxx1xxx,x1xxxxxx,xxxxxxxx selects slow I/O and
%11111111,11111111,11xxxxxx,xxxxxxxx is a short addressable subset.
%xxxxxxxx,xxxx1xxx,x0xxxxxx,xxxxxxxx selects fast I/O and
%11111111,11111111,10xxxxxx,xxxxxxxx is a short addressable subset.
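The v0.3 map can be checked with a small model. This is only a sketch: the function names are invented here, and the decoder is reduced to the three address lines A19, A18 and A14 that the bit patterns above actually constrain.

```python
def sign_extend_16(word):
    """Sign-extend a 16-bit short address to 32 bits, as the 68000 does."""
    word &= 0xFFFF
    return (word | 0xFFFF0000) if (word & 0x8000) else word


def is_short_addressable(address):
    """True if a 32-bit address can be written as a sign-extended word."""
    address &= 0xFFFFFFFF
    return sign_extend_16(address) == address


def decode_v03(address):
    """Region selected by the v0.3 decoder; only A19, A18 and A14 matter."""
    a19 = (address >> 19) & 1
    a18 = (address >> 18) & 1
    a14 = (address >> 14) & 1
    if a19:
        return "slow I/O" if a14 else "fast I/O"
    return "ROM" if a18 else "RAM"


for addr in (0x00001000, 0xFFFFC000, 0xFFFF8000, 0x00040000):
    print(f"{addr:08X}  {decode_v03(addr):8}  short={is_short_addressable(addr)}")
```

The four sample addresses cover one RAM, one slow I/O, one fast I/O and one ROM address; only the ROM address cannot be reached with short addressing.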

There is no need to apply the post-reset INIT line pull-up to A14 for the I/O address decoder and there is no need to strobe the outputs with !AS because the I/O control lines are strobed by !AS directly at the 74HC138 which generates them (see other sheet – next to come :-)). It's even better this way because now !SLOW_IO which is directly connected to the CPU's !VPA input to request slow I/O or an auto vector toggles before !AS is valid and not shortly thereafter.

Funny Note

Actually this second 2-to-4 line decoder could be replaced entirely by one NAND gate: !FAST_IO is not used anywhere (actually it is currently used to reset the INIT line, but this could have been !SLOW_IO as well) and !SLOW_IO becomes low when A14 and A19 are both high, so, yes, that's a NAND function. The NAND gate would even be faster (the 74HCT139 is pretty slow) but – i don't have a spare NAND gate, but i had a spare 2-to-4 line decoder. :-)

68008 SRAM Microcomputer – Reset circuit (Updated)

2014-01-18T15:25:00.001+01:00

Now to something very simple which i always have problems with: The reset circuit. It's made from only a few parts but it's ANALOG. (shiver!)

Let's start with the requirements:

For unknown reasons the 68008 CPU needs an excessively long reset pulse after power-up: 0.1 seconds! Because there are other circuits which may pull !RESET low, most notably the CPU itself, !RESET must be driven with an open collector or open drain output.

To the right is an image of the current circuit:

Let's discuss it.
This is very important because it's analog and most times my analog stuff doesn't work. :-/

Start with R9 + R4 and ignore the rest: These two resistors form a voltage divider for Vcc and the voltage in the middle is 2.5V. (5V/2)

Now add the capacitor: Initially it is discharged and behaves like a piece of wire. But as time goes by and current flows through it (and through the voltage divider) it charges and voltage across it increases which subtracts from the voltage present at the voltage divider. So the middle voltage of the voltage divider starts at 2.5V and drops over time to zero.

Next add transistor T1: The first effect is that the middle voltage is clamped by the base-emitter diode of T1, so the middle voltage is initially 0.7V, which is the forward voltage of the diode. It stays at that voltage for a while until it drops below 0.7V and then resumes dropping as if no transistor were present.

While current is sunk through the base-emitter diode, the transistor switches on, pulling the !RESET line to ground. As soon as the base-emitter voltage drops below 0.7V the transistor will switch off and !RESET rises to Vcc. We only have to choose the capacitance and the resistors appropriately, so that this will happen after approximately 0.1 second.

But that's only half of the story: The transistor is not switched on or off depending on the voltage at its base pin; instead the amount of current it can sink is a function of the current through the base-emitter diode. While this diode keeps the voltage at the base pin at 0.7V, the current through the diode needed to do this decreases as the voltage at the voltage divider decreases. So at some point in time the transistor is no longer able to sink all current from the !RESET line but only a part of it, and the voltage on the !RESET line will not switch instantaneously to Vcc but will rise slowly.

This is totally unreliable. !RESET must go away very fast, ideally within one CPU clock cycle. This is where T2 and its two associated resistors enter the game: While !RESET is low T2 is switched off and the circuit behaves as if T2 wasn't there. But when !RESET slowly rises above 0.7V T2 will start to sink current. This will subtract from the base current of T1 which will in return sink less current from the !RESET line and the voltage on the !RESET line rises even more which will open T2 more which will ... Yes, a positive feedback and !RESET will rise very fast once it has reached 0.7V.

Now let's examine power-up and power-down behavior.

At power-up C1 is empty. But the state of the !RESET line is unknown. If it rises with Vcc then T2 will be open right from the start and may finish the reset pulse before it has been activated at all. It's very likely that it will do this because of the pull-up resistor on the !RESET line. To prevent this the voltage at the base of T1 must be guaranteed to be clearly above 0.7V when C1 is empty: This is true with the given values for R4, R9 and R12 (all 50kΩ) because when C1 is empty the voltage at the voltage divider will be 1/3 Vcc, which is approximately 1.67V, and this will be sunk through the base-emitter diode of T1, opening T1 which will pull !RESET low. Check.✓

After a while capacitor C1 is loaded (nearly) to Vcc and the input of the voltage divider is Vcc minus Vcap which is (nearly) 0V. When power is switched off, Vcc drops to 0V but the capacitor is still loaded, so the voltage at the voltage divider drops to -Vcc. We'll have to check the circuit that this does no harm. Check.✓

Now the capacitor must be discharged, so that it is discharged when power is switched on again. This happens through the voltage divider R4+R9 in approximately the same time as was required to load the capacitor after power-on. Check.✓

Last step: calculate the values.

!RESET is pulled up by a 5kΩ resistor which sources 1mA at Vcc=5V and there is some more circuitry attached, so let's say T1 must sink 5mA.

T1 may have a current amplification of 100 (this is a value which widely varies even for transistors of the same type) so the base-emitter current of T1 must be ~ 0.05mA.

This current must be sinked through the base-emitter diode of T1 even after C1 has charged to – let's say – 1/2 and the remaining voltage at the voltage divider is 2.5V. This leaves a voltage drop of ~2V across R9. Using the formula U=R*I <=> U/I=R we calculate the value for R9 = 2/0.05e-3 = 40kΩ.

Next the capacitor voltage must rise 2.5V (see above) during 0.1s while being charged with 0.05mA. The formula for the capacitance is: C = I*t/U. So C1 = 0.05e-3*0.1/2.5 = 2µF. Because the calculation is very rough (we ignore R4, which sinks some current as well, and we should integrate the current over time because it's not constant), we double the capacitance. Fine adjustments will be made when it is built on the bread board. :-)

Note: You didn't remember the formula for the capacitance? Using SI units (not inch, miles and gallons) you don't need to look up the formula, you can construct it by pure logic:

The capacitance depends on

Charging current: higher current => higher capacitance => C ~ I
Charging time: Current supplied for longer time => higher capacitance => C ~ t
Voltage increase during charging time: Higher voltage increase allowed => less capacitance required => C ~ 1/U
Using SI units there will be no constant factor in the formula. Yeah!
therefore: C = I * t / U.
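The rough estimate for R9 and C1 can be reproduced in a few lines. This is only a sketch of the back-of-the-envelope calculation above, with the same simplifications (R4 ignored, constant charging current); the variable names are chosen here for illustration.

```python
V_CC = 5.0        # supply voltage (V)
I_RESET = 5e-3    # current T1 must sink from the !RESET line (A)
H_FE = 100        # assumed current amplification of T1
T_RESET = 0.1     # required reset pulse length (s)

i_base = I_RESET / H_FE           # base current needed by T1: 0.05 mA
u_r9 = 2.0                        # ~2 V left across R9 when C1 is half charged
r9 = u_r9 / i_base                # R = U / I
delta_u = V_CC / 2                # C1 charges by 2.5 V during the pulse
c1 = i_base * T_RESET / delta_u   # C = I * t / U

print(f"R9 = {r9 / 1e3:.0f} kOhm")
print(f"C1 = {c1 * 1e6:.1f} uF (doubled for safety: {2 * c1 * 1e6:.1f} uF)")
```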

Alternatives

We could use a timer IC for the reset circuit, most likely a 555. But this would have a larger footprint than the discrete solution. Else we could use a mono-flop from the 74 series. But basically this increases the footprint even more (14 pin DIP instead of 8 pin DIP).

Or use a counter. But even if fed from the E output of the CPU (which is CLK/10) we'd need to count up to 100,000 for 0.1 seconds which is impractical.

I have also seen using a PIC for reset (basically because the project used a PIC for various control purposes) but using a CPU to generate the reset pulse for a CPU is a little bit ... over designed.

Update: Reset Circuit Test on the Bread Board

Today i tested the circuit and as expected it did not work as expected. Let's see why. For your convenience to the right is another image of the circuit.

As discussed above resistor R12 must be high enough so that T1 actually switches on at power-up.

Full flip requirement:

But there's another requirement for R12: It must be able to sink the whole current which flowed through the base-emitter diode of T1 when T2 switches on. Else there will be some base current left at T1 and it will not fully close and therefore the voltage at the reset line will not fully rise to +5V.

To estimate this current is a little bit tricky: It depends on the pull-up current on the reset line and the transistor's current amplification. Worst case is high pull-up current and low current amplification:

• ICE = 5 mA
• hFE = 100

=> IBE ≥ 5mA/100 = 50µA

R12 ≤ UR12 / IR12 = (0.65V-0.2V) / 50µA = 9kΩ

where 0.65V = forward voltage UBE of the transistor's base-emitter diode
and 0.2V = saturation voltage UCE between collector and emitter.

So R12 must be at most 9kΩ to ensure a full flip when T2 opens.

Power-up requirement:

Immediately after power-on we require that T1 opens, even if T2 is also open due to the pull-up resistor on the reset line. For practical reasons we assume C1 not completely empty but discharged to 1V, which leaves 4V at R9. So the current through R9 is:

IR9 = (4V-0.65V) / 50kΩ = 67µA

where 0.65V = forward voltage UBE of the transistor's base-emitter diode

This current now flows through R4, R12 and T1:

IR4 = 0.65V / 50kΩ = 13µA
IBE ≥ 50µA
IR12 ≤ (67µA-13µA-50µA) = 4µA
R12 ≥ (0.65V-0.2V) / 4µA = 112.5kΩ

where 0.65V = forward voltage UBE of the transistor's base-emitter diode
and 0.2V = saturation voltage UCE between collector and emitter.

So R12 must be at least 112.5kΩ to ensure that T1 is initially open at power-up even if T2 is open.

Gotcha! We're trapped!

Can we solve this?

First, we were calculating with worst-case values. We could require better worst cases. Second, we can adjust R4.

Power-up requirement with new R4 value:

R4 is used to make the circuit a bit independent of the current amplification of T1 and of the pull-up current on the reset line, and it is needed to discharge C1 when power is off. We can't remove it entirely but we could double its value, which will halve the current through it, and redo the above calculations:

IR4 = 0.65 / 100kΩ = 6.5µA

IBE ≥ 50µA

IR12 ≤ (67µA-6.5µA-50µA) = 10.5µA

R12 ≥ (0.65V-0.2V) / 10.5µA = 43kΩ

Now R12 must be at least 43kΩ to ensure that T1 is initially open at power-up even if T2 is open. Much better. :-)

Full flip requirement with reduced maximum pull-up current on the reset line:

The other two screws are the current amplification of T1 and the pull-up current on the reset line.

The allowed pull-up current was defined as 5mA which means a pull-up resistor as low as 1kΩ. Let's reduce this to 2.5mA which is still much more than we expect, because the actually used value is 5kΩ, but there may be some current added from the attached devices, though this should be negligible. Let's redo the calculations:

IBE ≥ 2.5mA/100 = 25µA

R12 ≤ UR12 / IR12 = (0.65V-0.2V) / 25µA = 18kΩ

So R12 must be at most 18kΩ to ensure a full flip when T2 opens.

Power-up requirement with reduced maximum pull-up current on the reset line:

IR4 = 0.65 / 100kΩ = 6.5µA

IR12 ≤ (67µA-6.5µA-25µA) = 35.5µA

R12 ≥ (0.65V-0.2V) / 35.5µA = 12.7kΩ

So R12 must be at least 12.7kΩ to ensure that T1 is initially open at power-up even if T2 is open.

Both requirements are met if we use 15kΩ for R12. Fine. :-)
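Since the same two inequalities were evaluated three times with different assumptions, a small helper makes the trade-off easier to replay. This is a sketch only: the function name and parameter names are invented here, and the model is exactly the simplified one used above (fixed 0.65V base-emitter voltage, 0.2V saturation voltage, 4V available at R9).

```python
U_BE = 0.65   # base-emitter voltage of T1 (V)
U_CE = 0.2    # collector-emitter saturation voltage of T2 (V)


def r12_bounds(i_pullup, h_fe, r4, r9=50e3, u_r9=4.0):
    """Return (minimum, maximum) R12 in ohms for the two requirements."""
    i_be = i_pullup / h_fe              # base current T1 needs
    r12_max = (U_BE - U_CE) / i_be      # full flip: R12 must drain all of i_be
    i_r9 = (u_r9 - U_BE) / r9           # current delivered through R9
    i_r4 = U_BE / r4                    # current lost through R4
    i_r12 = i_r9 - i_r4 - i_be          # what R12 may take away at power-up
    r12_min = (U_BE - U_CE) / i_r12 if i_r12 > 0 else float("inf")
    return r12_min, r12_max


cases = {
    "initial values":           dict(i_pullup=5e-3, h_fe=100, r4=50e3),
    "R4 doubled":               dict(i_pullup=5e-3, h_fe=100, r4=100e3),
    "R4 doubled, 2.5mA pullup": dict(i_pullup=2.5e-3, h_fe=100, r4=100e3),
}
for name, params in cases.items():
    lo, hi = r12_bounds(**params)
    verdict = "ok" if lo <= hi else "no solution"
    print(f"{name:26} {lo / 1e3:6.1f}k <= R12 <= {hi / 1e3:5.1f}k  {verdict}")
```

Only the last set of assumptions leaves a window for R12, and 15kΩ lies inside it.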

Reset circuit with validated resistor values

To the left is the updated reset circuit as tested on the bread board. Actually i still had problems with the power-up reset if C1 was not completely empty, but that was due to a LED which i connected to the reset line to show its state. It clearly drew more than 5mA. :-)

68008 SRAM Microcomputer – Main Circuit v0.2

2014-01-17T19:28:00.000+01:00

Wow, version 0.2 of the circuit released!

i worked on the main circuit and removed 2 (two!) of the four glue logic ICs. Wow! This is near-Sinclair. I could remove one more IC and become equal-Sinclair. Or remove both remaining glue ICs and become super-Sinclair.

What does this ~~nonsense~~ mean?

You know i come from the Sinclair ZX Spectrum side of the universe (as opposed to the C64) and i have certain ideas about how Sir Sinclair worked. What he did was like this:

Use the cheapest components,
reduce the design to the absolute minimum
and then take away one more part.

That is equal-Sinclair. It seems that Sinclair is a measure for uselessness. Currently my design is only near-Sinclair, because i still could take away some parts. Let's take a look at the current circuit (version 0.2) and discuss it:

Hint: right-click on the image and open it in another window if you want to keep it visible while you read on!

Main circuit diagram with everything except the K1 bus and non-functional parts, e.g. capacitors.

As you can see there are only two glue ICs: one Quad NAND 74HCT00 and a Dual 2-to-4 line decoder 74HCT139 with only one decoder actually used.

The 2nd NAND IC7C is used as an inverter and constructs a !RD signal from the CPU's !WR signal, required by the RAM and the ROM.

The next two NANDs construct a flip flop, which is set by !RESET to indicate the initialization phase after system reset. The INIT signal from this flip flop is used to pull A18 high at the input of the 2-to-4 decoder IC2A. (The resistor R5 and the diode D2 actually construct an OR gate without wasting 3 unused gates in a 74HCT32.)

This is used to circumvent a design flaw in the 68000 microprocessor series: After reset the supervisor stack pointer and the program counter are read from addresses 0x00000.l and 0x00004.l respectively so there must be ROM mapped in. But then the whole vector table of the cpu is located here; actually roughly the first 1 kByte of memory is used for vectors. It is very desirable to have these in RAM because otherwise you cannot change them without a secondary vector table in RAM which is jumped to by the vectors in ROM. So you need ROM after reset but you prefer RAM here at any other time. The normal memory layout is RAM at address 0x00000 as you can see from the 2-to-4 decoder outputs, but during initialization A18 is pulled high so that the processor, when trying to read from address 0x00000 and 0x00004 reads from the ROM instead. After that the first memory access to the slow I/O address range will also reset the NAND flip flop and A18 is no longer forced high and the RAM can be accessed.
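The ROM/RAM swap after reset can be modelled in a few lines. This is a simplified sketch, not a description of the real timing: the class and function names are invented here, and the region table assumes the v0.2 assignment of A19/A18 (00 = RAM, 01 = ROM, 10 = fast I/O, 11 = slow I/O).

```python
REGIONS_V02 = {(0, 0): "RAM", (0, 1): "ROM", (1, 0): "fast I/O", (1, 1): "slow I/O"}


def decode_v02(address, init):
    """Region selected by the v0.2 decoder for a 20-bit address.

    While init is true the resistor-diode OR gate forces the decoder's
    A18 input high, so RAM addresses select the ROM instead.
    """
    a19 = (address >> 19) & 1
    a18 = ((address >> 18) & 1) | int(init)
    return REGIONS_V02[(a19, a18)]


class Board:
    """Tracks the INIT flip flop: set by reset, cleared by a slow I/O access."""

    def __init__(self):
        self.init = True              # state right after !RESET

    def access(self, address):
        region = decode_v02(address, self.init)
        if region == "slow I/O":
            self.init = False         # first slow I/O access ends the init phase
        return region


board = Board()
print(board.access(0x00000))   # reset vector fetch during init
print(board.access(0xC0000))   # any access in the slow I/O range
print(board.access(0x00000))   # same address as before, after init
```

The first and the last access use the same address but land in different devices, which is the whole point of the INIT signal.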

After we know where the !SLOW_IO signal comes from we can understand what the first NAND gate IC7D does: If !WAIT is high and !SLOW_IO is high then the !DTACK signal to the CPU is low (asserted). The !DTACK signal is used to terminate a normal bus cycle of the CPU, either memory or other bus access. !DTACK will not be asserted when !WAIT from the K1 bus is active, thus implementing the wait processing for the K1 bus, or when !SLOW_IO is low which means a memory access to an address with A18=1 and A19=1. This memory range is used for slow I/O (sic!) and shall use the 6800 peripherals slow addressing mode, which is signaled to the CPU by pulling its !VPA input low instead of activating !DTACK. You can see that !SLOW_IO is directly connected to !VPA of the CPU. So either !DTACK is asserted (for the first 3 memory ranges, possibly suppressed by !WAIT) or !VPA.
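The behaviour of this single NAND gate is easiest to see as a truth table. The sketch below models the signals as 0/1 levels (0 = low = asserted for the active-low signals); the function names are made up for this example.

```python
def nand(a, b):
    """Two-input NAND on 0/1 logic levels."""
    return int(not (a and b))


def bus_termination(n_wait, n_slow_io):
    """Return the (!DTACK, !VPA) levels for given !WAIT and !SLOW_IO levels."""
    n_dtack = nand(n_wait, n_slow_io)   # IC7D: low only if both inputs are high
    n_vpa = n_slow_io                   # !SLOW_IO is wired straight to !VPA
    return n_dtack, n_vpa


print("!WAIT !SLOW_IO -> !DTACK !VPA")
for n_wait in (1, 0):
    for n_slow_io in (1, 0):
        n_dtack, n_vpa = bus_termination(n_wait, n_slow_io)
        print(f"  {n_wait}      {n_slow_io}     ->    {n_dtack}     {n_vpa}")
```

In no row are !DTACK and !VPA low at the same time, and only the row with !WAIT low and !SLOW_IO high leaves both high, which is the wait state.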

Trick: Connecting !SLOW_IO to !VPA has a second effect: This also requests the CPU to use an auto-vector from its vector table for interrupts. When an interrupt is acknowledged and the CPU reads the vector number for this interrupt, most address lines are pulled high, most notably A18 and A19, which will activate !SLOW_IO which activates !VPA which requests an auto-vector if it is pulled low during an interrupt acknowledge cycle. We only have to take precautions that this does not also perform spurious I/O on the K1 bus; but that's on the other sheet. :-)

That's all about the glue logics. Not supported are:

Bus arbitration for multiple bus masters
Bus error or any other exception signaling

How to become Equal-Sinclair

Let's remove the Quad NAND IC. Will it still work?

If we remove the first NAND IC7D then we'll lose wait handling for the K1 bus. Ok, well, that may be acceptable. But we'll still need an inverter here to invert !SLOW_IO to !DTACK.

To solve this, we could connect A19 directly to !DTACK, so whenever the CPU accesses RAM or ROM !DTACK will be asserted. But then we'll lose !FAST_IO because whenever !DTACK is not asserted !VPA must be asserted instead to finish the bus cycle. So !VPA must be connected to !A19. How can we invert A19? We can use the unused gate from the Dual 2-to-4 decoder. If we have no fast I/O we also need no wait cycle handling. :-) Check.✓

If we remove the second NAND IC7C then we'll lose the !RD signal. This is acceptable: !OE of the ROM can be tied low (permanently asserted) so whenever the ROM is enabled it puts its data on the bus. Disadvantage: If the CPU writes to the ROM then there'll be a collision on the data bus. We'll have to be cautious.

!OE of the RAM can also be tied low. When !WR is enabled this will supersede the !OE signal. (This may not be true for all RAMs, but RAMs exist which can be operated in this way.) Check.✓

If we remove the NAND flip flop, we'll no longer have the INIT signal. Ok, let's remove the resistor-diode OR gate as well and let's swap !ROM_CE and !RAM_CE. ROM has to be at address 0x00000 and we can't modify the vector table in ROM. Check.✓

Summary: Yes, we can become Equal-Sinclair! We just have no fast I/O and we'll have to live with the vector table in ROM. That's easy.✓✓✓

How to become Super-Sinclair

Obviously we must remove the Dual 2-to-4 line decoder as well. Will it still work? Ok, that's real hard, but Super-Sinclair IS real hard. You get a "Sir" for Equal-Sinclair, right?

First let's remove the gate used to construct the inverter for address line A19 which we have just added to become Equal-Sinclair. Now we have the choice: We could build an inverter from two resistors and one transistor or we could ... leave it out. Yes, that's what Sir Sinclair would have done. :-)

We either need !VPA to be asserted during interrupt acknowledge or we must supply a vector number. Both are possible:

Supply !VPA: Tie !VPA fixed low and !DTACK fixed high and the CPU will do all bus cycles in 6800 mode. Will be a little slow though. We'll have slow memory access and only slow I/O.

Supply !DTACK: Tie !VPA fixed high and !DTACK fixed low and the CPU will do all bus cycles in standard mode. We'll have fast I/O (with no wait cycles) but no slow I/O. During an interrupt acknowledge cycle we'll have to provide a vector number on the data bus. We can use a resistor network to pull up all data lines if no one else drives the bus, then the vector number will be 0xFF (255). Check.✓

Now – shiver! – let's remove the address range decoder. Can we provide !RAM_CE, !ROM_CE and !FAST_IO (no need for !SLOW_IO) somehow else?

Yes, we can! We can use 3 address lines directly to do that:
• Use A17 for !RAM_CE,
• A18 for !ROM_CE and
• A19 for !FAST_IO.

Drawbacks:
Each range is limited to 128 kByte. (A0..A16) Accepted.
There will be bus collisions if the program accesses addresses with more than one of A17 .. A19 low. Accepted.
There may be short bus collisions when the address lines toggle between bus cycles. Accepted.
After reset the CPU will read from address 0x00000.l and 0x00004.l. Uh uh... This will read from RAM, ROM and IO simultaneously. And any other vector will be read from this page with triple-collision as well. That's bad.

I'm a programmer and i'm here to find solutions: We have to disable RAM and IO when the ROM is selected. This will also reduce the forbidden address ranges with bus collisions.

We can get the RAM out of the way by its positive CE input. Yeah, it has one. Look at the circuit diagram. Just connect RAM.CE to A18 (!ROM_CE). When the CPU reads any vector from the first kByte of memory then A18 will be low and the RAM is not enabled. Check.✓

Next we can suppress the K1 bus control signals in a similar way: As you don't know yet, they are generated with a 3-to-8 line decoder 74HC138 which has two negative and one positive enable input. We can connect A19 (!FAST_IO) and !AS to the negative enables, and A18 (!ROM_CE) to the positive enable. Check.✓
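To see which address ranges are safe in the glue-less design, the enable conditions just described can be written down directly. This is a sketch of the wiring as described in the text (function name invented here): the ROM is enabled by A18 low, the RAM by A17 low with its positive CE on A18, and the I/O decoder by A19 low, !AS low and A18 high.

```python
def super_sinclair_enables(address, n_as=0):
    """Devices enabled for a 20-bit address in the Super-Sinclair design."""
    a17 = (address >> 17) & 1
    a18 = (address >> 18) & 1
    a19 = (address >> 19) & 1
    enabled = []
    if a18 == 0:                              # !ROM_CE = A18
        enabled.append("ROM")
    if a17 == 0 and a18 == 1:                 # RAM: !CE = A17, positive CE = A18
        enabled.append("RAM")
    if a19 == 0 and a18 == 1 and n_as == 0:   # 74HC138 enables: A19, !AS, A18
        enabled.append("I/O")
    return enabled


for addr in (0x00000, 0xA0000, 0xC0000, 0x60000, 0x40000, 0xE0000):
    devices = super_sinclair_enables(addr)
    note = "COLLISION" if len(devices) > 1 else ""
    print(f"{addr:05X}: {', '.join(devices) or 'nothing':10} {note}")
```

With the RAM and the I/O decoder gated by A18, the vector table at address 0x00000 selects the ROM alone; the remaining forbidden range is the one where A17 and A19 are both low while A18 is high.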

Summary: Yes, we can become Super-Sinclair! We just have only fast I/O with no wait cycles, we'll have the vector table in ROM and we are limited to 128k ROM and 128k RAM and there is a risk of bus collisions if the program accesses forbidden address ranges and there may be regular very short bus collisions between each bus cycle.

Accepted. Design finished, let's ship it. ;-) Check✓ Check✓ Check.✓

Update: For completeness here is the main circuit of the Super-Sinclair design. Of course i won't build it this way, because, as said above, Super-Sinclair means reduced beyond usability.

Super-Sinclair 68008 Processor Board – no glue logics required

New Project: 68008 Microcomputer

2014-01-14T18:41:00.001+01:00

Servus,

one of my weird ideas is to build a microcomputer with every CPU i own. Ok, maybe not really *every*, but some of them i remember with nostalgia. These are:

Z80 with SRAM
Z80 with DRAM and paged memory
68008, one with SRAM and one with ~ 2MB DRAM SIMMs
68000, let's see
68020, eventually with FPU
68040, with ~ 64MB DRAM PS2-SIMMs probably

All of them will connect to the K1 Peripheral Bus, so that half of the work is already done. ;-) Beyond that, no I/O will be implemented on these boards.

You see, in essence these are two processors. I started with a ZX Spectrum and proceeded with an Atari ST before i entered the world of Apple, Linux and not Windows.

68008 SRAM Microcomputer

Let's start with the 68008. I have both, the DIL and the PLCC variant. I'll use the DIL version for the SRAM board and the PLCC version for the DRAM board, because it can address more memory: 4MB instead of 1 MB only.

The 68008-SRAM board will be half-sized – 79x100mm – pretty tight, but it will fit. The 68k CPUs are a little bit nasty to integrate into a system, because they have quite a lot of requirements, especially the 68008 which implements an asynchronous bus model. But i'll use any trick, cheat and simplification i can find to make it fit. :-)

Basic Requirements

68008 PDIP CPU
Eprom 32 .. 256 kByte
SRAM 128 or 256 kByte
K1-Bus half-sized card

Project Page

k1.spdns.de/../K1-Bus 68008 CPU board with SRAM

The Board

Layout v0.1 2014-01-14

To the left is a first layout of the board with all components required for the current circuit.

The top row ICs are glue logics.
To the left is the K1-bus connector.
The circuits to the left of the CPU connect the CPU to the bus.
The two 74HC573 registers expand the 8 bit bus of the 68008 to the 8 or 16 bit K1-bus.
The 74HC367 hex driver plus some resistors implements the i2c interface.
The 74HC138 decodes the strobe signals for the K1-bus.
In the center of the board is the CPU
and right of it the SRAM and the Eprom.

Requirements and Simplifications

• Memory Map. Memory is divided into 4 sections: RAM, ROM, fast I/O and slow I/O each of which is 256 kByte in size. This limits the size of the RAM and Eprom.

• CPU Clock. This is generated by a 10 MHz clock module. I avoid the hassle of generating the clock signal "by hand" with a quartz and inverters.

• CPU Reset. The 68000 family needs 0.1s low on Reset and on Halt for power-up initialization. This is currently done with some R and C and a Schmitt Trigger inverter.

• CPU DTACK. The 68008 needs an acknowledge for every memory read or write access. By delaying this signal you can add wait cycles. By never asserting this signal you can make the CPU hang forever. This signal will be handled in the simplest way possible: It is tied to ground and this way always asserted. According to some other 68008 projects, where they do this for the test run on a bread board, this should work. Drawback: No wait cycles possible. I'll need a reasonably fast Eprom and SRAM. This also applies to the fast I/O.

• Bus Error. DTACK is always asserted and BERR is tied to Vcc. There will never be a bus error.

• Bus arbitration. The K1-bus does not support multiple bus masters and neither does this board: bus request BR and bus grant BG are not used.

• Interrupt control. The 68008 PDIP has two interrupt input lines which can encode 4 states: no interrupt, 2 normal, prioritized interrupts and a non maskable interrupt. The only source for interrupts on this board is the K1-bus and so i need only one normal interrupt. The K1-bus supports prioritized interrupts by enabling/disabling interrupts directly on the attached extension cards.
Now the nasty thing: Devices must provide an interrupt vector during the interrupt acknowledge cycle. An automatic vector can be requested by asserting VPA and so we need to know when a bus cycle is an interrupt acknowledge cycle. For this we must decode the Function Code outputs FC0, 1 and 2 which are all '1' during an interrupt acknowledge bus cycle.

• K1-bus access. As said above there are two address ranges for the K1-bus: fast and slow.
The idea is to use a standard memory cycle for fast I/O where wait cycles are not supported because DTACK is permanently asserted. This is suitable for very fast peripheral cards and for switching interrupts and i2c on the K1-bus.
For slow devices i want to use 6800 peripheral I/O cycles by asserting VPA (valid peripheral address) in this address range. This will do a bus cycle synchronized with the free-running E output of the CPU (which has a fixed period of 10 CPU clock cycles) with at least 11 and at most 18 CPU clock cycles due to synchronizing; and no wait states because of the fixed alignment to the E signal. Eventually i'll come up with something better here.

• CPU VPA. This input was already discussed in two requirements above: Interrupt control and K1-bus slow access. During an interrupt acknowledge cycle it is pulled low to request an automatic vector (interrupt routine start address) and in slow I/O it is pulled low to request a slow 6800 peripheral bus cycle.

• K1-bus 16-bit I/O. Peripherals on the K1-bus may use 16 bit I/O. The 68008 has only an 8 bit data bus. There are two possibilities: I use it 'as is' and attach only 8-bit extension cards. Or i add 2 latches to store and receive the upper byte during a 16 bit I/O. I have some K1-bus cards which use the 16 bit bus, most notably the IDE board because IDE is 16 bit wide, and so i'll invest in two '573 data latches.

• K1-bus i2c. Peripheral cards on the K1-bus can have i2c EEproms to signal the presence of the card, to identify it and to provide universal byte-coded drivers. This costs one '367 hex driver IC plus some resistors. This makes it possible to add arbitrary cards to the microcomputer.

• ROM and RAM. The EPROM and the SRAM may be up to 256 kByte in size each. Due to space constraints only one SRAM IC is possible. Memory access cycles of the CPU are without wait states (see DTACK above) and therefore the memory ICs must be fast enough: Scrutinizing the bus cycle timing charts i expect that 150 ns access time will do it; possibly up to 200 ns will work.

• System Timer. There is no system timer on the board. Instead it is expected that one of the K1-bus cards supplies one. This is fairly easy, because my serial cards with one (or more) 88C192 dual UARTs can supply this.

• Serial and Parallel Ports. Any connection to the outer world requires a K1-bus extension card.

Version 1.0 of i2c driver EEprom specification

2013-02-03T12:05:00.001+01:00

Hello,

Version 1.0 of the specification for the driver i2c Eproms on the peripheral cards is finished.

I have also finished translation of the K1-Bus documentation, which was initially written in German.

Here are the links:

K1-Bus documentation: http://k1.spdns.de/.../K1-Bus/
Driver EEprom layout and bytecode: http://k1.spdns.de/.../K1-Bus/i2c-eeprom.pdf

... Kio !

K1 Bus Update

2013-01-21T19:47:00.000+01:00

Hello,

after i have worked for some months on my ZX Spectrum emulator, i'm now back for a while to the K1 CPU.

I worked on the draft for the driver i2c eeproms on the peripheral cards. They are going to version 1.0 soon. In the course of this i've started translating the K1 bus documentation, which was initially written in German. There is quite a lot of text to translate.

If you are interested, here are some links:

K1 bus documentation: http://k1.spdns.de/.../K1-Bus/
Driver eeprom layout and bytecode: http://k1.spdns.de/.../K1-Bus/i2c-eeprom.pdf

XVGA TFT update

2012-07-18T19:30:00.001+02:00

Hello,
i made some progress with the XVGA controller board.

Current State of the Board

First, it becomes more expensive, because i had to buy a minimum of 5 of the FPD-Link transmitter chips (note: i sell the others. Interested? ;-). Then the layout is very tight. See here:


2012-07-18 autorouted board

IC placement is nearly final, then i'll add some more hand-routed wires, auto-route again and then hand-optimize. This will take a week or two.
I had real problems with the RAMs and the FPD-Link transmitter, they actually just fit between the '245 bus transceivers and the VGA connector. I tested a couple of arrangements, but this one produced the fewest vias.
FYI: bottom left is the 16 bit K1 bus, directly above are two 74245 bus transceivers, the SMD ICs are RAM and the FPD-Link transmitter. The 6 chips next to the right are drivers and drivers with latches, which are used to select between an externally supplied address (for the CPU reading/writing the video RAM) or the internal address from the counter cascade, used to address the video RAM for display. The next 'column' of chips are the external address registers/counters, next is the internal address counter cascade, the vertical chips at the right are clock and ATtiny. The bottom right 6 ICs are the control logic. There's an I2C EEPROM sitting on the rear side of the PCB underneath the bus connector.
Though i use a 15-pin VGA SUB-D connector, the signal is not VGA but transmits 4 LVDS signal lanes, each 3 wires: pos. and neg. differential signals and associated GND. In addition one PWM signal is transmitted on pin 15 which will control the LCD backlight brightness. According to what i found on the net this is a 5V 125kHz PWM signal.

Current Circuit

Here's an update to the circuit as well:


Circuit 2012-07-18 - Data Paths

Circuit 2012-07-18 - Control Logics

XVGA TFT

2012-07-04T12:36:00.002+02:00

Though i should finish the already built LCDs first, i've already begun with a 3rd display. It's a 1024x768 pixel TFT from my old iBook. It has an LVDS FPD-Link connection and i have searched the web for info about the panel and FPD-Link. I think i've got enough info to build it.

Major problems:

I need a special transmitter chip, preferably in 5V. These chips are generally hard to find (never used by hobbyists) and 5V is even harder. But i'll get a quote today. :-)
Timing is at the upper end of any hobbyist's project: Pixel clock is 65 MHz, which may possibly be lowered to ~62 MHz.
This requires at least 15ns RAM, which will result in very tight timing, or better 12ns. And i need 1.5 MByte of it. Though i have plenty of RAM in stock, i opted to buy three 256Kx16Bit 12ns RAMs. Head count of ICs on the PCB is already very high.
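
To see why 15 ns RAM is 'very tight', the figures from the list above can be put into a quick calculation. This is a rough sketch only: the variable names are made up here, and the slack figure simply subtracts the RAM access time from the pixel period, ignoring latch setup times and propagation delays, which eat further into the margin.

```python
# Back-of-the-envelope numbers, all inputs taken from the text above.
PIXEL_CLOCK_HZ = 65_000_000        # nominal pixel clock of the panel
WIDTH, HEIGHT = 1024, 768          # visible pixels
RAM_CHIPS = 3                      # three 256K x 16 bit SRAMs
WORDS_PER_CHIP = 256 * 1024
BITS_PER_WORD = 16

pixel_period_ns = 1e9 / PIXEL_CLOCK_HZ
visible_pixels = WIDTH * HEIGHT
total_ram_bytes = RAM_CHIPS * WORDS_PER_CHIP * BITS_PER_WORD // 8

print(f"pixel period:   {pixel_period_ns:.2f} ns")
print(f"visible pixels: {visible_pixels}")
print(f"total RAM:      {total_ram_bytes / 2**20:.1f} MByte")

# Naive slack per pixel for the two RAM speed grades mentioned above.
for access_ns in (15, 12):
    slack_ns = pixel_period_ns - access_ns
    print(f"{access_ns} ns RAM leaves about {slack_ns:.2f} ns per pixel")
```

With a pixel period of roughly 15.4 ns, a 15 ns part leaves well under half a nanosecond, while a 12 ns part leaves a bit over 3 ns for everything else in the data path.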


Data flow on the XVGA controller

Control signals

Project Page

The project page is .../IO-Boards/VGA/ on my home site. This is on my private computer and everything i do here is directly visible on this page. Currently it contains a collection of spec sheets and the current state of the controller board design.

The Plan

The design ideas are as follows:
• The VRAM is addressed by a 20 bit counter cascade. Due to timing problems, the address is buffered by a set of 74574 latches, so the address is always one clock cycle delayed. The RAM output data is directly fed into the FPD-Link transmitter, which is clocked by the same clock signal. All running on 64MHz with a clock cycle of ~15ns.
• The slow signals, VSYNC, HSYNC and DE (Display Enable) are generated by an ATtiny. It also controls count enable of the address counters, to stop them during HSYNC and VSYNC (or, when DE is false). The ATtiny will be clocked with 16 MHz synchronously with the 64MHz pixel clock.
The ATtiny will also generate the FFB (frame fly back) interrupt signal, which is very important:
• VRAM access from the CPU will be completely asynchronous to the pixel access for the display. It will simply override the signals for the display, resulting in 'snow'. Each access will 'destroy' the display of approx. 3 pixels. This allows me to access the VRAM without asserting the !WAIT signal on the bus. I have already checked the timing: writing is safe, reading is tight, but should work.
Accessing the VRAM requires sending an address and then one data i/o. The address is 20 bit, so it has to be transferred in two chunks. I opted to split the address in two 10 bit packages, which will directly translate into X and Y pixel address. To reduce the required bus transfers, i designed the address registers as counters as well. They will provide an auto-increment feature, so that i only need to set the start address and then can read or write in burst mode, hopefully with the full bandwidth of the bus of 16Mwords/sec. The X address can auto-increment, the Y address can auto-decrement as well. I probably can't make the X address easily auto-decrement, because i simply have not enough control lines to control this easily. The 'control lines' are the bus's address lines, and it has 6 of them.
To avoid the 'snow' effect when accessing the VRAM, i plan to do most i/o during the vertical frame flyback, which may be up to 10% of the total frame time. The exact maximum number of lines during ffb of my display will be determined when it is all built, therefore it's nice to have it programmable, because it's done by the ATtiny. It will be slightly tricky to align the control signals of the ATtiny with the 4-pixel boundary (ATtiny clock is Pixel clock ÷ 4) because the DE (display enable) signal for the FPD transmitter and the count enable signal for the address counter must not start and stop somewhere in the middle of a 4-pixel package but exactly at the start or end. Else the image on the TFT will be shifted some pixels, missing some at the left side and displaying garbage at the right side.
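
The split of the 20 bit VRAM address into two 10 bit X and Y registers with auto-increment, as described above, can be sketched like this. The bit layout (Y in the upper 10 bits, X in the lower 10 bits), the wrap-around at 10 bits and all names are assumptions for illustration; the text only says that the two halves map directly to X and Y.

```python
X_BITS = 10
Y_BITS = 10
X_MASK = (1 << X_BITS) - 1
Y_MASK = (1 << Y_BITS) - 1


def vram_address(x, y):
    """Combine two 10-bit values into one 20-bit address (assumed: Y high, X low)."""
    if not (0 <= x <= X_MASK and 0 <= y <= Y_MASK):
        raise ValueError("x and y must fit into 10 bits each")
    return (y << X_BITS) | x


class AddressRegisters:
    """Hypothetical model of the external X/Y address counters."""

    def __init__(self):
        self.x = 0
        self.y = 0

    def set(self, x, y):
        self.x = x & X_MASK
        self.y = y & Y_MASK

    def access(self, dx=0, dy=0):
        """Return the current address, then step the counters.

        dx may be 0 or +1 (X can only auto-increment);
        dy may be -1, 0 or +1 (Y can auto-increment and auto-decrement).
        """
        if dx not in (0, 1) or dy not in (-1, 0, 1):
            raise ValueError("unsupported step")
        address = vram_address(self.x, self.y)
        self.x = (self.x + dx) & X_MASK
        self.y = (self.y + dy) & Y_MASK
        return address


# Burst access along a pixel row: set the start address once, then only transfer data.
regs = AddressRegisters()
regs.set(0, 5)
row_burst = [regs.access(dx=1) for _ in range(4)]
print(row_burst)
```

With this layout one address setup is enough for a whole horizontal run, and stepping Y instead of X walks down (or up) a pixel column.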

Let's see how it all works!

The K1-16/16 CPU

2012-06-27T19:06:00.001+02:00


The self-built K1-16/16 CPU, built with standard 74xx CMOS ICs

The K1-16/16 CPU is the heart of the self-designed and home-built K1-16/16 Computer.
It is built with CMOS ICs from the 74AC series and fits on 5 Euro boards (160 x 100 mm).

Sometimes you are struck by an idea...

Due to depression, programming became harder and harder. So i thought, why not do something simpler, with more manual work? Electronics, for instance. And, thanks to the internet, i had already read about other maniacs who built a 6502 CPU. Or a Z80 in FPGA. Or Dennis Kuschel's myCPU. And there's a web ring about it. If others can do this, it can't be that hard. Basically...
Of course my CPU should be Different. Better. And Simple, so that i can understand it myself. B-)
For symmetry i settled with a 16/16 bit design: 16 data bits and 16 address bits.

Unusual and Generally Interesting Parameters

• Combined Harvard and Von Neumann architecture

• 16 MHz system clock
Front panel with slow motion clock for exhibitions etc.
Full static design down to 0 Hz
• 16 bit internal data bus
• 16 bit internal address bus
• 64k x 16 bit internal ram
• 32k x 24 bit microcode
organized as 2 code planes à 16k for conditional execution and branching.
the microcode is copied from eproms to rams during boot for increased speed.
it is also possible to load the microcode from an external source instead.
the microcode implements:
boot code, BIOS, kernel
100++ assembler opcodes for ram-based programs
100++ millicode opcodes for microcode-based forth or c-style programs
• No flag register. (but flags)
• Built with discrete logics using 74HCxx and 74ACxx ICs
CPU fits on 5 "Euro" printed circuit boards (160 x 100mm)
• Manual circuit design
Manual routing of the PCBs (with EagleCAD)
Professional made double-layer circuit boards
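
One way to picture the '2 code planes à 16k' from the list above: the microcode store holds 32k words, and a condition selects which of the two 16k planes the next word is fetched from. The following is a sketch under that assumption only. How the real K1 hardware derives the plane select is not described here, and the function and constant names are hypothetical.

```python
PLANE_SIZE = 16 * 1024            # 16k microcode words per plane
PLANE_COUNT = 2                   # two planes give 32k words in total
WORD_MASK = (1 << 24) - 1         # each microcode word is 24 bits wide


def microcode_index(address, condition):
    """Index into the 32k-word store; the condition picks plane 0 or plane 1.

    Assumption for illustration: the condition acts as the top address bit.
    """
    if not 0 <= address < PLANE_SIZE:
        raise ValueError("address must fit into one 16k plane")
    return (PLANE_SIZE if condition else 0) + address


# Toy store: the same address holds different words in the two planes,
# so the word that is executed depends on the condition.
store = [0] * (PLANE_SIZE * PLANE_COUNT)
store[microcode_index(0x0123, False)] = 0x00AAAA & WORD_MASK
store[microcode_index(0x0123, True)] = 0x00BBBB & WORD_MASK

print(hex(store[microcode_index(0x0123, False)]))
print(hex(store[microcode_index(0x0123, True)]))
```

In such a scheme a conditional branch costs no extra cycle: both outcomes sit at the same address, one in each plane.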

Harvard Architecture

Programs can be written directly in microcode. Adopting this view, the K1 CPU has separated program and data memory. This is the Harvard Architecture.

Von Neumann Architecture

More likely, the CPU can also use a fixed microcode, which reads opcodes from the ram and executes them. Seen this way it has a combined program and data memory. This is the Von Neumann Architecture.

Start on blogger.com

2012-06-27T18:13:00.000+02:00

Hello,
I've been building a CPU for 4 years now (more or less) and documented this on my home page k1.spdns.de. This worked quite well, but i wanted to separate the blog from the project documentation itself and i wanted to enable some feedback. So i started this blog on blogger.com. I will move some stuff in here which previously was on my website; i'll see whether i can fix the dates.

... Kio !

2nd Display

2012-06-12T00:00:00.000+02:00

Going into mass production. ;-) I built a second, very similar LCD display which uses the same controller board. This one has a backlight, but it was CCFL. I had no inverter, and building one and adapting it to the CCFL would have taken too long, so i replaced it with an array of LEDs. Not good but working. See the photos on the LM64K101 - LCD Display 640x480 project page. Next is to debug the terminal software a little bit more and use it as output for the computer.

Debugging the LCD Display Driver and Hardware

2012-05-26T00:00:00.000+02:00

This week i worked on the 640x480 pixel b&w LCD terminal. Soldering was easy, but fixing all the bugs took some time. I also had to do some changes to the terminal code because i realized that i was using the LCD upside down.

I connected the LCD and powered the board through the programming header. Of course nothing worked, except for the display refresh routine, which brought up a picture of the erased DRAM cells. Step by step i brought up more functions: Erase screen, print characters, read and write whole pixel lines and scrolling. I had to add some nops to the DRAM read and write timing, because the data goes through series resistors which create some delay. For the next board i'll reduce them slightly. After that, printing of standard-size characters with 4 attributes in all combinations worked.

Finally one last important step: Attach it via serial line and USB to my Mac. Nothing worked. I adjusted the baud rate on both sides. I printed text from the LCD terminal on the serial line in an endless loop. There was no signal on the TxD line. Why? But there seemed to be a signal on the RxD handshake line. ... ???

I had connected data lines to handshake and handshake to data lines on the board. :-(. Fixed it with a cutter, solder and wire. Tested. Worked. :-)