It’s a long way to the memory top

In preparation for building our new CPU board, we purchased two WDC 65C02s in PLCC44 packages from an eBay vendor.
On arrival, the first interesting thing was the way they were packaged: no anti-ESD packaging, only a plastic bag, which we found sketchy enough to post on Twitter.

Next, WDC reacted to that tweet, stating that these chips might not be genuine, or at least be very old.

At least they did arrive in time so we finally could assemble the first new CPU board. This board marks a lot of firsts for us:

  • our first 4 layer board
  • first time using PLCC chips
  • first autorouted board
  • first time using a CPLD (Xilinx XC9572 (PLCC84))
  • first time making a board without having breadboarded everything first
  • first time following a lot of people’s advice to put bus transceivers between the CPU board and the rest of the system
4 layers, CPLD, PLCC chips

With so many degrees of freedom, we could not be sure what to expect at first power-up. We modified the BIOS to only initialize the on-board UART (another first: a CPU board with a UART) and output some characters on the serial interface.
Ideally, the BIOS would start up and send something.

But living in a non-ideal world, nothing visible happened.
To detect any signs of life, we had to proceed using oscilloscope and logic analyzer, only to find that the bus was completely dead. We spent quite a while troubleshooting, and identified the RDY generator inside the CPLD as the first culprit. After disabling it and continuing with a slower clock speed, we at least had activity on the bus, but still nothing on the serial interface.

Scope and logic analyzer readings were still inconclusive at best. When trying to observe the VPB pin with the scope, we noticed that this pin seemed to be dead. That is odd, because it should have shown some activity at least after the RESET pulse. Also, when observing the address bus, A0 seemed to be always high, which is highly unlikely for an address line, especially for such a low one.

Back to the sketchy CPUs. Are they 65C02s at all?
Measuring Pin 43 (PHI2 out) and Pin 4 (PHI1 out) showed us a clock signal there. Also, the SYNC signal (Pin 8) showed some activity. So the chip did seem to be something 6502ish.
We then compared the PLCC44 pinouts from the WDC datasheet to one from Rockwell.

Pin 2 (VPB on WDC) is NC on the Rockwell chip, which explains why we did not measure anything there. Also, on the Rockwell chip, Pin 10 is Vcc, while the same pin on the WDC chip is A0, which explains why A0 always read high.
So, our WDC is indeed not genuine, and appears to be a relabeled Rockwell 65C02 (or another non-WDC manufacturer, we only had the Rockwell datasheet to compare with). The CPU itself seems to be ok, but the (subtle) pinout differences make it impossible to use it on our board. Time to order some fresh ones from

To be continued…

Posted in Allgemein

512k ought to be enough for anybody

The biggest limitation of any 8 bit CPU such as our beloved 65C02 is the amount of memory that the CPU can address. With 16 address lines, the addressable memory is maxed out at 64k. All ROM and RAM has to be crammed into there. With the 6502 being a memory mapped architecture, IO devices need their addresses there, too.

In order to expand the amount of usable memory, some trickery is necessary. For example, the developers of the C64 came up with a rather clever hack to cram 20k of ROM and full 64k of RAM and IO area into 64k address space by introducing a register that enables the programmer to switch off the ROM, giving access to the underlying RAM. When the ROM is enabled, writes to the addresses go into the RAM below.
We decided to mimic this behaviour in our current implementation of the Steckschwein glue logic.

But it’s time to move on. More sophisticated 8-bit machines such as the C128 or the CPC6128 have more clever banking logic to give the CPU more than 64k to work with. The C128 even has an MMU. The next logical step for the Steckschwein is to have an MMU, too.
We decided to reimplement our glue logic from scratch using a CPLD and expand our memory space in the process.

We decided to go for 512k of RAM. To address that much memory, the first thing we need is three more address lines. That’s where the CPLD comes in. To cut the 512k of RAM into smaller banks that are easier to address, the CPLD not only provides the address lines A16-A18, but also takes over A14-A15. Address lines A0-A15 from the CPU go to the CPLD to be decoded. RAM and ROM only see A0-A13 from the CPU and A14-A18 from the CPLD.
This way we can split the 512k into 32 banks of 16k each. The 64k address space of the Steckschwein is now organized as four “slots” of 16k each.
Four registers in the CPLD, which are mapped into the IO area, contain the values for the extra address lines and select which bank is mapped into which slot. Bit 7 of a register selects ROM instead of RAM, so a ROM bank is handled just like any other memory bank. This also means a departure from our 8k ROM bank size.
We might upgrade the 32k 28C256 EEPROM to a 512k Flash EEPROM in one of the next iterations, giving us 32 RAM and 32 ROM banks. Also, adding another address line is not a big deal, so upgrading to 1MB will be easy.
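
The address translation described above can be sketched in C. This is only a model of what the CPLD does under the stated scheme, not the actual VHDL; the names `slot_reg`, `targets_rom` and `physical_addr` are made up for illustration, and the IO area is ignored.

```c
#include <stdint.h>

#define BANK_ROM_FLAG 0x80u   /* bit 7 of a slot register: ROM instead of RAM */
#define BANK_MASK     0x1Fu   /* low 5 bits drive A14..A18: 32 banks of 16k   */

/* Hypothetical model of the four slot registers, one per 16k slot. */
uint8_t slot_reg[4] = { 0x00, 0x01, 0x02, 0x80 };

/* Returns 1 if a CPU access to cpu_addr is routed to ROM, 0 for RAM. */
int targets_rom(uint16_t cpu_addr)
{
    return (slot_reg[cpu_addr >> 14] & BANK_ROM_FLAG) != 0;
}

/* Builds the 19-bit address seen by the memory chip:
   A14..A18 come from the slot register, A0..A13 straight from the CPU. */
uint32_t physical_addr(uint16_t cpu_addr)
{
    uint8_t bank = slot_reg[cpu_addr >> 14] & BANK_MASK;
    return ((uint32_t)bank << 14) | (cpu_addr & 0x3FFFu);
}
```

With slot 2 set to bank 30, for example, a CPU access to $8000 would end up at offset 30 × 16k in the RAM chip, while the CPU-side address stays the same.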

The 4 Slots within the 64k address space
Slot 0   Slot 1   Slot 2    Slot3
|Bank 81*|
|Bank 80*|
|  ....  |Bank 81 |
|Bank 4  |  ...   |
|Bank 3  |Bank 3  |Bank 81*|
|Bank 1  |Bank 2  |Bank 80*|
|Bank 0  |Bank 1  |Bank 30 |Bank 81*|
         |Bank 0  |Bank 29 |Bank 80*|
                  |Bank 28 |Bank 30 |
                  |Bank 27 |Bank 29 |
                  |Bank 26 |Bank 28 |
                  |Bank 25 |Bank 27 |
                  |  ....  |Bank 26 |
                  |Bank 0  |Bank 25 |
                           |  ....  |
                           |Bank 0  |


The above illustration shows how the Slot selection scheme works. It is also possible to map the same bank into all four slots.

To be able to fetch the RESET vector, the CPU requires ROM to be present in slot 3 at system start. So the default bank assignment looks like this:

Slot 0   Bank 0
Slot 1   Bank 1
Slot 2   Bank 2
Slot 3   Bank $80 (ROM)
Default bank assignment at boot

We do not need the ROMOFF mechanism anymore, so the loading of steckOS will follow a different procedure:

  1. System bootup with bank $80 (ROM) in slot 3
  2. Bootloader switches slot 2 to bank 3 (or whatever bank the OS shall be in)
  3. Bootloader writes steckOS to slot 2 ($8000)
  4. Bootloader switches slot 3 to bank 3
  5. Bootloader jumps to steckOS init
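
The steps above can be played through in a small C simulation. It is a sketch under assumptions: the arrays and the functions `cpu_read`, `cpu_write` and `load_os` are hypothetical stand-ins, ROM contents are replaced by a placeholder value, and the returned entry address (the base of slot 3) is only a guess at where steckOS init might live.

```c
#include <stdint.h>

#define BANK_SIZE 0x4000u

uint8_t ram[32][BANK_SIZE];                   /* 512k RAM as 32 banks of 16k */
uint8_t slot[4] = { 0x00, 0x01, 0x02, 0x80 }; /* default assignment at boot  */

/* Write through the slot mapping; writes to a ROM bank are dropped here. */
void cpu_write(uint16_t addr, uint8_t value)
{
    uint8_t reg = slot[addr >> 14];
    if (reg & 0x80u)
        return;
    ram[reg & 0x1Fu][addr & 0x3FFFu] = value;
}

/* Read through the slot mapping; ROM reads return a placeholder. */
uint8_t cpu_read(uint16_t addr)
{
    uint8_t reg = slot[addr >> 14];
    if (reg & 0x80u)
        return 0xFF;
    return ram[reg & 0x1Fu][addr & 0x3FFFu];
}

/* Steps 2 to 4 of the list; the return value is the jump target for step 5. */
uint16_t load_os(const uint8_t *image, uint16_t len, uint8_t os_bank)
{
    slot[2] = os_bank;                        /* step 2: OS bank into slot 2 */
    for (uint16_t i = 0; i < len; i++)        /* step 3: copy OS to $8000    */
        cpu_write((uint16_t)(0x8000u + i), image[i]);
    slot[3] = os_bank;                        /* step 4: ROM leaves slot 3   */
    return 0xC000u;                           /* same bytes, now in slot 3   */
}
```

The point of the sketch is that the OS image is written once through slot 2 and then simply appears in slot 3 when the register is switched, without any further copying.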

This memory banking scheme is rather simple, but provides a lot of flexibility for using more than 64k of memory. Also, all kinds of memory (ROM, RAM) are treated the same way, so it’s much cleaner than the ROMOFF approach. Being able to remap the area containing the zero page and stack will also help with implementing some sort of task switching or even multitasking.
On the downside, it’s flexible but pretty dumb, as the software has to keep track of what has been put in which bank.
We are really eager to explore this idea, so the VHDL code for the XC9572 CPLD has been written, and the board has been designed and is waiting to be delivered.

Posted in Allgemein

Loading ASCII sources in EhBasic

Since our implementation of FAT32 now supports reading a file byte for byte, a little rework of the file handling in our version of EhBasic is in order.

In the past, we could only read or write a file as a whole, to or from the memory location a given pointer pointed to. We used this in EhBasic to save and load BASIC programs by dumping and reloading their binary representation from memory. While this works well, it has the major disadvantage that a saved program will be incompatible with other versions of EhBasic, or even with our own when the token list changes, which happens when adding new commands.

So clearly, the better approach is to read the BASIC program as source in its ASCII representation. This is the way EhBasic’s late creator, Lee Davison, preferred, and he suggested how to implement it:

To load an ASCII program redirect the character input vector to read from your filesystem and return to the main interpreter loop. The input vector should be restored and the file closed when the file end is reached or an error is encountered.

Basically, the interpreter reads characters and interprets them just as if they were being typed in, but they are read from the file instead. So, our LOAD command is implemented like this:

    lda #O_RDONLY 
    jsr openfile

    lda #<fread_wrapper 
    sta VEC_IN 
    lda #>fread_wrapper 
    sta VEC_IN+1

    lda #<outvec_dummy 
    sta VEC_OUT 
    lda #>outvec_dummy 
    sta VEC_OUT+1 

    JMP LAB_1319 ; reset and return

All it does is change the input and output vectors and then return to the interpreter, which then reads characters via VEC_IN until the whole file has been read. But then what? The input vector still points to fread_wrapper, so how do we get control back?
That’s the reason we did not point the vector directly to fat_fread_byte. Instead, we implemented a wrapper, which reads a byte from the file and passes it to EhBasic, and restores the vectors when EOF is reached:

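fread_wrapper:
    phy                 ; save Y, restored by the ply below
    ldx _fd 
    jsr krn_fread_byte 
    bcs @eof            ; carry set: end of file reached
    cmp #KEY_LF ; replace with "basic end of line" 
    bne :+ 
    lda #KEY_CR
:   ply   
    cmp #0 
    rts                 ; hand the character to EhBasic
@eof:
    jsr krn_close
    jsr init_iovectors  ; point the vectors back to keyboard and screen
    SMB7 OPXMDM ; set upper bit in flag (print Ready msg) 
    jmp LAB_1319 ; cleanup and Return to BASIC

init_iovectors:
    lda #<krn_chrout 
    sta VEC_OUT 
    lda #>krn_chrout 
    sta VEC_OUT+1
    lda #<krn_getkey 
    sta VEC_IN 
    lda #>krn_getkey 
    sta VEC_IN+1 
    rts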

Also, outvec_dummy is just an empty subroutine that we point the output vector VEC_OUT to in order to suppress output while loading. Otherwise, the input would be echoed by the interpreter loop, resulting in the program being listed during load.
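
For readers less fluent in 6502 assembly, the same vector-swapping idea can be sketched in C with function pointers. This is an illustration only: `vec_in`, `vec_out`, `key_in`, `console_out`, `file_in` and `load` are made-up names, a string stands in for the open file, and a return value of 0 stands in for "no more input".

```c
typedef int  (*in_fn)(void);
typedef void (*out_fn)(int);

static const char *src;     /* stands in for the open file           */
int console_chars;          /* counts characters echoed to the screen */

int  key_in(void)       { return 0; }                        /* keyboard stand-in */
void console_out(int c) { (void)c; console_chars++; }        /* screen stand-in   */
void out_dummy(int c)   { (void)c; }                         /* swallows output   */

in_fn  vec_in  = key_in;
out_fn vec_out = console_out;

/* Plays the role of fread_wrapper: returns one character per call,
   maps LF to CR, and restores both vectors once the input is exhausted. */
int file_in(void)
{
    if (*src == '\0') {
        vec_in  = key_in;
        vec_out = console_out;
        return 0;
    }
    int c = *src++;
    return c == '\n' ? '\r' : c;
}

/* Plays the role of LOAD: redirect input to the file, silence the echo. */
void load(const char *text)
{
    src     = text;
    vec_in  = file_in;
    vec_out = out_dummy;
}
```

After `load()`, an interpreter loop that keeps calling `vec_in` and echoing through `vec_out` would consume the text silently and find itself back on the keyboard and screen once the text ends.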


Now we’re ready to feed almost any BASIC source to EhBasic, which will make porting existing BASIC software pretty easy.

ASCII based LOAD in action

The next step will be to save BASIC programs in ASCII format by setting VEC_OUT accordingly and triggering a LIST command.

Posted in basic, ehbasic, FAT, FAT32

Fixing PS/2 Keyboard handling (Part I)

The way the PS/2 keyboard is handled has always been something we were never quite happy with. The key points being:

  • The PS/2 controller had no way of signalling that there had been a new keystroke, so the buffer had to be polled via SPI.
  • The PS/2 controller had no way of talking to the keyboard and had to rely on the keyboard to initialize itself properly. Also, typematic rate and delay could not be set, nor could the states of the keyboard LEDs.

Mid- to long-term, we will likely “upgrade” to USB anyway, but not without having done PS/2 right first. So, I will talk about integrating IRQ handling, and in a follow-up post Marko will talk about how he got the PS/2 controller talking to the keyboard.

Luckily, during the design of the IO board, we were clever enough to hook IO pins PC0 to PC2 to RESET_TRIG, NMI and IRQ, respectively. So on the hardware side, we are very much ready.
The first problem to solve is how to emulate an open collector output on the AVR controller. A common way to do that seems to be to disable the internal pull-up of the pin and configure it as an input, which makes it “tri-state”. To assert the line, the pin is switched to output and pulls the IRQ line low.

// pull IRQ line
DDRC |= (1 << IRQ);

// release IRQ line
DDRC &= ~(1 << IRQ);

Now that we know how to handle the IRQ-line, we need to figure out, WHEN to pull it. Obviously when a key was hit. And when to release it?

In the end, we decided to go the simplest way. The PS/2 controller pulls the IRQ line low as long as there is at least one character in the buffer. Once the buffer is empty, the IRQ line is released. This way, we do not need an interrupt register and hence no time-consuming check of the latter, but we need to do a little buffering on the steckOS side.

This is all the code that’s needed on the PS/2 controller side:

    if (kb_buffcnt > 0)
        DDRC |= (1 << IRQ);     // pull IRQ line
    else
        DDRC &= ~(1 << IRQ);    // release IRQ line

Now, we need to add a little handling code to the steckOS IRQ-handler. Since we do not have an interrupt register, we just check the keyboard last, after every “known” interrupt source has been handled.
To get around having to implement another keyboard buffer, we just use a single memory location, labelled “key”. The IRQ handler will only fetch a byte from the keyboard when the target location is zero (0), otherwise it will just exit.
The system getkey-routine will load the contents from that location into the A register, and overwrite the location with 0 again to enable fetching the next char from the buffer.

The SPI check code is the last bit in the IRQ-handler routine:

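lda key        ; previous key not consumed yet?
bne @exit      ; then leave it in place
jsr fetchkey   ; fetch a byte from the PS/2 controller via SPI
bcc @exit      ; carry clear: nothing received
sta key        ; store it for getkey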


That’s basically all that’s needed. The former getkey-routine has been renamed to fetchkey, and the new getkey routine only handles the ZP buffer location while retaining the old behaviour including setting the carry flag when a byte has been received. This way, existing programs using the keyboard do not have to be modified.
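
The interplay between the controller buffer, the IRQ line and the single-byte `key` location can be modelled in C. This is a host-side sketch, not the firmware or steckOS code: the ring buffer, `controller_receive` and `irq_line_low` are assumptions made up for the illustration, and the SPI transfer is reduced to a function call.

```c
#include <stdint.h>

/* PS/2 controller side: a small ring buffer of received characters. */
static uint8_t kb_buf[8];
static int kb_head, kb_count;

/* steckOS side: single-byte buffer, 0 means "empty". */
uint8_t key;

void controller_receive(uint8_t c)
{
    if (kb_count < 8) {
        kb_buf[(kb_head + kb_count) % 8] = c;
        kb_count++;
    }
}

/* The controller holds IRQ low while its buffer is not empty. */
int irq_line_low(void) { return kb_count > 0; }

/* Stands in for fetchkey: returns 1 ("carry set") if a byte was received. */
int fetchkey(uint8_t *out)
{
    if (kb_count == 0)
        return 0;
    *out = kb_buf[kb_head];
    kb_head = (kb_head + 1) % 8;
    kb_count--;
    return 1;
}

/* Keyboard part of the IRQ handler: only fetch when `key` is free. */
void irq_handler(void)
{
    uint8_t c;
    if (key != 0)
        return;
    if (fetchkey(&c))
        key = c;
}

/* New getkey: hand out the buffered byte and free the location again. */
int getkey(uint8_t *out)
{
    if (key == 0)
        return 0;
    *out = key;
    key = 0;
    return 1;
}
```

Note that in this model the IRQ line stays low while a second character waits in the controller and `key` is still occupied; the handler simply exits until `getkey` has freed the location.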

Now, we finally have a chance of reacting to keystrokes during program execution without having to explicitly poll the keyboard. This enables us to handle Ctrl-C and such much more elegantly. Also, any REPL-like program (like the shell) does not have to constantly poll the SPI bus.

Posted in Allgemein

Connecting SNES Controller to the Steckschwein

Recently, Michael Steil published a blog post about connecting NES and SNES controllers to a 6502-based system, showing how to use them on a C64 without any special hardware, just by connecting them to the C64’s user port.

Why not use his approach and adapt it to the Steckschwein? The Steckschwein has a user port, too, albeit a very different one from the C64’s. Basically, the Steckschwein user port consists of the complete Port A of the VIA, plus the /RESET and /IRQ lines and, of course, VCC and GND.

User Port:
      | |-------PA6  
      | | |-----PA4
      | | | |---PA2 (DATA1)
      | | | | |-PA0 (CLK)
o o X o o o o o
o o X o o o o o
      | | | | |-PA1 (LATCH)
      | | | |---PA3 (DATA2)
      | | |-----PA5
      | |-------PA7

SNES Controller:
| 7  6  5 | 4  3  2  1 |

Pin  Description
1    +5V
2    CLK
3    LATCH
4    DATA
5    –
6    –
7    GND


Simple adapter to connect one SNES controller

As for the code, we use Michael’s code with only a few modifications for the different pinout, plus a handful of optimizations. Having a 65C02 instead of the C64’s 6510 gives us the STZ instruction, and with PA0 as the clock pin, a single INC followed by an STZ is enough to pulse the clock line.

nes_data = via1porta
nes_ddr = via1ddra
; zero page
controller1 = $00 ; 3 bytes
controller2 = $03 ; 3 bytes

bit_clk   = %00000001 ; PA0 : CLK (both controllers)
bit_latch = %00000010 ; PA1 : LATCH (both controllers)
bit_data1 = %00000100 ; PA2 : DATA (controller #1)
bit_data2 = %00001000 ; PA3 : DATA (controller #2)

    lda #$ff-bit_data1-bit_data2
    sta nes_ddr
    lda #$00
    sta nes_data

    ; pulse latch
    lda #bit_latch
    sta nes_data
    ;lda #0
    ;sta nes_data
    stz nes_data

    ; read 3x 8 bits
    ldx #0
l2: ldy #8
l1: lda nes_data
    cmp #bit_data2
    rol controller2,x
    and #bit_data1
    cmp #bit_data1
    rol controller1,x
    ;lda #bit_clk
    ;sta nes_data
    inc nes_data
    ;lda #0
    ;sta nes_data
    stz nes_data
    dey             ; 8 bits per byte
    bne l1
    inx             ; next byte
    cpx #3
    bne l2
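
The structure of the read loop (latch once, then shift in 3 × 8 bits, one bit per clock pulse) may be easier to follow in C. This is a host-side sketch with a simulated controller, not a port of the routine: `buttons`, `latch`, `data_line`, `clock_pulse` and `read_controller` are hypothetical names, and the 24-bit shift register is just a stand-in for whatever the real controller sends.

```c
#include <stdint.h>

uint32_t buttons;            /* simulated button state, 24 bits used */
static uint32_t shift_reg;   /* simulated shift register inside the controller */

/* LATCH pulse: the controller takes a snapshot of its buttons. */
void latch(void)       { shift_reg = buttons; }

/* DATA line: the controller presents the current top bit. */
int  data_line(void)   { return (int)((shift_reg >> 23) & 1u); }

/* CLK pulse: the controller moves on to the next bit. */
void clock_pulse(void) { shift_reg <<= 1; }

/* Same loop structure as the assembly: outer loop over 3 bytes,
   inner loop over 8 bits, each bit rotated in from the right. */
void read_controller(uint8_t out[3])
{
    latch();
    for (int byte = 0; byte < 3; byte++) {
        out[byte] = 0;
        for (int bit = 0; bit < 8; bit++) {
            out[byte] = (uint8_t)((out[byte] << 1) | data_line());
            clock_pulse();
        }
    }
}
```

In the assembly version, the `cmp`/`rol` pairs do the work of the shift-and-or line: `cmp` moves the state of the data bit into the carry flag, and `rol` rotates the carry into the result byte.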

Small test program to output a different character for each button:


Also, instead of the original Nintendo SNES controller, I use an 8bitdo SN30 Bluetooth controller with the SNES receiver. One could say this is the first time a Bluetooth device has been connected to the Steckschwein.


Bluetooth SNES receiver from 8bitdo

Up next: Patching our games!

Posted in Allgemein, experiment, joystick

Chuck Peddle, 1937 – 2019

Chuck Peddle, the main designer of the 6502, passed away on Dec. 15th, 2019.

Peddle was one of the engineers that developed the 6800 at Motorola. He later went to MOS in order to implement his vision of an 8bit CPU for way less than $300, which was Motorola’s price for the 6800.

This idea of a cheap but powerful CPU materialized as the 6501, and finally the 6502.
That very chip started the microcomputer revolution, and it is the chip on which both Marko and I wrote our first code ever at an early age: BASIC at first, followed by assembly language later.

Learning to code assembly on this small and elegant CPU provided both of us with profound knowledge and experience about the inner workings of a computer. That knowledge is still valuable in our respective careers in IT, and of course also when working on our pet project, the Steckschwein. Things would have gone quite differently without Chuck Peddle’s elegant little CPU.

Thanks, Chuck!



Posted in 6502, 65c02, assembly, basic, urschleim, wdc

Steckschwein emulator

Back from the VCFb (Vintage Computer Festival Berlin) 2019, where we had good talks, met interesting people and got new ideas. Especially from Michael Steil, who just asked the simple question: “How can you develop software for the Steckschwein without an emulator?”

Posted in Allgemein

Markos Pacman Talk at VCFb

Marko talked about his Pacman port to the Steckschwein at VCFb. Basically it’s the same talk he did at VCFe in April, but this time, there’s a video. Enjoy!

Posted in Allgemein

Weird bug in SD card code

Frank van den Hoef, who is adapting the Steckschwein SPI & FAT32 code for his tiny65 machine, made me aware of a classic mistake for a 6502 assembly coder to make. It sits in our SD card driver, in the routine that waits for the “proper” response from the card (which should have bit 7 cleared). The routine looked like this:

1  sd_cmd_response_wait:
2 	ldy #sd_cmd_response_retries
3 @l:	dey
4         beq sd_block_cmd_timeout ; y already 0? then invalid response or timeout
5         jsr spi_r_byte
6         bit #80	; bit 7 clear
7         bne @l  ; no, next byte
8         cmp #$00 ; got cmd response, check if $00 to set z flag accordingly
9         rts
10 sd_block_cmd_timeout:
11        debug "sd_block_cmd_timeout"
12        lda #$1f ; make up error code distinct from possible sd card responses to mark timeout
13        rts

Classic. Obviously, line 6 should read:

          bit #$80 ; bit 7 clear

With that fixed, the sd card init routine now fails, which is odd since we fixed something that was obviously broken.


Ok, now what? Enabling Marko’s mighty debugging macros, it becomes apparent that the sd card init fails right after sending CMD0 to the card. This command is the first command of the init sequence and is supposed to put the card into “idle mode”. Which the card confirms with an answer of $01. Which is what the init code is expecting, and not getting. Instead, we get $3F, which does not make a lot of sense.

But why did it work before the fix?
Assuming that the card did not change its behaviour at the same time I fixed the code, let’s check what actually happened. Before the fix, we were ANDing $3F with 80:

00111111 $3f
01010000 80 (no $, decimal)

In this case, the result is non-zero, so the BNE after the BIT #80 would take the branch to @l, causing the next byte to be read, until finally the card responds with $01:

00000001 $01
01010000 80 (no $, decimal)

Now the BNE does not take the branch, and the routine exits.

Now, with the fixed code, we are ANDing $3F with $80 to check whether bit 7 is clear, which it is:

00111111 $3F
10000000 $80

Alright, exit the loop and return $3f as response of the card. Which isn’t $01, so init failed.
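
The three cases can be checked with a few lines of C. The helper `keeps_waiting` is a made-up name; it mirrors the BIT #imm / BNE pair, where a non-zero AND result means the branch is taken and the next byte is read.

```c
#include <stdint.h>

/* Mirrors "bit #mask" followed by "bne @l":
   returns 1 if the loop would go on to read the next byte. */
int keeps_waiting(uint8_t response, uint8_t mask)
{
    return (response & mask) != 0;
}
```

With the buggy decimal mask 80 (binary 01010000), `keeps_waiting(0x3F, 80)` is true and `keeps_waiting(0x01, 80)` is false, so $3F was skipped by accident. With the intended mask $80, `keeps_waiting(0x3F, 0x80)` is false, so $3F is returned as the card's response.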

At this point, I have no explanation for the card responding $3F. I assume that the card might be not ready to process commands at this point, so I added code to repeat sending CMD0 until we get $01 or we run out of retries.
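
The retry logic could look roughly like the following C sketch. It is not the actual driver code: `sd_go_idle` is a hypothetical name, the command is passed in as a function pointer, and `fake_card` is a made-up test double that answers $3F twice before settling on $01.

```c
#include <stdint.h>

typedef uint8_t (*send_cmd0_fn)(void);

/* Repeat CMD0 until the card answers $01 (idle) or retries run out.
   Returns 1 on success, 0 if the card never reached idle state. */
int sd_go_idle(send_cmd0_fn send_cmd0, int retries)
{
    while (retries-- > 0) {
        if (send_cmd0() == 0x01)
            return 1;
    }
    return 0;
}

/* Test double: not ready on the first two attempts. */
int fake_calls;
uint8_t fake_card(void)
{
    return (++fake_calls < 3) ? 0x3F : 0x01;
}
```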



Posted in assembly, code, debugging, murphy, SD-Karte, SPI

Forth Benchmarks

The main motivation to get Forth up and running on the Steckschwein was to participate in The Ultimate Benchmark, in order to crush all 8-bit competition to dust.

So the plan was to benchmark the Steckschwein live at the VCFe. Unfortunately, Carsten could not be there, so no Forth benchmark competition this year.
Recently, Carsten presented his benchmark results using TaliForth2, which led us to run the same benchmarks he did and send the results to Carsten, who was kind enough to include them on his site:

Posted in Allgemein