CHIP-8 in Common Lisp: Disassembly

Posted on January 2nd, 2017.

In the previous posts we looked at how to emulate a CHIP-8 CPU with Common Lisp. After adding a screen, input, and sound the core of the emulator is essentially complete.

I've been guiding you through the code step by step and it might look simple, but that's only because I went down all the dead ends myself first. In practice, when you're writing an emulator for a system you'll need a way to debug the execution of code. The first step is to be able to read the code, so let's look at how to add a disassembler to our simple CHIP-8 emulator.

The full series of posts so far:

The full emulator source is on BitBucket and GitHub.

Disassembling Single Instructions
Disassembling Entire ROMs
Sprites
Result
Future

Disassembling Single Instructions

The first thing we'll need is a way to take a single instruction like #x8055 and turn it into something we can read. The easiest way to do this seemed to be to copy the dispatch loop from the CPU emulator and turn it into a disassembly function:

(defun disassemble-instruction (instruction)
  (flet ((v (n) (symb 'v (format nil "~X" n))))
    (let ((_x__ (ldb (byte 4 8) instruction))
          (__x_ (ldb (byte 4 4) instruction))
          (___x (ldb (byte 4 0) instruction))
          (__xx (ldb (byte 8 0) instruction))
          (_xxx (ldb (byte 12 0) instruction)))
      (case (logand #xF000 instruction)
        (#x0000 (case instruction
                  (#x00E0 '(cls))
                  (#x00EE '(ret))))
        (#x1000 `(jp ,_xxx))
        (#x2000 `(call ,_xxx))
        (#x3000 `(se ,(v _x__) ,__xx))
        (#x4000 `(sne ,(v _x__) ,__xx))
        (#x5000 (case (logand #x000F instruction)
                  (#x0 `(se ,(v _x__) ,(v __x_)))))
        (#x6000 `(ld ,(v _x__) ,__xx))
        (#x7000 `(add ,(v _x__) ,__xx))
        (#x8000 (case (logand #x000F instruction)
                  (#x0 `(ld ,(v _x__) ,(v __x_)))
                  (#x1 `(or ,(v _x__) ,(v __x_)))
                  (#x2 `(and ,(v _x__) ,(v __x_)))
                  (#x3 `(xor ,(v _x__) ,(v __x_)))
                  (#x4 `(add ,(v _x__) ,(v __x_)))
                  (#x5 `(sub ,(v _x__) ,(v __x_)))
                  (#x6 `(shr ,(v _x__) ,(v __x_)))
                  (#x7 `(subn ,(v _x__) ,(v __x_)))
                  (#xE `(shl ,(v _x__) ,(v __x_)))))
        (#x9000 (case (logand #x000F instruction)
                  (#x0 `(sne ,(v _x__) ,(v __x_)))))
        (#xA000 `(ld i ,_xxx))
        (#xB000 `(jp ,(v 0) ,_xxx))
        (#xC000 `(rnd ,(v _x__) ,__xx))
        (#xD000 `(drw ,(v _x__) ,(v __x_) ,___x))
        (#xE000 (case (logand #x00FF instruction)
                  (#x9E `(skp ,(v _x__)))
                  (#xA1 `(sknp ,(v _x__)))))
        (#xF000 (case (logand #x00FF instruction)
                  (#x07 `(ld ,(v _x__) dt))
                  (#x0A `(ld ,(v _x__) k))
                  (#x15 `(ld dt ,(v _x__)))
                  (#x18 `(ld st ,(v _x__)))
                  (#x1E `(add i ,(v _x__)))
                  (#x29 `(ld f ,(v _x__)))
                  (#x33 `(ld b ,(v _x__)))
                  (#x55 `(ld (mem i) ,_x__))
                  (#x65 `(ld ,_x__ (mem i)))))))))

There are a lot of other ways we could have done this, like making a proper parser or adding functionality to define-opcode, but since there's not that many instructions I think this is reasonable. Now we can pass in a raw, two-byte instruction and get out something readable:

[SBCL] CHIP8> (disassemble-instruction #x8055)
(SUB V0 V5)

[SBCL] CHIP8> (disassemble-instruction #x4077)
(SNE V0 119)

Disassembling Entire ROMs

Disassembling a single instruction will be useful, but it would also be nice to disassemble an entire ROM at once to see what its code looks like. Let's make a little helper function to handle that:

(defun dump-disassembly (array &optional (start 0) (end (length array)))
  (iterate
    (for i :from start :below end :by 2)
    (print-disassembled-instruction array i)
    (sleep 0.001)))

The sleep is there because Neovim's terminal seems to shit the bed if you dump too much text at it at once. Computers are garbage.

Other that than, dump-disassembly is pretty straightforward: just iterate through the array of instructions two bytes at a time and print the information. Let's look at the printing function now:

(defun print-disassembled-instruction (array index)
  (destructuring-bind (address instruction disassembly)
      (instruction-information array index)
    (let ((*print-base* 16))
      (format t "~3,'0X: ~4,'0X ~24A~%"
              address
              instruction
              (or disassembly "")))))

Once again we'll delegate to a helper function. print-disassembled-instruction just handles the string formatting to dump an instruction to the screen. Running it for a single instruction would print something like:

    200: 8055 (SUB V0 V5)
     ^    ^   ^
     |    |   |
     |    |  Disassembly
     |    |
     |   Raw instruction
     |
    Address

The helper function instruction-information is simple, but we'll be using it in the future for something else, so it's nice to have:

(defun instruction-information (array index)
  (let ((instruction (retrieve-instruction array index)))
    (list index
          instruction
          (disassemble-instruction instruction))))

It just takes an address and memory array and returns a list of:

The address
The raw instruction at the address
The disassembly for that instruction

retrieve-instruction is simple (for now):

(defun retrieve-instruction (array index)
  (cat-bytes (aref array index)
             (aref array (1+ index))))

These functions could be combined into a single bigger function, but I'm a strong believer in having each function do exactly one thing. And as we'll see, each of these "simple" tasks is going to get more complicated later.

Now we can dump the disassembly for a ROM to see how it works:

(run "roms/ufo.rom") ; stores the current chip struct in *c*

(dump-disassembly (chip-memory *c*) #x200 #x220)
200: A2CD (LD I 2CD)
202: 6938 (LD V9 38)
204: 6A08 (LD VA 8)
206: D9A3 (DRW V9 VA 3)
208: A2D0 (LD I 2D0)
20A: 6B00 (LD VB 0)
20C: 6C03 (LD VC 3)
20E: DBC3 (DRW VB VC 3)
210: A2D6 (LD I 2D6)
212: 641D (LD V4 1D)
214: 651F (LD V5 1F)
216: D451 (DRW V4 V5 1)
218: 6700 (LD V7 0)
21A: 680F (LD V8 F)
21C: 22A2 (CALL 2A2)
21E: 22AC (CALL 2AC)

Sprites

Take a look at print-disassembled-instruction again:

(defun print-disassembled-instruction (array index)
  (destructuring-bind (address instruction disassembly)
      (instruction-information array index)
    (let ((*print-base* 16))
      (format t "~3,'0X: ~4,'0X ~24A~%"
              address
              instruction
              (or disassembly "")))))

Notice how we used (or disassembly "") instead of just passing in the disassembly. If you look back at disassemble-instruction you'll see it uses normal case statements, not ecase, so if the instruction doesn't match any valid opcodes it will return nil.

The CHIP-8 (like most computers) uses the same memory to hold both program code (instructions) and data. Data includes things like player health, score, location, and most importantly: the sprites that will be drawn on the screen.

Unfortunately there's no way to know for sure whether a given memory location contains an instruction (and thus needs to be disassembled) or is intended to be a piece of data. Indeed, someone like Mel could conceivably figure out a way to make a particular sequence of instructions perform double duty as a sprite! So our disassembler will just always show the disassembly for anything that might be an instruction.

But with that said, we can probably make some educated guesses. If we had some way to visualize what a hunk of memory would look like if it were rendered as a sprite, we could probably figure out where most of the program's sprites are kept. It's unlikely that any given sequence of instructions would just happen to look like a ghost from Pac Man or something.

We could add a separate function to draw out the sprite data, but the CHIP-8's sprites are so simple that we can just tack it on to the disassembly output. Remember that each byte of memory defines one eight-pixel-wide row of a sprite, and that DRW X, Y, Size will draw Size rows of a sprite using contiguous bytes in memory. So if memory contains something like this (at the location specified by the index register):

Address Data
#x300   #b11110000
#x301   #b00010000
#x302   #b11110000
#x303   #b00010000
#x304   #b11110000

A DRW X, Y, 5 instruction would draw a 3 sprite to the screen:

    ████
       █
    ████
       █
    ████

It would be trivial to simply render the bits of any given instruction as spaces and some other ASCII character and tack it onto the end of the disassembly, but there's a snag: instructions are two bytes each, but each row in a sprite is one byte long. Our sprites would get pretty mangled if we printed two of their rows per line of disassembly — for example, 4 would look like this:

                                       byte 1  byte 2
                                       1111111122222222
    064: 9090 (SNE V0 V9)              █  █    █  █
    066: F010                          ████    █
    068: 10F0 (JP F0)                     █    ████

Not ideal. One option would be to make every instruction of disassembly two lines long, but that's painful to read when trying to look at the code portions of the ROM. We can get around this with a delightful little hack: using characters from Unicode Block Elements to cram two rows of sprite data into a single line of output. Let's start by defining a bit-diagram function that will take a two-byte-wide integer and return an ASCII diagram of its bits:

(defun bit-diagram (integer)
  (iterate (for high-bit :from 15 :downto 8)
           (for low-bit :from 7 :downto 0)
           (for hi = (logbitp high-bit integer))
           (for lo = (logbitp low-bit integer))
           (collect (cond ((and hi lo) #\full_block)
                          (hi          #\upper_half_block)
                          (lo          #\lower_half_block)
                          (t           #\space))
                    :result-type 'string)))

; Example rows of sprite data:
; 11110000
; 11001100

(bit-diagram #b1111000011001100)
"██▀▀▄▄  "

Now that we've got this we can easily add it into our disassembly functions:

(defun instruction-information (array index)
  (let ((instruction (retrieve-instruction array index)))
    (list index
          instruction
          (disassemble-instruction instruction)
          (bit-diagram instruction))))           ; NEW

(defun print-disassembled-instruction (array index)
  (destructuring-bind (address instruction disassembly bits)
      (instruction-information array index)
    (let ((*print-base* 16))
                                     ; NEW
      (format t "~3,'0X: ~4,'0X ~24A ~8A~%"
              address
              instruction
              (or disassembly "")
              bits))))              ; NEW

Now when we dump the disassembly of a ROM we'll also see what each instruction would look like if drawn as a sprite. For program code this will tend to look like garbage (unless some crazy person has managed to write code that also works as sprites):

    200: A2CD (LD I 2CD)               █▄▀ ▄▄▀▄
    202: 6938 (LD V9 38)                ▀█▄█  ▀
    204: 6A08 (LD VA 8)                 ▀▀ █ ▀
    206: D9A3 (DRW V9 VA 3)            █▀▄▀▀ ▄█
    208: A2D0 (LD I 2D0)               █▄▀▄  ▀
    20A: 6B00 (LD VB 0)                 ▀▀ ▀ ▀▀
    20C: 6C03 (LD VC 3)                 ▀▀ ▀▀▄▄
    20E: DBC3 (DRW VB VC 3)            ██ ▀▀ ██

But when we look at areas of the ROM that do contain sprites, they look pretty recognizable:

    050: F090                          █▀▀█
    052: 9090 (SNE V0 V9)              █  █
    054: F020                          ▀▀█▀
    056: 6020 (LD V0 20)                ▀█
    058: 2070 (CALL 70)                 ▄█▄
    05A: F010                          ▀▀▀█
    05C: F080                          █▀▀▀
    05E: F0F0                          ████
    060: 10F0 (JP F0)                  ▄▄▄█
    062: 10F0 (JP F0)                  ▄▄▄█
    064: 9090 (SNE V0 V9)              █  █
    066: F010                          ▀▀▀█
    068: 10F0 (JP F0)                  ▄▄▄█
    06A: 80F0 (LD V0 VF)               █▄▄▄
    06C: 10F0 (JP F0)                  ▄▄▄█
    06E: F080                          █▀▀▀

Human eyes are pretty good at picking out patterns, so when you're scrolling through a disassembled ROM it's pretty easy to tell which sections are sprites and which are data, even if it's not perfectly rendered.

Result

We've now got a way to dump the disassembly of a ROM to see what its code and data look like.

We can also inspect the rest of our emulator's state at runtime with NREPL or SLIME by running things like (chip-program-counter *c*).

Future

Manually querying for information and dumping the disassembly isn't very ergonomic, so in the future we'll look at adding:

Debugging infrastructure like pausing and breakpoints
A graphical debugger/disassembly viewer

As well as a few other niceties like menus for loading ROMs, etc.