CHIP-8 in Common Lisp: Disassembly
Posted on January 2nd, 2017.
In the previous posts we looked at how to emulate a CHIP-8 CPU with Common Lisp. After adding a screen, input, and sound the core of the emulator is essentially complete.
I've been guiding you through the code step by step and it might look simple, but that's only because I went down all the dead ends myself first. In practice, when you're writing an emulator for a system you'll need a way to debug the execution of code. The first step is to be able to read the code, so let's look at how to add a disassembler to our simple CHIP-8 emulator.
The full series of posts so far:
- CHIP-8 in Common Lisp: The CPU
- CHIP-8 in Common Lisp: Graphics
- CHIP-8 in Common Lisp: Input
- CHIP-8 in Common Lisp: Sound
- CHIP-8 in Common Lisp: Disassembly
- CHIP-8 in Common Lisp: Debugging Infrastructure
- CHIP-8 in Common Lisp: Menus
The full emulator source is on BitBucket and GitHub.
Disassembling Single Instructions
The first thing we'll need is a way to take a single instruction like #x8055
and turn it into something we can read. The easiest way to do this seemed to be
to copy the dispatch loop from the CPU emulator and turn it into a disassembly
function:
(defun disassemble-instruction (instruction)
(flet ((v (n) (symb 'v (format nil "~X" n))))
(let ((_x__ (ldb (byte 4 8) instruction))
(__x_ (ldb (byte 4 4) instruction))
(___x (ldb (byte 4 0) instruction))
(__xx (ldb (byte 8 0) instruction))
(_xxx (ldb (byte 12 0) instruction)))
(case (logand #xF000 instruction)
(#x0000 (case instruction
(#x00E0 '(cls))
(#x00EE '(ret))))
(#x1000 `(jp ,_xxx))
(#x2000 `(call ,_xxx))
(#x3000 `(se ,(v _x__) ,__xx))
(#x4000 `(sne ,(v _x__) ,__xx))
(#x5000 (case (logand #x000F instruction)
(#x0 `(se ,(v _x__) ,(v __x_)))))
(#x6000 `(ld ,(v _x__) ,__xx))
(#x7000 `(add ,(v _x__) ,__xx))
(#x8000 (case (logand #x000F instruction)
(#x0 `(ld ,(v _x__) ,(v __x_)))
(#x1 `(or ,(v _x__) ,(v __x_)))
(#x2 `(and ,(v _x__) ,(v __x_)))
(#x3 `(xor ,(v _x__) ,(v __x_)))
(#x4 `(add ,(v _x__) ,(v __x_)))
(#x5 `(sub ,(v _x__) ,(v __x_)))
(#x6 `(shr ,(v _x__) ,(v __x_)))
(#x7 `(subn ,(v _x__) ,(v __x_)))
(#xE `(shl ,(v _x__) ,(v __x_)))))
(#x9000 (case (logand #x000F instruction)
(#x0 `(sne ,(v _x__) ,(v __x_)))))
(#xA000 `(ld i ,_xxx))
(#xB000 `(jp ,(v 0) ,_xxx))
(#xC000 `(rnd ,(v _x__) ,__xx))
(#xD000 `(drw ,(v _x__) ,(v __x_) ,___x))
(#xE000 (case (logand #x00FF instruction)
(#x9E `(skp ,(v _x__)))
(#xA1 `(sknp ,(v _x__)))))
(#xF000 (case (logand #x00FF instruction)
(#x07 `(ld ,(v _x__) dt))
(#x0A `(ld ,(v _x__) k))
(#x15 `(ld dt ,(v _x__)))
(#x18 `(ld st ,(v _x__)))
(#x1E `(add i ,(v _x__)))
(#x29 `(ld f ,(v _x__)))
(#x33 `(ld b ,(v _x__)))
(#x55 `(ld (mem i) ,_x__))
(#x65 `(ld ,_x__ (mem i)))))))))
There are a lot of other ways we could have done this, like making a proper
parser or adding functionality to define-opcode
, but since there's not that
many instructions I think this is reasonable. Now we can pass in a raw,
two-byte instruction and get out something readable:
[SBCL] CHIP8> (disassemble-instruction #x8055)
(SUB V0 V5)
[SBCL] CHIP8> (disassemble-instruction #x4077)
(SNE V0 119)
Disassembling Entire ROMs
Disassembling a single instruction will be useful, but it would also be nice to disassemble an entire ROM at once to see what its code looks like. Let's make a little helper function to handle that:
(defun dump-disassembly (array &optional (start 0) (end (length array)))
(iterate
(for i :from start :below end :by 2)
(print-disassembled-instruction array i)
(sleep 0.001)))
The sleep
is there because Neovim's terminal seems to shit the bed if you dump
too much text at it at once. Computers are garbage.
Other that than, dump-disassembly
is pretty straightforward: just iterate
through the array of instructions two bytes at a time and print the information.
Let's look at the printing function now:
(defun print-disassembled-instruction (array index)
(destructuring-bind (address instruction disassembly)
(instruction-information array index)
(let ((*print-base* 16))
(format t "~3,'0X: ~4,'0X ~24A~%"
address
instruction
(or disassembly "")))))
Once again we'll delegate to a helper function.
print-disassembled-instruction
just handles the string formatting to dump an
instruction to the screen. Running it for a single instruction would print
something like:
200: 8055 (SUB V0 V5) ^ ^ ^ | | | | | Disassembly | | | Raw instruction | Address
The helper function instruction-information
is simple, but we'll be using it
in the future for something else, so it's nice to have:
(defun instruction-information (array index)
(let ((instruction (retrieve-instruction array index)))
(list index
instruction
(disassemble-instruction instruction))))
It just takes an address and memory array and returns a list of:
- The address
- The raw instruction at the address
- The disassembly for that instruction
retrieve-instruction
is simple (for now):
(defun retrieve-instruction (array index)
(cat-bytes (aref array index)
(aref array (1+ index))))
These functions could be combined into a single bigger function, but I'm a strong believer in having each function do exactly one thing. And as we'll see, each of these "simple" tasks is going to get more complicated later.
Now we can dump the disassembly for a ROM to see how it works:
(run "roms/ufo.rom") ; stores the current chip struct in *c*
(dump-disassembly (chip-memory *c*) #x200 #x220)
200: A2CD (LD I 2CD)
202: 6938 (LD V9 38)
204: 6A08 (LD VA 8)
206: D9A3 (DRW V9 VA 3)
208: A2D0 (LD I 2D0)
20A: 6B00 (LD VB 0)
20C: 6C03 (LD VC 3)
20E: DBC3 (DRW VB VC 3)
210: A2D6 (LD I 2D6)
212: 641D (LD V4 1D)
214: 651F (LD V5 1F)
216: D451 (DRW V4 V5 1)
218: 6700 (LD V7 0)
21A: 680F (LD V8 F)
21C: 22A2 (CALL 2A2)
21E: 22AC (CALL 2AC)
Sprites
Take a look at print-disassembled-instruction
again:
(defun print-disassembled-instruction (array index)
(destructuring-bind (address instruction disassembly)
(instruction-information array index)
(let ((*print-base* 16))
(format t "~3,'0X: ~4,'0X ~24A~%"
address
instruction
(or disassembly "")))))
Notice how we used (or disassembly "")
instead of just passing in the
disassembly. If you look back at disassemble-instruction
you'll see it uses
normal case
statements, not ecase
, so if the instruction doesn't match any
valid opcodes it will return nil
.
The CHIP-8 (like most computers) uses the same memory to hold both program code (instructions) and data. Data includes things like player health, score, location, and most importantly: the sprites that will be drawn on the screen.
Unfortunately there's no way to know for sure whether a given memory location contains an instruction (and thus needs to be disassembled) or is intended to be a piece of data. Indeed, someone like Mel could conceivably figure out a way to make a particular sequence of instructions perform double duty as a sprite! So our disassembler will just always show the disassembly for anything that might be an instruction.
But with that said, we can probably make some educated guesses. If we had some way to visualize what a hunk of memory would look like if it were rendered as a sprite, we could probably figure out where most of the program's sprites are kept. It's unlikely that any given sequence of instructions would just happen to look like a ghost from Pac Man or something.
We could add a separate function to draw out the sprite data, but the CHIP-8's
sprites are so simple that we can just tack it on to the disassembly output.
Remember that each byte of memory defines one eight-pixel-wide
row of a sprite, and that DRW X, Y, Size
will draw Size
rows of a sprite
using contiguous bytes in memory. So if memory contains something like this (at
the location specified by the index register):
Address Data
#x300 #b11110000
#x301 #b00010000
#x302 #b11110000
#x303 #b00010000
#x304 #b11110000
A DRW X, Y, 5
instruction would draw a 3
sprite to the screen:
████ █ ████ █ ████
It would be trivial to simply render the bits of any given instruction as spaces
and some other ASCII character and tack it onto the end of the disassembly, but
there's a snag: instructions are two bytes each, but each row in a sprite is
one byte long. Our sprites would get pretty mangled if we printed two of
their rows per line of disassembly — for example, 4
would look like this:
byte 1 byte 2 1111111122222222 064: 9090 (SNE V0 V9) █ █ █ █ 066: F010 ████ █ 068: 10F0 (JP F0) █ ████
Not ideal. One option would be to make every instruction of disassembly two
lines long, but that's painful to read when trying to look at the code portions
of the ROM. We can get around this with a delightful little hack: using
characters from Unicode Block Elements to cram two rows of sprite data
into a single line of output. Let's start by defining a bit-diagram
function
that will take a two-byte-wide integer and return an ASCII diagram of its bits:
(defun bit-diagram (integer)
(iterate (for high-bit :from 15 :downto 8)
(for low-bit :from 7 :downto 0)
(for hi = (logbitp high-bit integer))
(for lo = (logbitp low-bit integer))
(collect (cond ((and hi lo) #\full_block)
(hi #\upper_half_block)
(lo #\lower_half_block)
(t #\space))
:result-type 'string)))
; Example rows of sprite data:
; 11110000
; 11001100
(bit-diagram #b1111000011001100)
"██▀▀▄▄ "
Now that we've got this we can easily add it into our disassembly functions:
(defun instruction-information (array index)
(let ((instruction (retrieve-instruction array index)))
(list index
instruction
(disassemble-instruction instruction)
(bit-diagram instruction)))) ; NEW
(defun print-disassembled-instruction (array index)
(destructuring-bind (address instruction disassembly bits)
(instruction-information array index)
(let ((*print-base* 16))
; NEW
(format t "~3,'0X: ~4,'0X ~24A ~8A~%"
address
instruction
(or disassembly "")
bits)))) ; NEW
Now when we dump the disassembly of a ROM we'll also see what each instruction would look like if drawn as a sprite. For program code this will tend to look like garbage (unless some crazy person has managed to write code that also works as sprites):
200: A2CD (LD I 2CD) █▄▀ ▄▄▀▄ 202: 6938 (LD V9 38) ▀█▄█ ▀ 204: 6A08 (LD VA 8) ▀▀ █ ▀ 206: D9A3 (DRW V9 VA 3) █▀▄▀▀ ▄█ 208: A2D0 (LD I 2D0) █▄▀▄ ▀ 20A: 6B00 (LD VB 0) ▀▀ ▀ ▀▀ 20C: 6C03 (LD VC 3) ▀▀ ▀▀▄▄ 20E: DBC3 (DRW VB VC 3) ██ ▀▀ ██
But when we look at areas of the ROM that do contain sprites, they look pretty recognizable:
050: F090 █▀▀█ 052: 9090 (SNE V0 V9) █ █ 054: F020 ▀▀█▀ 056: 6020 (LD V0 20) ▀█ 058: 2070 (CALL 70) ▄█▄ 05A: F010 ▀▀▀█ 05C: F080 █▀▀▀ 05E: F0F0 ████ 060: 10F0 (JP F0) ▄▄▄█ 062: 10F0 (JP F0) ▄▄▄█ 064: 9090 (SNE V0 V9) █ █ 066: F010 ▀▀▀█ 068: 10F0 (JP F0) ▄▄▄█ 06A: 80F0 (LD V0 VF) █▄▄▄ 06C: 10F0 (JP F0) ▄▄▄█ 06E: F080 █▀▀▀
Human eyes are pretty good at picking out patterns, so when you're scrolling through a disassembled ROM it's pretty easy to tell which sections are sprites and which are data, even if it's not perfectly rendered.
Result
We've now got a way to dump the disassembly of a ROM to see what its code and data look like.
We can also inspect the rest of our emulator's state at runtime with NREPL or
SLIME by running things like (chip-program-counter *c*)
.
Future
Manually querying for information and dumping the disassembly isn't very ergonomic, so in the future we'll look at adding:
- Debugging infrastructure like pausing and breakpoints
- A graphical debugger/disassembly viewer
As well as a few other niceties like menus for loading ROMs, etc.