CHIP-8 in Common Lisp: The CPU
Posted on December 19th, 2016.
A while back I decided to try to write a Game Boy emulator in Common Lisp based on this series of articles. I made some good progress but eventually got bogged down because I was trying to learn a bunch of complex new things at once:
- How to write an emulator
- How to use Qt with Common Lisp
- How the Game Boy works internally
Instead of dragging on, I decided to take a break and try something simpler: a CHIP-8 emulator/interpreter. The CHIP-8 is much simpler than the Game Boy, which made it easier to experiment with the rest of the infrastructure.
In this post and a couple of future ones I'll walk through all of my CHIP-8
emulator implementation. To give you a rough idea of the size of the project,
cloc
reports:
- Basic emulator: 415 lines
- Debugging/disassembling infrastructure: 142 lines
- Screen GUI: 153 lines
- Graphical debugger: 295 lines
This first post will deal with emulating the CHIP-8's CPU.
The full series of posts so far:
- CHIP-8 in Common Lisp: The CPU
- CHIP-8 in Common Lisp: Graphics
- CHIP-8 in Common Lisp: Input
- CHIP-8 in Common Lisp: Sound
- CHIP-8 in Common Lisp: Disassembly
- CHIP-8 in Common Lisp: Debugging Infrastructure
- CHIP-8 in Common Lisp: Menus
The full emulator source is on BitBucket and GitHub.
Libraries
The emulator uses a few Common Lisp libraries to make things easier:
- bordeaux-threads to handle threading.
- cl-arrows for an implementation of Clojure's
-<>
threading macro. - cl-losh is my own personal utility library.
- cl-portaudio for audio.
- iterate for a much nicer version of
loop
callediterate
. My utility library contains severaliterate
drivers (some of which I've written about before). - qtools to handle creating a GUI with Qt.
- quickutil for some miscellaneous utility functions from Alexandria and elsewhere.
I'll try to remember to mention whenever I use a function that's not built-in to Common Lisp, but I might forget, in which case it's probably in one of these.
CHIP-8 References
There's a good amount of information available about the CHIP-8 online. The references I used most often were:
- Cowgod's "CHIP-8 Technical Reference"
- Laurence Muller's "How to write an emulator (CHIP-8 interpreter)"
- Matthew Mikolay's "Mastering CHIP-8"
- (Super)CHIP 8 Secrets
The Main Data Structure
Let's dive into the code. We'll start with the main data structure that will hold an instance of a CHIP-8 for emulation:
(defstruct chip
(running t :type boolean)
; ...more to come later
)
We're using a Lisp struct instead of a CLOS class because we're going to be accessing the fields of this thing a lot and CLOS accessors can be comparatively slow. This is one of the very few concessions we'll make to performance.
The first field is running
, which is just a boolean that represents whether
the emulator is currently running.
We'll eventually have multiple threads poking at this struct, and they'll
generally be doing something like (loop :while (chip-running chip) :do ...)
.
That way when we want to quit the emulator we can just set running
to nil
and everything will (eventually) stop.
Registers
The CHIP-8 has sixteen main registers, as well as a few other special ones:
(defstruct chip
; ...
(registers (make-array 16 :element-type 'int8)
:type (simple-array int8 (16))
:read-only t)
(index 0 :type int16)
(program-counter #x200 :type int12)
; ...
)
We'll keep the main registers in an array, and the others will just be separate slots. I've added type declarations in the struct for two reasons:
- With a high
safety
declaration many implementations (including SBCL, the one I'm using) will type check at runtime to make sure we're setting things appropriately. - With low
safety
and highspeed
declarations, SBCL can generate much faster code if it knows the types of the struct slots.
To make the integer types a bit less wordy I've defined some simple synonyms:
(deftype int4 () '(unsigned-byte 4))
(deftype int8 () '(unsigned-byte 8))
(deftype int12 () '(unsigned-byte 12))
(deftype int16 () '(unsigned-byte 16))
So registers
is a 16-element simple array of (unsigned-byte 8)
s. It's
read-only
because we'll be changing the elements of the array, but we should
never be swapping out the entire array itself.
The index register is 16 bits, and the program counter can store up to 12 bits (we'll never need more than that because of the size of the CHIP-8's memory).
Also note that the program counter starts at address #x200
, because that's
where the ROM data eventually gets loaded into the CHIP-8 memory.
The Stack
The CHIP-8 also has an internal stack to store return addresses when calling procedures. We'll just use a Lisp vector with a fill pointer for this:
(defstruct chip
; ...
(stack (make-array 16 :element-type 'int12 :fill-pointer 0)
:type (vector int12 16)
:read-only t)
; ...
)
Memory
The CHIP-8 has 4 kilobytes of main memory:
(defconstant +memory-size+ (* 1024 4))
(defstruct chip
; ...
(memory (make-array +memory-size+ :element-type 'int8)
:type (simple-array int8 (#.+memory-size+))
:read-only t)
; ...
)
Pretty simple. Note the #.
reader macro trick/hack to be able to use
a variable where we normally need a raw type specifier.
Currently-Loaded ROM
Finally we'll add a slot for keeping track of the path to the currently-loaded ROM, for easy resetting later:
(defstruct chip
; ...
(loaded-rom nil :type (or null string))
; ...
)
That's it for now. We'll need a few more slots once we get to things like graphics and sound, but I'll introduce them when we need them.
The Flag Register
Register number 15, or #xF
in hex, is special. It's nicknamed the "flag"
register and gets set specially by certain instructions. We could just access
it like the rest of the registers in our code:
(setf (aref (chip-registers chip) 15) 1) ; set flag to 1
(print (aref (chip-registers chip) #xF)) ; print the flag
But even with the #xF
hex index to get that mnemonic "F" this is a bit too
hard to read. Let's define some extra reader and writer functions to clean
things up:
(defun-inline chip-flag (chip)
(aref (chip-registers chip) #xF))
(defun-inline (setf chip-flag) (new-value chip)
(setf (aref (chip-registers chip) #xF) new-value))
defun-inline
is from my utility library — it just defun
s the
function and declaim
s it inline all in one step. Now things are much nicer:
(setf (chip-flag chip) 1) ; set flag to 1
(print (chip-flag chip)) ; print the flag
Removing Tedium
What we've got so far works, but I want to add one more piece of syntactic sugar before moving on.
We're going to be accessing the slots of the chip
struct a lot, and it's
going to get tedious to write (chip-SLOT chip)
over and over again. For
example, when we're resetting the emulator we'll have to do something like this:
(defun reset (chip)
(fill (chip-memory chip) 0)
(fill (chip-registers chip) 0)
(setf (chip-running chip) t
(chip-program-counter chip) #x200
(fill-pointer (chip-stack chip)) 0))
This is annoying. Languages like Javascript and Python use .
for slot
access, so it ends up being a bit more concise: chip.memory()
instead of
(chip-memory chip)
.
We could use Lisp's with-accessors
to clean up the actual usage a bit:
(defun reset (chip)
(with-accessors ((memory chip-memory)
(registers chip-registers)
(running chip-running)
(program-counter chip-program-counter)
(stack chip-stack))
chip
(fill memory 0)
(fill registers 0)
(setf running t
program-counter #x200
(fill-pointer stack) 0)))
The usage looks much nicer now ((fill memory 0)
is wonderfully readable) but
we haven't actually fixed anything. We've just shifted all the typing a few
lines up. But this is Lisp, we can do better!
Ideally what we'd like is to be able to do say something like (with-chip (chip)
...)
to mean the giant with-accessors
form above:
(defun reset (chip)
(with-chip (chip)
(fill memory 0)
(fill registers 0)
(setf running t
program-counter #x200
(fill-pointer stack) 0)))
This is nice and readable. Some folks will dislike the fact that it introduces new variable bindings that are "hidden" in the macro definition, but I like the concision you get from it, especially for a small project like this.
We could write with-chip
ourselves, if we wanted to. It would look something
like this:
(defmacro with-chip ((chip) &body body)
`(with-accessors ((memory chip-memory)
(registers chip-registers)
; ...
)
,chip
,@body)`
)
It's not hard to write, just tedious. But we're using Lisp, so anything tedious
calls out for abstraction. Another macro from my utility library is
define-with-macro
. This is a macro-defining macro that
we can use to define with-chip
for us:
(define-with-macro chip
running memory stack registers index program-counter flag)
If the hairiness of a macro-defining macro scares you, don't worry about it. We
could have just written it by hand as shown above. I've just needed with-FOO
macros like these often enough that it's been worth it to abstract away the
tedium of writing them.
So one way or another we've got a nice with-chip
macro that will let us access
the fields of our struct with bare names.
Infrastructure
We'll need to define a bit of infrastructure for running things before we jump into implementing the CHIP-8 CPU instructions.
Resetting
Before we can emulate a ROM, we need to read it into our memory array. I've
chosen to do this in the reset
function so we can just (reset my-chip)
to
reload everything at once — it wouldn't make much sense to load a new ROM
without resetting the rest of the emulation state (though the results could
be... "interesting").
reset
looks like this:
(defun reset (chip)
(with-chip (chip)
(fill memory 0)
(fill registers 0)
(replace memory (read-file-into-byte-vector loaded-rom)
:start1 #x200)
(setf running t
program-counter #x200
(fill-pointer stack) 0))
(values))
This is pretty self explanatory, except for the actual ROM-loading bit:
(replace memory (read-file-into-byte-vector loaded-rom)
:start1 #x200)
replace
is just the standard Common Lisp replace
function that copies the
contents of one sequence into another. read-file-into-byte-vector
is from
Alexandria, and will just read the file at loaded-rom
into a byte vector.
Then all we need to do is say that we want to start the copying at index #x200
in the destination, because that's where the CHIP-8 ROM data is supposed start
(there's other internal data (like font sprites) before it, which we'll see
later).
One last thing to note is that reset
just returns (values)
, i.e. it returns
nothing at all. This is something I sometimes do for functions I call at the
REPL for side effects, to avoid returning a meaningless result. I think of it
as an application of The Rule of Silence for Lisp.
Loading ROMs
Now that we've got a reset function, it's trivial to define a function to load a new ROM into our emulator:
(defun load-rom (chip filename)
(setf (chip-loaded-rom chip) filename)
(reset chip))
I didn't bother with the with-chip
macro here because we're only accessing
a single field.
The Main Loop(s)
Now let's wrap things up into a nice interface. We'll define a top-level run
function that will be what we call to fire up the emulator:
(defparameter *c* nil)
(defun run (rom-filename)
(let ((chip (make-chip)))
(setf *c* chip)
(load-rom chip rom-filename)
(run-cpu chip)))
This will get more complicated in the future, but for now it's pretty simple.
Make a new chip
object, load the specified ROM, and start emulating.
I've added a *c*
global variable that's bound to the currently-running
emulator so we can poke at it in NREPL or SLIME as it's running.
Now to write run-cpu
:
(defconstant +cycles-per-second+ 500)
(defconstant +cycles-before-sleep+ 10)
(defun run-cpu (chip)
(iterate
(while (chip-running chip))
(emulate-cycle chip)
(for tick :every-nth +cycles-before-sleep+ :do
(sleep (/ +cycles-before-sleep+ +cycles-per-second+)))))
CHIP-8 was designed to run on much weaker hardware than we have these days. Even though we're emulating it, if we just run as fast as possible we'll be running far too fast to be playable.
+cycles-per-second+
is a constant describing our ideal emulation speed.
Anything from 300 to 800 or so is playable, depending on how fast you want games
to run. We'll just use 500 for now as a happy medium.
It would be wasteful to call (sleep)
after every single cycle, so instead
we'll batch them together: run 10 instructions, sleep for a bit, repeat. This
is the job of the (for ... every-nth N)
iterate driver, which is also in my
utility library. Every 10 iterations through the loop it will sleep for the
appropriate amount of time.
The size of the cycle batches is arbitrary — larger batches will result in fewer
(sleep)
calls, but if you go too large you'll start noticing the emulator
getting "jumpy". 10-cycle batches at 500 cycles per second means that the
emulator will sleep for about 1/50 of a second each time, which isn't too
noticeable.
Now we've got the main loop all set up and just need to emulate each individual cycle.
Individual Cycles
CHIP-8 instructions are each two bytes long, and are stored big-endian in memory. So to emulate a single cycle we:
- Read the two bytes starting at the program counter and concatenate them to get the instruction.
- Advance the program counter (to avoid having to do it inside every single instruction).
- Dispatch to the appropriate instruction code.
We'll define emulate-cycle
and a couple of helper functions for this:
(defun-inline chop (size integer)
(ldb (byte size 0) integer))
(defun-inline cat-bytes (high-order low-order)
(dpb high-order (byte 8 8) low-order))
(defun emulate-cycle (chip)
(with-chip (chip)
(let ((instruction (cat-bytes (aref memory program-counter)
(aref memory (1+ program-counter)))))
(zapf program-counter (chop 12 (+ % 2)))
(dispatch-instruction chip instruction))))
chop
truncates an integer to the given number of bits. cat-bytes
concatenates two bytes. These are pretty simple, but I prefer the more
descriptive names over writing dpb
(deposit byte) and ldb
(load byte)
everywhere.
zapf
is from my utility library. I've written an entire post about
it.
What happens when the program counter is at the final index into memory? None
of the CHIP-8 references I found specify what should happen. We could signal an
error, or just wrap around to 0. I've chosen the latter by chopping (+
program-counter 2)
to 12 bits, but signaling an error would be easy too.
Instruction Dispatch
The CHIP-8's instruction scheme is a bit different than most others that I've seen.
For systems like the Game Boy instructions have a one- or two-byte opcode, with
arguments following. For systems like this you can dispatch on opcode by doing
a giant case
statement:
(case opcode
(#x00 (op-foo ...))
(#x01 (op-bar ...))
; ...
)
But a more efficient way to do it is often to shove all the op-
functions into
an array, and then just use the opcode itself as the index into the array to
find the function:
(funcall (aref opcodes opcode) ...)
But the CHIP-8 differentiates instructions a bit strangely. All instructions
are two bytes long, including their arguments. Instead of just using the
first N bits as the opcode, some instructions use a combination of high- and
low-order bits. For example, the logical AND
, OR
, and XOR
are specified
by starting the instruction with an 8
nibble and ending it with 1
, 2
, or
3
, with the "arguments" being the two nibbles in the middle:
8xy1 - OR Vx, Vy
8xy2 - AND Vx, Vy
8xy3 - XOR Vx, Vy
This makes it rather annoying to use some kind of function table approach.
There are only a few dozen opcodes, so instead of trying to hack something
together we'll just use a big old case
instead:
(defun dispatch-instruction (chip instruction)
(macrolet ((call (name) `(,name chip instruction)))
(ecase (logand #xF000 instruction)
(#x0000 (ecase instruction
(#x00E0 (call op-cls))
(#x00EE (call op-ret))))
(#x1000 (call op-jp-imm))
(#x2000 (call op-call))
(#x3000 (call op-se-reg-imm))
(#x4000 (call op-sne-reg-imm))
(#x5000 (ecase (logand #x000F instruction)
(#x0 (call op-se-reg-reg))))
(#x6000 (call op-ld-reg<imm))
(#x7000 (call op-add-reg<imm))
(#x8000 (ecase (logand #x000F instruction)
(#x0 (call op-ld-reg<reg))
(#x1 (call op-or))
(#x2 (call op-and))
(#x3 (call op-xor))
(#x4 (call op-add-reg<reg))
(#x5 (call op-sub-reg<reg))
(#x6 (call op-shr))
(#x7 (call op-subn-reg<reg))
(#xE (call op-shl))))
(#x9000 (ecase (logand #x000F instruction)
(#x0 (call op-sne-reg-reg))))
(#xA000 (call op-ld-i<imm))
(#xB000 (call op-jp-imm+reg))
(#xC000 (call op-rand))
(#xD000 (call op-draw))
(#xE000 (ecase (logand #x00FF instruction)
(#x9E (call op-skp))
(#xA1 (call op-sknp))))
(#xF000 (ecase (logand #x00FF instruction)
(#x07 (call op-ld-reg<dt))
(#x0A (call op-ld-reg<key))
(#x15 (call op-ld-dt<reg))
(#x18 (call op-ld-st<reg))
(#x1E (call op-add-index<reg))
(#x29 (call op-ld-font<vx))
(#x33 (call op-ld-bcd<vx))
(#x55 (call op-ld-mem<regs))
(#x65 (call op-ld-regs<mem)))))))
The macrolet
at the beginning saves us a bit of typing, but otherwise this is
just a big boring table to figure out which instruction to call.
That's all the boring infrastructure work we need — on to implementing the actual instructions.
Instructions
The CHIP-8 supports thirty-six instructions, all of which we'll need to implement. We'll start with the simpler ones, and we'll be leaving some of the others (the graphics/sound related ones) for later articles.
(In contrast, the Game Boy has roughly five hundred instructions, so now you might see why I thought CHIP-8 would be simpler!)
Random Numbers
We'll start with the RND
instruction, which generates random numbers. This
may seem like an odd place to start, but I want to describe some extra syntactic
sugar and RND
is a nice standalone instruction to use as an example.
Cowgod's reference describes RND
like so:
Cxkk - RND Vx, byte
Set Vx = random byte AND kk.
The interpreter generates a random number from 0 to 255,
which is then ANDed with the value kk. The results are
stored in Vx.
A first stab at the implementation in Lisp might look like this:
(defun op-rand (chip instruction)
(let ((reg (logand #x0F00 instruction))
(mask (logand #x00FF instruction)))
(with-chip (chip)
(setf (aref registers reg)
(logand (random 256) mask)))))
This would work, but it's an awful lot of code for something that just takes a sentence or two to describe. We're going to be defining a lot of instructions, so let's take a few minutes and make our lives easier.
Define-Instruction
We'll create a define-instruction
macro that will abstract away some of the
boring bits. The goal is to end up with instruction definitions that read as
close to the documentation as possible.
We'll start by removing the need to use the with-chip
macro. Every
instruction needs to deal with the chip
struct, so let's not repeat ourselves
thirty-six times. Every instruction will also take chip
and instruction
arguments, so we can remove those too:
(defmacro define-instruction (name &body body)
`(defun ,name (chip instruction)
(declare (ignorable instruction))
(with-chip (chip)
,@body)
nil))
(define-instruction op-rand
(let ((reg (logand #x0F00 instruction))
(mask (logand #x00FF instruction)))
(setf (aref registers reg)
(logand (random 256) mask))))
That's a little better. We're going to be calling these instructions a lot, so it wouldn't hurt to add a type declaration for each function. This is easy because they all take and return the same types:
(defmacro define-instruction (name &body body)
`(progn
(declaim (ftype (function (chip int16) null) ,name)) ; NEW
(defun ,name (chip instruction)
(declare (ignorable instruction))
(with-chip (chip)
,@body)
nil)))
Now all of our instruction functions will have type hints, and SBCL can check
them (if safety
is high) or use them to generate faster code (if speed
is
high).
We'll be accessing the values of registers quite often ((aref registers ...)
)
so let's add a little macrolet
to make that read nicer:
(defmacro define-instruction (name &body body)
`(progn
(declaim (ftype (function (chip int16) null) ,name))
(defun ,name (chip instruction)
(declare (ignorable instruction))
(with-chip (chip)
(macrolet ((register (index) ; NEW
`(aref registers ,index))) ; NEW
,@body))
nil)))
(define-instruction op-rand
(let ((reg (logand #x0F00 instruction))
(mask (logand #x00FF instruction)))
(setf (register reg) ; NEW
(logand (random 256) mask))))
Now instead of (aref registers ...)
we can just say (register ...)
. Cool.
The one thing that still bothers me is having to manually pull the instruction arguments out of the instruction with bitmasking. It would be much nicer if we could just declare what the arguments look like and have the computer generate the appropriate masking code to deal with them. The goal is to be able to write something like this:
(define-instruction op-rand (_ r (mask 2)) ;; RND
(setf (register r)
(logand (random 256) mask)))
The (_ r (mask 2))
is the "argument list" of the instruction. It says:
- A single-nibble value which we don't care about:
_
- A single-nibble value should be bound to
r
:r
- A two-nibble value should be bound to
mask
:(mask 2)
Compare the Lisp code and Cowgod's documentation:
Cxkk - RND Vx, byte
Set Vx = random byte AND kk.
(define-instruction op-rand (_ r (mask 2)) ;; RND
(setf (register r)
(logand (random 256) mask)))
Cowgod uses V
as a shorthand for register
, and calls the argument x
instead of r
, but otherwise I think this is pretty damn close to the
documentation for actual running code.
So let's implement the argument-parsing macro. Feel free to skip this if you're not really comfortable with macros. But it's a good demonstration of how you can use them to get a nice clean language for a simple project.
First we'll wrap the body
in a let
:
; NEW
(defmacro define-instruction (name argument-list &body body)
`(progn
(declaim (ftype (function (chip int16) null) ,name))
(defun ,name (chip instruction)
(declare (ignorable instruction))
(with-chip (chip)
(macrolet ((register (index)
`(aref registers ,index)))
(let ,(parse-instruction-argument-bindings argument-list) ; NEW
,@body))
nil))))
We'll use a helper function to parse the argument list into a list of let
bindings, instead of trying to cram it all into this one macro. It might look
scary, but it's not actually too bad:
(defun parse-instruction-argument-bindings (argument-list)
(flet ((normalize-arg (arg)
(destructuring-bind (symbol &optional (nibbles 1))
(ensure-list arg)
(list symbol nibbles))))
(iterate
(for (symbol nibbles) :in (mapcar #'normalize-arg argument-list))
(for position :first 3 :then (- position nibbles))
(when (not (eql symbol '_))
(collect `(,symbol (ldb (byte ,(* nibbles 4)
,(* position 4))
instruction)))))))
normalize-arg
takes each argument from the list and turns it into (symbol
nibbles)
, so we don't have to write the lengths for single-nibble arguments
like this: ((_ 1) (r 1) (mask 2))
.
Then we loop through each (symbol nibbles)
pair in the argument list. We
start at position 3
in the byte (the highest-order nibble) because we want to
write the argument list from left to right, high-order to low-order, like the
documentation.
For each pair, if the symbol is anything other than _
we collect a let
binding with the appropriate ldb
call for it. So for our example of (_
r (mask 2))
we end up with two let
bindings:
(r (ldb (byte 4 8) instruction))
(mask (ldb (byte 8 0) instruction))
Notice that we only bother parsing out the bindings we need from the
instruction. For instructions that don't need any arguments we'll end up with
an empty (let () ...)
which the compiler will optimize away.
This might seem a bit hairy. The nice thing about using a helper function like this is that we can call it on its own with a variety of argument lists to see what gets generated:
(map nil #'print
(parse-instruction-argument-bindings '(_ r (mask 2))))
(R (LDB (BYTE 4 8) INSTRUCTION))
(MASK (LDB (BYTE 8 0) INSTRUCTION))
(map nil #'print
(parse-instruction-argument-bindings '(_ x y _)))
(X (LDB (BYTE 4 8) INSTRUCTION))
(Y (LDB (BYTE 4 4) INSTRUCTION))
(map nil #'print
(parse-instruction-argument-bindings '(_ (foo 3))))
(FOO (LDB (BYTE 12 0) INSTRUCTION))
And that wraps up define-instruction
. So instead of writing RND
as
a vanilla function:
(defun op-rand (chip instruction)
(let ((reg (logand #x0F00 instruction))
(mask (logand #x00FF instruction)))
(with-chip (chip)
(setf (aref registers reg)
(logand (random 256) mask)))))
We can do something much cleaner:
(define-instruction op-rand (_ r (mask 2)) ;; RND
(setf (register r)
(logand (random 256) mask)))
This is going to pay off nicely as we implement the other thirty-five instructions.
Jumps and Calls
The CHIP-8 has two instructions for jumping directly to an address. The first
jumps to an immediate, literal address. The second adds an immediate value to
whatever is in register #x0
and jumps to the result:
(define-instruction op-jp-imm (_ (target 3)) ;; JP addr
(setf program-counter target))
(define-instruction op-jp-imm+reg (_ (target 3)) ;; JP V0 + addr
(setf program-counter (chop 12 (+ target (register 0)))))
Our define-instruction
macro is already paying off. The immediate values of
these arguments are three nibbles each, but our macro doesn't care.
There are also two instructions for implementing traditional function calls and returns using the CHIP-8's stack:
(define-instruction op-call (_ (target 3)) ;; CALL addr
(vector-push program-counter stack)
(setf program-counter target))
(define-instruction op-ret () ;; RET
(setf program-counter (vector-pop stack)))
We just use the built in vector-push
and vector-pop
functions to manage the
stack's fill pointer trivially.
Note that we're NOT using vector-push-extend
because the CHIP-8 stack is
defined by the spec to only hold at most sixteen addresses — any ROM that tries
to push more is broken.
Binary-Coded Decimal
Let's take another self-contained instruction next: BCD
, which stands for
binary-coded decimal (some references call this instruction LD B, Vx
but I think it's different enough to deserve its own name).
The problem this instruction is designed to solve goes something like this. Let's say you've got a game where the player has a score that changes over time. You can store the player's score in a register or memory somewhere, and add/subtract to/from it with the normal arithmetic opcodes no problem.
But now you want to display the score to the player, and you probably want to
do that in base 10 because humans are generally bad at reading binary or hex.
So how do you take a byte like #xA5
and determine the hundreds, tens, and ones
sprites to draw on the screen to read "165 points"?
This is what the BCD
instruction does. It takes a single argument (the
register) and stores the hundreds, tens, and ones digits of that register's
value into three separate bytes of memory, starting at wherever the index
register is currently pointing.
Register V3 Index Register ┌───┐ ┌───┐ │135│ │...│ └───┘ └─┬─┘ ┌─────────────┘ │ ▼ ┌───┬───┬───┬───┬───┐ │ │ │ │ │ │ Memory └───┴───┴───┴───┴───┘ Run BCD V3 instruction ================= Register V3 Index Register ┌───┐ ┌───┐ │135│ │...│ └───┘ └─┬─┘ ┌─────────────┘ │ ▼ ┌───┬───┬───┬───┬───┐ │ 1 │ 3 │ 5 │ │ │ Memory └───┴───┴───┴───┴───┘
We'll split this into two parts to make it easier to read. First we'll make
a digit
function to retrieve a digit of a number:
(defun-inline digit (position integer &optional (base 10))
(-<> integer
(floor <> (expt base position))
(mod <> base)))
We could have hard-coded the base 10, but part of the Common Lisp tradition is making flexible functions that can be reused in the future when it's not much harder to do so.
Let's make sure it works on its own:
(digit 0 135)
5
(digit 1 135)
3
(digit 2 135)
1
(digit 0 #xD6 16)
6
(digit 1 #xD6 16)
13 ; 13 in base 10 == D in base 16
And then we can define the actual operation:
(define-instruction op-ld-bcd<vx (_ r _ _) ;; LD B, Vx
(let ((number (register r)))
(setf (aref memory (+ index 0)) (digit 2 number)
(aref memory (+ index 1)) (digit 1 number)
(aref memory (+ index 2)) (digit 0 number))))
Arithmetic
Next up are the arithmetic instructions. The CHIP-8 only supports addition and subtraction — there are no multiplication or division instructions.
The first two instructions are ADD
and SUB
:
ADD Vx, Vy → Vx = Vx + Vy
SUB Vx, Vy → Vx = Vx - Vy
Once again we'll start by creating two helper functions to perform the 8-bit addition/subtraction with overflow/underflow. These functions will return a second value that represents the carry/borrow bit.
(defun-inline +_8 (x y)
(let ((result (+ x y)))
(values (chop 8 result)
(if (> result 255) 1 0)))) ; carry
(defun-inline -_8 (x y)
(let ((result (- x y)))
(values (chop 8 result)
(if (> x y) 1 0)))) ; borrow
And now the instructions themselves are trivial:
(define-instruction op-add-reg<reg (_ rx ry) ;; ADD Vx, Vy (8-bit)
(setf (values (register rx) flag)
(+_8 (register rx) (register ry))))
(define-instruction op-sub-reg<reg (_ rx ry) ;; SUB Vx, Vy (8-bit)
(setf (values (register rx) flag)
(-_8 (register rx) (register ry))))
This takes advantage of the fact that you can use (setf (values ...) ...)
to
assign the multiple values returned by a function, without binding them to local
variables.
Notice how we just assign to flag
. Under the hood that flag
has been bound
with our with-chip
macro to mean (chip-flag chip)
, which we defined way back
in the beginning to mean (aref (chip-registers chip) #xF)
. But isn't it much
nicer to just say (setf flag ...)
?
There's also a SUBN
instruction for subtracting the operands in reverse order
(but still storing the result in the first):
(define-instruction op-subn-reg<reg (_ rx ry) ;; SUBN Vx, Vy (8-bit)
(setf (values (register rx) flag)
;; subtraction order is swapped for SUBN
(-_8 (register ry) (register rx))))
Next is an ADD
instruction that takes an immediate value. Unlike the other
instructions this one does not set the flag for some reason (that was a fun
bug to track down):
(define-instruction op-add-reg<imm (_ r (immediate 2)) ;; ADD Vx, Imm
;; For some weird reason the ADD immediate op doesn't set the flag
(zapf (register r) (+_8 % immediate)))
Because Common Lisp will just ignore extra return values if we don't use them,
we can just use our +_8
helper function here too and ignore the carry result.
There's also a single 16-bit ADD
instruction that adds the value in
a particular register to the index register. It too ignores the flag
:
(define-instruction op-add-index<reg (_ r) ;; ADD I, Vx (16-bit)
(zapf index (chop 16 (+ % (register r)))))
Shifting
The CHIP-8 has two bit-shifting instructions: SHR
and SHL
which shift
a register's contents right or left by a single bit. Both of these instructions
also set the flag to the bit that got shifted "off the end" of the register.
We'll define two more helpers to do the actual 8-bit shifting and keep track of the bit that falls off the end:
(defun-inline get-bit (position integer)
(ldb (byte 1 position) integer))
(defun-inline >>_8 (v)
(values (ash v -1)
(get-bit 0 v)))
(defun-inline <<_8 (v)
(values (chop 8 (ash v 1))
(get-bit 7 v)))
The instructions themselves are trivial:
(define-instruction op-shr (_ r) ;; SHR
(setf (values (register r) flag)
(>>_8 (register r))))
(define-instruction op-shl (_ r) ;; SHL
(setf (values (register r) flag)
(<<_8 (register r))))
Update: as fernly pointed out, this may not have been the original intended behavior. If we want to make our emulator match the original intent, it's pretty simple:
(define-instruction op-shr (_ rx ry) ;; SHR
(setf (values (register ry) flag)
(>>_8 (register rx))))
(define-instruction op-shl (_ rx ry) ;; SHL
(setf (values (register ry) flag)
(<<_8 (register rx))))
But I suspect that this will probably break some ROMs that rely on the incorrectly-documented behavior.
Logical Operations
Next up are the logical AND
/OR
/XOR
instructions. We could define these
like so:
(define-instruction op-and (_ destination source _)
(zapf (register destination) (logand % (register source))))
(define-instruction op-or (_ destination source _)
(zapf (register destination) (logior % (register source))))
(define-instruction op-xor (_ destination source _)
(zapf (register destination) (logxor % (register source))))
This works, but aside from the name and the operation they're all identical. The next few groups of instructions are also going to be similar, so let's step back for a moment and see if we can abstract away the tedium.
Macro-Map
Instead of typing the same thing over and over, we'd like to just say what we
want once. The traditional way to do this is with macrolet
:
(macrolet
((define-logical-instruction (name function)
`(define-instruction ,name (_ destination source _)
(zapf (register destination)
(,function % (register source))))))
(define-logical-instruction op-and logand)
(define-logical-instruction op-or logior)
(define-logical-instruction op-xor logxor))
Now we're only writing out the actual functionality once instead of three times.
That's better, but I'm still not satisfied. Using macrolet
means I need to
think of a name for the macro that I'm just going to use within this block and
throw away, and naming things is hard.
What I really want to do here is just "map" a macro over a bunch of arguments
and be done with it. I've played around a bit and ended up with a macro called
macro-map
that does just that:
(defmacro macro-map (lambda-list items &rest body)
(with-gensyms (macro)
`(macrolet ((,macro ,(ensure-list lambda-list) ,@body))
,@(iterate (for item :in items)
(collect `(,macro ,@(ensure-list item)))))))
macro-map
takes a lambda list, list of arguments, and a body for the macro and
builds the macrolet
we wrote by hand earlier using a gensym
for the name so
we don't have to waste brain cells thinking of one. Now we can define our
logical operations all at once:
(macro-map ;; AND/OR/XOR
(NAME OP)
((op-and logand)
(op-or logior)
(op-xor logxor))
`(define-instruction ,name (_ destination source _)
(zapf (register destination) (,op % (register source)))))
This actually ends up being one line of code longer than the copy/pasted version because I like to linebreak liberally, but the important thing is that we only say each thing we need to say once.
Notice how this almost starts to look like a table of data rather than code. I've taken advantage of the fact that the Lisp reader reads everything as uppercase by default and uppercased the "header" row (the lambda list) to make it stand out a bit more.
If you squint a little bit you might imagine how defining a set of related instructions could almost look like a page from a CPU manual.
Branching
Now that we've got a way to define batches of similar instructions without going crazy we can tackle the branching instructions.
Most CPUs have some form of "jump to some location if zero, otherwise continue" instruction to implement branching. The CHIP-8 needs to fit every instruction into two bytes, so it does something a bit simpler. Instead of an arbitrary jump on a condition, it has a series of skip instructions.
For example, SE Vx, 0
means "skip the next instruction if register X is zero".
Skipping the next instruction is a simple as incrementing the program counter by
an extra two bytes.
There are four variants of this instruction:
SE Vx, Immediate → Skip when Vx equals Immediate
SNE Vx, Immediate → Skip when Vx does not equal Immediate
SE Vx, Vy → Skip when Vx equals Vy
SNE Vx, Vy → Skip when Vx does not equal Vy
Let's use macro-map
to write this out as a table of code:
(macro-map ;; SE/SNE
((NAME TEST X-ARG X-FORM Y-ARG Y-FORM)
((op-se-reg-imm = (r 1) (register r) (immediate 2) immediate)
(op-sne-reg-imm not= (r 1) (register r) (immediate 2) immediate)
(op-se-reg-reg = (rx 1) (register rx) (ry 1) (register ry))
(op-sne-reg-reg not= (rx 1) (register rx) (ry 1) (register ry))))
`(define-instruction ,name (_ ,x-arg ,y-arg)
(when (,test ,x-form ,y-form)
(incf program-counter 2))))
After the macroexpansion we end up with things like:
(define-instruction op-sne-reg-imm (_ (r 1) (immediate 2))
(when (not= (register r) immediate)
(incf program-counter 2)))
We'll also need not=
itself to avoid having to do messy things inside the
macro body:
(defun-inline not= (x y)
(not (= x y)))
Just one more group of instructions left (for this post).
Loads
The final group of instructions we'll look at for now is the LD
family of
loads. Normally we'd implement these first, but I wanted to introduce
macro-map
as gently as possible.
Most of the LD
instructions simply take a value from a source and stick it
into a destination, and we can implement them as a single setf
form:
(macro-map ;; LD
(NAME ARGLIST DESTINATION SOURCE)
((op-ld-i<imm (_ (value 3)) index value)
(op-ld-reg<imm (_ r (value 2)) (register r) value)
(op-ld-reg<reg (_ rx ry _) (register rx) (register ry))
(op-ld-reg<dt (_ r _ _) (register r) delay-timer)
(op-ld-dt<reg (_ r _ _) delay-timer (register r))
(op-ld-st<reg (_ r _ _) sound-timer (register r)))
`(define-instruction ,name ,arglist
(setf ,destination ,source)))
We haven't talked about the timers yet, so don't worry about them. I'm leaving
them in so you can see how nice the macro-map
is when you need to define lots
of similar operations at once.
There are two other more interesting LD
instructions which move data between
multiple registers and memory.
LD [I], n
loads consecutive bytes of memory with the contents of registers
V0
through Vn
, starting at wherever the index register is pointing. For
example, LD [I], 2
would be:
V0 V1 V2 V3 V4 ... ┌──┐ ┌──┐ ┌──┐ ┌──┐ ┌──┐ │ │ │ │ │ │ │ │ │ │ ... └─┬┘ └─┬┘ └─┬┘ └──┘ └──┘ └───┐└──┐ └─┐ │ │ │ ▼ ▼ ▼ ┌───┬───┬───┬───┬───┐ │ │ │ │ │ │ Memory └───┴───┴───┴───┴───┘ ▲ ┌───┐ └──│...│ Index Register └───┘
LD n, [I]
does the opposite: it loads the contents of memory into the
registers V0
through Vn
.
Because we've used Lisp arrays for both the registers and memory, these
instructions are really lovely to implement — they end up being just a single
call to replace
:
(define-instruction op-ld-mem<regs (_ n _ _) ;; LD [I] < Vn
(replace memory registers :start1 index :end2 (1+ n)))
(define-instruction op-ld-regs<mem (_ n _ _) ;; LD Vn < [I]
(replace registers memory :end1 (1+ n) :start2 index))
That's it for the LD
instructions (for now) and for the instructions in
general!
Future
With these instructions implemented (and with some stubs for the rest) we can load and run a ROM and it will push bytes around in memory and mostly work. In the next few posts we'll look at the next steps to getting a fully-functional emulator up and running, including:
- Graphics and input
- Sound
- Debugging
Thanks to James Cash for reading a draft of this post.