CHIP-8 in Common Lisp: The CPU

Posted on December 19th, 2016.

A while back I decided to try to write a Game Boy emulator in Common Lisp based on this series of articles. I made some good progress but eventually got bogged down because I was trying to learn a bunch of complex new things at once:

How to write an emulator
How to use Qt with Common Lisp
How the Game Boy works internally

Instead of dragging on, I decided to take a break and try something simpler: a CHIP-8 emulator/interpreter. The CHIP-8 is much simpler than the Game Boy, which made it easier to experiment with the rest of the infrastructure.

In this post and a couple of future ones I'll walk through all of my CHIP-8 emulator implementation. To give you a rough idea of the size of the project, cloc reports:

Basic emulator: 415 lines
Debugging/disassembling infrastructure: 142 lines
Screen GUI: 153 lines
Graphical debugger: 295 lines

This first post will deal with emulating the CHIP-8's CPU.

The full series of posts so far:

The full emulator source is on BitBucket and GitHub.

Libraries
CHIP-8 References
The Main Data Structure
Removing Tedium
Infrastructure
Instructions
Future

Libraries

The emulator uses a few Common Lisp libraries to make things easier:

bordeaux-threads to handle threading.
cl-arrows for an implementation of Clojure's -<> threading macro.
cl-losh is my own personal utility library.
cl-portaudio for audio.
iterate for a much nicer version of loop called iterate. My utility library contains several iterate drivers (some of which I've written about before).
qtools to handle creating a GUI with Qt.
quickutil for some miscellaneous utility functions from Alexandria and elsewhere.

I'll try to remember to mention whenever I use a function that's not built-in to Common Lisp, but I might forget, in which case it's probably in one of these.

CHIP-8 References

There's a good amount of information available about the CHIP-8 online. The references I used most often were:

The Main Data Structure

Let's dive into the code. We'll start with the main data structure that will hold an instance of a CHIP-8 for emulation:

(defstruct chip
  (running t :type boolean)
  ; ...more to come later
  )

We're using a Lisp struct instead of a CLOS class because we're going to be accessing the fields of this thing a lot and CLOS accessors can be comparatively slow. This is one of the very few concessions we'll make to performance.

The first field is running, which is just a boolean that represents whether the emulator is currently running.

We'll eventually have multiple threads poking at this struct, and they'll generally be doing something like (loop :while (chip-running chip) :do ...). That way when we want to quit the emulator we can just set running to nil and everything will (eventually) stop.

Registers

The CHIP-8 has sixteen main registers, as well as a few other special ones:

(defstruct chip
  ; ...
  (registers (make-array 16 :element-type 'int8)
             :type (simple-array int8 (16))
             :read-only t)
  (index 0 :type int16)
  (program-counter #x200 :type int12)
  ; ...
  )

We'll keep the main registers in an array, and the others will just be separate slots. I've added type declarations in the struct for two reasons:

With a high safety declaration many implementations (including SBCL, the one I'm using) will type check at runtime to make sure we're setting things appropriately.
With low safety and high speed declarations, SBCL can generate much faster code if it knows the types of the struct slots.

To make the integer types a bit less wordy I've defined some simple synonyms:

(deftype int4 () '(unsigned-byte 4))
(deftype int8 () '(unsigned-byte 8))
(deftype int12 () '(unsigned-byte 12))
(deftype int16 () '(unsigned-byte 16))

So registers is a 16-element simple array of (unsigned-byte 8)s. It's read-only because we'll be changing the elements of the array, but we should never be swapping out the entire array itself.

The index register is 16 bits, and the program counter can store up to 12 bits (we'll never need more than that because of the size of the CHIP-8's memory).

Also note that the program counter starts at address #x200, because that's where the ROM data eventually gets loaded into the CHIP-8 memory.

The Stack

The CHIP-8 also has an internal stack to store return addresses when calling procedures. We'll just use a Lisp vector with a fill pointer for this:

(defstruct chip
  ; ...
  (stack (make-array 16 :element-type 'int12 :fill-pointer 0)
         :type (vector int12 16)
         :read-only t)
  ; ...
  )

Memory

The CHIP-8 has 4 kilobytes of main memory:

(defconstant +memory-size+ (* 1024 4))

(defstruct chip
  ; ...
  (memory (make-array +memory-size+ :element-type 'int8)
          :type (simple-array int8 (#.+memory-size+))
          :read-only t)
  ; ...
  )

Pretty simple. Note the #. reader macro trick/hack to be able to use a variable where we normally need a raw type specifier.

Currently-Loaded ROM

Finally we'll add a slot for keeping track of the path to the currently-loaded ROM, for easy resetting later:

(defstruct chip
  ; ...
  (loaded-rom nil :type (or null string))
  ; ...
  )

That's it for now. We'll need a few more slots once we get to things like graphics and sound, but I'll introduce them when we need them.

The Flag Register

Register number 15, or #xF in hex, is special. It's nicknamed the "flag" register and gets set specially by certain instructions. We could just access it like the rest of the registers in our code:

(setf (aref (chip-registers chip) 15) 1) ; set flag to 1
(print (aref (chip-registers chip) #xF)) ; print the flag

But even with the #xF hex index to get that mnemonic "F" this is a bit too hard to read. Let's define some extra reader and writer functions to clean things up:

(defun-inline chip-flag (chip)
  (aref (chip-registers chip) #xF))

(defun-inline (setf chip-flag) (new-value chip)
  (setf (aref (chip-registers chip) #xF) new-value))

defun-inline is from my utility library — it just defuns the function and declaims it inline all in one step. Now things are much nicer:

(setf (chip-flag chip) 1) ; set flag to 1
(print (chip-flag chip)) ; print the flag

Removing Tedium

What we've got so far works, but I want to add one more piece of syntactic sugar before moving on.

We're going to be accessing the slots of the chip struct a lot, and it's going to get tedious to write (chip-SLOT chip) over and over again. For example, when we're resetting the emulator we'll have to do something like this:

(defun reset (chip)
  (fill (chip-memory chip) 0)
  (fill (chip-registers chip) 0)
  (setf (chip-running chip) t
        (chip-program-counter chip) #x200
        (fill-pointer (chip-stack chip)) 0))

This is annoying. Languages like Javascript and Python use . for slot access, so it ends up being a bit more concise: chip.memory() instead of (chip-memory chip).

We could use Lisp's with-accessors to clean up the actual usage a bit:

(defun reset (chip)
  (with-accessors ((memory chip-memory)
                   (registers chip-registers)
                   (running chip-running)
                   (program-counter chip-program-counter)
                   (stack chip-stack))
      chip
    (fill memory 0)
    (fill registers 0)
    (setf running t
          program-counter #x200
          (fill-pointer stack) 0)))

The usage looks much nicer now ((fill memory 0) is wonderfully readable) but we haven't actually fixed anything. We've just shifted all the typing a few lines up. But this is Lisp, we can do better!

Ideally what we'd like is to be able to do say something like (with-chip (chip) ...) to mean the giant with-accessors form above:

(defun reset (chip)
  (with-chip (chip)
    (fill memory 0)
    (fill registers 0)
    (setf running t
          program-counter #x200
          (fill-pointer stack) 0)))

This is nice and readable. Some folks will dislike the fact that it introduces new variable bindings that are "hidden" in the macro definition, but I like the concision you get from it, especially for a small project like this.

We could write with-chip ourselves, if we wanted to. It would look something like this:

(defmacro with-chip ((chip) &body body)
  `(with-accessors ((memory chip-memory)
                    (registers chip-registers)
                    ; ...
                    )
       ,chip
     ,@body)`
)

It's not hard to write, just tedious. But we're using Lisp, so anything tedious calls out for abstraction. Another macro from my utility library is define-with-macro. This is a macro-defining macro that we can use to define with-chip for us:

(define-with-macro chip
  running memory stack registers index program-counter flag)

If the hairiness of a macro-defining macro scares you, don't worry about it. We could have just written it by hand as shown above. I've just needed with-FOO macros like these often enough that it's been worth it to abstract away the tedium of writing them.

So one way or another we've got a nice with-chip macro that will let us access the fields of our struct with bare names.

Infrastructure

We'll need to define a bit of infrastructure for running things before we jump into implementing the CHIP-8 CPU instructions.

Resetting

Before we can emulate a ROM, we need to read it into our memory array. I've chosen to do this in the reset function so we can just (reset my-chip) to reload everything at once — it wouldn't make much sense to load a new ROM without resetting the rest of the emulation state (though the results could be... "interesting").

reset looks like this:

(defun reset (chip)
  (with-chip (chip)
    (fill memory 0)
    (fill registers 0)
    (replace memory (read-file-into-byte-vector loaded-rom)
             :start1 #x200)
    (setf running t
          program-counter #x200
          (fill-pointer stack) 0))
  (values))

This is pretty self explanatory, except for the actual ROM-loading bit:

(replace memory (read-file-into-byte-vector loaded-rom)
         :start1 #x200)

replace is just the standard Common Lisp replace function that copies the contents of one sequence into another. read-file-into-byte-vector is from Alexandria, and will just read the file at loaded-rom into a byte vector. Then all we need to do is say that we want to start the copying at index #x200 in the destination, because that's where the CHIP-8 ROM data is supposed start (there's other internal data (like font sprites) before it, which we'll see later).

One last thing to note is that reset just returns (values), i.e. it returns nothing at all. This is something I sometimes do for functions I call at the REPL for side effects, to avoid returning a meaningless result. I think of it as an application of The Rule of Silence for Lisp.

Loading ROMs

Now that we've got a reset function, it's trivial to define a function to load a new ROM into our emulator:

(defun load-rom (chip filename)
  (setf (chip-loaded-rom chip) filename)
  (reset chip))

I didn't bother with the with-chip macro here because we're only accessing a single field.

The Main Loop(s)

Now let's wrap things up into a nice interface. We'll define a top-level run function that will be what we call to fire up the emulator:

(defparameter *c* nil)

(defun run (rom-filename)
  (let ((chip (make-chip)))
    (setf *c* chip)
    (load-rom chip rom-filename)
    (run-cpu chip)))

This will get more complicated in the future, but for now it's pretty simple. Make a new chip object, load the specified ROM, and start emulating.

I've added a *c* global variable that's bound to the currently-running emulator so we can poke at it in NREPL or SLIME as it's running.

Now to write run-cpu:

(defconstant +cycles-per-second+ 500)
(defconstant +cycles-before-sleep+ 10)

(defun run-cpu (chip)
  (iterate
    (while (chip-running chip))
    (emulate-cycle chip)
    (for tick :every-nth +cycles-before-sleep+ :do
         (sleep (/ +cycles-before-sleep+ +cycles-per-second+)))))

CHIP-8 was designed to run on much weaker hardware than we have these days. Even though we're emulating it, if we just run as fast as possible we'll be running far too fast to be playable.

+cycles-per-second+ is a constant describing our ideal emulation speed. Anything from 300 to 800 or so is playable, depending on how fast you want games to run. We'll just use 500 for now as a happy medium.

It would be wasteful to call (sleep) after every single cycle, so instead we'll batch them together: run 10 instructions, sleep for a bit, repeat. This is the job of the (for ... every-nth N) iterate driver, which is also in my utility library. Every 10 iterations through the loop it will sleep for the appropriate amount of time.

The size of the cycle batches is arbitrary — larger batches will result in fewer (sleep) calls, but if you go too large you'll start noticing the emulator getting "jumpy". 10-cycle batches at 500 cycles per second means that the emulator will sleep for about 1/50 of a second each time, which isn't too noticeable.

Now we've got the main loop all set up and just need to emulate each individual cycle.

Individual Cycles

CHIP-8 instructions are each two bytes long, and are stored big-endian in memory. So to emulate a single cycle we:

Read the two bytes starting at the program counter and concatenate them to get the instruction.
Advance the program counter (to avoid having to do it inside every single instruction).
Dispatch to the appropriate instruction code.

We'll define emulate-cycle and a couple of helper functions for this:

(defun-inline chop (size integer)
  (ldb (byte size 0) integer))

(defun-inline cat-bytes (high-order low-order)
  (dpb high-order (byte 8 8) low-order))

(defun emulate-cycle (chip)
  (with-chip (chip)
   (let ((instruction (cat-bytes (aref memory program-counter)
                                 (aref memory (1+ program-counter)))))
     (zapf program-counter (chop 12 (+ % 2)))
     (dispatch-instruction chip instruction))))

chop truncates an integer to the given number of bits. cat-bytes concatenates two bytes. These are pretty simple, but I prefer the more descriptive names over writing dpb (deposit byte) and ldb (load byte) everywhere.

zapf is from my utility library. I've written an entire post about it.

What happens when the program counter is at the final index into memory? None of the CHIP-8 references I found specify what should happen. We could signal an error, or just wrap around to 0. I've chosen the latter by chopping (+ program-counter 2) to 12 bits, but signaling an error would be easy too.

Instruction Dispatch

The CHIP-8's instruction scheme is a bit different than most others that I've seen.

For systems like the Game Boy instructions have a one- or two-byte opcode, with arguments following. For systems like this you can dispatch on opcode by doing a giant case statement:

(case opcode
  (#x00 (op-foo ...))
  (#x01 (op-bar ...))
  ; ...
  )

But a more efficient way to do it is often to shove all the op- functions into an array, and then just use the opcode itself as the index into the array to find the function:

(funcall (aref opcodes opcode) ...)

But the CHIP-8 differentiates instructions a bit strangely. All instructions are two bytes long, including their arguments. Instead of just using the first N bits as the opcode, some instructions use a combination of high- and low-order bits. For example, the logical AND, OR, and XOR are specified by starting the instruction with an 8 nibble and ending it with 1, 2, or 3, with the "arguments" being the two nibbles in the middle:

8xy1 - OR Vx, Vy
8xy2 - AND Vx, Vy
8xy3 - XOR Vx, Vy

This makes it rather annoying to use some kind of function table approach. There are only a few dozen opcodes, so instead of trying to hack something together we'll just use a big old case instead:

(defun dispatch-instruction (chip instruction)
  (macrolet ((call (name) `(,name chip instruction)))
    (ecase (logand #xF000 instruction)
      (#x0000 (ecase instruction
                (#x00E0 (call op-cls))
                (#x00EE (call op-ret))))
      (#x1000 (call op-jp-imm))
      (#x2000 (call op-call))
      (#x3000 (call op-se-reg-imm))
      (#x4000 (call op-sne-reg-imm))
      (#x5000 (ecase (logand #x000F instruction)
                (#x0 (call op-se-reg-reg))))
      (#x6000 (call op-ld-reg<imm))
      (#x7000 (call op-add-reg<imm))
      (#x8000 (ecase (logand #x000F instruction)
                (#x0 (call op-ld-reg<reg))
                (#x1 (call op-or))
                (#x2 (call op-and))
                (#x3 (call op-xor))
                (#x4 (call op-add-reg<reg))
                (#x5 (call op-sub-reg<reg))
                (#x6 (call op-shr))
                (#x7 (call op-subn-reg<reg))
                (#xE (call op-shl))))
      (#x9000 (ecase (logand #x000F instruction)
                (#x0 (call op-sne-reg-reg))))
      (#xA000 (call op-ld-i<imm))
      (#xB000 (call op-jp-imm+reg))
      (#xC000 (call op-rand))
      (#xD000 (call op-draw))
      (#xE000 (ecase (logand #x00FF instruction)
                (#x9E (call op-skp))
                (#xA1 (call op-sknp))))
      (#xF000 (ecase (logand #x00FF instruction)
                (#x07 (call op-ld-reg<dt))
                (#x0A (call op-ld-reg<key))
                (#x15 (call op-ld-dt<reg))
                (#x18 (call op-ld-st<reg))
                (#x1E (call op-add-index<reg))
                (#x29 (call op-ld-font<vx))
                (#x33 (call op-ld-bcd<vx))
                (#x55 (call op-ld-mem<regs))
                (#x65 (call op-ld-regs<mem)))))))

The macrolet at the beginning saves us a bit of typing, but otherwise this is just a big boring table to figure out which instruction to call.

That's all the boring infrastructure work we need — on to implementing the actual instructions.

Instructions

The CHIP-8 supports thirty-six instructions, all of which we'll need to implement. We'll start with the simpler ones, and we'll be leaving some of the others (the graphics/sound related ones) for later articles.

(In contrast, the Game Boy has roughly five hundred instructions, so now you might see why I thought CHIP-8 would be simpler!)

Random Numbers

We'll start with the RND instruction, which generates random numbers. This may seem like an odd place to start, but I want to describe some extra syntactic sugar and RND is a nice standalone instruction to use as an example.

Cowgod's reference describes RND like so:

Cxkk - RND Vx, byte
Set Vx = random byte AND kk.

The interpreter generates a random number from 0 to 255,
which is then ANDed with the value kk. The results are
stored in Vx.

A first stab at the implementation in Lisp might look like this:

(defun op-rand (chip instruction)
  (let ((reg  (logand #x0F00 instruction))
        (mask (logand #x00FF instruction)))
    (with-chip (chip)
      (setf (aref registers reg)
            (logand (random 256) mask)))))

This would work, but it's an awful lot of code for something that just takes a sentence or two to describe. We're going to be defining a lot of instructions, so let's take a few minutes and make our lives easier.

Define-Instruction

We'll create a define-instruction macro that will abstract away some of the boring bits. The goal is to end up with instruction definitions that read as close to the documentation as possible.

We'll start by removing the need to use the with-chip macro. Every instruction needs to deal with the chip struct, so let's not repeat ourselves thirty-six times. Every instruction will also take chip and instruction arguments, so we can remove those too:

(defmacro define-instruction (name &body body)
  `(defun ,name (chip instruction)
      (declare (ignorable instruction))
      (with-chip (chip)
        ,@body)
      nil))

(define-instruction op-rand
  (let ((reg  (logand #x0F00 instruction))
        (mask (logand #x00FF instruction)))
    (setf (aref registers reg)
          (logand (random 256) mask))))

That's a little better. We're going to be calling these instructions a lot, so it wouldn't hurt to add a type declaration for each function. This is easy because they all take and return the same types:

(defmacro define-instruction (name &body body)
  `(progn
    (declaim (ftype (function (chip int16) null) ,name)) ; NEW
    (defun ,name (chip instruction)
      (declare (ignorable instruction))
      (with-chip (chip)
        ,@body)
      nil)))

Now all of our instruction functions will have type hints, and SBCL can check them (if safety is high) or use them to generate faster code (if speed is high).

We'll be accessing the values of registers quite often ((aref registers ...)) so let's add a little macrolet to make that read nicer:

(defmacro define-instruction (name &body body)
  `(progn
    (declaim (ftype (function (chip int16) null) ,name))
    (defun ,name (chip instruction)
      (declare (ignorable instruction))
      (with-chip (chip)
        (macrolet ((register (index)             ; NEW
                     `(aref registers ,index)))  ; NEW
          ,@body))
      nil)))

(define-instruction op-rand
  (let ((reg  (logand #x0F00 instruction))
        (mask (logand #x00FF instruction)))
    (setf (register reg)                         ; NEW
          (logand (random 256) mask))))

Now instead of (aref registers ...) we can just say (register ...). Cool.

The one thing that still bothers me is having to manually pull the instruction arguments out of the instruction with bitmasking. It would be much nicer if we could just declare what the arguments look like and have the computer generate the appropriate masking code to deal with them. The goal is to be able to write something like this:

(define-instruction op-rand (_ r (mask 2))              ;; RND
  (setf (register r)
        (logand (random 256) mask)))

The (_ r (mask 2)) is the "argument list" of the instruction. It says:

A single-nibble value which we don't care about: _
A single-nibble value should be bound to r: r
A two-nibble value should be bound to mask: (mask 2)

Compare the Lisp code and Cowgod's documentation:

Cxkk - RND Vx, byte
Set Vx = random byte AND kk.

(define-instruction op-rand (_ r (mask 2))              ;; RND
  (setf (register r)
        (logand (random 256) mask)))

Cowgod uses V as a shorthand for register, and calls the argument x instead of r, but otherwise I think this is pretty damn close to the documentation for actual running code.

So let's implement the argument-parsing macro. Feel free to skip this if you're not really comfortable with macros. But it's a good demonstration of how you can use them to get a nice clean language for a simple project.

First we'll wrap the body in a let:

                                   ; NEW
(defmacro define-instruction (name argument-list &body body)
  `(progn
    (declaim (ftype (function (chip int16) null) ,name))
    (defun ,name (chip instruction)
      (declare (ignorable instruction))
      (with-chip (chip)
        (macrolet ((register (index)
                     `(aref registers ,index)))
          (let ,(parse-instruction-argument-bindings argument-list) ; NEW
            ,@body))
        nil))))

We'll use a helper function to parse the argument list into a list of let bindings, instead of trying to cram it all into this one macro. It might look scary, but it's not actually too bad:

(defun parse-instruction-argument-bindings (argument-list)
  (flet ((normalize-arg (arg)
           (destructuring-bind (symbol &optional (nibbles 1))
               (ensure-list arg)
             (list symbol nibbles))))
    (iterate
      (for (symbol nibbles) :in (mapcar #'normalize-arg argument-list))
      (for position :first 3 :then (- position nibbles))
      (when (not (eql symbol '_))
        (collect `(,symbol (ldb (byte ,(* nibbles 4)
                                      ,(* position 4))
                                instruction)))))))

normalize-arg takes each argument from the list and turns it into (symbol nibbles), so we don't have to write the lengths for single-nibble arguments like this: ((_ 1) (r 1) (mask 2)).

Then we loop through each (symbol nibbles) pair in the argument list. We start at position 3 in the byte (the highest-order nibble) because we want to write the argument list from left to right, high-order to low-order, like the documentation.

For each pair, if the symbol is anything other than _ we collect a let binding with the appropriate ldb call for it. So for our example of (_ r (mask 2)) we end up with two let bindings:

(r (ldb (byte 4 8) instruction))
(mask (ldb (byte 8 0) instruction))

Notice that we only bother parsing out the bindings we need from the instruction. For instructions that don't need any arguments we'll end up with an empty (let () ...) which the compiler will optimize away.

This might seem a bit hairy. The nice thing about using a helper function like this is that we can call it on its own with a variety of argument lists to see what gets generated:

(map nil #'print
  (parse-instruction-argument-bindings '(_ r (mask 2))))

(R (LDB (BYTE 4 8) INSTRUCTION))
(MASK (LDB (BYTE 8 0) INSTRUCTION))

(map nil #'print
  (parse-instruction-argument-bindings '(_ x y _)))

(X (LDB (BYTE 4 8) INSTRUCTION))
(Y (LDB (BYTE 4 4) INSTRUCTION))

(map nil #'print
  (parse-instruction-argument-bindings '(_ (foo 3))))

(FOO (LDB (BYTE 12 0) INSTRUCTION))

And that wraps up define-instruction. So instead of writing RND as a vanilla function:

(defun op-rand (chip instruction)
  (let ((reg  (logand #x0F00 instruction))
        (mask (logand #x00FF instruction)))
    (with-chip (chip)
      (setf (aref registers reg)
            (logand (random 256) mask)))))

We can do something much cleaner:

(define-instruction op-rand (_ r (mask 2))              ;; RND
  (setf (register r)
        (logand (random 256) mask)))

This is going to pay off nicely as we implement the other thirty-five instructions.

Jumps and Calls

The CHIP-8 has two instructions for jumping directly to an address. The first jumps to an immediate, literal address. The second adds an immediate value to whatever is in register #x0 and jumps to the result:

(define-instruction op-jp-imm (_ (target 3))            ;; JP addr
  (setf program-counter target))

(define-instruction op-jp-imm+reg (_ (target 3))        ;; JP V0 + addr
  (setf program-counter (chop 12 (+ target (register 0)))))

Our define-instruction macro is already paying off. The immediate values of these arguments are three nibbles each, but our macro doesn't care.

There are also two instructions for implementing traditional function calls and returns using the CHIP-8's stack:

(define-instruction op-call (_ (target 3))              ;; CALL addr
  (vector-push program-counter stack)
  (setf program-counter target))

(define-instruction op-ret ()                           ;; RET
  (setf program-counter (vector-pop stack)))

We just use the built in vector-push and vector-pop functions to manage the stack's fill pointer trivially.

Note that we're NOT using vector-push-extend because the CHIP-8 stack is defined by the spec to only hold at most sixteen addresses — any ROM that tries to push more is broken.

Binary-Coded Decimal

Let's take another self-contained instruction next: BCD, which stands for binary-coded decimal (some references call this instruction LD B, Vx but I think it's different enough to deserve its own name).

The problem this instruction is designed to solve goes something like this. Let's say you've got a game where the player has a score that changes over time. You can store the player's score in a register or memory somewhere, and add/subtract to/from it with the normal arithmetic opcodes no problem.

But now you want to display the score to the player, and you probably want to do that in base 10 because humans are generally bad at reading binary or hex. So how do you take a byte like #xA5 and determine the hundreds, tens, and ones sprites to draw on the screen to read "165 points"?

This is what the BCD instruction does. It takes a single argument (the register) and stores the hundreds, tens, and ones digits of that register's value into three separate bytes of memory, starting at wherever the index register is currently pointing.

    Register V3   Index Register
    ┌───┐         ┌───┐
    │135│         │...│
    └───┘         └─┬─┘
      ┌─────────────┘
      │
      ▼
    ┌───┬───┬───┬───┬───┐
    │   │   │   │   │   │   Memory
    └───┴───┴───┴───┴───┘

    Run BCD V3 instruction =================

    Register V3   Index Register
    ┌───┐         ┌───┐
    │135│         │...│
    └───┘         └─┬─┘
      ┌─────────────┘
      │
      ▼
    ┌───┬───┬───┬───┬───┐
    │ 1 │ 3 │ 5 │   │   │   Memory
    └───┴───┴───┴───┴───┘

We'll split this into two parts to make it easier to read. First we'll make a digit function to retrieve a digit of a number:

(defun-inline digit (position integer &optional (base 10))
  (-<> integer
    (floor <> (expt base position))
    (mod <> base)))

We could have hard-coded the base 10, but part of the Common Lisp tradition is making flexible functions that can be reused in the future when it's not much harder to do so.

Let's make sure it works on its own:

(digit 0 135)
5

(digit 1 135)
3

(digit 2 135)
1

(digit 0 #xD6 16)
6

(digit 1 #xD6 16)
13     ; 13 in base 10 == D in base 16

And then we can define the actual operation:

(define-instruction op-ld-bcd<vx (_ r _ _)              ;; LD B, Vx
  (let ((number (register r)))
    (setf (aref memory (+ index 0)) (digit 2 number)
          (aref memory (+ index 1)) (digit 1 number)
          (aref memory (+ index 2)) (digit 0 number))))

Arithmetic

Next up are the arithmetic instructions. The CHIP-8 only supports addition and subtraction — there are no multiplication or division instructions.

The first two instructions are ADD and SUB:

ADD Vx, Vy → Vx = Vx + Vy
SUB Vx, Vy → Vx = Vx - Vy

Once again we'll start by creating two helper functions to perform the 8-bit addition/subtraction with overflow/underflow. These functions will return a second value that represents the carry/borrow bit.

(defun-inline +_8 (x y)
  (let ((result (+ x y)))
    (values (chop 8 result)
            (if (> result 255) 1 0)))) ; carry

(defun-inline -_8 (x y)
  (let ((result (- x y)))
    (values (chop 8 result)
            (if (> x y) 1 0)))) ; borrow

And now the instructions themselves are trivial:

(define-instruction op-add-reg<reg (_ rx ry)      ;; ADD Vx, Vy (8-bit)
    (setf (values (register rx) flag)
          (+_8 (register rx) (register ry))))

(define-instruction op-sub-reg<reg (_ rx ry)      ;; SUB Vx, Vy (8-bit)
  (setf (values (register rx) flag)
        (-_8 (register rx) (register ry))))

This takes advantage of the fact that you can use (setf (values ...) ...) to assign the multiple values returned by a function, without binding them to local variables.

Notice how we just assign to flag. Under the hood that flag has been bound with our with-chip macro to mean (chip-flag chip), which we defined way back in the beginning to mean (aref (chip-registers chip) #xF). But isn't it much nicer to just say (setf flag ...)?

There's also a SUBN instruction for subtracting the operands in reverse order (but still storing the result in the first):

(define-instruction op-subn-reg<reg (_ rx ry)     ;; SUBN Vx, Vy (8-bit)
  (setf (values (register rx) flag)
        ;; subtraction order is swapped for SUBN
        (-_8 (register ry) (register rx))))

Next is an ADD instruction that takes an immediate value. Unlike the other instructions this one does not set the flag for some reason (that was a fun bug to track down):

(define-instruction op-add-reg<imm (_ r (immediate 2))  ;; ADD Vx, Imm
  ;; For some weird reason the ADD immediate op doesn't set the flag
  (zapf (register r) (+_8 % immediate)))

Because Common Lisp will just ignore extra return values if we don't use them, we can just use our +_8 helper function here too and ignore the carry result.

There's also a single 16-bit ADD instruction that adds the value in a particular register to the index register. It too ignores the flag:

(define-instruction op-add-index<reg (_ r)        ;; ADD I, Vx (16-bit)
  (zapf index (chop 16 (+ % (register r)))))

Shifting

The CHIP-8 has two bit-shifting instructions: SHR and SHL which shift a register's contents right or left by a single bit. Both of these instructions also set the flag to the bit that got shifted "off the end" of the register.

We'll define two more helpers to do the actual 8-bit shifting and keep track of the bit that falls off the end:

(defun-inline get-bit (position integer)
  (ldb (byte 1 position) integer))

(defun-inline >>_8 (v)
  (values (ash v -1)
          (get-bit 0 v)))

(defun-inline <<_8 (v)
  (values (chop 8 (ash v 1))
          (get-bit 7 v)))

The instructions themselves are trivial:

(define-instruction op-shr (_ r)                    ;; SHR
  (setf (values (register r) flag)
        (>>_8 (register r))))

(define-instruction op-shl (_ r)                    ;; SHL
  (setf (values (register r) flag)
        (<<_8 (register r))))

Update: as fernly pointed out, this may not have been the original intended behavior. If we want to make our emulator match the original intent, it's pretty simple:

(define-instruction op-shr (_ rx ry)                 ;; SHR
  (setf (values (register ry) flag)
        (>>_8 (register rx))))

(define-instruction op-shl (_ rx ry)                 ;; SHL
  (setf (values (register ry) flag)
        (<<_8 (register rx))))

But I suspect that this will probably break some ROMs that rely on the incorrectly-documented behavior.

Logical Operations

Next up are the logical AND/OR/XOR instructions. We could define these like so:

(define-instruction op-and (_ destination source _)
  (zapf (register destination) (logand % (register source))))

(define-instruction op-or (_ destination source _)
  (zapf (register destination) (logior % (register source))))

(define-instruction op-xor (_ destination source _)
  (zapf (register destination) (logxor % (register source))))

This works, but aside from the name and the operation they're all identical. The next few groups of instructions are also going to be similar, so let's step back for a moment and see if we can abstract away the tedium.

Macro-Map

Instead of typing the same thing over and over, we'd like to just say what we want once. The traditional way to do this is with macrolet:

(macrolet
    ((define-logical-instruction (name function)
       `(define-instruction ,name (_ destination source _)
          (zapf (register destination)
                (,function % (register source))))))
  (define-logical-instruction op-and logand)
  (define-logical-instruction op-or logior)
  (define-logical-instruction op-xor logxor))

Now we're only writing out the actual functionality once instead of three times. That's better, but I'm still not satisfied. Using macrolet means I need to think of a name for the macro that I'm just going to use within this block and throw away, and naming things is hard.

What I really want to do here is just "map" a macro over a bunch of arguments and be done with it. I've played around a bit and ended up with a macro called macro-map that does just that:

(defmacro macro-map (lambda-list items &rest body)
  (with-gensyms (macro)
    `(macrolet ((,macro ,(ensure-list lambda-list) ,@body))
      ,@(iterate (for item :in items)
                 (collect `(,macro ,@(ensure-list item)))))))

macro-map takes a lambda list, list of arguments, and a body for the macro and builds the macrolet we wrote by hand earlier using a gensym for the name so we don't have to waste brain cells thinking of one. Now we can define our logical operations all at once:

(macro-map                                     ;; AND/OR/XOR
    (NAME    OP)
    ((op-and logand)
     (op-or  logior)
     (op-xor logxor))
  `(define-instruction ,name (_ destination source _)
    (zapf (register destination) (,op % (register source)))))

This actually ends up being one line of code longer than the copy/pasted version because I like to linebreak liberally, but the important thing is that we only say each thing we need to say once.

Notice how this almost starts to look like a table of data rather than code. I've taken advantage of the fact that the Lisp reader reads everything as uppercase by default and uppercased the "header" row (the lambda list) to make it stand out a bit more.

If you squint a little bit you might imagine how defining a set of related instructions could almost look like a page from a CPU manual.

Branching

Now that we've got a way to define batches of similar instructions without going crazy we can tackle the branching instructions.

Most CPUs have some form of "jump to some location if zero, otherwise continue" instruction to implement branching. The CHIP-8 needs to fit every instruction into two bytes, so it does something a bit simpler. Instead of an arbitrary jump on a condition, it has a series of skip instructions.

For example, SE Vx, 0 means "skip the next instruction if register X is zero". Skipping the next instruction is a simple as incrementing the program counter by an extra two bytes.

There are four variants of this instruction:

SE  Vx, Immediate → Skip when Vx equals Immediate
SNE Vx, Immediate → Skip when Vx does not equal Immediate
SE  Vx, Vy        → Skip when Vx equals Vy
SNE Vx, Vy        → Skip when Vx does not equal Vy

Let's use macro-map to write this out as a table of code:

(macro-map                                              ;; SE/SNE
    ((NAME            TEST X-ARG  X-FORM        Y-ARG         Y-FORM)
     ((op-se-reg-imm  =    (r 1)  (register r)  (immediate 2) immediate)
      (op-sne-reg-imm not= (r 1)  (register r)  (immediate 2) immediate)
      (op-se-reg-reg  =    (rx 1) (register rx) (ry 1)        (register ry))
      (op-sne-reg-reg not= (rx 1) (register rx) (ry 1)        (register ry))))
  `(define-instruction ,name (_ ,x-arg ,y-arg)
     (when (,test ,x-form ,y-form)
       (incf program-counter 2))))

After the macroexpansion we end up with things like:

(define-instruction op-sne-reg-imm (_ (r 1) (immediate 2))
  (when (not= (register r) immediate)
    (incf program-counter 2)))

We'll also need not= itself to avoid having to do messy things inside the macro body:

(defun-inline not= (x y)
  (not (= x y)))

Just one more group of instructions left (for this post).

Loads

The final group of instructions we'll look at for now is the LD family of loads. Normally we'd implement these first, but I wanted to introduce macro-map as gently as possible.

Most of the LD instructions simply take a value from a source and stick it into a destination, and we can implement them as a single setf form:

(macro-map                                              ;; LD
     (NAME           ARGLIST         DESTINATION   SOURCE)
     ((op-ld-i<imm   (_ (value 3))   index         value)
      (op-ld-reg<imm (_ r (value 2)) (register r)  value)
      (op-ld-reg<reg (_ rx ry _)     (register rx) (register ry))
      (op-ld-reg<dt  (_ r _ _)       (register r)  delay-timer)
      (op-ld-dt<reg  (_ r _ _)       delay-timer   (register r))
      (op-ld-st<reg  (_ r _ _)       sound-timer   (register r)))
  `(define-instruction ,name ,arglist
     (setf ,destination ,source)))

We haven't talked about the timers yet, so don't worry about them. I'm leaving them in so you can see how nice the macro-map is when you need to define lots of similar operations at once.

There are two other more interesting LD instructions which move data between multiple registers and memory.

LD [I], n loads consecutive bytes of memory with the contents of registers V0 through Vn, starting at wherever the index register is pointing. For example, LD [I], 2 would be:

     V0   V1   V2   V3   V4  ...
    ┌──┐ ┌──┐ ┌──┐ ┌──┐ ┌──┐
    │  │ │  │ │  │ │  │ │  │ ...
    └─┬┘ └─┬┘ └─┬┘ └──┘ └──┘
      └───┐└──┐ └─┐
          │   │   │
          ▼   ▼   ▼
    ┌───┬───┬───┬───┬───┐
    │   │   │   │   │   │   Memory
    └───┴───┴───┴───┴───┘
          ▲  ┌───┐
          └──│...│ Index Register
             └───┘

LD n, [I] does the opposite: it loads the contents of memory into the registers V0 through Vn.

Because we've used Lisp arrays for both the registers and memory, these instructions are really lovely to implement — they end up being just a single call to replace:

(define-instruction op-ld-mem<regs (_ n _ _)            ;; LD [I] < Vn
  (replace memory registers :start1 index :end2 (1+ n)))

(define-instruction op-ld-regs<mem (_ n _ _)            ;; LD Vn < [I]
  (replace registers memory :end1 (1+ n) :start2 index))

That's it for the LD instructions (for now) and for the instructions in general!

Future

With these instructions implemented (and with some stubs for the rest) we can load and run a ROM and it will push bytes around in memory and mostly work. In the next few posts we'll look at the next steps to getting a fully-functional emulator up and running, including:

Graphics and input
Sound
Debugging

Thanks to James Cash for reading a draft of this post.