What the Hell is Symbolic Computation?

Posted on June 29th, 2016.

I've been reading a lot of Lisp books lately, some more advanced than others. All of the introductory books I've seen cover the idea of symbolic computation, but most of them breeze past it with just a few pages about "symbols" and "quoting". For many programmers coming from non-Lisp languages these are new, foreign concepts that don't really map back to anything in their previous experience. But the books seem to expect you to understand this things after just a couple of paragraphs.

One book that does spend more time on this is the appropriately-named Common Lisp: A Gentle Introduction to Symbolic Computation. If you're looking for a good introductory Lisp book, that's the one I'd recommend. However, it's quite a long book and starts at the very basics (you don't even write code for the first few chapters — you program with drawings). If you already know some Lisp then you might find yourself bored if you go through the entire thing just for the symbolic computation bits.

This post is an attempt to explain what symbols actually are and what quoting does. You'll get the most out of it if you're already familiar with the basics of Lisp (i.e. you aren't scared by the parentheses). But if you already know what something like this would print, you'll probably be bored:

(let ((foo ''a))
  (print foo)
  (print 'foo)
  (print (car foo))
  (print (cdr foo)))
  1. Disclaimer
  2. The REPL
  3. Loop
  4. Print
  5. Read
  6. The RPL
  7. Reading Other Data
  8. Readable Printing
  9. Reading Non-Lisp
  10. Evaluating Non-Lisp
  11. Reading Lisp
  12. Symbols
  13. Evaluating Lisp
    1. Strings and Numbers
    2. Basic Lists
    3. Variables
    4. Special Forms
    5. An Ouroborromean Paradox?
    6. Exercises
  14. Quote
    1. The Special Operator
    2. The Read Macro
    3. Exercises
  15. A Quick Recap
  16. Where to Go From Here

Disclaimer

Before we start I'll get this warning out of the way: I'm going to gloss over a lot of gory details to get at the important core ideas.

When I say "this thing is implemented like so" it's safe to assume that in reality things are messier and more complicated and will vary between implementations and dialects. My goal is to give you some intuition for the basic ideas, not to exhaustively cover how a particular implementation works.

I'll be using Common Lisp for the examples, but the concepts apply equally to Scheme, Clojure, and other Lisps too.

With that out of the way, let's dive in!

The REPL

The Read-Eval-Print-Loop that so many languages these days have built-in is a good place to start in our exploration. You're probably familiar with the idea of a REPL from languages like Python, Ruby, Javascript, etc. In Python it looks like this:

$ python
>>> if True:
...     print "Yes"
... else:
...     print "No"
...
Yes

A handwavey definition of a REPL could be something like this:

A REPL is a program that lets you type in code. It runs that code, prints out the result, and loops back to the beginning to let you type in more.

This is a good first start, but to understand symbols we're going to need to get a much clearer, more precise handle on what exactly each letter in "REPL" means. We'll start backwards and outside-in with the simplest component.

Loop

The "L" in "REPL" stands for "Loop", and it's the easiest part to understand. A REPL doesn't just process one bit of code and then exit, it loops forever (or until you tell it to quit).

There's not a lot to say here, except that we'll go ahead and lump together the extra busywork a REPL needs to do into this step so we can ignore it for the rest of the post. This includes things like:

Print

The next letter is "P" for "Print". Like loop this should be familiar territory, but let's nail down exactly what it does. The job of the print function is to:

For example: if we give the integer 1 to Python's print, it produces a string of two characters (the digit 1 and a newline) and writes it to the terminal. There are a lot of little details that aren't important here (line endings, string encoding, where the output actually goes, etc). The important part that you should fix in your mind is this:

print takes an object, converts it into a string, and outputs it.

Note: I've called print a function here, and I'll do the same for read and eval later on. This is correct for Common Lisp, but not necessarily for languages like Python or Javascript. For the purposes of this post you can just imagine them as functions and realize that in the real world they're more like "phases" that have a lot more complex machinery under them.

Read

We'll jump over to the other side of the acronym now. "R" is for "Read", and this is where things start to get tricky.

In most non-Lisp languages the "read" and "eval" phases of the REPL tend to get blurred together, but if you want to understand symbolic computation in Lisp you absolutely must keep these two parts cleanly separated in your mind.

We can define read as almost the polar opposite of print. read is a function whose job is to:

Notice how this mirrors print:

This is the second definition you need to fix in your brain:

print takes an object, converts it into a string, and outputs it.
read takes a string, converts it into an object, and returns it.

If we read in the string 1.0 (that's three ASCII characters), we'll get some kind of floating point number object back. Sometimes there are many ways to represent the same object in memory. For example: reading the strings 1.0 and 1.0000 will both produce equivalent floating point objects representing the number "one".

Side note (if you're not feeling pedantic feel free to skip this): the word "equivalent" here is a bit tricky. When you read two strings like 1.0 and 1.000 you might get back two pointers to the same hunk of memory, or you might get back two pointers to two separate hunks of memory, or even just a machine word representing the floating point number directly. It depends on the language (and even the implementation). What I mean by "equivalent" is that there's no way to tell which value came from which input. You can't say "this hunk of memory must have come from the string 1.000" because read has "sanitized" the input by that point.

The RPL

So now we've got read, print, and loop. read takes characters from a stream and produces objects, and print takes objects and produces characters and writes them to a stream. We're still missing a letter, but what would happen if we just hook these three up together right now?

CL-USER> (loop (print (read)))
> 1
1
>    1.0
1.0
> 1.00000
1.0

I've added a little > prompt to the output here to make it clear where I'm typing.

Essentially what we've got here is a little pretty-printer! Notice how things like extra whitespace and useless zeroes got stripped out. This isn't terribly useful, but at least it's doing something.

However, it's important to realize where the "prettifying" actually happened: read actually did the hard work! By the time print got its grubby little paws on those one-point-zeroes the extra whitespace and digits were long gone. Maybe we should give credit where it's due and call this a "pretty-reader" instead.

Reading Other Data

Up to this point I've been using numbers as examples because they're simple, but of course we can read and print lots of other kinds of things. Lists and strings (and lists of strings!) are some common examples:

CL-USER> (print (read))
> ("hello"       "world")
("hello" "world")

$ python
>>> [    "hello"  , "world"]
['hello', 'world']

There are lots of other kinds of things you can read and print, depending on the particular language. What's important here is the strict separation of read and print.

Readable Printing

Before we move on, one thing we should talk about is the idea of "readable printing". This idea becomes clear almost immediately when you work at the Python REPL:

$ python
>>> "foo\nbar"
'foo\nbar'
>>> print "foo\nbar"
foo
bar

Why was the first string printed with quotes and an escaped newline, and the second without quotes and an actual newline?

When we defined print earlier, one of the steps was:

Much like there are many character strings that result in the same object (e.g. 1.0 and 1.000) there are often many ways we can represent a particular object as a series of characters. For example: we can represent the number two as 2, or 10 in binary, or two in English. We can represent a string as the actual characters in it, or we could represent it "escaped" like Python does in the first line.

The core idea here is that of "readable printing": printing the characters you would need to give to read to produce this object.

This is different from what I'll call "aesthetic printing": printing characters for this object which look nice to a human.

In Python these two kinds of printing are seen in the separate __str__ and __repr__ methods. In Common Lisp it's controlled by the *print-readably* variable. Other languages have different mechanisms.

Reading Non-Lisp

So we know how reading and printing data works, but you can do more than that at a typical REPL. Let's revisit the first Python example:

$ python
>>> if True:
...     print "Yes"
... else:
...     print "No"
...
Yes

We know that read takes a series of characters and turns them into an object. When we read this series of characters, what comes out the other end? For most languages (Python, Ruby, Javascript, Scala, etc) the answer is some kind of abstract syntax tree represented as a data structure. You might imagine something like this buried deep within the guts of the Python interpreter:

class IfStatement(ASTNode):
    def __init__(self, test_node, then_node, else_node):
        self.test_node = test_node
        self.then_node = then_node
        self.else_node = else_node

The details would surely be a lot hairier in the real world, but the general idea is that the stream of characters representing your code gets read in and parsed into some kind of data structure in memory.

This is all well and good, but our example printed "Yes", which certainly isn't a representation of an AST node. Where did that come from? To answer that we'll need the final letter in the acronym.

Evaluating Non-Lisp

The "E" in REPL stands for "Eval".

This is the last time we'll consider Python in this post, because we're about to move past it to more powerful ideas. In Python, the purpose of the eval phase of the REPL is to:

This is fairly straightforward, at least for our simple if statement. Python will evaluate the test_node part of the if, and then evaluate one of the other parts depending on whether the test was truthy or falsey.

Of course the actual process of evaluating real-life Python code is much more complicated than I've glossed over here, but this is good enough for us to use as a springboard to dive into the beautiful world of Lisp.

Reading Lisp

When we talked about reading the Python if statement I said that Python would parse it into some kind of data structure representing an AST node. I didn't point you at the actual data structure in the Python source code because I have no idea where it actually is, and it's probably not nearly as simple as the one I sketched out.

Now that we're talking about Lisp, however, I really can show you exactly what an if statement in Lisp reads as! I'll use read-from-string to make it a bit easier to see what's input and what's output, but it would work the same way with just plain old read and input from standard in. Here we go:

CL-USER> (print
           (read-from-string "(if t (print 1) (print 2))"))

(IF T (PRINT 1) (PRINT 2))

CL-USER> (type-of
           (read-from-string "(if t (print 1) (print 2))"))

CONS

Instead of some kind of AST node data structure buried deep within the heart of our implementation we get... a list! An ordinary, garden-variety, vanilla list made of good old trusty cons cells!

The list we get contains four elements, two of which are themselves lists. But what do these lists actually contain?

The numbers are simple enough — we saw them earlier. read turns a series of digits into a number somewhere in memory, and print spits out a series of characters that represent the number:

CL-USER> (type-of (read-from-string "100"))

(INTEGER 0 4611686018427387903)

The string 100 does indeed represent an integer between zero and four squillion-something.

But what are those other things in our list? The if and t and print? Are they some magic keywords internal to the guts of our Lisp that print is just outputting nicely for our benefit?

CL-USER> (type-of (read-from-string "if"))

SYMBOL

Buckle up — it's time to learn about symbols.

Symbols

A symbol in Lisp is a simple data structure that turns out to be extremely useful. You can imagine it being defined something like this:

(defstruct symbol
  name
  function
  value
  property-list
  package)

The name of a symbol is a string of characters representing the symbol's name:

CL-USER> (type-of (read-from-string "if"))

SYMBOL

CL-USER> (symbol-name (read-from-string "if"))

"IF"

Symbols get created (usually) when they are read for the first time. If you read a sequence of characters that matches the name of an existing symbol (let's ignore the lowercase/uppercase thing for now), read just returns that existing symbol:

CL-USER> (eq (read-from-string "cats")
             (read-from-string "cats"))

T

Here read saw the string cats and created a new symbol somewhere in memory with CATS as its name. The second call to read-from-string read in the string cats, saw an existing symbol with the matching name, and just returned it straight away. That's why eq returned true — the two read calls returned pointers to the exact same hunk of memory.

(read also "interned" the symbol into the current package. We're going to ignore the package system for this post — it's not important to the core topic. We'll also ignore the property list.)

So to recap:

So what does print do with a symbol? It just prints the name!

CL-USER> (print (read-from-string "cats"))

CATS

Simple enough. Note that this is readably printed — if we feed it back to read we'll get an equivalent symbol back (the exact same symbol, in fact).

But wait! If we read symbols in and they're just normal vanilla structures, then how do we actually do anything in Lisp? How does anything actually run?

Evaluating Lisp

Now we need to go back to eval. In the Python example I handwaved a bit and said that eval "ran the code" but didn't really explain how that happened. Now that we're in the warm embrace of Lisp we can nail things down a lot more cleanly.

eval (in Lisp) is a function whose job is to:

I know, I know, that seems as clear as a brick. But if we can actually define what "something useful" means it will be a lot more helpful. Lisp's eval follows a few simple rules to produce its output.

Strings and Numbers

First of all, eval will just pass numbers and strings straight through without touching them at all:

(defun eval (thing)
  (cond
    ((numberp thing) thing)
    ((stringp thing) thing)
    ; ...
    (t (error "What the hell is this?"))))

This makes sense because there's not a lot more useful stuff you can do with a number or a string on its own. I suppose you could write a language where the evaluation phase ran all numbers through the absolute value function because you wanted to remove all negativity from programming or something, but that seems a bit esoteric.

Basic Lists

What about lists, like our friend the if statement (a list of four things)? Well the first thing anyone learns about Lisp is the funny syntax. In most cases, when passed a list eval treats the first element as the name of a function, and the rest as the arguments. It will evaluate the arguments and call the function with them:

(defun eval (thing)
  (cond
    ((numberp thing) thing)
    ((stringp thing) thing)
    ((typep thing 'list)
     (destructuring-bind (head . arguments) thing
       (apply (symbol-function head)
              (mapcar #'eval arguments))))
    ; ...
    (t (error "What the hell is this?"))))

At this point we know enough to understand exactly what happens when we say something like (+ 1 2) at a Lisp REPL!

Whew! That seems like a lot, but each piece is pretty small. Go slowly through it and make sure you really understand each step in the preceding list before you move on.

If you don't believe me that this is really how it works, try this:

CL-USER> (setf (symbol-function (read-from-string "square"))
               (lambda (x) (* x x)))

#<FUNCTION (LAMBDA (X)) {1005DCF2FB}>

CL-USER> (square 4)

16

We set the function slot of the symbol square to be a function that squares its argument, and then we can run it like any other function.

Trace through the evaluation of that last (square 4) call before moving on.

Variables

What about symbols that aren't at the front of a list? For example:

CL-USER> (defvar ten 10)

10

CL-USER> (square ten)

100

In this case, everything proceeds as it did before until the recursive call to eval the argument to square. The argument is just a bare symbol (not a number or a list or a string), so we need another rule in our eval function:

(defun eval (thing)
  (cond
    ((numberp thing) thing)
    ((stringp thing) thing)
    ((symbolp thing) (symbol-value thing)) ; NEW
    ((listp thing)
     (destructuring-bind (head . arguments)
       (apply (symbol-function head)
              (mapcar #'eval arguments))))
    ; ...
    (t (error "What the hell is this?"))))

If eval sees a symbol on its own, it will just return whatever is in the value slot of the symbol data structure. Simple enough!

Special Forms

If course there are a few things we haven't talked about yet. First of all: there are some "magic" symbols that aren't functions, but are hardcoded into the evaluation function to do certain things. In Common Lisp they're called "special operators" and there are twenty-five of them.

Let's add a special rule to our eval function to handle one of them (our old friend if):

(defun eval (thing)
  (cond
    ((numberp thing) thing)
    ((stringp thing) thing)
    ((symbolp thing) (symbol-value thing))
    ((listp thing)
     (destructuring-bind (head . arguments) thing
       (cond
         ((eq head (read-from-string "if")) ; NEW
          (destructuring-bind (test then else) arguments
            (if (eval test)
              (eval then)
              (eval else))))
         (t
          (apply (symbol-function head)
                 (mapcar #'eval arguments))))))
    ; ...
    (t (error "What the hell is this?"))))

At this point your head might be starting to hurt. How have we just used if to define itself? Is anything here actually real, or is Lisp just a serpent eating its own tail forever?

An Ouroborromean Paradox?

If the "metacircularity" (god, I hate that word) of what we've just done bothers you, remember that this eval function here is just a sample of how you can write it in Lisp. You could also write it in some other language, like C or Java (see: ABCL) or really anything. That would let you "bottom out" to something that's not Lisp.

But really, is that any less circular? You still need some language to run the thing! Why does an eval written in C seem less "cheaty" than one written in Lisp? Because it compiles to machine code? Try (disassemble #'sb-impl::simple-eval-in-lexenv) in SBCL and you'll get a few thousand bytes of x86 assembly that was originally some Lisp code much like ours.

My advice is to just let it flow over you, even if you feel a bit uneasy at the sight of something like this. Eventually as you become more at home in your chosen Lisp, writing high-level code one moment and disassembleing a function the next, you'll make peace with the serpent.

Exercises

There are quite a few things in eval that we haven't covered here, like the other twenty-four special operators, lexical variables, lambda forms, and more. But this should be enough for you to wrap your head around symbols and how they're used in Lisp.

Before we move on to the final part of this story, make sure you can figure out the answers to these questions:

  1. What does the single character 1 read as? What does it evaluate to?
  2. What do the characters "hello\"world" read as? What do they evaluate to? How would the result be printed readably? How would the result be printed aesthetically?
  3. What do the characters (list (+ 1 2) ten) read as? What does it evaluate to? How does each step of the evaluation work? Trace out each of the calls to eval on paper, with their arguments and their results. What does the final result print as?
  4. Use symbol-function to see what's in the function slot of the symbol car. How does your Lisp print that thing?
  5. Try to look up the function slot of a symbol without a function. What happens?
  6. Use symbol-function to see what's in the function slot of the symbol if. What happens? Does that seem right? If so, why?
  7. Go to the HyperSpec page for symbol and search for "the consequences are undefined if". Try all of those things and find out what interesting things break in your Lisp.

If you're not using Common Lisp, port these exercises to your language.

Quote

Up to now we've been using the clumsy (read-from-string "foo") form to get ahold of symbols, but there's an easier way called "quoting".

Quoting often confuses new Lisp programmers, and they'll end up randomly adding/removing quotes to something until it appears to do what they want. Now that you've got read, eval, and print firmly separated in your mind, quote will be much easier to understand.

The Special Operator

Quoting actually has two distinct components: one that affects read and one that affects eval. In most books the distinction between these two only gets barely a sentence or two, which is probably why beginners are so confused. Let's look at the eval side of things first.

quote is one of the twenty-five special operators. All it does it pass its (single) argument back from eval untouched. Normally the argument itself would have been evaluated, but quote prevents that:

(defun eval (thing)
  (cond
    ((numberp thing) thing)
    ((stringp thing) thing)
    ((symbolp thing) (symbol-value thing))
    ((listp thing)
     (destructuring-bind (head . arguments) thing
       (cond
         ((eq head (read-from-string "if"))
          (destructuring-bind (test then else) arguments
            (if (eval test)
              (eval then)
              (eval else))))
         ((eq head (read-from-string "quote")) ; NEW
          (first arguments))
         (t
          (apply (symbol-function head)
                 (mapcar #'eval arguments))))))
    ; ...
    (t (error "What the hell is this?"))))

I know that seems strange, but let's look at it in action:

CL-USER> (+ 1 2)

3

CL-USER> (quote (+ 1 2))

(+ 1 2)

Our first example is the same one we used in the previous section:

The second example uses quote:

Just for fun, let's ride the serpent and use quote in the definition of itself inside eval to clean things up a bit:

(defun eval (thing)
  (cond
    ((numberp thing) thing)
    ((stringp thing) thing)
    ((symbolp thing) (symbol-value thing))
    ((listp thing)
     (destructuring-bind (head . arguments) thing
       (cond
         ((eq head (quote if))
          (destructuring-bind (test then else) arguments
            (if (eval test)
              (eval then)
              (eval else))))
         ((eq head (quote quote)) ; WHAT IN THE
          (first arguments))
         (t
          (apply (symbol-function head)
                 (mapcar #'eval arguments))))))
    ; ...
    (t (error "What the hell is this?"))))

That (quote quote) looks just completely bonkers, right? Step through it piece by piece to figure it out:

Try these in a Lisp REPL if you're not 100% sure of the answers. The functions read-from-string, eval, and type-of will come in handy.

The Read Macro

Now let's look at the final piece of the puzzle. ' (that's a single ASCII quote character) is a read macro. Don't worry if you don't know exactly what that means right now. The important part is that when read sees a quote character, it returns a two-element list containing the symbol quote and the next form, which is read normally.

The four characters 'foo go into read and come out as a two-element list containing: the symbol quote and the symbol foo. Again, because it's important: this happens at read time. By the time eval sees anything it's already been turned into that two-element list.

It can be a bit slippery to get a grip on this, because print will "helpfully" print a list of (quote whatever) as 'whatever! But we can use trusty old car and cdr to see what's really going on:

CL-USER> (read-from-string "'foo")

'FOO

CL-USER> (type-of (read-from-string "'foo"))

CONS

CL-USER> (car (read-from-string "'foo"))

QUOTE

CL-USER> (cdr (read-from-string "'foo"))

(FOO)

That's really all there is to it: 'form is returned by read as (quote form). Then that gets passed along to eval, which sees the quote special operator and passes the argument back untouched.

Note that form doesn't have to be a symbol:

Exercises

That pretty much wraps up quote. Once you've cleanly separated read, eval, and print in your mind it becomes a lot less mystical (or perhaps more, depending on your point of view).

If you want to twist your brain just a bit more, try these exercises:

  1. What do the characters foo read in as? What do they evaluate to? How does it print?
  2. What do the characters 'foo read in as? What do they evaluate to? How does it print?
  3. What do the characters ''foo read in as? What do they evaluate to? How does it print?
  4. What do the characters '1.0 read in as? What do they evaluate to? How does it print?
  5. What do the characters '(1 2 3) read in as? How many elements are in the list? What does it evaluate to? How many elements are in that list?
  6. Update our implementation of eval to use the ' read macro instead of quote.
  7. Certain symbols like t and nil are special — they evaluate to themselves like numbers or strings. Add this to eval.
  8. What do the characters nil read in as? What is the type of that object? What do they evaluate to? What type is that result?
  9. What do the characters 'nil read in as? What is the type of that object? What do they evaluate to? What type is that result?
  10. What do the characters ''nil read in as? What is the type of that object? What do they evaluate to? What type is that result?

A Quick Recap

If you've gotten this far and understood everything, you should have a good grasp of how symbolic computation works. Here are the main things I hope you've taken away from this post:

Where to Go From Here

Now that you've got a firmer grasp on what the hell symbols actually are, there are a lot of cool things you might want to check out that will show you what you can do with them: