Fun with Macros: Do-File

Posted on April 19th, 2022.

It's been a while, but it's time to take a look at another fun little Common Lisp macro with some interesting things inside it: do-file.

  1. Usage
  2. Implementation
    1. Let Over Defmacro
    2. &rest and &key
    3. Macros Using Macros
    4. Don't Loop
    5. Repetition Allergies
  3. Result

Usage

The macro we'll be taking a look at today is called do-file. It's used to open a file and iterate over the contents using a reader function, saving you some tedious boilerplate.

First let's look at some examples of how you could use it. Processing each line of a file is the default:

(do-file (line "foo.txt")
  (unless (string= "" line)
    (write-line (string-upcase line))))

Using a different reader function and another macro to gather data from inside the iteration:

(gathering
  (do-file (n "numbers.txt" :reader #'read-integer)
    (when (primep n)
      (gather n))))

Passing along options to the underlying open, and returning early:

(do-file (form "foo.lisp" :reader #'read :external-format :EBCDIC-US)
  (when (eq form :stop)
    (return :stopped-early))
  (print form))

All of these could of course be done in other ways. You could have a separate function that reads the file into a sequence and then pass that to mapcar or something else, but it can be wasteful to cons up the entire list if you're only going to process items and don't need to retain them (or if you're going to stop early).

You could also write a mapc-file that takes a function instead of making this a macro, but sometimes it's nice to not have to wrap things in a thunk. It's probably worth having that function as an additional tool in the toolbox though!
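
A minimal sketch of what such a function could look like follows. The name mapc-file comes from the paragraph above, but the argument order and the :reader keyword are assumptions, and for brevity this version does not forward extra options to open.

```lisp
;; Hypothetical sketch: the signature is an assumption, not an existing API.
(defun mapc-file (function path &key (reader #'read-line))
  "Call FUNCTION on each value READER reads from the file at PATH. Returns NIL."
  (let ((eof (gensym "EOF")))   ; fresh object no reader can return by accident
    (with-open-file (stream path :direction :input)
      (do ((value (funcall reader stream nil eof)
                  (funcall reader stream nil eof)))
          ((eq value eof) nil)
        (funcall function value)))))

;; Usage: print every non-empty line in upper case.
;; (mapc-file (lambda (line)
;;              (unless (string= "" line)
;;                (write-line (string-upcase line))))
;;            "foo.txt")
```

The cost compared to the macro is the lambda wrapper at every call site, which is the thunk mentioned above.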

Implementation

Here's the full implementation of the macro:

(let ((eof (gensym "EOF")))
  (defmacro do-file ((symbol path
                      &rest open-options
                      &key (reader '#'read-line) &allow-other-keys)
                     &body body)
    "Iterate over the contents of the file at `path` using `reader`.

    During iteration, `symbol` will be set to successive values read from the
    file by `reader`.

    `reader` can be any function that conforms to the usual reading interface,
    i.e. anything that can handle `(read-foo stream eof-error-p eof-value)`.

    Any keyword arguments other than `:reader` will be passed along to `open`.

    If `nil` is used for one of the `:if-…` options to `open` and this results
    in `open` returning `nil`, no iteration will take place.

    An implicit block named `nil` surrounds the iteration, so `return` can be
    used to terminate early.

    Returns `nil`.

    Examples:

      (do-file (line \"foo.txt\")
        (print line))

      (do-file (form \"foo.lisp\" :reader #'read :external-format :EBCDIC-US)
        (when (eq form :stop)
          (return :stopped-early))
        (print form))

      (do-file (line \"does-not-exist.txt\" :if-does-not-exist nil)
        (this-will-not-be-executed))

    "
    (let ((open-options (alexandria:remove-from-plist open-options :reader)))
      (alexandria:with-gensyms (stream)
        (alexandria:once-only (path reader)
          `(when-let ((,stream (open ,path :direction :input ,@open-options)))
             (unwind-protect
                 (do ((,symbol
                       (funcall ,reader ,stream nil ',eof)
                       (funcall ,reader ,stream nil ',eof)))
                     ((eq ,symbol ',eof))
                   ,@body)
               (close ,stream))))))))

There are a few interesting things to talk about here.

Let Over Defmacro

The very first line is unusual: instead of the defmacro being the top level form, we wrap it in a let to generate one single unique EOF sentinel object:

(let ((eof (gensym "EOF")))
  (defmacro do-file (…) …))

We could put the let inside the macro, but then we'd be generating a separate EOF object for every use of the macro, which is wasteful.
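
The difference can be seen with two toy macros. The names shared-sentinel and fresh-sentinel are made up for this illustration; each expands into a quoted sentinel symbol, much like do-file embeds its EOF marker. The sketch assumes the forms are evaluated one at a time, at the REPL or by loading the source file.

```lisp
;; Hypothetical toy macros, not part of do-file.

;; The gensym is created once, when this LET form is evaluated.
(let ((sentinel (gensym "EOF")))
  (defmacro shared-sentinel ()
    `',sentinel))

;; The gensym is created anew every time the macro is expanded.
(defmacro fresh-sentinel ()
  (let ((sentinel (gensym "EOF")))
    `',sentinel))

;; Every use of SHARED-SENTINEL refers to the same symbol...
(defun shared-p ()
  (eq (shared-sentinel) (shared-sentinel)))

;; ...while each use of FRESH-SENTINEL gets its own.
(defun fresh-p ()
  (eq (fresh-sentinel) (fresh-sentinel)))
```

Here (shared-p) is true and (fresh-p) is false: one symbol serves every expansion in the first case, while the second allocates a new symbol per use.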

&rest and &key

Note how the argument list of the macro takes both &rest and &key arguments, and uses &allow-other-keys to let the macro take arbitrary keyword arguments:

(defmacro do-file ((symbol path
                    &rest open-options
                    &key (reader '#'read-line) &allow-other-keys)
                   &body body)
  (let ((open-options (alexandria:remove-from-plist open-options :reader)))
    …
    `(when-let ((,stream (open ,path :direction :input ,@open-options)))
       …)))

We pass along any keyword arguments we get (aside from the special :reader argument for this macro) to open. Using &allow-other-keys means we don't need to hardcode all the possible options to open, and also allows for additional implementation-specific options to be passed to open if the user wants.

We could have omitted the keyword arguments entirely, taken the arguments as a raw &rest, and pulled out :reader ourselves with getf. But doing it this way means we don't have to fiddle around doing that, and it can also provide slightly nicer documentation in an editor when it shows the macro's argument list in the status bar. We'll also get a nicer error if we accidentally pass an odd number of keyword arguments.

One more thing before we move on: note the extra level of quoting for the (reader '#'read-line) default value. It's important to remember that this is a macro, so when someone writes (do-file (… :reader #'foo) …) the macro doesn't receive the function foo, because macro arguments aren't evaluated; it receives the list (function foo). The default value, however, is evaluated when the argument is missing, so we need the extra layer of quoting to make sure the default is also a list, (function read-line), and matches what we'd get from a user-supplied argument.
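
A tiny sketch makes this concrete. The macro reader-form is hypothetical and exists only for this illustration: it returns, unevaluated, whatever form it received for :reader.

```lisp
;; Hypothetical macro: expands into a quoted copy of the :reader form it was given.
(defmacro reader-form (&key (reader '#'read-line))
  `',reader)

(reader-form :reader #'read) ; the list (function read)
(reader-form)                ; the list (function read-line), thanks to the extra quote
```

Without the extra quote, the default would be a function object rather than a form, so the two cases would no longer look alike to the rest of the macro.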

Macros Using Macros

We use with-gensyms and once-only from Alexandria to maintain good hygiene in the macro. We also use when-let to avoid some more boilerplate:

(defmacro do-file (…)
  (alexandria:with-gensyms (stream)
    (alexandria:once-only (path reader)
      `(when-let ((,stream (open ,path :direction :input ,@open-options)))
         (unwind-protect
             (do …)
           (close ,stream))))))
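
To see the kind of problem once-only guards against, here is a small sketch with two made-up macros. square-unsafe splices its argument into the expansion twice; square-safe binds it to a gensym first, which is roughly what once-only arranges for you.

```lisp
;; Hypothetical macros for illustration only.
(defmacro square-unsafe (x)
  `(* ,x ,x))                       ; X appears twice, so it is evaluated twice

(defmacro square-safe (x)
  (let ((value (gensym "VALUE")))   ; done by hand here; once-only automates it
    `(let ((,value ,x))
       (* ,value ,value))))

;; Count how many times the argument form runs in each case.
(defun count-evaluations-unsafe ()
  (let ((calls 0))
    (square-unsafe (progn (incf calls) 3))
    calls))

(defun count-evaluations-safe ()
  (let ((calls 0))
    (square-safe (progn (incf calls) 3))
    calls))
```

The unsafe version runs the argument form twice and the safe version once. In do-file the reader form is spliced into both the init and step positions of the do, so evaluating it once up front avoids the same kind of duplication.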

Don't Loop

Finally we get to the meat of the macro:

(do ((,symbol
     (funcall ,reader ,stream nil ',eof)
     (funcall ,reader ,stream nil ',eof)))
    ((eq ,symbol ',eof))
  ,@body)

Unfortunately we need to use the tedious do instead of loop here to avoid an annoying bug: if we expanded into a loop form, and the user is calling this from inside their own loop and uses (loop-finish) in the body code, then it would finish our loop instead of theirs, which would be very confusing.

Imagine the user wrote this very contrived example:

(defun find-the-cat (&rest paths)
  (loop
    :with result = nil
    :for (path . remaining) :on paths
    :for i :from 1
    :do (do-file (line path)
          (when (string= line "meow")
            (setf result path)
            (loop-finish))) ;; This should obviously go to the finally below.
    :finally
    (when result
      (format t "Found cat after searching ~D files (did not search ~D other~:P)."
              i (length remaining))
      (return result))))

If do-file expanded into a loop form, then the (loop-finish) would only terminate that loop.
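
The effect is easy to reproduce without any files. In this sketch (the function names are invented for the illustration) the outer loop would run three times if left alone, and the inner iteration calls (loop-finish) partway through.

```lisp
;; Inner iteration written with LOOP: LOOP-FINISH only ends the inner loop,
;; so the outer loop still makes all three passes.
(defun passes-with-inner-loop ()
  (let ((passes 0))
    (loop :for i :from 1 :to 3
          :do (incf passes)
              (loop :for j :from 1 :to 3
                    :do (when (= j 2)
                          (loop-finish))))
    passes))

;; Inner iteration written with DO: LOOP-FINISH belongs to the outer loop,
;; which therefore stops during its first pass.
(defun passes-with-inner-do ()
  (let ((passes 0))
    (loop :for i :from 1 :to 3
          :do (incf passes)
              (do ((j 1 (1+ j)))
                  ((> j 3))
                (when (= j 2)
                  (loop-finish))))
    passes))
```

The first function returns 3 and the second returns 1, which is the behavior a user calling (loop-finish) from their own loop would expect.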

The same issue kind of applies with the implicit block named nil around do. But this is much less surprising for a macro named do-…, and we've documented it in the docstring, so that's probably okay.

Repetition Allergies

Using do here is a little annoying because the init form and the step form are exactly the same. If you're allergic to repeating yourself you could use the #n= and #n# reader macros to get around it:

(do ((,symbol #1=(funcall ,reader ,stream nil ',eof) #1#))
    ((eq ,symbol ',eof))
  ,@body)

I find this more confusing than helpful, but to each their own.
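
For anyone who hasn't met this syntax before: #1= labels the object that follows it, and #1# refers back to that same object, so the reader builds the funcall list once and uses it in both places. A small sketch, with an invented function name, shows the sharing:

```lisp
;; Both elements of the quoted list are the very same (eq) object,
;; because #1# refers back to the object labeled by #1=.
(defun shared-structure-p ()
  (let ((pair '(#1=(a b) #1#)))
    (eq (first pair) (second pair))))
```

Calling (shared-structure-p) returns true.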

Result

We've got a nice little macro for easily iterating over files piece by piece. It can take any reader function that conforms to the usual (read-foo stream eof-error-p eof-value) interface, which means we can write our own reader functions that will compose nicely with the macro.
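
As an illustration, read-integer is used in an earlier example but never defined in this post, so the following is only a guess at what such a function might look like, assuming a file with one integer per line.

```lisp
;; Hypothetical reader function following the (stream eof-error-p eof-value) convention.
;; Assumes one integer per line; a blank or malformed line signals a parse error.
(defun read-integer (&optional (stream *standard-input*)
                               (eof-error-p t)
                               eof-value)
  (let ((line (read-line stream eof-error-p eof-value)))
    (if (eq line eof-value)
        eof-value                 ; pass the caller's sentinel straight through
        (parse-integer line))))

;; Usage with a string stream instead of a file:
;; (with-input-from-string (in (format nil "1~%22~%"))
;;   (list (read-integer in nil :eof)
;;         (read-integer in nil :eof)
;;         (read-integer in nil :eof)))
```

The important part is returning eof-value untouched at end of file, since that is the only thing the macro checks to decide when to stop.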

We'll end with an exercise for the reader: figure out how to support declarations correctly. For example:

(do-file (n "numbers.txt" :reader #'read-fixnum)
  (declare (type fixnum n))
  (when (primep n)
    (collect (* n n))))

Hint: you'll need to deal with the sentinel value a bit differently so it doesn't contaminate the type of the bound variable.