Skip to content

Latest commit

 

History

History
178 lines (145 loc) · 7.5 KB

README.org

File metadata and controls

178 lines (145 loc) · 7.5 KB

Simple Lisp

Simple Lisp interpreter made in C with no dependencies.

Some good resources for learning about Lisp:

Building

$ git clone https://github.com/8dcc/sl
$ cd sl
$ make
...

Usage

The full manual can be found in the doc folder. It’s meant to be exported from .org to .texi, and then from .texi to .pdf, .html, .info, etc.

The test folder also contains the Lisp code used for testing the interpreter.

$ ./sl
sl> (+ 1 2 (- 5 4) (* 3 4) 10)
26

sl> (define my-addition +)
<primitive 0x55d31f51ba21>

sl> (define my-global 10.0)
10.000000

sl> (define my-lambda
      (lambda (x y)
        (* my-global (my-addition x y))))
<lambda>

sl> (my-lambda 3 4)
70.000000

sl> (define defmacro
      (macro (name formals body)
        (list 'define name (list 'macro formals body))))
<macro>

sl> (defmacro inc (var)
      (list 'define var (list '+ var 1)))
<macro>

sl> (inc my-global)
11.000000

sl> my-global
11.000000

Overview of the code

This Lisp has a few components that are responsible for their own data. This is the basic REPL process:

  1. The user input is read using read_expr(), defined in read.c. This function will read a single Lisp expression across lines, and save the data in an allocated string.
  2. The raw user input is converted into an array of Token structures using the tokenize() function, which calls the static function get_token(). This step might be redundant for a simple language as Lisp, but I decided to do it anyway. After this step the Token array could be printed using token_print(), if needed. All these functions are defined in lexer.c.
  3. The Token array is converted into a linked list of Expr structures using the parse() function from parser.c. This linked list is essentially the Abstract Syntax Tree (AST). At this point, nothing has been evaluated; it should just be a different representation of the user input. All the values allocated by tokenize() have been copied into the AST instead of reused, so they can be freed safely.
  4. We evaluate the expression using eval(), defined in eval.c. This function will return another linked list of Expr structures, but just like parse(), it will not reuse any data in the heap, so the old Expr* can be freed safely. This is the rough evaluation process:
    1. Before evaluating the expression, it checks if the current expression is a Special Form, which are evaluated differently from procedure calls. For example quote, lambda or if. The arguments of these special forms are passed to the C primitives (defined in primitives.c) un-evaluated.
    2. If it wasn’t a special form, the expression type is checked. Some expressions like numbers evaluate to themselves, while some others don’t.
    3. If the expression was a symbol, the environment is searched to find the expression bound to that symbol. This is done with the env_get() function, defined in env.c.
    4. If the expression was a list, it is evaluated as a procedure or macro call. This is the evaluation process of a call:
      1. The first element of the list is evaluated. It should return either a lambda, a macro or a C primitive.
      2. If the function was a macro, the arguments are not evaluated and they are passed to macro_call(), defined in lambda.c. From there, the macro is expanded with macro_expand() and the expanded expression is evaluated and returned. For more information on how macros behave in this Lisp, see Emacs Lisp manual.
      3. If the function was not a macro, each argument should be evaluated before calling the function. This is done with the eval_args() static function. Then the function is applied to the arguments with apply(), also defined in eval.c.
      4. The function type is checked inside apply(). If it’s a C primitive, the function pointer stored in the Expr is called with the arguments we got from eval(). If it’s a lambda, it is called using lambda_call(), defined in lambda.c.
      5. The lambda_call() function operates on the LambdaCtx structure of the Expr. It binds each formal argument to the lambda’s environment; sets the parent environment (so the body can access globals); and evaluates each expression in the body in order, returning the last one.
  5. After that, we print the evaluated expression using expr_print(), defined in expr.c.

Todo list

These are some things that need to be done. Feel free to make a PR if you want to contribute.

Better closures

In the following example:

(define get-inner
  (lambda (a)
    (lambda (b)
      (+ a b))))

(define inner (get-inner 10))
(inner 20) ; Error: Unbound symbol `a'

The function inner can’t access a, even though it was defined when get-inner returned the lambda. This happens because the parent environment of inner is set whenever it gets called, not when it’s defined. This means that it can access global symbols, and even a as long as inner was called within the body of get-inner.

Tail-call optimization

The following code defines a recursive procedure that performs an iterative process.

(defun sum-iter (i end total)
  (if (> i end)
      total
      (sum-iter (+ i 1)
                end
                (+ total i))))

(sum-iter 1 5 0) ; 15

Even though that procedure is recursive, since it calls itself, the process is iterative, because it has all the necessary information for continuing the computation in its parameters. The interpreter doesn’t need to keep track of where it was called from, it can just jump to the start of the function with the new parameters and no information will be lost. This jump optimization is called tail-call optimization, and an interpreter with this feature is called tail-recursive. For more information, see section 1.2.1 of SICP.