(How to Write a (Lisp) Interpreter (In Python)) (2010)

Why does understanding compilers and interpreters matter? This article guides you step-by-step through building a simple yet fully functional Scheme (a Lisp dialect) interpreter in Python.
Why does this matter? As Steve Yegge said, "If you don't know how compilers work, then you don't know how computers work." Yegge describes 8 problems that can be solved with compilers (or equally well with interpreters, or with Yegge's typical heavy dosage of cynicism).
Scheme syntax is different from most other programming languages. Consider:
Java has a wide variety of syntactic conventions (keywords, infix operators, three kinds of brackets, operator precedence, dot notation, quotes, commas, semicolons), but Scheme syntax is much simpler:
Java:
if(x.val() > 0) {
return fn(A[i] + 3 * i, new String[] {"one", "two"});
}
Scheme:
(if(> (val x) 0)
(fn (+ (aref A i) (* 3 i))
(quote(one two))))
In this page we will cover all the important points of the Scheme language and its interpretation (omitting some minor details), but we will take two steps to get there, defining a simplified language first, before defining the near-full Scheme language.
Here is a table of all the allowable expressions:
| Expression | Syntax | Semantics and Example | |---|---|---| | variable reference | symbol | A symbol is interpreted as a variable name; its value is the variable's value. Example: r ⇒ 10 (assuming r was previously defined to be 10) | | constant literal | number | A number evaluates to itself. Examples: 12 ⇒ 12 or -3.45e+6 ⇒ -3.45e+6 | | conditional | (if test conseq alt) | Evaluate test; if true, evaluate and return conseq; otherwise alt. Example: (if (> 10 20) (+ 1 1) (+ 3 3)) ⇒ 6 | | definition | (define symbol exp) | Define a new variable and give it the value of evaluating the expression exp. Examples: (define r 10) | | procedure call | (proc arg...) | If proc is anything other than one of the symbols if, define, or quote then it is treated as a procedure. Evaluate proc and all the args, and then the procedure is applied to the list of arg values. Example: (sqrt (* 2 8)) ⇒ 4.0 |
In the Syntax column of this table, symbol must be a symbol, number must be an integer or floating point number, and the other italicized words can be any expression. The notation arg... means zero or more repetitions of arg.
program ➡➡ abstract-syntax-tree ➡ ➡ result
And here is a short example of what we want parse and eval to be able to do (begin evaluates each expression in order and returns the final one):
>>> program = "(begin (define r 10) (* pi (* r r)))"
>>> parse(program)
['begin', ['define', 'r', 10], ['*', 'pi', ['*', 'r', 'r']]]
>>> eval(parse(program))
314.1592653589793
Symbol = str # A Scheme Symbol is implemented as a Python str
Number = (int, float) # A Scheme Number is implemented as a Python int or float
Atom = (Symbol, Number) # A Scheme Atom is a Symbol or Number
List = list # A Scheme List is implemented as a Python list
Exp = (Atom, List) # A Scheme expression is an Atom or List
Env = dict # A Scheme environment is a mapping of {variable: value}
def tokenize(chars: str) -> list:
"Convert a string of characters into a list of tokens."
return chars.replace('(', ' ( ').replace(')', ' ) ').split()
Here we apply tokenize to our sample program:
>>> program = "(begin (define r 10) (* pi (* r r)))"
>>> tokenize(program)
['(', 'begin', '(', 'define', 'r', '10', ')', '(', '*', 'pi', '(', '*', 'r', 'r', ')', ')', ')']
Our function parse will take a string representation of a program as input, call tokenize to get a list of tokens, and then call read_from_tokens to assemble an abstract syntax tree. read_from_tokens looks at the first token; if it is a ')' that's a syntax error. If it is a '(', then we start building up a list of sub-expressions until we hit a matching ')'. Any non-parenthesis token must be a symbol or number. We'll let Python make the distinction between them: for each non-paren token, first try to interpret it as an int, then as a float, and if it is neither of those, it must be a symbol. Here is the parser:
def parse(program: str) -> Exp:
"Read a Scheme expression from a string."
return read_from_tokens(tokenize(program))
def read_from_tokens(tokens: list) -> Exp:
"Read an expression from a sequence of tokens."
if len(tokens) == 0:
raise SyntaxError('unexpected EOF')
token = tokens.pop(0)
if token == '(':
L = []
while tokens[0] != ')':
L.append(read_from_tokens(tokens))
tokens.pop(0) # pop off ')'
return L
elif token == ')':
raise SyntaxError('unexpected )')
else:
return atom(token)
def atom(token: str) -> Atom:
"Numbers become numbers; every other token is a symbol."
try:
return int(token)
except ValueError:
try:
return float(token)
except ValueError:
return Symbol(token)
>>> program = "(begin (define r 10) (* pi (* r r)))"
>>> parse(program)
['begin', ['define', 'r', 10], ['*', 'pi', ['*', 'r', 'r']]]
We're almost ready to define the environment:
import math
import operator as op
def standard_env() -> Env:
"An environment with some Scheme standard procedures."
env = Env()
env.update(vars(math)) # sin, cos, sqrt, pi, ...
env.update({
'+':op.add, '-':op.sub, '*':op.mul, '/':op.truediv,
'>':op.gt, '<':op.lt, '>=':op.ge, '<=':op.le, '=':op.eq,
'abs': abs,
'append': op.add,
'apply': lambda proc, args: proc(*args),
'begin': lambda *x: x[-1],
'car': lambda x: x[0],
'cdr': lambda x: x[1:],
'cons': lambda x,y: [x] + y,
'eq?': op.is_,
'expt': pow,
'equal?': op.eq,
'length': len,
'list': lambda *x: List(x),
'list?': lambda x: isinstance(x, List),
'map': map,
'max': max,
'min': min,
'not': op.not_,
'null?': lambda x: x == [],
'number?': lambda x: isinstance(x, Number),
'print': print,
'procedure?': callable,
'round': round,
'symbol?': lambda x: isinstance(x, Symbol),
})
return env
global_env = standard_env()
We are now ready for the implementation of eval:
def eval(x: Exp, env=global_env) -> Exp:
"Evaluate an expression in an environment."
if isinstance(x, Symbol): # variable reference
return env[x]
elif isinstance(x, Number): # constant number
return x
elif x[0] == 'if': # conditional
(_, test, conseq, alt) = x
exp = (conseq if eval(test, env) else alt)
return eval(exp, env)
elif x[0] == 'define': # definition
(_, symbol, exp) = x
env[symbol] = eval(exp, env)
else: # procedure call
proc = eval(x[0], env)
args = [eval(arg, env) for arg in x[1:]]
return proc(*args)
We're done! You can see it all in action:
>>> eval(parse("(begin (define r 10) (* pi (* r r)))"))
314.1592653589793
def repl(prompt='lis.py> '):
"A prompt-read-eval-print loop."
while True:
val = eval(parse(input(prompt)))
if val is not None:
print(schemestr(val))
def schemestr(exp):
"Convert a Python object back into a Scheme-readable string."
if isinstance(exp, List):
return '(' + ' '.join(map(schemestr, exp)) + ')'
else:
return str(exp)
>>> repl()
lis.py> (define r 10)
lis.py> (* pi (* r r))
314.159265359
lis.py> (if (> (* 11 11) 120) (* 7 6) oops)
42
lis.py> (list (+ 1 1) (+ 2 2) (* 2 3) (expt 2 3))
(2 4 6 8)
Source: Hacker News










