To teach a friend how compilers work, I've written a compiler for a small subset of Pascal. It lexically analyses and parses a program, code-generates the syntax tree into code for a stack machine not unlike the Ocode virtual machine once used by C's ancestral language BCPL, and interprets the code. I'm going to write a few essays about the compiler and its target machine; in this first one, I'll demonstrate how a stack machine evaluates expressions.
A good place to start is reverse Polish. As many readers will know, this is a notation for expressions whereby all operators and functions follow their arguments. It's important because once we have the reverse Polish form of an expression, we can trivially transcribe it into a program that evaluates the expression.
Here's a very simple example. The reverse Polish form of the expression
11 * 22is
11 22 *
This works with any number of operators. So, for example, the reverse Polish form of the expression
11 * 22 + 33is
11 22 * 33 +
Notice that this is the reverse Polish for E + 33
, which
is E 33 +
, with E
replaced by
the reverse Polish for 11 * 22
. Reverse
Polish has this lovely recursive structure.
What's magic about this is that we can transcribe any reverse-Polish expression into a machine-code program that evaluates the expression. Providing, that is, that our machine has a stack.
So,
assume a stack. Let
there be a loadconst
instruction,
which
pushes a constant onto the stack. And
let there be an add
instruction, which pops the top
two elements off the stack, adds them, and pushes
the result back. Let mult
operate
similarly. And let us transcribe the reverse
Polish above, replacing constants by
loadconst
, and renaming operators
the corresponding instruction.
Doing this,
11 22 * 33 +becomes
loadconst 11 loadconst 22 mult loadconst 33 add
I fed the instruction sequence above to my stack-machine interpreter, and this is what it displayed. The stack, which starts off empty, is shown as a list, with the topmost element on the left:
Stack: [] About to do OP(loadconst, 11) Stack: [11] About to do OP(loadconst, 22) Stack: [22, 11] About to do OP(mult) Stack: [242] About to do OP(loadconst, 33) Stack: [33, 242] About to do OP(add) Stack: [275]
Here's a bigger example. I'm going to evaluate this expression:
((11 * 22 + 33 * 44) / 121 = 14) and (10 = 20 / 2)
And here is the same expression rewritten in reverse Polish. The line breaks have no significance, they just stop me overflowing the blog's right-hand margin:
11 22 * 33 44 * + 121 / 14 = 10 20 2 / = and
This is the reverse Polish transcribed into machine instructions:
loadconst 11 loadconst 22 mult loadconst 33 loadconst 44 mult add loadconst 121 div loadconst 14 eq loadconst 10 loadconst 20 loadconst 2 div eq logand
I've used a greater variety of
instructions here. The div
instruction divides the element next to
the top of the stack by the element on the
top, popping them and
pushing back the integer
quotient.
The eq
instruction
compares the top two elements, pops them, and
pushes back 1 (signifying TRUE) if they are equal, 0
(signifying FALSE)
otherwise. And the logand
instruction does a logical "and", popping
the top two elements and pushing
back 1 if they are both 1, otherwise 0.
Now here is my interpreter interpreting those instructions:
Stack: [] About to do OP(loadconst, 11) Stack: [11] About to do OP(loadconst, 22) Stack: [22, 11] About to do OP(mult) Stack: [242] About to do OP(loadconst, 33) Stack: [33, 242] About to do OP(loadconst, 44) Stack: [44, 33, 242] About to do OP(mult) Stack: [1452, 242] About to do OP(add) Stack: [1694] About to do OP(loadconst, 121) Stack: [121, 1694] About to do OP(div) Stack: [14] About to do OP(loadconst, 14) Stack: [14, 14] About to do OP(eq) Stack: [1] About to do OP(loadconst, 10) Stack: [10, 1] About to do OP(loadconst, 20) Stack: [20, 10, 1] About to do OP(loadconst, 2) Stack: [2, 20, 10, 1] About to do OP(div) Stack: [10, 10, 1] About to do OP(eq) Stack: [1, 1] About to do OP(logand) Stack: [1]
The two equality comparisons both returned TRUE, that is 1; and the logical "and" combined these, pushing a final 1 onto the stack.
In my next essay on this topic, I'll extend the machine to implement variables, jumps, and input and output. By the way, it has not been unknown for people to code directly in reverse Polish — as The Evolution of RPN & Numeric Entry in Hewlett-Packard's The Museum of HP Calculators tells us.