An interpreter for a trivial imperative language, part 1 (semantics)

Posted on September 6, 2017

Preliminaries

This is a beginners’ tutorial on how to write an interpreter for a really simple imperative language. I wrote it because my friend asked me for some information on where to even begin looking around for such a thing, and I couldn’t redirect him to a specific place, so here we are.

What you’ll need

First of all, I require basic knowledge of Haskell and some understanding of fundamental mathematical concepts, like functions. We won’t do any category theory or advanced programming, however I recommend you take a peek at monad transformers. Also, I am assuming you know what a BNF Grammar is.

We will also use some crazy software which will solve the problem of parsing for us - bnfc. In order to produce Haskell, it requires also alex and happy. You can get all of them from most package managers, including apt.

Notation

I’m going to use some convenient notation shortcuts, listed here:

What we’re going to do

A programming language is defined by its syntax and semantics. The syntax is usually defined with a context-free grammar, which is the case for our working example. Note, however, that most languages are not context-free, for instance Python.

Like I mentioned before, we will not implement a parser, making use of existing tools.

What we will focus on mostly are the language’s semantics. To put it as simple as it gets: program’s semantics are a mathematical model of the computation the program represents.

The language

The language we’ll be implementing is called Tiny where I come from. We won’t prove its Turing completeness, which some madmen might attempt. However this language represents all of the most basic constructs of imperative programming.

Syntax

Tiny’s syntax consists of the following clauses:

This is it! All Tiny can do is an assignment, an if statement and a while loop. It looks pretty easy for now, especially since we know what all those constructs should do. The next question we’ll be facing is: what do those syntactic constructs actually mean?

Semantics

The crucial question here is how do we model a programming language’s behaviour? The correct answer to this question is: it depends on the language! Obviously, a purely functional language, like Haskell, would not require a state to be a part of the semantics, unlike any of the imperative languages known to man.

In order to avoid a lengthy discussion on various approaches to defining semantics, we’ll answer a simpler question: what do syntactic constructs of Tiny mean?

Let’s begin with everything but statements. Expressions’ values depend on what has happend in the code before their evaluation, and might even cause some errors, like undeclared variable or division by zero. In order to devise the semantics of expressions, first we need to define what a state of the program is. At any given time, a Tiny program’s state is defined by what all variables’ values are. At the beginning none of them are defined and they stop being undefined when something is assigned to them for the first time.

This means that a state can be modelled as a partial function \[ s \in \textbf{State} = \textbf{Var} \rightharpoonup \mathbb{Z} \]

Recall that a partial function is a function that might not be defined everywhere on its domain, in other words it is an entire function

\[ s : \textbf{Var} \to \mathbb{Z} \cup \{\perp\} \]

where \( s ~ x = \perp \) means that \( s \) is undefined on \( x \).

\[ \mathcal{E}: \textbf{Var} \to \textbf{State} \rightharpoonup \mathbb{N} \] \[ \mathcal{E} (n) = \lambda s . \mathcal{N}(n) \] \[ \mathcal{E} (x) = \mathcal{V}(x) \] \[ \mathcal{E} (e_1 + e_2) = \lambda s . (\mathcal{E}(e_1) s + \mathcal{E}(e_2) s)\] \[ \mathcal{E} (e_1 * e_2) = \lambda s . (\mathcal{E}(e_1) s * \mathcal{E}(e_2) s)\] \[ \mathcal{E} (e_1 - e_2) = \lambda s . (\mathcal{E}(e_1) s - \mathcal{E}(e_2) s)\]

Note: We implicitly assume that errors propagate, i.e. if \( V(e_1) = \perp \), then so is \( V(e_1 + e_2 \). As you can imagine, in such situations monads come in very handy.

\[ \mathcal{B} : \textbf{BExp} \to \textbf{State} \rightharpoonup \{ \mathrm{tt}, \mathrm{ff} \}\] \[ \mathcal{B} (\texttt{true}) = \lambda s . \mathrm{tt} \] \[ \mathcal{B} (\texttt{false}) = \lambda s . \mathrm{ff} \] \[ \mathcal{B} (e_1 \leq e_2) = \lambda s . \left( \mathcal{E}(e_1) s \leq \mathcal{E}(e_2) s \right) \] \[ \mathcal{B} (\texttt{not } b') = \lambda s . \left( \neg \mathcal{B}(b') s \right) \]

Everything looks fine for now. Now, let’s head to the beef - statements.

And that would be all! We have constructed a very simple mathematical model for the semantics of Tiny’s semantics. In the next post, we will use this model to implement an actual interpreter for this language, possibly using some cheats for I/O.