Build Your Own Brainfuck Interpreter

This challenge is to build your own Brainf*ck Interpreter. Just in case you’re wondering, yes the * should be a u, but Coding Challenges is family friendly so we’re avoiding the direct profanity (aka, my kids read Coding Challenges and will tell me off for it).

Brainf*ck is an esoteric programming language that was created by Urban Müller in 1993. Apparently the intention was to be able to create a language that allowed for the creation of a tiny compiler. The first compiler being just 296 bytes, which later shrank to 240 bytes. There’s now one that is around 170 bytes!

The Challenge - Building A BrainF*ck Interpreter

In this coding challenge you’re going to be building an interpreter for the esoteric programming language Brainf*ck. Then you’re going to implement a simple compiler to compile BF code to bytecode, and a virtual machine that will run the bytecode. Finally you’ll optimise the compiler and virtual machine to drastically improve the performance.

So why is Brainf*ck interesting? Because it is an example of a Turing-complete language, therefore like any Turing-complete language, Brainf*ck is theoretically capable of computing any computable function or simulating any other computational model if given access to an unlimited amount of memory and time.

In reality, it's not practical for real-world usage as it provides almost no abstraction making it tedious to develop anything other than toy programs with it. On the flip side the lack of abstraction makes it very easy to build an interpreter and compiler for. Thus it’s a great first programming language to use to learn a little bit about how they’re built.

**The Brainf*ck Language And Memory Model**

Brainf*ck operates on an array of memory cells, each initialised to zero. In the original implementation, the array was 30,000 cells long. There is an instruction pointer than points to the command to be executed, which then moves forward after the instruction has been executed. There is a data pointer that initially points to the first memory cell.

The language consists of eight commands:

Command	Description
`>`	Move the pointer to the right
`<`	Move the pointer to the left
`+`	Increment the memory cell at the pointer
`-`	Decrement the memory cell at the pointer
`.`	Output the character signified by the cell at the pointer
`,`	Input a character and store it in the cell at the pointer
`[`	Jump past the matching ] if the cell at the pointer is 0
`]`	Jump back to the matching [ if the cell at the pointer is nonzero

All characters other than ><+-.,[] are considered to be comments and are ignored.

Step Zero

In this step you’re going to set your environment up ready to begin developing and testing your solution. I’ll leave you to setup your IDE / editor of choice and programming language of choice.

I’d also suggest having a quick go at writing a simple Brainf*ck program. A good start would be working out how to print out the message Hello! If you need help, you’ll find the answer is at the end of this Coding Challenge.

Step 1

In this step your goal is to write a simple Read-Evaluate-Print-Loop (REPL) for your Brainf*ck interpreter.

The REPL should allow the user to enter Brainf*ck code and for now it should simply echo it back to the user. We’ll start interpreting it in later steps. That should look something like this:

% ccbf
CCBF> ++++++++++[>+>+++>+++++++>++++++++++<<<<-]>>>++.>+++++.<<<.
++++++++++[>+>+++>+++++++>++++++++++<<<<-]>>>++.>+++++.<<<.

Part two of this step is to allow the user to provide a Brainf*ck script as a command line argument and have it printed out, something like this, assuming you’ve copied the code above into the file test.bf:

% ccbf test.bf
++++++++++[>+>+++>+++++++>++++++++++<<<<-]>>>++.>+++++.<<<.

Step 2

In this step your goal is to interpret the Brainf*ck code and produce the output of the program being interpreted. To do that you’ll need to read the source code a character at a time and take the appropriate action.

Remember if the input character isn’t one of the eight instructions it should be treated as a comment and ignored. You can treat the memory as a simple array, setting the size to 30,000 (or make it customisable). Be sure to initialise every element to zero.

You’ll need to have a program counter that keeps track of where you are in the source code being interpreted. It will normally increase by one after every instruction (or comment), but you’ll have to allow for increasing and decreasing it to find the matching jump instructions (i.e. the [ and ] instructions).

You’ll also need a data counter / pointer that is incremented/decremented by the > and < instructions. This is used to determine which memory cell the other operations apply to.

When you’re ready run the following test code to check your interpreter works:

This is a test Brainf*ck script written
for Coding Challenges!
++++++++++[>+>+++>+++++++>++++++++++<<<
<-]>>>++.>+.+++++++..+++.<<++++++++++++
++.------------.>-----.>.-----------.++
+++.+++++.-------.<<.>.>+.-------.+++++
++++++..-------.+++++++++.-------.--.++
++++++++++++. What does it do?

Congratulations you’ve now written a simple programming language interpreter!

Now for the big test, try running the code to generate a Mandelbrot set on it! You can find the Brainf*ck code for a Mandelbrot set on the Coding Challenges website.

Copy the code and run it locally, I suggest you use time to see how long it takes to run:

% time ./ccbf mandelbrot.bf > /dev/null 2>&1
./ccbf mandelbrot.bf > /dev/null 2>&1  25.12s user 0.02s system 99% cpu 5.843 total

I’ve redirected stderr and stdout here to keep the example clear, you should check that when running the code your interpreter does actually print out a Mandelbrot set.

I’m sure you’ve noticed it is slow! Probably in the region of 20 to 40 seconds if you’ve implemented the interpreter without having jumped ahead to do any optimisation (and depending on the speed of your CPU).

So now we’re going to look at making it faster in the next couple of steps.

Step 3

In this step your goal is to create a very simple compiler that will target a very simple Virtual Machine (VM). Our VM will then run bytecode instructions instead of directly executing the Brainf*ck source code.

What is Bytecode?

Bytecode is an intermediate, low-level representation of code. It is designed to be executed by a virtual machine (VM). It sits between source code and machine code, acting as an abstraction that is platform-independent.

The name bytecode stems from the fact that the instruction sets usually have one-byte opcodes followed by optional parameters. Intermediate representations such as bytecode are often output by programming language implementations to ease interpretation as you’re going to do in this coding challenge, or they may be used to reduce hardware and operating system dependence by allowing the same code to run cross-platform, on different devices - this was a goal for Java and the Java Virtual Machine.

Bytecode can be either directly executed on a virtual machine or it may be further compiled into machine code for better performance. Bytecode instructions can be arbitrarily complex, though they’re typically relatively simple and similar to the machine code instructions supported by CPUs.

Bytecode is widely used, for example:

Java compiles to bytecode targeted at the Java Virtual Machine (JVM).
Scala, Kotlin, Groovy, Clojure all also compile to bytecode for the JVM.
Python compiles to bytecode for the Python interpreter.
Ruby compiles to bytecode for YARV.
C# (and other .NET languages) compile to bytecode for Common Language Runtime (CLR).

Many other programming languages also compile to bytecode which is then run (or interpreted) on a virtual machine, i.e. Raku, PHP, TCL, mawk, Forth, ActionScript, eBPF, Common LISP, Dalvik and even the cryptocurrency Ethereum makes use of bytecode and a VM.

For this step, define a simple bytecode instruction set, one instruction in bytecode for each operation in the source language. In doing so we’ll create our first two optimisations:

Comments will not be copied from the source code to the bytecode, therefore they will no longer need to be “processed” during the interpretation step. Potentially reducing the runtime.
When compiling the code we can identify the beginning and ends of the loops at compile time and set the jump target for both. Thus jumps now only require an assignment to the program counter instead of a loop backwards/forwards through the source code to find the matching bracket.

These are small optimisations but should make the execution measurably faster once you’ve implemented a virtual machine to run the bytecode. Which brings us to step 4.

Step 4

In this step your goal is to implement the virtual machine to run your compiled bytecode on. As with the interpreter it needs a program counter and data pointer. It will also need to read the next opcode and any parameters (only applicable to the loop parameters) and then update the relevant counter or memory cell.

Once you’ve implemented your VM check the programs completed so far still run correctly and time the Mandelbrot set.

Step 5

In this step your goal is to improve the performance of your Brainf*ck interpreter by optimising the virtual machine and the bytecode produced by your compiler.

The low hanging fruit here is to recognise that many instruction sequences are repeats, for example consider the Hello! program:

++++++++++[>+>+++>+++++++>++++++++++<<<<-]>>>++.>+.+++++++..+++.<<+++.

There are ten ‘+’ operations at the beginning. If we could turn that into a single operation it could in theory run in 10% of the time (one instruction on the VM instead of ten).

If you look at the code for most Brainf*ck programs you’ll see these repeated runs of the same operation, it’s not surprising really given the small instruction set. So your goal for this step is to have your compiler recognise repeated operations and emit more efficient bytecode. For example the above program could be just 26 bytecode instructions instead of the 70 source code instructions.

Step 6

In this step your goal is to further improve the performance of your Brainf*ck interpreter by optimising the virtual machine and the bytecode produced by your compiler with higher level operations.

If you really dig into Brainf*ck code you’ll see there are common patterns that occur in the loops, for example consider this snippet:

>>>>[-]>[-]>[-]>[-]>[-]>[-]

The code [-] occurs several times. It is a loop we could optimise out into a single bytecode instruction, instead of three bytecode instructions that might be run hundreds of times. That would be a huge speedup! In fact this operation is run over 100 million times in one program I profiled!

Then there’s the next level up. The same operation is being done to several cells in the code, i.e. [-]>[-]>[-]>[-]>[-]>[-] . Taking a quick look at a few programs shows that there is a pattern of using these repeats so translating the pattern [-]> into an single bytecode instruction will also provide an optimisation.

Identify these operations in your compiler and compile them to a new bytecode instruction. Adapt your VM to support it and time the performance of your optimised compiler and interpreter. You should see it work significantly faster.

Going Further

If you want to take this further, a great first step would be to log out all the loops that are executed and how many times they’re executed. Then starting with the most common patterns, create bytecode instructions for them. Following this process should easily get you to running the Mandelbrot set in < 15% of the original runtime.

Beyond that consider replacing your VM with a Just-In-Time (JIT) compiler or making it produce either assembler or an intermediate representation (IR) for a compiler backend such as LLVM.

Hello! In Brainf*ck!

Here’s how to write Hello! from a Brainf*ck program:

++++++++++[>+>+++>+++++++>++++++++++<<<<-]>>>++.>+.+++++++..+++.<<+++.

If you think your solution is an example other developers can learn from please share it, put it on GitHub, GitLab or elsewhere. Then let me know - ping me a message on the Discord Server, via Twitter or LinkedIn or just post about it there and tag me. Alternately please add a link to it in the Coding Challenges Shared Solutions Github repo.

Get The Challenges By Email

If you would like to receive the coding challenges by email, you can subscribe to the weekly newsletter on SubStack here:

The Challenge - Building A BrainF*ck Interpreter​

The Brainf*ck Language And Memory Model​

Step Zero​

Step 1​

Step 2​

Step 3​

Step 4​

Step 5​

Step 6​

Going Further​

Hello! In Brainf*ck!​

Help Others by Sharing Your Solutions!​

Get The Challenges By Email​