Introduction

For some ungodly reason I decided to write a programming language. As the saying goes, we don’t do things because they are easy, but because we thought they were easy. Could this be a symptom of acute merger-induced joblessness? We might triangulate my motivation until the end of time, so here instead is a sentence: a language has been built.

The following sections describe this untyped, minimalistic, non-strict, purely-functional term-rewriting language. Its name is utrl, an acronym which stands for “Untitled Term Rewriting Language”.

Definition Syntax

A program is a set of definitions. Each definition has three parts: the text def, a pattern, and a constructor, the three of them surrounded in parentheses. This is a program with two definitions:

(def my-pattern my-constructor)
(def A B)

To run a program, you must supply it with input. Parts of the input that match the patterns specified in the program will be replaced with the corresponding constructors. The program above replaces all references to my-pattern with my-constructor and all references to A with B. In what follows, the symbol “↦” will be used to denote the application of a definition.

my-pattern              ↦ my-constructor
(my-pattern my-pattern) ↦ (my-constructor my-constructor)
(my-pattern A)          ↦ (my-constructor B)

If part of the input doesn’t match any of the patterns, it is not transformed.

not-my-pattern     ↦ not-my-pattern
(not-my-pattern A) ↦ (not-my-pattern B)

Symbols and Lists

The text my-pattern and my-constructor, as well as A and B are called symbols. Symbols cannot contain spaces or parentheses. The text (my-pattern my-pattern) is called a list. Lists may contain symbols or other lists, and the elements of a list are separated by whitespace.

Patterns and constructors may be lists, not just symbols.

(def (a list) (another list))

A list-pattern only matches if all of its elements match.

(a list)           ↦ (another list)
(a different list) ↦ (a different list)

Nested Lists

Patterns and constructors may be arbitrarily complicated. When definitions become long enough, it makes sense to write the constructor on a separate line so that it is more clear where the pattern ends and the constructor begins.

(def (1 (2 (3 (4 (5 levels of parentheses)))))
  that-is-a-lot)

Input is rewritten the same way, no matter where the whitespace is in the program.

(1 (2 (3 (4 (5 levels of parentheses)))))
 ↦ that-is-a-lot

Evaluation

If the output of a definition matches another definition, that other definition is applied.

(def Input Step1)
(def Step1 Step2)
(def Step2 Step3)
(def Step3 Done)

Definitions continue to be applied until there are no more matching definitions.

Input
 ↦ Step1
 ↦ Step2
 ↦ Step3
 ↦ Done

Nested Evaluation

If the input is a list, and no definition matches it, then the elements of the list are examined instead. Each element, from left to right, is matched against the available definitions. If a definition matches an element, the element is replaced accordingly. As soon as one element is replaced, the original list (with the replaced element) is considered again for matching.

(def Input (A A A))
(def A B)
(def (B B B) Done)

Input
 ↦ (A A A)
 ↦ (B A A)
 ↦ (B B A)
 ↦ (B B B)
 ↦ Done

Variables

Patterns and constructors may contain variables, which are symbols that start with the dollar-sign character ($). A variable matches anything in the pattern, and its value in the constructor is the portion of input it matched.

(def (greet $name) (Hello $name))

Variables make it possible to write definitions that apply in many circumstances.

(greet Alice)       	↦ (Hello Alice)
(greet Bob)         	↦ (Hello Bob)
(greet (Alice and Bob)) ↦ (Hello (Alice and Bob))

Ellipses on Variables

Variables may be followed by ellipses, which—for brevity—are denoted by two periods (..) instead of the usual three (...). If a variable has ellipses following it in the pattern, it must also have ellipses following it in the constructor.

(def (greet-all $name ..)
  (Hello to all $name ..))

A variable followed by ellipses matches zero or more portions of the input.

(greet-all)                   ↦ (Hello to all)
(greet-all Alice)             ↦ (Hello to all Alice)
(greet-all Alice Bob)         ↦ (Hello to all Alice Bob)
(greet-all Alice Bob Charlie) ↦ (Hello to all Alice Bob Charlie)

Ellipses on Lists

Lists may be followed by ellipses in the pattern. If a list is followed by an ellipsis, it must have at least one variable inside it.

(def (x-coordinates (Point $x $y) ..)
  ($x ..))
(def (y-coordinates (Point $x $y) ..)
  ($y ..))

Variables in such lists are bound to a sequence of values, one value for each time the list matches.

(x-coordinates (Point 1 2) (Point 3 4) (Point 5 6)) ↦ (1 3 5)
(y-coordinates (Point 1 2) (Point 3 4) (Point 5 6)) ↦ (2 4 6)

Lists may also be followed by ellipses in the constructor.

(def (swap-coordinates (Point $x $y) ..)
  ((Point $y $x) ..))

(swap-coordinates (Point a 0) (Point b 1)) 
 ↦ ((Point 0 a) (Point 1 b))

Multiple Ellipses

A variable or list may be surrounded by more than one ellipsis.

(def (flatten (list (list $x ..) ..))
  (list $x .. ..))

This has the effect of binding variables to trees of values.

(flatten (list (list 1 2 3)
               (list 4 5 6)
               (list 7 8 9)))
 ↦ (list 1 2 3 4 5 6 7 8 9)

In the example above, the variable $x is bound to the values in this tree:

    $x ─┐
        │
  ┌─────┼─────┐
  │     │     │
┌─┼─┐ ┌─┼─┐ ┌─┼─┐
│ │ │ │ │ │ │ │ │
1 2 3 4 5 6 7 8 9

Different levels of this tree can be extracted by changing the position of the ellipses in the constructor.

(def (example1 (($x ..) ..))
  (($x ..) ..))
(def (example2 (($x ..) ..))
  ($x .. ..))

(example1 ((1 2 3) (4 5 6) (6 8 9)))
 ↦ ((1 2 3) (4 5 6) (7 8 9))
(example2 ((1 2 3) (4 5 6) (6 8 9)))
 ↦ (1 2 3 4 5 6 7 8 9)

Comments

Single line comments are written with // and multi-line comments are written with /* and */ pairs.

(def /* Hello, world! */ A B) // rewrites A into B

Examples

A utrl program is run by passing the interpreter two files: a definitions file containing the program, and an input file. The interpreter compiles the program—reporting any errors it finds along the way—and, if successful, processes the input according to the program’s definitions. This information can also be found by passing the --help flag to the interpreter:

$ cabal run utrl -- --help
Usage: utrl --defs FILE [--input FILE] [--trace] [--emit STAGES]

  Simple text tree rewriting engine language

Available options:
  --defs FILE              File containing definitions to compile
  --input FILE             File to process using compiled definitions. If no
                           file is provided, simply check the definitions for
                           errors.
  --trace                  Display intermediate computation steps. Has no effect
                           without '--input'.
  --emit STAGES            Emit generated code at given stage. Possible values:
                           1, C0, C1, C2, P0. Supply multiple stages by
                           separating each by a comma, so to emit stages C0 and
                           C2, use --emit 'C0,C2'.
  -h,--help                Show this help text

What follows are some example definition and input files which demonstrate some of what can be accomplished in utrl.

Natural Numbers

We can use utrl to define simple operations on Peano natural numbers:

// File 'add.defs'

/* Addition */
(def ($n + 0)      $n)
(def ($n + (S $m)) (S ($n + $m)))

/* Equality */
(def (0      == 0)      true)
(def ((S $n) == 0)      false)
(def (0      == (S $m)) false)
(def ((S $n) == (S $m)) ($n == $m))

/* Peano natural numbers 1-5 */
(def 1 (S 0))
(def 2 (S 1))
(def 3 (S 2))
(def 4 (S 3))
(def 5 (S 4))

// File 'add.input'

(5 == (2 + 3)) /* this should evaluate to 'true' */

$ cabal run utrl -- --defs ./add.defs --input ./add.input
true
$ # Good, that is what we wanted. '--trace' will show the steps:
$ cabal run utrl -- --defs ./add.defs --input ./add.input --trace
0. (5 == (2 + 3))
1. ((S 4) == (2 + 3))
2. ((S (S 3)) == (2 + 3))
3. ((S (S 3)) == ((S 1) + 3))
4. ((S (S 3)) == ((S 1) + (S 2)))
5. ((S (S 3)) == (S ((S 1) + 2)))
6. ((S 3) == ((S 1) + 2))
7. ((S (S 2)) == ((S 1) + 2))
8. ((S (S 2)) == ((S 1) + (S 1)))
9. ((S (S 2)) == (S ((S 1) + 1)))
10. ((S 2) == ((S 1) + 1))
11. ((S (S 1)) == ((S 1) + 1))
12. ((S (S 1)) == ((S 1) + (S 0)))
13. ((S (S 1)) == (S ((S 1) + 0)))
14. ((S 1) == ((S 1) + 0))
15. ((S 1) == (S 1))
16. (1 == 1)
17. ((S 0) == 1)
18. ((S 0) == (S 0))
19. (0 == 0)
20. true
true

Mergesort

With a bit of effort, it’s possible to write a mergesort algorithm using utrl.

// File 'mergesort.defs'

// Sorts a list of natural numbers via merge sort
//
// sort : List Nat -> List Nat
(def (sort $xs)
  (sort-rec (length $xs) nil $xs))

// Helper function for 'sort'. Partitions its third
// argument (a list) into two pieces, accumulating the
// first piece in the second argument. Returns the
// result of 'merge'ing the final two pieces together.
// The integer argument counts twice the number of
// list elements to split off.
//
// sort-rec : Nat -> List Nat -> List Nat -> List Nat
(def (sort-rec 0 nil nil)
  nil)
(def (sort-rec 0 ($a :: nil) nil)
  ($a :: nil))
(def (sort-rec 0 nil ($x :: nil))
  ($x :: nil))
(def (sort-rec 0 ($a :: nil) ($x :: nil))
  (merge ($a :: nil)
         ($x :: nil)))
(def (sort-rec 0 ($a :: nil) ($x :: ($y :: nil)))
  (merge ($a :: nil)
         (sort ($x :: ($y :: nil)))))
(def (sort-rec 0 ($a :: ($b :: $as)) ($x :: ($y :: $ys)))
  (merge (sort ($a :: ($b :: $as)))
         (sort ($x :: ($y :: $ys)))))
(def (sort-rec (S 0) $as $xs)
  (sort-rec 0 $as $xs))
(def (sort-rec (S (S $n)) $as ($x :: $xs))
  (sort-rec $n ($x :: $as) $xs))

// merge : List Nat -> List Nat -> List Nat
//
// Combines two lists, both in ascending order, into
// a single list in ascending order.
(def (merge nil ($y :: $ys))
  ($y :: $ys))
(def (merge ($x :: $xs) nil)
  ($x :: $xs))
(def (merge ($x :: $xs) ($y :: $ys))
  (if ($x < $y)
    then ($x :: (merge $xs ($y :: $ys)))
    else ($y :: (merge ($x :: $xs) $ys))))

// Returns the length of a list
//
// length : List a -> Nat
(def (length nil)
  0)
(def (length ($x :: $xs))
  (S (length $xs)))

// 'If' statement syntax
(def (if true then $then else $else) $then)
(def (if false then $then else $else) $else)

// A couple translations of arabic numerals into Peano numbers.
// See: https://en.wikipedia.org/wiki/Peano_axioms
(def 1 (S 0))
(def 2 (S 1))
(def 3 (S 2))
(def 4 (S 3))
(def 5 (S 4))
(def 6 (S 5))
(def 7 (S 6))
(def 8 (S 7))
(def 9 (S 8))

// Equality of natural numbers
//
// (==) : Nat -> Nat -> Bool
(def (0 == 0)
  true)
(def ((S $n) == 0)
  false)
(def (0 == (S $m))
  false)
(def ((S $n) == (S $m))
  ($n == $m))

// Equality of lists
//
// (==) : List a -> List a -> Bool
(def (nil == nil)
  true)
(def (($x :: $xs) == nil)
  false)
(def (nil == ($y :: $ys))
  false)
(def (($x :: $xs) == ($y :: $ys))
  (($x == $y) && ($xs == $ys)))

// Logical operator 'AND'
//
// (&&) : Bool -> Bool -> Bool
(def (true && $x) $x)

// 'less-than' operator on natural numbers
//
// (<) : Nat -> Nat -> Bool
(def (0 < 0)
  false)
(def ((S $n) < 0)
  false)
(def (0 < (S $m))
  true)
(def ((S $n) < (S $m))
  ($n < $m))

// List construction syntax: creates cons cells
// from a flat list.
(def (list)
  nil)
(def (list $x $xs ..)
  ($x :: (list $xs ..)))

// File 'mergesort.input'

((list 1 2 3) == (sort (list 3 2 1)))

$ time cabal run utrl -- --defs ./mergesort.defs --input ./mergesort.input
true

real    0m0.014s
user    0m0.008s
sys     0m0.004s
$ # Using '--trace' here produces a lot of output!
$ utrl --defs ./mergesort.defs --input ./mergesort.input --trace
0. ((list 1 2 3) == (sort (list 3 2 1)))
1. ((1 :: (list 2 3)) == (sort (list 3 2 1)))
2. ((1 :: (list 2 3)) == (sort-rec (length (list 3 2 1)) nil (list 3 2 1)))
3. (((S 0) :: (list 2 3)) == (sort-rec (length (list 3 2 1)) nil (list 3 2 1)))
4. (((S 0) :: (2 :: (list 3))) == (sort-rec (length (list 3 2 1)) nil (list 3 2 1)))
5. (((S 0) :: (2 :: (list 3))) == (sort-rec (length (list 3 2 1)) nil (3 :: (list 2 1))))

Intermediate steps ellided for brevity. '--trace' will actually
print every single step. Try it for yourself and see all the text
fly by!

60. (((S 2) :: nil) == ((S (S 1)) :: nil))
61. (((S 2) == (S (S 1))) && (nil == nil))
62. ((2 == (S 1)) && (nil == nil))
63. ((2 == (S 1)) && true)
64. (((S 1) == (S 1)) && true)
65. ((1 == 1) && true)
66. (((S 0) == 1) && true)
67. (((S 0) == (S 0)) && true)
68. ((0 == 0) && true)
69. (true && true)
70. true
true

Brainfuck interpreter

To help demonstrate Turing completeness, I wrote a Brainfuck interpreter in utrl. The definitions file that implements the interpreter can be found in the test directory. It can be used to sort a list via quicksort written in Brainfuck.

Binary Numbers

We have seen it is possible to implement numbers via Peano arithmetic, but this was a rather inefficient number representation. Instead of representing the natural numbers in terms of a series of increments to zero, we can represent them as either (1) zero, (2) two times some natural number, or (3) one plus two times some natural number. These correspond to (1) the binary number 0, (2) the act of taking a binary number n and putting a zero on the right-hand side, as in n0, and (3) the act of taking a binary number n and putting a one on the right-hand side, as in n1. This strategy requires only a factor of log_2(n) (as opposed to n) bits to represent the number n. Here is a utrl program that implements basic binary arithmetic:

// Converts a flat list of bits into our internal binary number
// representation:
//
// (bin 1 0 1) --> 1 + 2 * (2 * (1 + 2 * 0))
(def (bin)
  0)
(def (bin $b .. 0)
  (2 * (bin $b ..)))
(def (bin $b .. 1)
  (1 + 2 * (bin $b ..)))

// add : Bin -> Bin -> Bin
(def (add 0 0)
  0)
(def (add (2 * $n) 0)
  (2 * $n))
(def (add 0 (2 * $m))
  (2 * $m))
(def (add (1 + 2 * $n) 0)
  (1 + 2 * $n))
(def (add 0 (1 + 2 * $m))
  (1 + 2 * $m))
(def (add (2 * $n) (2 * $m))
  (2 * (add $n $m)))
(def (add (1 + 2 * $n) (1 + 2 * $m))
  (add (2 * (1 + 2 * 0))
       (2 * (add $n $m))))
(def (add (1 + 2 * $n) (2 * $m))
  (1 + 2 * (add $n $m)))
(def (add (2 * $n) (1 + 2 * $m))
  (1 + 2 * (add $n $m)))

// mul : Bin -> Bin -> Bin
(def (mul $n 0)
  0)
(def (mul $n (2 * $m))
  (2 * (mul $n $m)))
(def (mul $n (1 + 2 * $m))
  (add $n (2 * (mul $n $m))))

// equal : Bin -> Bin -> Bool
(def (equal 0 0)
  true)
(def (equal (1 + 2 * $n) 0)
  false)
(def (equal 0 (1 + 2 * $m))
  false)
(def (equal (2 * $n) (1 + 2 * $m))
  false)
(def (equal (1 + 2 * $n) (2 * $m))
  false)
(def (equal (2 * $n) 0)
  (equal $n 0))
(def (equal 0 (2 * $m))
  (equal 0 $m))
(def (equal (2 * $n) (2 * $m))
  (equal $n $m))
(def (equal (1 + 2 * $n) (1 + 2 * $m))
  (equal $n $m))

Let’s test this code by computing 146 + 296 = 34 * 13 (in binary, of course):

              // 1 1 0 1 1 1 0 1 0
(equal (add (bin   1 0 0 1 0 0 1 0)
            (bin 1 0 0 1 0 1 0 0 0))
       (mul (bin 1 0 0 0 1 0)
            (bin 1 1 0 1)))

This computation requires 83 definition applications to produce the output true. I’ve shown the entirety of it in full because I think it’s fun to see all the work that goes on under the hood:

$ cabal run utrl -- --defs test/programs/binary-arithmetic.defs --input test/programs/binary-arithmetic.input --trace
0. (equal (add (bin 1 0 0 1 0 0 1 0) (bin 1 0 0 1 0 1 0 0 0)) (mul (bin 1 0 0 0 1 0) (bin 1 1 0 1)))
1. (equal (add (2 * (bin 1 0 0 1 0 0 1)) (bin 1 0 0 1 0 1 0 0 0)) (mul (bin 1 0 0 0 1 0) (bin 1 1 0 1)))
2. (equal (add (2 * (bin 1 0 0 1 0 0 1)) (2 * (bin 1 0 0 1 0 1 0 0))) (mul (bin 1 0 0 0 1 0) (bin 1 1 0 1)))
3. (equal (2 * (add (bin 1 0 0 1 0 0 1) (bin 1 0 0 1 0 1 0 0))) (mul (bin 1 0 0 0 1 0) (bin 1 1 0 1)))
4. (equal (2 * (add (bin 1 0 0 1 0 0 1) (bin 1 0 0 1 0 1 0 0))) (mul (2 * (bin 1 0 0 0 1)) (bin 1 1 0 1)))
5. (equal (2 * (add (bin 1 0 0 1 0 0 1) (bin 1 0 0 1 0 1 0 0))) (mul (2 * (bin 1 0 0 0 1)) (1 + 2 * (bin 1 1 0))))
6. (equal (2 * (add (bin 1 0 0 1 0 0 1) (bin 1 0 0 1 0 1 0 0))) (add (2 * (bin 1 0 0 0 1)) (2 * (mul (2 * (bin 1 0 0 0 1)) (bin 1 1 0)))))
7. (equal (2 * (add (bin 1 0 0 1 0 0 1) (bin 1 0 0 1 0 1 0 0))) (2 * (add (bin 1 0 0 0 1) (mul (2 * (bin 1 0 0 0 1)) (bin 1 1 0)))))
8. (equal (add (bin 1 0 0 1 0 0 1) (bin 1 0 0 1 0 1 0 0)) (add (bin 1 0 0 0 1) (mul (2 * (bin 1 0 0 0 1)) (bin 1 1 0))))
9. (equal (add (1 + 2 * (bin 1 0 0 1 0 0)) (bin 1 0 0 1 0 1 0 0)) (add (bin 1 0 0 0 1) (mul (2 * (bin 1 0 0 0 1)) (bin 1 1 0))))
10. (equal (add (1 + 2 * (bin 1 0 0 1 0 0)) (2 * (bin 1 0 0 1 0 1 0))) (add (bin 1 0 0 0 1) (mul (2 * (bin 1 0 0 0 1)) (bin 1 1 0))))
11. (equal (1 + 2 * (add (bin 1 0 0 1 0 0) (bin 1 0 0 1 0 1 0))) (add (bin 1 0 0 0 1) (mul (2 * (bin 1 0 0 0 1)) (bin 1 1 0))))
12. (equal (1 + 2 * (add (bin 1 0 0 1 0 0) (bin 1 0 0 1 0 1 0))) (add (1 + 2 * (bin 1 0 0 0)) (mul (2 * (bin 1 0 0 0 1)) (bin 1 1 0))))
13. (equal (1 + 2 * (add (2 * (bin 1 0 0 1 0)) (bin 1 0 0 1 0 1 0))) (add (1 + 2 * (bin 1 0 0 0)) (mul (2 * (bin 1 0 0 0 1)) (bin 1 1 0))))
14. (equal (1 + 2 * (add (2 * (bin 1 0 0 1 0)) (2 * (bin 1 0 0 1 0 1)))) (add (1 + 2 * (bin 1 0 0 0)) (mul (2 * (bin 1 0 0 0 1)) (bin 1 1 0))))
15. (equal (1 + 2 * (2 * (add (bin 1 0 0 1 0) (bin 1 0 0 1 0 1)))) (add (1 + 2 * (bin 1 0 0 0)) (mul (2 * (bin 1 0 0 0 1)) (bin 1 1 0))))
16. (equal (1 + 2 * (2 * (add (bin 1 0 0 1 0) (bin 1 0 0 1 0 1)))) (add (1 + 2 * (2 * (bin 1 0 0))) (mul (2 * (bin 1 0 0 0 1)) (bin 1 1 0))))
17. (equal (1 + 2 * (2 * (add (bin 1 0 0 1 0) (bin 1 0 0 1 0 1)))) (add (1 + 2 * (2 * (bin 1 0 0))) (mul (2 * (bin 1 0 0 0 1)) (2 * (bin 1 1)))))
18. (equal (1 + 2 * (2 * (add (bin 1 0 0 1 0) (bin 1 0 0 1 0 1)))) (add (1 + 2 * (2 * (bin 1 0 0))) (2 * (mul (2 * (bin 1 0 0 0 1)) (bin 1 1)))))
19. (equal (1 + 2 * (2 * (add (bin 1 0 0 1 0) (bin 1 0 0 1 0 1)))) (1 + 2 * (add (2 * (bin 1 0 0)) (mul (2 * (bin 1 0 0 0 1)) (bin 1 1)))))
20. (equal (2 * (add (bin 1 0 0 1 0) (bin 1 0 0 1 0 1))) (add (2 * (bin 1 0 0)) (mul (2 * (bin 1 0 0 0 1)) (bin 1 1))))
21. (equal (2 * (add (2 * (bin 1 0 0 1)) (bin 1 0 0 1 0 1))) (add (2 * (bin 1 0 0)) (mul (2 * (bin 1 0 0 0 1)) (bin 1 1))))
22. (equal (2 * (add (2 * (bin 1 0 0 1)) (1 + 2 * (bin 1 0 0 1 0)))) (add (2 * (bin 1 0 0)) (mul (2 * (bin 1 0 0 0 1)) (bin 1 1))))
23. (equal (2 * (1 + 2 * (add (bin 1 0 0 1) (bin 1 0 0 1 0)))) (add (2 * (bin 1 0 0)) (mul (2 * (bin 1 0 0 0 1)) (bin 1 1))))
24. (equal (2 * (1 + 2 * (add (bin 1 0 0 1) (bin 1 0 0 1 0)))) (add (2 * (2 * (bin 1 0))) (mul (2 * (bin 1 0 0 0 1)) (bin 1 1))))
25. (equal (2 * (1 + 2 * (add (bin 1 0 0 1) (bin 1 0 0 1 0)))) (add (2 * (2 * (bin 1 0))) (mul (2 * (bin 1 0 0 0 1)) (1 + 2 * (bin 1)))))
26. (equal (2 * (1 + 2 * (add (bin 1 0 0 1) (bin 1 0 0 1 0)))) (add (2 * (2 * (bin 1 0))) (add (2 * (bin 1 0 0 0 1)) (2 * (mul (2 * (bin 1 0 0 0 1)) (bin 1))))))
27. (equal (2 * (1 + 2 * (add (bin 1 0 0 1) (bin 1 0 0 1 0)))) (add (2 * (2 * (bin 1 0))) (2 * (add (bin 1 0 0 0 1) (mul (2 * (bin 1 0 0 0 1)) (bin 1))))))
28. (equal (2 * (1 + 2 * (add (bin 1 0 0 1) (bin 1 0 0 1 0)))) (2 * (add (2 * (bin 1 0)) (add (bin 1 0 0 0 1) (mul (2 * (bin 1 0 0 0 1)) (bin 1))))))
29. (equal (1 + 2 * (add (bin 1 0 0 1) (bin 1 0 0 1 0))) (add (2 * (bin 1 0)) (add (bin 1 0 0 0 1) (mul (2 * (bin 1 0 0 0 1)) (bin 1)))))
30. (equal (1 + 2 * (add (1 + 2 * (bin 1 0 0)) (bin 1 0 0 1 0))) (add (2 * (bin 1 0)) (add (bin 1 0 0 0 1) (mul (2 * (bin 1 0 0 0 1)) (bin 1)))))
31. (equal (1 + 2 * (add (1 + 2 * (bin 1 0 0)) (2 * (bin 1 0 0 1)))) (add (2 * (bin 1 0)) (add (bin 1 0 0 0 1) (mul (2 * (bin 1 0 0 0 1)) (bin 1)))))
32. (equal (1 + 2 * (1 + 2 * (add (bin 1 0 0) (bin 1 0 0 1)))) (add (2 * (bin 1 0)) (add (bin 1 0 0 0 1) (mul (2 * (bin 1 0 0 0 1)) (bin 1)))))
33. (equal (1 + 2 * (1 + 2 * (add (bin 1 0 0) (bin 1 0 0 1)))) (add (2 * (2 * (bin 1))) (add (bin 1 0 0 0 1) (mul (2 * (bin 1 0 0 0 1)) (bin 1)))))
34. (equal (1 + 2 * (1 + 2 * (add (bin 1 0 0) (bin 1 0 0 1)))) (add (2 * (2 * (bin 1))) (add (1 + 2 * (bin 1 0 0 0)) (mul (2 * (bin 1 0 0 0 1)) (bin 1)))))
35. (equal (1 + 2 * (1 + 2 * (add (2 * (bin 1 0)) (bin 1 0 0 1)))) (add (2 * (2 * (bin 1))) (add (1 + 2 * (bin 1 0 0 0)) (mul (2 * (bin 1 0 0 0 1)) (bin 1)))))
36. (equal (1 + 2 * (1 + 2 * (add (2 * (bin 1 0)) (1 + 2 * (bin 1 0 0))))) (add (2 * (2 * (bin 1))) (add (1 + 2 * (bin 1 0 0 0)) (mul (2 * (bin 1 0 0 0 1)) (bin 1)))))
37. (equal (1 + 2 * (1 + 2 * (1 + 2 * (add (bin 1 0) (bin 1 0 0))))) (add (2 * (2 * (bin 1))) (add (1 + 2 * (bin 1 0 0 0)) (mul (2 * (bin 1 0 0 0 1)) (bin 1)))))
38. (equal (1 + 2 * (1 + 2 * (1 + 2 * (add (bin 1 0) (bin 1 0 0))))) (add (2 * (2 * (1 + 2 * (bin)))) (add (1 + 2 * (bin 1 0 0 0)) (mul (2 * (bin 1 0 0 0 1)) (bin 1)))))
39. (equal (1 + 2 * (1 + 2 * (1 + 2 * (add (bin 1 0) (bin 1 0 0))))) (add (2 * (2 * (1 + 2 * (bin)))) (add (1 + 2 * (2 * (bin 1 0 0))) (mul (2 * (bin 1 0 0 0 1)) (bin 1)))))
40. (equal (1 + 2 * (1 + 2 * (1 + 2 * (add (bin 1 0) (bin 1 0 0))))) (add (2 * (2 * (1 + 2 * (bin)))) (add (1 + 2 * (2 * (bin 1 0 0))) (mul (2 * (bin 1 0 0 0 1)) (1 + 2 * (bin))))))
41. (equal (1 + 2 * (1 + 2 * (1 + 2 * (add (bin 1 0) (bin 1 0 0))))) (add (2 * (2 * (1 + 2 * (bin)))) (add (1 + 2 * (2 * (bin 1 0 0))) (add (2 * (bin 1 0 0 0 1)) (2 * (mul (2 * (bin 1 0 0 0 1)) (bin)))))))
42. (equal (1 + 2 * (1 + 2 * (1 + 2 * (add (bin 1 0) (bin 1 0 0))))) (add (2 * (2 * (1 + 2 * (bin)))) (add (1 + 2 * (2 * (bin 1 0 0))) (2 * (add (bin 1 0 0 0 1) (mul (2 * (bin 1 0 0 0 1)) (bin)))))))
43. (equal (1 + 2 * (1 + 2 * (1 + 2 * (add (bin 1 0) (bin 1 0 0))))) (add (2 * (2 * (1 + 2 * (bin)))) (1 + 2 * (add (2 * (bin 1 0 0)) (add (bin 1 0 0 0 1) (mul (2 * (bin 1 0 0 0 1)) (bin)))))))
44. (equal (1 + 2 * (1 + 2 * (1 + 2 * (add (bin 1 0) (bin 1 0 0))))) (1 + 2 * (add (2 * (1 + 2 * (bin))) (add (2 * (bin 1 0 0)) (add (bin 1 0 0 0 1) (mul (2 * (bin 1 0 0 0 1)) (bin)))))))
45. (equal (1 + 2 * (1 + 2 * (add (bin 1 0) (bin 1 0 0)))) (add (2 * (1 + 2 * (bin))) (add (2 * (bin 1 0 0)) (add (bin 1 0 0 0 1) (mul (2 * (bin 1 0 0 0 1)) (bin))))))
46. (equal (1 + 2 * (1 + 2 * (add (2 * (bin 1)) (bin 1 0 0)))) (add (2 * (1 + 2 * (bin))) (add (2 * (bin 1 0 0)) (add (bin 1 0 0 0 1) (mul (2 * (bin 1 0 0 0 1)) (bin))))))
47. (equal (1 + 2 * (1 + 2 * (add (2 * (bin 1)) (2 * (bin 1 0))))) (add (2 * (1 + 2 * (bin))) (add (2 * (bin 1 0 0)) (add (bin 1 0 0 0 1) (mul (2 * (bin 1 0 0 0 1)) (bin))))))
48. (equal (1 + 2 * (1 + 2 * (2 * (add (bin 1) (bin 1 0))))) (add (2 * (1 + 2 * (bin))) (add (2 * (bin 1 0 0)) (add (bin 1 0 0 0 1) (mul (2 * (bin 1 0 0 0 1)) (bin))))))
49. (equal (1 + 2 * (1 + 2 * (2 * (add (bin 1) (bin 1 0))))) (add (2 * (1 + 2 * 0)) (add (2 * (bin 1 0 0)) (add (bin 1 0 0 0 1) (mul (2 * (bin 1 0 0 0 1)) (bin))))))
50. (equal (1 + 2 * (1 + 2 * (2 * (add (bin 1) (bin 1 0))))) (add (2 * (1 + 2 * 0)) (add (2 * (2 * (bin 1 0))) (add (bin 1 0 0 0 1) (mul (2 * (bin 1 0 0 0 1)) (bin))))))
51. (equal (1 + 2 * (1 + 2 * (2 * (add (bin 1) (bin 1 0))))) (add (2 * (1 + 2 * 0)) (add (2 * (2 * (bin 1 0))) (add (1 + 2 * (bin 1 0 0 0)) (mul (2 * (bin 1 0 0 0 1)) (bin))))))
52. (equal (1 + 2 * (1 + 2 * (2 * (add (1 + 2 * (bin)) (bin 1 0))))) (add (2 * (1 + 2 * 0)) (add (2 * (2 * (bin 1 0))) (add (1 + 2 * (bin 1 0 0 0)) (mul (2 * (bin 1 0 0 0 1)) (bin))))))
53. (equal (1 + 2 * (1 + 2 * (2 * (add (1 + 2 * (bin)) (2 * (bin 1)))))) (add (2 * (1 + 2 * 0)) (add (2 * (2 * (bin 1 0))) (add (1 + 2 * (bin 1 0 0 0)) (mul (2 * (bin 1 0 0 0 1)) (bin))))))
54. (equal (1 + 2 * (1 + 2 * (2 * (1 + 2 * (add (bin) (bin 1)))))) (add (2 * (1 + 2 * 0)) (add (2 * (2 * (bin 1 0))) (add (1 + 2 * (bin 1 0 0 0)) (mul (2 * (bin 1 0 0 0 1)) (bin))))))
55. (equal (1 + 2 * (1 + 2 * (2 * (1 + 2 * (add (bin) (bin 1)))))) (add (2 * (1 + 2 * 0)) (add (2 * (2 * (2 * (bin 1)))) (add (1 + 2 * (bin 1 0 0 0)) (mul (2 * (bin 1 0 0 0 1)) (bin))))))
56. (equal (1 + 2 * (1 + 2 * (2 * (1 + 2 * (add (bin) (bin 1)))))) (add (2 * (1 + 2 * 0)) (add (2 * (2 * (2 * (bin 1)))) (add (1 + 2 * (2 * (bin 1 0 0))) (mul (2 * (bin 1 0 0 0 1)) (bin))))))
57. (equal (1 + 2 * (1 + 2 * (2 * (1 + 2 * (add (bin) (bin 1)))))) (add (2 * (1 + 2 * 0)) (add (2 * (2 * (2 * (bin 1)))) (add (1 + 2 * (2 * (bin 1 0 0))) (mul (2 * (bin 1 0 0 0 1)) 0)))))
58. (equal (1 + 2 * (1 + 2 * (2 * (1 + 2 * (add (bin) (bin 1)))))) (add (2 * (1 + 2 * 0)) (add (2 * (2 * (2 * (bin 1)))) (add (1 + 2 * (2 * (bin 1 0 0))) 0))))
59. (equal (1 + 2 * (1 + 2 * (2 * (1 + 2 * (add (bin) (bin 1)))))) (add (2 * (1 + 2 * 0)) (add (2 * (2 * (2 * (bin 1)))) (1 + 2 * (2 * (bin 1 0 0))))))
60. (equal (1 + 2 * (1 + 2 * (2 * (1 + 2 * (add (bin) (bin 1)))))) (add (2 * (1 + 2 * 0)) (1 + 2 * (add (2 * (2 * (bin 1))) (2 * (bin 1 0 0))))))
61. (equal (1 + 2 * (1 + 2 * (2 * (1 + 2 * (add (bin) (bin 1)))))) (1 + 2 * (add (1 + 2 * 0) (add (2 * (2 * (bin 1))) (2 * (bin 1 0 0))))))
62. (equal (1 + 2 * (2 * (1 + 2 * (add (bin) (bin 1))))) (add (1 + 2 * 0) (add (2 * (2 * (bin 1))) (2 * (bin 1 0 0)))))
63. (equal (1 + 2 * (2 * (1 + 2 * (add (bin) (bin 1))))) (add (1 + 2 * 0) (2 * (add (2 * (bin 1)) (bin 1 0 0)))))
64. (equal (1 + 2 * (2 * (1 + 2 * (add (bin) (bin 1))))) (1 + 2 * (add 0 (add (2 * (bin 1)) (bin 1 0 0)))))
65. (equal (2 * (1 + 2 * (add (bin) (bin 1)))) (add 0 (add (2 * (bin 1)) (bin 1 0 0))))
66. (equal (2 * (1 + 2 * (add (bin) (bin 1)))) (add 0 (add (2 * (bin 1)) (2 * (bin 1 0)))))
67. (equal (2 * (1 + 2 * (add (bin) (bin 1)))) (add 0 (2 * (add (bin 1) (bin 1 0)))))
68. (equal (2 * (1 + 2 * (add (bin) (bin 1)))) (2 * (add (bin 1) (bin 1 0))))
69. (equal (1 + 2 * (add (bin) (bin 1))) (add (bin 1) (bin 1 0)))
70. (equal (1 + 2 * (add (bin) (bin 1))) (add (1 + 2 * (bin)) (bin 1 0)))
71. (equal (1 + 2 * (add (bin) (bin 1))) (add (1 + 2 * (bin)) (2 * (bin 1))))
72. (equal (1 + 2 * (add (bin) (bin 1))) (1 + 2 * (add (bin) (bin 1))))
73. (equal (add (bin) (bin 1)) (add (bin) (bin 1)))
74. (equal (add 0 (bin 1)) (add (bin) (bin 1)))
75. (equal (add 0 (1 + 2 * (bin))) (add (bin) (bin 1)))
76. (equal (1 + 2 * (bin)) (add (bin) (bin 1)))
77. (equal (1 + 2 * 0) (add (bin) (bin 1)))
78. (equal (1 + 2 * 0) (add 0 (bin 1)))
79. (equal (1 + 2 * 0) (add 0 (1 + 2 * (bin))))
80. (equal (1 + 2 * 0) (1 + 2 * (bin)))
81. (equal 0 (bin))
82. (equal 0 0)
83. true
true

Fibonacci Sequence

We can use the existing definitions in our natural numbers file to define the Fibonacci sequence:

/* ... natural number definitions (addition, equality, etc.) 
   ellided for brevity ... */

(def (fib 0)
  0)
(def (fib (S 0))
  (S 0))
(def (fib (S (S $n)))
  (add (fib $n)
       (fib (S $n))))

Let’s check that fib 9 = 34 = 4 + 3 * 10:

(equal // 34 = fib 9
       (fib 9)
       // 34 = 4 + 3 * (1 + 9)
       (add 4 (mul 3 (add 1 9))))

Computing this answer takes 639 steps:

$ cabal run utrl -- --defs test/programs/peano-fibonacci.defs --input test/programs/peano-fibonacci.input --trace | head
0. (equal (fib 9) (add 4 (mul 3 (add 1 9))))
1. (equal (fib (S 8)) (add 4 (mul 3 (add 1 9))))
2. (equal (fib (S 8)) (add (S 3) (mul 3 (add 1 9))))
3. (equal (fib (S (S 7))) (add (S 3) (mul 3 (add 1 9))))
4. (equal (add (fib 7) (fib (S 7))) (add (S 3) (mul 3 (add 1 9))))
5. (equal (add (fib (S 6)) (fib (S 7))) (add (S 3) (mul 3 (add 1 9))))
6. (equal (add (fib (S 6)) (fib (S 7))) (add (S (S 2)) (mul 3 (add 1 9))))
7. (equal (add (fib (S 6)) (fib (S 7))) (add (S (S 2)) (mul (S 2) (add 1 9))))
8. (equal (add (fib (S (S 5))) (fib (S 7))) (add (S (S 2)) (mul (S 2) (add 1 9))))
9. (equal (add (add (fib 5) (fib (S 5))) (fib (S 7))) (add (S (S 2)) (mul (S 2) (add 1 9))))
$ cabal run utrl -- --defs test/programs/peano-fibonacci.defs --input test/programs/peano-fibonacci.input --trace | tail
631. (equal (S (S (S (S (S 0))))) (add (S (S (S (S 0)))) (S 0)))
632. (equal (S (S (S (S (S 0))))) (S (add (S (S (S (S 0)))) 0)))
633. (equal (S (S (S (S 0)))) (add (S (S (S (S 0)))) 0))
634. (equal (S (S (S (S 0)))) (S (S (S (S 0)))))
635. (equal (S (S (S 0))) (S (S (S 0))))
636. (equal (S (S 0)) (S (S 0)))
637. (equal (S 0) (S 0))
638. (equal 0 0)
639. true
true

Functional list processing

We had to go through great lengths to define natural numbers and binary numbers. It might be surprising then to learn that list processing functions (head, take, repeat, map, foldr, zip-with) are actually quite simple to implement:

/* ... natural number definitions (addition, equality, etc.) 
   ellided for brevity ... */

(def (head (cons $a $d))
  (some $a))
(def (head nil)
  none)

(def (take 0 $c)
  nil)
(def (take (S $n) (cons $a $d))
  (cons $a (take $n $d)))

(def (repeat $x)
  (cons $x (repeat $x)))

(def (map $f nil)
  nil)
(def (map $f (cons $a $d))
  (cons ($f $a)
        (map $f $d)))

(def (foldr $f $z nil)
  $z)
(def (foldr $f $z (cons $a $d))
  ($f $a (foldr $f $z $d)))

(def (zip-with $f (cons $a1 $d1) nil)
  nil)
(def (zip-with $f nil (cons $a2 $d2))
  nil)
(def (zip-with $f (cons $a1 $d1) (cons $a2 $d2))
  (cons ($f $a1 $a2)
        (zip-with $f $d1 $d2)))

Let’s use this to prove that 4 * (2 + 3) is equivalent to summing the first four elements of an infinite list produced by summing together the infinite lists [2, 2, 2, ...] and [3, 3, 3, ...]:

(equal
  (mul 4 (add 2 3))
  (foldr add 0
    (take 4
      (zip-with add
        (repeat 2)
        (repeat 3)))))

It takes 189 steps to compute the answer:

$ cabal run utrl -- --defs test/programs/list-processing.defs --input test/programs/list-processing.input --trace | head
0. (equal (mul 4 (add 2 3)) (foldr add 0 (take 4 (zip-with add (repeat 2) (repeat 3)))))
1. (equal (mul (S 3) (add 2 3)) (foldr add 0 (take 4 (zip-with add (repeat 2) (repeat 3)))))
2. (equal (mul (S (S 2)) (add 2 3)) (foldr add 0 (take 4 (zip-with add (repeat 2) (repeat 3)))))
3. (equal (mul (S (S 2)) (add (S 1) 3)) (foldr add 0 (take 4 (zip-with add (repeat 2) (repeat 3)))))
4. (equal (mul (S (S 2)) (add (S 1) (S 2))) (foldr add 0 (take 4 (zip-with add (repeat 2) (repeat 3)))))
5. (equal (mul (S (S 2)) (S (add (S 1) 2))) (foldr add 0 (take 4 (zip-with add (repeat 2) (repeat 3)))))
6. (equal (add (S (S 2)) (mul (S (S 2)) (add (S 1) 2))) (foldr add 0 (take 4 (zip-with add (repeat 2) (repeat 3)))))
7. (equal (add (S (S 2)) (mul (S (S 2)) (add (S 1) 2))) (foldr add 0 (take (S 3) (zip-with add (repeat 2) (repeat 3)))))
8. (equal (add (S (S (S 1))) (mul (S (S 2)) (add (S 1) 2))) (foldr add 0 (take (S 3) (zip-with add (repeat 2) (repeat 3)))))
9. (equal (add (S (S (S 1))) (mul (S (S 2)) (add (S 1) (S 1)))) (foldr add 0 (take (S 3) (zip-with add (repeat 2) (repeat 3)))))
$ cabal run utrl -- --defs test/programs/list-processing.defs --input test/programs/list-processing.input --trace | tail
181. (equal (S (add (S (S (S (S 0)))) 0)) (add (S (S (S (S (S 0))))) 0))
182. (equal (S (add (S (S (S (S 0)))) 0)) (S (S (S (S (S 0))))))
183. (equal (add (S (S (S (S 0)))) 0) (S (S (S (S 0)))))
184. (equal (S (S (S (S 0)))) (S (S (S (S 0)))))
185. (equal (S (S (S 0))) (S (S (S 0))))
186. (equal (S (S 0)) (S (S 0)))
187. (equal (S 0) (S 0))
188. (equal 0 0)
189. true
true

Installation

To run utrl programs you’ll need the interpreter, which can be downloaded using git:

git clone https://github.com/MichaelMMacLeod/utrl.git
cd utrl

I use Nix to manage dependencies and the build process. Nix version 2.8 (released on 2022-04-19) or greater is required. To build and test the interpreter use:

nix --extra-experimental-features 'nix-command flakes' build

This installs the interpreter to ./result/bin/utrl. To run it, use:

./result/bin/utrl --help

A development environment containing VSCodium with Haskell/Nix extensions pre-installed, along with the Haskell build tool Cabal and compiler GHC can be entered using:

nix --extra-experimental-features 'nix-command flakes' develop

These are some available commands inside the development environment:

codium                   # open VSCodium IDE
cabal build              # build interpreter
cabal test               # run tests
cabal run utrl -- --help # run interpreter

It’s a good idea when developing to use cabal build instead of nix build; the former will selectively build only the parts of the code that have changed, while the latter will always perform a full-rebuild.

Implementation Details

Compilation Process

Let’s consider the following simple program and walk through how it is compiled:

(def (copy $xs ..) ($xs ..))

The file is parsed via parser combinators written in megaparsec into a list of values of type Ast0, one for each definition in the program being compiled. AST is short for Abstract Syntax Tree, and the zero on the end communicates that this is the first such tree. As we will see, there is more than just one type of abstract syntax tree.
```
-- File Ast0.hs:

data Ast
  = Symbol Text
  | Compound [Ast]
```
The definition of Ast0 closely resembles the source program. I chose the word “Compound” instead of “List” because there are too many things named “List” already in Haskell and I didn’t want to run into name-clashes.
Each Ast0 is transformed into two different Ast0 values: one for the pattern, and one for the constructor. In our example this means we separate the definition into (copy $xs ..), the pattern part, and ($xs ..), the constructor part.

Every pattern and constructor Ast0 values are transformed into Ast1 values, which make ellipses explicit.

-- File Ast1.hs

data Ast
  = Symbol Text
  | Compound [Ast]
  | Ellipses Ast

This change might not seem like much, but it makes the subsequent steps much easier. Having a separate variant for ellipses means we don’t have to check for the presence of ellipses; we simply have ellipses, or we don’t.

Here is a textual representation of the generated Ast1 values from our example:

$ cabal run utrl -- --defs copy.defs --emit '1'
error[E013]: [--emit] generated code for stage 1 (pattern part)
copy.defs:1:6
   |
 1 | (def (copy $xs ..) ($xs ..))
   |      ^ (copy {Ellipses $xs})

error[E013]: [--emit] generated code for stage 1 (constructor part)
copy.defs:1:20
   |
 1 | (def (copy $xs ..) ($xs ..))
   |                    ^ ({Ellipses $xs})

All Ast1 pattern values are transformed into AstP0 values, a type which makes explicit the terms preceding, at, and following each ellipsis. The P in AstP0 stands for “pattern”. This is where the compilation process for patterns end; the AstP0 values are used directly in the interpreter as-is. Future work could be done to compile these further.
```
-- File AstP0.hs

data Ast
  = Symbol Text
  | CompoundWithoutEllipses [Ast]
  | CompoundWithEllipses AstP0CompoundWtihEllipses

data AstP0CompoundWtihEllipses = AstP0CompoundWtihEllipses
  { before :: [Ast],
    ellipses :: Ast,
    after :: [Ast]
  }
```
To radically simplify the implementation of the interpreter, I made the choice to limit pattern terms to each having one ellipsis per list. This is why it is possible to talk about the parts “before” and “after” an ellipsis, given that there is never going to be more than one of them in each list. This same design choice is made in other similar systems, such as Racket’s syntax-case.

Here is a textual representation of the generated AstP0 value from our example. Notice that copy is in the “before” field, as it precedes $x .., and there is nothing in the “after” field, as no terms follow the ellipses.
```
$ cabal run utrl -- --defs copy.defs --emit 'P0'
error[E013]: [--emit] generated code for stage P0
copy.defs:1:6
   |
 1 | (def (copy $xs ..) ($xs ..))
   |      ^ {Before = [copy], Ellipses = $xs, after = []}
```
All Ast1 constructor values are transformed into AstC0 values, a type which makes explicit the location in the pattern that each variable binds to. The C in AstC0 stands for “constructor”.
```
-- File AstC0.hs

data Ast
  = Symbol Text
  | Compound [Ast]
  | Ellipses Ast
  | Variable Index Text

type Index = [IndexElement]

data IndexElement
  = ZeroPlus Int
  | LenMinus Int
  | Between AstC0Between

data AstC0Between = AstC0Between
  { zeroPlus :: Int,
    lenMinus :: Int
  }
```
The only difference between Ast1 and AstC0 is the introduction of the Index type on the Variable variant. This is a special kind of index that requires some explanation.

Programming languages across the board define array access syntax such as xs[n] to access the nth element of an array. These are commonly zero-based indices, meaning that xs[0] retrieves the first element of xs. Some programming languages such as Python also define array access syntax that starts from the end. In Python, xs[-1] accesses the last element of xs, and is essentially a simplified way of writing xs[xs.length-1], i.e., the -1 represents the length of the array, minus one.

In patterns that do not have any ellipses it is sufficient to use zero-based indexing. For example, in the pattern (copy $x) we can denote the location of copy as zero and the location of $x as one. This is represented by the ZeroPlus variant of the IndexElement type.

One difference from syntax such as xs[n] is that we are indexing into trees, not arrays. This means we may need more than one index if our target is located anywhere other than at the surface of the tree. For example, the symbol x is at tree-index [1,0] in the following term: (a (x b) c). This is why Index is defined as a list of IndexElements, and not simply as a single IndexElement.

Some patterns we may encounter include symbols or variables that occur after an ellipsis. For example, the variable $y occurs after an ellipsis in the pattern (copy $xs .. $y). There is no way to use zero-based indexing to refer to the location of $y given that it is unknown at compile time how many elements will bind to $xs. What we can do instead is use Python-like negative array indexing to denote the location of $y as the length of the input matched against the pattern minus one. This gives us a simple way to locate $y given that the length of the input can be readily computed at runtime. This is represented by the LenMinus variant of the IndexElement type.

When faced with compiling constructors with variables bound in patterns such as (copy $xs ..), we must use a combination of zero-based and Python-like negative array indexing. In the list (copy $xs ..), while the symbol copy is at index zero, the variable $xs matches all indices starting at one and ending at the length of the list (exclusive). We can represent this by saying that $xs has two indices: one, and length minus zero (exclusive). These describe the range of indices that $xs will bind to at runtime, and is represented by the Between variant of the IndexElement type.

You may wonder how we refer to variables that occur between two ellipses, such as $y in ($x .. $y $z ..). The answer is that we disallow such patterns altogether. This makes our lives considerably easier, and is the same trade-off made by other similar systems. If this rule is violated you will receive a compilation error:
```
$ cabal run utrl -- --defs ./test/programs/error-pattern-3-ellipses.defs 
error[E005]: too many ellipses in single term of pattern
./test/programs/error-pattern-3-ellipses.defs:1:6
   |
 1 | (def ($x .. $y .. $z ..) constructor)
   |      ^^^^^^^^^^^^^^^^^^^ this term has more than one ellipsis
./test/programs/error-pattern-3-ellipses.defs:1:16
   |
 1 | (def ($x .. $y .. $z ..) constructor)
   |                ^^ ellipsis #2
./test/programs/error-pattern-3-ellipses.defs:1:22
   |
 1 | (def ($x .. $y .. $z ..) constructor)
   |                      ^^ ellipsis #3
help: each term in a definitions's pattern may have at most one ellipsis
```
Here is a textual representation of the AstC0 value in our running example:
```
$ cabal run utrl -- --defs copy.defs --emit 'C0'
error[E013]: [--emit] generated code for stage C0
copy.defs:1:20
   |
 1 | (def (copy $xs ..) ($xs ..))
   |                    ^ ({Ellipses {$xs in input[1..len]}})
```
The dots in the index represent the Between variant, with zeroPlus = 1 and lenMinus = 0. Because $xs is located at the surface level of the tree, we only need one IndexElement to properly identify it.
All AstC0 values are transformed into AstC1 values, a type which replaces ellipses and variables with a series of for-loops and assignments. Here is a textual representation of our example’s AstC1 code:
```
$ cabal run utrl -- --defs copy.defs --emit 'C1'
error[E013]: [--emit] generated code for stage C1
copy.defs:1:20
   |
 1 | (def (copy $xs ..) ($xs ..))
   |                    ^ ({let $1 = input; {for $0 in $1[1..len]; $0}})
```
What were once ellipses are now for loops (actually, foreach loops), and surface-level variables such as $xs have been replaced with generated variables with numeric names such as $0 and $1. These new variables are assigned to each other or to the input supplied to a definition (denoted by input). In our example, the generated variable $1 is assigned to the input, and then $0 is assigned, in a for loop, to the terms at indices 1..len(gth of $1) (exclusive). The body of the for loop is $0, which has the effect of inserting each term bound to $0 into the surrounding context. Because the entire {let $1 = input; ... $0 } is surrounded in parentheses, this “context” is a list, and thus the result is to fill the list with all the terms matched to our original variable $xs.

The code for this stage, due to some issues I ran into with Haskell’s record types, is split across multiple files and is rather verbose, so I will not be including it here. It can be found at this link if you are curious.
All AstC1 values are transformed into AstC2 values, a custom assembly language that at runtime will be used as-is in the interpreter. The for-loops found in AstC1 are replaced with conditional jumps and assignment statements. Here is a textual representation of our example’s AstC2 code:
```
$ cabal run utrl -- --defs copy.defs --emit 'C2'
error[E013]: [--emit] generated code for stage C2
copy.defs:1:20
   |
 1 | (def (copy $xs ..) ($xs ..))
   |                    ^ 0.      $2 = 0
   |                      1.      $1 = input
   |                      2.      $3 = 1
   |                      3.      $4 = $1.length - 0
   |                      4.      jump 9 if True
   |                      5.      $0 = $1[$3]
   |                      6.      push $0
   |                      7.      $3 = $3 + 1
   |                      8.      $2 = $2 + 1
   |                      9.      jump 5 if $3 < $4
   |                      10.     build $2
```
There are four different kinds of assembly instructions.
1. Assignment instructions of the form <var> = <expr>. These evaluate the right hand side expression and assign it to a variable. The interpreter contains an array whose elements store the values of variables, so the statement $2 = 0 assigns the integer zero to the third element of the variable-array.
2. Conditional jump instructions of the form jump <line-number> if <condition>. These evaluate the condition expression and, if true, instructs the interpreter to continue evaluation at the given line number. If the condition is false the interpreter treats the jump instruction as a no-op and continues to the next line.
3. Push instructions of the form push <expr>. These evaluate the given expression and store its result at the end of a dynamically-expanding array. When the interpreter begins evaluating AstC2 code it creates this (initially empty) array and uses the same one for every push instruction. There is no way to explicitly refer to this array, and there are no expressions that can access elements from it. The only way to interact with it is through pushing elements onto it via push <expr> and, as explained below, via build <expr>.
4. Build instructions of the form build <expr>. The expression is evaluated; its result is guaranteed to be an integer due to how AstC2 code is generated. The build instruction then takes this integer and removes that number of elements from the end of an array, the same array pushed to via push instructions. Once the elements are removed they are collected into a list which is then pushed back onto the end of the array.
The result of running a program is the first element on the stack (as described in (3) and (4)) when the interpreter finishes processing every instruction. It is possible to write programs that run forever, in which case there is no result. For example, the following program when receiving the input LOOP will only halt when the interpreter process runs out of available memory to allocate:
```
(def LOOP (LOOP LOOP))
```
Here’s what happens when you try to run this program:
```
$ cabal run utrl -- --defs loop.defs --input input.txt --trace
0. LOOP
1. (LOOP LOOP)
2. ((LOOP LOOP) LOOP)
3. ((LOOP LOOP) (LOOP LOOP))
4. (((LOOP LOOP) LOOP) (LOOP LOOP))
5. (((LOOP LOOP) (LOOP LOOP)) (LOOP LOOP))
6. (((LOOP LOOP) (LOOP LOOP)) ((LOOP LOOP) LOOP))
7. (((LOOP LOOP) (LOOP LOOP)) ((LOOP LOOP) (LOOP LOOP)))
8. ((((LOOP LOOP) LOOP) (LOOP LOOP)) ((LOOP LOOP) (LOOP LOOP)))
9. ((((LOOP LOOP) (LOOP LOOP)) (LOOP LOOP)) ((LOOP LOOP) (LOOP LOOP)))
(... this goes on almost forever ...)
```

Recursion Schemes

A recursion scheme is a function that implements a common recursive pattern. This project makes frequent use of these functions. The most commonly used recursion scheme in this project is cata (I count 37 uses), followed para (5 uses), and lastly ana (1 use).

The recursion scheme cata implements a bottom-up traversal of a data structure. I use cata to do the following:

Return the set of all variables in an AST.
Attach line numbers to AstC2 statements (a zip-like operation).
Implement functions which convert various ASTs into Text. See Display.hs for the code.
Implement most compilation steps, e.g., Ast0 -> Ast1 and Ast1 -> AstC0. See Compile.hs for the code.
Implement most analysis steps, e.g., the step that checks that there are no undefined variables and the step that checks that each variable is not defined more than once. See Analyze.hs for the code.

The recursion scheme para is like cata except that it keeps extra information around during the bottom-up traversal about the data that was previously processed. I use para to do the following:

Compile AstC1 to AstC2. We need the extra information here when compiling lists (which may contain loops) to count the number of loops inside each list. That number is used to help determine how big each list (at runtime) will be, so we can accurately allocate space for each element. If we used cata, while we would have access to the AstC2 assembly statements corresponding to each loop, we would not have access to the original AstC1 which produced those instructions. This means that we would be unable to easily tell if the assembly statements were compiled from loops or not, and thus could not easily count the number of loops.
Implement the analyzeEllipsesAppliedToSymbols (See Analyze.hs) analysis step, which checks for code like (x ..) (the correct code would be ($x ..)). We need para here to get access to the symbols inside the ellipses variant of Ast1. Without para, while we would have access to any errors produced from terms inside the ellipses, we wouldn’t have the original Ast1 that produced those errors.
Implement certain other compilation and analysis steps.

The recursion scheme ana is, in a sense, the opposite of cata: instead of consuming a data structure from the bottom-up, ana produces a data structure from the top down. The one use of ana in this project is to implement the function iterateMaybe :: (b -> Maybe b) -> b -> [b] which creates a list by repeatedly applying a function to a seed value until the function returns Nothing. This function, iterateMaybe, is what implements the main loop of the interpreter. We pass iterateMaybe a transition function (which applies one definition at a time), and receive a function which applies definitions over and over again until no definition matches. The list that this produces is the one that can be seen via passing the --trace command line flag. For example, consider this program:

// File nat.defs

// Addition
(def ($n + 0)      $n)
(def ($n + (S $m)) (S ($n + $m)))

// Equality
(def (0      == 0)      true)
(def ((S $n) == 0)      false)
(def (0      == (S $m)) false)
(def ((S $n) == (S $m)) ($n == $m))

// Peano natural numbers 1-5
(def 1 (S 0))
(def 2 (S 1))
(def 3 (S 2))
(def 4 (S 3))
(def 5 (S 4))

// File nat.input

(5 == (2 + 3)) // this should evaluate to 'true'

The list iterateMaybe produces using these definitions and input is 20 elements long, since it takes 20 applications of definitions to reduce the input to its final result:

$ cabal run utrl -- --defs nat.defs --input nat.input --trace
0. (5 == (2 + 3))
1. ((S 4) == (2 + 3))
2. ((S (S 3)) == (2 + 3))
3. ((S (S 3)) == ((S 1) + 3))
4. ((S (S 3)) == ((S 1) + (S 2)))
5. ((S (S 3)) == (S ((S 1) + 2)))
6. ((S 3) == ((S 1) + 2))
7. ((S (S 2)) == ((S 1) + 2))
8. ((S (S 2)) == ((S 1) + (S 1)))
9. ((S (S 2)) == (S ((S 1) + 1)))
10. ((S 2) == ((S 1) + 1))
11. ((S (S 1)) == ((S 1) + 1))
12. ((S (S 1)) == ((S 1) + (S 0)))
13. ((S (S 1)) == (S ((S 1) + 0)))
14. ((S 1) == ((S 1) + 0))
15. ((S 1) == (S 1))
16. (1 == 1)
17. ((S 0) == 1)
18. ((S 0) == (S 0))
19. (0 == 0)
20. true
true

Error Code Index

I spent a good bit of time and energy ensuring the compiler produces helpful, good-looking error messages. Each error message contains a description of the error, a source location, at least one annotated code block, and possibly some ‘help’ text at the end. All the following errors are compile errors (i.e., they occur only at compile time). In fact, there are no runtime errors other than the possibility of reaching an out-of-memory state. These errors are diagnosed in the Analyze.hs file.

Of particular interest is E003, a diagnosis of ‘zipping together’ variables which are bound to (possibly) different numbers of terms. The reason this is interesting is because—as far as I know—this is the only language which detects this error when the rewrite-rule/macro is defined, instead of when it is used. For example, consider the following Rust code:

macro_rules! bad_list {
    ( ($( $x:expr ),*) ($( $y:expr ),*) ) => {
        {
            let mut temp_vec = Vec::<(i32, &'static str)>::new();
            $(
                temp_vec.push(($x, $y));
            )*
            temp_vec
        }
    };
}

pub fn main() {
    bad_list!((1, 2, 3) ("a", "b", "c", "d", "e"));
}

This code defines a macro, bad_list, which attempts to zip together two input lists into a single vector. This, generally speaking, does not work, as there is no guarantee that the two lists being zipped together will be of the same length. Despite the fact that defining a macro this way is almost certainly a bad idea, Rust is okay with it, and will only signal an error when it is used incorrectly:

error: meta-variable `x` repeats 3 times, but `y` repeats 5 times
 --> <source>:5:14
  |
5 |               $(
  |  ______________^
6 | |                 temp_vec.push(($x, $y));
7 | |             )*
  | |_____________^

The fact that the error message mentions the macro definition site instead of the macro use site suggests that this is fundamentally a problem with the macro definition as opposed to its use. It is strange, then, that Rust does not detect this error when the macro is defined, but only when it is used with lists of differing length.

Another language, Racket (an inspiration for this language, as well as Rust), behaves similarly:

#lang racket

(define-syntax (bad-list stx)
  (syntax-case stx ()
    [(_ (x ...) (y ...))
     #'(list '(x . y) ...)]))

(bad-list (1 2 3) ("a" "b" "c" "d" "e"))

Just like in Rust, Racket is okay with this ill-formed macro, and will only signal an error when bad-list is used with lists of differing lengths:

test.rkt:5:13: syntax: incompatible ellipsis match counts for template
  at: ...
  in: ((quote (x . y)) ...)
  location...:
  test.rkt:5:13

Again, we see that the error message mentions only the macro definition site, not its use site. Again, It is strange Racket does not detect this error when the macro is defined, but only when it is used with lists of differing length.

This language detects these kinds of problematic definitions before they are ever used. For example, consider the equivalent bad-list definition in utrl:

(def (bad-list ($x ..) ($y ..))
  (list ($x $y) ..))

The resulting error is simple, clean and easy to understand. It gets to the heart of the issue (two variables were matched under different ellipses but used with the same ellipsis) instead of complaining about “incompatible ellipsis match counts” or variables repeating differing amounts:

error[E003]: variables matched under different ellipses used with same ellipsis
./bad-list.defs:1:20
   |
 1 | (def (bad-list ($x ..) ($y ..))
   |                    ^^ $x matched under this ellipsis
./bad-list.defs:1:28
   |
 1 | (def (bad-list ($x ..) ($y ..))
   |                            ^^ $y matched under this ellipsis
./bad-list.defs:2:17
   |
 2 |   (list ($x $y) ..))
   |                 ^^ both used with this ellipsis
help: variables matched under different ellipses can't be used with the same ellipsis

E001

There are unbalanced parentheses or unbalanced block-comments (/* and */).

Example (missing right parenthesis)

(def A B

error[E001]: bad syntax
./test/programs/error-missing-right-paren.defs:1:9
   |
 1 | (def A B
   |         ^ unexpected end of input
   |           expecting '(' or ')'

E002

There is a definition with a variable used in the constructor with a different number of ellipses (..) than it was matched with in the pattern.

Example (too few ellipses)

(def ($x ..) $x)

error[E002]: too few ellipses
./test/programs/error-ellipses-1-0.defs:1:7
   |
 1 | (def ($x ..) $x)
   |       ^^ matched with 1 ellipsis in pattern
./test/programs/error-ellipses-1-0.defs:1:14
   |
 1 | (def ($x ..) $x)
   |              ^^ used with 0 ellipses in constructor
help: variables must be used with the same number of ellipses they were matched with

Example (too many ellipses)

(def (flatten (list (list $x ..) ..))
  (list $x .. .. ..))

error[E002]: too many ellipses
./test/programs/error-ellipses-2-3.defs:1:27
   |
 1 | (def (flatten (list (list $x ..) ..))
   |                           ^^ matched with 2 ellipses in pattern
./test/programs/error-ellipses-2-3.defs:2:9
   |
 2 |   (list $x .. .. ..))
   |         ^^ used with 3 ellipses in constructor
help: variables must be used with the same number of ellipses they were matched with

E003

There are two variables that are used with the same ellipsis, but not matched under the same ellipsis.

Example

(def (capture ($x ..) ($y ..))
  (result ($x $y) ..))

error[E003]: variables matched under different ellipses used with same ellipsis
./test/programs/error-captures-1-level.defs:1:19
   |
 1 | (def (capture ($x ..) ($y ..))
   |                   ^^ $x matched under this ellipsis
./test/programs/error-captures-1-level.defs:1:27
   |
 1 | (def (capture ($x ..) ($y ..))
   |                           ^^ $y matched under this ellipsis
./test/programs/error-captures-1-level.defs:2:19
   |
 2 |   (result ($x $y) ..))
   |                   ^^ both used with this ellipsis
help: variables matched under different ellipses can't be used with the same ellipsis

Example (multiple ellipses)

(def (capture (($x ..) ..) (($y ..) ..))
  (result ($x .. $y ..) ..))

error[E003]: variables matched under different ellipses used with same ellipsis
./test/programs/error-captures-2-level-inside.defs:1:24
   |
 1 | (def (capture (($x ..) ..) (($y ..) ..))
   |                        ^^ $x matched under this ellipsis
./test/programs/error-captures-2-level-inside.defs:1:37
   |
 1 | (def (capture (($x ..) ..) (($y ..) ..))
   |                                     ^^ $y matched under this ellipsis
./test/programs/error-captures-2-level-inside.defs:2:25
   |
 2 |   (result ($x .. $y ..) ..))
   |                         ^^ both used with this ellipsis
help: variables matched under different ellipses can't be used with the same ellipsis

E004

An ellipsis follows a symbol. Ellipses may only follow variables or lists which contain one or more variables.

Example

(def (copy (list x ..))
  (list x ..))

error[E004]: ellipsis following symbol
./test/programs/error-ellipsis-on-symbol.defs:1:18
   |
 1 | (def (copy (list x ..))
   |                  ^ this symbol doesn't begin with a dollar-sign ('$'),
   |                    so it is not considered a variable
help: perhaps you meant '$x'?

error[E004]: ellipsis following symbol
./test/programs/error-ellipsis-on-symbol.defs:2:9
   |
 2 |   (list x ..))
   |         ^ this symbol doesn't begin with a dollar-sign ('$'),
   |           so it is not considered a variable
help: perhaps you meant '$x'?

E005

There is more than one ellipsis in a list of some definition’s pattern.

Example

(def ($x .. $y .. $z ..) constructor)

error[E005]: too many ellipses in single term of pattern
./test/programs/error-pattern-3-ellipses.defs:1:6
   |
 1 | (def ($x .. $y .. $z ..) constructor)
   |      ^^^^^^^^^^^^^^^^^^^ this term has more than one ellipsis
./test/programs/error-pattern-3-ellipses.defs:1:16
   |
 1 | (def ($x .. $y .. $z ..) constructor)
   |                ^^ ellipsis #2
./test/programs/error-pattern-3-ellipses.defs:1:22
   |
 1 | (def ($x .. $y .. $z ..) constructor)
   |                      ^^ ellipsis #3
help: each term in a definitions's pattern may have at most one ellipsis

It is okay to have multiple ellipses in a definition’s pattern, but there cannot ever be more than one in each list in the pattern. The following is perfectly acceptable, as no single list contains more than one ellipsis:

(def (($x ..) ($y ..) $z ..) constructor)
//   (___________________..)     See how each
//    (___..)                 list contains only
//            (___..)         a single ellipsis?

This rule does not apply to constructors, which may have as many ellipses per list as desired. For example, here is a valid definition that uses three ellipses in a single list in its constructor:

(def
  // All the ellipses here must be in different lists,
  // as this is the definition's pattern.
  (flatten3 (list (list (list $x ..) ..) ..))

  // All the ellipses here can be in the same list,
  // as this is the definition's constructor.
  (list $x .. .. ..))

This restriction is may be lifted in future versions of utrl.

E006

The same variable occurs more than once in a definition’s pattern.

Example

(def (twice $x $x) constructor)

error[E006]: variable occurs more than once in pattern
./test/programs/error-var-pattern-use-2.defs:1:13
   |
 1 | (def (twice $x $x) constructor)
   |             ^^ use #1
./test/programs/error-var-pattern-use-2.defs:1:16
   |
 1 | (def (twice $x $x) constructor)
   |                ^^ use #2
help: a variable may occur at most once in a definition's pattern

E007

Two definitions exist which match the same term(s). A given input term may be matched by at most one definition. This prevents ambiguity about which definition to apply to a given input term.

Example (exactly the same pattern)

(def A Y)
(def A Z)

error[E007]: overlapping patterns
./test/programs/error-overlapping-patterns-simple.defs:1:6
   |
 1 | (def A Y)
   |      ^ this pattern may match the same term as ...
./test/programs/error-overlapping-patterns-simple.defs:2:6
   |
 2 | (def A Z)
   |      ^ ... this other pattern
help: patterns possibly matching the same term are not allowed

Example (overlapping via variables)

(def ($n 0) _)
(def (0 $m) _)

error[E007]: overlapping patterns
./test/programs/error-overlapping-patterns-variables.defs:1:6
   |
 1 | (def ($n 0) _)
   |      ^^^^^^ this pattern may match the same term as ...
./test/programs/error-overlapping-patterns-variables.defs:2:6
   |
 2 | (def (0 $m) _)
   |      ^^^^^^ ... this other pattern
help: patterns possibly matching the same term are not allowed

Example (overlapping via variables and ellipses)

(def (a b $x ..) _)
(def (a b $x .. c) _)

error[E007]: overlapping patterns
./test/programs/error-overlapping-patterns-multiple.defs:1:6
   |
 1 | (def (a b $x ..) _)
   |      ^^^^^^^^^^^ this pattern may match the same term as ...
./test/programs/error-overlapping-patterns-multiple.defs:2:6
   |
 2 | (def (a b $x .. c) _)
   |      ^^^^^^^^^^^^^ ... this other pattern
help: patterns possibly matching the same term are not allowed

E008

A term in either the pattern or constructor is followed by ellipses but contains no variables.

Example

(def pattern
  ((no vars here) ..))

error[E008]: no variables in term preceding ellipsis
./test/programs/error-no-vars-in-ellipsis-constructor.defs:2:19
   |
 2 |   ((no vars here) ..))
   |                   ^^ the ellipsis
./test/programs/error-no-vars-in-ellipsis-constructor.defs:2:4
   |
 2 |   ((no vars here) ..))
   |    ^^^^^^^^^^^^^^ the term preceding the ellipsis
help: there must be at least one variable in the term preceding an ellipsis

E009

A symbol is present at the top-level of the definitions.

Example

(def I_am_okay because_I_am_a_definition)
I_am_a_symbol_not_a_definition

error[E009]: expected definition, found symbol
./test/programs/error-def-is-symbol.defs:2:1
   |
 2 | I_am_a_symbol_not_a_definition
   | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ this should be a definition
help: definitions look like this: '(def <pattern> <constructor>)'

E010

A definition was found with too many or too few terms. Each definition must have exactly three terms: (def[1] <pattern>[2] <constructor>[3])

Example (too many terms)

(def a b c)

error[E010]: definition has too many terms
./test/programs/error-def-4-terms.defs:1:1
   |
 1 | (def a b c)
   | ^^^^^^^^^^^ has 4 terms, but should have 3
help: definitions look like this: '(def <pattern> <constructor>)'

Example (too few terms)

(def (blah (blah) blah))

error[E010]: definition has too few terms
./test/programs/error-def-2-terms.defs:1:1
   |
 1 | (def (blah (blah) blah))
   | ^^^^^^^^^^^^^^^^^^^^^^^^ has 2 terms, but should have 3
help: definitions look like this: '(def <pattern> <constructor>)'

E011

A definition did not start with def. All definitions must start with the symbol def.

Example

(DEF A B)

error[E011]: definition does not start with 'def'
./test/programs/error-def-missing-def.defs:1:2
   |
 1 | (DEF A B)
   |  ^^^ this term should be 'def'
help: definitions look like this: '(def <pattern> <constructor>)'

E012

A variable was used in the constructor that was not matched in the pattern. This may be due to a missing dollar-sign ($) prefix somewhere in the pattern.

Example

(def (copy x) $x)

error[E012]: variable not matched in pattern
./test/programs/error-var-not-matched.defs:1:15
   |
 1 | (def (copy x) $x)
   |               ^^ the variable
./test/programs/error-var-not-matched.defs:1:6
   |
 1 | (def (copy x) $x)
   |      ^^^^^^^^ the pattern
help: variables cannot be used without first being matched

Reflections on Haskell

I originally attempted to write this language in Rust, but switched over to Haskell after being unhappy with the clarity of the compilation process implementation. It became clear to me during development that the compilation process needed to be a series of rather complicated recursive traversals of various abstract syntax trees. Rust encourages implementing these traversals via ad hoc depth/breadth-first searches on data stored using some kind of container (such as a slotmap). In practice, this meant that I had to deal with a lot of busy work (managing slotmap keys, implementing and testing traversals, fighting the borrow checker) for not much benefit.

Switching to Haskell, I now no longer needed to worry as much about data lifetimes and could substitute ad hoc breadth/depth-first search implementations with functions from the recursion-schemes package. The code became simpler, clearer, and more maintainable.

Overall, Haskell is an excellent language. My favorite features are its algebraic data types, type classes, and type inference. That being said, Haskell is older than Java, so some rough edges are to be expected. Here are some I cut myself on.

Record Field Name Clashes

Consider the following Haskell code:

data Person = Person { name :: String, age :: Int }
data Animal = Animal { name :: String, age :: Int }

In any other language this would be no problem at all. We could imagine that if bob :: Person and bingo :: Animal then we could access their names via bob.name and bingo.name, and their ages via bob.age and bingo.age. It’s not so simple in Haskell land, where the dot (.) is reserved for function composition. Instead of bob.name and bingo.name, we use name bob and name bingo, where name :: Person -> String and name :: Animal -> String. This is similar to how some languages like Rust allow name-qualified field accesses such as Person::name(bob) and Animal::name(bingo).

Unfortunately, Haskell doesn’t introduce each name and age function into their own nested namespaces—it spits them out verbatim into the surrounding module:

data Person = Person { name :: String, age :: Int }
data Animal = Animal { name :: String, age :: Int }
                       ^^^^            ^^^ Multiple declarations of ‘age’
                       Multiple declarations of ‘name’

Herein lies the problem. It’s not possible in Haskell to define two record types which share a field name in the same module, because Haskell (sensibly) does not allow multiple function definitions with the same name. This is a well-worn topic with several diverging solutions. Here is a short list of the ones I considered when building this language:

Put the data types in different modules.

Advantages:
1. You get to keep the existing names.
2. Requires no dependencies or extra language extensions.
Disadvantages:
1. Impossible with mutually-recursive data definitions (i.e., data types defined in terms of each other). This is because Haskell disallows cyclic module importation.
2. Importing both data types and their fields into the same module requires qualification and/or aliasing, thereby increasing verbosity.
Use the DuplicateRecordFields GHC extension.

Advantages:
1. You get to keep the existing names.
2. Works with mutually-recursive data definitions.
3. No qualification/aliasing needed when importing data type and fields.
Disadvantages:
1. Increases technical debt (an extra language extension).
2. Ambiguity sometimes requires explicit type annotations.
Prefix field names with type names, as in:
```
data Person = Person { personName :: String, personAge :: Int }
data Animal = Animal { animalName :: String, animalAge :: Int }
```
Advantages:
1. Works with mutually-recursive data definitions.
2. Requires no dependencies or extra language extensions.
Disadvantages:
1. Increased verbosity.
2. Renaming the structure is more annoying now.

The simplest, most maintainable solution is arguably #3 as it is the only one that doesn’t increase technical debt, allows for mutually-recursive definitions, and is the least fragile. Unfortunately, I only realized this when I was knee-deep in #2. In addition to DuplicateRecordFields I turned on OverloadedRecordDot, which makes syntax like bob.name “just work”. This is convenient but, alas, increases technical debt; if I had to write this language again, I’d use solution #3.

Shadowing Woes

It is common when dealing with linearly-consumed resources to use shadowing, thereby eliminating the possibility of referencing consumed values. This pattern is encouraged in Rust:

let resource = create_resource();
let resource = transform_1(resource);
let resource = transform_2(resource);
...

In this example the transform_x functions consume resource; it would be an error to access resource once its ownership was passed to transform_x. By shadowing resource it becomes impossible to talk about what came before, thus eliminating the possibility that we might later accidentally refer to a consumed version.

This pattern is impossible in Haskell’s let and where clauses. The problem is that the x on the right-hand side of the assignment statement in let x = transform_1 x in ... in Haskell is the same x as the one on the left-hand side; Haskell’s let (and where) clauses are recursive, and there is no way to turn this off. This is not the case with Haskell’s left-arrow “bind” operation in do-blocks:

do
  resource <- createResource
  resource <- transform_1 resource
  resource <- transform_2 resource
  ...

This code works because bindings introduced via <- are not recursive. The resource on the right-hand side is the previous resource, not the resource on the left-hand side.

On a couple of occasions I needed to refactor do blocks into let and where clauses, and was bitten by the fact that what used to be shadowing was now recursion. This did not produce a compile error. Instead, it produced an infinite loop at runtime which would consume all of my computer’s memory and eventually crash. Even with some rudimentary “stack tracing” options turned on, Haskell’s laziness made tracking down these infinite loops frustratingly difficult.

Certain other programming languages such as OCaml, Lean, and Racket address this (and other) issue(s) with two variants of let clauses, namely let (non-recursive) and let rec (recursive).

My suggestion is to not use shadowing at all in Haskell (as is recommended by a GHC warning). If this pattern is really important to you, it might just be worth using the Identity monad to keep the non-recursive bindings in do blocks instead of refactoring code into let and where clauses.

Implementing a term-rewriting language in Haskell

Introduction

Definition Syntax

Symbols and Lists

Nested Lists

Evaluation

Nested Evaluation

Variables

Ellipses on Variables

Ellipses on Lists

Multiple Ellipses

Comments

Examples

Natural Numbers

Mergesort

Brainfuck interpreter

Binary Numbers

Fibonacci Sequence

Functional list processing

Installation

Implementation Details

Compilation Process

Recursion Schemes

Error Code Index

E001

Example (missing right parenthesis)

E002

Example (too few ellipses)

Example (too many ellipses)

E003

Example

Example (multiple ellipses)

E004

Example

E005

Example

E006

Example

E007

Example (exactly the same pattern)

Example (overlapping via variables)

Example (overlapping via variables and ellipses)

E008

Example

E009

Example

E010

Example (too many terms)

Example (too few terms)

E011

Example

E012

Example

Reflections on Haskell

Record Field Name Clashes

Shadowing Woes