Introduction
For some ungodly reason I decided to write a programming language. As the saying goes, we don’t do things because they are easy, but because we thought they were easy. Could this be a symptom of acute merger-induced joblessness? We might triangulate my motivation until the end of time, so here instead is a sentence: a language has been built.
The following sections describe this untyped, minimalistic,
non-strict, purely-functional term-rewriting language. Its name is
utrl
, an acronym which stands for “Untitled Term Rewriting
Language”.
Definition Syntax
A program is a set of definitions. Each definition has three parts:
the text def
, a pattern, and a constructor, the three of them
surrounded in parentheses. This is a program with two definitions:
(def my-pattern my-constructor)
(def A B)
To run a program, you must supply it with input. Parts of the input
that match the patterns specified in the program will be replaced with
the corresponding constructors. The program above replaces all
references to my-pattern
with my-constructor
and all references to
A
with B
. In what follows, the symbol “↦” will be used to denote
the application of a definition.
my-pattern ↦ my-constructor
(my-pattern my-pattern) ↦ (my-constructor my-constructor)
(my-pattern A) ↦ (my-constructor B)
If part of the input doesn’t match any of the patterns, it is not transformed.
not-my-pattern ↦ not-my-pattern
(not-my-pattern A) ↦ (not-my-pattern B)
Symbols and Lists
The text my-pattern
and my-constructor
, as well as A
and B
are
called symbols. Symbols cannot contain spaces or parentheses. The text
(my-pattern my-pattern)
is called a list. Lists may contain symbols
or other lists, and the elements of a list are separated by
whitespace.
Patterns and constructors may be lists, not just symbols.
(def (a list) (another list))
A list-pattern only matches if all of its elements match.
(a list) ↦ (another list)
(a different list) ↦ (a different list)
Nested Lists
Patterns and constructors may be arbitrarily complicated. When definitions become long enough, it makes sense to write the constructor on a separate line so that it is more clear where the pattern ends and the constructor begins.
(def (1 (2 (3 (4 (5 levels of parentheses)))))
that-is-a-lot)
Input is rewritten the same way, no matter where the whitespace is in the program.
(1 (2 (3 (4 (5 levels of parentheses)))))
↦ that-is-a-lot
Evaluation
If the output of a definition matches another definition, that other definition is applied.
(def Input Step1)
(def Step1 Step2)
(def Step2 Step3)
(def Step3 Done)
Definitions continue to be applied until there are no more matching definitions.
Input
↦ Step1
↦ Step2
↦ Step3
↦ Done
Nested Evaluation
If the input is a list, and no definition matches it, then the elements of the list are examined instead. Each element, from left to right, is matched against the available definitions. If a definition matches an element, the element is replaced accordingly. As soon as one element is replaced, the original list (with the replaced element) is considered again for matching.
(def Input (A A A))
(def A B)
(def (B B B) Done)
Input
↦ (A A A)
↦ (B A A)
↦ (B B A)
↦ (B B B)
↦ Done
Variables
Patterns and constructors may contain variables, which are symbols
that start with the dollar-sign character ($
). A variable matches
anything in the pattern, and its value in the constructor is the
portion of input it matched.
(def (greet $name) (Hello $name))
Variables make it possible to write definitions that apply in many circumstances.
(greet Alice) ↦ (Hello Alice)
(greet Bob) ↦ (Hello Bob)
(greet (Alice and Bob)) ↦ (Hello (Alice and Bob))
Ellipses on Variables
Variables may be followed by ellipses, which—for brevity—are
denoted by two periods (..
) instead of the usual three (...
). If a
variable has ellipses following it in the pattern, it must also have
ellipses following it in the constructor.
(def (greet-all $name ..)
(Hello to all $name ..))
A variable followed by ellipses matches zero or more portions of the input.
(greet-all) ↦ (Hello to all)
(greet-all Alice) ↦ (Hello to all Alice)
(greet-all Alice Bob) ↦ (Hello to all Alice Bob)
(greet-all Alice Bob Charlie) ↦ (Hello to all Alice Bob Charlie)
Ellipses on Lists
Lists may be followed by ellipses in the pattern. If a list is followed by an ellipsis, it must have at least one variable inside it.
(def (x-coordinates (Point $x $y) ..)
($x ..))
(def (y-coordinates (Point $x $y) ..)
($y ..))
Variables in such lists are bound to a sequence of values, one value for each time the list matches.
(x-coordinates (Point 1 2) (Point 3 4) (Point 5 6)) ↦ (1 3 5)
(y-coordinates (Point 1 2) (Point 3 4) (Point 5 6)) ↦ (2 4 6)
Lists may also be followed by ellipses in the constructor.
(def (swap-coordinates (Point $x $y) ..)
((Point $y $x) ..))
(swap-coordinates (Point a 0) (Point b 1))
↦ ((Point 0 a) (Point 1 b))
Multiple Ellipses
A variable or list may be surrounded by more than one ellipsis.
(def (flatten (list (list $x ..) ..))
(list $x .. ..))
This has the effect of binding variables to trees of values.
(flatten (list (list 1 2 3)
(list 4 5 6)
(list 7 8 9)))
↦ (list 1 2 3 4 5 6 7 8 9)
In the example above, the variable $x
is bound to the values in this
tree:
$x ─┐
│
┌─────┼─────┐
│ │ │
┌─┼─┐ ┌─┼─┐ ┌─┼─┐
│ │ │ │ │ │ │ │ │
1 2 3 4 5 6 7 8 9
Different levels of this tree can be extracted by changing the position of the ellipses in the constructor.
(def (example1 (($x ..) ..))
(($x ..) ..))
(def (example2 (($x ..) ..))
($x .. ..))
(example1 ((1 2 3) (4 5 6) (6 8 9)))
↦ ((1 2 3) (4 5 6) (7 8 9))
(example2 ((1 2 3) (4 5 6) (6 8 9)))
↦ (1 2 3 4 5 6 7 8 9)
Comments
Single line comments are written with //
and multi-line comments are
written with /*
and */
pairs.
(def /* Hello, world! */ A B) // rewrites A into B
Examples
A utrl
program is run by passing the interpreter two files: a definitions file containing the program, and an input file. The interpreter compiles the program—reporting any errors it finds along the way—and, if successful, processes the input according to the program’s definitions. This information can also be found by passing the --help
flag to the interpreter:
$ cabal run utrl -- --help
Usage: utrl --defs FILE [--input FILE] [--trace] [--emit STAGES]
Simple text tree rewriting engine language
Available options:
--defs FILE File containing definitions to compile
--input FILE File to process using compiled definitions. If no
file is provided, simply check the definitions for
errors.
--trace Display intermediate computation steps. Has no effect
without '--input'.
--emit STAGES Emit generated code at given stage. Possible values:
1, C0, C1, C2, P0. Supply multiple stages by
separating each by a comma, so to emit stages C0 and
C2, use --emit 'C0,C2'.
-h,--help Show this help text
What follows are some example definition and input files which demonstrate some of what can be accomplished in utrl
.
Natural Numbers
We can use utrl
to define simple operations on Peano natural numbers:
// File 'add.defs'
/* Addition */
(def ($n + 0) $n)
(def ($n + (S $m)) (S ($n + $m)))
/* Equality */
(def (0 == 0) true)
(def ((S $n) == 0) false)
(def (0 == (S $m)) false)
(def ((S $n) == (S $m)) ($n == $m))
/* Peano natural numbers 1-5 */
(def 1 (S 0))
(def 2 (S 1))
(def 3 (S 2))
(def 4 (S 3))
(def 5 (S 4))
// File 'add.input'
(5 == (2 + 3)) /* this should evaluate to 'true' */
$ cabal run utrl -- --defs ./add.defs --input ./add.input
true
$ # Good, that is what we wanted. '--trace' will show the steps:
$ cabal run utrl -- --defs ./add.defs --input ./add.input --trace
0. (5 == (2 + 3))
1. ((S 4) == (2 + 3))
2. ((S (S 3)) == (2 + 3))
3. ((S (S 3)) == ((S 1) + 3))
4. ((S (S 3)) == ((S 1) + (S 2)))
5. ((S (S 3)) == (S ((S 1) + 2)))
6. ((S 3) == ((S 1) + 2))
7. ((S (S 2)) == ((S 1) + 2))
8. ((S (S 2)) == ((S 1) + (S 1)))
9. ((S (S 2)) == (S ((S 1) + 1)))
10. ((S 2) == ((S 1) + 1))
11. ((S (S 1)) == ((S 1) + 1))
12. ((S (S 1)) == ((S 1) + (S 0)))
13. ((S (S 1)) == (S ((S 1) + 0)))
14. ((S 1) == ((S 1) + 0))
15. ((S 1) == (S 1))
16. (1 == 1)
17. ((S 0) == 1)
18. ((S 0) == (S 0))
19. (0 == 0)
20. true
true
Mergesort
With a bit of effort, it’s possible to write a mergesort algorithm using utrl
.
// File 'mergesort.defs'
// Sorts a list of natural numbers via merge sort
//
// sort : List Nat -> List Nat
(def (sort $xs)
(sort-rec (length $xs) nil $xs))
// Helper function for 'sort'. Partitions its third
// argument (a list) into two pieces, accumulating the
// first piece in the second argument. Returns the
// result of 'merge'ing the final two pieces together.
// The integer argument counts twice the number of
// list elements to split off.
//
// sort-rec : Nat -> List Nat -> List Nat -> List Nat
(def (sort-rec 0 nil nil)
nil)
(def (sort-rec 0 ($a :: nil) nil)
($a :: nil))
(def (sort-rec 0 nil ($x :: nil))
($x :: nil))
(def (sort-rec 0 ($a :: nil) ($x :: nil))
(merge ($a :: nil)
($x :: nil)))
(def (sort-rec 0 ($a :: nil) ($x :: ($y :: nil)))
(merge ($a :: nil)
(sort ($x :: ($y :: nil)))))
(def (sort-rec 0 ($a :: ($b :: $as)) ($x :: ($y :: $ys)))
(merge (sort ($a :: ($b :: $as)))
(sort ($x :: ($y :: $ys)))))
(def (sort-rec (S 0) $as $xs)
(sort-rec 0 $as $xs))
(def (sort-rec (S (S $n)) $as ($x :: $xs))
(sort-rec $n ($x :: $as) $xs))
// merge : List Nat -> List Nat -> List Nat
//
// Combines two lists, both in ascending order, into
// a single list in ascending order.
(def (merge nil ($y :: $ys))
($y :: $ys))
(def (merge ($x :: $xs) nil)
($x :: $xs))
(def (merge ($x :: $xs) ($y :: $ys))
(if ($x < $y)
then ($x :: (merge $xs ($y :: $ys)))
else ($y :: (merge ($x :: $xs) $ys))))
// Returns the length of a list
//
// length : List a -> Nat
(def (length nil)
0)
(def (length ($x :: $xs))
(S (length $xs)))
// 'If' statement syntax
(def (if true then $then else $else) $then)
(def (if false then $then else $else) $else)
// A couple translations of arabic numerals into Peano numbers.
// See: https://en.wikipedia.org/wiki/Peano_axioms
(def 1 (S 0))
(def 2 (S 1))
(def 3 (S 2))
(def 4 (S 3))
(def 5 (S 4))
(def 6 (S 5))
(def 7 (S 6))
(def 8 (S 7))
(def 9 (S 8))
// Equality of natural numbers
//
// (==) : Nat -> Nat -> Bool
(def (0 == 0)
true)
(def ((S $n) == 0)
false)
(def (0 == (S $m))
false)
(def ((S $n) == (S $m))
($n == $m))
// Equality of lists
//
// (==) : List a -> List a -> Bool
(def (nil == nil)
true)
(def (($x :: $xs) == nil)
false)
(def (nil == ($y :: $ys))
false)
(def (($x :: $xs) == ($y :: $ys))
(($x == $y) && ($xs == $ys)))
// Logical operator 'AND'
//
// (&&) : Bool -> Bool -> Bool
(def (true && $x) $x)
// 'less-than' operator on natural numbers
//
// (<) : Nat -> Nat -> Bool
(def (0 < 0)
false)
(def ((S $n) < 0)
false)
(def (0 < (S $m))
true)
(def ((S $n) < (S $m))
($n < $m))
// List construction syntax: creates cons cells
// from a flat list.
(def (list)
nil)
(def (list $x $xs ..)
($x :: (list $xs ..)))
// File 'mergesort.input'
((list 1 2 3) == (sort (list 3 2 1)))
$ time cabal run utrl -- --defs ./mergesort.defs --input ./mergesort.input
true
real 0m0.014s
user 0m0.008s
sys 0m0.004s
$ # Using '--trace' here produces a lot of output!
$ utrl --defs ./mergesort.defs --input ./mergesort.input --trace
0. ((list 1 2 3) == (sort (list 3 2 1)))
1. ((1 :: (list 2 3)) == (sort (list 3 2 1)))
2. ((1 :: (list 2 3)) == (sort-rec (length (list 3 2 1)) nil (list 3 2 1)))
3. (((S 0) :: (list 2 3)) == (sort-rec (length (list 3 2 1)) nil (list 3 2 1)))
4. (((S 0) :: (2 :: (list 3))) == (sort-rec (length (list 3 2 1)) nil (list 3 2 1)))
5. (((S 0) :: (2 :: (list 3))) == (sort-rec (length (list 3 2 1)) nil (3 :: (list 2 1))))
Intermediate steps ellided for brevity. '--trace' will actually
print every single step. Try it for yourself and see all the text
fly by!
60. (((S 2) :: nil) == ((S (S 1)) :: nil))
61. (((S 2) == (S (S 1))) && (nil == nil))
62. ((2 == (S 1)) && (nil == nil))
63. ((2 == (S 1)) && true)
64. (((S 1) == (S 1)) && true)
65. ((1 == 1) && true)
66. (((S 0) == 1) && true)
67. (((S 0) == (S 0)) && true)
68. ((0 == 0) && true)
69. (true && true)
70. true
true
Brainfuck interpreter
To help demonstrate Turing completeness, I wrote a Brainfuck interpreter in utrl
. The definitions file that implements the interpreter can be found in the test directory. It can be used to sort a list via quicksort written in Brainfuck.
Binary Numbers
We have seen it is possible to implement numbers via Peano arithmetic, but this was a rather inefficient number representation. Instead of representing the natural numbers in terms of a series of increments to zero, we can represent them as either (1) zero, (2) two times some natural number, or (3) one plus two times some natural number. These correspond to (1) the binary number 0
, (2) the act of taking a binary number n
and putting a zero on the right-hand side, as in n0
, and (3) the act of taking a binary number n
and putting a one on the right-hand side, as in n1
. This strategy requires only a factor of log_2(n)
(as opposed to n
) bits to represent the number n
. Here is a utrl
program that implements basic binary arithmetic:
// Converts a flat list of bits into our internal binary number
// representation:
//
// (bin 1 0 1) --> 1 + 2 * (2 * (1 + 2 * 0))
(def (bin)
0)
(def (bin $b .. 0)
(2 * (bin $b ..)))
(def (bin $b .. 1)
(1 + 2 * (bin $b ..)))
// add : Bin -> Bin -> Bin
(def (add 0 0)
0)
(def (add (2 * $n) 0)
(2 * $n))
(def (add 0 (2 * $m))
(2 * $m))
(def (add (1 + 2 * $n) 0)
(1 + 2 * $n))
(def (add 0 (1 + 2 * $m))
(1 + 2 * $m))
(def (add (2 * $n) (2 * $m))
(2 * (add $n $m)))
(def (add (1 + 2 * $n) (1 + 2 * $m))
(add (2 * (1 + 2 * 0))
(2 * (add $n $m))))
(def (add (1 + 2 * $n) (2 * $m))
(1 + 2 * (add $n $m)))
(def (add (2 * $n) (1 + 2 * $m))
(1 + 2 * (add $n $m)))
// mul : Bin -> Bin -> Bin
(def (mul $n 0)
0)
(def (mul $n (2 * $m))
(2 * (mul $n $m)))
(def (mul $n (1 + 2 * $m))
(add $n (2 * (mul $n $m))))
// equal : Bin -> Bin -> Bool
(def (equal 0 0)
true)
(def (equal (1 + 2 * $n) 0)
false)
(def (equal 0 (1 + 2 * $m))
false)
(def (equal (2 * $n) (1 + 2 * $m))
false)
(def (equal (1 + 2 * $n) (2 * $m))
false)
(def (equal (2 * $n) 0)
(equal $n 0))
(def (equal 0 (2 * $m))
(equal 0 $m))
(def (equal (2 * $n) (2 * $m))
(equal $n $m))
(def (equal (1 + 2 * $n) (1 + 2 * $m))
(equal $n $m))
Let’s test this code by computing 146 + 296 = 34 * 13
(in binary, of course):
// 1 1 0 1 1 1 0 1 0
(equal (add (bin 1 0 0 1 0 0 1 0)
(bin 1 0 0 1 0 1 0 0 0))
(mul (bin 1 0 0 0 1 0)
(bin 1 1 0 1)))
This computation requires 83 definition applications to produce the output true
. I’ve shown the entirety of it in full because I think it’s fun to see all the work that goes on under the hood:
$ cabal run utrl -- --defs test/programs/binary-arithmetic.defs --input test/programs/binary-arithmetic.input --trace
0. (equal (add (bin 1 0 0 1 0 0 1 0) (bin 1 0 0 1 0 1 0 0 0)) (mul (bin 1 0 0 0 1 0) (bin 1 1 0 1)))
1. (equal (add (2 * (bin 1 0 0 1 0 0 1)) (bin 1 0 0 1 0 1 0 0 0)) (mul (bin 1 0 0 0 1 0) (bin 1 1 0 1)))
2. (equal (add (2 * (bin 1 0 0 1 0 0 1)) (2 * (bin 1 0 0 1 0 1 0 0))) (mul (bin 1 0 0 0 1 0) (bin 1 1 0 1)))
3. (equal (2 * (add (bin 1 0 0 1 0 0 1) (bin 1 0 0 1 0 1 0 0))) (mul (bin 1 0 0 0 1 0) (bin 1 1 0 1)))
4. (equal (2 * (add (bin 1 0 0 1 0 0 1) (bin 1 0 0 1 0 1 0 0))) (mul (2 * (bin 1 0 0 0 1)) (bin 1 1 0 1)))
5. (equal (2 * (add (bin 1 0 0 1 0 0 1) (bin 1 0 0 1 0 1 0 0))) (mul (2 * (bin 1 0 0 0 1)) (1 + 2 * (bin 1 1 0))))
6. (equal (2 * (add (bin 1 0 0 1 0 0 1) (bin 1 0 0 1 0 1 0 0))) (add (2 * (bin 1 0 0 0 1)) (2 * (mul (2 * (bin 1 0 0 0 1)) (bin 1 1 0)))))
7. (equal (2 * (add (bin 1 0 0 1 0 0 1) (bin 1 0 0 1 0 1 0 0))) (2 * (add (bin 1 0 0 0 1) (mul (2 * (bin 1 0 0 0 1)) (bin 1 1 0)))))
8. (equal (add (bin 1 0 0 1 0 0 1) (bin 1 0 0 1 0 1 0 0)) (add (bin 1 0 0 0 1) (mul (2 * (bin 1 0 0 0 1)) (bin 1 1 0))))
9. (equal (add (1 + 2 * (bin 1 0 0 1 0 0)) (bin 1 0 0 1 0 1 0 0)) (add (bin 1 0 0 0 1) (mul (2 * (bin 1 0 0 0 1)) (bin 1 1 0))))
10. (equal (add (1 + 2 * (bin 1 0 0 1 0 0)) (2 * (bin 1 0 0 1 0 1 0))) (add (bin 1 0 0 0 1) (mul (2 * (bin 1 0 0 0 1)) (bin 1 1 0))))
11. (equal (1 + 2 * (add (bin 1 0 0 1 0 0) (bin 1 0 0 1 0 1 0))) (add (bin 1 0 0 0 1) (mul (2 * (bin 1 0 0 0 1)) (bin 1 1 0))))
12. (equal (1 + 2 * (add (bin 1 0 0 1 0 0) (bin 1 0 0 1 0 1 0))) (add (1 + 2 * (bin 1 0 0 0)) (mul (2 * (bin 1 0 0 0 1)) (bin 1 1 0))))
13. (equal (1 + 2 * (add (2 * (bin 1 0 0 1 0)) (bin 1 0 0 1 0 1 0))) (add (1 + 2 * (bin 1 0 0 0)) (mul (2 * (bin 1 0 0 0 1)) (bin 1 1 0))))
14. (equal (1 + 2 * (add (2 * (bin 1 0 0 1 0)) (2 * (bin 1 0 0 1 0 1)))) (add (1 + 2 * (bin 1 0 0 0)) (mul (2 * (bin 1 0 0 0 1)) (bin 1 1 0))))
15. (equal (1 + 2 * (2 * (add (bin 1 0 0 1 0) (bin 1 0 0 1 0 1)))) (add (1 + 2 * (bin 1 0 0 0)) (mul (2 * (bin 1 0 0 0 1)) (bin 1 1 0))))
16. (equal (1 + 2 * (2 * (add (bin 1 0 0 1 0) (bin 1 0 0 1 0 1)))) (add (1 + 2 * (2 * (bin 1 0 0))) (mul (2 * (bin 1 0 0 0 1)) (bin 1 1 0))))
17. (equal (1 + 2 * (2 * (add (bin 1 0 0 1 0) (bin 1 0 0 1 0 1)))) (add (1 + 2 * (2 * (bin 1 0 0))) (mul (2 * (bin 1 0 0 0 1)) (2 * (bin 1 1)))))
18. (equal (1 + 2 * (2 * (add (bin 1 0 0 1 0) (bin 1 0 0 1 0 1)))) (add (1 + 2 * (2 * (bin 1 0 0))) (2 * (mul (2 * (bin 1 0 0 0 1)) (bin 1 1)))))
19. (equal (1 + 2 * (2 * (add (bin 1 0 0 1 0) (bin 1 0 0 1 0 1)))) (1 + 2 * (add (2 * (bin 1 0 0)) (mul (2 * (bin 1 0 0 0 1)) (bin 1 1)))))
20. (equal (2 * (add (bin 1 0 0 1 0) (bin 1 0 0 1 0 1))) (add (2 * (bin 1 0 0)) (mul (2 * (bin 1 0 0 0 1)) (bin 1 1))))
21. (equal (2 * (add (2 * (bin 1 0 0 1)) (bin 1 0 0 1 0 1))) (add (2 * (bin 1 0 0)) (mul (2 * (bin 1 0 0 0 1)) (bin 1 1))))
22. (equal (2 * (add (2 * (bin 1 0 0 1)) (1 + 2 * (bin 1 0 0 1 0)))) (add (2 * (bin 1 0 0)) (mul (2 * (bin 1 0 0 0 1)) (bin 1 1))))
23. (equal (2 * (1 + 2 * (add (bin 1 0 0 1) (bin 1 0 0 1 0)))) (add (2 * (bin 1 0 0)) (mul (2 * (bin 1 0 0 0 1)) (bin 1 1))))
24. (equal (2 * (1 + 2 * (add (bin 1 0 0 1) (bin 1 0 0 1 0)))) (add (2 * (2 * (bin 1 0))) (mul (2 * (bin 1 0 0 0 1)) (bin 1 1))))
25. (equal (2 * (1 + 2 * (add (bin 1 0 0 1) (bin 1 0 0 1 0)))) (add (2 * (2 * (bin 1 0))) (mul (2 * (bin 1 0 0 0 1)) (1 + 2 * (bin 1)))))
26. (equal (2 * (1 + 2 * (add (bin 1 0 0 1) (bin 1 0 0 1 0)))) (add (2 * (2 * (bin 1 0))) (add (2 * (bin 1 0 0 0 1)) (2 * (mul (2 * (bin 1 0 0 0 1)) (bin 1))))))
27. (equal (2 * (1 + 2 * (add (bin 1 0 0 1) (bin 1 0 0 1 0)))) (add (2 * (2 * (bin 1 0))) (2 * (add (bin 1 0 0 0 1) (mul (2 * (bin 1 0 0 0 1)) (bin 1))))))
28. (equal (2 * (1 + 2 * (add (bin 1 0 0 1) (bin 1 0 0 1 0)))) (2 * (add (2 * (bin 1 0)) (add (bin 1 0 0 0 1) (mul (2 * (bin 1 0 0 0 1)) (bin 1))))))
29. (equal (1 + 2 * (add (bin 1 0 0 1) (bin 1 0 0 1 0))) (add (2 * (bin 1 0)) (add (bin 1 0 0 0 1) (mul (2 * (bin 1 0 0 0 1)) (bin 1)))))
30. (equal (1 + 2 * (add (1 + 2 * (bin 1 0 0)) (bin 1 0 0 1 0))) (add (2 * (bin 1 0)) (add (bin 1 0 0 0 1) (mul (2 * (bin 1 0 0 0 1)) (bin 1)))))
31. (equal (1 + 2 * (add (1 + 2 * (bin 1 0 0)) (2 * (bin 1 0 0 1)))) (add (2 * (bin 1 0)) (add (bin 1 0 0 0 1) (mul (2 * (bin 1 0 0 0 1)) (bin 1)))))
32. (equal (1 + 2 * (1 + 2 * (add (bin 1 0 0) (bin 1 0 0 1)))) (add (2 * (bin 1 0)) (add (bin 1 0 0 0 1) (mul (2 * (bin 1 0 0 0 1)) (bin 1)))))
33. (equal (1 + 2 * (1 + 2 * (add (bin 1 0 0) (bin 1 0 0 1)))) (add (2 * (2 * (bin 1))) (add (bin 1 0 0 0 1) (mul (2 * (bin 1 0 0 0 1)) (bin 1)))))
34. (equal (1 + 2 * (1 + 2 * (add (bin 1 0 0) (bin 1 0 0 1)))) (add (2 * (2 * (bin 1))) (add (1 + 2 * (bin 1 0 0 0)) (mul (2 * (bin 1 0 0 0 1)) (bin 1)))))
35. (equal (1 + 2 * (1 + 2 * (add (2 * (bin 1 0)) (bin 1 0 0 1)))) (add (2 * (2 * (bin 1))) (add (1 + 2 * (bin 1 0 0 0)) (mul (2 * (bin 1 0 0 0 1)) (bin 1)))))
36. (equal (1 + 2 * (1 + 2 * (add (2 * (bin 1 0)) (1 + 2 * (bin 1 0 0))))) (add (2 * (2 * (bin 1))) (add (1 + 2 * (bin 1 0 0 0)) (mul (2 * (bin 1 0 0 0 1)) (bin 1)))))
37. (equal (1 + 2 * (1 + 2 * (1 + 2 * (add (bin 1 0) (bin 1 0 0))))) (add (2 * (2 * (bin 1))) (add (1 + 2 * (bin 1 0 0 0)) (mul (2 * (bin 1 0 0 0 1)) (bin 1)))))
38. (equal (1 + 2 * (1 + 2 * (1 + 2 * (add (bin 1 0) (bin 1 0 0))))) (add (2 * (2 * (1 + 2 * (bin)))) (add (1 + 2 * (bin 1 0 0 0)) (mul (2 * (bin 1 0 0 0 1)) (bin 1)))))
39. (equal (1 + 2 * (1 + 2 * (1 + 2 * (add (bin 1 0) (bin 1 0 0))))) (add (2 * (2 * (1 + 2 * (bin)))) (add (1 + 2 * (2 * (bin 1 0 0))) (mul (2 * (bin 1 0 0 0 1)) (bin 1)))))
40. (equal (1 + 2 * (1 + 2 * (1 + 2 * (add (bin 1 0) (bin 1 0 0))))) (add (2 * (2 * (1 + 2 * (bin)))) (add (1 + 2 * (2 * (bin 1 0 0))) (mul (2 * (bin 1 0 0 0 1)) (1 + 2 * (bin))))))
41. (equal (1 + 2 * (1 + 2 * (1 + 2 * (add (bin 1 0) (bin 1 0 0))))) (add (2 * (2 * (1 + 2 * (bin)))) (add (1 + 2 * (2 * (bin 1 0 0))) (add (2 * (bin 1 0 0 0 1)) (2 * (mul (2 * (bin 1 0 0 0 1)) (bin)))))))
42. (equal (1 + 2 * (1 + 2 * (1 + 2 * (add (bin 1 0) (bin 1 0 0))))) (add (2 * (2 * (1 + 2 * (bin)))) (add (1 + 2 * (2 * (bin 1 0 0))) (2 * (add (bin 1 0 0 0 1) (mul (2 * (bin 1 0 0 0 1)) (bin)))))))
43. (equal (1 + 2 * (1 + 2 * (1 + 2 * (add (bin 1 0) (bin 1 0 0))))) (add (2 * (2 * (1 + 2 * (bin)))) (1 + 2 * (add (2 * (bin 1 0 0)) (add (bin 1 0 0 0 1) (mul (2 * (bin 1 0 0 0 1)) (bin)))))))
44. (equal (1 + 2 * (1 + 2 * (1 + 2 * (add (bin 1 0) (bin 1 0 0))))) (1 + 2 * (add (2 * (1 + 2 * (bin))) (add (2 * (bin 1 0 0)) (add (bin 1 0 0 0 1) (mul (2 * (bin 1 0 0 0 1)) (bin)))))))
45. (equal (1 + 2 * (1 + 2 * (add (bin 1 0) (bin 1 0 0)))) (add (2 * (1 + 2 * (bin))) (add (2 * (bin 1 0 0)) (add (bin 1 0 0 0 1) (mul (2 * (bin 1 0 0 0 1)) (bin))))))
46. (equal (1 + 2 * (1 + 2 * (add (2 * (bin 1)) (bin 1 0 0)))) (add (2 * (1 + 2 * (bin))) (add (2 * (bin 1 0 0)) (add (bin 1 0 0 0 1) (mul (2 * (bin 1 0 0 0 1)) (bin))))))
47. (equal (1 + 2 * (1 + 2 * (add (2 * (bin 1)) (2 * (bin 1 0))))) (add (2 * (1 + 2 * (bin))) (add (2 * (bin 1 0 0)) (add (bin 1 0 0 0 1) (mul (2 * (bin 1 0 0 0 1)) (bin))))))
48. (equal (1 + 2 * (1 + 2 * (2 * (add (bin 1) (bin 1 0))))) (add (2 * (1 + 2 * (bin))) (add (2 * (bin 1 0 0)) (add (bin 1 0 0 0 1) (mul (2 * (bin 1 0 0 0 1)) (bin))))))
49. (equal (1 + 2 * (1 + 2 * (2 * (add (bin 1) (bin 1 0))))) (add (2 * (1 + 2 * 0)) (add (2 * (bin 1 0 0)) (add (bin 1 0 0 0 1) (mul (2 * (bin 1 0 0 0 1)) (bin))))))
50. (equal (1 + 2 * (1 + 2 * (2 * (add (bin 1) (bin 1 0))))) (add (2 * (1 + 2 * 0)) (add (2 * (2 * (bin 1 0))) (add (bin 1 0 0 0 1) (mul (2 * (bin 1 0 0 0 1)) (bin))))))
51. (equal (1 + 2 * (1 + 2 * (2 * (add (bin 1) (bin 1 0))))) (add (2 * (1 + 2 * 0)) (add (2 * (2 * (bin 1 0))) (add (1 + 2 * (bin 1 0 0 0)) (mul (2 * (bin 1 0 0 0 1)) (bin))))))
52. (equal (1 + 2 * (1 + 2 * (2 * (add (1 + 2 * (bin)) (bin 1 0))))) (add (2 * (1 + 2 * 0)) (add (2 * (2 * (bin 1 0))) (add (1 + 2 * (bin 1 0 0 0)) (mul (2 * (bin 1 0 0 0 1)) (bin))))))
53. (equal (1 + 2 * (1 + 2 * (2 * (add (1 + 2 * (bin)) (2 * (bin 1)))))) (add (2 * (1 + 2 * 0)) (add (2 * (2 * (bin 1 0))) (add (1 + 2 * (bin 1 0 0 0)) (mul (2 * (bin 1 0 0 0 1)) (bin))))))
54. (equal (1 + 2 * (1 + 2 * (2 * (1 + 2 * (add (bin) (bin 1)))))) (add (2 * (1 + 2 * 0)) (add (2 * (2 * (bin 1 0))) (add (1 + 2 * (bin 1 0 0 0)) (mul (2 * (bin 1 0 0 0 1)) (bin))))))
55. (equal (1 + 2 * (1 + 2 * (2 * (1 + 2 * (add (bin) (bin 1)))))) (add (2 * (1 + 2 * 0)) (add (2 * (2 * (2 * (bin 1)))) (add (1 + 2 * (bin 1 0 0 0)) (mul (2 * (bin 1 0 0 0 1)) (bin))))))
56. (equal (1 + 2 * (1 + 2 * (2 * (1 + 2 * (add (bin) (bin 1)))))) (add (2 * (1 + 2 * 0)) (add (2 * (2 * (2 * (bin 1)))) (add (1 + 2 * (2 * (bin 1 0 0))) (mul (2 * (bin 1 0 0 0 1)) (bin))))))
57. (equal (1 + 2 * (1 + 2 * (2 * (1 + 2 * (add (bin) (bin 1)))))) (add (2 * (1 + 2 * 0)) (add (2 * (2 * (2 * (bin 1)))) (add (1 + 2 * (2 * (bin 1 0 0))) (mul (2 * (bin 1 0 0 0 1)) 0)))))
58. (equal (1 + 2 * (1 + 2 * (2 * (1 + 2 * (add (bin) (bin 1)))))) (add (2 * (1 + 2 * 0)) (add (2 * (2 * (2 * (bin 1)))) (add (1 + 2 * (2 * (bin 1 0 0))) 0))))
59. (equal (1 + 2 * (1 + 2 * (2 * (1 + 2 * (add (bin) (bin 1)))))) (add (2 * (1 + 2 * 0)) (add (2 * (2 * (2 * (bin 1)))) (1 + 2 * (2 * (bin 1 0 0))))))
60. (equal (1 + 2 * (1 + 2 * (2 * (1 + 2 * (add (bin) (bin 1)))))) (add (2 * (1 + 2 * 0)) (1 + 2 * (add (2 * (2 * (bin 1))) (2 * (bin 1 0 0))))))
61. (equal (1 + 2 * (1 + 2 * (2 * (1 + 2 * (add (bin) (bin 1)))))) (1 + 2 * (add (1 + 2 * 0) (add (2 * (2 * (bin 1))) (2 * (bin 1 0 0))))))
62. (equal (1 + 2 * (2 * (1 + 2 * (add (bin) (bin 1))))) (add (1 + 2 * 0) (add (2 * (2 * (bin 1))) (2 * (bin 1 0 0)))))
63. (equal (1 + 2 * (2 * (1 + 2 * (add (bin) (bin 1))))) (add (1 + 2 * 0) (2 * (add (2 * (bin 1)) (bin 1 0 0)))))
64. (equal (1 + 2 * (2 * (1 + 2 * (add (bin) (bin 1))))) (1 + 2 * (add 0 (add (2 * (bin 1)) (bin 1 0 0)))))
65. (equal (2 * (1 + 2 * (add (bin) (bin 1)))) (add 0 (add (2 * (bin 1)) (bin 1 0 0))))
66. (equal (2 * (1 + 2 * (add (bin) (bin 1)))) (add 0 (add (2 * (bin 1)) (2 * (bin 1 0)))))
67. (equal (2 * (1 + 2 * (add (bin) (bin 1)))) (add 0 (2 * (add (bin 1) (bin 1 0)))))
68. (equal (2 * (1 + 2 * (add (bin) (bin 1)))) (2 * (add (bin 1) (bin 1 0))))
69. (equal (1 + 2 * (add (bin) (bin 1))) (add (bin 1) (bin 1 0)))
70. (equal (1 + 2 * (add (bin) (bin 1))) (add (1 + 2 * (bin)) (bin 1 0)))
71. (equal (1 + 2 * (add (bin) (bin 1))) (add (1 + 2 * (bin)) (2 * (bin 1))))
72. (equal (1 + 2 * (add (bin) (bin 1))) (1 + 2 * (add (bin) (bin 1))))
73. (equal (add (bin) (bin 1)) (add (bin) (bin 1)))
74. (equal (add 0 (bin 1)) (add (bin) (bin 1)))
75. (equal (add 0 (1 + 2 * (bin))) (add (bin) (bin 1)))
76. (equal (1 + 2 * (bin)) (add (bin) (bin 1)))
77. (equal (1 + 2 * 0) (add (bin) (bin 1)))
78. (equal (1 + 2 * 0) (add 0 (bin 1)))
79. (equal (1 + 2 * 0) (add 0 (1 + 2 * (bin))))
80. (equal (1 + 2 * 0) (1 + 2 * (bin)))
81. (equal 0 (bin))
82. (equal 0 0)
83. true
true
Fibonacci Sequence
We can use the existing definitions in our natural numbers file to define the Fibonacci sequence:
/* ... natural number definitions (addition, equality, etc.)
ellided for brevity ... */
(def (fib 0)
0)
(def (fib (S 0))
(S 0))
(def (fib (S (S $n)))
(add (fib $n)
(fib (S $n))))
Let’s check that fib 9 = 34 = 4 + 3 * 10
:
(equal // 34 = fib 9
(fib 9)
// 34 = 4 + 3 * (1 + 9)
(add 4 (mul 3 (add 1 9))))
Computing this answer takes 639 steps:
$ cabal run utrl -- --defs test/programs/peano-fibonacci.defs --input test/programs/peano-fibonacci.input --trace | head
0. (equal (fib 9) (add 4 (mul 3 (add 1 9))))
1. (equal (fib (S 8)) (add 4 (mul 3 (add 1 9))))
2. (equal (fib (S 8)) (add (S 3) (mul 3 (add 1 9))))
3. (equal (fib (S (S 7))) (add (S 3) (mul 3 (add 1 9))))
4. (equal (add (fib 7) (fib (S 7))) (add (S 3) (mul 3 (add 1 9))))
5. (equal (add (fib (S 6)) (fib (S 7))) (add (S 3) (mul 3 (add 1 9))))
6. (equal (add (fib (S 6)) (fib (S 7))) (add (S (S 2)) (mul 3 (add 1 9))))
7. (equal (add (fib (S 6)) (fib (S 7))) (add (S (S 2)) (mul (S 2) (add 1 9))))
8. (equal (add (fib (S (S 5))) (fib (S 7))) (add (S (S 2)) (mul (S 2) (add 1 9))))
9. (equal (add (add (fib 5) (fib (S 5))) (fib (S 7))) (add (S (S 2)) (mul (S 2) (add 1 9))))
$ cabal run utrl -- --defs test/programs/peano-fibonacci.defs --input test/programs/peano-fibonacci.input --trace | tail
631. (equal (S (S (S (S (S 0))))) (add (S (S (S (S 0)))) (S 0)))
632. (equal (S (S (S (S (S 0))))) (S (add (S (S (S (S 0)))) 0)))
633. (equal (S (S (S (S 0)))) (add (S (S (S (S 0)))) 0))
634. (equal (S (S (S (S 0)))) (S (S (S (S 0)))))
635. (equal (S (S (S 0))) (S (S (S 0))))
636. (equal (S (S 0)) (S (S 0)))
637. (equal (S 0) (S 0))
638. (equal 0 0)
639. true
true
Functional list processing
We had to go through great lengths to define natural numbers and binary numbers. It might be surprising then to learn that list processing functions (head
, take
, repeat
, map
, foldr
, zip-with
) are actually quite simple to implement:
/* ... natural number definitions (addition, equality, etc.)
ellided for brevity ... */
(def (head (cons $a $d))
(some $a))
(def (head nil)
none)
(def (take 0 $c)
nil)
(def (take (S $n) (cons $a $d))
(cons $a (take $n $d)))
(def (repeat $x)
(cons $x (repeat $x)))
(def (map $f nil)
nil)
(def (map $f (cons $a $d))
(cons ($f $a)
(map $f $d)))
(def (foldr $f $z nil)
$z)
(def (foldr $f $z (cons $a $d))
($f $a (foldr $f $z $d)))
(def (zip-with $f (cons $a1 $d1) nil)
nil)
(def (zip-with $f nil (cons $a2 $d2))
nil)
(def (zip-with $f (cons $a1 $d1) (cons $a2 $d2))
(cons ($f $a1 $a2)
(zip-with $f $d1 $d2)))
Let’s use this to prove that 4 * (2 + 3)
is equivalent to summing the first four elements of an infinite list produced by summing together the infinite lists [2, 2, 2, ...]
and [3, 3, 3, ...]
:
(equal
(mul 4 (add 2 3))
(foldr add 0
(take 4
(zip-with add
(repeat 2)
(repeat 3)))))
It takes 189 steps to compute the answer:
$ cabal run utrl -- --defs test/programs/list-processing.defs --input test/programs/list-processing.input --trace | head
0. (equal (mul 4 (add 2 3)) (foldr add 0 (take 4 (zip-with add (repeat 2) (repeat 3)))))
1. (equal (mul (S 3) (add 2 3)) (foldr add 0 (take 4 (zip-with add (repeat 2) (repeat 3)))))
2. (equal (mul (S (S 2)) (add 2 3)) (foldr add 0 (take 4 (zip-with add (repeat 2) (repeat 3)))))
3. (equal (mul (S (S 2)) (add (S 1) 3)) (foldr add 0 (take 4 (zip-with add (repeat 2) (repeat 3)))))
4. (equal (mul (S (S 2)) (add (S 1) (S 2))) (foldr add 0 (take 4 (zip-with add (repeat 2) (repeat 3)))))
5. (equal (mul (S (S 2)) (S (add (S 1) 2))) (foldr add 0 (take 4 (zip-with add (repeat 2) (repeat 3)))))
6. (equal (add (S (S 2)) (mul (S (S 2)) (add (S 1) 2))) (foldr add 0 (take 4 (zip-with add (repeat 2) (repeat 3)))))
7. (equal (add (S (S 2)) (mul (S (S 2)) (add (S 1) 2))) (foldr add 0 (take (S 3) (zip-with add (repeat 2) (repeat 3)))))
8. (equal (add (S (S (S 1))) (mul (S (S 2)) (add (S 1) 2))) (foldr add 0 (take (S 3) (zip-with add (repeat 2) (repeat 3)))))
9. (equal (add (S (S (S 1))) (mul (S (S 2)) (add (S 1) (S 1)))) (foldr add 0 (take (S 3) (zip-with add (repeat 2) (repeat 3)))))
$ cabal run utrl -- --defs test/programs/list-processing.defs --input test/programs/list-processing.input --trace | tail
181. (equal (S (add (S (S (S (S 0)))) 0)) (add (S (S (S (S (S 0))))) 0))
182. (equal (S (add (S (S (S (S 0)))) 0)) (S (S (S (S (S 0))))))
183. (equal (add (S (S (S (S 0)))) 0) (S (S (S (S 0)))))
184. (equal (S (S (S (S 0)))) (S (S (S (S 0)))))
185. (equal (S (S (S 0))) (S (S (S 0))))
186. (equal (S (S 0)) (S (S 0)))
187. (equal (S 0) (S 0))
188. (equal 0 0)
189. true
true
Installation
To run utrl
programs you’ll need the interpreter, which can be
downloaded using git
:
git clone https://github.com/MichaelMMacLeod/utrl.git
cd utrl
I use Nix to manage dependencies and the build process. Nix version 2.8 (released on 2022-04-19) or greater is required. To build and test the interpreter use:
nix --extra-experimental-features 'nix-command flakes' build
This installs the interpreter to ./result/bin/utrl.
To run it, use:
./result/bin/utrl --help
A development environment containing VSCodium with Haskell/Nix extensions pre-installed, along with the Haskell build tool Cabal and compiler GHC can be entered using:
nix --extra-experimental-features 'nix-command flakes' develop
These are some available commands inside the development environment:
codium # open VSCodium IDE
cabal build # build interpreter
cabal test # run tests
cabal run utrl -- --help # run interpreter
It’s a good idea when developing to use cabal build instead of nix build; the former will selectively build only the parts of the code that have changed, while the latter will always perform a full-rebuild.
Implementation Details
Compilation Process
Let’s consider the following simple program and walk through how it is compiled:
(def (copy $xs ..) ($xs ..))
The file is parsed via parser combinators written in megaparsec into a list of values of type
Ast0
, one for each definition in the program being compiled. AST is short for Abstract Syntax Tree, and the zero on the end communicates that this is the first such tree. As we will see, there is more than just one type of abstract syntax tree.-- File Ast0.hs: data Ast = Symbol Text | Compound [Ast]
The definition of
Ast0
closely resembles the source program. I chose the word “Compound” instead of “List” because there are too many things named “List” already in Haskell and I didn’t want to run into name-clashes.Each
Ast0
is transformed into two differentAst0
values: one for the pattern, and one for the constructor. In our example this means we separate the definition into(copy $xs ..)
, the pattern part, and($xs ..)
, the constructor part.Every pattern and constructor
Ast0
values are transformed intoAst1
values, which make ellipses explicit.-- File Ast1.hs data Ast = Symbol Text | Compound [Ast] | Ellipses Ast
This change might not seem like much, but it makes the subsequent steps much easier. Having a separate variant for ellipses means we don’t have to check for the presence of ellipses; we simply have ellipses, or we don’t.
Here is a textual representation of the generated
Ast1
values from our example:$ cabal run utrl -- --defs copy.defs --emit '1' error[E013]: [--emit] generated code for stage 1 (pattern part) copy.defs:1:6 | 1 | (def (copy $xs ..) ($xs ..)) | ^ (copy {Ellipses $xs}) error[E013]: [--emit] generated code for stage 1 (constructor part) copy.defs:1:20 | 1 | (def (copy $xs ..) ($xs ..)) | ^ ({Ellipses $xs})
All
Ast1
pattern values are transformed intoAstP0
values, a type which makes explicit the terms preceding, at, and following each ellipsis. TheP
inAstP0
stands for “pattern”. This is where the compilation process for patterns end; theAstP0
values are used directly in the interpreter as-is. Future work could be done to compile these further.-- File AstP0.hs data Ast = Symbol Text | CompoundWithoutEllipses [Ast] | CompoundWithEllipses AstP0CompoundWtihEllipses data AstP0CompoundWtihEllipses = AstP0CompoundWtihEllipses before :: [Ast], { ellipses :: Ast, after :: [Ast] }
To radically simplify the implementation of the interpreter, I made the choice to limit pattern terms to each having one ellipsis per list. This is why it is possible to talk about the parts “before” and “after” an ellipsis, given that there is never going to be more than one of them in each list. This same design choice is made in other similar systems, such as Racket’s
syntax-case
.Here is a textual representation of the generated
AstP0
value from our example. Notice thatcopy
is in the “before” field, as it precedes$x ..
, and there is nothing in the “after” field, as no terms follow the ellipses.$ cabal run utrl -- --defs copy.defs --emit 'P0' error[E013]: [--emit] generated code for stage P0 copy.defs:1:6 | 1 | (def (copy $xs ..) ($xs ..)) | ^ {Before = [copy], Ellipses = $xs, after = []}
All
Ast1
constructor values are transformed intoAstC0
values, a type which makes explicit the location in the pattern that each variable binds to. TheC
inAstC0
stands for “constructor”.-- File AstC0.hs data Ast = Symbol Text | Compound [Ast] | Ellipses Ast | Variable Index Text type Index = [IndexElement] data IndexElement = ZeroPlus Int | LenMinus Int | Between AstC0Between data AstC0Between = AstC0Between { zeroPlus :: Int, lenMinus :: Int }
The only difference between
Ast1
andAstC0
is the introduction of theIndex
type on theVariable
variant. This is a special kind of index that requires some explanation.Programming languages across the board define array access syntax such as
xs[n]
to access then
th element of an array. These are commonly zero-based indices, meaning thatxs[0]
retrieves the first element ofxs
. Some programming languages such as Python also define array access syntax that starts from the end. In Python,xs[-1]
accesses the last element ofxs
, and is essentially a simplified way of writingxs[xs.length-1]
, i.e., the-1
represents the length of the array, minus one.In patterns that do not have any ellipses it is sufficient to use zero-based indexing. For example, in the pattern
(copy $x)
we can denote the location ofcopy
as zero and the location of$x
as one. This is represented by theZeroPlus
variant of theIndexElement
type.One difference from syntax such as
xs[n]
is that we are indexing into trees, not arrays. This means we may need more than one index if our target is located anywhere other than at the surface of the tree. For example, the symbolx
is at tree-index[1,0]
in the following term:(a (x b) c)
. This is whyIndex
is defined as a list ofIndexElement
s, and not simply as a singleIndexElement
.Some patterns we may encounter include symbols or variables that occur after an ellipsis. For example, the variable
$y
occurs after an ellipsis in the pattern(copy $xs .. $y)
. There is no way to use zero-based indexing to refer to the location of$y
given that it is unknown at compile time how many elements will bind to$xs
. What we can do instead is use Python-like negative array indexing to denote the location of$y
as the length of the input matched against the pattern minus one. This gives us a simple way to locate$y
given that the length of the input can be readily computed at runtime. This is represented by theLenMinus
variant of theIndexElement
type.When faced with compiling constructors with variables bound in patterns such as
(copy $xs ..)
, we must use a combination of zero-based and Python-like negative array indexing. In the list(copy $xs ..)
, while the symbolcopy
is at index zero, the variable$xs
matches all indices starting at one and ending at the length of the list (exclusive). We can represent this by saying that$xs
has two indices: one, and length minus zero (exclusive). These describe the range of indices that$xs
will bind to at runtime, and is represented by theBetween
variant of theIndexElement
type.You may wonder how we refer to variables that occur between two ellipses, such as
$y
in($x .. $y $z ..)
. The answer is that we disallow such patterns altogether. This makes our lives considerably easier, and is the same trade-off made by other similar systems. If this rule is violated you will receive a compilation error:$ cabal run utrl -- --defs ./test/programs/error-pattern-3-ellipses.defs error[E005]: too many ellipses in single term of pattern ./test/programs/error-pattern-3-ellipses.defs:1:6 | 1 | (def ($x .. $y .. $z ..) constructor) | ^^^^^^^^^^^^^^^^^^^ this term has more than one ellipsis ./test/programs/error-pattern-3-ellipses.defs:1:16 | 1 | (def ($x .. $y .. $z ..) constructor) | ^^ ellipsis #2 ./test/programs/error-pattern-3-ellipses.defs:1:22 | 1 | (def ($x .. $y .. $z ..) constructor) | ^^ ellipsis #3 help: each term in a definitions's pattern may have at most one ellipsis
Here is a textual representation of the
AstC0
value in our running example:$ cabal run utrl -- --defs copy.defs --emit 'C0' error[E013]: [--emit] generated code for stage C0 copy.defs:1:20 | 1 | (def (copy $xs ..) ($xs ..)) | ^ ({Ellipses {$xs in input[1..len]}})
The dots in the index represent the
Between
variant, withzeroPlus = 1
andlenMinus = 0
. Because$xs
is located at the surface level of the tree, we only need oneIndexElement
to properly identify it.All
AstC0
values are transformed intoAstC1
values, a type which replaces ellipses and variables with a series offor
-loops and assignments. Here is a textual representation of our example’sAstC1
code:$ cabal run utrl -- --defs copy.defs --emit 'C1' error[E013]: [--emit] generated code for stage C1 copy.defs:1:20 | 1 | (def (copy $xs ..) ($xs ..)) | ^ ({let $1 = input; {for $0 in $1[1..len]; $0}})
What were once ellipses are now
for
loops (actually, foreach loops), and surface-level variables such as$xs
have been replaced with generated variables with numeric names such as$0
and$1
. These new variables are assigned to each other or to the input supplied to a definition (denoted byinput
). In our example, the generated variable$1
is assigned to the input, and then$0
is assigned, in afor
loop, to the terms at indices1..len(gth of $1)
(exclusive). The body of thefor
loop is$0
, which has the effect of inserting each term bound to$0
into the surrounding context. Because the entire{let $1 = input; ... $0 }
is surrounded in parentheses, this “context” is a list, and thus the result is to fill the list with all the terms matched to our original variable$xs
.The code for this stage, due to some issues I ran into with Haskell’s record types, is split across multiple files and is rather verbose, so I will not be including it here. It can be found at this link if you are curious.
All
AstC1
values are transformed intoAstC2
values, a custom assembly language that at runtime will be used as-is in the interpreter. Thefor
-loops found inAstC1
are replaced with conditional jumps and assignment statements. Here is a textual representation of our example’sAstC2
code:$ cabal run utrl -- --defs copy.defs --emit 'C2' error[E013]: [--emit] generated code for stage C2 copy.defs:1:20 | 1 | (def (copy $xs ..) ($xs ..)) | ^ 0. $2 = 0 | 1. $1 = input | 2. $3 = 1 | 3. $4 = $1.length - 0 | 4. jump 9 if True | 5. $0 = $1[$3] | 6. push $0 | 7. $3 = $3 + 1 | 8. $2 = $2 + 1 | 9. jump 5 if $3 < $4 | 10. build $2
There are four different kinds of assembly instructions.
Assignment instructions of the form
<var> = <expr>
. These evaluate the right hand side expression and assign it to a variable. The interpreter contains an array whose elements store the values of variables, so the statement$2 = 0
assigns the integer zero to the third element of the variable-array.Conditional jump instructions of the form
jump <line-number> if <condition>
. These evaluate the condition expression and, if true, instructs the interpreter to continue evaluation at the given line number. If the condition is false the interpreter treats the jump instruction as a no-op and continues to the next line.Push instructions of the form
push <expr>
. These evaluate the given expression and store its result at the end of a dynamically-expanding array. When the interpreter begins evaluatingAstC2
code it creates this (initially empty) array and uses the same one for every push instruction. There is no way to explicitly refer to this array, and there are no expressions that can access elements from it. The only way to interact with it is through pushing elements onto it viapush <expr>
and, as explained below, viabuild <expr>
.Build instructions of the form
build <expr>
. The expression is evaluated; its result is guaranteed to be an integer due to howAstC2
code is generated. The build instruction then takes this integer and removes that number of elements from the end of an array, the same array pushed to viapush
instructions. Once the elements are removed they are collected into a list which is then pushed back onto the end of the array.
The result of running a program is the first element on the stack (as described in (3) and (4)) when the interpreter finishes processing every instruction. It is possible to write programs that run forever, in which case there is no result. For example, the following program when receiving the input
LOOP
will only halt when the interpreter process runs out of available memory to allocate:(def LOOP (LOOP LOOP))
Here’s what happens when you try to run this program:
$ cabal run utrl -- --defs loop.defs --input input.txt --trace 0. LOOP 1. (LOOP LOOP) 2. ((LOOP LOOP) LOOP) 3. ((LOOP LOOP) (LOOP LOOP)) 4. (((LOOP LOOP) LOOP) (LOOP LOOP)) 5. (((LOOP LOOP) (LOOP LOOP)) (LOOP LOOP)) 6. (((LOOP LOOP) (LOOP LOOP)) ((LOOP LOOP) LOOP)) 7. (((LOOP LOOP) (LOOP LOOP)) ((LOOP LOOP) (LOOP LOOP))) 8. ((((LOOP LOOP) LOOP) (LOOP LOOP)) ((LOOP LOOP) (LOOP LOOP))) 9. ((((LOOP LOOP) (LOOP LOOP)) (LOOP LOOP)) ((LOOP LOOP) (LOOP LOOP))) (... this goes on almost forever ...)
Recursion Schemes
A recursion
scheme is a
function that implements a common recursive pattern. This project
makes frequent use of these functions. The most commonly used
recursion scheme in this project is cata
(I count 37 uses), followed
para
(5 uses), and lastly ana
(1 use).
The recursion scheme cata
implements a bottom-up traversal of a data
structure. I use cata
to do the following:
- Return the set of all variables in an AST.
- Attach line numbers to
AstC2
statements (a zip-like operation). - Implement functions which convert various ASTs into
Text
. See Display.hs for the code. - Implement most compilation steps, e.g.,
Ast0 -> Ast1
andAst1 -> AstC0
. See Compile.hs for the code. - Implement most analysis steps, e.g., the step that checks that there are no undefined variables and the step that checks that each variable is not defined more than once. See Analyze.hs for the code.
The recursion scheme para
is like cata
except that it keeps extra
information around during the bottom-up traversal about the data that
was previously processed. I use para
to do the following:
- Compile
AstC1
toAstC2
. We need the extra information here when compiling lists (which may contain loops) to count the number of loops inside each list. That number is used to help determine how big each list (at runtime) will be, so we can accurately allocate space for each element. If we usedcata
, while we would have access to theAstC2
assembly statements corresponding to each loop, we would not have access to the originalAstC1
which produced those instructions. This means that we would be unable to easily tell if the assembly statements were compiled from loops or not, and thus could not easily count the number of loops. - Implement the
analyzeEllipsesAppliedToSymbols
(See Analyze.hs) analysis step, which checks for code like(x ..)
(the correct code would be($x ..)
). We needpara
here to get access to the symbols inside the ellipses variant ofAst1
. Withoutpara
, while we would have access to any errors produced from terms inside the ellipses, we wouldn’t have the originalAst1
that produced those errors. - Implement certain other compilation and analysis steps.
The recursion scheme ana
is, in a sense, the opposite of cata
:
instead of consuming a data structure from the bottom-up, ana
produces a data structure from the top down. The one use of ana
in
this project is to implement the function iterateMaybe :: (b -> Maybe b) -> b -> [b]
which creates a list by repeatedly applying a function
to a seed value until the function returns Nothing
. This function,
iterateMaybe
, is what implements the main loop of the interpreter.
We pass iterateMaybe
a transition function (which applies one
definition at a time), and receive a function which applies
definitions over and over again until no definition matches. The list
that this produces is the one that can be seen via passing the
--trace
command line flag. For example, consider this program:
// File nat.defs
// Addition
(def ($n + 0) $n)
(def ($n + (S $m)) (S ($n + $m)))
// Equality
(def (0 == 0) true)
(def ((S $n) == 0) false)
(def (0 == (S $m)) false)
(def ((S $n) == (S $m)) ($n == $m))
// Peano natural numbers 1-5
(def 1 (S 0))
(def 2 (S 1))
(def 3 (S 2))
(def 4 (S 3))
(def 5 (S 4))
// File nat.input
(5 == (2 + 3)) // this should evaluate to 'true'
The list iterateMaybe
produces using these definitions and input is
20 elements long, since it takes 20 applications of definitions to
reduce the input to its final result:
$ cabal run utrl -- --defs nat.defs --input nat.input --trace
0. (5 == (2 + 3))
1. ((S 4) == (2 + 3))
2. ((S (S 3)) == (2 + 3))
3. ((S (S 3)) == ((S 1) + 3))
4. ((S (S 3)) == ((S 1) + (S 2)))
5. ((S (S 3)) == (S ((S 1) + 2)))
6. ((S 3) == ((S 1) + 2))
7. ((S (S 2)) == ((S 1) + 2))
8. ((S (S 2)) == ((S 1) + (S 1)))
9. ((S (S 2)) == (S ((S 1) + 1)))
10. ((S 2) == ((S 1) + 1))
11. ((S (S 1)) == ((S 1) + 1))
12. ((S (S 1)) == ((S 1) + (S 0)))
13. ((S (S 1)) == (S ((S 1) + 0)))
14. ((S 1) == ((S 1) + 0))
15. ((S 1) == (S 1))
16. (1 == 1)
17. ((S 0) == 1)
18. ((S 0) == (S 0))
19. (0 == 0)
20. true
true
Error Code Index
I spent a good bit of time and energy ensuring the compiler produces helpful, good-looking error messages. Each error message contains a description of the error, a source location, at least one annotated code block, and possibly some ‘help’ text at the end. All the following errors are compile errors (i.e., they occur only at compile time). In fact, there are no runtime errors other than the possibility of reaching an out-of-memory state. These errors are diagnosed in the Analyze.hs file.
Of particular interest is E003, a diagnosis of ‘zipping together’ variables which are bound to (possibly) different numbers of terms. The reason this is interesting is because—as far as I know—this is the only language which detects this error when the rewrite-rule/macro is defined, instead of when it is used. For example, consider the following Rust code:
macro_rules! bad_list {
( ($( $x:expr ),*) ($( $y:expr ),*) ) => {
{
let mut temp_vec = Vec::<(i32, &'static str)>::new();
$(
temp_vec.push(($x, $y));
)*
temp_vec
}
};
}
pub fn main() {
bad_list!((1, 2, 3) ("a", "b", "c", "d", "e"));
}
This code defines a macro, bad_list
, which attempts to zip together
two input lists into a single vector. This, generally speaking, does
not work, as there is no guarantee that the two lists being zipped
together will be of the same length. Despite the fact that defining a
macro this way is almost certainly a bad idea, Rust is okay with it,
and will only signal an error when it is used incorrectly:
error: meta-variable `x` repeats 3 times, but `y` repeats 5 times
--> <source>:5:14
|
5 | $(
| ______________^
6 | | temp_vec.push(($x, $y));
7 | | )*
| |_____________^
The fact that the error message mentions the macro definition site instead of the macro use site suggests that this is fundamentally a problem with the macro definition as opposed to its use. It is strange, then, that Rust does not detect this error when the macro is defined, but only when it is used with lists of differing length.
Another language, Racket (an inspiration for this language, as well as Rust), behaves similarly:
#lang racket
(define-syntax (bad-list stx)
(syntax-case stx ()
[(_ (x ...) (y ...))
#'(list '(x . y) ...)]))
(bad-list (1 2 3) ("a" "b" "c" "d" "e"))
Just like in Rust, Racket is okay with this ill-formed macro, and will
only signal an error when bad-list
is used with lists of differing
lengths:
test.rkt:5:13: syntax: incompatible ellipsis match counts for template
at: ...
in: ((quote (x . y)) ...)
location...:
test.rkt:5:13
Again, we see that the error message mentions only the macro definition site, not its use site. Again, It is strange Racket does not detect this error when the macro is defined, but only when it is used with lists of differing length.
This language detects these kinds of problematic definitions before
they are ever used. For example, consider the equivalent bad-list
definition in utrl
:
(def (bad-list ($x ..) ($y ..))
(list ($x $y) ..))
The resulting error is simple, clean and easy to understand. It gets to the heart of the issue (two variables were matched under different ellipses but used with the same ellipsis) instead of complaining about “incompatible ellipsis match counts” or variables repeating differing amounts:
error[E003]: variables matched under different ellipses used with same ellipsis
./bad-list.defs:1:20
|
1 | (def (bad-list ($x ..) ($y ..))
| ^^ $x matched under this ellipsis
./bad-list.defs:1:28
|
1 | (def (bad-list ($x ..) ($y ..))
| ^^ $y matched under this ellipsis
./bad-list.defs:2:17
|
2 | (list ($x $y) ..))
| ^^ both used with this ellipsis
help: variables matched under different ellipses can't be used with the same ellipsis
E001
There are unbalanced parentheses or unbalanced block-comments (/*
and */
).
Example (missing right parenthesis)
(def A B
error[E001]: bad syntax
./test/programs/error-missing-right-paren.defs:1:9
|
1 | (def A B
| ^ unexpected end of input
| expecting '(' or ')'
E002
There is a definition with a variable used in the constructor with a different number of ellipses (..
) than it was matched with in the pattern.
Example (too few ellipses)
(def ($x ..) $x)
error[E002]: too few ellipses
./test/programs/error-ellipses-1-0.defs:1:7
|
1 | (def ($x ..) $x)
| ^^ matched with 1 ellipsis in pattern
./test/programs/error-ellipses-1-0.defs:1:14
|
1 | (def ($x ..) $x)
| ^^ used with 0 ellipses in constructor
help: variables must be used with the same number of ellipses they were matched with
Example (too many ellipses)
(def (flatten (list (list $x ..) ..))
(list $x .. .. ..))
error[E002]: too many ellipses
./test/programs/error-ellipses-2-3.defs:1:27
|
1 | (def (flatten (list (list $x ..) ..))
| ^^ matched with 2 ellipses in pattern
./test/programs/error-ellipses-2-3.defs:2:9
|
2 | (list $x .. .. ..))
| ^^ used with 3 ellipses in constructor
help: variables must be used with the same number of ellipses they were matched with
E003
There are two variables that are used with the same ellipsis, but not matched under the same ellipsis.
Example
(def (capture ($x ..) ($y ..))
(result ($x $y) ..))
error[E003]: variables matched under different ellipses used with same ellipsis
./test/programs/error-captures-1-level.defs:1:19
|
1 | (def (capture ($x ..) ($y ..))
| ^^ $x matched under this ellipsis
./test/programs/error-captures-1-level.defs:1:27
|
1 | (def (capture ($x ..) ($y ..))
| ^^ $y matched under this ellipsis
./test/programs/error-captures-1-level.defs:2:19
|
2 | (result ($x $y) ..))
| ^^ both used with this ellipsis
help: variables matched under different ellipses can't be used with the same ellipsis
Example (multiple ellipses)
(def (capture (($x ..) ..) (($y ..) ..))
(result ($x .. $y ..) ..))
error[E003]: variables matched under different ellipses used with same ellipsis
./test/programs/error-captures-2-level-inside.defs:1:24
|
1 | (def (capture (($x ..) ..) (($y ..) ..))
| ^^ $x matched under this ellipsis
./test/programs/error-captures-2-level-inside.defs:1:37
|
1 | (def (capture (($x ..) ..) (($y ..) ..))
| ^^ $y matched under this ellipsis
./test/programs/error-captures-2-level-inside.defs:2:25
|
2 | (result ($x .. $y ..) ..))
| ^^ both used with this ellipsis
help: variables matched under different ellipses can't be used with the same ellipsis
E004
An ellipsis follows a symbol. Ellipses may only follow variables or lists which contain one or more variables.
Example
(def (copy (list x ..))
(list x ..))
error[E004]: ellipsis following symbol
./test/programs/error-ellipsis-on-symbol.defs:1:18
|
1 | (def (copy (list x ..))
| ^ this symbol doesn't begin with a dollar-sign ('$'),
| so it is not considered a variable
help: perhaps you meant '$x'?
error[E004]: ellipsis following symbol
./test/programs/error-ellipsis-on-symbol.defs:2:9
|
2 | (list x ..))
| ^ this symbol doesn't begin with a dollar-sign ('$'),
| so it is not considered a variable
help: perhaps you meant '$x'?
E005
There is more than one ellipsis in a list of some definition’s pattern.
Example
(def ($x .. $y .. $z ..) constructor)
error[E005]: too many ellipses in single term of pattern
./test/programs/error-pattern-3-ellipses.defs:1:6
|
1 | (def ($x .. $y .. $z ..) constructor)
| ^^^^^^^^^^^^^^^^^^^ this term has more than one ellipsis
./test/programs/error-pattern-3-ellipses.defs:1:16
|
1 | (def ($x .. $y .. $z ..) constructor)
| ^^ ellipsis #2
./test/programs/error-pattern-3-ellipses.defs:1:22
|
1 | (def ($x .. $y .. $z ..) constructor)
| ^^ ellipsis #3
help: each term in a definitions's pattern may have at most one ellipsis
It is okay to have multiple ellipses in a definition’s pattern, but there cannot ever be more than one in each list in the pattern. The following is perfectly acceptable, as no single list contains more than one ellipsis:
(def (($x ..) ($y ..) $z ..) constructor)
// (___________________..) See how each
// (___..) list contains only
// (___..) a single ellipsis?
This rule does not apply to constructors, which may have as many ellipses per list as desired. For example, here is a valid definition that uses three ellipses in a single list in its constructor:
(def
// All the ellipses here must be in different lists,
// as this is the definition's pattern.
(flatten3 (list (list (list $x ..) ..) ..))
// All the ellipses here can be in the same list,
// as this is the definition's constructor.
(list $x .. .. ..))
This restriction is may be lifted in future versions of utrl
.
E006
The same variable occurs more than once in a definition’s pattern.
Example
(def (twice $x $x) constructor)
error[E006]: variable occurs more than once in pattern
./test/programs/error-var-pattern-use-2.defs:1:13
|
1 | (def (twice $x $x) constructor)
| ^^ use #1
./test/programs/error-var-pattern-use-2.defs:1:16
|
1 | (def (twice $x $x) constructor)
| ^^ use #2
help: a variable may occur at most once in a definition's pattern
E007
Two definitions exist which match the same term(s). A given input term may be matched by at most one definition. This prevents ambiguity about which definition to apply to a given input term.
Example (exactly the same pattern)
(def A Y)
(def A Z)
error[E007]: overlapping patterns
./test/programs/error-overlapping-patterns-simple.defs:1:6
|
1 | (def A Y)
| ^ this pattern may match the same term as ...
./test/programs/error-overlapping-patterns-simple.defs:2:6
|
2 | (def A Z)
| ^ ... this other pattern
help: patterns possibly matching the same term are not allowed
Example (overlapping via variables)
(def ($n 0) _)
(def (0 $m) _)
error[E007]: overlapping patterns
./test/programs/error-overlapping-patterns-variables.defs:1:6
|
1 | (def ($n 0) _)
| ^^^^^^ this pattern may match the same term as ...
./test/programs/error-overlapping-patterns-variables.defs:2:6
|
2 | (def (0 $m) _)
| ^^^^^^ ... this other pattern
help: patterns possibly matching the same term are not allowed
Example (overlapping via variables and ellipses)
(def (a b $x ..) _)
(def (a b $x .. c) _)
error[E007]: overlapping patterns
./test/programs/error-overlapping-patterns-multiple.defs:1:6
|
1 | (def (a b $x ..) _)
| ^^^^^^^^^^^ this pattern may match the same term as ...
./test/programs/error-overlapping-patterns-multiple.defs:2:6
|
2 | (def (a b $x .. c) _)
| ^^^^^^^^^^^^^ ... this other pattern
help: patterns possibly matching the same term are not allowed
E008
A term in either the pattern or constructor is followed by ellipses but contains no variables.
Example
(def pattern
((no vars here) ..))
error[E008]: no variables in term preceding ellipsis
./test/programs/error-no-vars-in-ellipsis-constructor.defs:2:19
|
2 | ((no vars here) ..))
| ^^ the ellipsis
./test/programs/error-no-vars-in-ellipsis-constructor.defs:2:4
|
2 | ((no vars here) ..))
| ^^^^^^^^^^^^^^ the term preceding the ellipsis
help: there must be at least one variable in the term preceding an ellipsis
E009
A symbol is present at the top-level of the definitions.
Example
(def I_am_okay because_I_am_a_definition)
I_am_a_symbol_not_a_definition
error[E009]: expected definition, found symbol
./test/programs/error-def-is-symbol.defs:2:1
|
2 | I_am_a_symbol_not_a_definition
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ this should be a definition
help: definitions look like this: '(def <pattern> <constructor>)'
E010
A definition was found with too many or too few terms. Each definition must have exactly three terms: (def[1] <pattern>[2] <constructor>[3])
Example (too many terms)
(def a b c)
error[E010]: definition has too many terms
./test/programs/error-def-4-terms.defs:1:1
|
1 | (def a b c)
| ^^^^^^^^^^^ has 4 terms, but should have 3
help: definitions look like this: '(def <pattern> <constructor>)'
Example (too few terms)
(def (blah (blah) blah))
error[E010]: definition has too few terms
./test/programs/error-def-2-terms.defs:1:1
|
1 | (def (blah (blah) blah))
| ^^^^^^^^^^^^^^^^^^^^^^^^ has 2 terms, but should have 3
help: definitions look like this: '(def <pattern> <constructor>)'
E011
A definition did not start with def
. All definitions must start with the symbol def
.
Example
(DEF A B)
error[E011]: definition does not start with 'def'
./test/programs/error-def-missing-def.defs:1:2
|
1 | (DEF A B)
| ^^^ this term should be 'def'
help: definitions look like this: '(def <pattern> <constructor>)'
E012
A variable was used in the constructor that was not matched in the pattern. This may be due to a missing dollar-sign ($
) prefix somewhere in the pattern.
Example
(def (copy x) $x)
error[E012]: variable not matched in pattern
./test/programs/error-var-not-matched.defs:1:15
|
1 | (def (copy x) $x)
| ^^ the variable
./test/programs/error-var-not-matched.defs:1:6
|
1 | (def (copy x) $x)
| ^^^^^^^^ the pattern
help: variables cannot be used without first being matched
Reflections on Haskell
I originally attempted to write this language in Rust, but switched over to Haskell after being unhappy with the clarity of the compilation process implementation. It became clear to me during development that the compilation process needed to be a series of rather complicated recursive traversals of various abstract syntax trees. Rust encourages implementing these traversals via ad hoc depth/breadth-first searches on data stored using some kind of container (such as a slotmap). In practice, this meant that I had to deal with a lot of busy work (managing slotmap keys, implementing and testing traversals, fighting the borrow checker) for not much benefit.
Switching to Haskell, I now no longer needed to worry as much about data lifetimes and could substitute ad hoc breadth/depth-first search implementations with functions from the recursion-schemes package. The code became simpler, clearer, and more maintainable.
Overall, Haskell is an excellent language. My favorite features are its algebraic data types, type classes, and type inference. That being said, Haskell is older than Java, so some rough edges are to be expected. Here are some I cut myself on.
Record Field Name Clashes
Consider the following Haskell code:
data Person = Person { name :: String, age :: Int }
data Animal = Animal { name :: String, age :: Int }
In any other language this would be no problem at all. We could
imagine that if bob :: Person
and bingo :: Animal
then we could
access their names via bob.name
and bingo.name
, and their ages via
bob.age
and bingo.age
. It’s not so simple in Haskell land, where
the dot (.
) is reserved for function composition. Instead of
bob.name
and bingo.name
, we use name bob
and name bingo
, where
name :: Person -> String
and name :: Animal -> String
. This is
similar to how some languages like Rust allow name-qualified field
accesses such as Person::name(bob)
and
Animal::name(bingo)
.
Unfortunately, Haskell doesn’t introduce each name
and age
function into their own nested namespaces—it spits them out verbatim
into the surrounding module:
data Person = Person { name :: String, age :: Int }
data Animal = Animal { name :: String, age :: Int }
^^^^ ^^^ Multiple declarations of ‘age’
Multiple declarations of ‘name’
Herein lies the problem. It’s not possible in Haskell to define two record types which share a field name in the same module, because Haskell (sensibly) does not allow multiple function definitions with the same name. This is a well-worn topic with several diverging solutions. Here is a short list of the ones I considered when building this language:
Put the data types in different modules.
Advantages:
- You get to keep the existing names.
- Requires no dependencies or extra language extensions.
Disadvantages:
- Impossible with mutually-recursive data definitions (i.e., data types defined in terms of each other). This is because Haskell disallows cyclic module importation.
- Importing both data types and their fields into the same module requires qualification and/or aliasing, thereby increasing verbosity.
Use the
DuplicateRecordFields
GHC extension.Advantages:
- You get to keep the existing names.
- Works with mutually-recursive data definitions.
- No qualification/aliasing needed when importing data type and fields.
Disadvantages:
- Increases technical debt (an extra language extension).
- Ambiguity sometimes requires explicit type annotations.
Prefix field names with type names, as in:
data Person = Person { personName :: String, personAge :: Int } data Animal = Animal { animalName :: String, animalAge :: Int }
Advantages:
- Works with mutually-recursive data definitions.
- Requires no dependencies or extra language extensions.
Disadvantages:
- Increased verbosity.
- Renaming the structure is more annoying now.
The simplest, most maintainable solution is arguably #3 as it is the
only one that doesn’t increase technical debt, allows for
mutually-recursive definitions, and is the least
fragile. Unfortunately, I only realized this when I was knee-deep in
#2. In addition to DuplicateRecordFields
I turned on
OverloadedRecordDot
, which makes syntax like bob.name
“just
work”. This is convenient but, alas, increases technical debt; if I
had to write this language again, I’d use solution #3.
Shadowing Woes
It is common when dealing with linearly-consumed resources to use shadowing, thereby eliminating the possibility of referencing consumed values. This pattern is encouraged in Rust:
let resource = create_resource();
let resource = transform_1(resource);
let resource = transform_2(resource);
...
In this example the transform_x
functions consume resource
; it
would be an error to access resource
once its ownership was passed
to transform_x
. By shadowing resource
it becomes impossible to
talk about what came before, thus eliminating the possibility that we
might later accidentally refer to a consumed version.
This pattern is impossible in Haskell’s let
and where
clauses. The
problem is that the x
on the right-hand side of the assignment
statement in let x = transform_1 x in ...
in Haskell is the same x
as the one on the left-hand side; Haskell’s let
(and where
)
clauses are recursive, and there is no way to turn this off. This is
not the case with Haskell’s left-arrow “bind” operation in
do
-blocks:
do
resource <- createResource
resource <- transform_1 resource
resource <- transform_2 resource
...
This code works because bindings introduced via <-
are not
recursive. The resource
on the right-hand side is the previous
resource
, not the resource
on the left-hand side.
On a couple of occasions I needed to refactor do
blocks into let
and
where
clauses, and was bitten by the fact that what used to be
shadowing was now recursion. This did not produce a compile
error. Instead, it produced an infinite loop at runtime which would
consume all of my computer’s memory and eventually crash. Even with
some rudimentary “stack tracing” options turned on, Haskell’s laziness
made tracking down these infinite loops frustratingly difficult.
Certain other programming languages such as OCaml, Lean, and Racket
address this (and other) issue(s) with two variants of let
clauses,
namely let
(non-recursive) and let rec
(recursive).
My suggestion is to not use shadowing at all in Haskell (as is
recommended by a GHC warning). If this pattern is really important to
you, it might just be worth using the Identity
monad to keep the
non-recursive bindings in do
blocks instead of refactoring code into
let
and where
clauses.