Bullet: Adventures in a minimally delimited lisp
Introduction
This document details my meanderings in language design; specifically, my attempt to banish (to the extent possible) parentheses from lisp, and the result of that effort: bullet, a minimally delimited lisp on the JVM. In this article, the reader will be introduced to bullet via a sampling of code snippets, familiarize himself with the bullet line pre-processor, learn about the extra structural operators in bullet's reader, as well as bullet's approach to Java interop.
The reasons I embarked on this foolhardy endeavour are twofold. First, after moving into my new apartment, I was left stranded without internet for a few weeks, so I couldn't pursue my main programming activity (which makes extensive use of Amazon AWS). So I started writing a lisp, just for fun. The second reason is that having used clojure for part of my main project, I found that although although it is extremely useful, interesting, and well-executed, I nonetheless came away feeling it wasn't exactly for me. For instance, its approach to Java objects is more adverserial than I would like: you're given the facility to do everything you need to do, but it does stick out quite obviously when you do it, because Java is imperative, whereas in clojure you're shepherded to the eden of functional purity. I feel that clojure in some sense lost an opportunity to be the useful lisp on the JVM. I think its focus on relatively esoteric (and perhaps unproven) concurrency and STM mechanisms, as well as its insistence on non-mutability detract from its attraction as a language in which I can just get things done.
In any case, while looking at my clojure code, it was clear that much of the structural information conveyed by the parentheses was mirrored in the indentation of the source code. Some of parentheses are essentially redundant. Following in the path of arc, it seemed desirable to contemplate a language which removed this redundancy. This would arguably improve readability. It would also improve editability, since we wouldn't need to undo and redo closing parentheses for lines we have altered.
Bullet is not the first attempt at banishing parentheses from lisp. generys is a similar attempt, discussed here. Another attempt is David Moon's PLOT, which looks quite nice, although I haven't actually tried it in anger. Finally, some lispers consider dylan to be an attempt along the same lines.
Bullet is currently an interpreted language at an exploratory stage of development. Bullet is a lisp-like, dynamic, lexically scoped, (currently) interpreted language which features tight interop with Java. It is available at code.google.com/p/bullet-lang. Please expect bugs! Also, error messages are not necessarily informative, though I have made every effort to make them so, and the facility (of B-expressions) to trace forms to their (sugared) origin was added at a very early date, so most errors accurately point to the correct error location in the source code. This makes correcting coding mistakes, for me at least, quite straightforward. With that said, bullet in its current form is little more than a slick toy language, and should certainly not be used for production code.
Why am I publishing this article? Quite simple: I'd like your feedback (ideally on hacker news or progit)! All comments, positive or negative, are welcome. If I've lost the plot with bullet, then it's probably best that I be tethered back to planet Earth sooner rather than later :)
First Kill
Following are some short, illustrated snippets of bullet code.
Hello, world!
First, the obligatory hello world example.
func greet () print "Hello, " main.args.0 "!" // main.args is an array of arg line parameters. greet
Example function: factorial
Following is an implementation of the factorial function. In bullet, functions are introduced via the func macro:
func factorial (n)
if (> n 1)
* n: factorial: dec n
1This example shows the central (and blindingly obvious) concept of line indentation, which via the line preprocessor tranforms the above to:
(func factorial (n)
(if (> n 1)
(* n: factorial: dec n)
(1)))This already looks a lot more like a lisp. But what happened in the third line? We'll cover that soon enough. And eagle-eyed lispers will note that your lisp next door will puke when it encounters the form (1). In bullet, a form containing a single object evaluates to the object, which means that (1) ==> 1, as required. (Note that even without this added, perhaps bad (it's not clear, at least to me, yet), alteration to standard lisp evaluation of forms, the problem is not so huge: we simply write val 1 instead of 1 above, val being a function which simply returns its argument -- i.e. the identity function)
One-liner for functional programmers
mkList 0 1 2 3 4 | map (fn (x): + x 2) | print
This example should be immediately familiar to programmers coming from haskell or ML -- or even bash! It prints the following:
[2, 3, 4, 5, 6]
The | operator is an infix operator and is called the pipe operator. In its quest to liquidate parentheses, bullet introduces a number of such infix operators, of which we will see more soon.
Macros
Bullet is a kinda-sorta-lisp, and no lisp merits the name without a macro system. Bullet's macro system closely follows that of clojure:
macro foreach (varname collection :rest body)
qquote: begin
var uniq.i: ,collection iterator
while uniq.i.hasNext
var ,varname uniq.i.next
begin ,@bodyThis code creates a macro called foreach with roughly the same semantics as the foreach keyword in Csharp. It is used like this:
func map (mapFn xs)
var result: new ArrayList
foreach x xs
result add: mapFn x
result, which in turn shows how easy it is to use Java objects within bullet: the line
result add: mapFn x
desugars to
(result add (mapFn x)),
and is essentially equivalent to the java code:
result.add(mapFn(x));
Which ought to be familiar to just about everyone out there.
Given this first taste of metal, let's examine bullet in a bit more detail.
Bullet's LineProcessor
Bullet's LineProcessor is a class which transforms line-oriented and indented code into forms for further consumption by the Reader (i.e. parser). Essentially the goal is to transform something like:
var xs: new ArrayList for (var i 0; < i 4; inc i) xs add: * i 4 print xs
into this:
(var xs: new ArrayList) (for (var i 0; < i 4; inc i) (xs add: * i 4)) (print xs)
The naive solution is simply to surround every line in the source code with parentheses. This works for the first and last lines, but fails on the second and third, since it translates those to:
(for (var i 0; < i 4; inc i)) (xs add: * i 4)
The python solution to this problem is to keep a stack of indentation levels progressively encountered in the source code. At every new step of indentation (up or down) the stack is either pushed or popped, and the appropriate INDENT or DEDENT token is emitted. Bullet follows the same approach, with a slight twist: the closing parenthesis on a line followed by indented code is removed -- that closing parenthesis is reintroduced when the indententation level drops back to the line's level. Conceptually, we can imagine the indented code being surrounded by parentheses and shunted at the end of the previous line:
for (var i 0; < i 4; inc i) xs add: * i 4 xs add: + i 10
becomes
(for (var i 0; < i 4; inc i) (xs add: * i 4) (xs add: + i 10))
This is quite nice; many constructs in lisp support an arbitrary number of forms being added at the end (for example, lambda functions, macro definitions, cond statements, when statements). Also, since everying in lisp in expression, you can basically insert arbitrary code wherever you want. For example:
var populate true
var xs
if populate
begin
var tmp: new ArrayList
tmp add "foo"
tmp add "bar"
tmp
begin
print "This is the else-clause, which you won't see!"
null // return null if we don't want to populateHowever, it's desirable at times to be able to split lengthy argument lists to functions over several lines, and here our simple preprocessor stumbles onto some difficulties: breaking the line results in subsequent lines being surrounded by parentheses! The solution is to define an extra indent level (in bullet, it is indent >= 4), for which, as with 2-indents, lines are shunted into the previous line at the end, but in this case they are shunted without being surrounded by parentheses. So:
mkList +
-
/translates to
(mkList + - /)
rather than
(mktList (+) (-) (/))
, as would happen if we had used a 2-indent, and which would have resulted in (erroneous, due to missing arguments) primitive invocations.
Bullet's Reader
Bullet's reader follows the general mould of lisp readers, taking data from an input stream and outputing essentially S-expressions. In bullet, B-expressions are implemented using Java's lowly LinkedList. In addition, they carry some extra baggage to identify their location in the source code. The reader was written by hand, making use of Java's low-level string routines (which are very fast). One of the beauties of lisp is that the prefix notation is simple, making the grammar almost trivial to parse. Bullet's reader has to work a little harder, needing one token of look-ahead to be able to handle its infix operators, but is relatively simple in comparison to more conventional infix languages like Java, Scala, etc.
The need for syntax extensions
We quickly run into the limitations of the line-oriented paradigm when considering cond:
var name "Martin" cond (name equals "Martin") (print "You are a fool.") (name equals "Frodo") (print "You like saved everyone.") (name equals "Alex") (print "You rock.")
Lines 4-7 are somewhat less ugly from what they would be in regular lisp (owing to one set of parentheses being removed in bullet w.r.t lisp), but parentheses still abund in the code. If we are to remain faithful to our bracket jihad, we must remove them. Also, more seriously, such expressions are a pain when refactored or edited: you end up having to navigate till the end of the line and delete (or add) one (or more -- until it matches the beginning where you started changing [which admittedly is automatic or easy if you're using emacs or vi or eclipse]) parentheses. Even though your editor makes it easy, it's tedious and a waste of type-time. Bullet's solution to this problem is to use infix operators to delimit and group forms in hopefully useful ways. Bullet's postulate is that infix notations, when used to structure B-expressions, can help comprehension of code, in addition to making it quicker to write and edit/refactor.
The : and $ Operators
The : operator gives the code distinctly python-like flavour, and has an obvious use-case. Syntactically:
A : B : C ==> A (B (C))
The operator has a very low precedence, so it in some sense covers the line until its end (and other operators that follow it). The operator appears very frequently in code:
var xs: new ArrayList // ==> var xs (new ArrayList) xs add: * 1 1 xs add: * 2 2 map (fn (x): * x 2) xs // the ':' is terminated by a raw close paren // example from the core _bullet_ library macro != (left right) qquote: not: == ,left ,right
To my eye, the notation is a clear win. It provides a very clear visual cue that what follows the colon is to be grouped, and indeed this is exactly what occurs.
The $ operator has a similar precedence (only one level lower ) to :. It is the analogue of a four-line indent (versus the two-line indent for :) on a single line: it groups what follows it, and allows other operators to be terminated. More on when we encounter the Semicolon operator.
The Pipe Operator
We have already encountered the Pipe operator, and it ought to imbue functional programmers with sensations of warm familiarity. Here's an example of it at work:
var rawxs: mkList 0 1 2 3 4 var sum rawxs | map (fn (x): - 4: * x 2) | filter (fn (x): > x 0)
Syntactically:
A | B ==> B A A | B | C ==> C (B A) A | B | C | D ==> D (C (B A)) ...
The pipe operator is extremely useful when used in conjuntion with lambda functions and a functional programming style. It is very natural in case to think of data being weaved or piped through a series of transformations (effected by funcs). It can perform the same function (albeit in reversed order) as function composition.
The Semicolon Operator
The semi-colon acts as a separator for a sequence of forms (on a single line):
A ; B ; C ==> (A) (B) (C)
An example use case is bullet's low-level loop construct, the for loop. It looks like this:
for (var i 0; < i 10; inc i) print i
This loop prints the integers from 0 to 9 inclusive. The code translates to:
(for ((var i 0) (< i 10) (inc i)) (print i))
Please note that we're not advocating the use of the for loop in your code. However, one could imagine it being useful if variables could optionally be given types, so that we could make a tight loop iterating over unboxed integers. Also, while it's currently not implemented in bullet, it would be nice to have Java's break and continue (ideally supporting labels too!) features.
Another use of the semicolon is in the cond macro:
var n 10 cond < n 0 ; print "n is negative" == n 0 ; print "n is zero" > n 0 ; print "n is positive"
The cond macro is an interesting example: Paul Graham discusses it in his exposition of arc. Specifically, he deems the language construct to be excessively verbose and thus a poor design decision. He proposes instead to peel a layer from the onion, and make cond accept a list of alternating conditions and consequences, instead of the perhaps more parsimonious list of condition-consequence clauses. With bullet's semicolon, this austerity isn't necessary, and we receive a visual cue of the separation between the condition and clause on each line to boot.
As mentioned above, the ; operator can be combined with the $ operator like so:
var xs: mkList 0 1 2 3 foreach x xs $ print "--"; print x
which could be equivalently written:
var xs: mkList 0 1 2 3 foreach x xs print "--" print x
In this case, the semicolon can be read almost like the standard semicolon in Java or C.
The lookup operators: the "." and ". " notations
A simple example should make matters clear very quickly:
var s1 "foobar".toUpperCase // == s "FOOBAR"
var x 10f.toString // could be written: var x (10 toString)
var y 40f.toString // could be written: var y: 40 toString
// prints (x, y)
// note how spaces, i.e. ". " makes the operator precedence
// lower, which means we can add arguments to the method calls
// inherent in the clauses:
print
"(". concat x . concat ", " . concat y
. concat ")"
// next line prints: "up in bull"
print
"lookup in _bullet_".toUpperCase . substring 4 13 . toLowerCaseThe syntax transformation that occurs when encountering a dot is as follows:
a.b ==> (a b) [VERY HIGH PRECEDENCE] a.b.c ==> ((a b) c) etc a. b ==> a b [VERY LOW PRECEDENCE] a. b. c ==> (a b) c
To make the difference clear:
a.b. c d.e ==> ((a b) c (d e)) // NB! a.b. c d. e ==> ((a b) c d) e
The dot operator, and bullet's extensions to the traditional lisp form evaluation mechanism, work together to allow completely transparent access to Java code. Although underneath the hood, everything is B-expressions of the form (a b c ...), the dotted notation is far more economical and clear than direct notation. Given the quantity of preexisting libraries available in Java, this is a consideration one is well advised to take seriously as a language designer. Bullet compares favourably in this area with for example ABCL or even clojure. In fact, amazingly enough, Bullet compares favourably even with Java for calling Java, which is rather remarkable given that it is a lisp (-like-thingy).
The beauty of the notation is also its extensibility. Since juxtaposition in bullet is equivalent to invocation, we can imagine custom "invokers" which for instance allow (someArray index) to be array access. (This is a dynamic language, after all.) The dot notation would then allow code like:
// set C: * A B where A, B are matrices
func mmul (A B)
var C: new Matrix A.rows B.cols
for (var i 0; < i A.rows; inc i)
for (var j 0; < j B.cols; inc k)
for (var k 0; < k A.cols; inc k)
+= C.i.j (* A.i.k B.k.j)
COperators: worth the bother?
So, in the end, are infix grouping operators worth the trouble? After all, they do eat up considerable amount of our operator-symbol budget (indeed, bullet makes a conscious decision to use operators for B-expression grouping rather than e.g. arithmetic/logical operators).
My opinion, and it is only my opinion, is that they are a net win. That we type less is incontrovertible. Whether the changes result in improved comprehension or readability is open for debate. In my opinion, at least the ':' and '.', and most likely the '|' operator, improve readability. The ';' case is less clearcut, although I have no plans to remove the feature.
One disadvantage in this scheme (pun unintended) is that you arguably still need to understand the "hidden" brackets in order to write code. Also, it is easy to make the mistake of adding one too many ':' (because, for example, you previously had a one-liner, using ':' to separate, and then changed it into a multi-liner using 2-indent [so you should remove the ':']). The good news is that your author thought of this issue very early on, and made corresponding efforts to keep the error messages clear, most importantly: furnished with accurate locations. Normally, once you're pointed to the line/char pos pair where the error code occured, it's pretty clear what went wrong. For example, in the example above, your friendly bullet interpreter should point you to the location of the superfluous ':'. It would not be hard to add a little more logic to bullet to catch additional "typical" mistakes, so that the error messages would be even more friendly.
Also, sundry list of complaints levelled at python and e.g. haskell apply here also. Whitespace conscious interpreters have been accused of being brittle under editor changes. In fact whitespace is the one area in bullet where we take all decisions away from the programmer: (unescaped) tabs are simply banned. That's right, bullet will simply puke with the location of the errant tab character. Still, for some programmers, whitespace sensitivity is a bad thing and I probably won't convince that constituency of bullet's merits.
Bullet Interpreter and Language
Having looked at bullet's line processor, and the unique features of its reader (i.e. parser), let's turn our attention to the core language and interpreter.
Bullet is a lisp in some senses: there is a reader, we use essentially S-Exprs (even though that's not what the programmer writes directly, we let the preprocessor handle that tedium), functions are values (of class Lambda), macros are part of the language (bullet offers an unhygienic macro facility including quasiquote, unquote, and splicing unquote, as well as a gensym facility via the uniq.varname idiom (see macro example above)). Also, code is data, indeed B-expression implements List
