seertaak's Space

Bullet: Adventures in a minimally delimited lisp

Introduction

This document details my meanderings in language design; specifically, my attempt to banish (to the extent possible) parentheses from lisp, and the result of that effort: bullet, a minimally delimited lisp on the JVM. In this article, the reader will be introduced to bullet via a sampling of code snippets, familiarize himself with the bullet line pre-processor, learn about the extra structural operators in bullet's reader, as well as bullet's approach to Java interop.

The reasons I embarked on this foolhardy endeavour are twofold. First, after moving into my new apartment, I was left stranded without internet for a few weeks, so I couldn't pursue my main programming activity (which makes extensive use of Amazon AWS). So I started writing a lisp, just for fun. The second reason is that having used clojure for part of my main project, I found that although although it is extremely useful, interesting, and well-executed, I nonetheless came away feeling it wasn't exactly for me. For instance, its approach to Java objects is more adverserial than I would like: you're given the facility to do everything you need to do, but it does stick out quite obviously when you do it, because Java is imperative, whereas in clojure you're shepherded to the eden of functional purity. I feel that clojure in some sense lost an opportunity to be the useful lisp on the JVM. I think its focus on relatively esoteric (and perhaps unproven) concurrency and STM mechanisms, as well as its insistence on non-mutability detract from its attraction as a language in which I can just get things done.

In any case, while looking at my clojure code, it was clear that much of the structural information conveyed by the parentheses was mirrored in the indentation of the source code. Some of parentheses are essentially redundant. Following in the path of arc, it seemed desirable to contemplate a language which removed this redundancy. This would arguably improve readability. It would also improve editability, since we wouldn't need to undo and redo closing parentheses for lines we have altered.

Bullet is not the first attempt at banishing parentheses from lisp. generys is a similar attempt, discussed here. Another attempt is David Moon's PLOT, which looks quite nice, although I haven't actually tried it in anger. Finally, some lispers consider dylan to be an attempt along the same lines.

Bullet is currently an interpreted language at an exploratory stage of development. Bullet is a lisp-like, dynamic, lexically scoped, (currently) interpreted language which features tight interop with Java. It is available at code.google.com/p/bullet-lang. Please expect bugs! Also, error messages are not necessarily informative, though I have made every effort to make them so, and the facility (of B-expressions) to trace forms to their (sugared) origin was added at a very early date, so most errors accurately point to the correct error location in the source code. This makes correcting coding mistakes, for me at least, quite straightforward. With that said, bullet in its current form is little more than a slick toy language, and should certainly not be used for production code.

Why am I publishing this article? Quite simple: I'd like your feedback (ideally on hacker news or progit)! All comments, positive or negative, are welcome. If I've lost the plot with bullet, then it's probably best that I be tethered back to planet Earth sooner rather than later :)

First Kill

Following are some short, illustrated snippets of bullet code.

Hello, world!

First, the obligatory hello world example.

func greet ()
  print "Hello, " main.args.0 "!" // main.args is an array of arg line parameters.

greet

Example function: factorial

Following is an implementation of the factorial function. In bullet, functions are introduced via the func macro:

func factorial (n)
  if (> n 1)
    * n: factorial: dec n
    1

This example shows the central (and blindingly obvious) concept of line indentation, which via the line preprocessor tranforms the above to:

(func factorial (n)
  (if (> n 1)
    (* n: factorial: dec n)
    (1)))

This already looks a lot more like a lisp. But what happened in the third line? We'll cover that soon enough. And eagle-eyed lispers will note that your lisp next door will puke when it encounters the form (1). In bullet, a form containing a single object evaluates to the object, which means that (1) ==> 1, as required. (Note that even without this added, perhaps bad (it's not clear, at least to me, yet), alteration to standard lisp evaluation of forms, the problem is not so huge: we simply write val 1 instead of 1 above, val being a function which simply returns its argument -- i.e. the identity function)

One-liner for functional programmers

mkList 0 1 2 3 4 | map (fn (x): + x 2) | print

This example should be immediately familiar to programmers coming from haskell or ML -- or even bash! It prints the following:

[2, 3, 4, 5, 6]

The | operator is an infix operator and is called the pipe operator. In its quest to liquidate parentheses, bullet introduces a number of such infix operators, of which we will see more soon.

Macros

Bullet is a kinda-sorta-lisp, and no lisp merits the name without a macro system. Bullet's macro system closely follows that of clojure:

macro foreach (varname collection :rest body)
  qquote: begin 
    var uniq.i: ,collection iterator
    while uniq.i.hasNext
      var ,varname uniq.i.next
      begin ,@body

This code creates a macro called foreach with roughly the same semantics as the foreach keyword in Csharp. It is used like this:

func map (mapFn xs)
  var result: new ArrayList
  foreach x xs
    result add: mapFn x
  result

, which in turn shows how easy it is to use Java objects within bullet: the line

result add: mapFn x

desugars to

(result add (mapFn x)),

and is essentially equivalent to the java code:

result.add(mapFn(x));

Which ought to be familiar to just about everyone out there.

Given this first taste of metal, let's examine bullet in a bit more detail.

Bullet's LineProcessor

Bullet's LineProcessor is a class which transforms line-oriented and indented code into forms for further consumption by the Reader (i.e. parser). Essentially the goal is to transform something like:

var xs: new ArrayList
for (var i 0; < i 4; inc i)
  xs add: * i 4
print xs

into this:

(var xs: new ArrayList)
(for (var i 0; < i 4; inc i)
  (xs add: * i 4))
(print xs)

The naive solution is simply to surround every line in the source code with parentheses. This works for the first and last lines, but fails on the second and third, since it translates those to:

(for (var i 0; < i 4; inc i))
  (xs add: * i 4)

The python solution to this problem is to keep a stack of indentation levels progressively encountered in the source code. At every new step of indentation (up or down) the stack is either pushed or popped, and the appropriate INDENT or DEDENT token is emitted. Bullet follows the same approach, with a slight twist: the closing parenthesis on a line followed by indented code is removed -- that closing parenthesis is reintroduced when the indententation level drops back to the line's level. Conceptually, we can imagine the indented code being surrounded by parentheses and shunted at the end of the previous line:

for (var i 0; < i 4; inc i)
  xs add: * i 4
  xs add: + i 10

becomes

(for (var i 0; < i 4; inc i) (xs add: * i 4) (xs add: + i 10))

This is quite nice; many constructs in lisp support an arbitrary number of forms being added at the end (for example, lambda functions, macro definitions, cond statements, when statements). Also, since everying in lisp in expression, you can basically insert arbitrary code wherever you want. For example:

var populate true
var xs 
  if populate
    begin
      var tmp: new ArrayList
      tmp add "foo"
      tmp add "bar"
      tmp
    begin
      print "This is the else-clause, which you won't see!"
      null // return null if we don't want to populate

However, it's desirable at times to be able to split lengthy argument lists to functions over several lines, and here our simple preprocessor stumbles onto some difficulties: breaking the line results in subsequent lines being surrounded by parentheses! The solution is to define an extra indent level (in bullet, it is indent >= 4), for which, as with 2-indents, lines are shunted into the previous line at the end, but in this case they are shunted without being surrounded by parentheses. So:

mkList +
       -
       /

translates to

(mkList + - /)

rather than

(mktList (+) (-) (/))

, as would happen if we had used a 2-indent, and which would have resulted in (erroneous, due to missing arguments) primitive invocations.

Bullet's Reader

Bullet's reader follows the general mould of lisp readers, taking data from an input stream and outputing essentially S-expressions. In bullet, B-expressions are implemented using Java's lowly LinkedList. In addition, they carry some extra baggage to identify their location in the source code. The reader was written by hand, making use of Java's low-level string routines (which are very fast). One of the beauties of lisp is that the prefix notation is simple, making the grammar almost trivial to parse. Bullet's reader has to work a little harder, needing one token of look-ahead to be able to handle its infix operators, but is relatively simple in comparison to more conventional infix languages like Java, Scala, etc.

The need for syntax extensions

We quickly run into the limitations of the line-oriented paradigm when considering cond:

var name "Martin"
cond
  (name equals "Martin") (print "You are a fool.")
  (name equals "Frodo")  (print "You like saved everyone.")
  (name equals "Alex")   (print "You rock.")

Lines 4-7 are somewhat less ugly from what they would be in regular lisp (owing to one set of parentheses being removed in bullet w.r.t lisp), but parentheses still abund in the code. If we are to remain faithful to our bracket jihad, we must remove them. Also, more seriously, such expressions are a pain when refactored or edited: you end up having to navigate till the end of the line and delete (or add) one (or more -- until it matches the beginning where you started changing [which admittedly is automatic or easy if you're using emacs or vi or eclipse]) parentheses. Even though your editor makes it easy, it's tedious and a waste of type-time. Bullet's solution to this problem is to use infix operators to delimit and group forms in hopefully useful ways. Bullet's postulate is that infix notations, when used to structure B-expressions, can help comprehension of code, in addition to making it quicker to write and edit/refactor.

The : and $ Operators

The : operator gives the code distinctly python-like flavour, and has an obvious use-case. Syntactically:

A : B : C ==> A (B (C))

The operator has a very low precedence, so it in some sense covers the line until its end (and other operators that follow it). The operator appears very frequently in code:

var xs: new ArrayList // ==> var xs (new ArrayList)
xs add: * 1 1
xs add: * 2 2
map (fn (x): * x 2) xs // the ':' is terminated by a raw close paren
// example from the core _bullet_ library
macro != (left right)
  qquote: not: == ,left ,right

To my eye, the notation is a clear win. It provides a very clear visual cue that what follows the colon is to be grouped, and indeed this is exactly what occurs.

The $ operator has a similar precedence (only one level lower ) to :. It is the analogue of a four-line indent (versus the two-line indent for :) on a single line: it groups what follows it, and allows other operators to be terminated. More on when we encounter the Semicolon operator.

The Pipe Operator

We have already encountered the Pipe operator, and it ought to imbue functional programmers with sensations of warm familiarity. Here's an example of it at work:

var rawxs: mkList 0 1 2 3 4
var sum
  rawxs | map (fn (x): - 4: * x 2) | filter (fn (x): > x 0)

Syntactically:

A | B         ==> B A
A | B | C     ==> C (B A)
A | B | C | D ==> D (C (B A))
...

The pipe operator is extremely useful when used in conjuntion with lambda functions and a functional programming style. It is very natural in case to think of data being weaved or piped through a series of transformations (effected by funcs). It can perform the same function (albeit in reversed order) as function composition.

The Semicolon Operator

The semi-colon acts as a separator for a sequence of forms (on a single line):

A ; B ; C ==> (A) (B) (C)

An example use case is bullet's low-level loop construct, the for loop. It looks like this:

for (var i 0; < i 10; inc i)
  print i

This loop prints the integers from 0 to 9 inclusive. The code translates to:

(for ((var i 0) (< i 10) (inc i))
  (print i))

Please note that we're not advocating the use of the for loop in your code. However, one could imagine it being useful if variables could optionally be given types, so that we could make a tight loop iterating over unboxed integers. Also, while it's currently not implemented in bullet, it would be nice to have Java's break and continue (ideally supporting labels too!) features.

Another use of the semicolon is in the cond macro:

var n 10
cond
  < n 0  ; print "n is negative"
  == n 0 ; print "n is zero"
  > n 0  ; print "n is positive"

The cond macro is an interesting example: Paul Graham discusses it in his exposition of arc. Specifically, he deems the language construct to be excessively verbose and thus a poor design decision. He proposes instead to peel a layer from the onion, and make cond accept a list of alternating conditions and consequences, instead of the perhaps more parsimonious list of condition-consequence clauses. With bullet's semicolon, this austerity isn't necessary, and we receive a visual cue of the separation between the condition and clause on each line to boot.

As mentioned above, the ; operator can be combined with the $ operator like so:

var xs: mkList 0 1 2 3
foreach x xs $ print "--"; print x

which could be equivalently written:

var xs: mkList 0 1 2 3
foreach x xs 
  print "--"
  print x

In this case, the semicolon can be read almost like the standard semicolon in Java or C.

The lookup operators: the "." and ". " notations

A simple example should make matters clear very quickly:

var s1 "foobar".toUpperCase // == s "FOOBAR"
var x 10f.toString // could be written: var x (10 toString)
var y 40f.toString // could be written: var y: 40 toString
// prints (x, y)
// note how spaces, i.e. ". " makes the operator precedence
// lower, which means we can add arguments to the method calls
// inherent in the clauses:
print
  "(". concat x . concat ", " . concat y
       . concat ")"
// next line prints: "up in bull"
print
  "lookup in _bullet_".toUpperCase . substring 4 13 . toLowerCase

The syntax transformation that occurs when encountering a dot is as follows:

a.b   ==> (a b)     [VERY HIGH PRECEDENCE]
a.b.c ==> ((a b) c)
etc
a. b ==> a b        [VERY LOW PRECEDENCE]
a. b. c ==> (a b) c

To make the difference clear:

a.b. c d.e ==> ((a b) c (d e))  // NB!
a.b. c d. e ==> ((a b) c d) e

The dot operator, and bullet's extensions to the traditional lisp form evaluation mechanism, work together to allow completely transparent access to Java code. Although underneath the hood, everything is B-expressions of the form (a b c ...), the dotted notation is far more economical and clear than direct notation. Given the quantity of preexisting libraries available in Java, this is a consideration one is well advised to take seriously as a language designer. Bullet compares favourably in this area with for example ABCL or even clojure. In fact, amazingly enough, Bullet compares favourably even with Java for calling Java, which is rather remarkable given that it is a lisp (-like-thingy).

The beauty of the notation is also its extensibility. Since juxtaposition in bullet is equivalent to invocation, we can imagine custom "invokers" which for instance allow (someArray index) to be array access. (This is a dynamic language, after all.) The dot notation would then allow code like:

// set C: * A B where A, B are matrices
func mmul (A B)
  var C: new Matrix A.rows B.cols
  for (var i 0; < i A.rows; inc i)
    for (var j 0; < j B.cols; inc k)
      for (var k 0; < k A.cols; inc k)
        += C.i.j (* A.i.k B.k.j)
  C

Operators: worth the bother?

So, in the end, are infix grouping operators worth the trouble? After all, they do eat up considerable amount of our operator-symbol budget (indeed, bullet makes a conscious decision to use operators for B-expression grouping rather than e.g. arithmetic/logical operators).

My opinion, and it is only my opinion, is that they are a net win. That we type less is incontrovertible. Whether the changes result in improved comprehension or readability is open for debate. In my opinion, at least the ':' and '.', and most likely the '|' operator, improve readability. The ';' case is less clearcut, although I have no plans to remove the feature.

One disadvantage in this scheme (pun unintended) is that you arguably still need to understand the "hidden" brackets in order to write code. Also, it is easy to make the mistake of adding one too many ':' (because, for example, you previously had a one-liner, using ':' to separate, and then changed it into a multi-liner using 2-indent [so you should remove the ':']). The good news is that your author thought of this issue very early on, and made corresponding efforts to keep the error messages clear, most importantly: furnished with accurate locations. Normally, once you're pointed to the line/char pos pair where the error code occured, it's pretty clear what went wrong. For example, in the example above, your friendly bullet interpreter should point you to the location of the superfluous ':'. It would not be hard to add a little more logic to bullet to catch additional "typical" mistakes, so that the error messages would be even more friendly.

Also, sundry list of complaints levelled at python and e.g. haskell apply here also. Whitespace conscious interpreters have been accused of being brittle under editor changes. In fact whitespace is the one area in bullet where we take all decisions away from the programmer: (unescaped) tabs are simply banned. That's right, bullet will simply puke with the location of the errant tab character. Still, for some programmers, whitespace sensitivity is a bad thing and I probably won't convince that constituency of bullet's merits.

Bullet Interpreter and Language

Having looked at bullet's line processor, and the unique features of its reader (i.e. parser), let's turn our attention to the core language and interpreter.

Bullet is a lisp in some senses: there is a reader, we use essentially S-Exprs (even though that's not what the programmer writes directly, we let the preprocessor handle that tedium), functions are values (of class Lambda), macros are part of the language (bullet offers an unhygienic macro facility including quasiquote, unquote, and splicing unquote, as well as a gensym facility via the uniq.varname idiom (see macro example above)). Also, code is data, indeed B-expression implements List , so a B-expression is truly just data.

 

Another area where bullet differs from other lisps is the transparency with which JVM objects are available. Indeed, the fundamental types of bullet are imported wholesale from java: Byte, Integer, Character, etc. That means that when a integer or string or character literal are read, the syntax tree contains bog-standard java integers, strings, and characters. Bullet also has Java import feature, which is used to import stuff we want (following is an excerpt of bullet.bt, the core bullet file which is loaded automatically and slotted into the current environment [via the "include" primitive]):

import org.bullet.LineProcessor
import org.bullet.Interpreter
import org.bullet.Reader
// ..
import org.bullet.value.Lambda
import org.bullet.value.Macro
import org.bullet.value.BExpr
import org.bullet.value.Primitive
import org.bullet.value.Symbol

import java.lang.Object
import java.lang.String
// ....

import java.util.ArrayList
import java.util.HashMap
import java.util.Arrays
// ....

(The above imports are quite verbose: at some point I'd like to add more functionality to the import primitive, allowing e.g. lists and maybe wildcards. But it was easy to code and gets the job done!) With these imports, we can now play Java:

var x: Arrays asList "foo" 22 '\n' 3.14
var m: new HashMap
m put "Bullet"  "very cool"
m put "Clojure" "very cool"
print m
var bt: m get "Bullet"
when (bt equals "very cool")
  print "That's swell."

These examples bring us to the heart of what makes bullet different from other lisps. Crucial to every lisp is a strategy for evaluating forms. The standard thing to happen is to evaluate the datums in the form in turn, and then invoke the function (ignoring for the moment macros) represented by the first datum. Bullet adds two twists to this scheme, both of which you've encountered. The first is that B-expressions consisting of a single object (say, an integer) evaluate to the object itself - in other words, when an object is presented with no arguments, we do not invoke any method or code. The second is that when the datum in first position is an object (which is, of course, always the case! [and is the reason why we apply this logic only if the first datum is not a macro or lambda]) and there is:

  • One argument: then this is field access. An example is the bullet code

    print Integer.MAX_INT
    print: Integer MAX_INT // same as previous line

    which retrieves the MAX_INT (static) field from the class object Integer. Currently, there is no way to set a (public) field on an object, other than by using its accessor methods. But it would be straightforward build such functionality into set, e.g. allowing

    set Integer.MAX_INT 5
  • More than one argument: then this is method invocation. Some examples:

    var mad: "madrid" substring 0 3
    var i10: Integer valueOf "10"

This is rather bold and perhaps somewhat foolhardy approach. Negative points are that it will certainly complicate macro writing, since we can no longer assume that we can "pivot" on the first position. Still, if we adopt some conventions and don't mix macros operating on data streams (i.e. functional style) and those operating on actors (i.e. object oriented style) we should be ok. Also, it is possible that overloading of evaluation has negative unforeseen consequences. I find that, in turn, hard to believe, starting from a new language as is the case here. The reason why I find it hard to believe is that the notations are themselves absolutely logical:

object-or-class field-or-static-field
object-or-class method-or-static-method arguments

All of the notations above essentially boil down to JVM invocations of some sort, whether field access, invokevirtual, invokeinterface, invokestatic, etc. It makes sense to underscore the operational similarity through a unified syntax. If we compare the above list with (some of) clojure, we see:

Object / static-field              
Object / static-method arguments   
. object instance-method arguments

So the snippets above would look like:

(print Integer/MAX_INT)
(def mad (. "madrid" substring 0 3))
(def i10 (Integer/valueOf "10"))

Which is actually not too bad, especially if you compare it to ABCL, though not as concise or consistent as bullet's notation. Clojure, in some sense, doesn't truly see Java objects as first class citizens, since invocation needs to occur through a special primitive. And Clojure also encourages you to use custom types and type hierarchies, allowing for example multimethods, which are not fully compatible with Java's typing. This in turn forces developers to make a tradeoff between compatibility with Java libraries (and in some cases performance) and conciseness/idiomaticity of code. In bullet, Java objects are not only equal citizens: they're pretty much the only citizens in town. But if that's the case, the reader may wonder, what's the point? Why simply wrap Java's type system? The answer is that we don't intend to simply wrap Java's type system -- we want to do much more. But:

  • the JVM is built to handle Java objects and is optimized for that use case, so it makes sense to handle those fluently and with no performance penalty
  • that means imperative programming and mutability of variables are allowed (but so are functional approaches!)
  • extensions to the standard Java type system should be in the direction of increased dynamicity and reflectivity, and should essentially produce fully native java objects (perhaps loaded via the very cool AnonymousClassLoader), which from Java code are indistinguishable from native code, and yet which offer the ability dynamically tack on implemented interfaces, extended classes, adding methods, deleting methods, etc.

The wonderful thing is that, with the advent of Java 1.7's dynamic language features, the JVM internals are basically to open to language designers, as least as far as linkage and invocation are concerned. It holds out the promise, along with libraries like asm and dynalink, that via code-generation of disposable, anonymous Classes, the developer can be presented with a type system that any particular point consists perfectly legal Java objects -- and thus usable natively --, but we can in addition do dynamic things to the object, like adding methods, extending a new class or implementing an interface, on the fly. The magic of InvokeDynamic and dynalink make accessing object fields and methods (and specialized invocation forms, like array acccess, hash map access, etc.) completely transparent (and much simpler). Moreover, the resulting invoke dynamic calls will, ideally, perform on a par with statically-linked method calls. The future of the JVM is bright indeed!

Conclusion

In conclusion, we've seen some bullet code, learned how its LineProcessor can annotate cleanly indented code with the parentheses inherent in its structure. We've seen additional attempts at banishing the poor bracket via infix grouping notations like :, |, and `$'. And you've seen bullet's stupidly simple approach to object field access and method invocation, and gotten a feeling for its approach to dynamism and reflectivity.

Future Directions

  • Write a full example program in bullet, e.g. a program that reads stat info on a unix system and displays an updating chart with system load.
  • General usability improvements, e.g. REPL, scripting, etc.
  • Move to compilation rather than interpreting
  • Rock InvokeDynamic and dynalink for a totally flexible class based object system that works natively with java and performs as close as possible to native linkage
  • Create a module system that plays ball with Java but allows treating modules more like first-class objects;
  • Experiment with gradual typing. One idea, more a postulate, is that for programming in the small, dynamic languages win, because of their flexibility. However, for programming in the large, static languages win because of the extra enforced cooperation in terms of implementing classes. The other , often overlooked, advantage of static languages is the ability of IDEs for them to make use of type information to generate relevant autocomplete hints. In recognition of this, we could allow var declarations to accept optional types, and introduce the regulation that in libraries -- (largsih) bundles of modules package for use by third-parties -- exported functionality must have type declarations (and maybe even some minimal documentation, possibly even example code, etc). Libraries would be integrated with Maven, like the very cool Leiningen. The goal is that the technical barriers to releasing code are lowered, while the type-precision and general quality and documentation requirements are raised. (One language which does this nicely is Dart).

Your Feedback Desired :)

In conclusion, I'd love to hear your feedback on bullet and its features. Do you think bullet's features merit more development and upgrade to production status? By all means, grab the source or build jars, and have a go at writing a script. If you think it sucks, and the bracketless syntax idea is a waste of time, then I'd like to hear! If you think the infix notation is a recipe for disaster at some point in the future, I'd like to hear! This idea is a bit wierd and irreverent, so feel free to tear me apart :) Thanks for your time.