Skip to content

The Magic Behind Parser Combinators

24
Mar
2009

If you’re like me, one of the first things that attracted you to Scala was its parser combinators.  Well, maybe that wasn’t the first thing for me, but it was pretty far up there.  Parser combinators make it almost too easy to create a parser for a complex language without ever leaving the comfortable play-pen afforded by Scala.  Incidentally, if you aren’t familiar with the fundamentals of text parsing, context-free grammars and/or parser generators, then you might want to do some reading before you continue with this article.

Intro to Parser Combinators

In most languages, the process of creating a text parser is usually an arduous and clumsy affair involving a parser generator (such as ANTLR, JavaCC, Beaver or <shamelessPlug>ScalaBison</shamelessPlug>) and (usually) a lexer generator such as JFlex.  These tools do a very good job of generating sources for efficient and powerful parsers, but they aren’t exactly the easiest tools to use.  They generally have a very steep learning curve, and due to their unique status as compiler-compilers, an unintuitive architecture.  Additionally, these tools can be somewhat rigid, making it very difficult to implement unique or experimental features.  For this reason alone, many real world compilers and interpreters (such as javac, ruby, jruby and scalac) actually use hand-written parsers.  These are usually easier to tweak, but they can be very difficult to develop and test.  Additionally, hand-written parsers have a tendency toward poor performance (think: the Scala compiler).

When creating a compiler in Scala, it is perfectly acceptable to make use of these conventional Java-generating tools like ANTLR or Beaver, but we do have other options.  Parser combinators are a domain-specific language baked into the standard library.  Using this internal DSL, we can create an instance of a parser for a given grammar using Scala methods and fields.  What’s more, we can do this in a very declarative fashion.  Thanks to the magic of DSLs, our sources will actually look like a plain-Jane context-free grammar for our language.  This means that we get most of the benefits of a hand-written parser without losing the maintainability afforded by parser generators like bison.  For example, here is a very simple grammar for a simplified Scala-like language, expressed in terms of parser combinators:

object SimpleScala extends RegexpParsers {
 
  val ID = """[a-zA-Z]([a-zA-Z0-9]|_[a-zA-Z0-9])*"""r
 
  val NUM = """[1-9][0-9]*"""r
 
  def program = clazz*
 
  def classPrefix = "class" ~ ID ~ "(" ~ formals ~ ")"
 
  def classExt = "extends" ~ ID ~ "(" ~ actuals ~ ")"
 
  def clazz = classPrefix ~ opt(classExt) ~ "{" ~ (member*) ~ "}"
 
  def formals = repsep(ID ~ ":" ~ ID, ",")
 
  def actuals = expr*
 
  def member = (
      "val" ~ ID ~ ":" ~ ID ~ "=" ~ expr
    | "var" ~ ID ~ ":" ~ ID ~ "=" ~ expr
    | "def" ~ ID ~ "(" ~ formals ~ ")" ~ ":" ~ ID ~ "=" ~ expr
    | "def" ~ ID ~ ":" ~ ID ~ "=" ~ expr
    | "type" ~ ID ~ "=" ~ ID
  )
 
  def expr: Parser[Expr] = factor ~ (
      "+" ~ factor
    | "-" ~ factor
  )*
 
  def factor = term ~ ("." ~ ID ~ "(" ~ actuals ~ ")")*
 
  def term = (
      "(" ~ expr ~ ")"
    | ID
    | NUM
  )
}

This is all valid and correct Scala.  The program method returns an instance of Parser[List[Class_]], assuming that Class_ is the AST class representing a syntactic class in the language (and assuming that we had added all of the boiler-plate necessary AST generation).  This Parser instance can then be used to process a java.io.Reader, producing some result if the input is valid, otherwise producing an error.

How the Magic Works

The really significant thing to notice here is that program is nothing special; just another method which returns an instance of class Parser.  In fact, all of these methods return instances of Parser.  Once you realize this, the magic behind all of this becomes quite a bit more obvious.  However, to really figure it all out, we’re going to need to take a few steps back.

Conceptually, a Parser represents a very simple idea:

Parsers are invoked upon an input stream.  They will consume a certain number of tokens and then return a result along with the truncated stream.  Alternatively, they will fail, producing an error message.

Every Parser instance complies with this description.  To be more concrete, consider the keyword parser (what I like to call the “literal” parser) which consumes a single well-defined token.  For example (note that the keyword method is implicit in Scala’s implementation of parser combinators, which is why it doesn’t appear in the long example above):

def boring = keyword("bore")

The boring method returns a value of type Parser[String].  That is, a parser which consumes input and somehow produces a String as a result (along with the truncated stream).  This parser will either parse the characters b-o-r-e in that order, or it will fail.  If it succeeds, it will return the string "bore" as a result along with a stream which is shortened by four characters.  If it fails, it will return an error message along the lines of “Expected 'bore', got 'Boer'“, or something to that effect.

By itself, such a parser is really not very useful.  After all, it’s easy enough to perform a little bit of String equality testing when looking for a well-defined token.  The real power of parser combinators is in what happens when you start combining them together (hence the name).  A few literal parsers combined in sequence can give us a phrase in our grammar, and a few of these sequences combined in a disjunction can give us the full power of a non-terminal with multiple productions.  As it turns out, all we need is the literal parser (keyword) combined with these two types of combinators to express any LL(*) grammar.

Before we get into the combinators themselves, let’s build a framework.  We will define Parser[A] as a function from Stream[Character] to Result[A], where Result[A] has two implementations: Success[A] and Failure.  The framework looks like the following:

trait Parser[+A] extends (Stream[Character]=>Result[A])
 
sealed trait Result[+A]
 
case class Success[+A](value: A, rem: Stream[Character]) extends Result[A]
 
case class Failure(msg: String) extends Result[Nothing]

Additionally, we must add a concrete parser, keyword, to handle our literals.  For the sake of syntactic compatibility with Scala’s parser combinators, this parser will be defined within the RegexpParsers singleton object (despite the fact that we don’t really support regular expressions):

object RegexpParsers {
  implicit def keyword(str: String) = new Parser[String] {
    def apply(s: Stream[Character]) = {
      val trunc = s take str.length
      lazy val errorMessage = "Expected '%s' got '%s'".format(str, trunc.mkString)
 
      if (trunc lengthCompare str.length != 0) 
        Failure(errorMessage)
      else {
        val succ = trunc.zipWithIndex forall {
          case (c, i) => c == str(i)
        }
 
        if (succ) Success(str, s drop str.length)
        else Failure(errorMessage)
      }
    }
  }
}

For those of you who are still a little uncomfortable with the more obscure higher-order utility methods in the Scala collections framework: don’t worry about it.  While the above may be a bit obfuscated, there isn’t really a need to understand what’s going on at any sort of precise level.  The important point is that this Parser defines an apply method which compares str to an equally-lengthed prefix from s, the input character Stream.  At the end of the day, it returns either Success or Failure.

The Sequential Combinator

The first of the two combinators we need to look at is the sequence combinator.  Conceptually, this combinator takes two parsers and produces a new parser which matches the first and the second in order.  If either one of the parsers produces a Failure, then the entire sequence will fail.  In terms of classical logic, this parser corresponds to the AND operation.  The code for this parser is almost ridiculously simple:

class SequenceParser[+A, +B](l: =>Parser[A], 
                             r: =>Parser[B]) extends Parser[(A, B)] {
  lazy val left = l
  lazy val right = r
 
  def apply(s: Stream[Character]) = left(s) match {
    case Success(a, rem) => right(rem) match {
      case Success(b, rem) => Success((a, b), rem)
      case f: Failure => f
    }
 
    case f: Failure => f
  }
}

This is literally just a parser which applies its left operand and then applies its right to whatever is left.  As long as both parsers succeed, a composite Success will be produced containing a tuple of the left and right parser’s results.  Note that Scala’s parser combinator framework actually yields an instance of the ~ case class from its sequence combinator.  This is particularly convenient as it allows for a very nice syntax in pattern matching for semantic actions (extracting parse results).  However, since we will not be dealing with the action combinators in this article, it seemed simpler to just use a tuple.

One item worthy of note is the fact that both left and right are evaluated lazily.  This means that we don’t actually evaluate our constructor parameters until the parser is applied to a specific stream.  This is very important as it allows us to define parsers with recursive rules.  Recursion is really what separates context-free grammars from regular expressions.  Without this laziness, any recursive rules would lead to an infinite loop in the parser construction.

Once we have the sequence combinator in hand, we can add a bit of syntax sugar to enable its use.  All instances of Parser will define a ~ operator which takes a single operand and produces a SequenceParser which handles the receiver and the parameter in order:

trait Parser[+A] extends (Stream[Character]=>Result[A]) {
  def ~[B](that: =>Parser[B]) = new SequenceParser(this, that)
}

With this modification to Parser, we can now create parsers which match arbitrary sequences of tokens.  For example, our framework so far is more than sufficient to define the classPrefix parser from our earlier snippet (with the exception of the regular expression defined in ID, which we currently have no way of handling):

def classPrefix = "class" ~ ID ~ "(" ~ formals ~ ")"

The Disjunctive Combinator

This is a very academic name for a very simple concept.  Let’s think about the framework so far.  We have both literal parsers and sequential combinations thereof.  Using this framework, we are capable of defining parsers which match arbitrary token strings.  We can even define parsers which match infinite token sequences, simply by involving recursion:

def fearTheOnes: Parser[Any] = "1" ~ fearTheOnes

Of course, this parser is absurd, since it only matches an infinite input consisting of the '1' character, but it does serve to illustrate that we have a reasonably powerful framework even in its current form.  This also provides a nice example of how the lazy evaluation of sequence parsers is an absolutely essential feature.  Without it, the fearTheOnes method would enter an infinite loop and would never return an instance of Parser.

However, for all its glitz, our framework is still somewhat impotent compared to “real” parser generators.  It is almost trivial to derive a grammar which cannot be handled by our parser combinators.  For example:

e ::= '1' | '2'

This grammar simply says “match either the '1' character, or the '2' character”.  Unfortunately, our framework is incapable of defining a parser according to this rule.  We have no facility for saying “either this or that”.  This is where the disjunctive combinator comes into play.

In boolean logic, a disjunction is defined according to the following truth table:

P Q P V Q
T T T
T F T
F T T
F F F

In other words, the disjunction is true if one or both of its component predicates are true.  This is exactly the sort of combinator we need to bring our framework to full LL(*) potential.  We need to define a parser combinator which takes two parsers as parameters, trying each of them in order.  If the first parser succeeds, we yield its value; otherwise, we try the second parser and return its Result (whether Success or Failure).  Thus, our disjunctive combinator should yield a parser which succeeds if and only if one of its component parsers succeeds.  This is very easily accomplished:

class DisParser[+A](left: Parser[A], right: Parser[A]) extends Parser[A] {
  def apply(s: Stream[Character]) = left(s) match {
    case res: Success => res
    case _: Failure => right(s)
  }
}

Once again, we can beautify the syntax a little bit by adding an operator to the Parser super-trait:

trait Parser[+A] extends (Stream[Character]=>Result[A]) {
  def ~[B](that: =>Parser[B]) = new SequenceParser(this, that)
 
  def |(that: Parser[A]) = new DisParser(this, that)
}

…is that all?

It’s almost as if by magic, but the addition of the disjunctive combinator to the sequential actually turns our framework into something really special, capable of chewing through any LL(*) grammar.  Just in case you don’t believe me, consider the grammar for the pure-untyped lambda calculus, expressed using our framework (alph definition partially elided for brevity):

object LambdaCalc extends RegexpParsers {
 
  def expr: Parser[Any] = term ~ (expr | "")
 
  def term = (
      "fn" ~ id ~ "=>" ~ expr
    | id
  )
 
  val alph = "a"|"b"|"c"|...|"X"|"Y"|"Z"
  val num = "0"|"1"|"2"|"3"|"4"|"5"|"6"|"7"|"8"|"9"
 
  def id = alph ~ idSuffix
 
  def idSuffix = (
      (alph | num) ~ idSuffix
    | ""
  )
}

While this grammar may seem a bit obfuscated, it is only because I had to avoid the use of regular expressions to define the ID rule.  Instead, I used a combination of sequential and disjunctive combinators to produce a Parser which matches the desired pattern.  Note that the “...” is not some special syntax, but rather my laziness and wish to avoid a code snippet 310 characters wide.

We can also use our framework to define some other, useful combinators such as opt and * (used in the initial example). Specifically:

trait Parser[+A] extends (Stream[Character]=>Result[A]) {
  ...
 
  def *: Parser[List[A]] = (
      this ~ * ^^ { case (a, b) => a :: b }
    | "" ^^^ Nil
  )
}
 
object RegexpParsers {
  ...
 
  def opt[A](p: Parser[A]) = (
      p ^^ { Some(_) }
    | "" ^^^ None
  )
}

Readers who have managed to stay awake to this point may notice that I’m actually cheating a bit in these definitions.  Specifically, I’m using the ^^ and ^^^ combinators.  These are the semantic action combinators which I promised to avoid discussing.  However, for the sake of completeness, I’ll include the sources and leave you to figure out the rest:

trait Parser[+A] extends (Stream[Character]=>Result[A]) { outer =>
  ...
 
  def ^^[B](f: A=>B) = new Parser[B] {
    def apply(s: Stream[Character]) = outer(s) match {
      case Success(v, rem) => Success(f(v), rem)
      case f: Failure => f
    }
  }
 
  def ^^^[B](v: =>B) = new Parser[B] {
    def apply(s: Stream[Character]) = outer(s) match {
      case Success(_, rem) => Success(v, rem)
      case f: Failure => f
    }
  }
}

In short, these combinators are only interesting to people who want their parsers to give them a value upon completion (usually an AST).  In short, just about any useful application of parser combinators will require these combinators, but since we’re not planning to use our framework for anything useful, there is really no need.

One really interesting parser from our first example which is worthy of attention is the member rule.  If you recall, this was defined as follows:

def member = (
    "val" ~ ID ~ ":" ~ ID ~ "=" ~ expr
  | "var" ~ ID ~ ":" ~ ID ~ "=" ~ expr
  | "def" ~ ID ~ "(" ~ formals ~ ")" ~ ":" ~ ID ~ "=" ~ expr
  | "def" ~ ID ~ ":" ~ ID ~ "=" ~ expr
  | "type" ~ ID ~ "=" ~ ID
)

This is interesting for two reasons.  First: we have multiple disjunctions handled in the same rule, showing that disjunctive parsers chain just as nicely as do sequential.  But more importantly, our chain of disjunctions includes two parsers which have the same prefix ("def" ~ ID).  In other words, if we attempt to parse an input of “def a: B = 42“, one of these deeply nested parsers will erroneously match the input for the first two tokens.

This grammatical feature forces us to implement some sort of backtracking within our parser combinators.  Intuitively, the "def" ~ ID parser is going to successfully match “def a“, but the enclosing sequence ("def" ~ ID ~ "(") will fail as soon as the “:” token is reached.  At this point, the parser has to take two steps back in the token stream and try again with another parser, in this case, "def" ~ ID ~ ":" ~ ID ~ "=" ~ expr.  It is this feature which separates LL(*) parsing from LL(1) and LL(0).

The good news is that we already have this feature almost by accident.  Well, obviously not by accident since I put some careful planning into this article, but at no point so far did we actually set out to implement backtracking, and yet it has somehow dropped into our collective lap.  Consider once more the implementation of the disjunctive parser:

class DisParser[+A](left: Parser[A], right: Parser[A]) extends Parser[A] {
  def apply(s: Stream[Character]) = left(s) match {
    case res: Success => res
    case _: Failure => right(s)
  }
}

Notice what happens if left fails: it invokes the right parser passing the same Stream instance (s).  Recall that Stream is immutable, meaning that there is nothing left can do which could possibly change the value of s.  Each parser is merely grabbing characters from the head of the stream and then producing a new Stream which represents the remainder.  Parsers farther up the line (like our disjunctive parser) are still holding a reference to the stream prior to these “removals”.  This means that we don’t need to make any special effort to implement backtracking, it just falls out as a natural consequence of our use of the Stream data structure.  Isn’t that nifty?

Conclusion

Parser combinators are an incredibly clever bit of functional programming.  Every time I think about them, I am once again impressed by the ingenuity of their design and the simple elegance of their operation.  The fact that two combinators and a single parser can encode the vast diversity of LL(*) grammars is simply mind-boggling.  Despite their simplicity, parser combinators are capable of some very powerful parsing in a very clean and intuitive fashion.  That to me is magical.

Interop Between Java and Scala

9
Feb
2009

Sometimes, the simplest things are the most difficult to explain.  Scala’s interoperability with Java is completely unparalleled, even including languages like Groovy which tout their tight integration with the JVM’s venerable standard-bearer.  However, despite this fact, there is almost no documentation (aside from chapter 29 in Programming in Scala) which shows how this Scala/Java integration works and where it can be used.  So while it may not be the most exciting or theoretically interesting topic, I have taken it upon myself to fill the gap.

Classes are Classes

The first piece of knowledge you need about Scala is that Scala classes are real JVM classes.  Consider the following snippets, the first in Java:

public class Person {
    public String getName() {
        return "Daniel Spiewak";
    }
}

…and the second in Scala:

class Person {
  def getName() = "Daniel Spiewak"
}

Despite the very different syntax, both of these snippets will produce almost identical bytecode when compiled.  Both will result in a single file, Person.class, which contains a default, no-args constructor and a public method, getName(), with return type java.lang.String.  Both classes may be used from Scala:

val p = new Person()
p.getName()       // => "Daniel Spiewak"

…and from Java:

Person p = new Person();
p.getName();      // => "Daniel Spiewak"

In the case of either language, we can easily swap implementations of the Person class without making any changes to the call-site.  In short, you can use Scala classes from Java (as well as Java classes from Scala) without ever even knowing that they were defined within another language.

This single property is the very cornerstone of Scala’s philosophy of bytecode translation.  Wherever possible — and that being more often than not — Scala elements are translated into bytecode which directly corresponds to the equivalent feature in Java.  Scala classes equate to Java classes, methods and fields within those classes become Java methods and fields.

This allows some pretty amazing cross-language techniques.  For example, I can extend a Java class within Scala, overriding some methods.  I can in turn extend this Scala class from within Java once again with everything working exactly as anticipated:

class MyAbstractButton extends JComponent {
  private var pushed = false
 
  def setPushed(p: Boolean) {
    pushed = p
  }
 
  def getPushed = pushed
 
  override def paintComponent(g: Graphics) {
    super.paintComponent(g)
 
    // draw a button
  }
}
public class ProKitButton extends MyAbstractButton {
    // do something uniquely Apple-esque
}

Traits are Interfaces

This is probably the one interoperability note which is the least well-known.  Scala’s traits are vastly more powerful than Java’s interfaces, often leading developers to the erroneous conclusion that they are incompatible.  Specifically, traits allow method definitions, while interfaces must be purely-abstract.  Yet, despite this significant distinction, Scala is still able to compile traits into interfaces at the bytecode level…with some minor enhancements.

The simplest case is when the trait only contains abstract members.  For example:

trait Model {
  def value: Any
}

If we look at the bytecode generated by compiling this trait, we will see that it is actually equivalent to the following Java definition:

public interface Model {
    public Object value();
}

Thus, we can declare traits in Scala and implement them as interfaces in Java classes:

public class StringModel implements Model {
    public Object value() {
        return "Hello, World!";
    }
}

This is precisely equivalent to a Scala class which mixes-in the Model trait:

class StringModel extends Model {
  def value = "Hello, World!"
}

Things start to get a little sticky when we have method definitions within our traits.  For example, we could add a printValue() method to our Model trait:

trait Model {
  def value: Any
 
  def printValue() {
    println(value)
  }
}

Obviously, we can’t directly translate this into just an interface; something else will be required.  Scala solves this problem by introducing an ancillary class which contains all of the method definitions for a given trait.  Thus, when we look at the translation for our modified Model trait, the result looks something like this:

public interface Model extends ScalaObject {
    public Object value();
 
    public void printValue();
}
 
public class Model$class {
    public static void printValue(Model self) {
        System.out.println(self.value());
    }
}

Thus, we can get the effect of Scala’s powerful mixin inheritance within Java by implementing the Model trait and delegating from the printValue() method to the Model$class implementation:

public class StringModel implements Model {
    public Object value() {
        return "Hello, World!";
    }
 
    public void printValue() {
        Model$class.printValue(this);
    }
 
    // method missing here (see below)
}

It’s not perfect, but it allows us to use some of Scala’s more advanced trait-based functionality from within Java.  Incidentally, the above code does compile without a problem.  I wasn’t actually aware of this fact, but “$” is a legal character in Java identifiers, allowing interaction with some of Scala’s more interesting features.

There is, however, one little wrinkle that I’m conveniently side-stepping: the $tag method.  This is a method defined within the ScalaObject trait designed to help optimize pattern matching.  Unfortunately, it also means yet another abstract method which must be defined when implementing Scala traits which contain method definitions.  The correct version of the StringModel class from above actually looks like the following:

public class StringModel implements Model {
    public Object value() {
        return "Hello, World!";
    }
 
    public void printValue() {
        Model$class.printValue(this);
    }
 
    public int $tag() {
        return 0;
    }
}

To be honest, I’m not sure what is the “correct” value to return from $tag.  In this case, 0 is just a stub, and I’m guessing a safe one since StringModel is the only subtype of Model.  Can anyone who knows more about the Scala compiler shed some light on this issue?

Generics are, well…Generics

Generics are (I think) probably the coolest and most well-done part of Scala’s Java interop.  Anyone who has more than a passing familiarity with Scala will know that its type system is significantly more powerful than Java’s.  Some of this power comes in the form of its type parameterization, which is vastly superior to Java’s generics.  For example, type variance can be handled at declaration-site, rather than only call-site (as in Java):

abstract class List[+A] {
  ...
}

The + notation prefixing the A type parameter on the List class means that List will vary covariantly with its parameter.  In English, this means that List[String] is a subtype of List[Any] (because String is a subtype of Any).  This is a very intuitive relationship, but one which Java is incapable of expressing.

Fortunately, Scala is able to exploit one of the JVM’s most maligned features to support things like variance and higher-kinds without sacrificing perfect Java interop.  Thanks to type erasure, Scala generics can be compiled to Java generics without any loss of functionality on the Scala side.  Thus, the Java translation of the List definition above would be as follows:

public abstract class List<A> {
    ...
}

The variance annotation is gone, but Java wouldn’t be able to make anything of it anyway.  The huge advantage to this translation scheme is it means that Java’s generics and Scala’s generics are one and the same at the bytecode level.  Thus, Java can use generic Scala classes without a second thought:

import scala.Tuple2;
 
...
Tuple2<String, String> me = new Tuple2<String, String>("Daniel", "Spiewak");

Obviously, this is a lot more verbose than the Scala equivalent, “("Daniel", "Spiewak")“, but at least it works.

Operators are Methods

One of the most obvious differences between Java and Scala is that Scala supports operator overloading.  In fact, Scala supports a variant of operator overloading which is far stronger than anything offered by C++, C# or even Ruby.  With very few exceptions, any symbol may be used to define a custom operator.  This provides tremendous flexibility in DSLs and even your average, every-day API (such as List and Map).

Obviously, this particular language feature is not going to translate into Java quite so nicely.  Java doesn’t support operator overloading of any variety, much less the über-powerful form defined by Scala.  Thus, Scala operators must be compiled into an entirely non-symbolic form at the bytecode level, otherwise Java interop would be irreparably broken, and the JVM itself would be unable to swallow the result.

A good starting place for deciding on this translation is the way in which operators are declared in Scala: as methods.  Every Scala operator (including unary operators like !) is defined as a method within a class:

abstract class List[+A] {
  def ::[B >: A](e: B) = ...
 
  def +[B >: A](e: B) = ...
}

Since Scala classes become Java classes and Scala methods become Java methods, the most obvious translation would be to take each operator method and produce a corresponding Java method with a heavily-translated name.  In fact, this is exactly what Scala does.  The above class will compile into the equivalent of this Java code:

public abstract class List<A> {
    public <B super A> List<B> $colon$colon(B e) { ... }
 
    public <B super A> List<B> $plus(B e) { ... }
}

Every allowable symbol in Scala’s method syntax has a corresponding translation of the form “$trans“.  A list of supported translations is one of those pieces of documentation that you would expect to find on the Scala website.  However, alas, it is absent.  The following is a table of all of the translations of which I am aware:

Scala Operator Compiles To
= $eq
> $greater
< $less
+ $plus
- $minus
* $times
/ div
! $bang
@ $at
# $hash
% $percent
^ $up
& $amp
~ $tilde
? $qmark
| $bar
\ $bslash
: $colon

Using this table, you should be able to derive the “real name” of any Scala operator, allowing its use from within Java.  Of course, the idea solution would be if Java actually supported operator overloading and could use Scala’s operators directly, but somehow I doubt that will happen any time soon.

Odds and Ends

One final tidbit which might be useful: @BeanProperty.  This is a special annotation which is essentially read by the Scala compiler to mean “generate a getter and setter for this field”:

import scala.reflect.BeanProperty
 
class Person {
  @BeanProperty
  var name = "Daniel Spiewak"
}

The need for this annotation comes from the fact that Scala’s ever-convenient var and val declarations actually generate code which looks like the following (assuming no @BeanProperty annotation):

// *without* @BeanProperty
public class Person {
    private String name = "Daniel Spiewak";
 
    public String name() {
        return name;
    }
 
    public void name_$eq(String name) {
        this.name = name;
    }
}

This works well from Scala, but as you can see, Java-land is not quite paradise.  While it is certainly feasible to use the _$eq syntax instead of the familiar set/get/is triumvirate, it is not an ideal situation.

Adding the @BeanProperty annotation (as we have done in the earlier Scala snippet) solves this problem by causing the Scala compiler to auto-generate more than one pair of methods for that particular field.  Rather than just value and value_$eq, it will also generate the familiar getValue and setValue combination that all Java developers will know and love.  Thus, the actual translation resulting from the Person class in Scala will be as follows:

public class Person {
    private String name = "Daniel Spiewak";
 
    public String name() {
        return name;
    }
 
    public String getName() {
        return name();
    }
 
    public void name_$eq(String name) {
        this.name = name;
    }
 
    public void setName(String name) {
        name_$eq(name);
    }
}

This merely provides a pair of delegates, but it does suffice to smooth out the mismatch between Java Bean-based frameworks and Scala’s elegant instance fields.

Conclusion

This has been a whirlwind, disjoint tour covering a fairly large slice of information on how to use Scala code from within Java.  For the most part, things are all roses and fairy tales.  Scala classes map precisely onto Java classes, generics work perfectly, and pure-abstract traits correspond directly to Java interfaces.  Other areas where Scala is decidedly more powerful than Java (like operators) do tend to be a bit sticky, but there is always a way to make things work.

If you’re considering mixing Scala and Java sources within your project, I hope that this article has smoothed over some of the doubts you may have had regarding its feasibility.  As David Pollack says, Scala is really “just another Java library”.  Just stick scala-library.jar on your classpath and all of your Scala classes should be readily available within your Java application.  And given how well Scala integrates with Java at the language level, what could be simpler?

The Joy of Concatenative Languages Part 3: Kindly Types

22
Dec
2008

In parts one and two of this series, we dipped our toes into the fascinating world that is stack-based languages.  By this point, you should be fairly familiar with how to construct simple algorithms using Cat (the language we have been working with) as well as the core terminology of the paradigm.  In fact, with just the information given so far, you could probably go on to be productive with a real-world concatenative language like Factor.  However, the interest does not just stop there…

One of the interesting challenges in programming language design is the construction of a type system.  So as to clear up any possible misconception before it arises, this is how Pierce defines such a thing:

A type system is a tractable syntactic method for proving the absence of certain program behaviors by classifying phrases according to the kinds of values they compute.

For Java, which has a comparatively weak type system, this usually means preventing you from accidentally using a String as if it were an int.  In other words, Java’s type system generally proves the absence of things like NoSuchMethodError and similar.  C#, which has a slightly more-powerful type system, can also prove the absence of most NullPointerException(s) when code is written in a correct and idiomatic fashion.  Scala goes even further with pattern matching…need I go on?  The point is that type systems do different things in different languages, so the definition needs to be flexible enough to reflect that.

In this article, we’re going to look at how we can define a type system for a functional (meaning that we have quotations) concatenative language.  In a comment on the first part of this series, it was suggested that the task of typing stack-based languages is a fairly trivial one.  This is true, but only to a certain point.  As we will see, there are dragons lurking in the conceptual shadows, waiting for us to disturb their sleep.

Simple Expressions

Let’s start out with typing something simple.  Consider the following program:

 

For those of you reading the RSS, what you see between the previous paragraph and this is exactly what I intended to write: nothing at all.  In a concatenative language, the empty program is usually considered to be valid.  After all, it takes a stack as input and returns the exact same stack.  We could replicate the semantics of this program by writing “dup pop“, but why bother?

The empty program has the following type:

(->)

Or, more properly:

('A -> 'A)

To the left of the -< we have what I like to call the “input constraints”: what types must be on the stack coming into the program (or phrase).  To the right of the arrow are the “output constraints”: what types will be on the stack when we’re done.  For reasons which will become clear later on, 'A in this case represents the whole input stack (regardless of what it contains).  Since we never change anything on the stack (the program is, after all, empty), the output stack has whatever type the input stack was given.  Another way of writing this type would be as follows:

* -> *

This literally symbolizes our intuition that the empty program has no input or output constraints.  However, this is somewhat less correct notationally since it implies that the input and output stacks are unrelated.  In fact, I would go so far as to say that this notation is wrong.  The only reason it is produced here is to serve as a memory aid.  For the remainder of the article, we will be using Cat’s notation for types.

Let’s look at something a little less trivial.  Consider the following program one word at a time:

1 2 +

Remember that an integer literal (or any literal for that matter) is just a function which pushes a specific constant onto the stack.  Let’s assign types based on what we expect the input/output constraints of these functions to be.  Note: I will be using the colon (:) notation to denote a type.  This isn’t conventional coming from C-land, but it is the gold standard of formal type theory:

1 : ('A -> 'A Int)
2 : ('A -> 'A Int)
+ : ('A Int Int -> 'A Int)

This is all very intuitive.  Integer literals work on any stack and just produce that stack with a new Int pushed onto the top.  Both 1 and 2 have the same type, which is a good sign that we’re on the right track.

The + word is a little more interesting.  Its runtime semantics are as follows: pop two integers off the stack, add them together and then push the result back on.  This word will not be able to execute without both integer values on the top of the stack.  Thus, it only makes sense that its input constraints be some stack with two values of type Int at the top.  Likewise, when we’re done, those two integers will be gone and a new Int will be pushed onto the remainder of the stack which was given to us.  Remember that 'A represents any stack, even if it is completely empty.

Coming back to our program, we can see that it is well typed by simply string together the types we have generated.  Starting from the top (using * to symbolize the empty stack):

Word Input Stack Output Stack
1 * * Int
2 * Int * Int Int
+ * Int Int * Int

Do you see how the input stack of each word matches the output stack of the previous?  In this case, this sort of one-to-one matching indicates that the program is well-typed, producing a final stack with a single Int on it.  If we actually run this program, we would see that the evaluation matches the assigned types.

First-Order Functions

This is fine for a simple addition program, but what if we throw functions into the mix?  Consider the same program we just analyzed wrapped up within a function:

define addSome {
  1 2 +
}
 
addSome

Here we define a function which has as a body the program we have already analyzed.  Down at the bottom of our new program, we actually call this function.  Here is the question: what type does the addSome word have?

To answer this question, look back at the table above and consider the Input Stack for the first word in concert with the Output Stack for the last.  Putting these two types together yields the following type for the aggregated whole:

1 2 + : ('A -> 'A Int)

These words (or “phrase”) takes any stack as input, and then through some manipulation produces a single Int on top of that stack as a result.  The stack may grow and shrink within the function, but at the end of the day, only the Int remains.  As we would expect, this matches the runtime semantics perfectly.

Given the fact that the phrase “1 2 +” has the type ('A -> 'A Int), it is reasonable to assign that same type to the function which contains it.  Thus, we can type-check the addSome program in a simple, one-row table:

Word Input Stack Output Stack
addSome * * Int

At the start of execution, the input stack to any program is *, or the empty stack.  However, this is fine with our type checker, since the program has 'A — or any stack — for its input parameters.

This is all so nice and intuitive, so let’s consider the case where we have a function which actually takes some parameters.  Specifically, let’s consider the following definition:

define addTwice {
  + +
}

At runtime, this function will take three values off the stack and then add them all together.  It is the Cat equivalent of the following in Scala:

def addTwice(a: Int, b: Int, c: Int) = a + b + c

The question is: how do we assign this (the Cat function) a type?  As we have done before, let’s look at the types of the individual words:

+ : ('A Int Int -> 'A Int)
+ : ('A Int Int -> 'A Int)

Not much help there.  Let’s try making a table:

Word Input Stack Output Stack
+ * Int Int * Int
+ * Int Int * Int

It’s tempting to look at this and just assign addTwice the type of ('A Int Int -> 'A Int).  However, this would be a mistake.  Notice the problem with our table above: the Input Stack type of the second word does not match the Output Stack of the first.  In other words, this program does not immediately type-check.

The problem is the second word is accessing more of the stack than the first.  We’re effectively “deferring” a parameter access until later in the function, rather than grabbing everything right away and threading the processing through from start to finish.  This is a perfectly reasonable pattern, but it plays havoc with our naive type system.

The solution is to merge the input constraints across both words.  The first word (+) requires two Int(s) to be on the top of the stack.  When it is done, those Int(s) are gone and a single Int has taken their place.  The second word (again +) also requires two Int(s) on the stack.  We only have one that we know of (the output Int from the first word), so we must unify the constraints and merge things back “up the chain” as it were.  In other words, our first word (+) will require not just two Int(s) on the stack but three: two for itself and one for the second word (+).  Our corrected table will look something like the following:

Word Input Stack Output Stack
+ * Int Int Int * Int Int
+ * Int Int * Int

With this new table, all of the Input and Output stacks match, which means that the type is valid and can prove runtime evaluation.  Thus, based on this whole song and dance, we can assign the following type:

addTwice : ('A Int Int Int -> 'A Int)

As expected, this function takes not two, but three Int(s) on the stack and returns the remainder of that stack with a new Int on top.

Polymorphic Words

One mildly-annoying issue that we have just skated over is the problem of polymorphism.  Consider the following two programs:

42 pop

And this…

"fourty-two" pop

The question is: what type do we assign to pop?  We can easily make the following two assertions:

42           : ('A -> 'A Int)
"fourty-two" : ('A -> 'A String)

If we attempt to use this information to type-check the first program (assuming that it is sound), we will arrive at the following type for pop:

pop : ('A Int -> 'A)

That’s intuitive, right?  All that we’re doing here is taking the first value off of the stack (an Int, in the case of the first program) and throwing it away, returning the remainder of the stack.  However, if we use this type, we will run into some serious troubles type-checking the second program:

Word Input Stack Output Stack
"fourty-two" * * String
pop * Int *

Since pop has type ('A Int -> 'A) (as we asserted above), it is inapplicable to a stack with String on top.  Note that we can’t just push these constraints “up the chain”, since it is a case of direct type mismatch, rather than a stack of insufficient depth.  In short: we’re stuck.

The only way to solve this problem is to introduce the concept of parametric types.  Literally, we need to define a type which can be instantiated against a given stack, regardless of what type happens to match the parameters in question.  Java calls this concept “generics”.  Rather than giving pop the overly-restrictive type of ('A Int -> 'A), we will instead allow the value on top of the stack to be of any type (not just Int):

pop : ('A 'a -> 'A)

Note the fact that 'A and 'a are very separate type variables in this snippet.  'A represents the “rest of the stack”, while 'a represents a specific type which just happens to be on top of the input stack.  Using this new, more flexible type, we can produce tables for both of our earlier programs:

Word Input Stack Output Stack
42 * * Int
pop * Int *

 

Word Input Stack Output Stack
"fourty-two" * * String
pop * String *

Everything matches and the world is once again very happy.  Note that we can also apply this parametric type concept to the slightly more interesting example of dup:

dup : ('A 'a -> 'A 'a 'a)

In other words, dup says that whatever type is on top of the stack when it starts, that type will be on top of the stack twice when it is finished.  Just like pop, this type can be instantiated against any stack with at least one type, regardless of whether that type is Int, String, or anything else for that matter.

Higher-Order Functions

We’ve seen how to type-check simple phrases, as well as first-order functions with deferred stack access and the occasional polymorphic word.  However, there is one particularly troublesome aspect of concatenative type systems which we have completely ignored: functions which take quotations off the stack.  In other words: what type do we assign to apply?  Consider the following function:

define trouble {
  apply
}

At runtime, trouble will pop a quotation and then evaluate it against the remainder of the stack.  Intuitively, we need to have some way of representing the type of a quotation, but that’s not even the most serious problem.  Somehow, we need to constrain the quotation to itself accept exactly the stack which remains after it is popped.  We also need to find some way of capturing its output type in order to compute the final output type of trouble.

More concretely, we can make a first attempt at assigning a type for trouble.  The underscores (_) illustrate an area where our type system is incapable of helping us:

trouble : (_ (_ -> _) -> _)

It’s very tempting to just throw an 'A in there and be done with it, but the truth is that for this type expression, there is no “unused stack”.  We don’t really know how much (or how little) of the stack will be used by the quotation; it could pop five elements, twenty or none at all.  It literally needs access to the remainder of the input stack in its entirety, otherwise the expression is useless.  Enter stack polymorphism…

Just as we needed a way to represent any single type in order to type-check pop and dup, we now need a way to represent any stack type in order to type-check apply.  Fortunately, the answer is already nestled within our pre-established notation.  Consider the type of +:

+ : ('A Int Int -> 'A Int)

We have been taking this to mean “any stack with two Int(s) on top resulting in that same stack with only one Int“.  This is true, but we’re being a little hand-wavy about the meaning of “any stack” and how it relates to 'A.  When we really get down to it, what’s happening here is 'A is being instantiated against a particular input stack, whatever that stack happens to be.  When we were type-checking + +, the first word instantiated 'A not to mean the empty stack (*), but rather a stack with at least one Int on it.  This was required to successfully type the second +.

We can very easily extend this notational convenience to represent generalized stack parameters.  Rather than being instantiated to specific types, stack parameters are instantiated to some stack in its entirety.  Just as with type parameters, wherever we see that instantiated stack parameter within a type expression, it will be replaced with whatever stack type it was assigned.  Thus, we can assign trouble the following type:

trouble : ('A ('A -> 'B) -> 'B)

In other words, trouble takes some stack A which has a quotation on top.  This quotation accepts stack A itself and returns some new stack B.  Note that we don’t really know anything about B.  It could be related to A, but it might not be.  The final result of the whole expression is this new stack B.

This concept is remarkably powerful.  With it in combination with the other types we have already examined, we can type check the entirety of Cat and be assured of the absence of type-mismatch and stack-underflow errors.  Considering the fact that Cat is almost exactly as powerful as Joy, that’s a pretty impressive feat.

From a theoretical standpoint, things get even more interesting when we consider the type of the following function:

define y {
  [dup papply] swap compose dup apply
}

This has the following type:

y : ('A ('A ('A -> 'B) -> 'B) -> 'B)

As you may have guessed by the name, this is the Y-combinator1, one of the most well-known mechanisms for producing recursion in a nameless system.  Note that this definition looks a little different from the pure-untyped lambda calculus (call-by-name semantics):

λf . (λx . f (x x)) (λx . f (x x))

What I’m trying to point out here is the fact that Cat is able to leverage its type system to assign a type to the Y-combinator.  This is something which is literally impossible in System F, a typed form of lambda-calculus.  In fact, the only way to type-check this function in a lambda-calculus-derivative system would be to add recursive types.  Cat is able to get by with a very much non-recursive type definition, something which I find fascinating in the extreme.

Update: The above paragraph is somewhat misleading.  It turns out that Cat actually does use a recursive type under the surface to derive the non-recursive type for y.  Specifically:

dup papply : ('A ('B ('B self -> 'C) -> 'C) -> 'A ('B -> 'C))

On a further theoretical note, the device in Cat’s type system which allows this power is in fact the stack type variable (e.g. 'A).  These stack types are conceptually quite similar to the type parameters we used in typing pop (e.g. 'a), but still in a very separate domain.  In fact, stack types have a different kind than regular types.  This is not to say that Cat employs higher-kinds such as Scala’s (e.g. * => *), but it does have two very different type kinds: stacks and values.

And yet, it is not kinds in and of themselves which allows for the typing of the Y-combinator.  Fω is essentially System F with higher-kinds, and yet it is still incapable of handling this tiny little expression.  Most interesting indeed…

Conclusion

As you can see, type systems and concatenative languages do fit together nicely, but it takes a lot more effort than one would initially expect.  While typing simple expressions is easy enough, the waters are muddied as soon as higher-order functions and even deferred stack access enters the mix.  This is an extremely fertile area for research, where a lot of interesting ideas are being developed.  For example, John Nowak’s 5th attempts to apply a type system to the stack-based paradigm, but in a very different way than Cat.

I hope you enjoyed this mini-series of articles on concatenative languages.  While they are a bit of a backwater in the programming language menagerie, I think that studying them can be a very instructive experience.  Furthermore, there remain some problems that are very nicely expressed in languages like Cat while being extremely unwieldy in more conventional languages like Scala.  Despite the obscurity of concatenative languages, it never hurts to have an extra language on hand, ready for those times when it really is the best tool for the job.

1 Technically, this is a little different from the Y-combinator used in conventional lambda-calculus (it executes the quotation rather than returning a fixed-point).  However, conceptually it is the same idea.

The Joy of Concatenative Languages Part 2: Innately Functional

15
Dec
2008

In part one of this series, I introduced the concept of a stack-based language and in particular the syntax and rough ideas behind Cat.  However, to anyone coming into concatenative land for the first time, my examples likely seemed both odd and unconvincing.  After all, why would you ever use point-free programming when everyone else seems to be sold on the idea of name binding?  More importantly, where do these languages fit in with our established menagerie of language paradigms?

The answer to the first question really depends on the situation.  I personally think that the best motivation for concatenative languages is their syntax.  If you want to create an internal DSL, there will be no language better suited to it than one which is concatenative, Cat, Factor or otherwise.  This is because stack-oriented languages can get away with almost no syntax whatsoever.  They say that Lisp is a syntax-free language, but this holds even more strongly for languages like Cat.  Well, that and you don’t have to deal with all the parentheses…

The second question is (I think) the more interesting one: how do we classify these languages and what sort of methodologies should we apply?  At first glance, Cat (and other languages like it) seem to be quite imperative in nature.  After all, you have a single mutable stack that any function can modify.  However, if you turn your head sideways and blink twice, you begin to realize that concatenative languages are really much closer to the functional side of the oyster.

Consider the following Cat program:

define plus { + }
define minus { - }
 
7 2 3
plus minus

Trivial, but to the point.  This program first adds the numbers 2 and 3, then subtracts the result from 7.  Thus, the final result is a value of 2 on the stack.  The only twist is that we have defined functions plus and minus to do the dirty work for us.  This wasn’t strictly necessary, but I wanted to emphasize that + and - really are functions.  We could express the exact same program in Scala:

def plus(a: Int, b: Int) = a + b
def minus(a: Int, b: Int) = a - b
 
minus(7, plus(2, 3))

Do you see how the consecutive invocations of plus and minus in Cat became composed invocations in Scala?  This is where the term “concatenative language” derives from: the whole program is just a series of function compositions.  Wikipedia’s article on Cat has a very nice, mathematical description:

Two adjacent terms in Cat imply the composition of functions that generate stacks, so the Cat program f g is equivalent to the mathematical expressions and , where x is the stack input to the expression.

Strictly speaking, a concatenative language could be implemented without a stack, but such an implementation would likely be a bit harder to use than the average stack-based language.

Coming back to my original premise: concatenative languages are functional in nature.  Absolutely everything in Cat is a function.  Operators, words, even numeric literals like “3” are actually functions at the conceptual level.  Additionally, Cat, Joy and Factor all offer a mechanism for treating functions as first-class values:

2 3
[ + ]
apply

The square-bracket ([]) syntax is representative of a quotation.  Literally this mean “create a function of the enclosed words and place it as a value on the stack”.  We can pop this function off the stack and invoke it by using the apply word.  Incidentally, you may have noticed that this syntax is remarkably close to that which is used in if conditionals:

5 0 <
[ "strange math" ]
[ "all is well" ]
if

This syntax works because if isn’t conceptually a language primitive: it’s just another function which happens to take a boolean and two quotations off the stack.  For the sake of efficiency, Cat does indeed implement if as a primitive, but this was a deliberate optimization rather than an implementation forced by the language design.  Untyped Cat (see Part 3) is equivalent in power to the pure-untyped lambda calculus, and as our friend Alonzo Church showed us, if-style conditionals are easily accomplished:

TRUE = λa . λb . a
FALSE = λa . λb . b

IF = λp . λt . λe . p t e

Yeah, maybe we’re drifting a bit off-point here…

Higher-Order Programming

So if Cat is just another functional programming language, then we should be able to implement all of those higher-order design patterns that we’ve come to know and love in languages like Scala and ML.  To see how, let’s look at implementing some simple list manipulation functions in Cat.  The easiest would be to start with append, which pops two lists off of the stack and pushes a new list which is the end-to-end concatenation of the two originals:

define append {
  empty
  [ pop ]
  [
    uncons
    [append] dip
    cons
  ]
  if
}

This function first starts by checking to see if the top list is empty.  If so, then just pop it off the stack and leave the other right where it is.  Appending an empty list should always yield the original list.  However, if the head list is not empty, then we need to work a bit.  First, we decompose it into its tail and head, which are pushed onto the stack in order by the uncons function.  Next, we need to recursively append the tail with our second list on the stack.  However, the head of the list from uncons is in the way on top of the stack.  We could use stack manipulation to move things around and get our lists up to the head of the stack, but dip provides us with a handy, higher-order shortcut.  We temporarily remove the top of the stack, invoke the quotation “[append]” against the remainder and then push the old top back on top of the result.

The dip operation is surprisingly powerful, making it possible to completely live without either variables or multiple stacks.  Any non-trivial Cat program will need to make use of this handy function at some level.

Once we have the old head and the new appended-list on the stack, all we need to do is put them back together using cons.  This function leaves a new list on the stack in place of the old list and head element.  This Cat program is almost precisely analogous to the following ML:

fun append ls nil = ls
  | append ls (hd :: tail) = hd :: (append ls tail)

Personally, I find the ML a lot easier to read, but that’s just me.  Obviously it’s a lot shorter, but as it turns out, our Cat implementation, while intuitive, was sub-optimal.  Cat already implements append in the guise of the cat function, and it is far more concise than what I showed:

define cat {
  swap [cons] rfold
}

It’s almost frightening how short this is: only three words.  It’s not as if rfold is doing anything mysterious either; it’s just a simple right-fold function that takes a list, an initial value and a quotation, producing a result by traversing the entire list.  We can use something similar back in ML-land, achieving an implementation which is arguably equivalent in subjective elegance:

val append = foldr (op::)

Moving on, we can also implement a length function in Cat, this time using fold to tighten things up:

define length {
  0 [ pop 1 + ] fold
}

You’ll notice that we have to mess around a bit in the quotation in order to avoid the first “parameter”, the current element of the list (which we do not need).  Expressing this in ML yields a very similar degree of cruft:

val length = foldl (fn (n, _) => n + 1) 0

Conclusion

The important take-away from this tangled morass of an article is the fact that Cat is a highly functional language, capable of easily keeping up with some of the stalwart champions of the paradigm.  More significantly, this is a trait which is shared by all concatenative languages.  Rather than throwing away all of the old wisdom learned in language design, stack-based languages build on it by providing an alternative view into the world of functions.

In the next (and final) article of the series, we will take a brief look at the challenges of applying a type system to a concatenative language and the fascinating techniques used by Cat to achieve just that.

The Joy of Concatenative Languages Part 1

8
Dec
2008

Concatenative languages like Forth have been around for a long time.  Hewlett-Packard famously employed a stack-based language called “RPL” on their HP-28 and HP-48 calculators, bringing the concept of Reverse Polish Notation to the mainstream…or as close to the mainstream as a really geeky toy can get.  Surprisingly though, these languages have not seen serious adoption beyond the experimental and embedded device realms.  And by “adoption”, I mean real programmers writing real code, not this whole interpreted bytecode nonsense.

This is a shame, because stack-based languages have a remarkable number of things to teach us.  Their superficial distinction from conventional programming languages very quickly gives way to a deep connection, particularly with functional languages.  However, if we dig even deeper, we find that this similarity has its limits.  There are some truly profound nuggets of truth waiting to be uncovered within these murky depths.  Shall we?

Trivial aside: I’m going to use the terms “concatenative” and “stack-based” interchangeably through the article.  While these are most definitely related concepts, they are not exactly synonyms.  Bear that in mind if you read anything more in-depth on the subject.

The Basics

Before we look at some of those “deeper truths of which I speak, it might be helpful to at least understand the fundamentals of stack-based programming.  From Wikipedia:

The concatenative or stack-based programming languages are ones in which the concatenation of two pieces of code expresses the composition of the functions they express. These languages use a stack to store the arguments and return values of operations.

Er, right.  I didn’t find that very helpful either.  Let’s try again…

Stack-based programming languages all share a common element: an operand stack.  Consider the following program:

2

Yes, this is a real program.  You can copy this code and run/compile it unmodified using most stack-based languages.  However, for reasons which will become clear later in this series, I will be using Cat for most of my examples.  Joy and Factor would both work well for the first two parts, but for part three we’re going to need some rather unique features.

Returning to our example: all this will do is take the numeric value of 2 and push it onto the operand stack.  Since there are no further words, the program will exit.  If we want, we can try something a little more interesting:

2 3 +

This program first pushes 2 onto the stack, then 3, and finally it pops the top two values off of the stack, adds them together and pushes the result.  Thus, when this program exits, the stack will only contain 5.

We can mix and match these operations until we’re blue in the face, but it’s still not a terribly interesting language.  What we really need is some sort of flow control.  To do that, we need to understand quotations.  Consider the following Scala program:

val plus = { (x: Int, y: Int) => x + y }
plus(2, 3)

Notice how rather than directly adding 2 and 3, we first create a closure/lambda which encapsulates the operation.  We then invoke this closure, passing 2 and 3 as arguments.  We can emulate these exact semantics in Cat:

2 3
[ + ]
apply

The first line pushes 2 and 3 onto the stack.  The second line uses square brackets to define a quotation, which is Cat’s version of a lambda.  Note that it isn’t really a closure since there are no variables to enclose.  Joy and Factor also share this construct.  Within the quotation we have a single word: +.  The important thing is the quotation itself is what is put on the stack; the + word is not immediately executed.  This is exactly how we declared plus in Scala.

The final line invokes the apply word.  When this executes, it pops one value off the stack (which must be a quotation).  It then executes this quotation, giving it access to the current stack.  Since the quotation on the head of the stack consists of a single word, +, executing it will result in the next two elements being popped off (2 and 3) and the result (5) being pushed on.  Exactly the same result as the earlier example and the exact same semantics as the Scala example, but a lot more concise.

Cat also provides a number of primitive operations which perform their dirty work directly on the stack.  These operations are what make it possible to reasonably perform tasks without variables.  The most important operations are as follows:

  • swap — exchanges the top two elements on the stack.  Thus, 2 3 swap results in a stack of “3 2” in that order.
  • pop — drops the first element of the stack.
  • dup — duplicates the first element and pushes the result onto the stack.  Thus, 2 dup results in a stack of “2 2“.
  • dip — pops a quotation off the stack, temporarily removes the next item, executes the quotation against the remaining stack and then pushes the old head back on.  Thus, 2 3 1 [ + ] dip results in a stack of “5 1“.

There are other primitives, but these are the big four.  It is possible to emulate any control structure (such as if/then) just using the language shown so far.  However, to do so would be pretty ugly and not very useful.  Cat does provide some other operations to make life a little more interesting.  Most significantly: functions and conditionals.  A function is defined in the following way:

define plus { + }

Those coming from a programming background involving variables (that would be just about all of us) would probably look at this function and feel as if something is missing.  The odd part of this is there is no need to declare parameters, all operands are on the stack anyway, so there’s no need to pass anything around explicitly.  This is part of why concatenative languages are so extraordinarily concise.

Conditionals also look quite weird at first glance, but under the surface they are profoundly elegant:

2 3 plus    // invoke the `plus` function
10 <
[ 0 ]
[ 42 ]
if

Naturally enough, this code pushes 0 onto the stack.  The conditional for an if is just a boolean value pushed onto the stack.  On top of that value, if will expect to find two quotations, one for the “then” branch and the other for the “else” branch.  Since 5 is less than 10, the boolean value will be True.  The if function (and it could just as easily be a function) pops the quotations off of the stack as well as the boolean.  Since the value is True, it discards the second quotation and executes the first, producing 0 on the stack.

I’ll leave you with the more complicated example of the factorial function:

define fac {
  dup 0 eq
  [ pop 1 ]
  [ dup 1 - fac * ]
  if
}

Note that this isn’t even the most concise way of writing this, but it does the job.  To see how, let’s look at how this will execute word-by-word (assuming an input of 4):

Stack Word
4

dup
4 4

0
4 4 0

eq
4 False

[ pop 1 ]
4 False [pop 1]

[ dup 1 - fac * ]
4 False [pop 1] [dup 1 - fac *]

if
4

dup
4 4

1
4 4 1

-
4 3

fac (assume magic recursion)
4 6

*

The final result is 24, a value which is left on the stack.  Pretty nifty, eh?

Conclusion

You’ll notice this is a shorter post than I usually spew forth (no pun intended…this time).  The reason being that I want this to be fairly easy to digest.  Concatenative languages (and Cat in particular) are not all that difficult to digest.  They are a slightly different way of thinking about programming, but as we will see in the next part, not so different as it would seem.

Note: Cat is written in C# and is available under the MIT License.  Don’t fear the CLR though, Cat runs just fine under Mono.  If you really want to experiment with no risk to yourself, a Javascript interpreter is available.