
The Joy of Concatenative Languages Part 1


Concatenative languages like Forth have been around for a long time.  Hewlett-Packard famously employed a stack-based language called “RPL” on their HP-28 and HP-48 calculators, bringing the concept of Reverse Polish Notation to the mainstream…or as close to the mainstream as a really geeky toy can get.  Surprisingly though, these languages have not seen serious adoption beyond the experimental and embedded device realms.  And by “adoption”, I mean real programmers writing real code, not this whole interpreted bytecode nonsense.

This is a shame, because stack-based languages have a remarkable number of things to teach us.  Their superficial distinction from conventional programming languages very quickly gives way to a deep connection, particularly with functional languages.  However, if we dig even deeper, we find that this similarity has its limits.  There are some truly profound nuggets of truth waiting to be uncovered within these murky depths.  Shall we?

Trivial aside: I’m going to use the terms “concatenative” and “stack-based” interchangeably throughout the article.  While these are most definitely related concepts, they are not exactly synonyms.  Bear that in mind if you read anything more in-depth on the subject.

The Basics

Before we look at some of those “deeper truths” of which I speak, it might be helpful to at least understand the fundamentals of stack-based programming.  From Wikipedia:

The concatenative or stack-based programming languages are ones in which the concatenation of two pieces of code expresses the composition of the functions they express. These languages use a stack to store the arguments and return values of operations.

Er, right.  I didn’t find that very helpful either.  Let’s try again…

Stack-based programming languages all share a common element: an operand stack.  Consider the following program:

2

Yes, this is a real program.  You can copy this code and run/compile it unmodified using most stack-based languages.  However, for reasons which will become clear later in this series, I will be using Cat for most of my examples.  Joy and Factor would both work well for the first two parts, but for part three we’re going to need some rather unique features.

Returning to our example: all this will do is take the numeric value of 2 and push it onto the operand stack.  Since there are no further words, the program will exit.  If we want, we can try something a little more interesting:

2 3 +

This program first pushes 2 onto the stack, then 3, and finally it pops the top two values off of the stack, adds them together and pushes the result.  Thus, when this program exits, the stack will only contain 5.
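Since not everyone has a Cat interpreter handy, here is a rough Python sketch of what the interpreter does with 2 3 + — purely illustrative; the names (push, plus) and the list-as-stack representation are my own, not Cat internals:

```python
# Illustrative model of an operand stack evaluating `2 3 +`.
# (A sketch of the semantics, not Cat's actual implementation.)
stack = []

def push(value):
    stack.append(value)

def plus():
    # The `+` word pops the top two values and pushes their sum.
    b = stack.pop()
    a = stack.pop()
    stack.append(a + b)

push(2)        # stack: [2]
push(3)        # stack: [2, 3]
plus()         # stack: [5]
print(stack)   # [5]
```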

We can mix and match these operations until we’re blue in the face, but it’s still not a terribly interesting language.  What we really need is some sort of flow control.  To do that, we need to understand quotations.  Consider the following Scala program:

val plus = { (x: Int, y: Int) => x + y }
plus(2, 3)

Notice how rather than directly adding 2 and 3, we first create a closure/lambda which encapsulates the operation.  We then invoke this closure, passing 2 and 3 as arguments.  We can emulate these exact semantics in Cat:

2 3
[ + ]
apply

The first line pushes 2 and 3 onto the stack.  The second line uses square brackets to define a quotation, which is Cat’s version of a lambda.  Note that it isn’t really a closure since there are no variables to enclose.  Joy and Factor also share this construct.  Within the quotation we have a single word: +.  The important thing is the quotation itself is what is put on the stack; the + word is not immediately executed.  This is exactly how we declared plus in Scala.

The final line invokes the apply word.  When this executes, it pops one value off the stack (which must be a quotation).  It then executes this quotation, giving it access to the current stack.  Since the quotation on the head of the stack consists of a single word, +, executing it will result in the next two elements being popped off (2 and 3) and the result (5) being pushed on.  Exactly the same result as the earlier example and the exact same semantics as the Scala example, but a lot more concise.
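To make the quotation/apply distinction concrete, the same illustrative Python model can be extended with quotations — modeled here as plain lists of words — and an apply that pops one off and runs it (again, my own sketch, not Cat’s implementation):

```python
# Quotations as values: `[ + ]` pushes a quotation; `apply` runs it.
# (Illustrative sketch; a quotation is modeled as a list of callables.)
stack = []

def plus():
    b = stack.pop()
    a = stack.pop()
    stack.append(a + b)

def apply():
    quotation = stack.pop()   # the top of the stack must be a quotation
    for word in quotation:    # execute it against the current stack
        word()

stack.append(2)        # 2
stack.append(3)        # 3
stack.append([plus])   # [ + ]  -- pushed as data, not executed
apply()                # apply  -- now `+` actually runs
print(stack)           # [5]
```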

Cat also provides a number of primitive operations which perform their dirty work directly on the stack.  These operations are what make it possible to reasonably perform tasks without variables.  The most important operations are as follows:

  • swap — exchanges the top two elements on the stack.  Thus, 2 3 swap results in a stack of “3 2” in that order.
  • pop — drops the first element of the stack.
  • dup — duplicates the first element and pushes the result onto the stack.  Thus, 2 dup results in a stack of “2 2”.
  • dip — pops a quotation off the stack, temporarily removes the next item, executes the quotation against the remaining stack and then pushes the old head back on.  Thus, 2 3 1 [ + ] dip results in a stack of “5 1”.
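All four are easy to model in the same illustrative Python (a sketch of the semantics, not how Cat is implemented):

```python
# The four core stack shufflers, modeled on a Python list.
stack = []

def swap():
    stack[-1], stack[-2] = stack[-2], stack[-1]

def pop():
    stack.pop()

def dup():
    stack.append(stack[-1])

def dip():
    quotation = stack.pop()   # pop the quotation...
    saved = stack.pop()       # ...temporarily remove the next item...
    for word in quotation:    # ...run the quotation on the rest...
        word()
    stack.append(saved)       # ...and push the old head back on.

def plus():
    b = stack.pop()
    a = stack.pop()
    stack.append(a + b)

# 2 3 1 [ + ] dip  =>  a stack of "5 1"
stack.extend([2, 3, 1])
stack.append([plus])
dip()
print(stack)   # [5, 1]
```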

There are other primitives, but these are the big four.  It is possible to emulate any control structure (such as if/then) just using the language shown so far.  However, to do so would be pretty ugly and not very useful.  Cat does provide some other operations to make life a little more interesting.  Most significantly: functions and conditionals.  A function is defined in the following way:

define plus { + }

Those coming from a programming background involving variables (that would be just about all of us) would probably look at this function and feel as if something is missing.  The odd part is that there is no need to declare parameters: all operands are already on the stack, so there’s no need to pass anything around explicitly.  This is part of why concatenative languages are so extraordinarily concise.
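In the same illustrative model, define amounts to binding a name to a quotation — no parameter list needed, because the operands are on the stack.  (A sketch only; Cat’s real dictionary machinery is of course more involved.)

```python
# `define plus { + }` sketched as a dictionary of named quotations.
stack = []
words = {}

def plus_prim():
    b = stack.pop()
    a = stack.pop()
    stack.append(a + b)

words["plus"] = [plus_prim]   # define plus { + }

def call(name):
    # Invoking a defined word just runs its quotation on the stack.
    for word in words[name]:
        word()

stack.extend([2, 3])   # 2 3
call("plus")           # plus
print(stack)           # [5]
```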

Conditionals also look quite weird at first glance, but under the surface they are profoundly elegant:

2 3 plus    // invoke the `plus` function
10 <
[ 0 ]
[ 42 ]
if

Naturally enough, this code pushes 0 onto the stack.  The conditional for an if is just a boolean value pushed onto the stack.  On top of that value, if will expect to find two quotations, one for the “then” branch and the other for the “else” branch.  Since 5 is less than 10, the boolean value will be True.  The if word (and it could just as easily be a function) pops the quotations off of the stack as well as the boolean.  Since the value is True, it discards the second quotation and executes the first, producing 0 on the stack.
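The elegance is easiest to see in a sketch: if is just a word that pops a boolean and two quotations and runs one of them (illustrative Python, with the branches modeled as quotations of thunks):

```python
# `if` pops two quotations and a boolean, then runs the chosen branch.
stack = []

def run(quotation):
    for word in quotation:
        word()

def if_word():
    else_q = stack.pop()      # top of stack: the "else" quotation
    then_q = stack.pop()      # beneath it: the "then" quotation
    condition = stack.pop()   # beneath both: the boolean
    run(then_q if condition else else_q)

# 2 3 plus 10 < [ 0 ] [ 42 ] if
stack.append(2 + 3 < 10)                   # True
stack.append([lambda: stack.append(0)])    # [ 0 ]
stack.append([lambda: stack.append(42)])   # [ 42 ]
if_word()
print(stack)   # [0]
```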

I’ll leave you with the more complicated example of the factorial function:

define fac {
  dup 0 eq
  [ pop 1 ]
  [ dup 1 - fac * ]
  if
}

Note that this isn’t even the most concise way of writing this, but it does the job.  To see how it works, let’s trace the execution word-by-word (assuming an input of 4):

Word                            Stack

dup                             4 4
0                               4 4 0
eq                              4 False
[ pop 1 ]                       4 False [pop 1]
[ dup 1 - fac * ]               4 False [pop 1] [dup 1 - fac *]
if                              4
dup                             4 4
1                               4 4 1
-                               4 3
fac (assume magic recursion)    4 6
*                               24

The final result is 24, a value which is left on the stack.  Pretty nifty, eh?
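To double-check the trace, the Cat definition can be mirrored step-for-step in illustrative Python (my own transliteration, with the recursion spelled out rather than “magic”):

```python
# The Cat `fac` definition, transliterated word-for-word.
def fac(stack):
    stack.append(stack[-1])            # dup
    stack.append(0)                    # 0
    b = stack.pop(); a = stack.pop()
    stack.append(a == b)               # eq
    if stack.pop():                    # if -- consumes the boolean
        stack.pop()                    # [ pop 1 ]
        stack.append(1)
    else:
        stack.append(stack[-1])        # [ dup
        stack.append(1)                #   1
        b = stack.pop(); a = stack.pop()
        stack.append(a - b)            #   -
        fac(stack)                     #   fac
        b = stack.pop(); a = stack.pop()
        stack.append(a * b)            #   * ]

stack = [4]
fac(stack)
print(stack)   # [24]
```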


You’ll notice this is a shorter post than I usually spew forth (no pun intended…this time).  The reason is that I want this to be fairly easy to digest.  Concatenative languages (and Cat in particular) are not all that difficult to pick up.  They are a slightly different way of thinking about programming, but as we will see in the next part, not so different as it would seem.

Note: Cat is written in C# and is available under the MIT License.  Don’t fear the CLR though, Cat runs just fine under Mono.  If you really want to experiment with no risk to yourself, a JavaScript interpreter is available.


  1. Vanilla RPN is directly equivalent to an expression tree; in fact, one can view RPN as a serialization of an expression tree. A post-order traversal of an expression tree is pretty much all that’s needed, and one does this when doing code generation for a stack machine such as the CLR or JVM.

    Turning the RPN back into an expression tree requires symbolically evaluating the RPN; capturing the full semantics of things like control flow requires a bit more intelligence, but basically not much more than the inverse of the code generation process.

    And just like the JVM and CLR, and expression trees also, it’s relatively trivial to type such programs. Even though they logically operate on a runtime stack, to the degree that the primitives are not polymorphic, everything is typeable statically. (E.g. if the + operation must have type int -> int -> int, then you know the type of a function that looks like [ + ].)

    Barry Kelly Monday, December 8, 2008 at 2:02 am
  2. I agree that this is pretty nifty. However, I think the mental burden is significantly heavier than with imperative or functional languages. When a programmer is writing code in such a language he cannot ignore the stack: he must be aware of its state. He does not have to manipulate the stack directly – the structure of the code will implicitly do it for him – but he must know the exact state of the stack at each point in the code. So the complexity is still there. It is in the programmer’s mind and not in the source code. I have this feeling despite the fact that I have extensive experience writing JVM programs in bytecodes.

    That said, I need to admit that it can still be a matter of education. Maybe if Cat were the first language that I learned, then I would see things differently.

    Anyway, enjoyed the post.


    Itay Maman Monday, December 8, 2008 at 3:09 am
  3. Not sure about your ‘interpreted bytecode nonsense’ comment about JVM. I hope you were being sarcastic.

    Dmitriy Kopoylenko Monday, December 8, 2008 at 4:48 am
  4. An interesting book (although quite old) on Stack Machines and their languages can be found online at:


    Fogus Monday, December 8, 2008 at 7:54 am
  5. @Dmitriy

    Of course the bytecode nonsense was sarcasm. :-) More importantly, I was trying to differentiate between commonly used high-level languages and rarely used low-levels. Bytecode is an awesome stack oriented language, but it is rarely written by hand. At least, hardly at all in proportion to the amount of auto-generation which takes place. Also, it’s filled with a number of low-level operations more reminiscent of assembly than ML.


    @Itay

    Actually, I’ve been playing with stack-based languages for a few weeks now on-and-off, and I’ve found very much to my surprise that the stack “goes away” over time. It’s like the parentheses in Lisp. :-) I can’t say that it’s as natural as writing code involving named variables with random access, but it’s not too bad. One thing Cat has going for it that the JVM lacks is higher-order functions (quotations). These simplify things tremendously. In fact, so much so that Cat doesn’t have separate stacks and entirely lacks any variable mechanism (unlike Forth and Factor).


    @Barry

    Thanks for the clarification on RPN! I threw in the reference half for coolness’ sake; I really know absolutely nothing about the language.

    With regards to your comments on typing, just wait for Part 3. :-) It is true that conventional stack-based expressions are very easy to type: it just falls out without any effort whatsoever. The first problem is in polymorphic arguments (e.g. pop, swap, dup). That’s easily overcome by adding universal types to the stack typing, but it doesn’t solve every problem. It turns out that higher-order functions are the real issue. If you’re impatient with regards to why and how it is resolved, I suggest you play around with Cat a little bit: it is a statically typed stack-based language with full inference.

    Daniel Spiewak Monday, December 8, 2008 at 9:50 am
  6. How did you highlight the Cat snippets? Does wp_syntax support Cat out of the box? I couldn’t find Cat in the list of languages.

    Jason Noakes Monday, December 8, 2008 at 9:56 am
  7. @Jason

    Nope, Cat isn’t one of the supported languages. I had to write a custom GeSHi mode for it based on the language specification. I’ve had to do that a few times now, including with Scala and SML. Most of the time I just base the mode on jEdit’s highlighting, but even that editor doesn’t have a Cat mode that I know of. :-)

    Daniel Spiewak Monday, December 8, 2008 at 10:07 am
  8. Daniel,

    I believe that programming is about trade-offs. I think that almost every principle, rule, guideline, model, paradigm, you name it, has its pros and cons. Focusing on PL paradigms, one can say (forgive me for the oversimplification) that FP languages are good for writing computations/algorithms but not as natural for handling I/O. Given your experience with Cat, could you highlight its weak points? (its strong points are obvious).

    Itay Maman Monday, December 8, 2008 at 10:42 am
  9. @Itay

    I think Cat’s main weak point is that some algorithms that require random-access variables can be extremely clumsy to express. It’s not terribly hard to think in terms of the stack, but that doesn’t mean that it’s the cleanest way to do things.

    I’m also unsure whether or not Cat has any meaningful integration with the common object system. If not, then it is absolutely not worth looking at for any real work.

    Daniel Spiewak Monday, December 8, 2008 at 11:02 am
  10. Interesting, I’d never heard of these kinds of languages. It took me a while to figure out why:

    2 3 plus

    Mwanji Ezana Tuesday, December 9, 2008 at 6:36 am
  11. As a Scala newcomer, I was a little curious about your syntax. I would have left out the curly braces:

    val plus = (x: Int, y: Int) => x + y
    plus(2, 3)

    Running that through the interpreter seems to work fine. I’m wondering if there is any difference in meaning between the two forms. Was that just for grouping, or does it mean something very different?

    Daniel Yankowsky Wednesday, December 17, 2008 at 1:02 am
  12. @Daniel

    Both forms work just fine. In fact, the Scala parser turns them into the same thing under the surface. I tend to prefer the curly-brace form for stylistic reasons (I think it’s more clear that a closure is involved, rather than just a normal construct). There are times that I use the ()=> form, but not many.

    Daniel Spiewak Wednesday, December 17, 2008 at 9:39 am
  13. If you find you need to be continually aware of the stack, then, plain and simple, “You’re Doing It Wrong.” With RPN-based languages, you should never have any words which accept more than 3 items on the stack. If you do, you almost certainly have to refactor. This aids significantly in program comprehension as a side benefit, so it’s actually GOOD that your awareness of the stack sticks out.

    I’ve had the “luxury” of maintaining Forth programs with poor factoring, and it’s these kinds of programs that result in Forth’s “write-only” conception. This is absurdly false; Forth is no more write-only than Java, which I’ve also had to maintain, particularly programs with six page-long methods. You read that right — SIX PAGES LONG. I’ll take write-only Forth over a six-page method definition in Java any day.

    I’ve also had the abject pleasure of maintaining (and even authoring) Forth code that follows the aforementioned guideline, along with the “one definition, one line” guideline, which resulted in a plethora of positive e-mails suggesting how readable the code was — less than HALF of the respondents were proficient in Forth (many claimed it was their first exposure to the language). So what does this tell us?

    It tells me that people who b***h and moan over RPN syntax and how oh-so-hard it is to “get it”, blah blah blah, really need to wake up and evaluate how they write their own code. Nearly always, these are the kinds of programmers who don’t give a hoot about the quality, readability, or stability of their code, because they feel that their proficiency in the language of their choice and training is such that they can do no wrong. Therefore, RPN languages are useless toys, and can’t possibly be used for grunt programming in real-world situations.

    (Except that UPS and Fedex use digital signature tablets coded in Forth. Power companies use electrical switches coded in Forth. GE released a SONET switch whose OS **IS** Forth — plug in a serial console, and you’ll get the OK prompt! Half of the orbiters around Mars run on Forth. Our space shuttle’s robotic arm and flight nav computers are programmed in Forth. Etc. Etc. Etc.)

    A program written well in a concatenative language will be shorter, easier to understand, and easier to reason about than a comparable program in an applicative language. Because the stack replaces the use of local variables in most cases (in Factor’s entire codebase, for example, locals were used in only 60 places, IIRC), issues concerning aliasing and update order disappear. The explicit DUPs and SWAPs and so forth make register allocation somewhat easier, since the compiler now has unequivocal knowledge of the lifetime of a register’s value. It encourages a more lucid style of programming that isn’t so math-centric (so-called “fluent” style of coding). Ironically, concatenation is simply function composition, and as such can be reasoned about in a functional manner.

    While I respect others’ opinions on concatenative languages, which they’re free to express, I can’t help but feel quite upset and even threatened when people make such absurd notions as the “difficulty” of learning such languages public. Research shows people *naturally* follow a noun-verb order when communicating with others without a common language, and one need only remember just how difficult it was for you to learn algebra during high school to understand the “difficulty of learning” infix notations! Prior to your algebra lessons, learning about the semantics of variable substitution, the difficulties of memorizing the FOIL method and factoring, et al., is something that concatenative notation would have TOTALLY eliminated from the get-go. I’ve seen five year olds do some mighty sophisticated arithmetic using concatenative notations, but few understand how to properly parse the quadratic formula.

    So give me a break with this hard to learn stuff. It’s not hard to learn. What *IS* hard is your inability to break out of your intellectual gutter.

    Samuel A. Falvo II Friday, December 19, 2008 at 1:41 pm
  14. Interesting, I’d never heard of these kinds of languages. Thanks for sharing, I’m smarter now !

    Amir Wednesday, December 29, 2010 at 9:50 am
