Skip to content

Hacking Buildr: Interactive Shell Support

12
Jan
2009

Last week, we looked at the unfortunately-unexplored topic of Scala/Java joint compilation.  Specifically, we saw several different ways in which this functionality may be invoked covering a number of different tools.  Among these tools was Buildr, a fast Ruby-based drop-in replacement for Maven with a penchant for simple configuration.  In the article I mentioned that Buildr doesn’t actually have support for the Scala joint compiler out of the box.  In fact, this feature actually requires the use of a Buildr fork I’ve been using to experiment with different extensions.  Among these extensions is a feature I’ve been wanting from Buildr for a long time: the ability to launch a pre-configured interactive shell.

For those coming from a primarily-Java background, the concept of an interactive shell may seem a bit foreign.  Basically, an interactive shell — or REPL, as it is often called — is a line-by-line language interpreter which allows you to execute snippets of code with immediate result.  This has been a common tool in the hands of dynamic language enthusiasts since the days of LISP, but has only recently found its way into the world of mainstream static languages such as Scala.

interactive-shells.png

One of the most useful applications of these tools is the testing of code (particularly frameworks) before the implementations are fully completed.  For example, when working on my port of Clojure’s PersistentVector, I would often spin up a Scala shell to quickly test one aspect or another of the class.  As a minor productivity plug, JavaRebel is a truly invaluable tool for development of this variety.

The problem with this pattern of work is it requires the interactive shell in question to be pre-configured to include the project’s output directory on the CLASSPATH.  While this isn’t usually so bad, things can get very sticky when you’re working with a project which includes a large number of dependencies.  It isn’t too unreasonable to imagine shell invocations stretching into the tens of lines, just to spin up a “quick and dirty” test tool.

Further complicating affairs is the fact that many projects do away with the notion of fixed dependency paths and simply allow tools like Maven or Buildr to manage the CLASSPATH entirely out of sight.  In order to fire up a Scala shell for a project with any external dependencies, I must first manually read my buildfile, parsing out all of the artifacts in use.  Then I have to grope about in my ~/.m2/repository directory until I find the JARs in question.  Needless to say, the productivity benefits of this technique become extremely suspect after the first or second expedition.

For this reason, I strongly believe that the launch of an interactive shell should be the responsibility of the tool managing the dependencies, rather than that of the developer.  Note that Maven already has some support for shells in conjunction with certain languages (Scala among them), but it is as crude and verbose as the tool itself.  What I really want is to be able to invoke the following command and have the appropriate shell launched with a pre-configured CLASSPATH.  I shouldn’t have to worry about the language of my project, the location of my repository or even if the shell requires extra configuration on my platform.  The idea is that everything should all work auto-magically:

$ buildr shell

This is exactly what the interactive-shell branch of my Buildr fork is designed to accomplish.  Whenever the shell task is invoked, Buildr looked through the current project and attempts to guess the language involved.  This guesswork is required for a number of other features, so Buildr is actually pretty accurate in this area.  If the language in question is Groovy or Scala, then the desired shell is obvious.  Java does not have an integrated shell, which means that the default behavior on a Java project would be to raise an error.

However, the benefits of interactive shells are not limited to just the latest-and-greatest languages.  I often use a Scala shell with Java projects, and for certain things a JRuby shell as well (jirb).  Thus, my interactive shell extension also provides a mechanism to allow users to override the default shell on a per-project basis:

define 'my-project' do
  shell.using :clj
end

With this configuration, regardless of the language used by the compiler for “my-project”, Buildr will launch the Clojure REPL whenever the shell task is invoked.  The currently supported shells and their corresponding Buildr identifiers:

  • Clojure’s REPL — :clj
  • Groovy’s Shell — :groovysh
  • JRuby’s IRB — :jirb
  • Scala’s Shell — :scala

It is also possible to explicitly launch a specific shell.  This is useful for situations where you might want to use the Scala shell for testing some things and the JRuby IRB for quickly prototyping other ideas (I do this a lot).  The command to launch the JIRB shell in the context of my-project would be as follows:

$ buildr my-project:shell:jirb

As a special value-added feature, all of these shells (except for Groovy’s, which is weird) will be automatically configured to use JavaRebel for the project compilation target classes if it can be automatically detected.  This detection is performed by examining REBEL_HOME, JAVA_REBEL, JAVAREBEL and JAVAREBEL_HOME environment variables in order.  If any one of these points to a directory which contains javarebel.jar or points directly to javarebel.jar itself, the configuration is assumed and the respective shell invocation is appropriately modified.

javarebel-integration.png

Best of all, this support is implemented using a highly-extensible framework similar to Buildr’s own Compiler API.  It’s very easy for plugin implementors or even average developers to simply drop-in a new shell provider, perhaps for an internal language or even some unexpected application.  The core functionality of shell detection is integrated into Buildr itself, but this in no way hampers extensibility.  For example, I could easily create a third-party .rake plugin for Buildr which added support for a whole new language (e.g. Haskell).  In this plugin, I could also define a new shell provider which would be the default for projects using that language (e.g. GHCi).

Open Question

The good news is that this feature has been discussed extensively on the buildr-user mailing-list and the prevailing opinion seems to be that it should be folded into the main Buildr distribution.  Exactly what form this will take has yet to be decided.  The bad news is that there is still some dispute about a fundamental aspect of this feature’s operation.

The question revolves around what the exact behavior should be when the shell task is invoked.  Should Buildr detect the project (or sub-project) you are in and automatically configure the shell’s CLASSPATH accordingly?  This would give the interactive shell access to different classes depending on the current working directory.  Alternatively, should there be one all-powerful shell per-buildfile configured at the root level?  This would allow your shell to remain consistent throughout the project, regardless of your current directory.  However, it would also mean that some configuration would be required in order to enable the functionality.  (more details of this debate can be found on the mailing-list).

Additionally, what should the exact syntax be for invoking a specific shell?  Rake 0.8 allows tasks to take parameters enclosed within square brackets.  Thus, the syntax would be something more like the following:

$ buildr collection:shell[jirb]

In some sense, this is more logical since it reflects the fact that a single task, shell, is taking care of the work of invoking stuff.  On the other hand, it’s a little less consistent with the rest of Buildr’s tasks, particularly things like “test:TestClass” and so on.  This too is a matter which has yet to be settled.

All in all, this is a pretty experimental branch which is very open (and desirous) of outside input.  How would you use a feature like this?  Is there anything missing from what I have presented?  What design path should be we take with regards to project-local vs global shell configurations?

If you feel like adding your voice to the chorus, feel free to leave a comment or (better yet) post a reply on the mailing-list thread.  You’re also perfectly free to fork my remote branch at GitHub to better experiment with things yourself.  The root of the whole plate of spaghetti is the lib/buildr/shell.rb file.  Bon appetit!

Higher-Order Fork/Join Operators

22
Sep
2008

I think we can all agree that concurrency is a problem.  Not really a problem as in “lets get rid of it”, but more the type of problem that really smart people spend their entire lives trying to solve.  Over the years, many different solutions have been proposed, some of them low-level, some more abstract.  However, despite their differences, a common thread runs through all of these ideas: each of them attempts to ease the pain of decomposing operations in a reorderable fashion.

Surprisingly, the word “ordering” is not often heard in conjunction with parallelism.  Most of the time, people are thinking in terms of server/client or broker/investor.  If you really deconstruct the issue though, there is actually a deeper question underlying all concurrency: what operations do not depend upon each other in a sequential fashion?  As soon as we identify these critical operations, we’re one step closer to being able to effectively optimize a particular algorithm with respect to asynchronous processing.

By the way, I really will get to fork/join a little later, but I wanted to be sure that I had laid a solid groundwork for the road ahead.  Without understanding some of the fundamental theory behind fork/join, it will be impossible to see how it can be applied effectively to your next project.

Factorial

One of the odd things about computer science is a depressing lack of imaginative examples.  Not being one to break with tradition, I’ve decided to kick off our little quest with a little time spent in the well-trodden foothills of factorial.  This should help us to establish some terminology (which I’m arbitrarily assigning for the purposes of this article) as well as the basic concepts involved.  A simple recursive implementation (in Scala of course) would look like this:

def fac(n: Int): Int = if (n < 1) 1 else fac(n - 1) * n

For each number, this function performs a number of discrete operations.  First, it checks to see if the index is less than 1.  If so, then the function returns immediately, otherwise it proceeds on a separate course.  This is a branching operation.  Since the “true branch” is uninteresting, we will focus on the case when the index is in fact greater than 1.  In this case, we have three critical operations which must be performed.  They are as follows (temporary names are fictitious):

  • Subtract 1 from n and store the value in some temporary $t1
  • Dispatch to function fac passing the value from $t1
  • Multiply result from dispatch with n and return

All this may seem extremely pedantic but please, bear with me.  Consider these operations very carefully in the topological sense.  What we’re trying to see here is if one (or more) of these operations may be ordered above (or below) one of the others.  For example, could we perhaps dispatch to fac after multiplying and returning?  Or could we perform the subtraction operation after the dispatch?

The answer is quite obviously “of course not”.  There is no way we can change the ordering in this expression because each step depends entirely upon the result from the previous.  As far as our attempts to parallelize are concerned, these three operations are completely atomic, meaning that they form an inseparable computation.

image

Since we’ve drilled down as far as we can possibly go in our implementation and so identified the most atomic computation, let’s move out one step and see if we can find anything with promise.  Stepping back through our execution sequence leads us directly to the branching operation identified previously.  Remember that our goal is to identify operations which can be shuffled around in the execution order without affecting the semantics. (does this feel like pipeline optimization to anyone else?)  Unfortunately, here too we are at an impasse.  We might try moving an atomic computation from one of the branches out before the branching operation, but then we could conceivably do the wrong thing.  Since our function uses recursion, this sort of reordering would be very dangerous indeed. 

The truth is that for factorial, there are absolutely no operations which can be moved around without something going wrong.  Because of this property, we are forced to conclude that the entire factorial operation is atomic, not just its false branch.  Unfortunately, this means that there is no way to effectively transform this function into some sort of asynchronous variant.  That’s not to say that you couldn’t calculate factorial of two separate numbers concurrently, but there is no way to modify this implementation of the factorial function in a parallel fashion1.  This is truly the defining factor of atomic computations: it may be possible to reorder a series of atomic computations, but such a reordering cannot affect the internals of these computations.  Within the “atom”, the order is fixed.

So what does reordering have to do with concurrency?  Everything, as it turns out.  In order to implement an asynchronous algorithm, it is necessary to identify the parts of the algorithm which can be executed in parallel.  In order for one computation to be executed concurrently with another, neither must rely upon the other being at any particular stage in its evaluation.  That is to say, in order to execute computation A at the same time as computation B, the ordering of these two computations must be irrelevant.  Providing that both computations complete prior to some computation C (which presumably depends upon the results of A and B), the aggregate semantics of the algorithm should remain unaffected.  You could prove this, but I really don’t feel like it and frankly, I don’t think anyone reading this will care.  :-)

Fibonacci

Now that we have some simple analysis on factorial under our belt, let’s try something a little tougher.  The Fibonacci series is another of those classic computer science examples.  Curiously enough, the implementation used by every known textbook to explain recursion is actually one of the worst possible ways to implement the calculation.  Wikipedia has an excellent description of why this is, but suffice it to say that the intuitive approach is very, very bad (efficiency wise).

However, the “good” implementations used to calculate the nth number of the Fibonacci series just aren’t as concise or easily recognized.  Also, they’re fairly efficient in their own rights and thus see far less benefit from parallelization at the end of the day.  So rather than taking the high road, we’re going to just bull straight ahead and use the first algorithm which comes to mind:

def fib(n: Int): Int = if (n < 2) n else fib(n - 1) + fib(n - 2)

Like factorial, this function makes an excellent poster child for the syntactic wonders of functional programming.  Despite its big-O properties, one cannot help but stop and appreciate the concise beauty of this single line of code.

As is common in simple recursion, our function begins with a conditional.  We have a simple branching operation testing once again for a range (n < 2), with a base case returning n directly.  It is easy to see how the “true branch” is atomic as it consists of but one operation.  We’ve already made a hand-wavy argument that branches themselves should not be dissected and moved around outside of the conditional, so it would seem that our only hope rests with the recursive “false branch”.  In words, we have the following operations:

  • Subtract 1 from n and store the value in temporary $t1
  • Dispatch to function fib passing the value from $t1; store the value in $t1
  • Subtract 2 from n and store the value in temporary $t2
  • Dispatch to function fib passing the value from $t2; store the value in $t2
  • Add values $t1 and $t2 and return

Ah, this looks promising!  We have two “blocks” of operations which look almost identical.  Printed redundancy should always be a red flag to developers, regardless of the form.  Printed redundancy should always be a red flag to developers, regardless of the form.  In this case though, we don’t want to extract the duplicate functionality into a separate function, that would be absurd.  Rather, we need to observe something about these two operations, specifically: they do not depend on one-another.  It doesn’t matter whether or not we have already computed the value of fib(n - 1), we can still go ahead and compute fib(n - 2) and the result will be exactly the same.  We’re going to get into trouble again as soon as we get to the addition operation, but as long as both dispatches occur before the final aggregation of results, we should be in the clear!

image

Because it does not matter in which order these computations occur, we are able to safely parallelize without fear of subtle semantic errors cropping up at unexpected (and of course, unrepeatable) full-board demonstrations.  Armed with the assurance which only comes from heady, unrealistic trivial algorithm analysis, we can start planning our attack.

Threads Considered Insane

Being a good Java developer (despite the fact that we’re using Scala), the very first thing which should come to mind when thinking of concurrency is the concept of a “thread”.  I’m not going to go into any detail as to what threads are or how they work since they really are concurrency 101.  Suffice it to say though that threads are the absolute lowest-level mechanism we could possibly use (at least on this platform).  Here we are, Fibonacci a-la Thread:

def fib(n: Int): Int = {
  if (n < 2) n else {
    var t1 = 0
    var t2 = 0
 
    val thread1 = new Thread {
      override def run() {
        t1 = fib(n - 1)
      }
    }
 
    val thread2 = new Thread {
      override def run() {
        t2 = fib(n - 2)
      }
    }
 
    thread1.start()
    thread2.start()
 
    thread1.join()
    thread2.join()
 
    t1 + t2
  }
}

I can’t even begin to count all of the things that are wrong with this code.  For starters, it’s ugly.  Gone is that attractive one-liner that compelled us to pause and marvel.  In its place we have a 25 line monster with no apparent virtues.  The intent of the algorithm has been completely obscured, lost in a maze of ceremony.  But the worst flaw of all is the fact that this design will actually require (n - 2)! threads.  So to calculate the 10th Fibonacci number, we will need to create, start and destroy 40,320 Thread instances!  That is a truly frightening value.

At first blush, it seems that we can alleviate at least some of the insanity by using a thread pool.  After all, can’t we just reuse some of these threads rather than throwing them away each time?  Unfortunately, this well-intentioned approach doesn’t quite suffice.  It turns out that we can’t really pool very many threads due to the fact that we’re utilizing a thread in fib to recursively call itself and then wait for the result.  Thus, the “parent” dispatch is still holding a resource when the “child” attempts to obtain an allocation.  Granted, we have reduced the number of required threads to a mere 2n - 4, but with a fixed size thread pool (the most common configuration), we’re still going to run into starvation almost immediately.  Apocalisp has a more in-depth article explaining why this is the case.

Something a Little Better…

For the moment, it looks like we have run into an insurmountable obstacle.  Rather than mash our brains out trying to come up with a solution, let’s move on and conceptualize how we might want things to work, at least in syntax.

Clearly threads are not the answer.  A better approach might be to deal with computational values using indirection.  If we could somehow kick off a task asynchronously and then keep a “pointer” to the final result of that task (which has not yet completed), we could later come back to that result and retrieve it, blocking only if necessary.  It just so happens that the Java 5 Concurrency API introduced a series of classes which fulfill this wish:  (what a coincidence!)

def fib(n: Int): Future[Int] = {
  if (n < 2) future(n) else {
    val t1 = future {
      fib(n - 1).get()
    }
 
    val t2 = future {
      fib(n - 2).get()
    }
 
    future {
      t1.get() + t2.get()
    }
  }
}
 
def future[A](f: =>A) = exec.submit(new Callable[A] {
  def call = f
})

I’m assuming that a variable called exec is defined within the enclosing scope and is of type ExecutorService.  The helper method is just syntax, the real essence of the example is what we’re doing with Future.  You’ll notice that this is much shorter than our threaded version.  It still bears a passing resemblance to that horrific creature of yesteryear, but yet remains far enough removed as to be legible.  We still have our issue of thread starvation to content with, but at least the syntax is getting better.

Along those lines, we should begin to notice a pattern emerging from the chaos: in both implementations so far we have started by asynchronously computing two values which are assigned to their respective variables, we then block and then merge the result via addition.  Do you see the commonality?  We start by forking our reorderable computations and finish by joining the results according to some function.  This right here is the very essence of fork/join.  If you understand this one concept, then everything else falls into place.

Now that we have identified a common pattern, we can work to make it more syntactically palatable.  If indeed fork/join is all about merging asynchronous computations based on a given function, then we can invent a bit of syntax sugar which should make the Fibonacci function more concise and more readable.  To differentiate ourselves from Future, we will call our result “Promise” (catchy, ain’t it?).

def fib(n: Int): Promise[Int] = {
  if (n < 2) promise(n) else {
    { (_:Int) + (_:Int) } =<< fib(n - 1) =<< fib(n - 2)
  }
}

At first glance, it seems that all we have done is reduce a formerly-comprehensible series of verbose constructs to a very concise (but unreadable) equivalent.  We can still make out our recursive calls as well as the construction of the base case, but our comprehension stops there.  Perhaps this would be a bit more understandable:

val add = { (a: Int, b: Int) => a + b }
 
def fib(n: Int): Promise[Int] = {
  if (n < 2) promise(n) else {
    add =<< fib(n - 1) =<< fib(n - 2)
  }
}

The only reason to use an anonymous method assigned to a value (add) rather than a first-class method is the Scala compiler treats the two differently in subtle ways.  Technically, I could use a method and arrive at the same semantic outcome, but we would need a little more syntax to make it happen (specifically, an underscore).

This should be starting to make some more sense.  What we have here is a literal expression of fork/join: given a function which can join two integers, fork a concurrent “process” (not in the literal sense) for each argument and reduce.  The final result of the expression is a new instance of Promise.  As with Future, this operation is non-blocking and very fast.  Since the arguments themselves are passed in as instances of Promise, we literally don’t need to wait for anything.  We have now successfully transformed our original fib function into a non-blocking version.  The only thing left is a little bit of syntax to “unwrap” the final result:

val num = fib(20)
num()                  // 6765

Incidentally, the =<< operator was not chosen arbitrarily, its resemblance to the “bind” operator in Haskell is quite intentional.  That is not to say that the operation itself is monadic in any sense of the word, but it does bear a conceptual relation to the idea of “binding multiple operations together”.  The operator is inverted because the bind operation is effectively happening in reverse.  Rather than starting with a monad value and then successively binding with other monads and finally mapping on the tail value (as Scala does it), we are starting with the map function and then working our way “backwards” from the tail to the head (as it were).  None of the monadic laws apply, but this particular concurrency abstraction should tickle the same region of your brain.

An End to Starvation

I half-promised a little while back that we would eventually solve the issue of thread starvation in the implementation.  As mentioned, this particular issue was the central focus of an article on Apocalisp a few weeks back.  For full details, I will again refer you to the original.  In a nutshell though, it looks like this:

  • Operation A dispatches operations B and C, instructing them to send the result back to A once they are finished
  • Operation A releases the thread
  • Operation B executes and sends the result back to A
  • Operation C executes and sends the result back to A
  • Operation A combines the two results and sends the final result back to whoever dispatched it
  • …and so on, recursively

Rather than stopping the world (or at least, our little thread) while we wait for a sub-operation to complete, we just tell it to give us the result as soon as it’s done and we move on.  The whole thing is based around the idea of asynchronous message passing.  The first person to say “actors” gets a gold star.

Every Promise is an actor, capable of evaluating its calculation and sending the result wherever we need it.  The =<< builds up a “partially-applied asynchronous function” based on the original function value we specified (add), binding each Promise in turn to a successive argument (a nice side-benefit of this is compile-time type checking for argument binding).  Once the final argument is bound, a full-fledged Promise emerges with the ability to receive result messages from the argument Promise(s).  Once every value is available, the results are aggregated in a single collection and then passed in order to the function.  The final result is returned and subsequently passed back to any pending actors.  It’s a classic actor pattern actually: don’t block, just tell someone else to call you as soon as they are ready.

With this strategy, it is actually possible to execute the whole shebang in a single thread!  This is because we never actually need to be executing anything in parallel, everything is based on the queuing of messages.  Of course, a single-threaded execution model would completely ruin the entire exercise, so we will just trust that Scala’s actor library will choose the correct size for its internal thread pool and distribute tasks accordingly.

Conclusion

In case you hadn’t already guessed, I’ve actually gone and implemented this idea.  What I have presented here is a bit of a distillation of the “long think” I’ve had regarding this concept and how it could be done.  The only important item that I’ve left out is what Doug Lea calls the “granularity control”.  Basically, it’s a threshold which describes the point at which the benefits of executing a task asynchronously (using fork/join) are outweighed by the overhead involved.  This threshold can be seen in my benchmark of the library.  Performance numbers look something like this (on a dual-core, 2 Ghz 32-bit processor):

Calculate fib(45)
Sequential Time 14682.403901 ms
Parallel Time 7515.882423 ms
Sequential Memory 0.023438 KB
Parallel Memory 3131.548828 KB

For the mathematically challenged, the results show that the parallel execution using Promise was 95.351698% faster than the same operation run sequentially.  That’s almost linear with the number of cores!  Accounting for the overhead imposed by actors, I would expect that the impact on performance would approach linearity as the number of cores increases.

Fork/join isn’t the answer to the worlds concurrency problems, but it certainly is a step in the right direction.  Also, it’s hardly a new approach, but it has generally remained shrouded in a murky cloud of academic trappings and formalisms.  As time progresses and our industry-wide quest for better concurrency becomes all the more urgent, I hope that we will begin to see more experimentation into improving the outward appearance of this powerful design pattern.

Naïve Text Parsing in Scala

26
May
2008

One of the truly incredible things about Scala is that it really inspires people to consider problems that they never would have attempted before.  Recently, the urge came upon me to try my hand at some more advanced text processing.  Not quite so advanced as a full language, but more complicated than can be easily handled by regular expressions.  As usual, Scala proved more than up to the challenge.

All of us have evolved more-or-less ad hoc methods for handling simple text processing.  In this modern age, it’s almost a requirement to be familiar with basic regular expressions, or at the very least split, subString and find.  These techniques tend to work well in small-scale applications, but things become a bit muddled when trying to deal with more complex or conditional representations.  It gets even worse when you must perform some sort of complex resolution on the results of your parsing, requiring you to devise an intermediate form.

One such example of a more complicated text parse would be the C-style printf function.  Java has had this functionality since 1.5 in the form of PrintStream#printf(...) as well as String.format(String, Object...); but unfortunately, Scala lacks this highly useful method.  Oh, it has a printf function, but it doesn’t support C-style syntax for reasons of backwards compatibility.  This caused me no end of grief when I was trying to construct a quine in Scala.  Since Scala has no printf, I decided to try my hand at implementing one (just for kicks).

Finite State Machines

As I said, the ad hoc parsing techniques may serve us well when we’re just trying to split a full name into a firstname/lastname tuple, but I’m afraid that printf requires a more disciplined approach.  Fortunately, there are a number of beautiful formalisms for dealing with text parsing.  Chief among these are deterministic finite state machines.

If you took formal language theory in college, you’ve probably already worked with DFAs (Deterministic Finite Automata), NFAs (Non-deterministic Finite Automata) and PDAs (Pushdown Automata); but since everyone I know slept through that class, I’ll just assume that you did too and go over some of the basics again.  Finite state machines (automata) are at the core of Turing’s seminal thesis on computability.  Actually, the so-called “Turing Machine” is at the core of his work, but DFAs are really just a limited form of this concept.

The ideas behind the acronyms are very simple: a finite state machine is a collection of “states” which have connections to each other which dictate ensuing states or termination of the execution.  The most common representation of a DFA is a directed graph.  The states are represented by the vertices of the graph.  The double-circle indicates an “accepting state”.

 image

This is a simple DFA which has four accepting states (4, 6, 7 and 8).  There is also a loop transition on state 3.  Each of these states represents a different position (or “state”, hence the term) in the parse process.  The idea is that you consume just one character at a time and based on the character value, the automaton “chooses” the correct transition.  It’s all very mindless, very sequential (hence the name “automaton”).

The only problem here is there may not be a transition for every possible character.  For example, starting from state 1, we know how to handle characters a, b, c, d and g, but what happens if we actually get an s or a 7?  By some definitions, this failing would indicate that we have an invalid DFA, something which is obviously bad.  Most representations however allow unsatisfied input and merely have an implicit transition to an accepting error state.  Most common applications make use of this rule (we’ll get to that in a minute).

If you execute the given automaton, you will find that it accepts the following inputs:

  1. ban
  2. aa120m
  3. da1o
  4. gs
  5. ga

…but rejects (or errors) on these alternative sequences:

  1. aaa7
  2. da6mn
  3. gq88

These claims are fairly easy to verify by mentally consuming each character in turn and transitioning to the corresponding state (if any).  Thus, for the first series of inputs, the state sequences will be as follows:

  1. 1, 2, 3, 4
  2. 1, 2, 3, 3, 3, 3, 4
  3. 1, 2, 3, 3, 7
  4. 1, 5, 6
  5. 1, 5, 8

Notice how every parse starts with the initial state (1).  This may seem sort of academic (since the parse information is all encoded in the transitions), but it turns out that without this formalism, many common every-day tasks which we take for granted would be impossible.

If you look closely at my example, you’ll notice that you can very easily encode the same accept/reject information using a regular expression:

[a-d]a[0-9]*([nm]|[a-lo-z])|g([n-z][0-9]?|[abc])

Ok, so maybe that’s not the easiest connection to make, but I think you get the picture.  As it turns out, regular expressions are a direct textual representation of deterministic finite state automata.  In fact, algorithms for executing regular expressions compile the regular expression into a DFA (using various techniques) and then execute this DFA against the input.  It does require an intervening step to convert from a regular expression to a DFA, but it’s not that difficult to do.

The printf Case Study

Now that I’ve managed to lull all of you to sleep, it’s time to get back to more practical matters.  All of this formal theory actually has some very down-to-earth applications, including the algorithms required to implement printf.

C-style printf has a fairly flexible syntax which allows not only simple substitutions, but also type-dependent formatting, padding and truncation.  For example, we can do something like this in Java:

double pi = 3.14159;
System.out.printf("My favorite number: %n%80.2f", pi);

The result looks like this (including all the space):

My favorite number:
                                                                            3.14

There are a number of different conversions available, denoted by the letter trailing the escape - in this case, n and f respectively.  There are also a large number of flags which can be used, the capability to specify the argument index, etc.  Altogether, the context-free grammar for this format looks like this (source):

format ::= '%' index flags width precision conversion

index ::= INTEGER '$' | '<' | ε

flags ::= flags flag | ε
flag ::= '-' | '#' | '+' | ' ' | '0' | ',' | '('

width ::= INTEGER | ε
precision ::= '.' INTEGER | ε

conversion ::= 'b' | 'B'
             | 'h' | 'H'
             | 's' | 'S'
             | 'c' | 'C'
             | 'd'
             | 'o'
             | 'x' | 'X'
             | 'E' | 'E'
             | 'f'
             | 'g' | 'G'
             | 'a' | 'A'
             | ( 't' | 'T' ) date_format
             | '%'
             | 'n'

date_format ::= ...

I have omitted the grammar for date formatting just for the sake of simplicity.  The epsilon (ε) symbolizes the empty string ("").  In case you found the above confusing, this is a (slightly) more human-readable variant:

%[argument_index$][flags][width][.precision]conversion

Essentially, this boils down to the following: A substitution format is escaped by a percent sign (%) followed by an optional index, flags, width and precision, as well as a mandatory conversion indicator.  The date/time conversion is special and takes a series of formatting parameters immediately trailing the conversion.  For the sake of sanity, our parser implementation will ignore this inconvenient fact.

We could take this CFG (context-free grammar) and feed it into a parser generator (such as the one built into Scala) and generate an AST in that way.  However, in this particular instance there is no need.  A cursory glance at the grammar indicates that there is no case wherein the syntax is self-recursive.  A minor exception to this is the flags terminal, but this is really just a way of expressing a repetition in BNF-ish style.  A moment’s reflection will lead us to the (correct) conclusion that because the grammar is non-recursive, it can also be represented as a regular expression - thus, a DFA.  In fact, you can prove this point, but that bit of academic trivia is unimportant for the moment.

What is important is to realize that as this grammar is expressible in the form of a DFA, we can actually write code which parses it without too much trouble.  Parsers can (and have) been written by hand, but usually when the grammar gets complex, the parser reflects this exponentially.  While it is not difficult to write a simple PDA by hand, doing so would be overkill.  So rather than starting with the BNF grammar and creating a literal representation, we will work off of the one-line informal syntax and produce a stackless automaton (the defining feature of a pushdown automaton is the use of a stack to maintain recursive state).

Implementation

As it turns out, all of this gobbly-gook expresses very elegantly in functional languages.  In truth, I could have written the parser in ML, but it is much more fun to use Scala.  We start out by considering how we want the intermediate form to be expressed.  Since we’re not writing a true compiler, we don’t need to worry about serializing this IF into anything persistent; we can rely entirely on memory state.  For a more complicated grammar, we might write classes to represent a tree structure (commonly referred to as an AST).  However, because printf escapes are so straightforward, we can simply generate a token stream.  We will represent this as List[Token] using the following definitions.

sealed abstract class Token
 
case class CharToken(token: Char) extends Token
case class FormatToken(index: Index, flags: Set[Flag.Value],
                                       width: Option[Int], precision: Option[Int],
                                       format: Char) extends Token
 
sealed abstract class Index
 
case class Value(index: Int) extends Index
case class Previous extends Index
case class Default extends Index
 
object Flag extends Enumeration {
  val LEFT_JUSTIFIED,
      ALTERNATE,
      SIGNED, 
      LEADING_SPACE,
      ZERO_PADDED, 
      GROUP_SEPARATED,
      NEGATIVE_PAREN = Value
}
 
val flagMap = {
    import Flag._
 
    Map('-' -> LEFT_JUSTIFIED, '#' -> ALTERNATE, '+' -> SIGNED, 
        ' ' -> LEADING_SPACE, '0' -> ZERO_PADDED, ',' -> GROUP_SEPARATED,
        '(' -> NEGATIVE_PAREN)
  }

Note that Option is insufficient to represent index because of the < escape (use previous index).  Thus, we define a separate series of types with three alternatives: Value, Previous and Default).  This is similar to Option, but more specific to our needs.  Finally, the flags are represented as an enumeration.  Scala doesn’t have language-level support for enumerations, so the syntax ends up being a fair-bit more verbose than the equivalent Java.  It is for this reason that enumerations aren’t used very much in Scala, instead preferring sealed case classes and object modules (to serve as the namespace).

Our parser will have to consume the entire format string, including non-escapes.  The final representation will be an immutable list of Token(s), either CharToken for a single run-of-the-mill character, or FormatToken which will represent the fully-parsed substitution.  Thus, for the printf example given above (my favorite number), the token stream will look something like this:

CharToken('M') :: CharToken('y') :: CharToken(' ') :: /* ... */ :: 
    FormatToken(Default(), Set(), None, None, 'n') :: 
    FormatToken(Default(), Set(), Some(80), Some(2), 'f') :: Nil

For those of you unfamiliar with the cons operator (::), it is just about the most useful functional idiom known to exist, especially in conjunction with pattern matching.  All it does is construct a new linked list with the value on the left as the head and the list to the right as the tail.  Nil is the empty list and thus commonly serves as the tail of a compound cons expression.

To produce this token stream, we will need to write an automaton which consumes each character in the stream and inspects it to see if it marks the beginning of a substitution.  If not, then a CharToken should be generated and put in the list.  However, if the character does mark an escape, then the automaton should transition to a different branch, consuming characters as necessary and walking through the algorithmic representation of our one-line syntax.  It is possible to diagram the necessary automaton, but to do so would be both pointless and unhelpful.  It’s probably easier just to dive into the code:

type Input = ()=>Option[Char]
 
def parse(stream: Input): List[Token] = {
  stream() match {
    case Some('%') => parseIndex1(stream) :: parse(stream)
    case Some(x) => CharToken(x) :: parse(stream)
    case None => Nil
  }
}

Rather than trying to efficiently walk through a proper String instance, it is easier to deal with a single-character stream.  The Input type alias defines a function value which will return Some(x) for the next character x in the string, or None if the end of the string has been reached.  It’s like a type-safe EOF.  We will call this method in the following way:

var index = -1
val tokens = parse(() => {
  index = (index + 1)
  if (index < pattern.length) Some(pattern(index)) else None
})

Our cute use of mutable state (index) makes this code far more concise than it would have been had we attempted to do things functionally.  As it turns out, this is the only place is in our parser where we need to maintain state which is not on the stack.  Because no lookahead is required, we can simply march blindly through the syntax, consuming every character we come across and transitioning to a corresponding state.

The first state of our automaton is represented by the parse(Input) method.  It has a transition to a normal character-consuming state, which in turn transitions back to parse (CharToken(x) cons’d with the recursive evaluation).  Our first state also has a transition to a more complicated state represented by parseIndex1(Input).  This transition takes place whenever we consume a percent (%) character.  What happens next is much easier to explain with code than in words (warning: 90-line snippet):

def parseIndex1(stream: Input) = stream() match {
  case Some(c) => {
    if (flagMap contains c) {
      parseFlags(stream, Default(), Set(flagMap(c)))
    } else if (c == '<') {
      parseFlags(stream, Previous(), Set[Flag.Value]())
    } else if (c == '.') {
      parsePrecision(stream, Default(), Set[Flag.Value](), None, 0)
    } else if (Character.isDigit(c)) {
      parseIndex2(stream, Character.digit(c, 10))
    } else {
      parseFormat(stream, Default(), Set[Flag.Value](), None, None, c)
    }
  }
 
  case None => throw new InvalidFormatException("Unexpected end of parse stream")
}
 
def parseIndex2(stream: Input, value: Int): FormatToken = stream() match {
  case Some(c) => {
    lazy val index = if (value == 0) Default() else Value(value)
    lazy val width = if (value == 0) None else Some(value)
 
    if (c == '.') {
      parsePrecision(stream, Default(), Set[Flag.Value](), width, 0)
    } else if (c == '$') {
      parseFlags(stream, index, Set[Flag.Value]())
    } else if (Character.isDigit(c)) {
      parseIndex2(stream, (value * 10) + Character.digit(c, 10))
    } else {
      parseFormat(stream, Default(), Set[Flag.Value](), width, None, c)
    }
  }
 
  case None => throw new InvalidFormatException("Unexpected end of parse stream")
}
 
def parseFlags(stream: Input, index: Index, flags: Set[Flag.Value]): FormatToken =
  stream() match {
    case Some(c) => {
      if (flagMap contains c) {
        parseFlags(stream, index, flags + flagMap(c))
      } else if (c == '.') {
        parsePrecision(stream, index, flags, None, 0)
      } else if (Character.isDigit(c)) {
        parseWidth(stream, index, flags, Character.digit(c, 10))
      } else {
        parseFormat(stream, index, flags, None, None, c)
      }
    }
 
    case None => throw new InvalidFormatException("Unexpected end of parse stream")
  }
 
def parseWidth(stream: Input, index: Index, flags: Set[Flag.Value], 
               value: Int): FormatToken = stream() match {
  case Some(c) => {
    lazy val width = if (value == 0) None else Some(value)
 
    if (c == '.') {
      parsePrecision(stream, index, flags, width, 0)
    } else if (Character.isDigit(c)) {
      parseWidth(stream, index, flags, (value * 10) + Character.digit(c, 10))
    } else {
      parseFormat(stream, index, flags, width, None, c)
    }
  }
 
  case None => throw new InvalidFormatException("Unexpected end of parse stream")
}
 
def parsePrecision(stream: Input, index: Index, flags: Set[Flag.Value], 
                   width: Option[Int], value: Int): FormatToken = stream() match {
  case Some(c) => {
    lazy val precision = if (value == 0) None else Some(value)
 
    if (Character.isDigit(c)) {
      parsePrecision(stream, index, flags, width, (value * 10) + Character.digit(c, 10))
    } else {
      parseFormat(stream, index, flags, width, precision, c)
    }
  }
 
  case None => throw new InvalidFormatException("Unexpected end of parse stream")
}
 
def parseFormat(stream: Input, index: Index, flags: Set[Flag.Value],
                width: Option[Int], precision: Option[Int], c: Char) = {
  FormatToken(index, flags, width, precision, c)
}

If you can get past the sheer volume of code here, it actually turns out to be pretty simple.  Each method represents a single state.  Some of these states have loop transitions (so as to consume multi-digit precisions, for example), but for the most part, flow travels smoothly from each state to the next.  The transitions are defined by the if/else if/else expressions within each method.  Note that due to the fact that every if statement has a corresponding else, we are allowed to treat them has expressions with a proper value and thus avoid the use of any explicit returns (improving the conciseness of the code).

The final state is represented by the parseFormat(...) method.  This method constructs a FormatToken based on the accumulated values and then returns, unwinding our long and recursive automaton branch all the way back to the parse method, which places our token in the list and moves on.  Simple and to the point.

Tail Recursion

As a side-bonus, it is possible to rewrite the parse method so that it is tail recursive, allowing the Scala compiler to overwrite each stack frame with its successor.  Some of the substitution state methods are already tail recursive, but these loop far less frequently than parse.  In fact, if we don’t write a tail recursive parser, we will be unable to handle large strings due to stack overflow.

The tail recursive form of parse is nowhere near as elegant, but it gets the job done.  Like most tail recursive methods, it makes use of an accumulator which is passed from each call to the next.  So rather than parsing the tokens recursively and then constructing the list as we pop back up the stack, we construct the list as we go and return the completed value at the end.  The only problem with this is that cons prepends elements to the list.  This means that when we finally return the accumulated list, it will be the exact inverse of what we want.  Thus, we must must explicitly reverse the list at the termination of the character stream.  This actually means that the tail recursive form will require more bytecode instructions than the original, but it will execute more efficiently due to the local elimination of the stack (effectively, scalac will collapse the method into a while loop at compile-time).

def parse(stream: Input) = tailParse(stream, Nil)
 
def tailParse(stream: Input, back: List[Token]): List[Token] = {
  stream() match {
    case Some('%') => tailParse(stream, parseIndex1(stream) :: back)
    case Some(x) => tailParse(stream, CharToken(x) :: back)
    case None => back.reverse
  }
}

Conclusion

Hopefully, this has been a thoroughly enjoyable visit to the land of parsing and formal language theory (I had fun anyway).  As usual, Scala proves itself to be an extremely expressive language, capable of representing both the theoretical and the practical with ease.  It almost makes me want to write a more complicated parser by hand, just to see how well Scala handles it.

I’m not entirely sure what I want to do with the result.  As I mentioned, Scala needs an in-language implementation of printf, so maybe I’ll flesh out the implementation some more and submit a patch.  The unfortunate problem with this is printf is more than just a parser.  We can’t just take our token stream and pipe it to stdout, hoping for an epsilon transition.  As it turns out, walking this token stream and formatting the substitutions proves to be a very ugly, very tedious task.  I’ve already implemented most of the core substitution functionality, but a lot of the more complicated stuff remains undone.  If anyone’s interested in the full sources + a BDD specification for printf, just let me know.  :-)

Parsing is a very interesting science with a world of representations and experiences to draw upon.  Even for simple grammars like printf, many lessons can be learned about the fundamentals of computing and just what constitutes a language.  And what better language to use in learning these lessons than Scala?

Update: Public interest seems to be high enough to merit uploading the full project.  You should be able to download using the link below (project managed with Buildr).

JRuby: The Future of Scalable Rails?

14
Jun
2007

So I was talking earlier today with my good friend, Lowell Heddings, regarding certain annoyances we had with web frameworks. The conversation started talking about the difficulties of developing PHP applications due to the lack of a debugger, but (as conversations on web frameworks are wont to do) eventually the migrated to Rails.

I mentioned how I’d always been a bit distrustful of a web framework which ran on the one-request-per-persistent-process model. My reasons for distrusting this sort of framework were mainly related to performance and scalability, but Lowell brought up an interesting point that I hadn’t considered before: process-shared sessions.

See, because Rails runs as a parallel share-nothing process (i.e. the mongrel instances don’t share memory state with one-another), trivial in-memory data can be a bit of a problem. Also, caching to disk can be a bit problematic since there are multiple processes attempting to access the on-disk data simultaneously. I’m sure many clever solutions have been mooted to solve this problem, but (I think) a new one occurred to me as we were discussing the problem. (caveat: I haven’t fleshed out this solution at all with any code. I’m posting it because Lowell thought it was an idea worth sharing) :-)

My solution to the problem drops back into one of the hottest topic in Ruby today: JRuby on Rails. JRuby allows you to run Rails applications in a Java-based and integrated environment, even to the extent of using existing Java tools, libraries and process containers. An ancillary project to JRuby even allows you to package up your Rails application within a WAR and host it directly within a Java application server like Tomcat or Glassfish.

Packaging Rails as a WAR obviously necessitates a bit of configuration that wouldn’t normally go into the deployment of a Rails application. For example, if you want to serve multiple requests with a Rails app concurrently, you would run multiple Mongrel instances and use an Apache mod_proxy configuration which would proxy requests to available server processes. Java web application, while they do run on the persistent-process model, are designed to be multi-threaded, rather than multi-process. Thus, Java web applications automatically scale to involve concurrent requests since all that is required is the spawning of a new thread within the application server.

Rails however, is designed to be hosted as a separate process and would have to be extensively modified to support this kind of scaling directly. The solution found by the JRuby-extras project is to allow multiple Rails instances to be controlled by a single Rails app WAR. The number of instances is controlled by a configuration option within the web.xml config file within the WAR. Thus, the JRuby WAR will spawn a new instance of the JRuby interpreter for each Rails process (as Rails expects), all hosted in separate threads within the same JVM instance, controlled by the Java app server. Thus, instead of going to all the hassle of configuring a new Mongrel instance and adding a mod_proxy rule to scale your Rails app, all that is necessary is to change a value in an XML file and to redeploy.

This single-process encapsulation of Rails in this way allows us to provide a solution for Lowell’s shared data problem. Instead of storing shared data (like application sessions or cached values) within the Rails process itself, the Rails application should use a Java class (hosted within the same WAR) to store the data. Thus, all shared application data should be stored at lower level than Rails, within the Java process itself. Java has some very solid concurrency APIs which would allow this sort of shared state without data corruption.

JRuby-on-Rails Diagram

As the application scaled and the shared data requirements increased, the SharedCache class (as we’ll christen it) could be modified to cluster, using Terracotta. The Rails WAR itself is already transparently clusterable through Java application servers like JBoss. As Lowell put it, it’s like an infinitely scalable memcached, without all the fuss.

Well, it’s a thought anyway…