
Software Transactional Memory in Scala

6 Oct 2008

The fact is that there are a lot of problems that are hard to solve in a purely-functional style.  That’s not to say that no solution exists, but certain problems are very difficult to model without shared state.  In such situations, a slightly different approach to concurrency must be considered.  Actors are inapplicable, seeing as they are the embodiment of “shared-nothing” continuation passing, and fork/join doesn’t really help us.  Usually, when faced with the need for shared mutable state, most developers will resort to the old-fashioned technique of locking everything and controlling access to that state one thread at a time.  The solution often goes something like this:

public class Container {
    private int value;
    private final ReadWriteLock lock = new ReentrantReadWriteLock();
 
    public int getValue() {
        lock.readLock().lock();
        try {
            return value;
        } finally {
            lock.readLock().unlock();
        }
    }
 
    public void setValue(int value) {
        lock.writeLock().lock();
        try {
            this.value = value;
        } finally {
            lock.writeLock().unlock();
        }
    }
}

Obviously, most scenarios which call for such locking are a bit more complex, but you get the picture.  There are two very serious problems with locking:

  • Implementation rules are ad hoc
  • Throughput is reduced

The first issue is obvious: there is nothing actually preventing us from accessing value without locking.  What’s worse, it would be just as easy to lock, access value and then forget to unlock when we were done.  There is no logical connection between value and its lock; the implementation rules are enforced solely by convention.  Of course, we can avoid the problem of forgetting to unlock by using a locking strategy which is lexical in nature, but in that case our second issue (throughput) becomes even more pronounced:

private int value;
private final Object lock = new Object();
 
public int getValue() {
    synchronized (lock) {
        return value;
    }
}
 
public void setValue(int value) {
    synchronized (lock) {
        this.value = value;
    }
}

The problem here is that everything blocks everything else.  If one thread wants to retrieve value, it literally stops the world while the other threads wait their turn.  Data integrity is guaranteed, but this was accomplished only by creating a serial bottleneck on read/write access to value.  On top of that, we still have no enforced correlation between the lock and its corresponding piece of data.
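The lack of any enforced connection between lock and data can be seen directly.  In this hypothetical Scala rendition of the container (a sketch, not code from the article), the compiler happily accepts an access which ignores the lock entirely:

```scala
// sketch: nothing ties `value` to `lock`, so unguarded access still compiles
class Container {
  private val lock = new AnyRef
  private var value = 0

  def getValue: Int = lock.synchronized { value }

  def setValue(v: Int): Unit = lock.synchronized { value = v }

  // oops: no synchronization at all, and the compiler cannot object
  def unsafeBump(): Unit = value += 1
}
```

Turning this convention into a compile-time guarantee is exactly what a reference/transaction design can accomplish.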

“Glass is Half Full” Concurrency

What we have been doing here is called “pessimistic locking”.  We have literally started with the assumption that we are going to run into data contention issues.  Our entire locking mechanism is designed with the worst case scenario in-mind: lots of threads trying to write to the same piece of data simultaneously.  However, not every concurrent system is going to have constant interference between threads.  As it turns out, in practice data contention is the exception rather than the rule.  Given this, it would be very nice if we could design a system which gave threads the “benefit of the doubt”, assuming that they will not conflict, while still somehow maintaining data integrity.

The answer to this is “optimistic locking”.  We turn the problem completely on its head.  Instead of assuming that there will be problems and so locking everything preemptively, we assume that everything will be fine and just allow everyone free, non-blocking access.  Of course, an “open-door policy” with regards to shared mutable state isn’t enough in and of itself; we have to have some rule for dealing with contention issues, and some way of detecting those conflicts when they happen.  Enter transactions…

The idea behind a purely-optimistic transactional memory model is that all write operations must occur within a transaction.  Data can be read any time, but the memory model will ensure that the data is never in an “intermediate state” – or between writes.  Let’s consider the more complicated scenario of the transfer of funds from one bank account to another.  The steps are as follows:

  1. Withdraw $500 from account A
  2. Deposit $500 into account B

Of course, this isn’t really an algorithm, it’s a high-level overview.  We’re really looking at something a bit more complicated than it would seem:

def transfer(amount: Int, a: Account, b: Account) {
  a.balance = a.balance - amount
  b.balance = b.balance + amount
}

I avoided using the += and -= operators so as to illustrate the problem more explicitly.  This method performs four separate operations on shared mutable state (the account balances).  It’s not too difficult to see how this could go terribly wrong in the case where two separate transfers are taking place simultaneously.  For example, we could transfer $500 from account A into account B, while at the same time we transfer $200 from account B into account C.  Remember, we’re not locking anything, so we run the danger that our concurrent execution order will be interleaved in the following fashion:

  1. [Thread1] Get balance of account A
  2. [Thread1] Set balance of account A to its former amount less 500
  3. [Thread1] Get balance of account B
  4. [Thread2] Get balance of account B
  5. [Thread1] Set balance of account B to its former amount plus 500
  6. [Thread2] Set balance of account B to its former amount less 200
  7. [Thread2] Get balance of account C
  8. [Thread2] Set balance of account C to its former amount plus 200

The fourth operation is the one we need to be concerned about.  Thread2 retrieved the balance of account B just prior to when it was modified by Thread1.  This means that when Thread2 calculates the new balance (less 200), it will be basing its result on a now-obsolete balance.  When it sets the balance, the $500 that was transferred from account A will mysteriously vanish, the unfortunate victim of a common race condition.

In a transactional system, both transfers would be separately handled in their own transaction.  Neither of them would modify the other’s data until all operations are completed, at which point the transaction would commit and the data would become “live”.  Thus, when Thread1 adds $500 to account B, the actual balance of account B will remain constant at its original value (prior to the transaction).  Once the transaction commits, both the balance of A and B will be updated essentially simultaneously.  There will never be a point where $500 is discovered missing “in transit” between two accounts.

This is only half the equation though.  The real genius of the transactional model is that just prior to committing, a transaction validates itself, ensuring that all of the data it worked with is still in the same state it was at the beginning of the transaction.  If something has changed, then the transaction is invalidated and must be automatically re-run.  Our issue from above can never happen because the Thread2 transaction will attempt to validate itself, only to discover that the balance of account B has been changed in the meantime.  Rather than yield to the race condition, the Thread2 transaction will throw away its work and start from scratch.  Assuming that nothing else is running, the second validation will be successful (since nothing will have changed concurrently) and the transaction will commit.
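The validate-and-retry loop can be sketched in miniature for a single cell using compare-and-set.  This is not the article’s STM, just the same optimistic principle applied to one value (all names here are hypothetical):

```scala
import java.util.concurrent.atomic.AtomicReference

// immutable wrapper: CAS compares references, so we never mutate in place
final case class Balance(amount: Int)

// optimistic update: snapshot, compute, then commit only if the cell still
// holds the exact snapshot we read; otherwise throw the work away and retry
def optimisticUpdate[T](ref: AtomicReference[T])(f: T => T): T = {
  var result = ref.get
  var committed = false
  while (!committed) {
    val snapshot = ref.get                           // "begin transaction"
    result = f(snapshot)                             // work against the snapshot
    committed = ref.compareAndSet(snapshot, result)  // "validate and commit"
  }
  result
}

val accountB = new AtomicReference(Balance(200))
optimisticUpdate(accountB)(b => Balance(b.amount + 500))  // conflicts simply retry
```

Under contention, a concurrent writer changes the cell between the snapshot and the compare-and-set, the CAS fails, and the loop recomputes against fresh data, which is precisely the retry behavior described above.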

It all sounds much more complicated than it actually is.  People familiar with modern databases like Oracle should already be comfortable working with optimistic transactional models.  The technique is a little less common in software, but it can still be applied.

In order to make this work in a conventional application setting, we need to introduce a few more abstractions.  There are several ways to go about this, but I have chosen to follow the model laid down by Rich Hickey in his implementation of Clojure’s STM.  In turn, Clojure seems to take a fair bit of inspiration from Haskell’s STM monad, although it does not port over concepts like transaction composition and what Simon Peyton Jones calls “choice”.  Basically, the design can be distilled as follows:

  • Each item of shared state must be stored in a reference
  • References can be read at any point, but they can only be modified within a transaction
  • The data contained within a reference must itself be immutable; the reference simply allows you to switch the data it contains
  • Within a transaction, reading a reference does not return its current value, but rather the value it had precisely when the transaction began.  This allows data to change outside the transaction without disrupting its internal processing.
  • Changes made to a reference inside a transaction are not world-visible until the transaction completes and is committed.  Validation ensures that no data is lost during the commit.
  • Transactions must never contain side-effects as they may be executed multiple times

With all that in mind, let’s get to work on an implementation!

References

Since these are the primitive building blocks of our STM, it seems logical that we should start here.  At an extremely basic level, we will need a structure which looks something like the following:

class Ref[T](private var value: T) {
  def get = value
 
  def :=(value: T) {
    this.value = value
  }
}

This is the basic idea anyway.  We want to be able to retrieve a value from a reference using get, and we want to be able to store a value in the reference using the := operator (hat tip to all of you rabid Pascal fan-boys).  Unfortunately, we haven’t really accomplished anything here.  Yes, we have a reference to wrap around a mutable piece of data, but there are no associated concurrency semantics.  Remember the definition of a reference?  We must only be able to write to it within a transaction.  Furthermore, the value returned from a reference within a transaction is not necessarily its current, world-visible state, but rather the data it had the moment the transaction began.

In order to accommodate these requirements, we will introduce the concept of a context.  Both read and write access to a reference will require a context to be present.  We will have one context for the entirety of world-visible state.  Additionally, each transaction will have its own context.  In that way, we can ensure that transaction modifications are kept local to itself prior to commit while at the same time preventing changes from other transactions from becoming visible after the transaction has started (potentially leading to data integrity problems).  Our API has now evolved to something more like the following:

class Ref[T](private var value: T) {
  // ???
 
  def get(c: Context) = c.retrieve(this)
 
  def :=(value: T)(c: Context) {
    c.store(this)(value)
  }
}

I left the constructor undefined because of the fairly obvious problem in this implementation: how does the Context get the values in the first place?  For the moment, let’s put that problem aside and deal with a more interesting one: limiting write access to within transactions.  Recall that we’re going to have two different kinds of contexts: a live context which is global, as well as a context which is local to the transaction.  The requirement alone implies a way to statically restrict reference mutability to within transactions: require a type of context other than the live one.  To this end, we will derive the following closed hierarchy.  It is closed because there will be no other Context implementations, preventing over-zealous API extensions from fouling up our semantics.

(Diagram: the closed Context hierarchy, with Context at the root and LiveContext and Transaction as its only subtypes.)

We will define Context to be an abstract class and LiveContext to be a singleton object.  Each transaction will have its own Transaction context which it will use in both read and write operations.  LiveContext will only be used outside of a transaction, when there is no other context available.  To enforce this, we will restrict the type of the Context taken by the reference assignment operator to only accept Transaction:

def :=(value: T)(c: Transaction) {
  c.store(this)(value)
}

With this in mind, we can start to envision what the syntax for reference operations will look like:

def transfer(amount: Int, a: Ref[Int], b: Ref[Int])(c: Transaction) {
  a.:=(a.get(c) - amount)(c)
  b.:=(b.get(c) + amount)(c)
}
 
// ...
val accountA = new Ref[Int](1500)
val accountB = new Ref[Int](200)
 
// somehow call `transfer` as a transaction
println("Account A: " + accountA.get(LiveContext))
println("Account B: " + accountB.get(LiveContext))

And just when our design was looking so nice too.  We’ve succeeded in preventing at compile time the modification of references outside of transactions, but we did it at the cost of a tyrannical syntax.  Fortunately, Scala has a handy mechanism for cleaning up syntax such as this, one which should reduce the volume of annoying bulk by several orders of magnitude: implicits.
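Before applying them to the STM, here is a toy sketch of how implicit resolution behaves (all names hypothetical, nothing to do with the STM classes):

```scala
// toy illustration of implicit parameters
case class Greeting(text: String)
implicit val default: Greeting = Greeting("hello")

def greet(name: String)(implicit g: Greeting): String = g.text + ", " + name

val a = greet("world")                    // compiler supplies `default`
val b = greet("world")(Greeting("hi"))    // explicit argument still allowed
```

The caller can always pass the implicit parameter explicitly; omitting it simply asks the compiler to find a matching implicit value in scope.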

Scala makes it possible to mark parameters as accepting implicit values.  These parameters in turn become implicit values themselves.  When a method which accepts an implicit parameter of a specific type is called with an implicit value of the same type in scope, the parameter can be omitted entirely.  Thus, by marking LiveContext as an implicit object and appropriately annotating the last parameter of transfer as well as the Context parameters of Ref’s accessor and mutator, we can eliminate almost all of the annoying bulk in the above example:

implicit object LiveContext extends Context {
  ...
}
 
class Ref[T](value: T) {
  // ???
 
  def get(implicit c: Context) = c.retrieve(this)
 
  def :=(value: T)(implicit c: Transaction) {
    c.store(this)(value)
  }
}

With these extra modifiers in place, we can redo our transfer snippet to see how things look.  To cut down on line length, we will also assume that accountA and accountB are references in some sort of global scope:

val accountA = new Ref[Int](1500)
val accountB = new Ref[Int](200)
 
def transfer(amount: Int)(implicit t: Transaction) {
  accountA := accountA.get - amount
  accountB := accountB.get + amount
}
 
// somehow call `transfer` as a transaction
println("Account A: " + accountA.get)
println("Account B: " + accountB.get)

Pretty slick, and we aren’t even finished yet!  We can add an implicit conversion from Ref[T] to T, eliminating the need to call get anywhere in the code:

implicit def refToValue[T](ref: Ref[T])(implicit c: Context) = {
  ref.get(c)
}

…and the final example syntax:

val accountA = new Ref[Int](1500)
val accountB = new Ref[Int](200)
 
def transfer(amount: Int)(implicit t: Transaction) {
  accountA := accountA - amount
  accountB := accountB + amount
}
 
// somehow call `transfer` as a transaction
println("Account A: " + accountA)
println("Account B: " + accountB)

With the exception of the := syntax (which is unfortunately unavoidable), you would never be able to tell that references are being used rather than conventional vars.  Even better, we have managed to preserve our static assurance that no reference may be modified outside of a transaction.  If we were to attempt to call the := method without an instance of Transaction on-hand, the Scala type checker would complain and our code would not compile.  All we need to do to close the loop is to make sure that the only place a Transaction instance can be obtained is inside a transaction (seems logical).

Atomic

For the moment, let’s shelve all of the implementation requirements for the STM and instead focus on the API.  We already have an almost elegant syntax for references, but we’re still missing one final piece: initiating the transaction.  Being incredibly imaginative and innately creative, I decided the best way to devise an API for this would be to hit Wikipedia.  After all, why come up with something original when someone smarter has already solved the problem?

 // Insert a node into a doubly-linked list atomically
 atomic {
     newNode->prev = node;
     newNode->next = node->next;
     node->next->prev = newNode;
     node->next = newNode;
 }

 atomic (queueSize > 0) {
     remove item from queue and use it
 }

Credit: Software transactional memory # Proposed language support

It’s a fairly straightforward API, one which could be expressed in Scala as it stands if we really wanted to.  Obviously the body of the first transaction is a little closer to C than Scala, but we can just take that as example pseudo-code and move on.  The real core of the API is a single method, atomic, which takes a function value and executes that function as a transaction.  If a conditional is provided, it acts as a guard.  If the guard ever returns false, the transaction aborts and is not retried.

The only problem here is we haven’t accounted for our implicit Transaction parameter.  Intuitively, we could just add a parameter of type Transaction to the function value, but unfortunately Scala doesn’t allow anonymous functions with implicit parameters.  That leaves us with two options: either take the Transaction as an explicit parameter to the anonymous function and then store it in an implicit value within the function; or pass an actual method which takes an implicit parameter.  In truth, either approach will work, but for the remainder of the article I will use the method-passing approach rather than the separate assignment within the anonymous function.

Altogether, our API in action looks something like this:

val accountA = new Ref[Int](1500)
val accountB = new Ref[Int](200)
 
def transfer(amount: Int)(implicit t: Transaction) {
  accountA := accountA - amount
  accountB := accountB + amount
}
 
atomic(transfer(500)(_))    // run transaction
 
println("Account A: " + accountA)
println("Account B: " + accountB)

Notice the extra (_) syntax in the call to atomic.  (thanks, Jesper!)  This is required because transfer accepts an implicit parameter.  Without it, Scala doesn’t know whether we mean to call the method using some implicit value in scope or if we want the function value itself.
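The same disambiguation can be seen in a standalone sketch (hypothetical names):

```scala
// a higher-order method expecting a plain function value
def run(f: Int => Int): Int = f(10)

implicit val bonus: Int = 5

// a method with a trailing implicit parameter list
def addBonus(x: Int)(implicit b: Int): Int = x + b

val direct = addBonus(1)        // implicit applied immediately: 1 + 5
val lifted = run(addBonus(_))   // (_) requests the function value; the
                                // implicit is resolved inside the lambda
```

Without the explicit `(_)`, the compiler would try to fully apply `addBonus` using the implicit in scope rather than eta-expanding it into a function value.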

Remember this API.  We’ll come back to the implementation of this method after we have completed the rest of the framework.  For now, let’s move onto context…

Context

Returning to the dark and mysterious internals of the implementation, we now come to the deep morass of Context.  As it turns out, everything in the STM will revolve around this class and its two separate implementations.  Recall that it is responsible for retrieving reference values, controlling what data is visible from within a transaction and what data is live.  Generally speaking, we are going to have the following design with respect to where reference data is handled:

  • If in LiveContext, access the One True copy of the data.
  • If in Transaction, access the One True copy if and only if there is no transaction-local version.  Once the reference has been read, cache the data and return that value from all future reads within the transaction.  We’re cheating a bit here since we aren’t taking a snapshot of the world on transaction start, we’re waiting for the first reference read.  As long as we get our validation right, this should be ok.

Logically, this is a simple default operation clouded by a fairly substantive special case (within a transaction after the reference has been read or written once).  To handle the special case, it only makes sense that we use a map from reference to the data to which it corresponds in that transaction.  Remember that the data within a transaction may be a bit behind the live copy.

We could use another map for the general case, but we can make things even simpler than that.  Rather than having a global map from reference to value, we can just store the live values within the Ref objects themselves.  They will still need to delegate to their context to retrieve and store that value, but the actual live version can be kept locally.  This simplifies LiveContext tremendously:

class Ref[T](private[stm] var value: T) {
  def get(implicit c: Context) = c.retrieve(this)
 
  def :=(value: T)(implicit t: Transaction) {
    t.store(this)(value)
  }
}
 
// ...
sealed abstract class Context {
  def retrieve[T](ref: Ref[T]): T
}
 
implicit object LiveContext extends Context {
  def retrieve[T](ref: Ref[T]) = ref.value
}

In this iteration, we have simplified Context a little bit, seeing as it is never actually used for storage.  Notice that all we need to do in LiveContext is delegate right back to the reference.  This may seem like a bit of superfluous indirection but it becomes absolutely essential once we start considering Transaction:

import collection._
 
final class Transaction private[stm] () extends Context {
  private val world = mutable.Map[Ref[Any], Any]()
 
  def retrieve[T](ref: Ref[T]) = {
    val castRef = ref.asInstanceOf[Ref[Any]]
 
    if (!world.contains(castRef)) {
      world(castRef) = ref.value
    }
 
    world(castRef).asInstanceOf[T]
  }
 
  def store[T](ref: Ref[T])(value: T) {
    val castRef = ref.asInstanceOf[Ref[Any]]
    world(castRef) = value
  }
}

This is the reason we needed to redirect through retrieve rather than just grabbing the value directly in Ref.  Within a transaction, any dereferencing will polymorphically come to an instance of this class.  The mutable map, world, handles the transaction-local cache of all values once they have been accessed.  Thus, the reference can change after we have looked at it (when another transaction commits) and it doesn’t affect the values local to our transaction.  This technique is exceedingly powerful and in no small part responsible for the higher throughput made possible by the transactional model.
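The transaction-local caching can be demonstrated in isolation with a toy sketch (hypothetical names, not the article’s classes): the first read snapshots the live value, and later reads ignore subsequent live changes, just as world does above.

```scala
import scala.collection.mutable

// toy version of the `world` cache: the first read copies the live value,
// and later reads return the snapshot even if the live copy has moved on
class SnapshotCache {
  private val cache = mutable.Map[String, Int]()
  def read(key: String)(live: => Int): Int =
    cache.getOrElseUpdate(key, live)
}

val snap = new SnapshotCache
var liveValue = 10
val first = snap.read("a")(liveValue)    // caches 10
liveValue = 99                           // the live copy changes...
val second = snap.read("a")(liveValue)   // ...but we still see 10
```

This is exactly why a transaction is insulated from concurrent commits: once it has seen a reference, it keeps seeing its own copy until it tries to commit.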

Incidentally, it is worth noting that within a transaction, references are not thread safe.  Thus, if you start a transaction and then start manipulating references concurrently within that same transaction, bad things will happen.  This isn’t really a problem though because transactions are always designed to be single-threaded from start to finish.  They are used in multi-threaded situations; they do not use multiple threads.

Commitment

Now that we have the basic nuts and bolts of our STM framework, we need to start considering how we are going to commit transactions.  The process is twofold: first, we validate all references either read or written to by the transaction, checking for anything which may have changed in the interim; and second, we copy all new data from our transaction-local cache into the live references.  For the sake of simplicity, we will only allow one transaction to commit at a time.  We could do a little better, but this should work just fine for the experimental stuff.

Validation is a toughy, but the second commit step is fairly easy to satisfy: just loop through a set of all writes and copy the changes into the corresponding Ref.  The necessary changes are as follows:

final class Transaction private[stm] () extends Context {
  private val world = mutable.Map[Ref[Any], Any]()
  private val writes = mutable.Set[Ref[Any]]()
 
  ...
 
  def store[T](ref: Ref[T])(v: T) {
    val castRef = ref.asInstanceOf[Ref[Any]]
 
    world(castRef) = v
    writes += castRef
  }
 
  def commit() = {
    CommitLock.synchronized {
      // TODO  validate
 
      for (ref <- writes) {
        ref.value = world(ref)
      }
    }
 
    true
  }
}
 
private[stm] object CommitLock

The commit method returns a Boolean value indicating whether or not the commit was successful.  If a conflict was detected, the method returns false and (presumably) the transaction will be tried again.  Since we aren’t doing any validation just yet, we will always return true.  Note that we don’t really have to lock anything related to the Ref instances in order to write their data.  It is possible for another thread to be reading data from these same references at precisely the same time as we are writing to them.  However, individual reference assignments are atomic on the JVM, meaning that we don’t need to worry about data integrity on the level of a single reference value.

Validation is actually the most important part of the transaction commit process and quite possibly the most important facet of the entire STM concept.  Without it, there is nothing to prevent data integrity from breaking down and causing problems.  (remember that $500 we lost?)  Unfortunately, our system doesn’t quite have the chops yet to support any sort of transaction validation.

In order to validate a Ref, we need to compare its state to the state it was in at the moment we read from it or wrote to it (meaning into the transaction-local cache).  We can’t just compare values, partially because equals isn’t fast enough, but also because it doesn’t provide a strong enough guarantee about whether or not data has changed.  What we need is a value, under our control, which deterministically indicates the current state of a Ref and which can be used later to determine whether that state has changed.  In short, we need a revision number:

class Ref[T](value: T) {
  private[stm] var contents = (value, 0)    // init revision to 0
}

The revision numbers have to be generated globally and in a thread-safe fashion, so the best place for them is the Transaction singleton.  Transaction (the companion object for Transaction the Context) also contains our implicit conversion as well as that enigmatic atomic method that we still have yet to implement.

object Transaction {
  private var rev_ = 1
  private val revLock = new AnyRef
 
  private def rev = revLock.synchronized {
    val back = rev_
    rev_ += 1
    back
  }
 
  ...
}
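As an aside, the same unique-revision guarantee could be had without a lock, using an atomic counter from java.util.concurrent (an alternative sketch, not what the article uses):

```scala
import java.util.concurrent.atomic.AtomicInteger

// lock-free alternative: getAndIncrement is atomic, so every caller
// still receives a distinct, monotonically-increasing revision
object Revision {
  private val counter = new AtomicInteger(1)
  def next(): Int = counter.getAndIncrement()
}
```

Either way, the only property we actually rely on is that no two transactions ever receive the same revision number.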

Empowered by a revision increment system which is guaranteed to produce unique values for each invocation, we can expand upon our concept just a little bit.  Each Transaction will have a unique revision number associated with it (maintained within the rev field).  Assuming a transaction successfully commits, it will not only modify the value of the references in question but also their revision, which it will set to its own number.

This revision system can be used in validation of transaction commit.  Whenever we read or write to a Ref for the first time, we will store its current revision number within the transaction.  When it comes time to commit the transaction, we can loop over our revision map and compare with the actual revision of the reference in question.  If all of the expected revisions match up with reality, the transaction checks out and we can go ahead and commit.  Otherwise, we have to assume that a transaction operating concurrently modified a reference we used and committed after we started our own transaction.  Once this is known, we can’t simply commit over the other transaction’s changes, throwing away all of that money.  Our transaction must be retried from scratch.

Now that we know how to validate, we can finally look at a completed version of commit (and supporting cast):

final class Transaction private[stm] (val rev: Int) extends Context {
  private val world = mutable.Map[Ref[Any], Any]()
  private val writes = mutable.Set[Ref[Any]]()
  private val version = mutable.Map[Ref[Any], Int]()
 
 
  def retrieve[T](ref: Ref[T]) = {
    ...
 
    if (!world.contains(castRef)) {
      ...
 
      if (!version.contains(castRef)) {
        version(castRef) = castRef.contents._2
      }
    }
 
    ...
  }
 
  def store[T](ref: Ref[T])(v: T) {
    ...
 
    if (!version.contains(castRef)) {
      version(castRef) = ref.contents._2
    }
 
    ...
  }
 
  def commit() = {
    CommitLock.synchronized {
      val back = world.foldLeft(true) { (success, tuple) =>
        val (ref, _) = tuple
        success && ref.contents._2 == version(ref)
      }
 
      if (back) {
        for (ref <- writes) {
          ref.contents = (world(ref), rev)
        }
      }
 
      back
    }
  }
}

It’s a lot of code, but all fairly straightforward.  The validation simply bears out our intuition: comparing the revisions initially retrieved from the references against their current values.  Here again we are making use of the fact that memory access is atomic.  We don’t need to worry about a revision changing out of sync with a value because both of them are encapsulated by a 2-tuple within the Ref itself.  Meanwhile, the validation can be trusted because of the CommitLock: we don’t need to worry about another transaction committing between our validation and when we actually get around to saving our values.

Atomic 2.0

I said we would come back to this, and here we are!  We never did implement the atomic method, which is a bit of a shame seeing as it is what is responsible for kicking off the entire transactional process.  Not only that, but it creates the Transaction instance, ensures that the transaction gets committed once it has finished, and retries if that commit fails.  Set in code, it looks something like this:

def atomic[A](f: (Transaction)=>A): A = atomic(true)(f)
 
def atomic[A](cond: =>Boolean)(f: (Transaction)=>A) = {
  def attemptTransact(): A = {
    if (cond) {
      val trans = new Transaction(rev)
 
      try {
        val result = f(trans)
 
        if (trans.commit()) result else attemptTransact()
      } catch {
        case _ => attemptTransact()    // if exception, assume conflict and retry
      }
    } else null.asInstanceOf[A]
  }
 
  attemptTransact()
}

The only weird thing here is the use of an internal function to control the transaction dispatch process.  This is necessary because we need to emulate the pattern of a do/while loop without losing the ability to capture a return value.  This is the one minor feature of the transaction API that we have designed which I haven’t already discussed: the ability to return a value from a transaction.  Practically speaking, this isn’t needed too often since the very purpose of a transaction is to modify references, but it is still a pattern worth keeping in hand.

You will notice that we have some generic catch-all exception handling going on here.  Whenever a transaction throws an exception, we assume that it has failed and we try again.  To be honest, I wrestled back and forth with this decision.  After all, if a transaction comes across a NullPointerException on its first try, it’s not likely to do any better the second time around, or the third, or the fourth, or the…  On the other hand, there is a remote but very real possibility that data can briefly get into an inconsistent state within a transaction.  To understand how, consider the following abstract scenario:

  1. Transaction A writes data to reference A and reference B
  2. Transaction B starts after transaction A but before it has committed
  3. Transaction B reads from reference A
  4. Transaction A commits
  5. Transaction B reads from reference B
  6. Transaction B does the funky chicken and tries to commit
  7. Transaction B fails validation and has to try again
  8. Everything works fine on the second time around

Everything gets straightened out in the end due to the validation, but there is still this hairy moment within transaction B that we have to worry about.  Within the transaction, we have some data from before transaction A committed, and some from afterward.  Validation is never going to let this slip by, but while the transaction is still executing we can run into some fairly serious issues.

Imagine for example that reference B is changed in such a way that it throws an exception in transaction B unless it is paired with just the right value of reference A.  Since we have an inconsistent pairing between references A and B within transaction B, we will get an exception caused directly by a breakdown in data integrity.  Because this is a data integrity issue, we want this exception to trigger a redo in the transaction.  However, we can’t really predict such exceptions, so there’s no way we can distinguish between a legitimate exception (which should be propagated outside the transaction) and a data integrity fault.  In short, we’re in trouble.

Given the way we designed the STM framework, I don’t really see a nice, efficient way to avoid this problem.  There are other approaches we could take which don’t have this issue, but that would either require an entirely different implementation or a much smarter developer taking the lead.  I would be interested to see how Clojure handles this case…

A Less-Trivial Example

Now that we have this full STM framework, it might be interesting to put it to work.  To that end, let’s create a simple market simulation with three businesses and a hundred potential customers, each with their own accounts and separate balances.  All of these entities will have their own thread operating on their behalf, making purchases, accepting refunds and keeping the Mafia off their back.  Bear in mind that this is a truly concurrent simulation with full thread semantics.  We’re not going to use actors or anything like that to reduce the number of threads involved; everything will operate in true parallel.

To make things a little more interesting, we will also associate a 7.5% fee with each transfer, paid to a separate account.  Also, each transfer will be logged, presumably for later review.  To make everything fun, we will have one final thread which monitors the entire market, summing up every account and checking the total.  The obvious concern is that some collision in access of the shared data will lead to the unexpected loss (or gain) of wealth.  So long as the total market value remains constant, we can assume that shared state is being handled appropriately.
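
Before writing any threading code, it is worth convincing ourselves that the transfer arithmetic really is conservative.  A single-threaded Ruby sketch (hypothetical account hash; the real accounts are Refs) shows that the full amount leaves the sender while (amount - fee) + fee arrives elsewhere, so the grand total cannot drift:

```ruby
# Hypothetical single-threaded sketch of the transfer arithmetic.
# Every cent debited from `from` lands either in `to` or in :fees,
# so the sum over all accounts is invariant.
def transfer(accounts, amount, from, to)
  fee = (amount * 0.075).round
  accounts[from]  -= amount
  accounts[to]    += amount - fee
  accounts[:fees] += fee
end

accounts = { :a => 1000, :b => 1000, :fees => 0 }
before = accounts.values.inject(0) { |sum, v| sum + v }

transfer(accounts, 100, :a, :b)

after = accounts.values.inject(0) { |sum, v| sum + v }
before == after    # => true: total market value is conserved
```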

Let’s start out by defining our accounts and basic structure.  Bear in mind that Account is a type alias (similar to a C-style typedef) for Ref[Long]; it is not a separate class.

object BankFrenzy {
  import Transaction._
 
  type Account = Ref[Long]
 
  private val fees = new Account
  private val log: Ref[Vector[Transfer]] = new Ref(EmptyVector)
 
  def main(args: Array[String]) {
    val business1 = new Account(15000)
    val business2 = new Account(20000)
    val business3 = new Account(50000)
 
    val people = (0 until 100).foldLeft(Vector[Account]()) { (vec, i) =>
      vec + new Account(1000)
    }
  }
}

Remember that all data contained within a reference must be completely immutable, otherwise the STM framework cannot help you.  If you can just change the underlying data within a reference at will, then transaction semantics are useless (since revision tracking breaks down).  To that end, we will use my port of Clojure’s persistent vector to handle the transfer log; and just because it’s fun, we will also use it to manage personal accounts.

Moving on, we should probably define some functions to operate on these accounts.  Remember, they must take an implicit parameter of type Transaction; otherwise they will be prevented from modifying references.

def transfer(amount: Long, from: Account, to: Account)(implicit t: Transaction) {
  log := log + Transfer(amount, from, to)
 
  val less = Math.round(amount * 0.075)
 
  from := from - amount
  to := to + (amount - less)
  fees := fees + less
}
 
def sum(portfolio: Vector[Account])(implicit t: Transaction) = {
  portfolio.foldRight(0:Long) { _ + _ }
}

One thing worthy of attention here is the sum method: we aren’t actually modifying any references in this method, so why bother putting it within a transaction?  The answer is to enforce data integrity.  We want to make sure that we see a truly consistent picture of the entire market, and the only way to be absolutely sure of that is to use the conceptual “snapshot of the world” maintained by a transactional log.

Also notice that the sum method does not store its result within a reference; it actually returns a value.  This is one of the neat features of our STM implementation: it allows transactions to return values just like functions.  This dramatically reduces the boilerplate which would normally be required to get the result of a calculation from within a transaction.
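
The control flow behind a value-returning atomic is easy to sketch in plain Ruby (ConflictError here is a hypothetical stand-in for a failed commit-time validation, not part of the framework): the block's result simply becomes the result of atomic, with the retry loop wrapped around it.

```ruby
class ConflictError < StandardError; end    # hypothetical validation failure

# A value-returning atomic: whatever the block evaluates to is
# returned to the caller, and conflicts trigger a bounded retry.
def atomic(max_retries = 100)
  attempts = 0
  begin
    yield
  rescue ConflictError
    attempts += 1
    retry if attempts < max_retries
    raise
  end
end

total = atomic { 40 + 2 }    # => 42, no reference needed for the result
```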

With this infrastructure in place, we can go ahead and create all of the threads we’re going to need for the simulation:

val market = people + business1 + business2 + business3 + fees
var running = true
 
val secActor = thread {
  while (running) {
    val total = atomic(sum(market)(_))
    println("Market value: $" + total)
 
    sleep(10)
  }
}
 
val businessActor = thread {
  while (running) {
    atomic(transfer(250, business1, business2)(_))    // transfer rent
 
    sleep(200)
  }
}
 
val peopleActors = for {
  i <- 0 until people.length
  val p = people(i)
} yield thread {
  atomic(transfer(50, p, business3)(_))       // payoff the mob
  atomic(transfer(i * 10, p, business1)(_))   // purchase from business1
  atomic(transfer(i * 3, business2, p)(_))    // refund from business2
}

This is assuming that we have already defined a utility method, thread, in the following fashion:

def thread(f: =>Unit) = new Thread {
  override def run() {
    f
  }
}

Just one of the many little tricks made possible by the staggering power of Scala’s syntax.

There isn’t much worthy of attention within our simulation threads; it’s just a lot of concurrent operations running against some shared state.  If we wanted to really have some fun, we could add println status messages to each thread, allowing us to try the simulation multiple times and watch the thread interleaving change from run to run.  However, all we’re really interested in with this simulation is the assurance that data integrity is maintained at all times.  To see that, all we really need is to check the starting market value and the ending value, and to perform some periodic market auditing while the simulation is in progress:

println("Starting market value: $" + atomic(sum(market)(_)))
 
businessActor.start()
secActor.start()
 
for (pa <- peopleActors) pa.start()
for (pa <- peopleActors) pa.join()
running = false
 
businessActor.join()
secActor.join()
 
println("Total fees: $" + fees)
println("Final market value: $" + atomic(sum(market)(_)))

If we compile and run the simulation, the output will look something like this (on a dual-core, 2 GHz processor):

  Starting market value: $185000
  Market value: $185000
  Market value: $185000
  Market value: $185000
  Total fees: $5258
  Final market value: $185000

We could try the simulation a hundred times with different CPU loads and even on separate machines, the results will always be the same.  While there may be a greater or lesser number of concurrent market audits during the simulation, the values retrieved each time will be the same.  From this we can conclude one important fact: we have succeeded in designing an STM framework which preserves data integrity.

In simulations like this one with a high degree of contended data, STM may actually be slower than a traditional, fine-grained locking strategy.  However, just think for a moment about trying to write this simulation using Java’s ReentrantReadWriteLock class.  It would be nearly impossible to design such a system, let alone maintain it.  There would always be the danger that we would accidentally acquire the locks in the wrong order, or forget to lock something before we access it.  In short, such an effort would be extremely error-prone, and far more verbose.  Using our STM framework, the resulting code was clean and simple to understand.  It’s easy to see why techniques like this are really starting to catch on.

Conclusion

Hopefully this has been an informative and enjoyable foray into the world of transactional memory systems.  I will retroactively apologize for any areas where my facts are in error; I’m certainly quite new to all of these concepts.  As per usual, the library described in this article is available for download.  I was a little lax on the testing side of life, so you may want to vet the library a little bit before you trust it unsupervised in your data center.

There are a number of interesting aspects to STM that I didn’t cover in this article, such as retry and the related function, check (implemented in the framework).  Also left completely untouched is the monadic operation orElse which can be used to compose transactions in a “first try this, then try that on failure” sort of way.

STM is a very active research topic today with a lot of the brightest minds in the industry pondering ways to make it better.  While it certainly doesn’t solve all of the problems associated with concurrency, it does have the potential to simplify locking and produce better performance under some conditions.  Definitely a technology to watch as it develops!

Integrating Scala into JRuby

29
Sep
2008

More and more projects (especially startups) have been choosing to build their software in multiple languages.  Rather than using SQL for the database, XML for the RPC and Java for everything else, companies have learned that sometimes a different language can serve best in a specific area.  Ola Bini provides some guidance with regards to methodology in this area in the form of what he calls “language layers”.  He suggests that an application can be divided logically into separate layers, and for each of these layers there exists a class of language – be it dynamic, static, compiled or otherwise – which can best accomplish the task at hand.  All of that aside, there is one problem which is absolutely assured when dealing with polyglot programming: integration between the languages.

I’m told that this issue came up at the JVM languages conference this past week.  The problem of integrating very separate languages in a natural way is not an easy one, even when all languages in question are on the JVM.  So far, the only solution that anyone has been able to come up with is just to use Java as an intermediary.  After all, JRuby integrates with Java, as does Clojure, therefore JRuby has at least some path of integration with Clojure and vice versa.

The problem is that such integration is not really idiomatic.  As the title of this post implies, we’re going to consider the example of Scala and JRuby.  I’ve already written about how to create a Scala DSL for calling JRuby directly, so let’s look at the other side of the problem: it’s certainly possible to use Scala classes from within JRuby, but it isn’t exactly a pleasant proposition.  Let’s imagine that I want to make use of my Scala port of Rich Hickey’s excellent immutable persistent vector data structure:

import com.codecommit.collection.Vector
 
vec = Vector.new
vec = vec.send('$plus'.to_sym, 1)
vec = vec.update(0, 2)
 
puts vec.apply 0       # 2

Straightforward, but ugly.  Let’s break this down a little bit…  The import and instantiation are both self-explanatory, so we come to the rather cryptic invocation of send.  In case you don’t know, this is a Ruby method which makes it possible to invoke arbitrary methods on a given object, including those with illegal characters.  I happen to know that Scala will compile the + operator to a method named $plus within class Vector.  JRuby is perfectly happy to handle and forward this call as necessary, despite the fact that you could never actually declare this method in pure Ruby ($ has special meaning).  Thus, this invocation is the same as the following Scala snippet:

vec = vec + 1

In other words, append 1 to the vector and assign the resulting new vector back into vec.
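
The send trick works because Ruby method names are only restricted by the parser, not by the runtime: any symbol can name a method.  A quick pure-Ruby demonstration (the Mangled class is a hypothetical stand-in for a JRuby-proxied Scala object):

```ruby
class Mangled
  # A method whose name could never appear in a literal `def`:
  define_method(:'$plus') { |x| x + 1 }
end

Mangled.new.send(:'$plus', 41)    # => 42
```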

Moving on, we come to the invocations of update and apply.  These should be a bit more comprehensible to those familiar with Scala’s intricacies.  In short, these methods are how you overload the parentheses and parentheses/assignment operators.  Thus, our last two lines correspond as follows:

vec = (vec(0) = 2)
println(vec(0))

This was just a trivial example; you can imagine how a more complicated Scala API could be nigh-unusable in JRuby.  Intuitively though, it shouldn’t have to be this way.  After all, JRuby supports much of the same syntactic power as Scala: it has some operator overloading, closures and even more complicated features like mixins.  With all of this common ground, surely there is a way to make the two coexist more naturally?  I mean, optimally we could have something like this:

import com.codecommit.collection.Vector
 
vec = Vector.new
vec = vec + 1
vec = (vec[0] = 2)       # problematic
 
puts vec[0]              # 2

What we have just done is informally define a desired syntax for a domain-specific problem.  Does that sound like an internal DSL to anyone else?

Our goal is to create a simple Ruby API that can be required into any JRuby application to improve the integration with Scala.  To do this, we will need to find a way to convert Ruby features into their corresponding Scala features by going through Java.  Once we find this translation, we can use meta-programming and dangerously-cool Monkey Patching to augment JRuby’s existing Java integration.  In this way, we don’t have to start our integration layer from scratch, we can just “pretty up” JRuby’s existing features, taking advantage of the fact that Scala classes are Java classes.

Step One: Operators

From our example above, converting the invocation of a $plus method into a Ruby + operator seems to be the easiest challenge to tackle.  Ruby does support operator overloading; unfortunately, this support is incredibly limited when compared with Scala’s awesome power.  For example, in Scala it is possible to invent arbitrary operators, a feature which is heavily used in the Scala standard library.  Ruby has no such capability; operators are implemented using conventional techniques and hard-coded rules in the parser.
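
As a reminder of what Ruby does offer: overloading one of its fixed operators is just defining a method of that name, nothing more (Money is a hypothetical class for illustration):

```ruby
class Money
  attr_reader :cents

  def initialize(cents)
    @cents = cents
  end

  def +(other)    # one of Ruby's fixed, overloadable operators
    Money.new(@cents + other.cents)
  end
end

(Money.new(150) + Money.new(75)).cents    # => 225
```

By contrast, something like `def ++(other)` is simply a syntax error; the parser's operator table is closed.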

In order to avoid blowing this experiment completely out of proportion, we’re only going to implement translations for the set of conventional Ruby operators.  It would be possible to also implement Scala operators which are made by combining existing Ruby operators (e.g. the ++ operator) using techniques developed for the Superators gem, but to do so would be extremely difficult.

We saw in our original example that the Scala + operator is translated into an invocation of the $plus method.  It stands to reason that we could make use of this fact in our translation from Ruby to Scala.  The trick is finding a comprehensive list of Scala’s operators and what methods they compile to.  Fortunately, I had already investigated these translations for a different project:

  Scala Operator    Compiles To
  =                 $eq
  >                 $greater
  <                 $less
  +                 $plus
  -                 $minus
  *                 $times
  /                 $div
  !                 $bang
  @                 $at
  #                 $hash
  %                 $percent
  ^                 $up
  &                 $amp
  ~                 $tilde
  ?                 $qmark
  |                 $bar
  \                 $bslash

Repeated iterations of an operator become corresponding repetitions of the equivalent character sequence.  Thus, ++ becomes $plus$plus.  This suggests a very natural strategy for converting Ruby operator invocations into Scala: string substitution.  We can easily intercept Ruby operator calls using method_missing, substitute the appropriate character sequences and then attempt the modified call against the same object.  This idea, when translated into Ruby, is almost that simple:

OPERATORS = {"=" => "$eq", ">" => "$greater", "<" => "$less", \
        "+" => "$plus", "-" => "$minus", "*" => "$times", "/" => "$div", \
        "!" => "$bang", "@" => "$at", "#" => "$hash", "%" => "$percent", \
        "^" => "$up", "&" => "$amp", "~" => "$tilde", "?" => "$qmark", \
        "|" => "$bar", "\\" => "$bslash"}
 
alias_method :__old_method_missing_in_scala_rb__, :method_missing
def method_missing(sym, *args)
  str = sym.to_s
 
  str = $1 + '_=' if str =~ /^(.*[^\]=])=$/
 
  OPERATORS.each do |from, to|
    str.gsub!(from, to)
  end
 
  if methods.include? str
    send(str.to_sym, *args)
  else
    __old_method_missing_in_scala_rb__(sym, *args)
  end
end

Right at the end, after we have transformed the method name to convert any Ruby operators into their Scala equivalents, we actually check to see if the method exists instead of blindly sending the invocation.  The reason for this is to avoid creating an infinite loop in cases where a method really doesn’t exist.  We can rely on the presence of bona fide methods due to the way that JRuby proxies Java objects into Ruby.

Astute readers will notice the extra bit of regular expression checking right at the head of the method.  We didn’t cover this in our target example, but this transformation is nonetheless quite important.  Both Scala and Ruby provide mechanisms to simulate assignment to class members through method calls.  In Ruby, you just suffix the method name with =, whereas in Scala the correct suffix is _=.  This extra transformation looks for those situations and converts to the appropriate result.  Thus, Scala var fields are now accessible within JRuby using the same syntax as standard attr_reader/attr_writer methods in Ruby.
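
Pulled out of method_missing, the rewrite is a pure string transformation, which makes it easy to exercise in isolation.  This hypothetical standalone version (with an abridged operator table) shows how the two rules compose: value= first becomes value_=, and the remaining = is then mangled to $eq, matching Scala's compiled name value_$eq.

```ruby
# Abridged operator table; the full version appears above.
OPERATORS = { '=' => '$eq', '>' => '$greater', '<' => '$less',
              '+' => '$plus', '-' => '$minus', '*' => '$times' }

# Hypothetical standalone version of the name rewrite: attr-writer
# suffix first, then operator-character substitution.
def scala_name(ruby_name)
  str = ruby_name.dup
  str = $1 + '_=' if str =~ /^(.*[^\]=])=$/
  OPERATORS.each { |from, to| str.gsub!(from, to) }
  str
end

scala_name('+')         # => "$plus"
scala_name('value=')    # => "value_$eq"
scala_name('name')      # => "name"
```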

Two other operators which might be useful to enable are the [] and []= operators, generally used for collections access.  Scala doesn’t actually support these operators, reserving square brackets for declaring type parameters.  However, it does provide a very similar syntax with parentheses.  As a refresher, here is how we might use an array within Scala:

val arr = new Array[Int](5)
arr(0) = 5
arr(1) = 4
arr(2) = 3
arr(3) = 2
arr(4) = 1
 
println(arr(1))        // 4

At compile-time, this translates into corresponding invocations of the apply and update methods of class Array.  Specifically, apply is used to retrieve data, while update is used to assign it.  These are the direct counterparts to Ruby’s [] and []=.  It would only be natural to translate invocations of these operators into corresponding calls to apply and update, and we can do this simply by extending our method_missing just a little bit:

  # ...
 
  gen_with_args = proc do |name, args|
    code = "#{name}("
 
    unless args.empty?
      for i in 0..(args.size - 2)
        code += "args[#{i}], "
      end
      code += "args[#{args.size - 1}]"
    end
 
    code += ')'
  end
 
  if str == '[]'
    eval(gen_with_args.call('apply', args))
  elsif sym.to_s == '[]='     # str has already had its '=' mangled, so test the raw symbol
    eval gen_with_args.call('update', args)            # doesn't work right
  elsif methods.include? str
    send(str.to_sym, args)
  else
    __old_method_missing_in_scala_rb__(sym, args)
  end
end

The gen_with_args proc is merely a nifty little utility closure used to cut down on redundancy without creating an entire method.  It actually generates a String of Ruby code which invokes the specified method with the given arguments.  This is required because JRuby’s Java integration is only so smart.  It will try to properly convert invocations of n-arity Java methods when Ruby arrays are passed as arguments, but it can only go so far.  The safest route is to just call the method, passing each element of the array as a successive argument.

All of this looks quite nice, and it seems to satisfy our requirements, but there is just one problem: Ruby doesn’t behave itself with respect to the []= operator.  Rather than taking the sensible approach and allowing the receiving class to define its return value, it actually ignores whatever value is returned from the []= method and forces it to be the final argument.  In other words, the above code will work, but it might not integrate with Scala in quite the way we would expect.  Consider our original motivating example:

import com.codecommit.collection.Vector
 
vec = Vector.new
vec = vec + 1
vec = (vec[0] = 2)       # problematic
 
puts vec[0]              # 2

Well, I said that this would be problematic.  The issue is the value of the expression vec[0] = 2 is not a new Vector instance as returned by the Vector#update method, but actually the Fixnum value 2.  Thus, the snippet above can never work.  In what is probably the most bone-headed feature of the entire language, Ruby forces every invocation of []= to return the assigned value.  This works fine (sort-of) for mutable, side-effecting implementations like Array and Scala’s ListBuffer, but it completely falls apart for immutable, functional collections like Vector.  So much for a language that “makes me smile”…
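
This behavior is trivial to verify in plain Ruby, without JRuby or Scala involved at all (Box is a hypothetical class): whatever []= returns is discarded, and the assignment expression always evaluates to its right-hand side.

```ruby
class Box
  def []=(key, value)
    'a shiny new Box'    # the method's return value...
  end
end

result = (Box.new[0] = 2)
result    # => 2; the method's return value was silently discarded
```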

The solution of course is to always be aware of these cases where the update method does not return Unit and just use update directly rather than []=.  Thus, a working version of the given snippet would be as follows:

import com.codecommit.collection.Vector
 
vec = Vector.new
vec = vec + 1
vec = vec.update(0, 2)
 
puts vec[0]              # 2

By calling update directly, we ensure that its return value is captured and assigned back into vec.  Kind of ugly, but unavoidable.

Step Two: Mixins

Mixins are probably the most useful and code-saving feature in the entire Scala language.  Like interfaces, they provide the flexibility of multiple inheritance, but they also bring with them the DRYness of shared definitions between subtypes.  In fact, it may even be safe to say that the overwhelming power of the Scala collections library is owed primarily to traits.  After all, without the inherited method definitions from Iterable, each collection would have to re-implement foldLeft, map, flatMap and each of the myriad common methods defined by collections.

It just so happens that Ruby also has mixins of a very similar variety.  It seems natural then that we should be able to use Scala mixins within JRuby just as if they were standard Ruby modules.  Unfortunately, Java does not have mixins, meaning that unlike operator overloading, there isn’t a nice and easy technique we can use to convert between the two worlds.  The good news is that JRuby does give us a bit of a head start in making this integration work: its interface implementation syntax.

Ruby doesn’t support interfaces directly, but JRuby allows us to use the standard include syntax on a Java interface.  Once we have included the interface and implemented its methods, we can pass an instance of our Ruby class into Java methods expecting instances of that interface.  JRuby takes care of all of the magic required to proxy the method calls.  Syntactically, it looks something like this:

class DoSomething
  include java.lang.Runnable
 
  def run
    puts 'Running!'
  end
end

This is how we want our mixin integration to work.  We should be able to include a Scala trait into a Ruby class, implement any abstract methods and then expect everything to work as normal.  In order to accomplish this, we’re going to need to override the include method to check whether its target is a Scala trait.  If that is the case, then include should create proxies for all of the defined methods within the trait.  Basically, we have to do the same thing for Scala traits that Ruby does automatically for its modules.

Fortunately, this too is an area where Ruby’s notion of “open classes” will come to our rescue.  Not only does Ruby allow us to add methods to classes at runtime, it also allows us to redefine core methods within the standard library; in this case, Module#include.

class Module
  alias_method :__old_include_in_scala_rb__, :include
  def include(*modules)
    modules.each do |m|
      clazz = nil
      begin
        if m.java_class.interface?
          cl = m.java_class.class_loader
          mixin_methods_for_trait(cl, cl.loadClass(m.java_class.to_s))
        end
      rescue
      end
 
      # ...
    end
 
    modules.each {|m| __old_include_in_scala_rb__ m }
  end
end

The most important thing to see here is that, regardless of whether or not the module in question is a trait, we still forward the inclusion onto the old definition.  This is critical, because it means that JRuby is still able to create an interface proxy for that class, allowing us to pass instances of the including class into Scala and have them treated as instances of the trait in question.

The key to our actual inclusion of the defined methods is the way in which Scala compiles traits.  Consider the following:

trait Test {
  def method1(): Unit
 
  def method2() {
    println("In method2()")
  }
}
 
class Usage extends Test {    // mixin
  def method1() {
    println("In method1()")
  }
}

Scala will compile this into the bytecode equivalent of the following Java code:

public interface Test {
    public void method1();
 
    public void method2();
}
 
public class Test$class {     // actually an inner-class of Test
    public static void method2(Test t) {
        System.out.println("In method2()");
    }
}
 
public class Usage implements Test {
    public void method1() {
        System.out.println("In method1()");
    }
 
    public void method2() {
        Test$class.method2(this);
    }
}

It’s very clever if you think about it.  Through a bit of compile-time magic, Scala is able to keep the method definitions within traits centralized (rather than inlining everything) as well as maintain full interface-level interop with Java.  It is actually possible to define a Java method which accepts as a parameter an instance of a particular trait.  Since traits are interfaces, javac knows how to deal with this definition and the JVM has no problems in actually dispatching the call.

We can use this implementation detail to enable our mixin of traits into JRuby classes.  We’re already testing within include to see whether or not the “module” at hand is in fact an interface, it would be a simple matter to also check for the existence of a class of the form “TraitName$class“.  If such a class exists, we could loop through all of its public static members and create corresponding proxy methods within the including class.  All of this is done within the mixin_methods_for_trait method:

def mixin_methods_for_trait(cl, trait_class, done=Set.new)
  return if done.include? trait_class
  done << trait_class
 
  clazz = cl.loadClass "#{trait_class.name}$class"
 
  trait_class.interfaces.each do |i| 
    begin
      mixin_methods_for_trait(cl, i, done)
    rescue
    end
  end
 
  clazz.declared_methods.each do |meth|
    mod = meth.modifiers
 
    if java.lang.reflect.Modifier.isStatic mod and \
	    java.lang.reflect.Modifier.isPublic mod
      @@trait_methods ||= []
 
      unless meth.name.include? '$'
        module_eval "\
def #{meth.name}(*args, &block)
  args.insert(0, self)
  args << block unless block.nil?
 
  args.map! do |a|
    if defined? a.java_object
      a.java_object
    else
      a
    end
  end
 
  begin
    scala_reflective_trait_methods[#{@@trait_methods.size}].invoke(nil, args.to_java)
  rescue java.lang.reflect.InvocationTargetException => e
    raise e.cause.message.to_s      # TODO  change over for 1.1.4
  end
end
"
 
        @@trait_methods << meth
      else
        define_method meth.name do |*args|    # fallback for methods with special names
          args.insert(0, self)
 
          begin
            meth.invoke(nil, args.to_java)
          rescue java.lang.reflect.InvocationTargetException => e
            raise e.cause.message.to_s
          end
        end
      end
    end
  end
end

I know, the amount of code here is a little daunting, but it’s really not that bad.  Essentially, this just implements our intuition about how mixins should work in Ruby.  The only thing about the algorithm that we haven’t already considered is how to deal with the super-traits of traits.  Since the inheriting of method definitions is only done through static proxying, there would be no reason for Scala to compile an inherited relationship between the definition class of a sub-trait and its super-trait.  To get around this problem, we explicitly check for super-interfaces and then mix in their methods first.  This is to allow for methods which are inherited by sub-traits and then overridden.

The final little tidbit here is the way in which the proxy methods are created.  Ruby does allow us to add methods to classes at runtime, but it is unfortunately rather restrictive in how those methods can be added.  In a nutshell, there are two main techniques: module_eval and define_method.  By using define_method, we are able to avoid String evaluation and keep our editor’s syntax-highlighting happy with our sources.  Unfortunately, methods created using define_method have a very critical limitation: they cannot accept blocks.  Thus, any mixed-in trait methods which took function values as parameters would be unusable from within Ruby.

This inconsistency is fixed in Ruby 1.9, but until then we still need a way of proxying higher-order Scala methods.  Thus, we default to using module_eval.  Unfortunately, this technique also comes with its own set of issues; specifically: illegal characters.  As we saw previously, any Scala method with a non-alphanumeric name will have to use the $ character to denote certain symbols.  However, Ruby assigns special significance to symbols like $ and @, making it impossible to use them within method names.  The only way around this restriction on method names is to create such special methods using define_method, which brings us right back to our first set of problems.
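
To see why module_eval sidesteps the limitation, note that it compiles a string as though it were written directly in the class body, so the generated method can declare an ordinary &block parameter (Greeter is a hypothetical example class; on Ruby 1.9 and later, define_method gained the same ability):

```ruby
class Greeter
  # The string is parsed as normal Ruby source, so an explicit
  # &block parameter works just as it would in a hand-written def.
  module_eval <<-RUBY
    def each_greeting(&block)
      ['hello', 'goodbye'].each { |g| block.call(g) }
    end
  RUBY
end

seen = []
Greeter.new.each_greeting { |g| seen << g }
seen    # => ["hello", "goodbye"]
```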

The only solution I could come up with for Ruby 1.8 was to use module_eval by default, unless the method name contained a $ character, in which case define_method would be used.  With this technique, almost every case is covered.  The only remaining issue would be if a method with a non-alphanumeric name took a function as a parameter.  In this case, the method would be unusable from Ruby.  However, even a cursory glance over the Scala standard library shows that this is an extremely rare case, one which can be safely made “difficult” if it means improving the integration in other areas (trait mixins).

One final unfortunate note on this method: it doesn’t work with JRuby 1.1.3 and later (the current release is 1.1.4).  As reported by JRUBY-2999, including two separate Java interfaces with conflicting methods will actually cause the interpreter to crash.  At the time of writing, this bug has not been resolved, either in 1.1.4 or in the latest SVN sources.  Since traits quite often have methods which overlap with inherited methods, there is really no way to get around this bug.  Thus, the library described in this article was developed with JRuby 1.1.2 and will only work with that release or any upcoming releases which fix the regression.  Interestingly, this means that the library must deal with a different, less critical JRuby bug which makes it impossible to raise Java exceptions.  As this exception bug does not entirely prevent the use of Scala mixins, it is preferable to the more serious interpreter-crash regression.

Now that we have mixins at our disposal, it is possible to further improve our Scala integration by implementing a scala_object method for Array and Hash, one which converts these objects into a form which can be passed to Scala methods expecting Seq and Map, respectively.

class BoxedRubyArray
  include Scala::RandomAccessSeq
 
  def initialize(delegate)
    @delegate = delegate
  end
 
  def apply(i)
    @delegate[i]
  end
 
  def length
    @delegate.size
  end
end
 
class Array
  def scala_object
    BoxedRubyArray.new self
  end
end

The Hash proxy is analogous, but much more verbose.  The key to this entire implementation is the enabling of Scala mixins within JRuby.  All we have to do is mixin the RandomAccessSeq trait, implement apply and length, and suddenly we have a first-class Scala collection backed by a Ruby Array.  Not only does this allow us to use some of Scala’s higher-order magic on Ruby arrays, it also enables previously impossible usages like the following:

import com.codecommit.collection.Vector
 
vec = Vector.new
vec = vec + 1 + 2 + 3 + 4 + 5
 
arr = [6, 7, 8, 9, 0]
cat = vec.concat arr.scala_object

At the end of this snippet, cat is a Scala Iterable with contents [1, 2, 3, 4, 5, 6, 7, 8, 9, 0].  Remember that concat is actually a method within Vector which itself expects an instance of Iterable.  Fortunately, we now have a method to convert a Ruby array into just such an Iterable, which we then pass to concat, achieving our final result.

Step Three: Closures

Both Ruby and Scala have this notion of closures, which are assignable, anonymous functions with access to their enclosing scope.  Closures are most often used as a syntactic device for passing and/or storing a bit of code for later invocation.  A common example that even Java has to deal with would be event handling, where a specific code block must be executed every time a button is pressed.  Scala and Ruby take this concept a little further, providing higher-order functions like map and implementing conventional iteration using foreach and each (respectively).

Optimally, we should be able to call Scala methods passing Ruby blocks where Scala function values are required.  As it happens, even without special Scala integration magic, we can already get very close to this:

vec = Vector.new
# ...
 
vec.foreach do |e|
  puts e
end

This will print every element in the vec instance.  This works because JRuby allows blocks to be passed to methods accepting interfaces.  Since Scala’s function values are in fact subtypes of traits Function0, Function1 and so on (according to arity), JRuby is able to recognize the method signatures appropriately and proxy the values.

Unfortunately, JRuby doesn’t innately understand Scala traits, so it doesn’t know that the compose method should be proxied to an implementation within the trait.  I assume that JRuby will proxy every method in the target interface to the block in question (I haven’t actually tested that).  Assuming this is the case, the moment that foreach (or any other method) attempts to invoke compose on a proxied JRuby block, the result will be a ClassCastException.  It just so happens that foreach behaves itself and there is nothing to worry about, but we cannot make that guarantee in general.
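To see what a faithful proxy must preserve, consider the pure-Scala equivalent (a minimal sketch): supplying only the abstract apply yields a complete function value, with concrete methods like compose inherited from the Function1 trait.

```scala
object FunctionTraitDemo {
  // Only the abstract apply is implemented; compose and andThen are
  // concrete methods inherited from the Function1 trait itself.
  val double = new Function1[Int, Int] {
    def apply(x: Int): Int = x * 2
  }

  // compose runs the argument function first: (3 + 1) * 2 = 8
  val addThenDouble = double compose { (x: Int) => x + 1 }
}
```

A proxy which only forwarded apply would blow up the moment anything touched compose, which is exactly the hazard with letting JRuby guess at the trait.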

The solution of course is to do for Ruby’s Proc what we already did for Array and Hash: provide a wrapper which uses the FunctionN mixin and delegates to the Proc in question.  However, unlike Array, there is no single “Function” mixin that we can use.  Instead, we must create a separate wrapper for each function arity supported by Scala…all 23 of them (Function0 through Function22).  Fortunately, Ruby’s meta-programming capabilities help us out here, allowing us to define classes within a loop and then dynamically select the correct wrapper class name based on the Proc arity:

module ScalaProc
  class ScalaFunction
    def initialize(delegate)
      @delegate = delegate
    end
 
    def apply(*args)
      @delegate.call(*args)
    end
  end
 
  for n in 0..22    # sneaky, but much more concise
    eval "\
class Function#{n} < ScalaFunction
  include Scala::Function#{n}
end
"
  end
end
 
class Proc
  def to_function
    eval "ScalaProc::Function#{arity}.new self"
  end
 
  def java_object
    to_function
  end
 
  def scala_object
    to_function
  end
end

The only abstract method within any of the FunctionN traits is apply, taking the same number of arguments as the function arity.  It’s easy enough to implement this method once within the ScalaFunction superclass, which means that all we have to do within each of the arity-specific wrappers is mix in the relevant FunctionN trait.

Now that we have this conversion between Ruby Proc(s) and Scala function values, we can make use of it in situations where it becomes necessary.  For example, let’s say that I have a Scala method like the following:

def doAndMultiply(by: Int)(f: (Int)=>Int) = {
  f compose { (_:Int) * by }
}

Simply put, this method takes two curried parameters, an Int along with a function value taking another Int and returning an Int.  This method then returns a new function which itself takes an Int, first multiplies it by the original first Int parameter, then applies the given function to the result.  Got all that?  :-)

We can call this function from Scala in the following way:

val comp = doAndMultiply(3) { _ + 2 }
comp(5)      // => 17

Now that we have our super-fancy Proc conversion, we are able to use an almost identical call syntax from within Ruby.  Notice how we are able to take advantage of the way in which Scala compiles curried methods to achieve a JRuby syntax which looks almost exactly like block-standard Ruby:

add = proc {|x| x + 2 }
comp = doAndMultiply(3, add.scala_object)
 
comp.call 5        # => 17

And if doAndMultiply is a method brought in via a mixin, we can do even better:

comp = doAndMultiply 3 do |x| 
  x + 2
end
 
comp.call 5        # => 17

This works because of the way in which we coerce parameters within the proxies for the mixed-in methods.  Recall from earlier:

args.map! do |a|
  if defined? a.java_object
    a.java_object
  else
    a
  end
end

The very reason we have to explicitly map over the method arguments is to satisfy this particular case.  JRuby’s Ruby-to-Java coercion is pretty smart, but it doesn’t seem to be up to this particular challenge (believe me, I tried).  Thankfully, a little bit of benign check-and-coerce later, our arguments are none the worse for wear and in a form that Scala can chew on.

Conclusion

As you may have guessed, I have already taken the liberty of implementing the framework described in this article.  It even has a few features I didn’t discuss, like auto-detecting the default Scala installation and the ability to use Scala function values as if they were Ruby Proc(s).  All in all, it makes for a very slick way of working with Scala libraries from within JRuby scripts in an intuitive, idiomatic fashion.  Hopefully, this should help to encourage the use of these two fascinating technologies together in future projects.

  • Download scala.rb (does not work with JRuby 1.1.3 or 1.1.4)

Higher-Order Fork/Join Operators

22
Sep
2008

I think we can all agree that concurrency is a problem.  Not really a problem as in “let’s get rid of it”, but more the type of problem that really smart people spend their entire lives trying to solve.  Over the years, many different solutions have been proposed, some of them low-level, some more abstract.  However, despite their differences, a common thread runs through all of these ideas: each of them attempts to ease the pain of decomposing operations in a reorderable fashion.

Surprisingly, the word “ordering” is not often heard in conjunction with parallelism.  Most of the time, people are thinking in terms of server/client or broker/investor.  If you really deconstruct the issue though, there is actually a deeper question underlying all concurrency: what operations do not depend upon each other in a sequential fashion?  As soon as we identify these critical operations, we’re one step closer to being able to effectively optimize a particular algorithm with respect to asynchronous processing.

By the way, I really will get to fork/join a little later, but I wanted to be sure that I had laid a solid groundwork for the road ahead.  Without understanding some of the fundamental theory behind fork/join, it will be impossible to see how it can be applied effectively to your next project.

Factorial

One of the odd things about computer science is a depressing lack of imaginative examples.  Not being one to break with tradition, I’ve decided to kick off our little quest with a little time spent in the well-trodden foothills of factorial.  This should help us to establish some terminology (which I’m arbitrarily assigning for the purposes of this article) as well as the basic concepts involved.  A simple recursive implementation (in Scala of course) would look like this:

def fac(n: Int): Int = if (n < 1) 1 else fac(n - 1) * n

For each input, this function performs a number of discrete operations.  First, it checks to see if the index is less than 1.  If so, then the function returns immediately, otherwise it proceeds on a separate course.  This is a branching operation.  Since the “true branch” is uninteresting, we will focus on the case when the index is 1 or greater.  In this case, we have three critical operations which must be performed.  They are as follows (temporary names are fictitious):

  • Subtract 1 from n and store the value in some temporary $t1
  • Dispatch to function fac passing the value from $t1
  • Multiply result from dispatch with n and return

All this may seem extremely pedantic but please, bear with me.  Consider these operations very carefully in the topological sense.  What we’re trying to see here is if one (or more) of these operations may be ordered above (or below) one of the others.  For example, could we perhaps dispatch to fac after multiplying and returning?  Or could we perform the subtraction operation after the dispatch?

The answer is quite obviously “of course not”.  There is no way we can change the ordering in this expression because each step depends entirely upon the result from the previous.  As far as our attempts to parallelize are concerned, these three operations are completely atomic, meaning that they form an inseparable computation.

Since we’ve drilled down as far as we can possibly go in our implementation and so identified the most atomic computation, let’s move out one step and see if we can find anything with promise.  Stepping back through our execution sequence leads us directly to the branching operation identified previously.  Remember that our goal is to identify operations which can be shuffled around in the execution order without affecting the semantics. (does this feel like pipeline optimization to anyone else?)  Unfortunately, here too we are at an impasse.  We might try moving an atomic computation from one of the branches out before the branching operation, but then we could conceivably do the wrong thing.  Since our function uses recursion, this sort of reordering would be very dangerous indeed. 

The truth is that for factorial, there are absolutely no operations which can be moved around without something going wrong.  Because of this property, we are forced to conclude that the entire factorial operation is atomic, not just its false branch.  Unfortunately, this means that there is no way to effectively transform this function into some sort of asynchronous variant.  That’s not to say that you couldn’t calculate factorial of two separate numbers concurrently, but there is no way to modify this implementation of the factorial function in a parallel fashion1.  This is truly the defining factor of atomic computations: it may be possible to reorder a series of atomic computations, but such a reordering cannot affect the internals of these computations.  Within the “atom”, the order is fixed.

So what does reordering have to do with concurrency?  Everything, as it turns out.  In order to implement an asynchronous algorithm, it is necessary to identify the parts of the algorithm which can be executed in parallel.  In order for one computation to be executed concurrently with another, neither must rely upon the other being at any particular stage in its evaluation.  That is to say, in order to execute computation A at the same time as computation B, the ordering of these two computations must be irrelevant.  Providing that both computations complete prior to some computation C (which presumably depends upon the results of A and B), the aggregate semantics of the algorithm should remain unaffected.  You could prove this, but I really don’t feel like it and frankly, I don’t think anyone reading this will care.  :-)

Fibonacci

Now that we have some simple analysis on factorial under our belt, let’s try something a little tougher.  The Fibonacci series is another of those classic computer science examples.  Curiously enough, the implementation used by every known textbook to explain recursion is actually one of the worst possible ways to implement the calculation.  Wikipedia has an excellent description of why this is, but suffice it to say that the intuitive approach is very, very bad (efficiency wise).

However, the “good” implementations used to calculate the nth number of the Fibonacci series just aren’t as concise or easily recognized.  Also, they’re fairly efficient in their own right and thus see far less benefit from parallelization at the end of the day.  So rather than taking the high road, we’re going to just bull straight ahead and use the first algorithm which comes to mind:

def fib(n: Int): Int = if (n < 2) n else fib(n - 1) + fib(n - 2)

Like factorial, this function makes an excellent poster child for the syntactic wonders of functional programming.  Despite its big-O properties, one cannot help but stop and appreciate the concise beauty of this single line of code.
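For comparison, one of those less-photogenic “good” implementations might look like the following (a sketch, not from the original discussion).  It runs in linear time, but the strict data dependency between successive values leaves almost nothing for parallelization to exploit:

```scala
object FibLinear {
  // Each value depends directly on the previous two, forming a single
  // sequential chain of computations -- fast, but inherently ordered.
  def fib(n: Int): Int = {
    @annotation.tailrec
    def loop(i: Int, prev: Int, cur: Int): Int =
      if (i == 0) prev else loop(i - 1, cur, prev + cur)
    loop(n, 0, 1)
  }
}
```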

As is common in simple recursion, our function begins with a conditional.  We have a simple branching operation testing once again for a range (n < 2), with a base case returning n directly.  It is easy to see how the “true branch” is atomic as it consists of but one operation.  We’ve already made a hand-wavy argument that branches themselves should not be dissected and moved around outside of the conditional, so it would seem that our only hope rests with the recursive “false branch”.  In words, we have the following operations:

  • Subtract 1 from n and store the value in temporary $t1
  • Dispatch to function fib passing the value from $t1; store the value in $t1
  • Subtract 2 from n and store the value in temporary $t2
  • Dispatch to function fib passing the value from $t2; store the value in $t2
  • Add values $t1 and $t2 and return

Ah, this looks promising!  We have two “blocks” of operations which look almost identical.  Printed redundancy should always be a red flag to developers, regardless of the form.  In this case though, we don’t want to extract the duplicate functionality into a separate function, that would be absurd.  Rather, we need to observe something about these two operations, specifically: they do not depend on one another.  It doesn’t matter whether or not we have already computed the value of fib(n - 1), we can still go ahead and compute fib(n - 2) and the result will be exactly the same.  We’re going to get into trouble again as soon as we get to the addition operation, but as long as both dispatches occur before the final aggregation of results, we should be in the clear!

Because it does not matter in which order these computations occur, we are able to safely parallelize without fear of subtle semantic errors cropping up at unexpected (and of course, unrepeatable) full-board demonstrations.  Armed with the assurance which only comes from heady, unrealistic trivial algorithm analysis, we can start planning our attack.

Threads Considered Insane

Being a good Java developer (despite the fact that we’re using Scala), the very first thing which should come to mind when thinking of concurrency is the concept of a “thread”.  I’m not going to go into any detail as to what threads are or how they work since they really are concurrency 101.  Suffice it to say though that threads are the absolute lowest-level mechanism we could possibly use (at least on this platform).  Here we are, Fibonacci a-la Thread:

def fib(n: Int): Int = {
  if (n < 2) n else {
    var t1 = 0
    var t2 = 0
 
    val thread1 = new Thread {
      override def run() {
        t1 = fib(n - 1)
      }
    }
 
    val thread2 = new Thread {
      override def run() {
        t2 = fib(n - 2)
      }
    }
 
    thread1.start()
    thread2.start()
 
    thread1.join()
    thread2.join()
 
    t1 + t2
  }
}

I can’t even begin to count all of the things that are wrong with this code.  For starters, it’s ugly.  Gone is that attractive one-liner that compelled us to pause and marvel.  In its place we have a 25 line monster with no apparent virtues.  The intent of the algorithm has been completely obscured, lost in a maze of ceremony.  But the worst flaw of all is the fact that the number of threads this design requires grows exponentially with n: every call except the outermost runs on its own thread, so calculating the 10th Fibonacci number means creating, starting and destroying 176 Thread instances, and that count multiplies by roughly 1.6 (the golden ratio) with each increment of n.  That is a truly frightening prospect.

At first blush, it seems that we can alleviate at least some of the insanity by using a thread pool.  After all, can’t we just reuse some of these threads rather than throwing them away each time?  Unfortunately, this well-intentioned approach doesn’t quite suffice.  It turns out that we can’t really pool very many threads due to the fact that we’re utilizing a thread in fib to recursively call itself and then wait for the result.  Thus, the “parent” dispatch is still holding a resource when the “child” attempts to obtain an allocation.  Granted, we have reduced the number of required threads to a mere 2n – 4, but with a fixed size thread pool (the most common configuration), we’re still going to run into starvation almost immediately.  Apocalisp has a more in-depth article explaining why this is the case.
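The starvation scenario is easy to reproduce directly against the Java 5 executor API.  Here is a contrived sketch with a one-thread pool and a timeout, so that it fails fast rather than hanging forever:

```scala
import java.util.concurrent._

object StarvationDemo {
  def attempt(): String = {
    val pool = Executors.newFixedThreadPool(1)
    // The outer task occupies the pool's only thread while it blocks on
    // the inner task, which consequently can never be scheduled.
    val outer = pool.submit(new Callable[Int] {
      def call: Int = {
        val inner = pool.submit(new Callable[Int] { def call: Int = 21 })
        inner.get() * 2
      }
    })
    try {
      outer.get(500, TimeUnit.MILLISECONDS).toString
    } catch {
      case _: TimeoutException => "starved"
    } finally {
      pool.shutdownNow()
    }
  }
}
```

With a pool of one thread the inner task never runs; recursive fib does exactly this at scale, just with a deeper chain of blocked parents.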

Something a Little Better…

For the moment, it looks like we have run into an insurmountable obstacle.  Rather than mash our brains out trying to come up with a solution, let’s move on and conceptualize how we might want things to work, at least in syntax.

Clearly threads are not the answer.  A better approach might be to deal with computational values using indirection.  If we could somehow kick off a task asynchronously and then keep a “pointer” to the final result of that task (which has not yet completed), we could later come back to that result and retrieve it, blocking only if necessary.  It just so happens that the Java 5 Concurrency API introduced a series of classes which fulfill this wish:  (what a coincidence!)

def fib(n: Int): Future[Int] = {
  if (n < 2) future(n) else {
    val t1 = future {
      fib(n - 1).get()
    }
 
    val t2 = future {
      fib(n - 2).get()
    }
 
    future {
      t1.get() + t2.get()
    }
  }
}
 
def future[A](f: =>A) = exec.submit(new Callable[A] {
  def call = f
})

I’m assuming that a variable called exec is defined within the enclosing scope and is of type ExecutorService.  The helper method is just syntax, the real essence of the example is what we’re doing with Future.  You’ll notice that this is much shorter than our threaded version.  It still bears a passing resemblance to that horrific creature of yesteryear, yet remains far enough removed as to be legible.  We still have our issue of thread starvation to contend with, but at least the syntax is getting better.
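For completeness, here is one way to supply that exec and make the snippet runnable (a sketch: a cached pool creates threads on demand, which dodges deadlock for small inputs but is emphatically not a cure for the starvation problem):

```scala
import java.util.concurrent.{Callable, Executors, Future}

object FutureFib {
  // Assumption: an unbounded cached pool stands in for the unspecified
  // exec.  It grows a thread per blocked task, so keep n small.
  val exec = Executors.newCachedThreadPool()

  def future[A](f: => A): Future[A] =
    exec.submit(new Callable[A] { def call: A = f })

  def fib(n: Int): Future[Int] =
    if (n < 2) future(n) else {
      val t1 = future { fib(n - 1).get() }
      val t2 = future { fib(n - 2).get() }
      future { t1.get() + t2.get() }
    }
}
```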

Along those lines, we should begin to notice a pattern emerging from the chaos: in both implementations so far we have started by asynchronously computing two values which are assigned to their respective variables, we then block and then merge the result via addition.  Do you see the commonality?  We start by forking our reorderable computations and finish by joining the results according to some function.  This right here is the very essence of fork/join.  If you understand this one concept, then everything else falls into place.

Now that we have identified a common pattern, we can work to make it more syntactically palatable.  If indeed fork/join is all about merging asynchronous computations based on a given function, then we can invent a bit of syntax sugar which should make the Fibonacci function more concise and more readable.  To differentiate ourselves from Future, we will call our result “Promise” (catchy, ain’t it?).

def fib(n: Int): Promise[Int] = {
  if (n < 2) promise(n) else {
    { (_:Int) + (_:Int) } =<< fib(n - 1) =<< fib(n - 2)
  }
}

At first glance, it seems that all we have done is reduce a formerly-comprehensible series of verbose constructs to a very concise (but unreadable) equivalent.  We can still make out our recursive calls as well as the construction of the base case, but our comprehension stops there.  Perhaps this would be a bit more understandable:

val add = { (a: Int, b: Int) => a + b }
 
def fib(n: Int): Promise[Int] = {
  if (n < 2) promise(n) else {
    add =<< fib(n - 1) =<< fib(n - 2)
  }
}

The only reason to use an anonymous method assigned to a value (add) rather than a first-class method is that the Scala compiler treats the two differently in subtle ways.  Technically, I could use a method and arrive at the same semantic outcome, but we would need a little more syntax to make it happen (specifically, an underscore).

This should be starting to make some more sense.  What we have here is a literal expression of fork/join: given a function which can join two integers, fork a concurrent “process” (not in the literal sense) for each argument and reduce.  The final result of the expression is a new instance of Promise.  As with Future, this operation is non-blocking and very fast.  Since the arguments themselves are passed in as instances of Promise, we literally don’t need to wait for anything.  We have now successfully transformed our original fib function into a non-blocking version.  The only thing left is a little bit of syntax to “unwrap” the final result:

val num = fib(20)
num()                  // 6765

Incidentally, the =<< operator was not chosen arbitrarily, its resemblance to the “bind” operator in Haskell is quite intentional.  That is not to say that the operation itself is monadic in any sense of the word, but it does bear a conceptual relation to the idea of “binding multiple operations together”.  The operator is inverted because the bind operation is effectively happening in reverse.  Rather than starting with a monad value and then successively binding with other monads and finally mapping on the tail value (as Scala does it), we are starting with the map function and then working our way “backwards” from the tail to the head (as it were).  None of the monadic laws apply, but this particular concurrency abstraction should tickle the same region of your brain.
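To make the typed-binding idea concrete, the =<< machinery can be sketched without any actors at all.  This is not the actor-based implementation under discussion, just a minimal thread-pool-backed illustration of how each =<< binds one promised argument with full compile-time type checking:

```scala
import java.util.concurrent.{Callable, Executors, Future => JFuture}
import scala.language.implicitConversions

object PromiseSketch {
  val exec = Executors.newCachedThreadPool()

  class Promise[A](fut: JFuture[A]) {
    def apply(): A = fut.get()   // unwrap, blocking only if necessary
  }

  def promise[A](f: => A): Promise[A] =
    new Promise(exec.submit(new Callable[A] { def call: A = f }))

  // A two-argument function becomes bindable; each =<< consumes one
  // Promise, and the final bind yields a Promise of the result.
  class Bound1[A, B, C](f: (A, B) => C, pa: Promise[A]) {
    def =<<(pb: Promise[B]): Promise[C] = promise(f(pa(), pb()))
  }
  class Bound2[A, B, C](f: (A, B) => C) {
    def =<<(pa: Promise[A]): Bound1[A, B, C] = new Bound1(f, pa)
  }

  implicit def fnToBound[A, B, C](f: (A, B) => C): Bound2[A, B, C] =
    new Bound2(f)

  val add = (a: Int, b: Int) => a + b

  def fib(n: Int): Promise[Int] =
    if (n < 2) promise(n) else add =<< fib(n - 1) =<< fib(n - 2)
}
```

Note that binding a Promise[String] where a Promise[Int] is expected is rejected at compile time, the “nice side-benefit” mentioned below.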

An End to Starvation

I half-promised a little while back that we would eventually solve the issue of thread starvation in the implementation.  As mentioned, this particular issue was the central focus of an article on Apocalisp a few weeks back.  For full details, I will again refer you to the original.  In a nutshell though, it looks like this:

  • Operation A dispatches operations B and C, instructing them to send the result back to A once they are finished
  • Operation A releases the thread
  • Operation B executes and sends the result back to A
  • Operation C executes and sends the result back to A
  • Operation A combines the two results and sends the final result back to whoever dispatched it
  • …and so on, recursively

Rather than stopping the world (or at least, our little thread) while we wait for a sub-operation to complete, we just tell it to give us the result as soon as it’s done and we move on.  The whole thing is based around the idea of asynchronous message passing.  The first person to say “actors” gets a gold star.

Every Promise is an actor, capable of evaluating its calculation and sending the result wherever we need it.  The =<< builds up a "partially-applied asynchronous function" based on the original function value we specified (add), binding each Promise in turn to a successive argument (a nice side-benefit of this is compile-time type checking for argument binding).  Once the final argument is bound, a full-fledged Promise emerges with the ability to receive result messages from the argument Promise(s).  Once every value is available, the results are aggregated in a single collection and then passed in order to the function.  The final result is returned and subsequently passed back to any pending actors.  It's a classic actor pattern actually: don't block, just tell someone else to call you as soon as they are ready.

With this strategy, it is actually possible to execute the whole shebang in a single thread!  This is because we never actually need to be executing anything in parallel, everything is based on the queuing of messages.  Of course, a single-threaded execution model would completely ruin the entire exercise, so we will just trust that Scala's actor library will choose the correct size for its internal thread pool and distribute tasks accordingly.

Conclusion

In case you hadn't already guessed, I've actually gone and implemented this idea.  What I have presented here is a bit of a distillation of the "long think" I've had regarding this concept and how it could be done.  The only important item that I've left out is what Doug Lea calls the "granularity control".  Basically, it's a threshold which describes the point at which the benefits of executing a task asynchronously (using fork/join) are outweighed by the overhead involved.  This threshold can be seen in my benchmark of the library.  Performance numbers look something like this (on a dual-core, 2 Ghz 32-bit processor):

Calculate fib(45)
Sequential Time 14682.403901 ms
Parallel Time 7515.882423 ms
Sequential Memory 0.023438 KB
Parallel Memory 3131.548828 KB

For the mathematically challenged, the results show that the parallel execution using Promise was 95.351698% faster than the same operation run sequentially.  That's almost linear with the number of cores!  Accounting for the overhead imposed by actors, I would expect that the impact on performance would approach linearity as the number of cores increases.

Fork/join isn't the answer to the worlds concurrency problems, but it certainly is a step in the right direction.  Also, it's hardly a new approach, but it has generally remained shrouded in a murky cloud of academic trappings and formalisms.  As time progresses and our industry-wide quest for better concurrency becomes all the more urgent, I hope that we will begin to see more experimentation into improving the outward appearance of this powerful design pattern.

Implicit Conversions: More Powerful than Dynamic Typing?

15
Sep
2008

One of the most surprising things I’ve ever read about Scala came in the form of a (mostly positive) review article.  This article went to some lengths comparing Scala to Java, JRuby and Groovy, discussing many of its advantages and disadvantages relative to those languages.  Everyone seems to be writing articles to this effect these days, so the comparison in and of itself was not surprising.  What was interesting was an off-hand comment discussing Scala’s “dynamic typing” and how it aids in the development of domain specific languages.

Now this article had just finished a long-winded presentation of type inference and compilation steps, so I’m quite certain that the author was aware of Scala’s type system.  The more likely target of the “dynamic typing” remark would be Scala’s implicit conversions mechanism.  I have heard this language feature described many times as being a way of “dynamically” adding members to an existing class.  While it would be incorrect to say that this feature constitutes a dynamic type system, it is true that it may be used to satisfy many of the same design patterns.  Consider the facetious example of a string “reduction” method, one which produces an acronym based on the upper-case characters within the string:

val acronym = "Microsoft Certified Systems Engineer".reduce
println(acronym)            // MCSE

The immediate problem with this snippet is the fact that string literals are of type java.lang.String, a class which comes pre-defined by the language.  The only way to ensure that the above syntax works properly is to “add” the reduce method to the String class separate from its definition.  In a language such as Ruby or Groovy which have dynamic type systems, we could simply open the class definition and add a new method at runtime.  However, in Scala we have to be a bit more tricky.  We can’t actually add methods to an existing class, but we can define a new class which contains the desired method.  Once we have that, we can define an implicit conversion from our target class to our new class.  The Scala compiler sees this and performs the appropriate magic behind the scenes.  In code, it looks like this:

class MyRichString(str: String) {
  def reduce = str.toCharArray.foldLeft("") { (t, c) =>
    t + (if (c.isUpperCase) c.toString else "")
  }
}
 
implicit def str2MyRichString(str: String) = new MyRichString(str)

This contrasts quite dramatically with the Ruby implementation of the same concept via open classes (somewhat less-graciously known as “Monkey Patching”):

class String
  def reduce
    arr = unpack('c*').select { |c| (65..90).include? c }
    arr.pack 'c*'
  end
end
 
puts 'HyperText Transfer Protocol'.reduce       # HTTP

No visible type conversion is taking place here, all we did is add a method to an existing class and trust that the runtime can figure out the rest.  Indeed, for this application, we don’t really need anything else.  However, as anyone with experience implementing internal domain-specific languages will tell you, seldom is life as simple as adding a few methods to an existing class.  Consider a more complicated scenario where we need to overload the < operator on integers to operate on String values, returning true if the length of the string is less than the integer value, otherwise false.  In Scala, we would once again make use of the implicit conversion mechanism, this time with an even more concise syntax:

implicit def lessThanOverload(i: Int) = new {
  def <(str: String) = str.length < i
}

In fact, we don't even need to go this far.  It is possible to create an implicit conversion from String to Int defined on the length of the String.  This would allow existing method implementations within the Int class to operate upon String values:

implicit def str2Int(str: String) = str.length
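A quick demonstration of how far that one-line conversion reaches (a sketch): it applies both when a String appears where an Int argument is expected, and when an Int method is invoked on a String receiver.

```scala
import scala.language.implicitConversions

object StrLenDemo {
  implicit def str2Int(str: String): Int = str.length

  // argument position: "Scala" is converted to 5, so 3 < 5
  val a = 3 < "Scala"           // true
  // receiver position: "concurrency" is converted to 11, so 11 < 9
  val b = "concurrency" < 9     // false
}
```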

As a matter of interest, this particular situation can be managed by one of the most convoluted and verbose languages on the market, C++:

bool operator<(const int &i, const std::string &str)
{
    return str.length() < i;
}

Despite the seemingly-dynamic nature of the problem, the statically typed language camp seems well represented in terms of solutions.  Ironically, this sort of problem is one which will be exceedingly difficult to solve in a language like Ruby.  This is primarily because method overloading is an innately static device.  That's not to say that overloading is impossible in a dynamically typed language (Groovy), but it's not easy.  To see why, let's consider the most natural implementation of our operator problem in Ruby:

class Fixnum
  def <(str)
    str.size < self
  end
end

Intuitively, this may seem like the right way to approach the problem, but the results of such an implementation would be disastrous.  At the very least, the first time anyone attempted to perform a < comparison targeting an integer, the interpreter will overflow the call stack.  In fact, any time any code uses the less-than operator on an instance of Fixnum, the interpreter will crash.  The reason for this is the invocation of < upon str.size within our "overloaded" definition.  This call creates a very tight recursive loop which will very quickly eat through all available stack frames.  We can avoid this problem by reversing the comparison like so:

class Fixnum
  def <(str)
    self > str.size
  end
end

Now we don't have to worry about stack overflow, but in the process we have accidentally redefined integer-to-integer comparison in a very strange way:

irb(main):006:0> 123 < 'test'
=> true
irb(main):007:0> 123 < 123
=> true

Clearly, more effort is going to be required if we are to put to rest our little dilemma.  As it turns out, the final solution is surprisingly ugly and verbose:

class Fixnum
  alias_method :__old_less_than__, '<'.to_sym
  def <(target)
    if target.kind_of? String
      __old_less_than__ target.size
    else
      __old_less_than__ target
    end
  end
end

Whatever happened to Ruby as a "more elegant" language?  The unfortunate truth is that in order to emulate method overloading based on input type, we must hold onto the old method implementation while we implement a type-sensitive facade in its place.  The alias_method invocation literally copies the old less-than operator implementation and provides us with a way of referencing it within our later redefinition.  And what happens if someone else happens to monkey patch Fixnum and (for whatever reason) uses the identifier "__old_less_than__"?  Well, then we have problems.  It's like the old days of Lisp macros and endless identifier collisions.

It is true that this was an example specifically contrived to make Ruby look bad.  I could have implemented the overload using Groovy's meta-classes and been reasonably certain that everything would work out fine, but that's not the point.  The point is that there are a surprising number of situations where static typing serves not only to check for errors but also to allow extension patterns which would be otherwise impossible (or very, very difficult).  Dynamic typing isn't the panacea of extensibility that its proponents make it out to be, sometimes it isn't quite up to the task.

In fact (and this is where we come to my Digg-friendly point), I would submit that Scala (and, to a lesser extent, C++) has created a mechanism for controlled extensibility, implicit conversions, which is more powerful than Ruby's open classes design.  That's not to say that there aren't situations which are easily solved using open classes and entirely intractable using only implicit conversions, but in my experience these scenarios are very rare.  In fact, I believe that it is far more common to run up against a problem like my contrived overload which is greatly simplified through the use of static typing.

Ironically enough, some of Ruby's greatest pundits are starting to come around to the belief that a more controlled and well-defined model of class extension is required.  ParseTree is a Ruby framework which provides mechanisms for dynamically manipulating the AST of an expression prior to evaluation.  Conceptually, it is very similar to Lisp's macros and peripherally related to .NET's expression trees (used in LINQ).  ParseTree is used by a number of complex Ruby domain-specific languages, including Ambition, a fact which is extremely telling of how great the need is for just such a tool.  Having myself attempted a domain-specific language for constructing queries, I can state categorically that to do such a thing solely on the basis of open classes would be nearly impossible.  Even if successful, such a framework would be extremely volatile, sensitive to the slightest change in the Ruby core library, either caused by update or by other packages injecting their own meddlesome implementations into runtime classes.

Lex Spoon (co-author of Programming in Scala) once said that any language which seriously targeted domain-specific languages would have to create some sort of implicit conversion mechanism.  At the time, I was skeptical, convinced that Ruby (and similar) would always have the upper hand in the area of class extension due to their dynamic treatment of modules and classes.  However, after some serious dabbling in the field of internal domain-specific languages, I'm beginning to come 'round to his point of view.  Implicit conversions are far from a weak imitation of Scala's dynamically typed "betters"; they are a powerful and controlled way of extending types far beyond anything which can be easily accomplished through open classes.

More Persistent Vectors: Performance Analysis

1
Sep
2008

In my previous post, I introduced the concept of "persistent vectors" and walked through one implementation of the idea.  When I actually pushed the post out, I was pretty happy with my code, but it seems I still have much to learn.  :-)   A number of very smart people replied, suggesting ways that the implementation could be cleaned up and improved.  Among these intrepid commenters was David MacIver, who correctly pointed out the similarities between my persistent vector and his IntMap class (coming in Scala 2.7.2).  Needless to say, my interest was piqued, so over the course of last week, I spent a fair amount of time going over his implementation as well as the implementations proposed by researchers in the past.

I also took the time to create a proper performance test suite for my vector implementation, one which could compare with other conceptually-similar implementations in a controlled and repeatable environment.  The results were both interesting and instructive, so I figured I would take the time to share them here.

Essentially, the performance suite runs through six tests, each of which is designed to illuminate either a strength or a weakness in the underlying implementation.  These tests are run against four different classes:

  • Vector (my persistent implementation)
  • ArrayBuffer (from scala.collection.mutable)
  • IntMap (David MacIver)
  • Map

The addition of the last test target was more curiosity than anything else.  I wanted to see just how much of an improvement IntMap really was over Map for integer keys.  The results turned out to be rather surprising:

  Time

                   Vector     ArrayBuffer  IntMap     Map
  Fill Sequential  190.51ms   15.39ms      37.15ms    115.14ms
  Fill Random      392.98ms   2028.43ms    128.35ms   103.3ms
  Read Sequential  28.01ms    3.83ms       23.21ms    35.24ms
  Read Random      92.8ms     11.14ms      54.81ms    63.8ms
  Reverse          0.02ms     0.01ms       -          -
  Compute Length   0.01ms     0.01ms       5.11ms     0.3ms

  Memory

                   Vector     ArrayBuffer  IntMap     Map
  Fill Sequential  67.11 MB   3.93 MB      22.29 MB   12 MB
  Fill Random      64.97 MB   513.59 MB    39.89 MB   10.94 MB
  Read Sequential  6.67 MB    0.02 KB      0.02 KB    3.37 MB
  Read Random      8.01 MB    0.02 KB      0.02 KB    2.01 MB
  Reverse          0.09 KB    0.04 KB      -          -
  Compute Length   0.02 KB    0.02 KB      0.02 KB    2.23 MB

As you can see, IntMap triumphed over the other immutable data structures (including Vector) at just about every turn.  To understand why this is surprising, we need to spend a little time examining the theoretical properties of the two primary implementations: IntMap and Vector.

Patricia Tries

I already spent a fair bit of time explaining the concept of partitioned tries in the previous article, so I’m not going to reiterate all of that information here.  In a nutshell, the implementation of Vector is based upon the concept of a trie with an extremely high branching factor where the path to each node encodes its index.  Unlike List, the data structure is not fully persistent, meaning that some data copying must take place upon insert.  Specifically, a new array of branches must be allocated for the parent node of the inserted value, and so on recursively up to the root.  The advantage to this partially-persistent implementation is that we can take advantage of the constant access time afforded by the use of arrays under the surface.  The unfortunate truth is that fully persistent data structures do not offer constant access time (at least, none that I know of), and thus are generally unsuitable for implementing random-access vectors.

The idea for this implementation comes originally from Phil Bagwell (interestingly enough, a researcher at LAMP, Martin Odersky’s research department at EPFL) in a paper entitled "Ideal Hash Trees".  His original concept, though, was for a more efficient hash table data structure, not necessarily with immutability as a requirement.  This concept was then adapted by Rich Hickey for his language, Clojure.  Finally, I expanded upon Clojure’s persistent vectors somewhat by changing their trie distribution from little- to big-endian and wrote up the result in Scala.  There are some other minor differences between Hickey’s design and my own, but the data structures are essentially identical.

Like my Vector implementation, the idea for IntMap began its life as a research paper, this time by Chris Okasaki and Andrew Gill.  This paper is quite an interesting read if you’ve got a spare afternoon, although it does make use of SML rather than Scala as a base language.  In summary, the idea was to create an efficient, persistent map structure for integer keys.  Superficially, this sounds quite similar to Hickey’s modification of Bagwell’s concept, but there are many important distinctions below the surface.

At the lowest level, IntMap actually makes use of a structure known as a "Patricia trie" with a fixed branching factor of two.  Much like Vector, IntMap encodes the key (index) of the data node within its path.  This encoding is based on the individual bits of the index.  However, unlike Vector, the ordering is little-endian.  Also, to avoid needlessly growing trees to absurd depths, linear runs of common sub-keys are merged into a single prefix node.  This is what differentiates Patricia tries.  To illustrate, if our branching factor were 10 and we were to store at indexes 1234 and 2234, the shared low-order digits "234" (which, read little-endian, form the common path prefix 4, 3, 2) would be represented in a single node, rather than three separate nodes trivially linked to one another.
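The little-endian path order is easy to visualize with Ruby's Integer#digits (available since Ruby 2.4), which yields a number's digits least-significant-first.  This is only an illustration of the path encoding for the 1234/2234 example above, not IntMap's actual representation (IntMap works on bits, not decimal digits):

```ruby
# digits returns decimal digits least-significant-first, which is exactly
# the little-endian order in which a Patricia-style trie consumes a key
p 1234.digits   # => [4, 3, 2, 1]
p 2234.digits   # => [4, 3, 2, 2]

# both paths share the prefix [4, 3, 2], which a Patricia trie would
# collapse into a single prefix node
shared = 1234.digits.zip(2234.digits)
                    .take_while { |a, b| a == b }
                    .map(&:first)
p shared        # => [4, 3, 2]
```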

This use of a low branching factor in the trie is extremely useful when performing insertions.  Specifically, more of the trie structure is preserved untouched from one modification to the next.  Quite literally, IntMap is more persistent than Vector.  While this is great for writes, it does make read times a little less efficient.  Specifically, IntMap reads are worst-case O( log2(k) ), where k is the index in question.  For random data input, the average case is reduced somewhat by the prefix collapsing, but this should not be too significant.

By contrast, Vector uses an extremely high branching factor (32 by default) and so offers read efficiency which is O( logb(k) ), where k is the index and b is the branching factor.  Due to the practical limitations imposed on integer length, a 32-bit index traverses at most ceil(32 / 5) = 7 levels, so this translates into an upper-bound of O(7), which is (for all intents and purposes) constant.  Unfortunately, this analysis does not seem to be borne out by the performance data.

Possible Explanation

The only answer I can think of to explain the disparity between IntMap and Vector (both in time and in space utilization) is the use of a List[Int] in Vector to find the target node in the data structure.  This List is required primarily because I wanted the data distribution in the trie to be optimized for sequential access, therefore requiring the trie encoding to be big-endian on the index rather than little-endian.  The unfortunate truth is there’s no clean mathematical method (that I know of) which would allow a number to be deconstructed starting from its most significant digit in an arbitrary base.  In fact, the implementation of computePath (as suggested by Jules) actually cheats and makes use of the fact that persistent Lists are constructed from the tail end:

@inline
def computePath(total: Int, base: Int) = {
  if (total < 0) {
    throw new IndexOutOfBoundsException(total.toString)
  } else {
    var back: List[Int] = Nil
    var num = total
 
    do {
      back = (num % base) :: back
      num /= base
    } while (num > 0)
 
    back
  }
}

As efficient as this method is on most modern processors, it can never be as fast as a simple bit-masking operation.  Not only that, but it requires the creation of massive numbers of small, immutable objects (cons cells).  I believe that it is this instantiation overhead which is eating up the extra memory on reads and killing the overall performance.
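To make the contrast concrete, here is a quick sketch of the two extraction strategies (written in Ruby purely for brevity; the helper names branch_le and path_be are invented for this illustration).  A little-endian layout can compute any level's branch with a single shift and mask, while the big-endian layout has to materialize the whole path up front, which is exactly the allocation computePath pays for:

```ruby
BITS = 5                # a branching factor of 32 = 2**5
MASK = (1 << BITS) - 1  # 0b11111

# little-endian: the branch index at a given level falls straight out of
# a shift and a mask; no intermediate structure is required
def branch_le(index, level)
  (index >> (level * BITS)) & MASK
end

# big-endian: the path must be delivered most-significant-digit-first, so
# we build a list from the tail end, one cons cell per level
def path_be(index, base)
  path = []
  begin
    path.unshift(index % base)
    index /= base
  end while index > 0
  path
end

p branch_le(33, 0)  # => 1   (33 = 0b1_00001)
p branch_le(33, 1)  # => 1
p path_be(33, 32)   # => [1, 1]
```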

Unfortunately, I can’t seem to conceive of a better way of doing big-endian data distribution.  So are there any clever mathy people out there who have a brilliant way of deconstructing the index head-first rather than from the tail end?  If I could do that, then I could remove the List entirely from the implementation and rely instead on in-place calculations.  Maybe then I could catch up with David’s blisteringly fast IntMap.  :-)