Skip to content

Infinite Lists for the Finitely Patient

17
Nov
2008

Functional programming has a lot of weird and abstract concepts.  Monads of course are the poster child for all that is strange and confusing in functional languages, but there are other examples.  At first glance, it seems that the concept of an infinite list would be just as bizarre and incomprehensible as anything else in the paradigm, but really things aren’t as bad as all that.  In fact, even the die-hard imperative programmer can likely benefit from the patterns and techniques enabled by this obtuse construct.

For the first part of the article, I’m going to use a lot of examples involving numbers.  This is just because I’m lazy and only want to introduce one concept at a time.  Everyone understands the idea that there are an infinite amount of integers starting from 0 and counting upward.  Please bear with me all the way to the end, I promise I will give some more convincing motivation for infinite lists along the way.

If you’re already familiar with the semantics of infinite lists, feel free to skip to the section answering that all-important question: Why Do We Care?

Syntax

Before we dive into the topic of infinite lists and how they are practically applicable, it’s important to understand the underlying mechanics and (critically) how the API works in practice.  I’ll be using Scala to illustrate.  Any language which supports lambdas would actually suffice, but Scala’s unique blend of the functional and the object-oriented can be very useful in illustrating infinite lists in an imperative context.  Also, for the duration of the article, I will be making use of the following implicit conversion:

class RichStream[A](str: =>Stream[A]) {
  def ::(hd: A) = Stream.cons(hd, str)
}
 
implicit def streamToRichStream[A](str: =>Stream[A]) = new RichStream(str)

The only thing you need to know about the above is that it dramatically simplifies the syntax required when working with infinite lists in Scala.  Since the focus of this article is infinite lists and not implicit conversions, we will just take the above as magic incantations and leave it at that.

The primary medium for working with infinite lists in Scala is the scala.Stream class.  This class is not magic, you could write it yourself in plain-old-Scala without too much trouble.  Stream inherits from the all-encompassing scala.Iterable trait, meaning that familiar methods like map, foldLeft and foreach are all available for use with an infinite list as well as a finite one.

In terms of mechanics, infinite lists are very similar to finite ones: composed of a head—which is some data of type A—and a tail—which is another Stream[A].  The critical distinction which allows infinite lists to be infinite is the fact that this tail is lazily evaluated.  That means that the tail as a value is not available until you ask for it.  This is a profound idea with far reaching consequences, particularly in languages which take it to its greatest extreme (Haskell).

Just like with a normal, finite List, we build instances of Stream using a cons cell.  This is basically the in-code representation of head/tail concept.  Normally, this “cons for Streams” is handled by the singleton method Stream.cons (the equivalent of Clojure’s lazy-cons).  However, thanks to our magic implicit conversion noted above, we can make use of the same right-associative :: operator that we use with regular every-day finite Lists.  For example, we can easy create an infinite list of the natural numbers (from 0 to ∞):

def from(n: Int): Stream[Int] = n :: from(n + 1)
 
val nats = from(0)

In this example, from is what is known as a generator function.  That is to say, it takes an input and transforms it directly into some output structure.  Well, actually that’s only part of the definition, but the rest of it gets a little weird…

One very important property of from which should immediately jump out at you is the fact that it is infinitely recursive.  It takes a number, invokes this mysterious :: operator on that value and then recursively calls back to itself.  There is no conditional guard, no base case, just an endless series of calls.  Intuitively, this should lead to non-termination at runtime…except that it doesn’t.  Remember what I said about the lazily-evaluated tail?  This is where that idea really begins to take effect.  The from function is not infinitely recursive; at least, not right away.

Because the tail of a Stream is lazy, there is no need to actually invoke from(n + 1) until that particular element of the list (or one which comes after it) is required.  To prove this intuition, we can add a side-effect to the from function so that we can trace the execution:

def from(n: Int): Stream[Int] = {
  println("Requesting n = " + n)
  n :: from(n + 1)
}
 
val nats = from(0)

This code will print exactly (and only) the following:

  Requesting n = 0

The invocation from(1) is never processed because it does not need to be.  We can force further evaluation of the list by requesting a later element:

nats(4)      // get the fifth natural number (4)

This will print the following:

  Requesting n = 1
  Requesting n = 2
  Requesting n = 3
  Requesting n = 4

Notice that we don’t evaluate for n = 0 a second time.  Infinite lists are lazy, but not inefficient.  Each element is evaluated exactly once, and only as necessary.  You can exploit this property to do some really amazing things (such as the most concise Fibonacci sequence you will ever see).

As mentioned previously, all of the standard utility methods available on collections are also usable with infinite lists.  For example, it might be moderately interesting to get a list of all multiples of 5.  We could create a new generator function (e.g. fromBy5) which only returned multiples of five, but there is a much easier technique we could apply.  We already have a list of all natural numbers, why not make use of that?  If we multiplied every natural number by 5, the result would be identical to the infinite list of the multiples of 5.  If you haven’t taken calculus, this statement probably sounds a little weird.  After all, wouldn’t the list of all natural numbers multiplied by 5 be bigger than just the list of the multiples of five?

Actually, the answer to this question is “no”, and we could prove that mathematically if it weren’t entirely besides the point.  The really interesting thing to focus on is the technique suggested: take the infinite list of natural numbers (which we already have) and multiply each element by 5, resulting in another infinite list.  If you’re at all familiar with the higher-order processing of finite collections, you should be thinking map:

val fives = nats map { _ * 5 }

For those of you not “in the know”, the ever-cryptic { _ * 5 } syntax actually means the following: { x => x * 5 }.  The advantage of the former over the latter is merely brevity.

The definition of the map method goes something like this:

For every element within the target collection, apply the specified function value and store the result in the corresponding location of a new collection.

“For every element…”  That phrase should be raising huge red flags.  It implies that map is trying to iterate over the entire nats collection—a collection of infinite size.  Once again, we seem to have created a non-terminating expression.

Fortunately, the laziness of infinite lists comes to our rescue.  Just like :: doesn’t evaluate the tail until it is absolutely required to do so, map doesn’t actually invoke the function it is passed until an element is retrieved from the mapped list.  Once again making use of our println-enabled from method, we can test when the elements are being retrieved:

val nats = from(0)
val fives = nats map { _ * 5 }
 
println("Querying list...")
fives(4)                      // get the fifth multiple of five (20)

When this code executes, it will print the following:

  Requesting n = 0
  Querying list...
  Requesting n = 1
  Requesting n = 2
  Requesting n = 3
  Requesting n = 4

Surprised?  It helps to remember this simple fact: everything about an infinite list is lazy.  Nothing is evaluated until it absolutely must.

So, that means we can do something like the following, right?

for (n <- fives) {
  println(n)
}

If we were to execute this snippet, the result would be as follows:

  Requesting n = 0
  0
  Requesting n = 1
  5
  Requesting n = 2
  10
  Requesting n = 3
  15
  Requesting n = 4
  20
  Requesting n = 5
  25
  Requesting n = 6
  30
  .
  .
  .

In other words, this evaluation really is non-terminating.  This is because there is no way for the laziness of the list to help us.  We’re trying to iterate over every element in the list, retrieving that element for evaluation right away.  In this sort of situation, evaluation cannot be deferred, and thus the program will never end.

As it turns out, there are a number of operations on infinite lists which cannot be done lazily.  The foreach method is one of these, as are foldLeft, foldRight, reduce(Left and Right), length, etc.  As a general rule of thumb, virtually any operation on an infinite list which produces another infinite list can be performed lazily.  Otherwise, if a result is of an immediate type (such as Int or even Array), the evaluation absolutely must be performed at that exact point in the code.  More formally: any operation on an infinite list which is catamorphic (meaning that it takes the whole collection and reduces it to a single value) will force the evaluation of the collection in its entirety.

So if it’s impossible to iterate, fold or get the length of an infinite list, of what use could they possibly be?  It is true that infinite lists in and of themselves would be too “high-minded” for any practical application, but that’s what the take method is designed to fix.  The Stream#take(Int) method is a way of imposing a narrow window on an infinite list.  Rather than looking at every single natural number from 0 to infinity, we just look at the numbers from 0 to 9.  Put into code:

val nats = from(0)
val digits = nats.take(10)
 
for (d <- digits) {
  println(d)
}

This will print (exactly):

  0
  1
  2
  3
  4
  5
  6
  7
  8
  9

The beauty of this is that we have created an infinite list, taken exactly what we needed and ignored the rest.  Most of the time, this is how you will use the streams which you create.  Note that the take method still does not force evaluation of the list.  If we go back to our fancy printlnified version of from, we can see that evaluation is still being performed lazily, just limited to an upper bound of 9 (because we took the first 10 elements, starting from 0):

  Requesting n = 0
  0
  Requesting n = 1
  1
  Requesting n = 2
  2
  Requesting n = 3
  3
  Requesting n = 4
  4
  Requesting n = 5
  5
  Requesting n = 6
  6
  Requesting n = 7
  7
  Requesting n = 8
  8
  Requesting n = 9
  9

Why Do We Care?

Thus far, we have looked at a few examples of infinite lists, all of them having to do with numbers.  This list is surprisingly useful (as we will see in a moment), but it is far from the only infinite list we can create.  This section will explore some slightly more down-to-earth applications for infinite lists, including use-cases which everyone (even the dirty-scumbag imperative lovers) can relate to.  :-)

Conceptually, infinite lists are a way of flattening recursion and packaging it up in a form which can be manipulated in a less clumsy fashion.  Got that?  Good.  Now, read it again.

Let’s consider the simple (and extremely common) case in C/C++ where we must iterate over all of the elements of an array.  I’m using C++ here because Java has the enhanced for-loop which completely blows my example out of the proverbial water:

char *names[6] = { "Daniel", "Chris", "Joseph", "Renee", "Bethany", "Grace" };
int len = 6;
 
for (int i = 0; i < len; ++i)
{
    cout << names[i] << endl;
}

We’ve all written code like this.  In fact, most of us have probably written it so often that the prefix “for (int i = 0; ” has become part of our permanent muscle memory.  However, look at what we have done here.  Strictly speaking, we’re not looping over the the array of names at all.  We are actually looping over every successive integer between 0 and 6.

Replicating these semantics in a functional programming language is actually a bit tricky (assuming that we rule out higher-order solutions like foreach).  Ignoring for the moment the issues associated with side-effects and I/O, we have a somewhat more serious issue to worry about: functional languages don’t have any loops.  Of course, Scala does (while and do/while), but for the sake of argument we will ignore this fact.

The idiom for “looping” in functional languages is to use recursion.  Instead of maintaining a mutable counter (i) which changes value at each iteration, we will define a loop method which takes a parameter i, processes the ithe element of an array and then recursively calls itself, passing i + 1.  In code, this looks like the following:

val names = Array("Daniel", "Chris", "Joseph", "Renee", "Bethany", "Grace")
 
def loop(i: Int) {
  if (i < names.length) {
    println(names(i))
    loop(i + 1)
  }
}
loop(0)

Just in case you were confused as to why some people dislike functional languages, the above is “it”.  Scala will compile the above into a form which is actually just as efficient as the earlier C++ example, but that’s not really the point.  The problem is that this code is horribly ugly.  The underlying meaning (looping over the array) is completely obscured behind a fog of syntax and method dispatch.  You can easily imagine how a more complex looping example (particularly with nested loops) might get unmaintainably messy.

This is where infinite lists swoop in and save the day.  Remember when I said that infinite lists serve to bottle up recursion in a form that we can swallow?  (if you don’t remember, scroll back up and read it again)  We can make use of the fact that we have already defined an infinite list of all integers starting at 0.  All we have to do is just take exactly the number we need (names.length) and then use a for-comprehension to iterate over the result:

for (i <- nats.take(names.length)) {
  println(names(i))
}

If you actually unroll the evaluation semantics of the above, you will find that we’re still using recursion, it’s just being handled behind the scenes in the from method.  This is what I meant when I said that infinite lists “flatten recursion”.

Note that the above idiom is so common-place that Scala actually defines a specialization of infinite lists just to represent ranges of integers.  Oddly enough, this class is called “Range“.  We can create a range by using the until method on a starting integer, passing it an end-point:

for (i <- 0 until names.length) {
  println(names(i))
}

This way, we don’t have to define a new infinite list of nats every time we need to iterate over a range of numbers.

Of course, iteration over an integer range is a fairly trite example.  Everyone reading this article should know that there are better ways of accomplishing what we just did.  For example, the higher-order method foreach is capable of printing the names array in a far more concise fashion:

names foreach println

Let’s try something a little more complex.  Imagine that we have to loop through 10 random integers between -100 and 100.  Just to make this really fun, let’s say that we have to perform this as an inner and an outer loop, printing out all combinations of the same 10 random numbers.

If it weren’t for the restriction that the 10 random numbers need be the same between both loops, an imperative programmer might just do something like the following:

int a = 0;
for (int i1 = 0; i1 < 10; i1++) {
    a = // generate random number
 
    int b = 0;
    for (int i2 = 0; i2 < 10; i2++) {
        b = // generate random number
        System.out.println("(" + a + ", " + b + ")");
    }
}

This code is really nasty.  Not only that, but it doesn’t do what we want since the random numbers generated for the inner loop will be different from the ones generated for the outer.  We could technically fiddle with shared seeds and the like, but even that isn’t a really elegant solution.

Using infinite lists, the solution is surprisingly simple.  All we need to do is define an infinite list of random numbers between -100 and 100, then take the first 10 values from this list and perform standard nested iteration using a for-comprehension.  It’s much easier to write in Scala than it is to say:

def randomGen: Stream[Int] = (Math.random * 200).toInt - 100 :: randomGen
val random = randomGen
 
for (a <- random.take(10); b <- random.take(10)) {
  println("(" + a + ", " + b + ")")
}

When this executes, it will print the following sequence:

  (68, 68)
  (68, -83)
  (68, -79)
  (68, -78)
  (68, -18)
  (68, -80)
  (68, 29)
  (68, 10)
  (68, -63)
  (68, 3)
  (-83, 68)
  (-83, -83)
  (-83, -79)
  (-83, -78)
  .
  .
  .

Pretty nifty, eh?  You’ll notice that this time our generator function didn’t need to take a parameter.  This dramatically simplified the syntax involved in generating the infinite list.  However, despite the fact that we no longer require a parameter for our generator, we still need to assign the list to a value.  Consider these two (incorrect!) definitions for random:

def random: Stream[Int] = (Math.random * 200).toInt - 100 :: random
// or even...
val random: Stream[Int] = (Math.random * 200).toInt - 100 :: random

The first definition for random (as a function) will indeed work, producing an infinite list of random numbers.  However, that list will not be repeatable.  In other words, consecutive invocations of random (such as in our nested loop example) will produce different lists.  Swinging to the other end of the scale, the value definition for random will actually produce an infinite list of exactly the same number (e.g. 42, 42, 42, 42, 42, 42, ...).  The reason for this is left as an exercise for the reader.

Let’s consider one last example.  This time, we’ll look at something a little less “functional” in nature.  Imagine that you have defined a database schema which looks like the following:


people
id INTEGER
firstName VARCHAR
lastName VARCHAR
parentID INTEGER

This is a fairly simple parent/child schema with each row having a reference to its parent.  It’s not difficult to imagine a scenario where one might want to traverse the “ancestry” of a row in the table, perhaps looking for some specific criterion or just for enumerative purposes.  This could be done in Java (and other languages) by maintaining a mutable “person” variable containing the current row in the database.  This variable could be modified iteratively based on successive queries determining its corresponding parentID.  However, as you can imagine, the code would be messy and prone to subtle errors.

The better approach in this situation is actually to make use of an infinite list.  While it is true that the list could not possibly be infinite given a finite database, the concept of a lazily-constructed collection can still simplify the issue.  We would construct the list using the following generator method:

def ancestors(id: Int) = {
  val conn: Connection = ...
 
  try {
    val res = conn.executeQuery("SELECT parentID FROM people WHERE id = ?")
 
    if (res.next()) {
      val parentID = res.getInt(1)
      parentID :: ancestors(parentID)
    } else Stream.empty
  } finally {
    conn.close()
  }
}

One item of note regarding this function is the use of the Stream.empty constant.  This is precisely analogous to the Nil object used in constructing conventional lists: it simply marks the end of the linked list.  Because every person will have a finite ancestry, we aren’t actually exploiting the infinite potential (no pun intended) of streams.  In fact, we are only using the data structure as a tool to facilitate lazy evaluation.

The nice thing about representing the database traversal in the given fashion is the ability to treat the results as a fully-realized collection even before the queries have actually been run.  For example, we can generate a list of all people in the parentage of person 42, and then restrict this list to only people with an odd id.  I have no idea why this would be useful, but here it is:

val parents = ancestors(42)
val fewerParents = parents filter { _ % 2 == 1 }
 
for (id <- fewerParents) {
  println(nameForPerson(id))       // get a person's name and print
}

All of the logic associated with traversing the database tree is nicely tucked away within the ancestors function.  This separation of logic allows for much cleaner code at times, hence reducing the potential for silly mistakes.

Conclusion

Infinite lists can be a surprisingly powerful tool, even in the arsenal of someone who is still devoted to the Dark Side of imperative programming.  Our last example looked at a scenario where an infinite list could be useful, despite the obviously side-effecting nature of the problem.  Similarly, the example of a list of random numbers shows how infinite lists can even be used to solve problems which are extremely difficult to solve using classical, imperative techniques.

To summarize: infinite lists are not just a beautiful mathematical formalism stemming from the forbidding forests of Haskell, they can also be a useful device when approaching down-to-earth problems like complex iteration.  The next time you find yourself writing unpleasant code to loop over complex data sets, stop and consider whether or not a stream might be a better solution.

Software Transactional Memory in Scala

6
Oct
2008

The fact is that there are a lot of problems that are hard to solve in a purely-functional style.  That’s not to say that no solution exists, but certain problems are very difficult to model without shared state.  In such situations, a slightly different approach to concurrency must be considered.  Actors are inapplicable, seeing as they are the embodiment of “shared-nothing” continuation passing, and fork/join doesn’t really help us.  Usually, when faced with the need for shared mutable state, most developers will resort to the old-fashioned technique of locking everything and controlling access to that state one thread at a time.  The solution often goes something like this:

public class Container {
    private int value;
    private final ReadWriteLock lock = new ReentrantReadWriteLock();
 
    public int getValue() {
        lock.readLock().lock();
        try {
            return value;
        } finally {
            lock.readLock.unlock();
        }
    }
 
    public void setValue(int value) {
        lock.writeLock().lock();
        try {
            this.value = value;
        } finally {
            lock.writeLock.unlock();
        }
    }
}

Obviously, most scenarios which call for such locking are a bit more complex, but you get the picture.  There are two very serious problems with locking:

  • Implementation rules are ad hoc
  • Throughput is reduced

The first issue is obvious: there is nothing actually preventing us from accessing value without locking.  What’s worse, it would be just as easy to lock, access value and then forget to unlock when we were done.  There is no logical connection between value and its lock, the implementation rules are solely enforced by convention.  Of course, we can avoid the problem of forgetting to unlock by using a locking strategy which is lexical in nature, but in that case our second issue (throughput) becomes even more pronounced:

private int value;
private final Object lock = new Object();
 
public int getValue() {
    synchronized (lock) {
        return value;
    }
}
 
public void setValue(int value) {
    synchronized (lock) {
        this.value = value;
    }
}

The problem here is that everything blocks everything else.  If one thread wants to retrieve value, it literally stops the world while the other threads wait their turn.  Data integrity is guaranteed, but this was accomplished only by creating a serial bottleneck on read/write access to value.  On top of that, we still have no enforced correlation between the lock and its corresponding piece of data.

“Glass is Half Full” Concurrency

What we have been doing here is called “pessimistic locking”.  We have literally started with the assumption that we are going to run into data contention issues.  Our entire locking mechanism is designed with the worst case scenario in-mind: lots of threads trying to write to the same piece of data simultaneously.  However, not every concurrent system is going to have constant interference between threads.  As it turns out, in practice data contention is the exception rather than the rule.  Given this, it would be very nice if we could design a system which gave threads the “benefit of the doubt”, assuming that they will not conflict, while still somehow maintaining data integrity.

The answer to this is “optimistic locking”.  We turn the problem completely on its head.  Instead of assuming that there will be problems and so locking everything preemptively, we assume that everything will be fine and just allow everyone free, non-blocking access.  Of course, an “open-door policy” with regards to shared mutable state isn’t enough in and of itself, we have to have some rule for dealing with contention issues, and some way of detecting those conflicts when they happen.  Enter transactions…

The idea behind a purely-optimistic transactional memory model is that all write operations must occur within a transaction.  Data can be read any time, but the memory model will ensure that the data is never in an “intermediate state” - or between writes.  Let’s consider the more complicated scenario of the transfer of funds from one bank account to another.  The steps are as follows:

  1. Withdraw $500 from account A
  2. Deposit $500 into account B

Of course, this isn’t really an algorithm, it’s a high-level overview.  We’re really looking at something a bit more complicated than it would seem:

def transfer(amount: Int, a: Account, b: Account) {
  a.balance = a.balance - amount
  b.balance = b.balance + amount
}

I avoided using the += and -= operators so as to illustrate the problem more explicitly.  This operation has four separate operations on shared mutable state (the account balance).  It’s not too difficult to see how this could go terribly wrong in the case where two separate transfers are taking place simultaneously.  For example, we could transfer $500 from account A into account B, while at the same time we transfer $200 from account B into account C.  Remember, we’re not locking anything, so we run the danger that our concurrent execution order will be interleaved in the following fashion:

  1. [Thread1] Get balance of account A
  2. [Thread1] Set balance of account A to its former amount less 500
  3. [Thread1] Get balance of account B
  4. [Thread2] Get balance of account B
  5. [Thread1] Set balance of account B to its former amount plus 500
  6. [Thread2] Set balance of account B to its former amount less 200
  7. [Thread2] Get balance of account C
  8. [Thread2] Set balance of account C to its former amount plus 200

The operation in red is the one we need to be concerned about.  Thread2 retrieved the balance of account B just prior to when it was modified by Thread1.  This means that when Thread2 calculates the new balance (less 200), it will be basing its result on a now-obsolete balance.  When it sets the balance, the $500 that was transferred from account A will mysteriously vanish, the unfortunate victim of a common race condition.

In a transactional system, both transfers would be separately handled in their own transaction.  Neither of them would modify the other’s data until all operations are completed, at which point the transaction would commit and the data would become “live”.  Thus, when the Thread1 adds $500 to account B, the actual balance of account B will remain constant at its original value (prior to the transaction).  Once the transaction commits, both the balance of A and B will be updated essentially simultaneously.  There will never be a point where $500 is discovered missing “in transit” between two accounts.

This is only half the equation though.  The real genius of the transactional model is just prior to committing, a transaction validates itself, ensuring that all of the data it worked with is still in the same state it was at the beginning of the transaction.  If something has changed, then the transaction is invalidated and must be re-run automatically.  Our issue from above can never happen because the Thread2 transaction will attempt to validate itself, only to discover that the balance of account B has been changed in the meantime.  Rather than yield to the race condition, the Thread2 transaction will throw away its work and start from scratch.  Assuming that nothing else is running, the second validation will be successful (since nothing will have changed concurrently) and the transaction will commit.

It all sounds much more complicated than it actually is.  People familiar with modern databases like Oracle should already be comfortable working with optimistic transactional models.  The technique is a little less common in software, but it can still be applied.

In order to make this work in a conventional application setting, we need to introduce a few more abstractions.  There are several ways to go about this, but I have chosen to follow the model laid down by Rich Hickey in his implementation of Clojure’s STM.  In turn, Clojure seems to take a fair bit of inspiration from Haskell’s STM monad, although it does not port over concepts like transaction composition and what Simon Payton Jones calls “choice”.  Basically, the design can be distilled as follows:

  • Each item of shared state must be stored in a reference
  • References can be read at any point, but they can only be modified within a transaction
  • The data contained within a reference must itself be immutable, the reference simply allows you to switch the data it contains
  • Within a transaction, reading a reference does not return its current value, but rather the value it had precisely when the transaction began.  This allows data to change outside the transaction without disrupting its internal processing.
  • Changes made to a reference inside a transaction are not world-visible until the transaction completes and is committed.  Validation ensures that no data is lost during the commit.
  • Transactions must never contain side-effects as they may be executed multiple times

With all that in mind, let’s get to work on an implementation!

References

Since these are the primitive building blocks of our STM, it seems logical that we should start here.  At an extremely basic level, we will need a structure which looks something like the following:

class Ref[T](private var value: T) {
  def get = value
 
  def :=(value: T) {
    this.value = value
  }
}

This is the basic idea anyway.  We want to be able to retrieve a value from a reference using get, and we want to be able to store a value in the reference using the := operator (hat tip to all of you rabid Pascal fan-boys).  Unfortunately, we haven’t really accomplished anything here.  Yes, we have a reference to wrap around a mutable piece of data, but there are no associated concurrency semantics.  Remember the definition of a reference?  We must only be able to write to it within a transaction.  Furthermore, the value returned from a reference within a transaction is not necessarily its current, world-visible state, but rather the data it had the moment the transaction began.

In order to accommodate these requirements, we will introduce the concept of a context.  Both read and write access to a reference will require a context to be present.  We will have one context for the entirety of world-visible state.  Additionally, each transaction will have its own context.  In that way, we can ensure that transaction modifications are kept local to itself prior to commit while at the same time preventing changes from other transactions from becoming visible after the transaction has started (potentially leading to data integrity problems).  Our API has now evolved to something more like the following:

class Ref[T](private var value: T) {
  // ???
 
  def get(c: Context) = c.retrieve(this)
 
  def :=(value: T)(c: Context) {
    c.store(this)(value)
  }
}

I left the constructor undefined because of the fairly obvious problem in this implementation: how does the Context get the values in the first place?  For the moment, let’s put that problem aside and deal with a more interesting one: limiting write access to within transactions.  Recall that we’re going to have two different kinds of contexts: a live context which is global, as well as a context which is local to the transaction.  The requirement alone implies a way to statically restrict reference mutability to within transactions: require a type of context other than the live one.  To this end, we will derive the following closed hierarchy.  It is closed because there will be no other Context implementations, preventing over-zealous API extensions from fouling up our semantics.

image

We will define Context to be an abstract class and LiveContext to be a singleton object.  Each transaction will have its own Transaction context which it will use in both read and write operations.  LiveContext will only be used outside of a transaction, when there is no other context available.  To enforce this, we will restrict the type of the Context taken by the reference assignment operator to only accept Transaction:

def :=(value: T)(c: Transaction) {
  c.store(this)(value)
}

With this in mind, we can start to envision what the syntax for reference operations will look like:

def transfer(amount: Int, a: Ref[Int], b: Ref[Int])(c: Transaction) {
  a.:=(a.get(c) - amount)(c)
  b.:=(b.get(c) + amount)(c)
}
 
// ...
val accountA = new Ref[Int](1500)
val accountB = new Ref[Int](200)
 
// somehow call `transfer` as a transaction
println("Account A: " + accountA.get(LiveContext))
println("Account B: " + accountB.get(LiveContext))

And just when our design was looking so nice too.  We’ve succeeded in preventing at compile time the modification of references outside of transactions, but we did it at the cost of a tyrannical syntax.  Fortunately, Scala has a handy mechanism for cleaning up syntax such as this, one which should reduce the volume of annoying bulk by several orders of magnitude: implicits.

IScala makes it possible to mark parameters as accepting implicit values.  These parameters in turn become implicit values themselves.  When a method which accepts an implicit parameter of a specific type is called with an implicit value of the same type in scope, the parameter can be omitted entirely.  Thus, by marking LiveContext as an implicit object and appropriately annotating the last parameter of transfer as well as the Context parameters of Ref’s accessor and mutator, we can eliminate almost all of the annoying bulk in the above example:

implicit object LiveContext extends Context {
  ...
}
 
class Ref[T](value: T) {
  // ???
 
  def get(implicit c: Context) = c.retrieve(this)
 
  def :=(value: T)(implicit c: Transfer) {
    c.store(this)(value)
  }
}

With these extra modifiers in place, we can redo our transfer snippet to see how things look.  To cut down on line length, we will also assume that accountA and accountB are references in some sort of global scope:

val accountA = new Ref[Int](1500)
val accountB = new Ref[Int](200)
 
def transfer(amount: Int)(implicit t: Transaction) {
  accountA := accountA.get - amount
  accountB := accountB.get + amount
}
 
// somehow call `transfer` as a transaction
println("Account A: " + accountA.get)
println("Account B: " + accountB.get)

Pretty slick, and we aren’t even finished yet!  We can add an implicit conversion from Ref[T] to T, eliminating the need to call get anywhere in the code:

implicit def refToValue[T](ref: Ref[T])(implicit c: Context) = {
  ref.get(c)
}

…and the final example syntax:

val accountA = new Ref[Int](1500)
val accountB = new Ref[Int](200)
 
def transfer(amount: Int)(implicit t: Transaction) {
  accountA := accountA - amount
  accountB := accountB + amount
}
 
// somehow call `transfer` as a transaction
println("Account A: " + accountA)
println("Account B: " + accountB)

With the exception of the := syntax (which is unfortunately unavoidable), you would never be able to tell that references are being used rather than conventional vars.  Even better, we have managed to preserve our static assurance that no reference may be modified outside of a transaction.  If we were to attempt to call the := method without an instance of Transaction on-hand, the Scala type checker would complain and our code would not compile.  The we need to do to close the loop is to make sure that the only place a Transaction instance can be obtained is inside a transaction (seems logical).

Atomic

For the moment, let’s shelve all of the implementation requirements for the STM and instead focus on the API.  We already have an almost elegant syntax for references, but we’re still missing one final piece: initiating the transaction.  Being incredibly imaginative and innately creative, I decided the best way to devise an API for this would be to hit Wikipedia.  After all, why come up with something original when someone smarter has already solved the problem?

 // Insert a node into a doubly-linked list atomically
 atomic {
     newNode->prev = node;
     newNode->next = node->next;
     node->next->prev = newNode;
     node->next = newNode;
 }

 atomic (queueSize > 0) {
     remove item from queue and use it
 }

Credit: Software transactional memory # Proposed language support

It’s a fairly straightforward API, one which could be expressed in Scala as it stands if we really wanted to.  Obviously the body of the first transaction is a little closer to C than Scala, but we can just take that as example pseudo-code and move on.  The real core of the API is a single method, atomic, which takes a function value and executes that function as a transaction.  If a conditional is provided, it acts as a guard.  If the guard ever returns false, the transaction aborts and is not retried.

The only problem here is we haven’t accounted for our implicit Transaction parameter.  Intuitively, we could just add a parameter of type Transaction to the function value, but unfortunately Scala doesn’t allow anonymous functions with implicit parameters.  That leaves one of two options: either take the Transaction as a parameter to the anonymous function and then store it in an implicit value within the function; or pass an actual method which takes an implicit parameter.  In truth, either approach will work, but for the remainder of the article I will use the method-passing approach, rather than the separate assignment within the anonymous method.

Altogether, our API in action looks something like this:

val accountA = new Ref[Int](1500)
val accountB = new Ref[Int](200)
 
def transfer(amount: Int)(implicit t: Transaction) {
  accountA := accountA - amount
  accountB := accountB + amount
}
 
atomic(transfer(500)(_))    // run transaction
 
println("Account A: " + accountA)
println("Account B: " + accountB)

Notice the extra (_) syntax in the call to atomic.  (thanks, Jesper!)  This is required because transfer accepts an implicit parameter.  Without it, Scala doesn’t know whether we mean to call the method using some implicit value in scope or if we want the function value itself.

Remember this API.  We’ll come back to the implementation of this method after we have completed the rest of the framework.  For now, let’s move onto context…

Context

Returning to the dark and mysterious internals of the implementation, we now come to the deep morass of Context.  As it turns out, everything in the STM will revolve around this class and its two separate implementations.  Recall that it is responsible for retrieving reference values, controlling what data is visible from within a transaction and what data is live.  Generally speaking, we are going to have the following design with respect to where reference data is handled:

  • If in LiveContext, access the One True copy of the data.
  • If in Transaction, access the One True copy if and only if there is no transaction-local version.  Once the reference has been read, cache the data and return that value from all future reads within the transaction.  We’re cheating a bit here since we aren’t taking a snapshot of the world on transaction start, we’re waiting for the first reference read.  As long as we get our validation right, this should be ok.

Logically, this is a simple default operation clouded by a fairly substantive special case (within a transaction after the reference has been read or written once).  To handle the special case, it only makes sense that we use a map from reference to the data to which it corresponds in that transaction.  Remember that the data within a transaction may be a bit behind the live copy.

We could use another map for the general case, but we can make things even simpler than that.  Rather than having a global map from reference to value, we can just store the live values within the Ref objects themselves.  They will still need to delegate to their context to retrieve and store that value, but the actual live version can be kept locally.  This simplifies LiveContext tremendously:

class Ref[T](private[stm] var value: T) {
  def get(implicit c: Context) = c.retrieve(this)
 
  def :=(value: T)(implicit t: Transaction) {
    t.store(this)(value)
  }
}
 
// ...
sealed abstract class Context {
  def retrieve[T](ref: Ref[T]): T
}
 
implicit object LiveContext extends Context {
  def retrieve[T](ref: Ref[T]) = ref.value
}

In this iteration, we have simplified Context a little bit, seeing as it is never actually used for storage.  Notice that all we need to do in LiveContext is delegate right back to the reference.  This may seem like a bit of superfluous indirection but it becomes absolutely essential once we start considering Transaction:

import collection._
 
final class Transaction private[stm] () extends Context {
  private val world = mutable.Map[Ref[Any], Any]()
 
  def retrieve[T](ref: Ref[T]) = {
    val castRef = ref.asInstanceOf[Ref[Any]]
 
    if (!world.contains(castRef)) {
      world(castRef) = ref.value
    }
 
 
 
 
 
    world(castRef).asInstanceOf[T]
  }
 
  def store[T](ref: Ref[T])(value: T) {
    val castRef = ref.asInstanceOf[Ref[Any]]
    world(castRef) = value
  }
}

This is the reason we needed to redirect through retrieve rather than just grabbing the value directly in Ref.  Within a transaction, any dereferencing will polymorphically come to an instance of this class.  The mutable map, world, handles the transaction-local cache of all values once they have been accessed.  Thus, the reference can change after we have looked at it (when another transaction commits) and it doesn’t affect the values local to our transaction.  This technique is exceedingly powerful and in no small part responsible for the higher throughput made possible by the transactional model.

Incidentally, it is worth noting that within a transaction, references are not thread safe.  Thus, if you start a transaction and then start manipulating references concurrently within that same transaction, bad things will happen.  This isn’t really a problem though because transactions are always designed to be single-threaded from start to finish.  They are used in multi-threaded situations, they do not use multiple threads.

Commitment

Now that we have the basic nuts and bolts of our STM framework, we need to start considering how we are going to commit transactions.  The process is two fold: first, we validate all references either read or written to by the transaction, checking for anything which may have changed in the interim; and second, we copy all new data from our transaction-local cache into the live references.  For the sake of simplicity, we will only allow one transaction to commit at a time.  We could do a little better, but this should work just fine for the experimental stuff.

Validation is a toughy, but the second commit step is fairly easy to satisfy: just loop through a set of all writes and copy the changes into the corresponding Ref.  The necessary changes are as follows:

final class Transaction private[stm] () extends Context {
  private val world = mutable.Map[Ref[Any], Any]()
  private val writes = mutable.Set[Ref[Any]]()
 
  ...
 
  def store[T](ref: Ref[T])(v: T) {
    val castRef = ref.asInstanceOf[Ref[Any]]
 
    world(castRef) = v
    writes += castRef
  }
 
  def commit() = {
    CommitLock.synchronized {
      // TODO  validate
 
      for (ref <- writes) {
        ref.value = world(ref)
      }
    }
 
    true
  }
}
 
private[stm] object CommitLock

The commit method returns a Boolean value indicating whether or not the commit was successful.  If a conflict was detected the method returns false and (presumably) the transaction will be tried again.  Since we aren’t doing any validation just yet, we will always return true.  Note that we don’t really have to lock anything related to the Ref instances in order to write their data.  It is possible for another thread to be reading data from these same references at precisely the same time as we are writing to them.  However, primitive memory operations are atomic by definition, meaning that we don’t need to worry about data integrity on the level of a reference value.

Validation is actually the most important part of the transaction commit process and quite possibly the most important facet of the entire STM concept.  Without it, there is nothing to prevent data integrity from breaking down and causing problems.  (remember that $500 we lost?)  Unfortunately, our system doesn’t quite have the chops yet to support any sort of transaction validation.

In order to validate a Ref, we need to compare its state to the state it was in at the moment we read from it or wrote to it (meaning into the transaction-local cache).  We can’t just compare values, partially because equals isn’t fast enough, but also because it doesn’t provide a strong enough guarantee about whether or not data has changed.  What we need is some value which we control which indicates deterministically the current state of a Ref and which can be used later to determine if that state has changed.  In short, we need a revision number:

class Ref[T](value: T) {
  private[stm] var contents = (value, 0)    // init revision to 0
 
 
}

The revision numbers have to be controlled statically with thread-safety, so the best place for them should be in the Transaction singleton.  Transaction (the companion object for Transaction the Context) also contains our implicit conversion as well as that enigmatic atomic method that we still have yet to implement.

object Transaction {
  private var rev_ = 1
  private val revLock = new AnyRef
 
  private def rev = revLock.synchronized {
    val back = rev_
    rev_ += 1
    back
  }
 
  ...
}

Empowered by a revision increment system which is guaranteed to produce unique values for each invocation, we can expand upon our concept just a little bit.  Each Transaction will have a unique revision number associated with it (maintained within the rev field).  Assuming a transaction successfully commits, it will not only modify the value of the references in question but also their revision, which it will set to its own number.

This revision system can be used in validation of transaction commit.  Whenever we read or write to a Ref for the first time, we will store its current revision number within the transaction.  When it comes time to commit the transaction, we can loop over our revision map and compare with the actual revision of the reference in question.  If all of the expected revisions match up with reality, the transaction checks out and we can go ahead and commit.  Otherwise, we have to assume that a transaction operating concurrently modified a reference we used and committed after we started our own transaction.  Once this is known, we can’t simply commit over the other transaction’s changes, throwing away all of that money.  Our transaction must be retried from scratch.

Now that we know how to validate, we can finally look at a completed version of commit (and supporting cast):

final class Transaction private[stm] (val rev: Int) extends Context {
  private val world = mutable.Map[Ref[Any], Any]()
  private val writes = mutable.Set[Ref[Any]]()
  private val version = mutable.Map[Ref[Any], Int]()
 
 
  def retrieve[T](ref: Ref[T]) = {
    ...
 
    if (!world.contains(castRef)) {
      ...
 
      if (!version.contains(castRef)) {
        version(castRef) = castRef.contents._2
      }
    }
 
    ...
  }
 
  def store[T](ref: Ref[T])(v: T) {
    ...
 
    if (!version.contains(castRef)) {
      version(castRef) = ref.contents._2
    }
 
    ...
  }
 
  def commit() = {
    CommitLock.synchronized {
      val back = world.foldLeft(true) { (success, tuple) =>
        val (ref, _) = tuple
        success && ref.contents._2 == version(ref)
      }
 
      if (back) {
        for (ref <- writes) {
          ref.contents = (world(ref), rev)
        }
      }
 
      back
    }
  }
}

It’s a lot of code, but all fairly straightforward.  The validation simply bears out our intuition: checking the revisions initially retrieved from the references with their current values.  Here again we are making use of the fact that memory access is atomic.  We don’t need to worry about a revision changing out of sync with a value because both of them are encapsulated by a 2-tuple within the Ref itself.  Meanwhile, the validation can be trusted because of the CommitLock: we don’t need to worry about another transaction committing between our validation and when we actually get around to saving our values.

Atomic 2.0

I said we would come back to this, and here we are!  We never did implement the atomic method, which is a bit of a shame seeing as it is what is responsible for kicking off the entire transactional process.  Not only that, but it creates the Transaction instance, ensures that the transaction gets committed once it has finished and it retries if that commit fails.  Set in code, it looks something like this:

def atomic[A](f: (Transaction)=>A): A = atomic(true)(f)
 
def atomic[A](cond: =>Boolean)(f: (Transaction)=>A) = {
  def attemptTransact(): A = {
    if (cond) {
      val trans = new Transaction(rev)
 
      try {
        val result = f(trans)
 
        if (trans.commit()) result else attemptTransact()
      } catch {
        case _ => attemptTransact()    // if exception, assume conflict and retry
      }
    } else null.asInstanceOf[A]
  }
 
  attemptTransact()
}

The only weird thing here is the use of an internal function to control the transaction dispatch process.  This is necessary because we need to emulate the pattern of a do/while loop without losing the ability to capture a return value.  This is the one minor feature of the transaction API that we have designed which I haven’t already discussed: the ability to return a value from a transaction.  Practically speaking, this isn’t needed too often since the very purpose of a transaction is to modify references, but it is still a pattern worth keeping in hand.

You will notice that we have some generic catch-all exception handling going on here.  Whenever a transaction throws an exception, we assume that it has failed and we try again.  To be honest, I wrestled back and forth with this decision.  After all, if a transaction comes across a NullPointerException on its first try, it’s not likely to do any better the second time around, or the third, or the fourth, or the…  On the other hand, there is a remote but very real possibility that data can briefly get into an inconsistent state within a transaction.  To understand how, consider the following abstract scenario:

  1. Transaction A writes data to reference A and reference B
  2. Transaction B starts after transaction A but before it has committed
  3. Transaction B reads from reference A
  4. Transaction A commits
  5. Transaction B reads from reference B
  6. Transaction B does the funky chicken and tries to commit
  7. Transaction B fails validation and has to try again
  8. Everything works fine on the second time around

Everything gets straightened out in the end due to the validation, but there is still this hairy moment within transaction B that we have to worry about.  Within the transaction, we have some data from before transaction A committed, and some from afterward.  Validation is never going to let this slip by, but while the transaction is still executing we can run into some fairly serious issues.

Imagine for example that reference B is changed in such a way that it throws an exception in transaction B unless it is paired with just the right value of reference A.  Since we have an inconsistent pairing between references A and B within transaction B, we will get an exception caused directly by a breakdown in data integrity.  Because this is a data integrity issue, we want this exception to trigger a redo in the transaction.  However, we can’t really predict such exceptions, so there’s no way we can distinguish between a legitimate exception (which should be propagated outside the transaction) and a data integrity fault.  In short, we’re in trouble.

Given the way we designed the STM framework, I don’t really see a nice, efficient way to avoid this problem.  There are other approaches we could take which don’t have this issue, but that would either require an entirely different implementation or a much smarter developer taking the lead.  I would be interested to see how Clojure handles this case…

A Less-Trivial Example

Now that we have this full STM framework, it might be interesting to put it to work.  To that end, let’s create a simple market simulation with three businesses and a hundred potential customers, each with their own accounts and separate balances.  All of these entities will have their own thread operating on their behalf, making purchases, excepting refunds and keeping the Mafia off their back.  Bear in mind that this is a truly concurrent simulation with full thread semantics.  We’re not going to use actors or anything like that to reduce the number of threads involved, everything will operate in true parallel.

To make things a little more interesting, we will also associate a 7.5 % fee with each transfer, paid to a separate account.  Also, each transfer will be logged presumably for later review.  To make everything fun, we will have one final thread which monitors the entire market, summing up every account and checking the total.  The obvious concern is that some collision in access of the shared data will lead to the unexpected loss (or gain) of wealth.  So long as the total market value remains constant, we can assume that shared state is being handled appropriately.

Let’s start out by defining our accounts and basic structure.  Bear in mind that Account is a type alias (similar to a C-style typedef) for Ref[Long], it is not a separate class.

object BankFrenzy {
  import Transaction._
 
  type Account = Ref[Long]
 
  private val fees = new Account
  private val log: Ref[Vector[Transfer]] = new Ref(EmptyVector)
 
  def main(args: Array[String]) {
    val business1 = new Account(15000)
    val business2 = new Account(20000)
    val business3 = new Account(50000)
 
    val people = (0 until 100).foldLeft(Vector[Account]()) { (vec, i) =>
      vec + new Account(1000)
    }
  }
}

Remember that all data contained within a reference must be completely immutable, otherwise the STM framework cannot help you.  If you can just change the underlying data within a reference at will, then transaction semantics are useless (since revision tracking breaks down).  To that end, we will use my port of Clojure’s persistent vector to handle the transfer log; and just because it’s fun, we will also use it to manage personal accounts.

Moving on, we should probably define some functions to operate on these accounts.  Remember, they must take an implicit parameter of type Transaction, otherwise they will be prevented from modifying references.

def transfer(amount: Long, from: Account, to: Account)(implicit t: Transaction) {
  log := log + Transfer(amount, from, to)
 
  val less = Math.round(amount * 0.075)
 
  from := from - amount
  to := to + (amount - less)
  fees := fees + less
}
 
def sum(portfolio: Vector[Account])(implicit t: Transaction) = {
  portfolio.foldRight(0:Long) { _ + _ }
}

One thing worthy of attention here is the sum method: we aren’t actually modifying any references in this method, so why bother putting it within a transaction?  The answer is to enforce data integrity.  We want to make sure that we see a truly consistent picture of the entire market, and the only way to be absolutely sure of that is to use the conceptual “snapshot of the world” maintained by a transactional log.

Also notice that the sum method does not store its result within a reference, it actually returns a value.  This is one of the neat features of our STM implementation: it allows transactions to return values just like functions.  This dramatically reduces the boilerplate which would normally be required to get the result of a calculation from within a transaction.

With this infrastructure in place, we can go ahead and create all of the threads we’re going to need for the simulation:

val market = people + business1 + business2 + business3 + fees
var running = true
 
val secActor = thread {
  while (running) {
    val total = atomic(sum(market)(_))
    println("Market value: $" + total)
 
    sleep(10)
  }
}
 
val businessActor = thread {
  while (running) {
    atomic(transfer(250, business1, business2)(_))    // transfer rent
 
    sleep(200)
  }
}
 
val peopleActors = for {
  i <- 0 until people.length
  val p = people(i)
} yield thread {
  atomic(transfer(50, p, business3)(_))       // payoff the mob
  atomic(transfer(i * 10, p, business1)(_))   // purchase from business1
  atomic(transfer(i * 3, business2, p)(_))    // refund from business2
}

This is assuming that we have already defined a utility method, thread, in the following fashion:

def thread(f: =>Unit) = new Thread {
  override def run() {
    f
  }
}

Just one of the many little tricks made possible by the staggering power of Scala’s syntax.

There isn’t much worthy of attention within our simulation threads, it’s just a lot of concurrent operations running against some shared state.  If we wanted to really have some fun, we could add println status messages to each thread, allowing us to try the simulation multiple times and watch the thread interleaving change from run to run.  However, all we’re really interested in with this simulation is the assurance that data integrity is maintained at all times.  To see that, all we really need is to check the starting market value, the ending value and some period market auditing while the simulation is in progress:

println("Starting market value: $" + atomic(sum(market)(_)))
 
businessActor.start()
secActor.start()
 
for (pa <- peopleActors) pa.start()
for (pa <- peopleActors) pa.join()
running = false
 
businessActor.join()
secActor.join()
 
println("Total fees: $" + fees)
println("Final market value: $" + atomic(sum(market)(_)))

If we compile and run the simulation, the output will look something like this (on a dual-core, 2 Ghz processor):

  Starting market value: $185000
  Market value: $185000
  Market value: $185000
  Market value: $185000
  Total fees: $5258
  Final market value: $185000

We could try the simulation a hundred times with different CPU loads and even on separate machines, the results will always be the same.  While there may be a greater or lesser number of concurrent market audits during the simulation, the values retrieved each time will be the same.  From this we can conclude one important fact: we have succeeded in designing an STM framework which preserves data integrity.

In simulations like this one with a high degree of contested data, STM may actually be slower than a traditional, fine-grained locking strategy.  However, just think for a moment about trying to write this simulation using Java’s ReentrantReadWriteLock class.  It would be nearly impossible to design such a system, let alone maintain it.  There would always be the danger that we would accidentally get the locking in the wrong order, or forget to lock something before we access it.  In short, such an effort would be extremely hazard prone, and far more verbose.  Using our STM framework, the resulting code was clean and simple to understand.  It’s easy to see why techniques like this are really starting to catch on.

Conclusion

Hopefully this has been an informative and enjoyable foray into the world of transactional memory systems.  I will retroactively apologize for any areas where my facts are in err; I’m certainly quite new to all of these concepts.  As per usual, the library described in this article is available for download.  I was a little lax on the testing side of life, so you may want to vet the library a little bit before you trust it unsupervised in your data center.

There are a number of interesting aspects to STM that I didn’t cover in this article, such as retry and the related function, check (implemented in the framework).  Also left completely untouched is the monadic operation orElse which can be used to compose transactions in a “first try this, then try that on failure” sort of way.

STM is a very active research topic today with a lot of the brightest minds in the industry pondering ways to make it better.  While it certainly doesn’t solve all of the problems associated with concurrency, it does have the potential to simplify locking and produce better performance under some conditions.  Definately a technology to watch as it develops!

ActiveObjects 0.8 Released

22
Mar
2008

Happy Easter everyone, we’ve got a new release!  ActiveObjects 0.8 is simmering at low heat (and footprint) on the servers even as I type.  This is probably the most significant milestone we’ve released thus far in that the 1.0 stream is now basically feature-complete.  I won’t be adding any new features to this release, just polishing and testing the heck out of the old ones.  We’ve already got a suite of 91 tests and I’ve got lots of ideas for more.  Expect this framework to get very, very stable in the next few months.

Features

Despite an annoyingly cramped schedule in the last couple months, I have managed to implement a few long-requested features.  The big ticket item for this release is pluggable cache implementations.  For those of you not “in the know”, ActiveObjects has a multi-tiered cache system composed of several layers and delegates.  This isn’t entirely unusual for an ORM, which is why I hadn’t mentioned it before.  The three basic layers:

  • Entity Cache - A SoftReference map from id/type pairs to actual entity values.  This means that if you run a query which returns an entity for people {id=1} and then run a second query which returns an entity for the same row, the two instances will be identical.  This is what enables the additional cache layers to function properly (and it saves on memory).
  • Value Cache - This is where all the field values for a given row are stored.  This will store both field values from the database as well as entity values (corresponding to foreign key fields).  Each entity has a separate instance of this cache.  Most of the memory required by ActiveObjects is taken up here.  With a long-running application, quite a bit of the database can actually get paged into memory.  This is a really good thing, since it drastically reduces round-trips to the database.  Thanks to the SoftReference(s) in the entity cache, running out of memory due to stale cache entries isn’t a problem.
  • Relations Cache - This is probably the most complex of all of the cache layers.  This cache literally caches any one-to-one, one-to-many or many-to-many result sets and spits them out on demand.  This reduces round-trips still more to the point where certain applications can be running completely from memory, even querying on complex relations.  This in and of itself isn’t that complex; the hard part is knowing when to expire specific cache entries.  This cache layer is essentially three different indexes to the same data, allowing cache expiry for a number of different circumstances.  Naturally, these structures don’t represent well as a single map of key / value pairs.  (more on this in a bit)

As of the 0.8 release, the value cache and relations cache layers have been unified under a single controller.  They’re still separate caches, but this unification allows for something I’ve been wanting to implement since 0.2: Memcached support.

High-volume applications often run into the problem of limited memory across the various server nodes.  Solutions like Terracotta can help tremendously, but unfortunately this doesn’t solve everything.  One answer is to provide every node in the cluster with a shared, distributed, in-memory map of key / value pairs.  This allows the application to page data into cluster memory and retrieve it extremely quickly.  Memcached is effectively this.  Naturally, it would be very useful if an ORM could automatically make use of just such a distributed memory cache and thus share cached rows between nodes.  Hibernate has had this for a long time, and now ActiveObjects does as well.

By making use of the MemcachedCache class in the activeobjects-memcached module (separate from the main distribution), it is possible to point ActiveObjects at an existing Memcached cluster and cause it to store the value cache there, rather than in local memory.  Once the relevant call has been made to EntityManager (setting up the cache), all cached rows will be stored in Memcached and shared between every instance of ActiveObjects running in your cluster.

Caveats

Of course, nothing is perfect.  For the moment, the major issues is that the relations cache doesn’t work properly against Memcached.  I had originally planned to just release it running against the local RAM, but this causes issue with proper cache expiry.  As such, the relations cache is disabled when running ActiveObjects against Memcached.  This doesn’t really impair functionality other than forcing ActiveObjects to hit the database every time a relationship is queried.  Since this is what most ORMs do anyway, it’s not too horrible.

I plan to have this issue resolved in time for the 0.9 release.  The biggest problem is figuring out how to flatten the multi-index structure into a single map.  Once I can do that, rewriting things for Memcached should be a snap.  (btw, if anyone wants to help out with patches in this area, you’re more than welcome!)

Databases Galore

One of the major focuses of this release was testing on different platforms.  This process uncovered an embarrassing number of bugs and glitches in some of the supposedly “supported” databases.  With the exception of Oracle, every bug I’ve been able to identify has been fixed.  So if you’ve tried ActiveObjects in the past and run into problems on non-MySQL platforms, now would be the time to give it another shot!

Speaking of Oracle, we’ve made some tremendous progress toward full support of this ornery database.  To be fair, I’m not the one responsible.  One of our community members has been tirelessly submitting patches and running tests.  We didn’t have time to get things fully sorted for this release (so don’t try AO 0.8 with Oracle unless you’re willing to risk server meltdown), but you can expect dramatic improvements in 0.9.  Once these fixes are in place, we should be able to safely claim that we support most of the database market (in terms of product adoption).

Final Thoughts

This really is an amazing release, well worth the download.  Most of the core API has remained the same, so things should be basically plug-n-play for existing applications.  The user-facing API is now in freeze and (unless some serious bug arises) will experience no changes between now and 1.0.  If you’ve been considering trying ActiveObjects for your project, now would be an excellent time.  I can’t promise no bugs, but if you point out my idiocy, I’ll certainly work to fix it!

Oh, as an aside, ActiveObjects is now in a Maven2 repository.  (thanks to some POM reworking by Nathan Hamblen)  To make use of this somewhat dubious feature, add the java.net Maven2 repository to your POM and then insert the following dependency:

<dependency>
    <groupId>net.java.dev.activeobjects</groupId>
    <artifactId>activeobjects</artifactId>
    <version>0.8</version>
</dependency>

After that, the magic of Maven will take over.  Of course, you’ll need to add the dependency for the appropriate database driver and (optionally) connection pool.  Those dependencies are left as an exercise to the reader.  Note, activeobjects-memcached isn’t in Maven yet, but it will be for the 0.9 release (either that or integrated into the core, I haven’t decided yet).

I certainly hope you enjoy this latest release.  As always, feel free to drop by the users list if you have any questions or (more likely) uncover any problems.

Should ORMs Insulate Developers from SQL?

25
Feb
2008

This is a question which is fundamental to any ORM design.  And really from a philosophical standpoint, how should ORMs deal with SQL?  Isn’t the whole point of the ORM to sit between the developer and the database as an all-encompassing, object oriented layer?

A long time ago in an office far, far away, a very smart cookie named Gavin King got to work on what would become the seminal reference implementation for object relational mapping frameworks the world over (or so Java developers would like to think).  This project was to be bundled with JBoss, possibly the most popular enterprise application server, and would support dozens of databases out of the box.  It was to offer heady benefits such as totally object-oriented database access, transparent multi-tier caching and a flexible transaction model.  At its core though, Hibernate was design to resolve a single problem: application developers hate SQL.

No really, it’s true!  Bread-and-butter application developers really dislike accessing data with SQL.  This has led to endless conflict (and bad jokes) between application developers and database administrators.  Often times the developer team would write a set of boilerplate lines in Java and then copy/paste these arbitrarily throughout their code, swapping in the relevant query as supplied by the DBA.  For obvious reasons, this would become very hard to maintain and just intensified the bad blood between developer and database.

If you think about it though, it’s a bit odd that this intense dislike would mutate from just hating the insanity of JDBC to hating JDBC, SQL and RDBMS in general.  SQL is a very nice, almost mathematical language which allows phenomenally powerful queries to be expressed simply and elegantly.  It abstracts developers from the headache of database-specific hashing APIs and algorithms which are almost filesystems in complexity.  The language was designed to make it as easy as possible to get data out of a relational database.  The fact that this effort backfired so utterly is a source of endless confusion to me.

But irregardless, we were talking about ORMs.  When it was first introduced, Hibernate held out the promise that developers would never again have to wade knee deep through a sea of half-set SQL.  Instead, developers would pass around POJOs (Plain Old Java Object(s)), modifying their values like any other Java bean and then handing these objects off to the data mapper, which would handle the details of persistence.  Furthermore, Hibernate promised that developers would never again have to worry about which databases support which non-standard SQL extensions.  Since developers would never have to work with SQL, anything database-specific could be handled within the persistence manager deep in the bowels of Hibernate itself.

This all seems lovely and wonderful, but there’s a catch: it doesn’t work so well in practice.  Now before you stone me, I’m not talking about Hibernate specifically now, but ORMs in general.  It turns out to be completely impossible to interact with a relational database solely through an object-oriented filter.  This is easily seen with a simple example:

SELECT * FROM people WHERE age > 21 GROUP BY lastName

How in the world are you going to represent that in an object model?  Sure, maybe you can provide a little abstraction for the query details, but it starts to get complex if you try to handle things like grouping non-declaratively.  The developers working on Hibernate quickly realized this problem and came up with an innovative solution: write their own query language!  After all, SQL is too confusing, so why not invent an entirely new query language with the “feel” of SQL (to keep the DBAs happy) but without all of the database-specific wrinkles?

This query language is now called “HQL”, and as the name implies, it’s really SQL, but not quite.  Here’s how the aforementioned example would look in HQL (disclaimer: I’m not a Hibernate expert, so I may have gotten the syntax wrong):

FROM Person WHERE :age > 21 GROUP BY :lastName

Remarkably similar, that.  Executing this query in a Hibernate persistence manager yields an ordered list of Person entities pre-populated with data from the query.  It seems to make a lot of sense, but there are a number of problems with this approach.  First, it requires Hibernate to literally have its own compiler to translate HQL queries into database-specific SQL.  Second, it hasn’t really solved the core problem that many developers have with SQL: it’s a declarative query language.  As you can see, HQL is really just SQL in disguise, so it really doesn’t eliminate SQL from your database access, just dresses it in a funny hat.

Other ORMs have appeared over the years, taking alternative approaches to the problem of object-relational mapping, but none of them quite eliminating the query language.  Even DSL-based ORMs like ActiveRecord fail to remove SQL entirely:

class Person < AR::Base; end
 
Person.find(:all, :conditions => 'age > 21', :group => 'lastName')

It’s sort of SQL-free, but you can still see bits and pieces of a query language around the edges.  In fact, what ActiveRecord is actually doing here is building a proper SQL query around the SQL fragments which are passed as parameters.  It’s a system which is ripe for SQL injection, but surprisingly leads to very few problems in real-world applications.  This is the approach which is also taken by ActiveObjects for its database query API.

So ORMs in and of themselves seem to have failed to entirely eliminate SQL from the picture, but what about other frameworks?  There are a few quite recent efforts which seem to have nearly succeeded in eliminating the direct use of SQL completely from application code.  Ambition is perhaps the best (and most clever) example of this, though others like scala-rel are catching up fast.  Ambition is designed from the ground up to interact naturally with ActiveRecord, so the two combined perhaps represent the first “true” ORM: one which does not require the developer at any point to deal with any SQL whatsoever.

But was it really worthwhile?  As clever as things like Ambition are, is it really that much easier than just writing queries in SQL?  As Nathan Hamblen so eloquently said (when referring to a totally different topic):

…is the end of the ORM rainbow.  You get there, throw yourself a party and realize that important things are broken.

A quote taken out of context perhaps, but I think it applies to the “cult of SQL genocide” with as much validity.  In the end, by denying yourself access to the powerful and well-understood mechanism that is SQL, you’re just crippling your own application and forcing yourself to write more code instead of less.

So what’s the “right” approach?  Is there a happy medium between ActiveRecord+Ambition and full-blown SQL on Rails?  I think so, and that is the approach I have been trying to implement with ActiveObjects.  As I’m sure you know, ActiveObjects takes a lot of its inspiration from ActiveRecord, so the syntax for querying the database is very similar:

EntityManager em = ...
em.find(Person.class, Query.select().where("age > 21").group("lastName"));
 
// ...or
em.find(Person.class, "name > 21");   // no grouping

You still have the full power of SQL available to you.  You can still write complex, nested boolean conditionals and funky subqueries, but there’s no longer any need to be burdened with the whole of SQL’s verbosity.  As with vanilla ActiveRecord, this code intends to be a bit of a hand-holder, shielding innocent application developers from the fierce world of RDBMS.

Is this the right way to go?  I’m honestly not sure.  I’ve met a lot of developers that would give their left eye to never have to look at another SQL statement again (for developers already missing a right eye, this isn’t much of a stretch).  On the other hand, there are purists like myself who revel in the freedom afforded by a powerful, declarative language.  It’s hard to say which path is better, but at the end of the day, it’s really the question itself that matters.  Giving application developers the choice to select whichever approach they feel is most appropriate, that is the solution.

ActiveObjects 0.7 Released

22
Dec
2007

Just in time for the Christmas holidays, I give you ActiveObjects version 0.7!  This release sports a whole slew of minor performance tweaks and bugfixes, providing better reliability.  Also, we’ve greatly increased the number and scope of the unit tests running against the core.  This gives us more confidence that the code is indeed as stable as we hope.

Actually, the big ticket item for 0.7 is the introduction of polymorphic relations.  This feature is fully tested and stable, so I don’t anticipate it introducing any issues in the coming days as we narrow our focus to 1.0.  However, no feature is so well tested as to be certainly bug free under all circumstances.  I look forward to fixing any problems you may encounter with the library!

A few minor feature additions:

  • Support for persisting enum values
  • One-to-one relations
  • EntityManager#flush(…) now also flushes the relations cache

Looking toward the road ahead, things are really starting to settle down a bit.  The main feature I’m looking to implement for 0.8 is extensible value caching, which should allow you to determine how and where the cached field values are stored.  The main use-case behind this is support for memcached, which has been requested a few times and has actually been on the agenda since 0.3.  With memcached driving the value cache, ActiveObjects should be almost perfectly scalable.

To that end, it’s also worth mentioning that I did some profiling (using the NetBeans Profiler) using a Wicket-based web application I’m developing on the side.  I’m happy to report that the overhead imposed by ActiveObjects is extremely minimal.  In each test, the bottleneck was in the PreparedStatement#execute method (in JDBC).  Since ActiveObjects generates very streamlined and natural queries, it seems that this is really the minimal execution time possible for such database access.  Extrapolating further from this data, and given that the JDBC exec time was avg 40ms, while the execution time in the ActiveObjects API was on the order of 1ms-5ms, we can safely say that ActiveObjects is less than 10% slower than using JDBC directly (compare that to the 50% overhead claimed by ActiveRecord).  This “estimate” doesn’t even take into account the many performance advantages offered by ORMs over hand-coded database access (such as relations caching, optimal query construction, etc).

Anyway, enough shameless boasting.  As Hibernate has often pointed out, ORM benchmarks are usually worse than useless.  The only benchmark which really matters is how well the ORM performs in your application.  Try it out, see what you think.