
Improving the STM: Multi-Version Concurrency Control

10 Nov 2008

I wrote a post some time ago introducing a software transactional memory (STM) framework for Scala based on implicit conversions.  This framework had some nice features, like a very clean syntax coupled with compile-time verification of transactional semantics, but it really wasn’t ready for real-world use.  There was one very serious skeleton in the closet that I carefully avoided as I went through the presentation: non-deterministic behavior was still possible, even within a transaction.

The Problem

Allow me to illustrate.  Imagine two transactions, each executing concurrently in a world with two reference cells.  One transaction reads from both cells, subtracts one from the other and then uses the result as a divisor.  The other transaction increments both cells in lock-step with each other.  Put into code, this would look something like the following:

val a = new Ref(1)
val b = new Ref(0)
 
def compute(x: Int)(implicit t: Transaction) = x / (a - b)
 
def interfere(implicit t: Transaction) {
  a := a + 1
  b := b + 1
}
 
atomic(compute(3)(_))         // in thread-1
...
atomic(interfere(_))          // in thread-2

Incidentally, I know that the whole interfere(_) syntax is a bit ugly.  Voice your opinion on RFE #1492.

This series of computations, while useless, should work just fine under normal circumstances.  The STM is going to verify that neither transaction is stepping on the other with any lasting effect, but unfortunately there are things that can go wrong before we get to “lasting”.  It may not be obvious, but this code actually has a very real probability of throwing an ArithmeticException due to a division by zero.

This unfortunate circumstance arises when the “interfere” transaction commits after the “compute” transaction has read a value for a but before it reads a value for b.  When this occurs, compute will obtain a value for b which is precisely the same as the value it has already read from a.  Now, interfere has also changed the global state of a to preserve its invariant of leading b by 1, but there is no way for compute to know that: it has already extracted a value for a, and it isn’t going to go back and check that the value is still current.  What happens next is of course trivial: with a and b having the same value, the subtraction results in 0, causing the division to fail.
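To make the failure mode concrete, here is a deterministic replay of that interleaving using plain vars in place of Ref cells (a sketch for illustration only, not the STM’s actual API):

```scala
// Deterministic replay of the race: "interfere" commits between the
// two reads performed by "compute".  Plain vars stand in for Refs.
def replayReadSkew(): Int = {
  var a = 1
  var b = 0

  val readA = a   // "compute" reads a and sees 1

  a += 1          // "interfere" commits in between:
  b += 1          //   globally a = 2, b = 1 (still a == b + 1)

  val readB = b   // "compute" now reads b and sees 1 == readA

  readA - readB   // the divisor "compute" would go on to use: 0
}
```

With readA equal to readB, the subtraction in compute yields 0 and the division throws ArithmeticException.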

Note that this issue does not indicate a failure in the safeguards, just an incomplete specification for the STM.  If the reference fault had not resulted in an exception, the commit process for the “compute” transaction would have caught the problem and retried the transaction.  In other words, if the problem hadn’t caused a catastrophic failure within the transaction, the STM would have gracefully recovered and no one would have been any the wiser.

When I created the STM implementation, I recognized that this case can indeed arise.  To compensate, I created a dumb catch-all within the transaction dispatch process.  This code literally swallows all exceptions coming from within the transaction, treating them as signals to abort and retry.  This works around our exceptional-interference issue from above, but it does bring consequences of its own:

def badBoy(implicit t: Transaction) {
  throw new RuntimeException("Er...")
}

This transaction will always throw an exception, regardless of the state of the references or any other transactions.  With the originally envisioned design, this exception would have simply propagated out from where the transaction was kicked off (wherever the atomic method was called) and life would have been happy.  Unfortunately, with our catch-all in place (required to solve the earlier problem) this code will fall into an infinite loop.

The problem is that the catch-all will field the RuntimeException and assume that it’s a divine signal that trouble is on the way.  This assumption that data-integrity has been somehow compromised leads the STM to simply retry the transaction.  However, since the exception is thrown every time, the catch-all will just keep on swallowing the same exception, endlessly-retrying the same tired transaction.  Naturally, this is just a trivial example, but it’s not difficult to imagine a block of code which really does throw a legitimate exception, one which we would want to propagate back into user-land where we can effectively deal with the situation.  An infinite loop of failing transactions is not the right way to handle the problem.
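The dispatch loop I’m describing looks roughly like the following sketch (the retryLimit parameter is added here purely so the example terminates; the real catch-all had no such escape hatch):

```scala
// Sketch of the naive catch-all dispatcher: every exception from the
// transaction body is treated as "data integrity compromised, retry".
def naiveAtomic[A](body: () => A, retryLimit: Int): Option[A] = {
  var attempts = 0
  while (attempts < retryLimit) {
    try {
      return Some(body())     // commit/validation elided in this sketch
    } catch {
      case _: Exception =>    // a legitimate exception looks exactly
        attempts += 1         // like a reference fault, so: retry
    }
  }
  None                        // a badBoy-style body burns every retry
}                             // and lands here
```

A body that always throws, like badBoy, never distinguishes itself from a reference fault, which is why the unbounded version of this loop spins forever.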

Speaking of infinite loops, we still haven’t really solved our interference problem from above.  In our first snippet, we created a data contention situation which could non-deterministically cause an exception to be thrown.  In order to solve this problem, we introduced a catch-all to swallow the exception and just keep right on trucking.  This “solution” not only introduces some undesired behavior (swallowing legitimate exceptions), but it also fails to really fix the issue.  Consider the following slight variation on our previous theme:

val a = new Ref(1)
val b = new Ref(0)
 
def maybeLoop(implicit t: Transaction) {
  while (a <= b) {
    // should never happen
  }
}
 
def interfere(implicit t: Transaction) {
  a := a + 1
  b := b + 1
}

In case it isn’t obvious, the while-loop in maybeLoop should never actually execute.  We have defined references a and b to have values such that a is always strictly greater than b.  The only transaction which modifies these values preserves this property, and so by easy induction we can prove that the conditional “a <= b” is in fact a contradiction.

Unfortunately, our little hand-wavy proof doesn’t take non-determinism into account.  Remember from the first example where the “interfere” transaction committed between the reads for a and b?  Earlier, this just raised an ArithmeticException.  However, in this case, such a conflict would actually cause maybeLoop to become non-terminating!  Because of the way that this data contention falls out, we could end up with an infinite loop where we had expected no loop at all.

This is very, very bad for a number of reasons.  With our exception problem, we were just able to introduce a catch-all to get around the issue.  The same trick isn’t going to work with an infinite loop.  Control is never returned to the transaction controller, so there is no way to inject any logic in this case.  What’s (possibly) worse is that there is no way for us to even try to detect these sorts of situations.  This is actually a slight specialization of the halting problem.  Briefly: because any Turing-complete language must be capable of non-terminating execution, the introduction of non-deterministic evaluation into such a language can result in non-deterministic non-terminating execution.  Since it is impossible in general for one Turing machine to determine whether or not another will halt on a given input, it must follow that it is also impossible to determine whether or not a non-deterministic Turing machine will halt.  My logical form is atrocious, but I think you get the picture.

The Solution

The point is that we have a problem, and cop-out solutions like a catch-all exception handler aren’t going to solve it.  The only real answer¹ would be to somehow ensure that when “interfere” commits, it doesn’t actually change the values of a and b as visible to the other running transaction.  In other words, transactional commits should change the global state, but leave any transaction-local state untouched.  That way, in-progress transactions can work with the reference values which were in place at the start of the transaction, deterministically work their way down to the bottom and finally either commit or detect conflicts and retry.  With this solution, exceptions can be propagated out to user-land because every exception will be valid; none will ever be caused by a reference fault.

In a nutshell, this strategy is like giving each transaction its own “snapshot of the world”.  From the moment the transaction starts, it remains entirely oblivious to any changes going on in the world outside.  It’s as if each transaction got to stop the universe, do its work, commit and then allow things to pick up right where they left off.  However, instead of actually stopping the universe, each transaction will preserve all reference state from that exact moment and allow everyone else to keep right on rolling.  In technical terms, this is called MVCC: Multi-Version Concurrency Control.

If we apply this technique to our first dilemma involving compute and interfere, we will find that even when interfere commits between the reads of a and b, there really is no issue.  Rather than getting a new value of b which is inconsistent with its already-retrieved old value of a, the “compute” transaction will just see the old values for both a and b.  None of the changes to the reference world outside of compute are observed until after the transaction completes.  At that point, the STM will detect any reference faults (such as interfere causing interference) and if necessary, retry the transaction.
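A toy model of this commit-time preservation can make the mechanics clearer.  In the sketch below, ToyRef, ToyTxn and toyCommit are illustrative stand-ins rather than the STM’s actual classes: the committer stores a ref’s old contents into every in-progress transaction’s log before overwriting the global state.

```scala
import scala.collection.mutable

class ToyRef[A](var contents: A)

class ToyTxn {
  private val log = mutable.Map[ToyRef[_], Any]()

  // Called by a committer: remember the pre-commit value, once.
  def preserve[A](ref: ToyRef[A]): Unit =
    if (!log.contains(ref)) log(ref) = ref.contents

  // Reads prefer the preserved snapshot over the global state.
  def read[A](ref: ToyRef[A]): A =
    log.getOrElse(ref, ref.contents).asInstanceOf[A]
}

// Commit a write: preserve the old contents for every other live
// transaction, then update the global state.
def toyCommit[A](ref: ToyRef[A], value: A, active: Seq[ToyTxn]): Unit = {
  active.foreach(_.preserve(ref))
  ref.contents = value
}
```

Replaying the compute/interfere race against this model: compute reads a = 1, interfere then commits a = 2 and b = 1 (preserving the old values into compute’s log first), and compute’s subsequent read of b returns the preserved 0 rather than the freshly-committed 1, so the divisor stays non-zero.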

The Implementation

This all sounds quite well-and-good, but how would we go about creating such a magical fairy land in the sky?  The most intuitive approach would be to copy every live reference whenever we start a transaction.  This isn’t as bad as it sounds, since any data contained within a reference must be immutable, meaning copies are literally free (we just reuse the original data).  Unfortunately, that doesn’t stop the entire process from being heinously inefficient.  It’s easy to imagine a system with 100,000 references and a tiny little transaction which only uses 2 of them.  Do we really want to loop through 100,000 references just to preserve a pair of them?

Obviously, a different approach is needed.  That’s where the committal process comes into play.  With the old design, when a transaction commits, it would first check for any reference faults, then if everything was dandy it would write the necessary changes to the global state for each reference.  In code it looked like this:

def commit() = {
  if (world.size > 0) {
    CommitLock.synchronized {
      val back = world.foldLeft(true) { (success, tuple) =>
        val (ref, _) = tuple
        success && ref.rev == version(ref)
      }
 
      if (back) {
        for (ref <- writes) {
          ref.contents = (world(ref), rev)
        }
      }
 
      back
    }
  } else true
}

MVCC is going to require this commit method to do a little extra legwork.  Rather than just writing the changes into the global state, commit will need to loop through every active transaction, saving the old values of only the modified references into each of those transactions’ logs.  This dramatically reduces the amount of looping and saving which takes place without imposing too much extra overhead.  This change – combined with a separation of reads and writes within a transaction – actually looks like the following:

def preserve[T](ref: Ref[T]) {
  val castRef = ref.asInstanceOf[Ref[Any]]
 
  if (!world.contains(castRef)) {
    val (v, rev) = ref.contents
 
    world(castRef) = v
    version(castRef) = rev
  }
}
 
def commit() = {
  if (world.size > 0) {
    CommitLock.synchronized {
      val f = { ref: Ref[Any] => ref.rev == version(ref) }
      val back = reads.forall(f) && writes.forall(f)
 
      if (back) {
        for (ref <- writes) {
          for (t <- Transaction.active) {
          if (t != this) t.preserve(ref)       // preserve ref's old contents for t
          }
 
          ref.contents = (world(ref), rev)
        }
      }
 
      back
    }
 
  } else true
}

Amazingly enough, this tiny little change is all that is required to implement MVCC within our STM.  Well, of course I’m skipping the implementation of Transaction.active as well as the nitty-gritty concurrency semantics required to make it all work, but you weren’t really interested in any of that, were you?
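For the curious, Transaction.active could plausibly be as simple as a synchronized registry that transactions join when they start and leave when they commit or abort.  This is a hypothetical sketch, not the actual field from the repository:

```scala
import scala.collection.mutable

// Hypothetical registry: transactions enroll on start, withdraw on
// commit or abort, and committers snapshot the current membership.
object ToyRegistry {
  private val running = mutable.Set[AnyRef]()

  def begin(t: AnyRef): Unit = synchronized { running += t }
  def end(t: AnyRef): Unit   = synchronized { running -= t }
  def active: List[AnyRef]   = synchronized { running.toList }
}
```

The synchronization here is the very bottleneck discussed in the comments below a global structure like this serializes part of every commit, which is one reason to prefer keeping history inside the refs themselves.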

For those who are keeping score, Clojure already has MVCC implemented within its hallmark STM.  As a point of interest, the implementation is subtly different from the one I have just outlined.  Instead of placing the old reference values within the local transaction logs, Clojure stores these contents as old versions of the TVar within the reference itself.  I’m not sure if I like this approach better or not, but it certainly seems to work well!  In either case, the semantics are essentially identical.
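To contrast the two designs, here is a sketch of the in-ref history idea as I understand it (my reading, not Clojure’s actual implementation): each ref keeps a list of (version, value) pairs, and a reader asks for the newest value at or before its snapshot version.

```scala
// Versioned ref: history is stored newest-first inside the ref itself,
// rather than in the logs of other transactions.
class VersionedRef[A](initial: A) {
  private var history: List[(Long, A)] = List((0L, initial))

  def write(value: A, version: Long): Unit =
    synchronized { history = (version, value) :: history }

  // Newest value whose version is <= the reader's snapshot version.
  def readAt(snapshot: Long): A =
    synchronized { history.find(_._1 <= snapshot).get._2 }
}
```

A transaction that began at version 0 keeps seeing the initial value no matter how many later writes pile up, which is the same visibility guarantee the transaction-log scheme provides.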

Final aside worthy of mention: you need not concern yourself that all of this extra copying is going to eat through heap space.  This is one of the huge benefits of immutable data: copies are free.  We aren’t really copying the data, just maintaining a reference to its previous version.  Once the transaction commits, its log will go out of scope and that old reference will be eligible for garbage collection.  All in all, the addition of MVCC to our STM doesn’t raise the overhead even the slightest bit.

Conclusion

Realizing that my blog was not the most convenient way to distribute project files for a semi-serious library like the Scala STM, I decided to put the project on GitHub.  For those who hate following links, you can check out, build and test (see below) the library using the following commands (assuming you have both Git and Buildr installed):

  git clone git://github.com/djspiewak/scala-stm.git
  cd scala-stm
  buildr

In a flurry of activity besides the addition of MVCC, I have also added some reasonably-comprehensive BDD specs to ensure that the correctness of the implementation isn’t just in my head.  Of course, all of the tests are probabilistic, but I think the library is finally at a point where it can be relied upon.  If you like, I’ll even distribute a proper pre-built JAR.

STM is hardly in its infancy, but it is just now starting to catch on as a viable alternative to the classic locking techniques favored by concurrent programs everywhere.  Improvements like MVCC promise even better reliability and throughput, further advancing the goal of deterministic asynchronous programs which are easy to develop and even easier to reason about.

¹ There actually is another solution involving conflict detection on the fly, but it’s a little inefficient and not really as subjectively elegant as MVCC.

Comments

  1. Thanks you really much for these two articles, which were just great !

    Well, now that I wrote it down, I see that this comment actually contains almost 0 informations, but it was a pleasure to read these articles :)

    Fanf Monday, November 10, 2008 at 2:15 am
  2. The reason Clojure’s STM stores history on the refs themselves is because it is designed for maximum transaction independence. Multiple transactions that share no refs can commit in parallel.

    Any kind of global commit lock or list of active transactions is going to limit your scalability by becoming a bottleneck for your entire STM. Clojure’s STM has neither of these.

    Rich Hickey Monday, November 10, 2008 at 6:26 am
  3. @Rich

    Actually, it’s even worse than I showed! CommitLock isn’t the only locking going on; I also lock `world` (the transaction log) due to the fact that commits can (and do) occur in a different thread from the one executing within the transaction. Storing the new versions within the reference cells would solve this problem.

    I attempted to implement an extremely granular locking strategy for references (avoiding CommitLock), but I ran into troubles with lock ordering and deadlock. It’s something I intend to revisit, but for now we’re stuck with BIG_SHARED_LOCK.

    As to the throughput, I’m not entirely convinced that STM is the best approach for improving concurrent performance. There’s quite a bit of research out there which suggests that the overhead imposed by the STM generally outweighs any superior performance it can bring. Furthermore, in systems with a high rate of data collision, STM will be *exponentially* slower due to the need for constant retry of transactions. Pragmatically, I just don’t see it as a better “solution” than granular locking.

    On the flip side of that, an STM is undeniably easier to use, which is part of why I think it’s an interesting technique. It is far more intuitive to structure your code into modular, transactional operations on reference cells than it would be to attempt to keep track of dozens of separate locks.

    Daniel Spiewak Monday, November 10, 2008 at 9:55 am
  4. Have you heard of JVMSTM (http://web.ist.utl.pt/~joao.cachopo/jvstm/ )? The Ph.D. thesis in which it’s described is very readable and from what I’ve read so far it seems like it should scale pretty well.

    Alex Saturday, November 22, 2008 at 4:55 pm
  5. @Alex

    Very interesting. The implementation of JVMSTM is remarkably close to the way that Clojure’s STM works, right down to the linked list for reference versioning. I wonder if this is where Rich got his primary inspiration for that aspect of the language.

    One of the things that the paper mentions is the fact that both read-only and write-only transactions may be committed without the overhead of validation. This is a feature I have added to the Scala STM in Git.

    Daniel Spiewak Saturday, November 22, 2008 at 5:20 pm
  6. An alternative solution to the problem (from Wikipedia)

    A commit-time scheme implemented by Dice, Shalev, and Shavit uses a global version clock. Every transaction starts by reading the current value of the clock and storing it as the read-version. Then, on every read or write, the version of the particular memory location is compared to the read-version; and, if it’s greater, the transaction is aborted. This guarantees that the code is executed on a consistent snapshot of memory. During commit, all write locations are locked, and version numbers of all read and write locations are re-checked. Finally, the global version clock is incremented, new write values from the log are written back to memory and stamped with the new clock version.

    Ben Harris Thursday, August 20, 2009 at 5:50 am
  7. @Rich

    If you have 1 global list of transactions that you only add to and never look at again, you will be able to use the garbage collector to magically remove old versions from your Ref’s history list. Thus you won’t have faults when you read the Ref and you won’t need to keep more history than absolutely necessary. Engineering trade-off I guess.

    Peter Foster Sunday, May 23, 2010 at 7:03 am
  8. Performance

    Here are some performance issues I’ve seen when playing with a Scala STM I wrote to investigate some STM ideas. This may be helpful for anyone playing with STMs.

    The biggest issues I found is the garbage collector, it can slow my STM from over 7M tx/sec to under 2M tx/sec. The STM uses MVCC with snapshot isolation but doesn’t allow (evil?) write skew; running on a 2.8GHz quad-core i7-860 running Windoze 7 with Java 6u20.

    I found that -XX:+AggressiveHeap and possibly -XX:-UseAdaptiveSizePolicy give good performance initially. It was after I tried this did I start to see good performance and work out that the GC was an issue. I don’t have a perfect GC solution yet, but I’ve been running at 7.5M tx/sec for over 20mins, which is all I need for the moment.

    I guarantee the snapshot will be available for all running transactions. There is no read-fault and consequently transactions that only read will never retry – so you can do println() etc.. inside the transaction. This is great, but it means that doing Thread.sleep(1000) inside a transaction will keep a large amount of history in memory which can’t be good for GC. It’s nice that when there’s no active transactions, there’s no history stored in memory. But the amount of memory being used can vary significantly depending on how long it takes the transaction to run; I suspect this affects some adaptive properties in the GC.

    I use one shared AtomicLong.incrementAndGet() to get the unique transaction number and I don’t need to lock, even for commit() and rollback() – which are done with AtomicReference.compareAndSet(). This seems to make it harder for GC to run effectively if all processors are used constantly running transactions. Leaving one “processor” free gives better overall performance and the GC seems much happier, even though the system is now running at 90% instead of 100% load.

    Also, since I can detect conflict between transactions early, I have a serialize-on-conflict mode, where one transaction performs a timed-wait on the conflicting transaction – allowing one to finish first before the other continues – hence serializing the transactions. This mode works better when you have many more threads than processors. When running like this, even at 100% load, the GC runs better. It appears that lots of task switches gives the GC more opportunity to be scheduled more often.

    Peter Foster Tuesday, June 8, 2010 at 4:52 am
