Skip to content

More Persistent Vectors: Performance Analysis

1
Sep
2008

In my previous post, I introduced the concept of "persistent vectors" and walked through one implementation of the idea.  When I actually pushed the post out, I was pretty happy with my code, but it seems I still have much to learn.  :-)  A number of very smart people replied, suggesting ways that the implementation could be cleaned up and improved.  Among these intrepid commenters was David MacIver, who correctly pointed out the similarities between my persistent vector and his IntMap class (coming in Scala 2.7.2).  Needless to say, my interest was piqued, so over the course of last week, I spent a fair amount of time going over his implementation as well as the implementations proposed by researchers in the past.

I also took the time to create a proper performance test suite for my vector implementation, one which could compare with other conceptually-similar implementations in a controlled and repeatable environment.  The results were both interesting and instructive, so I figured I would take the time to share them here.

Essentially, the performance suite runs through six tests, each of which designed to illuminate either a strength or a weakness in the underlying implementation.  These tests are run against four different classes:

  • Vector (my persistent implementation)
  • ArrayBuffer (from scala.collection.mutable)
  • IntMap (David MacIver)
  • Map

The addition of the last test target was more curiosity than anything else.  I wanted to see just how improved was IntMap over Map for integer keys.  The results turned out to be rather surprising:

  Time Memory
  Vector ArrayBuffer IntMap Map Vector ArrayBuffer IntMap Map
Fill Sequential 190.51ms 15.39ms 37.15ms 115.14ms 67.11 MB 3.93 MB 22.29 MB 12 MB
Fill Random 392.98ms 2028.43ms 128.35ms 103.3ms 64.97 MB 513.59 MB 39.89 MB 10.94 MB
Read Sequential 28.01ms 3.83ms 23.21ms 35.24ms 6.67 MB 0.02 KB 0.02 KB 3.37 MB
Read Random 92.8ms 11.14ms 54.81ms 63.8ms 8.01 MB 0.02 KB 0.02 KB 2.01 MB
Reverse 0.02ms 0.01ms - - 0.09 KB 0.04 KB - -
Compute Length 0.01ms 0.01ms 5.11ms 0.3ms 0.02 KB 0.02 KB 0.02 KB 2.23 MB

As you can see, IntMap triumphed over the other immutable data structures (including Vector) at just about every turn.  To understand why this is surprising, we need to spend a little time examining the theoretical properties of the two primary implementations: IntMap and Vector.

Patricia Tries

I already spent a fair bit of time explaining the concept of partitioned tries in the previous article, so I’m not going to reiterate all of that information here.  In a nutshell, the implementation of Vector is based upon the concept of a trie with an extremely high branching factor where the path to each node encodes its index.  Unlike List, the data structure is not fully persistent, meaning that some data copying must take place upon insert.  Specifically, a new array of branches must be allocated for the specific parent node of the inserted value and so on recursively on to the root.  The advantage to this partially-persistent implementation is that we can take advantage of the constant access time afforded by the use of arrays under the surface.  The unfortunate truth is that fully persistent data structures do not offer constant access time (at least, none that I know of), and thus are generally unsuitable for implementing random-access vectors.

The idea for this implementation comes originally from Phil Bagwell (interestingly enough, a researcher at LAMP, Martin Odersky’s research department at EPFL) in a paper entitled "Ideal Hash Trees".  His original concept though was for a more efficient hash table data structure, not necessarily with immutability as a requirement.  This concept was then adapted by Rich Hickey for his language, Clojure.  Finally, I expanded upon Clojure’s persistent vectors somewhat by changing their trie distribution from little- to big-endian and wrote up the result in Scala.  There are some other minor differences between Hickey’s design and my own, but the data structures are essentially identical.

Like my Vector implementation, the idea for IntMap began its life as a research paper, this time by Chris Okaski and Andrew Gill.  This paper is quite an interesting read if you’ve got a spare afternoon, although it does make use of SML rather than Scala as a base language.  In summary, the idea was to create an efficient, persistent map structure for integer keys.  Superficially, this sounds quite similar to Hickey’s modification of Bagwell’s concept, but there are many important distinctions below the surface.

At the extremely lowest level, IntMap actually makes use of a structure known as a "Patricia trie" with a fixed branching factor of two.  Much like Vector, IntMap encodes the key (index) of the data node within its path.  This encoding is based on the individual bits of the index.  However, unlike Vector, the ordering is little-endian.  Also, to avoid needlessly growing trees to absurd depths, linear common sub-keys are merged into a single prefix node.  This is what differentiates Patricia tries.  To illustrate, if our branching factor were 10 and we were to store at indexes 1234 and 2234, the common prefix "234" would be represented in a single node, rather than three separate nodes trivially linked to one-another.

This use of a low branching factor in the trie is extremely useful when performing insertions.  Specifically, more of the trie structure is preserved untouched from one modification to another.  Literally, IntMap is more persistent than Vector.  While this is great for writes, it does make read times a little less efficient.  Specifically, IntMap reads are worst-case O( log2(k) ), where k is the index in question.  For random data input, the average case is reduced somewhat by the prefix collapsing, but this should not be too significant.

By contrast, Vector uses an extremely high branching factor (by default) and so offers read efficiency which is O( logb(k) ), where k is the index and b is the branching factor.  Due to the practical limitations imposed on integer length, this translates into an upper-bound of O(7), which is (for all intents and purposes) constant.  Unfortunately, this analysis does not seem to be born-out by the performance data.

Possible Explanation

The only answer I can think of to explain the disparity between IntMap and Vector (both in time and in space utilization) is the use of a List[Int] in Vector to find the target node in the data structure.  This List is required primarily because I wanted the data distribution in the trie to be optimized for sequential access, therefore requiring the trie encoding to be big-endian on the index rather than little-endian.  The unfortunate truth is there’s no clean mathematical method (that I know of) which would allow the deconstruction of a number based on its most significant value in an arbitrary base.  In fact, the implementation of computePath (as suggested by Jules) actually cheats and makes use of the fact that persistent List(s) are constructed from the tail-end:

@inline
def computePath(total: Int, base: Int) = {
  if (total < 0) {
    throw new IndexOutOfBoundsException(total.toString)
  } else {
    var back: List[Int] = Nil
    var num = total
 
    do {
      back = (num % base) :: back
      num /= base
    } while (num > 0)
 
    back
  }
}

As efficient as this method is on most modern processors, it can never be as fast as a simple bit-masking operation.  Not only that, but it requires the creation of massive numbers of small, immutable objects (cons cells).  I believe that it is this instantiation overhead which is eating up the extra memory on reads and killing the overall performance.

Unfortunately, I can’t seem to conceive a better way of doing big-endian data distribution.  So are there any clever mathy people out there who have a brilliant way of deconstructing the index head-first rather than from the tail end?  If I could do that, then I could remove the List entirely from the implementation and rely instead on in-place calculations.  Maybe then I could catch up with David’s blisteringly fast IntMap:-)

Implementing Persistent Vectors in Scala

26
Aug
2008

Oh yeah, we’re really getting into Digg-friendly titles now, aren’t we?

The topic of persistent vectors is one of those odd backwaters of functional programming that few dare to approach.  The basic idea behind it all is to try to devise an immutable data structure which has the performance characteristics of a mutable vector.  What this means for practical application is that you shouldn’t have to deal with O(n) efficiency on random access like you do with List(s).  Instead, accessing arbitrary indexes should be constant time (O(1)), as should computing its length.  Additionally, modifying an arbitrary index should be reasonably efficient - as close to constant-time as possible.

Of course, the word “modifying” is a trifle inaccurate as it implies a mutable data structure.  One of the primary qualifications for a purely functional vector is that it is completely immutable.  Any changes to the vector result in the creation of a new vector, rather than modifying the old.  Basically, it’s an immutable data structure just like any other, but one which retains the brutal efficiency of its mutable counterpart.

Unfortunately, this turns out to be a very tough nut to crack.  A number of researchers have attempted different strategies for solving the problem, none of which have been entirely successful.  Rich Hickey, the creator of Clojure, has a brilliant presentation that essentially describes the solution I have chosen.  For the impatient, the good stuff starts at about 36:00 and lasts for about ten minutes.  I’ll elaborate on the problems with functional vectors a bit more in a second, but first a bit of motivational propaganda…

Thou Shalt Not Mutate

There is a single principle which should be drilled into skulls of all programmers everywhere: mutable data structures are bad news.  Don’t get me wrong, I love ArrayList as much as the Java addict, but such structures cause serious problems, particularly where concurrency is concerned.  We can consider the trivial example where two threads are attempting to populate an array simultaneously:

private String[] names = new String[6];
private int index = 0;
 
public static void main(String[] args) {
    Thread thread1 = new Thread() {
        public void run() {
            names[index++] = "Daniel";
            names[index++] = "Chris";
            names[index++] = "Joseph";
        }
    };
 
    Thread thread2 = new Thread() {
        public void run() {
            names[index++] = "Renee";
            names[index++] = "Bethany";
            names[index++] = "Grace";
        }
    };
 
    thread1.start();
    thread2.start();
 
    thread1.join();
    thread2.join();
 
    for (String name : names) {
        System.out.println(name);
    }
}

What does this snippet print?  I don’t know.  It’s actually indeterminate.  Now we can guess that on most machines the result will be essentially interleaved between the two threads, but there is no way to guarantee this.  Part of the reason for this is the fact that arrays are mutable.  As such, they enable (and indeed, encourage) certain patterns which are highly destructive when employed asynchronously.

However, concurrency is not even the only motivation for immutable data structures.  Consider the following example:

List<String> names = ...
for (String name : names) {
    names.add(name.toUpperCase());
}

I’m sure all of us have done something like this, most likely by accident.  The result is (of course) a ConcurrentModificationException caused by the fact that we are attempting to add to a List while we are iterating over its contents.  I know that the first time I was faced with this error message I became extremely confused.  After all, no threads are being employed, so why is this a problem?

Iterators are extremely dependent on the internal state of their data structure.  Anyone who has ever implemented an iterator for a linked list or (even better) a tree will attest to this fact.  This means that generally speaking, there is no way for an iterator to guarantee correctness if that structure is changing out from underneath it (so to speak).  Things may be fine in a linear structure like a list, but as soon as you get into anything non-linear like a tree or a hash table it becomes difficult to even define what the “correct” behavior should be.  Think about it; should the iterator backtrack to hit the missing elements?  Should this backtracking include elements which have already been consumed?  What if the order changes dramatically and pre-consumed elements are now ahead of the current index?  There are a whole host of problems associated with iterating over mutable data structures, and so rather than vainly attempt to solve these issues in a sane and consistent manner, JDK collections simply throw an exception.

All of this becomes moot once you start using immutable data structures.  There is no way to modify a structure while iterating over it because there is no way to modify the structure at all!  Concurrency is not an issue because without any mutable state to require locking, every thread can operate simultaneously on the structure.  Not only is it thread safe, but it is unsynchronized and thread safe.  Immutable data structures retain all of the asynchronous throughput of non-locking implementations without any of the race conditions and non-determinacy which usually results.

A Brief Note on Idioms

At this point, the question you must be asking yourself is: “So if the data structure is immutable, what good is it?”  The answer is “for reading”.  Data structures spend most of their lives being read and traversed by other code.  Immutable data structures can be read in exactly the same fashion as mutable ones.  The trick of course is constructing that data structure in the first place.  After all, if the data structure is completely immutable, where does it come from?  A simple example from a prior article is sufficient to demonstrate both aspects:

def toSet[T](list: List[T]): Set[T] = list match {
  case hd :: tail => hd + toSet(tail)
  case Nil => Set[T]()
}

This is neither the most concise nor the most efficient way to accomplish this task.  The only purpose served by this example is to illustrate that it is very possible to build up immutable data structures without undue syntactic overhead.  You’ll notice that every time we want to “change” a data structure - either removing from the list or adding to the set - we use a function call and either pass or return the modified structure.  In essence, the state is kept entirely on the stack, with each new version of the data structure in question becoming the “changed” version from the previous operation.

This idiom is actually quite powerful and can be applied to even more esoteric (and less uniformly iterative) operations.  As long as you are willing to let execution jump from one function to the next, it becomes extremely natural to deal with such immutability.  In fact, you start to think of immutable data structures as if they were in fact mutable, simply due to the fact that you are idiomatically “modifying” them at each step.  Note that this pattern of modifying data between functions is critical to actor-based programming and any seriously concurrent application design.

Problems Abound

For the sake of argument, let’s assume that my pitiful arguments have convinced you to lay aside your heathen mutating ways and follow the path of functional enlightenment.  Let’s also assume that you’re consumed with the desire to create an application which tracks the status of an arbitrary number of buttons.  These buttons may be pressed in any order regardless of what other buttons are already pressed.  Following the path of immutability and making use of the patron saint of persistent data structures (List), you might come up with the following solution:

class ButtonStrip private (buttons: List[Boolean]) {
  def this(num: Int) = {
    this((0 until num).foldLeft(List[Boolean]()) { (list, i) =>
      false :: list
    })
  }
 
  def status(index: Int) = buttons(index)
 
  def push(index: Int) = modify(index, true)
 
  def unpush(index: Int) = modify(index, false)
 
  /**
   * Modify buttons list and return a new ButtonStrip with the results.
   */
  private def modify(index: Int, status: Boolean) = {
    val (_, back) = (buttons :\ (buttons.length - 1, List[Boolean]())) { (tuple, button) =>
      val (i, total) = tuple
      (if (i == index) status else button) :: total
    }
    new ButtonStrip(back)
  }
}

This is a horrible mess of barely-idiomatic functional code.  It’s difficult to read and nearly impossible to maintain; but it’s purely immutable!  This is not how you want your code to look.  In fact, this is an excellent example of what David MacIver would call “bad functional code“.

Perhaps even worse than the readability (or lack thereof) is the inefficiency of this code.  It’s terribly slow for just about any sort of operation.  Granted, we can imagine this only being used with a list of buttons of limited length, but it’s the principle of the thing.  The fact is that we are relying on a number of operations which are extremely inefficient with lists, most prominently length and apply() (accessing an arbitrary index).  Not only that, but we’re recreating the entire list every time we change the status of a single button, something which is bad for any number of reasons.

What we really need here is a random-access structure, something which allows us to access and “change” any index with some degree of efficiency.  Likely the most intuitive thing to do here would be to just use a good ol’ array of Boolean(s) and make a new copy of this array any time we need to change something.  Unfortunately, this is almost as bad as our copying the list every time.  Normally, when you use an immutable data structure, modifications do not require copying large amounts of data.  Our toSet example from above uses almost zero data copying under the surface, due to the way that Set and List are implemented.

Specifically, Set and List are persistent data structures.  This doesn’t mean that they live on a filesystem.  Rather, the term “persistent” refers to the fact that each instance of the collection may share significant structure with another instance.  For example, prepending an element onto an immutable list yields a new list which consists of the new element and a tail which is precisely the original list.  Thus, each list contains its predecessor (if you will) within itself.  List is an example of a fully persistent data structure; not everything can be so efficient.  Set and Map for example are usually implemented as some sort of tree structure, and so insertions require some data copying (specifically, the parent nodes).  However, this copying is minimized by the nature of the structure.  This notion of persistence in the data structure works precisely because these structures are immutable.  If you could change an element in a persistent data structure it would likely result in the modification of that same element in totally disparate instances of that structure across the entire runtime (not a pleasant outcome).

So List is persistent, arrays are not.  Even if we treat arrays as being completely immutable, the overhead of copying a potentially huge array on each write operation is rather daunting.  What we need is some sort of data structure with the properties of an array (random access, arbitrary modification) with the persistent properties of a list.  As I said before, this turns out to be a very difficult problem.

Partitioned Tries

One solution to this problem which provides a compromise between these two worlds is that of a partitioned trie (pronounced “try” by all that is sane and holy).  In essence, a partitioned trie is a tree of vectors with a very high branching factor (the number of children per node).  Each vector is itself a tree of vectors and so on.  Note that these are not not like the binary search trees that every had to create back in college, partitioned tries can potentially have dozens of branches per node.  As it turns out, it is this unusual property which makes these structures so efficient.

To get a handle on this concept, let’s imagine that we have a trie with a branching factor of 3 (much smaller than it should be, but it’s easier to draw).  Into this vector we will insert the following data:

 Index 
Data
0
Daniel
7
Chris
2
Joseph
4
Renee
1
Bethany
3
Grace
9
Karen
13
Larry
5
Moya

After all the jumbling necessary to make this structure work, the result will look like the following:

image

It’s hard to see where the “trie” part of this comes into play, so bear with me.  The important thing to notice here is the access times for indexes 0-2: it’s O(1).  This is of course a tree, so not all nodes will be one step away from the root, but the point is that we have achieved constant-time access for at least some of the nodes.  Mathematically speaking, the worst-case access time for any index n is O( log3(n) ).  Not too horrible, but we can do better.

First though, we have to lay some groundwork.  I said that the structures we were working with are tries rather than normal trees.  A trie implies that the key for each element is encoded in the path from the root to that node.  So far, it just appears that we have built a fancy tree with numeric keys and a higher branching factor than “normal”.  This would be true if all we were given is the above diagram, but consider the slightly modified version below:

image

Structurally, nothing has changed, but most of the edges have been renumbered.  It is now a bit more apparent that each node has three branches numbered from 0 to 2.  Also, with a little more thought, we can put the rest of the pieces together.

Consider the “Moya” node.  In our input table, this bit of data has an index of 5.  To find its “index” in our renumbered trie, we follow the edges from the root down to the destination node, arriving at a value of 12.  However, remember that each node has only 3 branches.  Intuitively, we should be thinking about base-3 math somewhere about now.  And indeed, converting 12 into base-3 yields a final value of 5, indicating that the index of the node is indeed encoded in the path from the root based on column.  By the way, this works on any node (try it yourself).  The path to “Karen” is 100, which converted into base-3 becomes 9, the input index of the element.

This is all fine and dandy, but we haven’t really solved our original problem yet: how to achieve constant-time access to arbitrary indexes in a persistent structure.  To really approach a solution to our problem, we must increase the branching factor substantially.  Rather than working with a branching factor of 3 (and thus, O( log3(n) ) efficiency), let’s dial the branching factor up to 32 and see what happens.

The result is completely undiagramable; but it does actually provide constant time access for indexes between 0 and 31.  If we were to take our example data set and input it into our revised trie, the result would be a single layer of nodes, numbered at exactly their logical index value.  In the worst case, the efficiency of our more complicated trie is O( log32(n) ).  Generally speaking, we can infer (correctly) that for any branching factor b and any index n, the lookup efficiency will be precisely logb(n).  As we increase the branching factor, the read-efficiency of our trie increases exponentially.  To put a branching factor of 32 into perspective, this means that the algorithmic complexity of accessing index 1,000,000 would only be 3.986!  That’s incredibly small, especially given the sheer magnitude of the index in question.  It’s not technically constant time, but it’s so incredibly small for all conceivable indexes that we can just pretend that it is.  As Rich Hickey says:

…when it’s a small enough value, I really don’t care about it.

So that takes care of the reading end of life, but what about writing?  After all, if all we needed was constant time lookups, we could just use an array.  What we really need to take care to do is ensure that modifications are also as fast as we can make them, and that’s where the tricky part comes in.

We can think of an array as a partitioned trie with a branching factor of infinity.  When we modify an array immutably, we must copy every element from the original array into a new one which will contain the modification.  This contrasts with List - effectively, a partitioned trie with a branching factor of 1 - which in the best case (prepending) requires none of the elements to be copied.  Our 32-trie is obviously somewhere in between.  As I said previously, the partitioned trie doesn’t really solve the problem of copying, it just compromises on efficiency somewhat (the difference between 1 and 3.986).

The truth is that to modify a partitioned trie, every node in the target sub-trie must be copied into a new subtrie, which then forces the copying of its level and so-on recursively until the root is reached.  Note that the contents of the nodes are not being copied, just the nodes themselves (a shallow copy).  Thus, if we return to our example 3-trie from above and attempt to insert a value at index 12, we will have to copy the “Larry” node along with our new node to form the children of a copy of the “Renee” node.  Once this is done, the “Grace” and “Moya” nodes must also be copied along with the new “Renee” to form the children of a new “Bethany”.  Finally, the “Daniel” and “Joseph” nodes are copied along with the new “Bethany” to form the children of a new root, which is returned as the modified trie.  That sounds like a lot of copying, but consider how much went untouched.  We never copied the “Karen” or the “Chris” nodes, they just came over with their parent’s copies.  Instead of copying 100% of the nodes (as we would have had it been an array), we have only copied 80%.  Considering that this was an example contrived to force the maximum copying possible, that’s pretty good!

Actually, we can do even better than this by storing the children of each node within an array (we would have to do this anyway for constant-time access).  Thus, only the array and the modified nodes need be copied, the other nodes can remain untouched.  Using this strategy, we further reduce the copying from 80% to 30%.  Suddenly, the advantages of this approach are becoming apparent.

Now of course, the higher the branching factor, the larger the arrays in question and so the less efficient the inserts.  However, insertion is always going to be more efficient than straight-up arrays so long as the inserted index is greater than the branching factor.  Considering that most vectors have more than 32 elements, I think that’s a pretty safe bet.

Implementation

I bet you thought I was going to get to this section first.  Foolish reader…

Once we have all this theoretical ground-work, the implementation just falls into place.  We start out with a Vector class parameterized covariantly on its element type.  Covariant type parameterization just means that a vector with type Vector[B] is a subtype of Vector[A] whenever B is a subtype of AList works this way as well, as do most immutable collections, but as it turns out, this sort of parameterization is unsafe (meaning it could lead to a breakdown in the type system) when used with mutable collections.  This is part of why generics in Java are strictly invariant.

Coming back down to Earth (sort of), we consider for our design that the Vector class will represent the partitioned trie.  Since each child node in the trie is a trie unto itself, it is only logical to have each of the nodes also represented by Vector.  Thus, a Vector must have three things:

  • data
  • length
  • branches

Put into code, this looks like the following:

class Vector[+T] private (private val data: Option[T], 
      val length: Int, private val branches: Seq[Vector[T]]) extends RandomAccessSeq[T] {
  def this() = this(None, 0, new Array[Vector[T]](BRANCHING_FACTOR))
 
  // ...
}

RandomAccessSeq is a parent class in the Scala collections API which allows our vector to be treated just like any other collection in the library.  You’ll notice that we’re hiding the default constructor and providing a no-args public constructor which instantiates the default.  This only makes sense as all of those fields are implementation-specific and should not be exposed in the public API.  It’s also worth noting that the branches field is typed as Seq[Vector[T]] rather than Array[Vector[T]].  This is a bit of a type-system hack to get around the fact that Array is parameterized invariantly (as a mutable sequence) whereas Seq is not.

With this initial design decision, the rest of the implementation follows quite naturally.  The only trick is the ability to convert an index given in base-10 to the relevant base-32 (where 32 is the branching factor) values to be handled at each level.  After far more pen-and-paper experimentation than I would like to admit, I finally arrived at the following solution to this problem:

def computePath(total: Int, base: Int): List[Int] = {
  if (total < 0) {
    throw new IndexOutOfBoundsException(total.toString)
  } else if (total < base) List(total) else {
    var num = total
    var digits = 0
    while (num >= base) {
      num /= base
      digits += 1
    }
 
    val rem = total % (Math.pow(base, digits)).toInt
 
    val subPath = computePath(rem, base)
    num :: (0 until (digits - subPath.length)).foldRight(subPath) { (i, path) => 0 :: path }
  }
}

As a brief explanation, if our branching factor is 10 and the input index (total) is 20017, the result of this recursive function will be List(2, 0, 0, 1, 7).  The final step in the method (dealing with ranges and folding and such) is required to solve the problem of leading zeroes dropping off of subsequent path values and thus corrupting the final coordinates in the trie.

The final step in our implementation (assuming that we’ve got the rest correct) is to implement some of the utility methods common to all collections.  Just for demonstration, this is the implementation of the map function.  It also happens to be a nice, convenient example of good functional code.  :-)

override def map[A](f: (T)=>A): Vector[A] = {
  val newBranches = branches map { vec =>
    if (vec == null) null else vec map f
  }
 
  new Vector(data map f, length, newBranches)
}

Properties

Before moving on from this section, it’s worth noting that that our implementation of the vector concept has some rather bizarre properties not held by conventional, mutable vectors.  For one thing, it has a logically infinite size.  What I mean by this is it is possible to address any positive integral index within the vector without being concerned about resize penalties.  In fact, the only addresses which throw an IndexOutOfBoundsException are negative.  The length of the vector is defined to be the maximum index which contains a valid element.  This actually mirrors the semantics of Ruby’s Array class, which also allows placement at any positive index.  Interestingly enough, the efficiency of addressing arbitrary indexes is actually worst-case much better in the persistent trie than it is in a conventional amortized array-based vector.

Vectored Buttons

Since we now have an immutable data structure with efficient random-access, we may as well rewrite our previous example of the button strip using this structure.  Not only is the result far more efficient, but it is also intensely cleaner and easier to read:

class ButtonStrip private (buttons: Vector[Boolean]) {
  def this(num: Int) = this(new Vector[Boolean])     // no worries about length
 
  def status(index: Int) = buttons(index)
 
  def push(index: Int) = new ButtonStrip(buttons(index) = true)
 
  def unpush(index: Int) = new ButtonStrip(buttons(index) = false)
}

You’ll notice that the update method is in fact defined for Vector, but rather than returning Unit it returns the modified Vector.  Interestingly enough, we don’t need to worry about length allocation or anything bizarre like that due to the properties of the persistent vector (infinite length).  Just like arrays, a Vector is pre-populated with the default values for its type.  In the case of most types, this is null.  However, for Int, the default value is 0; for Boolean, the default is false.  We exploit this property when we simply return the value of dereferencing the vector based on any index.  Thus, our ButtonStrip class actually manages a strip of infinite length.

Conclusion

The truth is that we didn’t even go as far as we could have in terms of optimization.  Clojure has an implementation of a bit-partitioned hash trie which is basically the same thing (conceptually) as the persistent vector implemented in this article.  However, there are some important differences. 

Rather than jumping through strange mathematical hoops and hacky iteration to develop a “path” to the final node placement, Clojure’s partitioned trie uses bit masking to simply choose the trailing five bits of the index.  If this node is taken, then the index is right-shifted and the next five bits are masked off.  This is far more efficient in path calculation, but it has a number of interesting consequences on the final shape of the trie.  Firstly, the average depth of the tree for random input is less, usually by around 1 level (on average).  This means that the array copying on insert must occur less frequently, but must copy more references at each step.  Literally, the trie is more dense.  This probably leads to superior performance.  Unfortunately, it also requires that the index for each value be stored along with the node, requiring more memory.  Also, this sort of “bucket trie” (to coin a phrase) is a little less efficient in lookups in the average case.  Not significantly so, but the difference is there.  Finally, this masking technique requires that the branching factor be a multiple of 2.  This isn’t such a bad thing, but it does impose a minor restriction on the flexibility of the trie.

Most importantly though, Clojure’s trie uses two arrays to contain the children: one for the head and one for the tail.  This is a phenomenally clever optimization which reduces the amount of array copying by 50% across the board.  Rather than having a single array of length 32, it has two arrays of length 16 maintained logically “end to end”.  The correct array is chosen based on the index and recursively from there on down the line.  Naturally, this is substantially more efficient in terms of runtime and heap cost on write.

At the end of the day though, these differences are all fairly academic.  Just having an idiomatic partitioned trie for Scala is a fairly significant step toward functional programming relevance in the modern industry.  With this structure it is possible to still maintain lightning-fast lookup times and decent insertion times without sacrificing the critical benefits of immutable data structures.  Hopefully structures like this one (hopefully, even better implementations) will eventually find their way into the standard Scala library so that all may benefit from their stream-lined efficiency.

Update

The implementation of Vector which I present in this article is inherently much less efficient than the one Rich Hickey created for Clojure.  I finally broke down and created a line-for-line port from Clojure’s Java implementation of PersistentVector into a Scala class.  I strongly suggest that you use this (much faster) implementation, rather than my own flawed efforts.  :-)  You can download the improved Vector here: Vector.scala.

How Do You Apply Polyglotism?

18
Aug
2008

For the past two years or so, there has been an increasing meme across the developer blogosphere encouraging the application of the polyglot methodology.  For those of you who have been living under a rock, the idea behind polyglot programming is that each section of a given project should use whatever language happens to be most applicable to the problem in question.  This makes for a great topic for arm-chair bloggers, leading to endless pontification and flame-wars on forum after forum, but it seems to be a bit more difficult to apply in the real world.

The fact is that very few companies are open to the idea of diversity in language selection.  Just look at Google, one of the most open-minded and developer-friendly companies around.  They employ some of the smartest people I know, programmers who have actually invented languages with wide-scale adoption.  However, this same company mandates the use of a very small set of languages including Python, Java, C++ and JavaScript.  If a company like Google can’t even bring itself to dabble in language diversity, what hope do we have for the Apples of the world?

A few months ago, I received an internal email from the startup company where I work.  This email was putting forth a new policy which would restrict all future developments to one of two languages: PHP or Java.  In fact, this policy went on to push for the eventual rewrite of all legacy projects which had been written in other languages including Objective-C, Ruby, Python and a fair number of shell scripts.  I was utterly flabbergasted (to say the least).  A few swift emails later, we were able to come to a more moderate position, but the prevailing attitude remains extremely focused on minimizing the choice of languages.

To my knowledge, this sort of policy is fairly common in the industry.  Companies (particularly those employing consultants) seem to prefer to keep the technologies employed to a minimum, focusing on the least-common denominator so as to reduce the requirements for incoming developer skill sets.  This is rather distressing to me, because I get a great deal of pleasure out of solving problems differently using alternative languages.  For example, I would have loved to build the clustering system at my company using the highly-scalable actor model with Scala, but the idea was shot down right out of the gate because it involved a non-mainstream language.  To be fair to my colleagues, the overall design involved was given more serious consideration, but it was always within the confines of Java, rather than the original actor-driven concept.

There is actually another aspect to this question: assuming you are allowed to use a variety of languages to "get the job done", how do you apply them?  Ola Bini has talked about the various layers of a system, but this is harder to see in practice than it would seem.  How do you define where to "draw the line" between using Java and Scala, or even the more dramatic differences between Java and JRuby or Groovy?  Of course, we can base our decision strictly on lines of code, but in that case, Scala would trump Java every time.  For that matter, Ruby would probably beat out the two of them, and I’m certainly not writing my next large-scale enterprise app exclusively in a dynamic language.

I realize this is somewhat of a cop-out post, just asking a question and never arriving at a satisfactory conclusion, but I would really like to know how other developers approach this issue.  What criteria do you weigh in making the decision to go with a particular language?  What sorts of languages work well for which tasks?  And above all, how do you convince your boss that this is the right way to go?  The floor is open, please enlighten me!  :-)

Case Classes Are Cool

11
Aug
2008

Of all of Scala’s many features, this one has probably taken the most flack over the past year or so.  Not immutable data structures or even structural types, but rather a minor variation on a standard object-oriented construct.  This is more than a little surprising, especially considering how much work they can save when properly employed.

Quick Primer

Before we get into why they’re so nice, we should probably look at what they are and how to use them.  Syntactically, case classes are standard classes with a special modifier: case.  This modifier signals the compiler to assume certain things about the class and to define certain boiler-plate based on those assumptions.  Specifically:

  • Constructor parameters become public “fields” (Scala-style, which means that they really just have an associated accessor/mutator method pair)
  • Methods toString(), equals() and hashCode() are defined based on the constructor fields
  • A companion object containing:
    • An apply() constructor based on the class constructor
    • An extractor based on constructor fields

What this means is that we can write code like the following:

case class Person(firstName: String, lastName: String)
 
val me = Person("Daniel", "Spiewak")
val first = me.firstName
val last = me.lastName
 
if (me == Person(first, last)) {
  println("Found myself!")
  println(me)
}

The output of the above is as follows:

Found myself!
Person(Daniel,Spiewak)

Notice that we’re glossing over the issue of pattern matching and extractors for the moment.  To the regular-Joe object-oriented developer, the really interesting bits are the equals() method and the automatic conversion of the constructor parameters into fields.  Considering how many times I have built “Java Bean” classes solely for the purpose of wrapping data up in a nice neat package, it is easy to see where this sort of syntax sugar could be useful.

However, the above does deserve some qualification: the compiler hasn’t actually generated both the accessors and the mutators for the constructor fields, only the accessors.  This comes back to Scala’s convention of “immutability first”.  As we all know, Scala is more than capable of expressing standard imperative idioms with all of their mutable gore, but it tries to encourage the use of a more functional style.  In a sense, case classes are really more of a counterpart to type constructors in languages like ML or Haskell than they are to Java Beans.  Nevertheless, it is still possible to make use of the syntax sugar provided by case classes without giving up mutability:

case class Person(var firstName: String, var lastName: String)
 
val me = Person("Daniel", "Spiewak")
me.firstName = "Christopher"   // call to a mutator

By prefixing each constructor field with the var keyword, we are effectively instructing the compiler to generate a mutator as well as an accessor method.  It does require a bit more syntactic bulk than the immutable default, but it also provides more flexibility.  Note that we may also use this var-prefixed parameter syntax on standard classes to define constructor fields, but the compiler will only auto-generate an equals() (as well as hashCode() and toString()) method on a case class.

Why Are They Useful?

All of this sounds quite nice, so why are case classes so overly-maligned?  Cedric Beust, the creator of the TestNG framework, even went so far as to call case classes “…a failed experiment”.

From my understanding of Scala’s history, case classes were added in an attempt to support pattern matching, but after thinking about the consequences of the points I just gave, it’s hard for me to see case classes as anything but a failure. Not only do they fail to capture the powerful pattern matching mechanisms that Prolog and Haskell have made popular, but they are actually a step backward from an OO standpoint, something that I know Martin [Odersky] feels very strongly about and that is a full part of Scala’s mission statement.

Well, he’s right…at least as far as the pattern matching bit is involved.  Case classes are almost essential for useful pattern matching.  I say “almost” because it is possible to have pattern matching in Scala without ever using a single case class, thanks to the powerful extractors mechanism.  Case classes just provide some nice, auto-generated magic to speed things along, as well as allowing the compiler to do a bit more checking than would be otherwise possible.

The point that I think Cedric (and others) have missed entirely is that case classes are far more than just a means to get at pattern matching.  Even the most stringent object-oriented developer has to admit that a slick syntax for declaring a data container (like a bean) would be a nice thing to have.  What’s more, Scala’s automatic generation of a companion object for every case class lends itself very nicely to some convenient abstractions.  Consider a scenario I ran into a few months back:

class MainWindow(parent: Shell) extends Composite(parent, SWT.NONE) {
  private lazy val display = parent.getDisplay
 
  private val panels = Map("Foreground" -> ForegroundPanel, 
                           "Background" -> BackgroundPanel, 
                           "Font" -> FontPanel)
 
  setLayout(new FillLayout())
 
  val folder = new TabFolder(this, SWT.BORDER)
  for ((text, make) <- panels) {
    val item = new TabItem(folder, SWT.NONE)
    val panel = make(folder)
 
    item.setText(text)
    item.setControl(panel)
  }
 
  def this() = this(new Shell(new Display()))
 
  def open() {
    parent.open()
    layout()
 
    while (!parent.isDisposed) {
      if (!display.readAndDispatch()) {
        display.sleep()
      }
    }
  }
}
 
case class ForegroundPanel(parent: Composite) extends Composite(parent, SWT.NONE) {
  ...
}
 
case class BackgroundPanel(parent: Composite) extends Composite(parent, SWT.NONE) {
  ...
}
 
case class FontPanel(parent: Composite) extends Composite(parent, SWT.NONE) {
  ...
}

If you ignore the SWT boiler-plate, the really interesting bits here are the Map of panels and the initialization loop for the TabItem(s).  In essence, I am making use of a cute little trick with the companion objects of each of the panel case classes.  These objects are automatically generated by the compiler extending function type: (Composite)=>ForegroundPanel, where ForegroundPanel is replaced by the case class in question.  Because each of these classes extends Composite, the inferred type of panels will be: Map[String, (Composite)=>Composite](actually, I’m cheating a bit and not giving the precise inference, only its effective equivalent)

This definition allows the iteration over the elements of panels, generating a new instance by using the value element as a function taking a Composite and returning a new Composite instance: the desired child panel.  It’s all statically typed without giving up either the convenience of a natural configuration syntax (in the panels declaration) or the familiarity of a class definition for each panel.  This sort of thing would certainly be possible without case classes, but more work would be required on my part to properly declare each companion object by hand.

Conclusion

I think the reason that a lot of staid object-oriented developers tend to frown on case classes is their close connection to pattern matching, a more powerful relative of the much-despised switch/case mechanism.  What these developers fail to realize is that case classes are really much more than that, freeing us from the boiler-plate tyranny of endless getter/setter declarations and the manual labor of proper equals() and toString() methods.  Case classes are the object-oriented developer’s best friend, just no one seems to realize it yet.

Scala Collections for the Easily Bored Part 3: All at Once

4
Aug
2008

In the previous installment of this series, we looked at how Scala’s collections provide common mechanisms for iteration, as well as many higher-order operations in the same conceptual vein.  In this, the third and final part of the series, we will examine some mechanisms for conceptually operating on the collection as a whole.  Thus, rather than transforming individual collection elements, we will be looking at ways to inspect and modify the data structure itself.

Filtration

Stepping out of Scala for a moment, let’s consider the common design paradigm of the relational database.  Imagine that we have defined a table, people, which contains hundreds of thousands of records, assembled by some despotic government.  Now perhaps the secret service in this government wants to dispatch legal enforcers to the residences of all adults with the “wrong” political leaning.  The most natural way to accomplish this task would be to perform an SQL SELECT, filtering by age and politics:

SELECT * FROM people WHERE age > 18 AND affiliation = 'Green Party'

This query would of course return a data set which could be iterated over, performing the necessary operations (in this case, incarceration) for each record.  As it turns out, this sort of use-case is not confined solely to databases.  It is often necessary to restrict or filter a data structure based on certain criteria.  As a trivial example, imagine that we have been passed a List of Int(s) and we want to remove all odd numbers.  We could accomplish this by using the filter method and passing a function value to describe the criterion:

def onlyEven(nums: List[Int]) = {
  nums.filter((n) => n % 2 == 0)
}

If we call the onlyEven method, passing List(1, 2, 3, 4, 5), the result will be List(2, 4), since 2 and 4 are the only numbers into which 2 divides evenly.  As with many of the other common collection operations, Scala includes a syntax sugar for filter.  We can rewrite the previous example using for-comprehensions in the following way:

def onlyEven(nums: List[Int]) = {
  for {
    n <- nums
    if n % 2 == 0
  } yield n
}

This is a little different from the for-comprehensions we have seen already in that we have placed the yield statement outside the braces with the comprehension definition within.  Believe it or not, this syntax is perfectly valid, coming from Haskell’s do-notation.  This construct will be translated at compile time into a corresponding invocation of filter (and map, but that isn’t terribly relevant here) prior to type checking.  This sort of notation can be very convenient for many tasks, similar to LINQ in the .NET languages.

Partition

In the previous example, we looked at how to selectively remove elements from a collection based on a single criterion.  However, sometimes the requirement is not to remove elements, but rather to separate them into different collections.  For example, we might not want to simply throw away all odd elements in a list, it might be useful to keep both even and odd lists on hand for further operation.  This can be accomplished using the partition method:

def separateEven(nums: List[Int]) = {
  nums.partition((n) => n % 2 == 0)
}

You will notice that the signature for partition looks remarkably similar to filter.  This uniformity in the API is by design, making usage both easier to remember and reducing the number of changes necessary to switch between similar operations.  However, despite the similarity in dispatch, partition returns a very different result:

val numbers = List(1, 2, 3, 4, 5, 6, 7)
val sep = separateEven(numbers)
 
sep == (List(2, 4, 6), List(1, 3, 5, 7))    // true

Literally, the partition method splits elements into two different lists, based on the boolean value of the given predicate (in this case, even or odd).  These lists are returned as a tuple, Scala’s idiom for returning multiple values from a single method.

Searching

Having the capability to filter and split an entire list is all well and good, but it is perhaps even more common to need to find a specific element within a collection.  This is most often seen when dealing with sets or maps, but it can also be useful in the context of linear structures such as list and array.  A simple example might be a trivial caching mechanism for database lookups:

val cache = new HashMap[Int, String]
 
def readData(id: Int) = {
  if (cache.contains(id)) {    // gonna find her
    cache(id)
  } else {
    val conn = createConnection()
 
    val back = try {
      val stmt = conn.prepareStatement("SELECT value FROM dataTab WHERE id = ?")
      stmt.setInt(1, id)
 
      val res = stmt.executeQuery()
      if (res.next()) {
        res.getString(1)
      } else {
        null
      }
    } finally {
      conn.close()
    }
 
    cache += (id -> back)
    back
  }
}

Unlike Java, Scala’s collections API is extremely consistent in what methods correspond to what functionality.  The contains method on a Map does in fact search based on key, not value.  However, sometimes the situation calls for a solution which is not so specific.  Looking for a particular element is great (and very efficient on maps and sets), but it isn’t the most general implementation.  A more flexible form of searching would be to match based on a higher-order function (just like filter), rather than an explicit value.  This not only allows searching for a specific element, but it also provides the ability to look for a range.  More generally, the exists method makes it possible to check to see if an element of a given collection satisfies a certain property.

val nums = List(1, 2, 3, 4, 5, 6)
nums.exists((n) => (3 to 5).contains(n))

In this example, we are literally checking the list, nums, for any values which are in the mathematical range [3, 5].  The exists method calls the predicate (our function parameter - or “lambda” if you prefer) for each element in the list until it returns true, at which point the iteration is short circuited.  The predicate itself creates a new Range and checks to see if the specified value is within that range.  As it turns out, Range itself is a collection just like any other, defining the same methods that we’ve come to know at love.

There is one final variation on the “search” theme that is worth examining: find.  While it’s great to know that some element within a collection satisfies a certain property, sometimes it is even more useful to be able to ascertain what element that was.  Thus, rather than returning a Boolean, the find method returns an instance of the Option monad containing the first satisfactory element find, or None if the property remains unsatisfied.  Adapting our example from above yields the following code and its associated result:

val nums = List(1, 2, 3, 4, 5, 6)
val elem = nums.find((n) => (3 to 5).contains(n))
 
elem == Some(3)   // true

Conclusion

Well, it’s been a rather short series, but hopefully still worth reading.  In truth, I skipped a great deal of detail and idiomatic beauty that can be found within the Scala collections API.  While the type definitions could certainly stand improvement in some areas, it is already without a doubt the best-designed collections framework I have ever used (in any language).  Literally, once you figure out how to best employ one collection type, you will have learned them all.  For further reading on the topic, you can always peruse the scaladoc, or alternatively just play around in the Scala interpreter.  Have fun!