Skip to content

More Persistent Vectors: Performance Analysis

1
Sep
2008

In my previous post, I introduced the concept of "persistent vectors" and walked through one implementation of the idea.  When I actually pushed the post out, I was pretty happy with my code, but it seems I still have much to learn.  :-)  A number of very smart people replied, suggesting ways that the implementation could be cleaned up and improved.  Among these intrepid commenters was David MacIver, who correctly pointed out the similarities between my persistent vector and his IntMap class (coming in Scala 2.7.2).  Needless to say, my interest was piqued, so over the course of last week, I spent a fair amount of time going over his implementation as well as the implementations proposed by researchers in the past.

I also took the time to create a proper performance test suite for my vector implementation, one which could compare with other conceptually-similar implementations in a controlled and repeatable environment.  The results were both interesting and instructive, so I figured I would take the time to share them here.

Essentially, the performance suite runs through six tests, each of which designed to illuminate either a strength or a weakness in the underlying implementation.  These tests are run against four different classes:

  • Vector (my persistent implementation)
  • ArrayBuffer (from scala.collection.mutable)
  • IntMap (David MacIver)
  • Map

The addition of the last test target was more curiosity than anything else.  I wanted to see just how improved was IntMap over Map for integer keys.  The results turned out to be rather surprising:

  Time Memory
  Vector ArrayBuffer IntMap Map Vector ArrayBuffer IntMap Map
Fill Sequential 190.51ms 15.39ms 37.15ms 115.14ms 67.11 MB 3.93 MB 22.29 MB 12 MB
Fill Random 392.98ms 2028.43ms 128.35ms 103.3ms 64.97 MB 513.59 MB 39.89 MB 10.94 MB
Read Sequential 28.01ms 3.83ms 23.21ms 35.24ms 6.67 MB 0.02 KB 0.02 KB 3.37 MB
Read Random 92.8ms 11.14ms 54.81ms 63.8ms 8.01 MB 0.02 KB 0.02 KB 2.01 MB
Reverse 0.02ms 0.01ms - - 0.09 KB 0.04 KB - -
Compute Length 0.01ms 0.01ms 5.11ms 0.3ms 0.02 KB 0.02 KB 0.02 KB 2.23 MB

As you can see, IntMap triumphed over the other immutable data structures (including Vector) at just about every turn.  To understand why this is surprising, we need to spend a little time examining the theoretical properties of the two primary implementations: IntMap and Vector.

Patricia Tries

I already spent a fair bit of time explaining the concept of partitioned tries in the previous article, so I’m not going to reiterate all of that information here.  In a nutshell, the implementation of Vector is based upon the concept of a trie with an extremely high branching factor where the path to each node encodes its index.  Unlike List, the data structure is not fully persistent, meaning that some data copying must take place upon insert.  Specifically, a new array of branches must be allocated for the specific parent node of the inserted value and so on recursively on to the root.  The advantage to this partially-persistent implementation is that we can take advantage of the constant access time afforded by the use of arrays under the surface.  The unfortunate truth is that fully persistent data structures do not offer constant access time (at least, none that I know of), and thus are generally unsuitable for implementing random-access vectors.

The idea for this implementation comes originally from Phil Bagwell (interestingly enough, a researcher at LAMP, Martin Odersky’s research department at EPFL) in a paper entitled "Ideal Hash Trees".  His original concept though was for a more efficient hash table data structure, not necessarily with immutability as a requirement.  This concept was then adapted by Rich Hickey for his language, Clojure.  Finally, I expanded upon Clojure’s persistent vectors somewhat by changing their trie distribution from little- to big-endian and wrote up the result in Scala.  There are some other minor differences between Hickey’s design and my own, but the data structures are essentially identical.

Like my Vector implementation, the idea for IntMap began its life as a research paper, this time by Chris Okaski and Andrew Gill.  This paper is quite an interesting read if you’ve got a spare afternoon, although it does make use of SML rather than Scala as a base language.  In summary, the idea was to create an efficient, persistent map structure for integer keys.  Superficially, this sounds quite similar to Hickey’s modification of Bagwell’s concept, but there are many important distinctions below the surface.

At the extremely lowest level, IntMap actually makes use of a structure known as a "Patricia trie" with a fixed branching factor of two.  Much like Vector, IntMap encodes the key (index) of the data node within its path.  This encoding is based on the individual bits of the index.  However, unlike Vector, the ordering is little-endian.  Also, to avoid needlessly growing trees to absurd depths, linear common sub-keys are merged into a single prefix node.  This is what differentiates Patricia tries.  To illustrate, if our branching factor were 10 and we were to store at indexes 1234 and 2234, the common prefix "234" would be represented in a single node, rather than three separate nodes trivially linked to one-another.

This use of a low branching factor in the trie is extremely useful when performing insertions.  Specifically, more of the trie structure is preserved untouched from one modification to another.  Literally, IntMap is more persistent than Vector.  While this is great for writes, it does make read times a little less efficient.  Specifically, IntMap reads are worst-case O( log2(k) ), where k is the index in question.  For random data input, the average case is reduced somewhat by the prefix collapsing, but this should not be too significant.

By contrast, Vector uses an extremely high branching factor (by default) and so offers read efficiency which is O( logb(k) ), where k is the index and b is the branching factor.  Due to the practical limitations imposed on integer length, this translates into an upper-bound of O(7), which is (for all intents and purposes) constant.  Unfortunately, this analysis does not seem to be born-out by the performance data.

Possible Explanation

The only answer I can think of to explain the disparity between IntMap and Vector (both in time and in space utilization) is the use of a List[Int] in Vector to find the target node in the data structure.  This List is required primarily because I wanted the data distribution in the trie to be optimized for sequential access, therefore requiring the trie encoding to be big-endian on the index rather than little-endian.  The unfortunate truth is there’s no clean mathematical method (that I know of) which would allow the deconstruction of a number based on its most significant value in an arbitrary base.  In fact, the implementation of computePath (as suggested by Jules) actually cheats and makes use of the fact that persistent List(s) are constructed from the tail-end:

@inline
def computePath(total: Int, base: Int) = {
  if (total < 0) {
    throw new IndexOutOfBoundsException(total.toString)
  } else {
    var back: List[Int] = Nil
    var num = total
 
    do {
      back = (num % base) :: back
      num /= base
    } while (num > 0)
 
    back
  }
}

As efficient as this method is on most modern processors, it can never be as fast as a simple bit-masking operation.  Not only that, but it requires the creation of massive numbers of small, immutable objects (cons cells).  I believe that it is this instantiation overhead which is eating up the extra memory on reads and killing the overall performance.

Unfortunately, I can’t seem to conceive a better way of doing big-endian data distribution.  So are there any clever mathy people out there who have a brilliant way of deconstructing the index head-first rather than from the tail end?  If I could do that, then I could remove the List entirely from the implementation and rely instead on in-place calculations.  Maybe then I could catch up with David’s blisteringly fast IntMap:-)

Pipe Dream: Static Analysis for Ruby

30
Jun
2008

Yes, yes I know: Ruby is a dynamic language.  The word “static” is literally opposed to everything the language stands for.  With that said, I think that even Ruby development environments would benefit from some simple static analysis, just enough to catch the really idiotic errors.

Here’s the crux of the problem: people don’t test well.  Even with nice, behavior-driven development as facilitated by frameworks like RSpec, very few developers sufficiently test their code.  This isn’t just a problem with dynamic languages either, no one is safe from the test disability.  In some ways, it’s a product of laziness, but I think in most cases, good developers just don’t want to work on mundane problems.  It’s boring having to write unit test after unit test, checking and re-checking the same snippet of code with different input.

In some sense, it is this problem that compilers and static type systems try to avert, at least partially.  The very purpose of a static type system is to be able to prove certain things about your code simply by analysis.  By enabling the compiler to say things using the type system, the language is providing a safety net which filters out ninety percent of the annoying “no-brainer” mistakes.  A simple example would be invoking a method with the wrong parameters; or worse yet, misspelling the name of the method or type altogether.

The problem is that there are some problems which are more simply expressed in ways which are not provably sound.  In static languages, we get around this by casting, but such techniques are ugly and obviously contrived.  It is this problem which has given rise to the kingdom of dynamic languages; it is for this reason that most scripting languages have dynamic type systems: simple expression of algorithm without worrying about provability.  In fact, there are so many problems which do not fit nicely within most type systems that many developers have chosen to eschew static languages altogether, claiming that static typing just gets in the way.

Unfortunately, by abandoning static types, these languages lose that typo safety net.  It’s too easy to make a trivial mistake in a dynamic language, buried somewhere deep in the bowls of your application.  This mistake could easily be averted by a compiler with validating semantic analysis, but in a dynamic language, such a mistake could go unnoticed, conceivably even making it into production.  For this reason, most dynamic language proponents are also strong advocates of solid, comprehensive testing.  They have to be, for without such testing, one should never trust dynamic code in a production system (or any code, for that matter, but especially the unchecked dynamic variety).

Most large, production systems written in languages like Ruby or Groovy have large test suites which sometimes take hours to run.  These suites are extremely fine-grained, optimally checking every line of code with every possible kind of input, so as to be sure that mistakes are caught.  This is where the flexibility of dynamic typing really comes back to haunt you: extra testing is required to ensure that silly mistakes don’t slip through.  The irony is that a lot of developers using dynamic languages do so to get away from the “nuisance” of compilation, when all they have done is trade one inconvenience for another (testing).

Given this situation, it’s not unreasonable to conclude that what dynamic languages really need is a tool which can look through code and find all of those brain-dead mistakes.  Such a tool could be run along with the normal test suite, finding and reporting errors in much the same way.  It wouldn’t really have to be a compiler, so the tool wouldn’t slow down the development process, it would just be an effective layer of automated white-box testing.

But how could such a thing be accomplished in a language like Ruby?  After all, it is a truly dynamic language.  Methods don’t even exist until runtime, and sometimes only if certain code paths are run.  Types are completely undeclared, and every object can potentially respond to any method.  The answer is to perform extremely permissive inference.

It was actually a recent post by James Ervin on the nomenclature of type systems which got me thinking along these lines.  It should be possible by static analysis to infer the structural type of any value based on its usage.  Consider:

def do_something(obj)
  if obj.to_i == 0
    obj[:test]
  else
    other = obj.find :name => 'Daniel'
    other.to_s
  end
end

Just by examining this code, we can say certain things about the types involved.  For instance, we know that obj must respond to the following methods:

  • to_i
  • [Symbol]
  • find(Hash)

In turn, we know that the find(Hash) method must return a value which defines to_s.  Of course, this last bit of information isn’t very useful, because every object defines that method, but it’s still worth the inference.  The really useful inference which comes out of to_s is the knowledge that this method sometimes returns a value of type String (making the assumption that to_s hasn’t been redefined to return a different type, which isn’t exactly a safe assumption).  At other times, do_something will return whatever value comes from the square bracket operator ([]) on obj.  This bit of information we must remember in the analysis.  We can’t just assume that this method will return a String all the time, even if to_s does because method return types need not be homogeneous in dynamic languages.

Now, at this point we have effectively built up a structural type which is accepted by do_something.  Literally, we have formalized in the analysis what our intuition has already told us about the method.  There are some gaps, but that is to be expected.  The key to this analysis is not attempting to be comprehensive.  Dynamic languages cannot be analyzed as if they were static, one must expect to have certain limitations.  In such situations where the analysis is insufficient, it must assume that the code is valid, otherwise there will be thousands of false positives in the error checking.

So what is it all good for?  Well, imagine that somewhere else in our application, we have the following bit of code:

do_something 42

This is something we know will fail, because we have a simple value (42) which has a nominal type we can easily infer.  A little bit of checking on this type reveals the fact that it does not define square brackets, nor does it define a find(Hash) method.  This finding could be reported as an error by the analysis engine.

Granted, we still have to account for the fact that Ruby has things like method_missing and open classes, but all of this can fall into the fuzzy area of the analysis.  In situations where it might be alright to pass an object which does not satisfy a certain aspect of the structural type, the analysis must let it pass without question.

You can imagine how this analysis could traverse the entire source tree, making the strictest inferences it can and allowing for dynamic fuzziness where applicable.  Since the full sources of every Ruby function, class and module are available at runtime, analysis could be performed without any undue concern regarding obfuscation or parsing of binaries.  Conceivably, most trivial errors could be caught without any tests being written, taking some of the burden off of the developer.  There is a slight concern that developers would build up a false sense of security regarding their testing (or lack thereof), but I think we just have to trust that won’t happen, or won’t last long if it does.

Most advanced Ruby toolsets already have an analysis somewhat similar to the one I outlined.  NetBeans Ruby for example has some fairly advanced nominal type inference to allow things like semantic highlighting and content assist.  But as far as I know, this type inference is only nominal, and fairly local at that.  The structural type inference that I am proposing could conceivably provide far better assurances and capabilities than mere nominal inference, especially if enhanced through successive iteration and a more “global” approach (similar to Hindley/Milner in static languages).

One thing is certain, it isn’t working to just rely on developers being conscientious with their testing.  With the rapid rise in production systems running on dynamic languages, it is in all of our best interests to try to find a way to make these systems more stable and reliable.  The best way to do this is to start with code assurance and try to make it a little less painful to catch mistakes before deployment.

The Brilliance of BDD

9
Jun
2008

As I have previously written, I have recently been spending some time experimenting with various aspects of Scala, including some of the frameworks which have become available.  One of the frameworks I have had the privilege of using is the somewhat unassumingly-titled Specs, and implementation of the behavior-driven development methodology in Scala.

Specs takes full advantage of Scala’s flexible syntax, offering a very natural format for structuring tests.  For example, we could write a simple specification for a hypothetical add method in the following way:

object AddSpec extends Specification {
  "add method" should {
    "handle simple positives" in {
      add(1, 2) mustEqual 3
    }
 
    "handle simple negatives" in {
      add(-1, -2) mustEqual -3
    }
 
    "handle mixed signs" in {
      add(1, -2) mustEqual -1
      add(-1, 2) mustEqual 1
    }
  }
}

We could go on, of course, but you get the picture.  This code will lead to the execution of four separate assertions in three tests (to put things into JUnit terminology).  Fundamentally, this isn’t too much different than a standard series of unit tests, just with a slightly nicer syntax.

Specs defines a domain-specific language for structuring test assertions in a simple and intuitive way.  However, this is hardly the only framework for BDD.  Perhaps the most well-known such framework is RSpec, which answers a similar use-case in the Ruby programming language.  Our previous specification could be rewritten using RSpec as follows:

describe AddLib do
  it 'should handle simple positives' do
    add(1, 2).should == 3
  end
 
  it 'should handle simple negatives' do
    add(-1, -2).should == -3
  end
 
  it 'should handle mixed signs' do
    add(1, -2).should == -1
    add(-1, 2).should == 1
  end
end

The end-result is basically the same: the add method will be tested against the given assertions (all four of them) and the results printed in some sort of report form.  In this area, RSpec is significantly more mature than Specs, generating very slick HTML reports and nicely formatted console output.  This isn’t really a fundamental weakness of the Specs framework however, just indicative of the fact that RSpec has been around for a lot longer.

These two frameworks are interesting of course, but they are merely implementations of a much larger concept: behavior-driven development.  I’ve never been much of a fan of unit testing.  It’s always seemed to be incredibly dull and a very nearly fruitless waste of effort.  As much as I hate it though, I have to bow to the benefits of a self-contained test suite; and so I press on, cursing JUnit every step of the way.

BDD provides a nice alternative to unit testing.  At its core, it is not much different in that test groupings and primitive assertions are used to check all aspects of a test unit against predefined data.  However, there is something about the “flow” of a behavioral spec that is considerably easier to deal with.  For some reason, it is far less painful to devise a comprehensive test suite using BDD principles than conventional unit testing.  It seems a little far-fetched, but BDD actually makes it easier to write (and more importantly, formulate) exactly the same tests.

It’s an odd phenomenon, one which can only be caused by the storyboard flow of the code itself.  It is very natural to think of distinct requirements for a test unit when each of these requirements are being labeled and entered in a logical sequence.  Moreover, the syntax of both Specs and RSpec is such that there is very little boiler-plate required to setup an additional test.  Compare the previous BDD specs with the following JUnit4 example:

public class MathTest {
    @Test
    public void testSimplePositives() {
        assertEquals(3, add(1, 2));
    }
 
    @Test
    public void testSimpleNegatives() {
        assertEquals(-3, add(-1, -2);
    }
 
    @Test
    public void testMixedSigns() {
        assertEquals(-1, add(1, -2));
        assertEquals(1, add(-1, 2));
    }
}

JUnit just requires that much more syntax.  It breaks up the logical flow of the tests and (more importantly) the developer train of thought.  What can be worse is this syntax bloat makes it very tempting to just group all of the assertions into a single test - to save typing if nothing else.  This is problematic because one assertion may shadow all the others in the case of a failure, preventing them from ever being executed.  This can make certain problems much more difficult to isolate.

Logical flow is extremely important to test structure.  BDD frameworks provide a very nice syntax for painlessly defining comprehensive test suites.  The really wonderful thing about all of this is that BDD is available on the JVM, right now.  There’s nothing stopping you from writing your code in Java as you normally would, then creating your test suite in Scala using Specs rather than JUnit.  Alternatively, you could use RSpec on top of JRuby, or Gspec with Groovy.  All of these are seamless replacements for a test framework like JUnit, and requiring of far less syntactic overhead.

The growing move toward polyglot programming encourages the use of a separate language when it is best suited to a particular task.  In this case, several languages are available which offer far more powerful test frameworks than those which can be found in Java.  Why not take advantage of them?

Algorithm Proof Inference in Scala

1
Apr
2008

Anyone who’s written any sort of program or framework knows first-hand the traumas of testing.  The story always goes something like this.  First, you spend six months writing two hundred thousand lines of code which elegantly expresses your intent.  Next, you spend six years writing two hundred million lines of code which tests that your program actually does what you think it does.  Of course, even then you’re not entirely sure you’ve caught everything, so you throw in a few more years writing code which tests your tests.  Needless to say, this is a vicious cycle which usually ends badly for all involved (especially the folks in marketing who promised the client that you would have the app done in six hours).  The solution to this “test overload cycle” is remarkably simple and well-known, but certain problems have constrained its penetration into the enterprise world (that is, until now).  The solution is program proving.

A More Civilized Age

Program proofs are basically a (usually large) set of mathematical expressions which rigorously prove that all accepted program outcomes are correct.  A program prover takes these expressions and performs the appropriate static analysis on your code, spitting out either a “yes” or a “no”.  It’s far more effective than boring old unit testing since you can be absolutely certain that all the bugs have been caught.  More than that, it doesn’t require you to write reams of test code.  All that is required is the series of expressions defining program intent:

\Gamma_m \to \{\Gamma_0 = \textit{input} | m \sub \Gamma_0\} \cup \Sigma^\star

\epsilon = e^{\epsilon \pm \delta}

f \colon \{ x \in [\epsilon, \infty) | x \to \Gamma_x \}

\rho_\delta = f(\Gamma_\delta \bullet \Gamma_\alpha) \Rightarrow \alpha \in \Sigma^\star

\mbox{prove}(\Omega) = \Omega \in \Gamma_\Beta \Rightarrow (\Omega \times \epsilon) \vee \rho_\omega

There, isn’t that elegant?  Much better than some nasty JUnit test suite.  In a happier world, all tests would be replaced by this simple, easy-to-read representation of intent.  Unfortunately, program provers have yet to overcome the one significant stumbling block that has barred them from general adoption: the limitations of the ASCII character set.

Sadly, the designers of ASCII never anticipated the widespread need to express a Greek delta, or to properly render an implication.  Of course, we could always fall back on Unicode, but support for that character set is somewhat lacking, even in modern programming languages.  And so program provers languish in the outer darkness, unable to see the wide-scale adoption they so richly deserve.

An Elegant Weapon

Fortunately, Scala can be the salvation for the program prover.  It’s hybrid functional / object-oriented nature lends itself beautifully to the expression of highly mathematical concepts of intense precision.  Theoreticians have long suspected this, but the research simply has not been there to back it up.  After all, most PhDs write all their proofs on a blackboard, making use of a character set extension to ASCII.  Fortunately for the world, that is no longer an issue.

The answer is to to turn the problem on its ear (as it were) and eschew mathematical expressions altogether.  Instead of the developer expressing intent in terms of epsilon, delta and gamma-transitions, a simple framework in Scala will infer intent just based on the input code.  All of the rules will be built dynamically using an internal DSL, without any need to mess about in low-level character encodings.  Scala is the perfect language for this effort.  Not only does its flexible syntax allow for powerful expressiveness, but it even supports UTF-8 string literals!  (allowing us to fall back on plan B when necessary)

Note that while Ruby is used in the following sample, the proof inference is actually language agnostic.  This is because the parsed ASTs for any language are virtually identical (which is what allows so many languages to run on the JVM).

class MyTestClass
  def multiply(a, b)
    a * b
  end
end
 
obj = MyTestClass.new
puts obj.multiply(5, 6)

Such a simple example, but so many possible bugs.  For example, we could easily misspell the multiply method name, leading to a runtime error.  Also, we could add instead of multiply in the method definition.  There are truly hundreds of ways this can go wrong.  That’s where the program prover steps in.

We define a simple Scala driver which reads the input from stdin and drives the proof inference framework.  The framework then returns output which we print to stdout.

object Driver extends Application {
  val ast = Parser.parseAST(System.in)
 
  val infer = new ProofInference(InferenceStyle.AGRESSIVE)
  val prover = new ProgramProver(infer.createInference(ast))
 
  val output = prover.prove(ast)
  println(output)
}

It’s as simple as that!  When we run this driver against our sample application, we get the following result:

image

Notice how the output is automatically formatted for easy reading?  This feature dramatically improves developer productivity by reducing the time devoted to understanding proof output.  One of the many criticisms leveled against program provers is that their output is too hard to read.  Not anymore!

Of course, anyone can write a program which outputs fancy ASCII art, the real trick is making the output actually mean something.  If there’s a bug in our program, we want the prover to find it and notify us.  To see how this works, let’s write a new Ruby sample with a minor bug:

class Person < AR::Base
end
 
me = Person.find(1)
me.ssn = '123-45-6789'
me.save

It’s an extremely subtle flaw.  The problem here is that the ssn field does not exist in the database.  This Ruby snippet will parse correctly and the Ruby interpreter will blithely execute it until the critical point in code, when the entire runtime will crash.  This is exactly the sort of bug that Ruby on Rails adopters have had to deal with constantly.

No IDE in the world will be able to check this code for you, but fortunately our prover can.  We feed the test program into stdin and watch the prover do its thing:

image

Once again, clear and to the point.  Notice how the output is entirely uncluttered by useless debug traces or line numbers.  The only thing we need to know is that something is wrong, and the prover can tell us that.

Conclusion

I can speak from experience when I say that this simple tool can work wonders on any project.  Catching bugs early in the development cycle is the Holy Grail of software engineering.  By learning there’s a problem early on, effort can be devoted to finding the bug and correcting it.  I strongly recommend that you take the time to check out this valuable aid.  By integrating this framework into your development process, you may save thousands of hours in QA and testing.

Adding Type Checking to Ruby

6
Feb
2008

What’s the first thing you think of when you consider the Ruby Language?  Dynamic types, right?  Ruby is famous (infamous?) for its extremely flexible type system, and as a so-called “scripting language”, the core of this mechanism is a lack of type checking.  This feature allows for some very concise expressions and a great deal of flexibility, but sometimes makes your code quite a bit harder to understand.  More importantly, it weakens the assurances that a certain method will actually work when passed a given value.

Several different solutions have been proposed to workaround this limitation.  The canonical technique involves intensifying tests and increasing test coverage.  Ruby has some excellent unit test frameworks (such as RSpec) which serve to ease the pain associated with this approach, but no matter how you slice it, tests are a pain.  Having to rely on tests to take the place of type checking in the code assurance process can be extremely frustrating.

Another, less common technique is to simply perform dynamic type checks within the method itself.  Like so:

def create_name(fname, lname)
  raise "fname must be a String" unless fname.kind_of? String
  raise "lname must be a String" unless lname.kind_of? String
 
  fname + " " + lname
end

This code explicitly checking the dynamic kind of the parameter values to ensure that they are of type or subtype of String.  The issues with this sample should be relatively obvious.

Primarily, it’s ugly!  This sort of repetitious, boiler-plate conditional checking is exactly the sort of thing Ruby tries to avoid.  What’s more is the added bulk of all of these repetitive checks (assuming you perform one check per-parameter per-method) because far more unwieldy than just improving the rspec test coverage.

While manually type checking may be a bad solution syntactically, it’s on the right track conceptually.  What we really want is some sort of assertion that the parameters are of a certain type, but that won’t overly bloat our existing code.  We need some sort of framework that will “weave in” (think AOP) its type assertions without getting in the way our our algorithms.

Well it turns out that someone’s already done thisEivind Eklund kindly pointed me to his type checking framework in a comment on a previous post.  The basic idea is to perform the type checking assertions, but to factor the work out into an API encapsulated by an intuitive DSL.  So rather than performing all those nasty unless statements as above, we could simply do something like this:

typesig String, String
def create_name(fname, lname)
  fname + " " + lname
end

It’s really as simple as that.  Passing the type values to the typesig method just prior to a method declaration give the cue to the Types framework to perform some extra work on each call that method.  Now we have the runtime assurances that the following code will not work (with a very intuitive error message):

create_name("Daniel", 123)

Will produce the folling output:

ArgumentError: Arg 1 is of invalid type (expected String, got Fixnum)

But the fun doesn’t stop there.  Ruby encourages the “duck typing” pattern, where algorithm developers concern themselves not with what the value is but rather what it does.  This means that the type checking really should be done based on what methods are available, not just the raw type.  It turns out that the Types framework supports this as well:

class Company
  def name
    "Blue Danube"
  end
end
 
class Person
  def name
    "Daniel Spiewak"
  end
end
 
typesig String, Type::Respond(:name)
def output(msg, value)
  puts msg + " " + value.name
end
 
c = Company.new
p = Person.new
 
output("The company name is: ", c)
output("The person is: ", p)
 
output("The programmer is: ", "a genius")    # error

Types can check not only the kind of the object but also to what methods it responds.  This is crucial to enabling its adoption into modern Ruby code bases, many of which rely heavily on this “duck typing” technique.

You can think of the Types framework just like another layer in your testing architecture.  Obviously it’s not performing any sort of static type checking (since Ruby has no compile phase).  All it’s doing is providing that extra certainty that you’re never passing something weird from somewhere in your code, something that would break your algorithm.

So what’s the catch?  Well, obviously you need to have the Types framework installed.  It’s not as easy as just typing gem install types either, since the framework actually predates Ruby Gems.  You’ll have to download the framework and then copy around the types.rb file yourself.  But this is just deployment semantics.  The more interesting issue are the limitations of the code itself.

As far as I can tell, the only restriction on the framework is that it must be used within a proper class, not in the root scope.  This means that all of my examples above would have to be enclosed in a class, rather than just copy-pasted into a .rb file and run in place.  But other than this one limitation, the framework is incredibly flexible.  I really haven’t shown you the seriously interesting stuff in terms of the API (there are more examples at the top of the types.rb file).  In many ways, Types is actually more powerful than any static type checking mechanism could be (yes, I’m even including Scala in that evaluation).

I haven’t had a chance to use Types on any serious project myself, but I can see tremendous potential, particularly for companies with large-scale Ruby/Rails deployments or even smaller projects looking for just a bit tighter code assurance.  As far as I’m concerned, there shouldn’t be a non-trivial Ruby project attempted without this lovely library, Rails or no Rails.