Skip to content

How Do You Apply Polyglotism?

18
Aug
2008

For the past two years or so, there has been an increasing meme across the developer blogosphere encouraging the application of the polyglot methodology.  For those of you who have been living under a rock, the idea behind polyglot programming is that each section of a given project should use whatever language happens to be most applicable to the problem in question.  This makes for a great topic for arm-chair bloggers, leading to endless pontification and flame-wars on forum after forum, but it seems to be a bit more difficult to apply in the real world.

The fact is that very few companies are open to the idea of diversity in language selection.  Just look at Google, one of the most open-minded and developer-friendly companies around.  They employ some of the smartest people I know, programmers who have actually invented languages with wide-scale adoption.  However, this same company mandates the use of a very small set of languages including Python, Java, C++ and JavaScript.  If a company like Google can’t even bring itself to dabble in language diversity, what hope do we have for the Apples of the world?

A few months ago, I received an internal email from the startup company where I work.  This email was putting forth a new policy which would restrict all future developments to one of two languages: PHP or Java.  In fact, this policy went on to push for the eventual rewrite of all legacy projects which had been written in other languages including Objective-C, Ruby, Python and a fair number of shell scripts.  I was utterly flabbergasted (to say the least).  A few swift emails later, we were able to come to a more moderate position, but the prevailing attitude remains extremely focused on minimizing the choice of languages.

To my knowledge, this sort of policy is fairly common in the industry.  Companies (particularly those employing consultants) seem to prefer to keep the technologies employed to a minimum, focusing on the least-common denominator so as to reduce the requirements for incoming developer skill sets.  This is rather distressing to me, because I get a great deal of pleasure out of solving problems differently using alternative languages.  For example, I would have loved to build the clustering system at my company using the highly-scalable actor model with Scala, but the idea was shot down right out of the gate because it involved a non-mainstream language.  To be fair to my colleagues, the overall design involved was given more serious consideration, but it was always within the confines of Java, rather than the original actor-driven concept.

There is actually another aspect to this question: assuming you are allowed to use a variety of languages to "get the job done", how do you apply them?  Ola Bini has talked about the various layers of a system, but this is harder to see in practice than it would seem.  How do you define where to "draw the line" between using Java and Scala, or even the more dramatic differences between Java and JRuby or Groovy?  Of course, we can base our decision strictly on lines of code, but in that case, Scala would trump Java every time.  For that matter, Ruby would probably beat out the two of them, and I’m certainly not writing my next large-scale enterprise app exclusively in a dynamic language.

I realize this is somewhat of a cop-out post, just asking a question and never arriving at a satisfactory conclusion, but I would really like to know how other developers approach this issue.  What criteria do you weigh in making the decision to go with a particular language?  What sorts of languages work well for which tasks?  And above all, how do you convince your boss that this is the right way to go?  The floor is open, please enlighten me!  :-)

Implementing Groovy’s Elvis Operator in Scala

7
Jul
2008

Groovy has an interesting shortening of the ternary operator that it rather fancifully titles “the Elvis Operator“.  This operator is hardly unique to Groovy - C# has had it since 2.0 in the form of the Null Coalescing Operator - but that doesn’t mean that it is not a language feature worth learning from.  Surprisingly (for a C-derivative language), Scala entirely lacks any sort of ternary operator.  However, the language syntax is more than flexible enough to implement something similar without ever having to dip into the compiler.

But before we go there, it is worth examining what this operator does and how it works in languages which already have it.  In essence, it is just a bit of syntax sugar, allowing you to easily check if a value is null and provide a value in the case that it is.  For example:

firstName = "Daniel"
lastName = null
 
println firstName ?: "Chris"
println lastName ?: "Spiewak"

This profound snippet really demonstrates about all there is to the Elvis operator.  The result is as follows:

Daniel
Spiewak

Not terribly exciting.  Essentially, what we have is a binary operator which evaluates the left expression and tests to see if it is null.  In the case of firstName, this is false, so the right expression (in this case, "Chris") is never evaluated.  However, lastName is null, which means that we have to evaluate the right expression and return its value, rather than null.  It’s all just so much syntax sugar that can be expressed equivalently in any language with a conditional operator (in this case, Java):

String firstName = "Daniel";
String lastName = null;
 
System.out.println((firstName == null) ? "Chris" : firstName);
System.out.println((lastName == null) ? "Spiewak" : lastName);

A bit verbose, don’t you think?  Of course, this isn’t really a fair comparison, since Groovy is a far more concise language than Java.  Let’s see how the above would render in a real man’s language like Scala:

val firstName = "Daniel"
val lastName: String = null
 
println(if (firstName == null) "Chris" else firstName)
println(if (lastName == null) "Spiewak" else lastName)

Better, but still a little clumsy.  The truth of the matter is that we’re forced to do this sort of null checking all the time (well, maybe a little less in Scala) and the constructs for doing so are woefully inadequate.  Thus, the motivation for the Elvis operator.

Getting Things Started

Like all good programmers should, we’re going to start with a runnable specification for every behavior desired from the operator.  I’ve written before about the excellent Specs framework, so that’s what we’ll use:

"elvis operator" should {
  "use predicate when not null" in {
    "success" ?: "failure" mustEqual "success"
  }
 
  "use alternative when null" in {
    val test: String = null
    test ?: "success" mustEqual "success"
  }
 
  "type correctly" in {		// if it compiles, then we're fine
    val str: String = "success" ?: "failure"
    val i: Int = 123 ?: 321
 
    str mustEqual "success"
    i mustEqual 123
  }
 
  "infer join of types" in {    // must compile
    val res: CharSequence = "success" ?: new java.lang.StringBuilder("failure")	
    res mustEqual "success"
  }
 
  "only eval alternative when null" in {
    var a = "success"
    def alt = {
      a = "failure"
      a
    }
 
    "non-null" ?: alt
    a mustEqual "success"
  }
}

Fairly straightforward stuff.  I imagine that this specification for the operator is a bit more involved than the one used in the Groovy compiler, due to the fact that Scala is a statically typed language and thus requires a bit more effort to ensure that everything is working properly.  From this specification, we can infer three core properties of the operator:

  1. Basic behavior when null/not-null
  2. The result type should be the unification of the static types of the left and right operands
  3. The right operand should only be evaluated when the left is null

The first property is fairly easy to understand; it is intuitive in the definition of the operator.  All this means is that the value of the operator expression is dependent on the value of the left operand.  When not null, the expression value is equal to the value of the left operand.  If the left operand is null, then the expression is valued equivalent to the right operand.  This is just formally expressing what we spent the first section of the article describing.

Ignoring the second and third properties, we can actually attempt an implementation.  For the moment, we will just assume that the left and right operands must be of exactly the same type, otherwise the operator will be inapplicable.  So, without further ado, implementation enters stage right:

implicit def elvisOperator[T](alt: T) = new {
  def ?:(pred: T) = if (pred == null) alt else pred
}

Notice the use of the anonymous inner class to carry the actual operator?  This is a fairly common trick in Scala to avoid the definition of a full-blown class just for the sake of adding a method to an existing type.  To break down what’s going on here, we have defined an implicit type conversion from any type T to our anonymous inner class.  This conversion will be inserted by the compiler whenever we invoke the ?: operator on an expression.

Sharp-eyed developers will notice something a little odd about the way this code is structured.  In fact, if you look closely, it seems that we evaluate the right operand and use its value if non-null (otherwise left), which is exactly the opposite of what our specification defines.  For a normal operator, this observation would be quite correct.  However, Scala defines the associatively of operators based on the trailing symbol.  In this case, because our trailing symbol is a colon (:), the operator itself will be right-associative.  Thus, the following expression:

check ?: alternate

…is transformed by the compiler into the following:

alternate.?:(check)

This is how right-associative operators function, by performing method calls on the right operand.  Thus, we need to define our implicit conversion such that the ?: method will be defined for the right operand, taking the left operand as a parameter.  We’ll see a bit later on how this can cause trouble, but for now, let’s continue with the specification.

A Little Type Theory

The second property is a little tougher.  Type unification is one of those pesky issues that plague statically typed languages and are simply irrelevant in those with dynamic type systems.  The issue arises from the following question: what happens if the left and right operands are of different types?  In Groovy, this is a non-issue because the value of the expression is simply dynamically typed according to the runtime type of the operand which is chosen.  However, Scala requires static type information, which means that we need to ensure that the static type of the expression is sound for either the left or the right operand (since Scala does not have non-nullable types).  The best way to do this is to compute the least upper bound of the two types, an operation which is also known as minimal unification.  Consider the following hierarchy:

image

Now imagine that the left operand is of static type Apple, while the right operand is of static type Pear.  We need to find a static type which is safe for both of these.  Intuitively, this type would be Fruit, since it is a common superclass of both Apple and Pear.  Regardless of which expression is chosen at runtime, we will be able to polymorphically treat the value as a value of type Fruit.  The intuition in this case is quite correct.  In fact, it actually has a rigorous mathematical proof…which I won’t go into.  (queue sighs of relief)

One additional example should serve to really drive the point home.  Consider the scenario where the left operand has type Vegitable and the right operand has type Apple.  This is a bit trickier, but it recursively boils down to the same case.  The only common superclass between these two types is Object, due to the fact that the hierarchies are disjoint.

This operation is fairly easy to perform by hand given the full type hierarchy.  For that matter, it isn’t very difficult to write an algorithm which can efficiently compute the minimal unification of two types.  Unfortunately, we don’t have that luxury here.  We cannot simply write code which is executed at compile time to determine type information, we must make use of the existing Scala type system in order to “trick” the compiler into inferring things for us.  We do this by making use of lower-bounds on type parameters.  With this in mind, we can (finally) make a first attempt at a well-typed implementation of the operator:

implicit def elvisOperator[T](alt: T) = new {
  def ?:[A >: T](pred: A) = if (pred == null) alt else pred
}

The only thing we have changed is the type of the pred variable from T to a new type parameter, A.  This new type parameter is defined by the lower-bound T.  Translated into English, the type expression reads something like the following:

Accept parameter pred of some type A which is a super-type of T.

The real magic of the expression is that pred need not be exactly of type A; it could also be a subtype.  Thus, A is some generic supertype which encompasses both the types of the left and the right operands.

Fancy Parameter Types

This allows us to move onto the third property: only evaluate the right operand if the left is null.  This is the normal behavior for conditional expressions.  After all, you wouldn’t want your code performing an expensive operation (such as grabbing data from a server somewhere) just to throw away the result because a different branch of the conditional was chosen.  Actually, the bigger issue with ignoring this property (as we have done so far) is that the right operand may actually have side-effects.  Scala isn’t a pure functional language, so evaluating expressions that we don’t need (or worse, that the developer isn’t expecting) can have extremely dire consequences.

Unfortunately, at first glance, there doesn’t really seem to be a way to avoid this evaluation.  After all, we need to invoke the ?: method on something.  We could try using a left-associative operator instead (such as C#’s ?? operator), but even that wouldn’t fully solve the problem as we would still need to pass the right operand as a parameter.  In short, it seems like we’re stuck.

The good news is that Scala’s designers chose to adopt an age-old construct known as “pass-by-name parameters”.  This technique dates all the way back to ALGOL (possibly even further).  In fact, it’s so old and obscure that I’ve actually had professors tell me that it has been completely abandoned in favor of the more conventional pass-by-value (what Java, C#, Scala and most languages use) and pass-by-reference (which is available in C++).  Pass-by-name parameters are very much like normal parameters in that they are used to copy values from a calling scope into the method in question.  However, unlike normal parameters, they are evaluated on an as-needed basis.  This means that a pass-by-name parameter will only be evaluated if its value is required within the method called.  For example:

def doSomething(a: =>Int) = 1 + 2
def createInteger() = {
  println("Made integer")
  42
}
 
println("In the beginning...")
doSomething(createInteger())
println("...at the end")

Counter to our first intuition, this will print the following:

In the beginning...
...at the end

In other words, the createInteger method is never called!  This is because the value of the pass-by-name parameter in the doSomething method is never accessed, meaning that the value of the expression is not needed.  The a parameter is denoted pass-by-name by the => notation (just in case you were wondering).  We can apply this to our implementation by changing the parameter of the implicit conversion from pass-by-value to pass-by-name:

implicit def elvisOperator[T](alt: =>T) = new {
  def ?:[A >: T](pred: A) = if (pred == null) alt else pred
}

The language-level implementation of the if/else conditional expression will ensure that the alt parameter is only accessed iff the value of pred is null, meaning we have finally satisfied all three properties.  We can check this by compiling and running our specification from earlier:

Specification "TernarySpecs"
  elvis operator should
  + use predicate when not null
  + use alternative when null
  + type correctly
  + infer join of types
  + only eval alternative when null

Total for specification "TernarySpecs":
Finished in 0 second, 78 ms
5 examples, 6 assertions, 0 failure, 0 error

Conclusion

We now have a working implementation of Groovy’s Elvis operator within Scala and we never had to move beyond simple API design.  Truly, one of Scala’s greatest strengths is its ability to expression extremely complex constructs within the confines of the language.  This makes it uniquely well-suited to hosting internal domain-specific languages.  Using techniques similar to the ones I have outlined in this article, it is possible to define operations which would require compiler-level implementation in most languages.

The full source (such as it is) for the Elvis operator in Scala is available for download, along with a bonus implementation of C#’s ?? syntax (just in case you prefer it).  The implementation differs slightly due to the fact that ?? is a left-associative operator, but the single-use (unchained) semantics are identical.  Enjoy!

Pipe Dream: Static Analysis for Ruby

30
Jun
2008

Yes, yes I know: Ruby is a dynamic language.  The word “static” is literally opposed to everything the language stands for.  With that said, I think that even Ruby development environments would benefit from some simple static analysis, just enough to catch the really idiotic errors.

Here’s the crux of the problem: people don’t test well.  Even with nice, behavior-driven development as facilitated by frameworks like RSpec, very few developers sufficiently test their code.  This isn’t just a problem with dynamic languages either, no one is safe from the test disability.  In some ways, it’s a product of laziness, but I think in most cases, good developers just don’t want to work on mundane problems.  It’s boring having to write unit test after unit test, checking and re-checking the same snippet of code with different input.

In some sense, it is this problem that compilers and static type systems try to avert, at least partially.  The very purpose of a static type system is to be able to prove certain things about your code simply by analysis.  By enabling the compiler to say things using the type system, the language is providing a safety net which filters out ninety percent of the annoying “no-brainer” mistakes.  A simple example would be invoking a method with the wrong parameters; or worse yet, misspelling the name of the method or type altogether.

The problem is that there are some problems which are more simply expressed in ways which are not provably sound.  In static languages, we get around this by casting, but such techniques are ugly and obviously contrived.  It is this problem which has given rise to the kingdom of dynamic languages; it is for this reason that most scripting languages have dynamic type systems: simple expression of algorithm without worrying about provability.  In fact, there are so many problems which do not fit nicely within most type systems that many developers have chosen to eschew static languages altogether, claiming that static typing just gets in the way.

Unfortunately, by abandoning static types, these languages lose that typo safety net.  It’s too easy to make a trivial mistake in a dynamic language, buried somewhere deep in the bowls of your application.  This mistake could easily be averted by a compiler with validating semantic analysis, but in a dynamic language, such a mistake could go unnoticed, conceivably even making it into production.  For this reason, most dynamic language proponents are also strong advocates of solid, comprehensive testing.  They have to be, for without such testing, one should never trust dynamic code in a production system (or any code, for that matter, but especially the unchecked dynamic variety).

Most large, production systems written in languages like Ruby or Groovy have large test suites which sometimes take hours to run.  These suites are extremely fine-grained, optimally checking every line of code with every possible kind of input, so as to be sure that mistakes are caught.  This is where the flexibility of dynamic typing really comes back to haunt you: extra testing is required to ensure that silly mistakes don’t slip through.  The irony is that a lot of developers using dynamic languages do so to get away from the “nuisance” of compilation, when all they have done is trade one inconvenience for another (testing).

Given this situation, it’s not unreasonable to conclude that what dynamic languages really need is a tool which can look through code and find all of those brain-dead mistakes.  Such a tool could be run along with the normal test suite, finding and reporting errors in much the same way.  It wouldn’t really have to be a compiler, so the tool wouldn’t slow down the development process, it would just be an effective layer of automated white-box testing.

But how could such a thing be accomplished in a language like Ruby?  After all, it is a truly dynamic language.  Methods don’t even exist until runtime, and sometimes only if certain code paths are run.  Types are completely undeclared, and every object can potentially respond to any method.  The answer is to perform extremely permissive inference.

It was actually a recent post by James Ervin on the nomenclature of type systems which got me thinking along these lines.  It should be possible by static analysis to infer the structural type of any value based on its usage.  Consider:

def do_something(obj)
  if obj.to_i == 0
    obj[:test]
  else
    other = obj.find :name => 'Daniel'
    other.to_s
  end
end

Just by examining this code, we can say certain things about the types involved.  For instance, we know that obj must respond to the following methods:

  • to_i
  • [Symbol]
  • find(Hash)

In turn, we know that the find(Hash) method must return a value which defines to_s.  Of course, this last bit of information isn’t very useful, because every object defines that method, but it’s still worth the inference.  The really useful inference which comes out of to_s is the knowledge that this method sometimes returns a value of type String (making the assumption that to_s hasn’t been redefined to return a different type, which isn’t exactly a safe assumption).  At other times, do_something will return whatever value comes from the square bracket operator ([]) on obj.  This bit of information we must remember in the analysis.  We can’t just assume that this method will return a String all the time, even if to_s does because method return types need not be homogeneous in dynamic languages.

Now, at this point we have effectively built up a structural type which is accepted by do_something.  Literally, we have formalized in the analysis what our intuition has already told us about the method.  There are some gaps, but that is to be expected.  The key to this analysis is not attempting to be comprehensive.  Dynamic languages cannot be analyzed as if they were static, one must expect to have certain limitations.  In such situations where the analysis is insufficient, it must assume that the code is valid, otherwise there will be thousands of false positives in the error checking.

So what is it all good for?  Well, imagine that somewhere else in our application, we have the following bit of code:

do_something 42

This is something we know will fail, because we have a simple value (42) which has a nominal type we can easily infer.  A little bit of checking on this type reveals the fact that it does not define square brackets, nor does it define a find(Hash) method.  This finding could be reported as an error by the analysis engine.

Granted, we still have to account for the fact that Ruby has things like method_missing and open classes, but all of this can fall into the fuzzy area of the analysis.  In situations where it might be alright to pass an object which does not satisfy a certain aspect of the structural type, the analysis must let it pass without question.

You can imagine how this analysis could traverse the entire source tree, making the strictest inferences it can and allowing for dynamic fuzziness where applicable.  Since the full sources of every Ruby function, class and module are available at runtime, analysis could be performed without any undue concern regarding obfuscation or parsing of binaries.  Conceivably, most trivial errors could be caught without any tests being written, taking some of the burden off of the developer.  There is a slight concern that developers would build up a false sense of security regarding their testing (or lack thereof), but I think we just have to trust that won’t happen, or won’t last long if it does.

Most advanced Ruby toolsets already have an analysis somewhat similar to the one I outlined.  NetBeans Ruby for example has some fairly advanced nominal type inference to allow things like semantic highlighting and content assist.  But as far as I know, this type inference is only nominal, and fairly local at that.  The structural type inference that I am proposing could conceivably provide far better assurances and capabilities than mere nominal inference, especially if enhanced through successive iteration and a more “global” approach (similar to Hindley/Milner in static languages).

One thing is certain, it isn’t working to just rely on developers being conscientious with their testing.  With the rapid rise in production systems running on dynamic languages, it is in all of our best interests to try to find a way to make these systems more stable and reliable.  The best way to do this is to start with code assurance and try to make it a little less painful to catch mistakes before deployment.

The Need for a Common Compiler Framework

23
Jun
2008

In recent years, we have seen a dramatic rise in the number of languages used in mainstream projects.  In particular, languages which run on the JVM or CLR have become quite popular (probably because sane people hate dealing with x86 assembly).  Naturally, such languages prefer to interoperate with other languages built on these core platforms, particularly Java and C# (respectively).  Collectively, years of effort have been put into devising and implementing better ways of working with libraries written in these “parent languages”.  The problem is that such efforts are crippled by one fundamental limitation: circular dependencies.

Let’s take Scala as an example.  Of all of the JVM languages, this one probably has the potential for the tightest integration with Java.  Even Groovy, which is renowned for its integration, still falls short in many key areas.  (generics, anyone?)  With Scala, every class is a Java class, every method is a Java method, and there is no API which cannot be accessed from Java as natively as any other.  For example, I can write a simple linked list implementation in Scala and then use it in Java without any fuss whatsoever (warning: untested sample):

class LinkedList[T] {
  private var root: Node = _
 
  def add(data: T) = {
    val insert = Node(data, null)
 
    if (root == null) {
      root = insert
    } else {
      root.next = insert
    }
 
    this
  }
 
  def get(index: Int) = {
    def walk(node: Node, current: Int): T = {
      if (node == null) {
        throw new IndexOutOfBoundsException(index.toString)
      }
 
      if (current < index) {
        walk(node.next, current + 1)
      } else {
        node.data
      }
    }
 
    if (index < 0) {
      throw new IndexOutOfBoundsException(index.toString)
    }
 
    walk(root, 0)
  }
 
  def size = {
    def walk(node: Node): Int = if (node == null) 0 else 1 + walk(node.next)
 
    walk(root)
  }
 
  private case class Node(data: T, var next: Node)
}

Once this class is compiled, we can use it in our Java code just as if it were written within the language itself:

public class Driver {
    public static void main(String[] args) {
        LinkedList<String> list = new LinkedList<String>();
 
        for (String arg : args) {
            list.add(arg);
        }
 
        System.out.println("List has size: " + list.size());
 
        for (int i = 0; i < list.size(); i++) {
            System.out.println(list.get(i).trim());
        }
    }
}

Impressively seamless interoperability!  We actually could have gotten really fancy and thrown in some operator overloading.  Obviously, Java wouldn’t have been able to use the operators themselves, but it still would have been able to call them just like normal Java instance methods.  Using Scala in this way, we can get all the advantages of its concise syntax and slick design without really abandoning our Java code base.

The problem comes in when we try to satisfy more complex cases.  Groovy proponents often trot out the example of a Java class inherited by a Groovy class which is in turn inherited by another Java class.  In Scala, that would be doing something like this:

public class Shape {
    public abstract void draw(Canvas c);
}
class Rectangle(val width: Int, val height: Int) extends Shape {
  override def draw(c: Canvas) {
    // ...
  }
}
public class Square extends Rectangle {
    public Square(int size) {
        super(size, size);
    }
}

Unfortunately, this isn’t exactly possible in Scala.  Well, I take that back.  We can cheat a bit and first compile Shape using javac, then compile Rectangle using scalac and finally Square using javac, but that would be quite nasty indeed.  What’s worse is such a technique would completely fall over if the Canvas class were to have a dependency on Rectangle, something which isn’t too hard to imagine.  In short, Scala is bound by the limitations of a separate compiler, as are most languages on the JVM.

Groovy solves this problem by building their own Java compiler into groovyc, thus allowing the compilation of both Java and Groovy sources within the same process.  This solves the problem of circular references because neither set of sources is completely compiled before the other.  It’s a nice solution, and one which Scala will be adopting in an upcoming release of its compiler.  However, it doesn’t really solve everything.

Consider a more complex scenario.  Imagine we have Java class Shape, which is extended by Scala class Rectangle and Groovy class Circle.  Imagine also that class Canvas has a dependency on both Rectangle and Circle, perhaps for some special graphics optimizations.  Suddenly we have a three-way circular dependency and no way of resolving it without a compiler which can handle all three languages: Java, Groovy and Scala.  This is starting to become a bit more interesting.

Of course, we can solve this problem in the same way we solved the Groovy-Java dependence problem: just add support to the compiler!  Unfortunately, it may have been trivial to implement a Java compiler as part of groovyc, but Scala is a much more difficult language from a compiler’s point of view.  But even supposing that we do create an integrated Scala compiler, we still haven’t solved the problem.  It’s not difficult to imagine throwing another language into the mix; Clojure, for example.  Do we keep going, tacking languages onto our once-Groovy compiler until we support everything usable on the JVM?  It should be obvious why this is a bad plan.

A more viable solution would be to create a common compiler framework, one which would be used as the basis for all JVM languages.  This framework would have common abstractions for things like name resolution and type checking.  Instead of creating an entire compiler from scratch, every language would simply extend this core framework and implement their own language as some sort of module.  In this way, it would be easy to build up a custom set of modules which solve the needs of your project.  Since the compilers are modular and based on the same core framework, they would be able to handle simultaneous compilation of all JVM languages involved, effectively solving the circular dependency problem in a generalized fashion.

The framework could even make things easier on would-be compiler implementors by handling common operations like bytecode emission.  Fundamentally, all of these tightly-integrated languages are just different front-ends to a common backend: the JVM.  I haven’t looked at the sources, but I would imagine that there is a lot of work which had to be done in each compiler to solve problems which were already handled in another.

Of course, all this is purely speculative.  Everyone builds their compiler in a slightly different way (slightly => radically in the case of languages like Scala) and I wouldn’t imagine that it would be easy to build this sort of common compiler backend.  However, the technology is in place.  We already have nice module systems like OSGi, and we’re certainly no strangers to the work involved in building up a proper CLASSPATH for a given project.  Why should this be any different?

It’s not without precedent either.  GCC defines a common backend for a number of compilers, such as G++, GCJ and even an Objective-C compiler.  Granted, it’s neither as high-level nor as modular as we would need to solve circular dependencies, but it’s something to go on.

It will be interesting to see where the JVM language sphere is headed next.  The rapid emergence of so many new languages is leading to problems which will have to be addressed before the polyglot methodology will be truly accepted by the industry.  Some of the smartest people in the development community are working toward solutions; and whether they take my idea of a modular framework or not, somewhere along the line the problem of simultaneous compilation must be solved.

The Brilliance of BDD

9
Jun
2008

As I have previously written, I have recently been spending some time experimenting with various aspects of Scala, including some of the frameworks which have become available.  One of the frameworks I have had the privilege of using is the somewhat unassumingly-titled Specs, and implementation of the behavior-driven development methodology in Scala.

Specs takes full advantage of Scala’s flexible syntax, offering a very natural format for structuring tests.  For example, we could write a simple specification for a hypothetical add method in the following way:

object AddSpec extends Specification {
  "add method" should {
    "handle simple positives" in {
      add(1, 2) mustEqual 3
    }
 
    "handle simple negatives" in {
      add(-1, -2) mustEqual -3
    }
 
    "handle mixed signs" in {
      add(1, -2) mustEqual -1
      add(-1, 2) mustEqual 1
    }
  }
}

We could go on, of course, but you get the picture.  This code will lead to the execution of four separate assertions in three tests (to put things into JUnit terminology).  Fundamentally, this isn’t too much different than a standard series of unit tests, just with a slightly nicer syntax.

Specs defines a domain-specific language for structuring test assertions in a simple and intuitive way.  However, this is hardly the only framework for BDD.  Perhaps the most well-known such framework is RSpec, which answers a similar use-case in the Ruby programming language.  Our previous specification could be rewritten using RSpec as follows:

describe AddLib do
  it 'should handle simple positives' do
    add(1, 2).should == 3
  end
 
  it 'should handle simple negatives' do
    add(-1, -2).should == -3
  end
 
  it 'should handle mixed signs' do
    add(1, -2).should == -1
    add(-1, 2).should == 1
  end
end

The end-result is basically the same: the add method will be tested against the given assertions (all four of them) and the results printed in some sort of report form.  In this area, RSpec is significantly more mature than Specs, generating very slick HTML reports and nicely formatted console output.  This isn’t really a fundamental weakness of the Specs framework however, just indicative of the fact that RSpec has been around for a lot longer.

These two frameworks are interesting of course, but they are merely implementations of a much larger concept: behavior-driven development.  I’ve never been much of a fan of unit testing.  It’s always seemed to be incredibly dull and a very nearly fruitless waste of effort.  As much as I hate it though, I have to bow to the benefits of a self-contained test suite; and so I press on, cursing JUnit every step of the way.

BDD provides a nice alternative to unit testing.  At its core, it is not much different in that test groupings and primitive assertions are used to check all aspects of a test unit against predefined data.  However, there is something about the “flow” of a behavioral spec that is considerably easier to deal with.  For some reason, it is far less painful to devise a comprehensive test suite using BDD principles than conventional unit testing.  It seems a little far-fetched, but BDD actually makes it easier to write (and more importantly, formulate) exactly the same tests.

It’s an odd phenomenon, one which can only be caused by the storyboard flow of the code itself.  It is very natural to think of distinct requirements for a test unit when each of these requirements are being labeled and entered in a logical sequence.  Moreover, the syntax of both Specs and RSpec is such that there is very little boiler-plate required to setup an additional test.  Compare the previous BDD specs with the following JUnit4 example:

public class MathTest {
    @Test
    public void testSimplePositives() {
        assertEquals(3, add(1, 2));
    }
 
    @Test
    public void testSimpleNegatives() {
        assertEquals(-3, add(-1, -2);
    }
 
    @Test
    public void testMixedSigns() {
        assertEquals(-1, add(1, -2));
        assertEquals(1, add(-1, 2));
    }
}

JUnit just requires that much more syntax.  It breaks up the logical flow of the tests and (more importantly) the developer train of thought.  What can be worse is this syntax bloat makes it very tempting to just group all of the assertions into a single test - to save typing if nothing else.  This is problematic because one assertion may shadow all the others in the case of a failure, preventing them from ever being executed.  This can make certain problems much more difficult to isolate.

Logical flow is extremely important to test structure.  BDD frameworks provide a very nice syntax for painlessly defining comprehensive test suites.  The really wonderful thing about all of this is that BDD is available on the JVM, right now.  There’s nothing stopping you from writing your code in Java as you normally would, then creating your test suite in Scala using Specs rather than JUnit.  Alternatively, you could use RSpec on top of JRuby, or Gspec with Groovy.  All of these are seamless replacements for a test framework like JUnit, and requiring of far less syntactic overhead.

The growing move toward polyglot programming encourages the use of a separate language when it is best suited to a particular task.  In this case, several languages are available which offer far more powerful test frameworks than those which can be found in Java.  Why not take advantage of them?