
The Plague of Polyglotism

28
Apr
2008

For those of you who don’t know, polyglotism is not some weird religion but actually a growing trend in the programming industry.  In essence, it is the concept that one should not be confined to a single language for a given system or even a specific application.  With polyglot programming, a single project could use dozens of different languages, each applied to the task for which it is uniquely well suited.

As a basic example, we could write a Wicket component which makes use of Ruby’s RedCloth library for working with Textile.  Because of Scala’s flexible syntax, we can use it to perform the interop between Wicket and Ruby using an internal DSL:

class TextileLabel(id:String, model:IModel) extends WebComponent(id, model) with JRuby {
  require("textile_utils")
 
  override def onComponentTagBody(stream:MarkupStream, openTag:ComponentTag) {
    replaceComponentTagBody(stream, openTag, 
        'textilize(model.getObject().toString()))
  }
}

# textile_utils.rb
require 'redcloth'
 
def textilize(text)
  doc = RedCloth.new text
  doc.to_html
end

Warning: Untested code

We’re actually using three languages here, even though we only have source for two of them.  The Wicket library itself is written in Java, our component is written in Scala and we work with the RedCloth library in Ruby.    This is hardly the best example of polyglotism, but it suffices for a simple illustration.  The general idea is that you would apply this concept to a more serious project and perform more significant tasks in each of the various languages.

The Bad News

This is all well and good, but there’s a glaring problem with this philosophy of design: not everyone knows every language.  You may be a language aficionado, picking up everything from Scheme to Objective-C, but it’s only a very small percentage of developers who share that passion.  Many projects are composed of developers without extensive knowledge of diverse languages.  In fact, even with a really good sampling of talent, it’s doubtful you’ll have more than one or two people fluent in more than two languages.  And unfortunately, there’s this pesky concern we all love called “maintainability”.

Let’s pretend that Slava Pestov comes into your project as a consultant and decides that he’s going to apply the polyglot programming philosophy.  He writes a good portion of your application in Java, Lisp and some language called Factor, pockets his consultant’s fee and then moves on.  Now the code he wrote may have been phenomenally well-designed and really perfect for the task, but you’re going to have a very hard time finding a developer who can maintain it.  Let’s say that six months down the road, you decide that your widget really needs a red push button, rather than a green radio selector.  Either you need a developer who knows Factor (hint: there aren’t very many), or you need a developer who’s willing to learn it.  The thing is that most developers with the knowledge and motivation to learn a language have either already done so, or are familiar enough with the base concepts as to be capable of jumping right in.  These developers fall into that limited group of people fluent in many different languages, and as such are a rare find.

Now, I’m not picking on Factor in any way.  It’s a very interesting language, but it still isn’t very widespread in terms of developer expertise.  That’s really what this all comes down to: developer expertise.  Every time you make a language choice, you limit the pool of developers who are even capable of grokking your code.  If I decide to build an application in Java, even assuming that’s the only language I use, I have still eliminated maybe 20% of all developers from ever touching the project.  If I make the decision to use Ruby for some parts of the application, while still using Java for the others, I’ve now cut the remaining 80% down to maybe 35% (developers who know both Java and Ruby).  Once I throw in Scala, that cuts it down still further (maybe to 15% now).  If I add a fourth language - for example, Haskell - I’ve now narrowed the field so far that it’s doubtful I’ll find anyone capable of handling all aspects within a reasonable price range.  It’s the same problem as with framework choice, except that frameworks are much easier to learn than languages.

The polyglot ideal was really devised by a bunch of nerdy folks like me.  I love languages and would like nothing better than to get paid to learn half a dozen new ones (assuming I’m coming into a project with a strange combination I haven’t seen before).  However, as I understand the industry, that’s not a common sentiment.  So a very loud minority of developers (/me waves) has managed to forge a very hot methodology, one which excludes almost all of the hard-working developer community.  If I didn’t know better, I would be tempted to say that it was a self-serving industry ploy to foster exclusivity in the job market.

I want to work on multi-language projects as much as anyone, but I really don’t think it’s the best thing right now.  I’m working on a project now which has an aspect for which Scala would be absolutely perfect, but since I’m the only developer on hand who is remotely familiar with the language, I’m probably going to end up recommending against its adoption.  Consider carefully the ramifications of trying new languages on your own projects; you may not be doing future developers any favors by going down that path.

Algorithm Proof Inference in Scala

1
Apr
2008

Anyone who’s written any sort of program or framework knows first-hand the traumas of testing.  The story always goes something like this.  First, you spend six months writing two hundred thousand lines of code which elegantly expresses your intent.  Next, you spend six years writing two hundred million lines of code which tests that your program actually does what you think it does.  Of course, even then you’re not entirely sure you’ve caught everything, so you throw in a few more years writing code which tests your tests.  Needless to say, this is a vicious cycle which usually ends badly for all involved (especially the folks in marketing who promised the client that you would have the app done in six hours).  The solution to this “test overload cycle” is remarkably simple and well-known, but certain problems have constrained its penetration into the enterprise world (that is, until now).  The solution is program proving.

A More Civilized Age

Program proofs are basically a (usually large) set of mathematical expressions which rigorously prove that all accepted program outcomes are correct.  A program prover takes these expressions and performs the appropriate static analysis on your code, spitting out either a “yes” or a “no”.  It’s far more effective than boring old unit testing since you can be absolutely certain that all the bugs have been caught.  More than that, it doesn’t require you to write reams of test code.  All that is required is the series of expressions defining program intent:

\Gamma_m \to \{\Gamma_0 = \textit{input} \mid m \subset \Gamma_0\} \cup \Sigma^\star

\epsilon = e^{\epsilon \pm \delta}

f \colon \{ x \in [\epsilon, \infty) | x \to \Gamma_x \}

\rho_\delta = f(\Gamma_\delta \bullet \Gamma_\alpha) \Rightarrow \alpha \in \Sigma^\star

\mbox{prove}(\Omega) = \Omega \in \Gamma_{\mathrm{B}} \Rightarrow (\Omega \times \epsilon) \vee \rho_\omega

There, isn’t that elegant?  Much better than some nasty JUnit test suite.  In a happier world, all tests would be replaced by this simple, easy-to-read representation of intent.  Unfortunately, program provers have yet to overcome the one significant stumbling block that has barred them from general adoption: the limitations of the ASCII character set.

Sadly, the designers of ASCII never anticipated the widespread need to express a Greek delta, or to properly render an implication.  Of course, we could always fall back on Unicode, but support for that character set is somewhat lacking, even in modern programming languages.  And so program provers languish in the outer darkness, unable to see the wide-scale adoption they so richly deserve.

An Elegant Weapon

Fortunately, Scala can be the salvation for the program prover.  Its hybrid functional / object-oriented nature lends itself beautifully to the expression of highly mathematical concepts of intense precision.  Theoreticians have long suspected this, but the research simply has not been there to back it up.  After all, most PhDs write all their proofs on a blackboard, making use of a character set extension to ASCII.  Fortunately for the world, that is no longer an issue.

The answer is to turn the problem on its ear (as it were) and eschew mathematical expressions altogether.  Instead of the developer expressing intent in terms of epsilon, delta and gamma-transitions, a simple framework in Scala will infer intent just based on the input code.  All of the rules will be built dynamically using an internal DSL, without any need to mess about in low-level character encodings.  Scala is the perfect language for this effort.  Not only does its flexible syntax allow for powerful expressiveness, but it even supports UTF-8 string literals!  (allowing us to fall back on plan B when necessary)

Note that while Ruby is used in the following sample, the proof inference is actually language agnostic.  This is because the parsed ASTs for any language are virtually identical (which is what allows so many languages to run on the JVM).

class MyTestClass
  def multiply(a, b)
    a * b
  end
end
 
obj = MyTestClass.new
puts obj.multiply(5, 6)

Such a simple example, but so many possible bugs.  For example, we could easily misspell the multiply method name, leading to a runtime error.  Also, we could add instead of multiply in the method definition.  There are truly hundreds of ways this can go wrong.  That’s where the program prover steps in.
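To make the first of those failure modes concrete, here is a minimal sketch in plain Ruby.  The misspelling multipy is invented for the illustration.  The point is only that the file parses without complaint, and the typo surfaces when the call is actually executed, as a NoMethodError that can be rescued like any other exception.

```ruby
class MyTestClass
  def multiply(a, b)
    a * b
  end
end

obj = MyTestClass.new

# Nothing checks the method name ahead of time; the typo below is only
# detected when this line runs.
typo_error =
  begin
    obj.multipy(5, 6)   # deliberate misspelling of multiply
    nil
  rescue NoMethodError => e
    e
  end

puts "Caught: #{typo_error.class} for :#{typo_error.name}"

# A cheap runtime check for whether an object understands a message.
puts obj.respond_to?(:multiply)
```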

We define a simple Scala driver which reads the input from stdin and drives the proof inference framework.  The framework then returns output which we print to stdout.

object Driver extends Application {
  val ast = Parser.parseAST(System.in)
 
  val infer = new ProofInference(InferenceStyle.AGRESSIVE)
  val prover = new ProgramProver(infer.createInference(ast))
 
  val output = prover.prove(ast)
  println(output)
}

It’s as simple as that!  When we run this driver against our sample application, we get the following result:

image

Notice how the output is automatically formatted for easy reading?  This feature dramatically improves developer productivity by reducing the time devoted to understanding proof output.  One of the many criticisms leveled against program provers is that their output is too hard to read.  Not anymore!

Of course, anyone can write a program which outputs fancy ASCII art; the real trick is making the output actually mean something.  If there’s a bug in our program, we want the prover to find it and notify us.  To see how this works, let’s write a new Ruby sample with a minor bug:

class Person < AR::Base
end
 
me = Person.find(1)
me.ssn = '123-45-6789'
me.save

It’s an extremely subtle flaw.  The problem here is that the ssn field does not exist in the database.  This Ruby snippet will parse correctly and the Ruby interpreter will blithely execute it until the critical point in code, when the entire runtime will crash.  This is exactly the sort of bug that Ruby on Rails adopters have had to deal with constantly.

No IDE in the world will be able to check this code for you, but fortunately our prover can.  We feed the test program into stdin and watch the prover do its thing:

image

Once again, clear and to the point.  Notice how the output is entirely uncluttered by useless debug traces or line numbers.  The only thing we need to know is that something is wrong, and the prover can tell us that.

Conclusion

I can speak from experience when I say that this simple tool can work wonders on any project.  Catching bugs early in the development cycle is the Holy Grail of software engineering.  By learning there’s a problem early on, effort can be devoted to finding the bug and correcting it.  I strongly recommend that you take the time to check out this valuable aid.  By integrating this framework into your development process, you may save thousands of hours in QA and testing.

JRuby Interop DSL in Scala

24
Mar
2008

JRuby is an amazing bit of programming.  It has managed to rise from its humble beginnings as a hobby project on SourceForge to the most viable third-party Ruby implementation currently available.  As far as I am aware, JRuby is the only Ruby implementation other than MRI which is capable of running an unmodified Rails application.  But JRuby’s innovation is not limited to a rock-solid Ruby interpreter; it also provides tight integration between Java and Ruby.

There’s a lot of material out there on how to replace Java with Ruby “glue code” in your application.  The so-called “polyglot programming” technique states that we should embrace multiplicity of language in our applications.  Java may be very suitable for the core business logic of the application, but for actually driving the frontend UI, we may want to use something more expressive (like Ruby).  JRuby provides some powerful constructs which allow access to Java classes from within any Ruby application.  For example:

require 'java'
 
JFrame = javax.swing.JFrame
JButton = javax.swing.JButton
JLabel = javax.swing.JLabel
 
BorderLayout = java.awt.BorderLayout
 
class MainWindow < JFrame
  def initialize
    super 'My Test Window'
 
    setSize(300, 200)
    setDefaultCloseOperation EXIT_ON_CLOSE
 
    label = JLabel.new('You pushed the button', JLabel::CENTER)
    label.visible = false
    add label
 
    button = JButton.new 'Push Me'
    button.add_action_listener do
      label.visible = true
    end
    add(button, BorderLayout::SOUTH)
  end
end
 
window = MainWindow.new
window.visible = true

sshot-1  sshot-2 

Not a terribly complex example, but it illustrates some of the major advantages of JRuby.  Notice how clean and concise this code is.  It wouldn’t have been much longer had I done this using Java, but it would certainly have been less readable.  Ruby is absolutely perfect for this sort of use case (driving a UI).

As I said though, there are a myriad of examples showing this sort of thing.  As such, it’s not a very interesting topic for a posting.  What the masses have failed to cover, however, is how to accomplish the opposite: calling from Java into Ruby.

Likely the reason this topic has received less attention is that Java is the language with the veritable zoo of libraries and frameworks.  The amount of effort and research that has been put into Java simply dwarfs the comparative immaturity of the Ruby offerings.  Given the disparity, why would you even want to call into Ruby from Java?  This conclusion seems logical until one remembers that almost any application which uses Ruby for the frontend must actually pass flow control to Ruby at some point.  This means calling some sort of Ruby code.

The Java Way

There is some information available on the JRuby Wiki.  The wiki article really should include the caveat that “some experimentation may be required.”  Sufficient information is available, but it is neither intuitive nor convenient.  From Java, the syntax for executing an arbitrary Ruby statement looks like this:

ScriptEngineManager m = new ScriptEngineManager();
ScriptEngine rubyEngine = m.getEngineByName("jruby");
ScriptContext context = rubyEngine.getContext();
 
context.setAttribute("label", new Integer(4), ScriptContext.ENGINE_SCOPE);
 
try {
    rubyEngine.eval("puts 2 + $label", context);
} catch (ScriptException e) {
    e.printStackTrace();
}

It’s a typical Java API: over-bloated, over-designed and over-generic.  What would be really nice is to have a syntax for accessing Ruby objects that is as seamless as accessing Java from Ruby.  I want to be able to call Ruby methods and use Ruby classes with the same ease that I can use Java methods and classes.  In short, I want an internal DSL for Ruby.

Unfortunately, Java is a bit constrained in this regard.  Java’s syntax is extremely rigid and does not lend itself well to DSL construction.  It’s certainly possible, but the result is usually less than satisfactory.  We could certainly construct an API around the Java Scripting API (JSR-223) which provides more high-level access (such as direct method calls and object wrappers), but it would be clunky and only a marginal improvement over the original.

The good news is there’s another language tightly integrated with Java that has a far more flexible syntax.  Rather than building our JSR-223 wrapper in Java, we can avail ourselves of Scala’s power and flexibility, hopefully arriving at a DSL which approaches native “feel” in its syntax.

The Scala Way

Since we’re attempting to construct a tightly-integrated API for language calls, the most effective route would be to apply techniques already discussed in the context of DSL design.  As always, we start with the syntax and allow it to drive the implementation:

// syntax.scala
import com.codecommit.scalaruby._
 
object Main extends Application with JRuby {
  require("test")
 
  associate('Person)(new Person(""))
 
  println("Received from multiply: " + 'multiply(123, 23))
  println("Functional test: " + funcTest('test_string))
 
  val me = new Person("Daniel Spiewak")
  println("Name1: " + me.name)
  println("Name2: " + (me->'name)())
 
  me.name = "Daniel"
  println("New Name: " + me.name)
 
  println("Person#toString(): " + me)
 
  val otherPerson = 'create_person().asInstanceOf[AnyRef]
  println("create_person type: " + otherPerson.getClass())
  println("create_person value: " + otherPerson.send[String]("name")())
 
  eval("puts 'Ruby integration is amazing'")
 
  def funcTest(fun:(Any*)=>String) = fun()
}
 
class Person(name:String) extends RubyClass('Person, name) {
  def name = send[String]("name")()
  def name_=(n:String) = 'name = n
}

And the associated Ruby code:

# test.rb
class Person
  attr_reader :name
  attr_writer :name
 
  def initialize(name)
    @name = name
  end
 
  def to_s
    "Person: {name=#{name}}"
  end
end
 
def test_string
  'Daniel Spiewak'
end
 
def multiply(a, b)
  a * b
end
 
def create_person
  Person.new 'Test Person'
end

Obviously we’re going to need some heavy implicit type conversions.  The important thing to note is that we don’t see any residue of the Java Scripting API; it’s all been encapsulated by our DSL.  We’ve taken an API which is oriented around single-call, low-level invocations and created a high-level wrapper framework which allows method calls, instantiation and even some form of type-checking.

Starting from the top, we see a call which should be familiar to Rubyists, the require statement.  In our framework, this method call is just a bit of syntactic sugar around a call to eval(String).  The semantics are basically the same as within Ruby directly, with the exception of how Ruby source files are resolved.  Any script file on the CLASSPATH is fair game, in addition to the normal Ruby locations.  This allows us to easily embed Ruby scripts within application JARs, libraries and other Java distributables.

Moving down a bit further, we find a somewhat mysterious call to the curried associate(Symbol)(RubyObject) method.  The purpose of this invocation will become more apparent later on.  Suffice it to say that this step is necessary to allow Scala class wrappers around existing Ruby classes.

On the next line of interest, we see for the first time how the framework allows for seamless Ruby method invocation.  Unlike Ruby, Scala doesn’t allow us to simply handle calls to non-existent methods.  Because of this limitation, we have to be a bit more clever in how we structure the syntax.  In this case, we use Scala symbols to represent the method.  There doesn’t seem to be a terribly good explanation of symbols in Scala, but there’s plenty of information regarding how they work in Ruby.  Since the concepts are virtually identical, techniques are cross-applicable.
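For readers less familiar with the Ruby side of that comparison, the following is a small sketch of what “handling calls to non-existent methods” looks like there.  The Recorder class is made up for this illustration and is not part of the framework.  Ruby routes any undefined call to method_missing, passing the method name as a symbol, which is the same “symbol stands for a method” idea the Scala DSL borrows.

```ruby
class Recorder
  attr_reader :calls

  def initialize
    @calls = []
  end

  # Ruby invokes this whenever an undefined method is called on the object.
  # The name of the missing method arrives as a Symbol.
  def method_missing(name, *args)
    @calls << [name, args]
    "handled #{name} with #{args.length} argument(s)"
  end

  # Keeps respond_to? consistent with the catch-all above.
  def respond_to_missing?(name, include_private = false)
    true
  end
end

recorder = Recorder.new
result = recorder.multiply(123, 23)   # Recorder defines no multiply method

puts result
puts recorder.calls.inspect
```

Scala offers no equivalent hook here, which is why the DSL has to spell the method name out as an explicit symbol literal instead.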

The key to the whole “symbols as methods” idea is implicit type conversion.  The JRuby trait inherits a set of conversions which look something like this:

implicit def sym2Method[R](sym:Symbol):(Any*)=>R = send[R](sym2string(sym))
implicit def sym2MethodAssign[R](sym:Symbol) = new SpecialMethodAssign[R](sym)
 
private[scalaruby] class SpecialMethodAssign[R](sym:Symbol) {
  def intern_=(param:Any) = new RubyMethod[R](str2sym(sym2string(sym) + "="))(param)
}

Though we haven’t looked at it yet, it is possible to infer the purpose of the send(String) method.  Its function is to prepare a call to a Ruby method without actually invoking it.  This distinction allows us to pass Ruby methods around as method parameters, just like standard Scala methods.  The method returned is actually an instance of class RubyMethod[R] (where R is the return type).  Scala allows classes to extend function types such as (Any*)=>R, allowing us to redefine the method invocation semantics for wrapped Ruby calls.

class RubyMethod[R](method:Symbol) extends ((Any*)=>R) {
  import JRuby.engine
 
  override def apply(params:Any*) = call(params.toArray)
 
  private[scalaruby] def call(params:Array[Any]):R = {
    val context = engine.getContext()
    val plist = new Array[String](params.length)
 
    for (i <- 0 until params.length) {
      plist(i) = "res" + i
      context.setAttribute(plist(i), JRuby.resolveValue(params(i)), ScriptContext.ENGINE_SCOPE)
      plist(i) = "$" + plist(i)
    }
 
    evaluate(() => if (plist.length > 0) {
      sym2string(method) + "(" + plist.reduceLeft[String](_ + ", " + _) + ")"
    } else {
      sym2string(method) + "()"
    })
  }
 
  protected def evaluate(invoke:()=>String):R = {
    val toRun = invoke()
    Logger.getLogger("com.codecommit.scalaruby").info(toRun)
 
    JRuby.handleExcept(JRuby.wrapValue[R](engine, engine.eval(toRun, engine.getContext())))
  }
}

The gist of this code is simply to assign every parameter value to an attribute in the Ruby runtime.  Attributes of ENGINE_SCOPE (as defined by JSR-223) are represented as global variables within Ruby.  These variables are named sequentially starting from zero.  (e.g. $res0, $res1, …)  As you can imagine, this technique tends to be a bit of a concurrency killer.  To keep things simple, I decided to completely ignore the issues associated with asynchronous execution.  It is certainly possible to adapt the framework to function in a multi-threaded environment, but I didn’t bother to do it.  (one of the perks of blogging is a license to extreme laziness)

Once these parameters are assigned, the method call is evaluated within the context of the runtime.  This is done by literally generating the corresponding Ruby code (done in the anonymous method) and then wrapping the return value in an instance of RubyObject (if necessary).  Note that the send(String) method does not actually kick-start this invocation process at all.  Rather, it creates an instance of RubyMethod[R] which corresponds to the method name.  This class extends (Any*)=>R, so it may be used in the normal “method fashion” - by appending parentheses which enclose parameters (if any).
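The same sequence can be sketched entirely on the Ruby side, which may make the generated code easier to picture.  This is only an approximation under stated assumptions: call_by_generated_source is a hypothetical helper, and it uses Ruby’s own eval in place of the scripting engine.  It binds each argument to a numbered global, builds the call as source text, and evaluates that text.

```ruby
def multiply(a, b)
  a * b
end

# Hypothetical helper mirroring the steps described above.
def call_by_generated_source(method_name, *params)
  placeholders = params.each_index.map do |i|
    global = "$res#{i}"
    eval("#{global} = params[#{i}]")   # e.g. $res0 = params[0]
    global
  end

  # Generate the Ruby source for the call, then run it.
  source = "#{method_name}(#{placeholders.join(', ')})"
  [source, eval(source)]
end

source, value = call_by_generated_source(:multiply, 123, 23)
puts source
puts value
```

Because the arguments live in shared globals, two overlapping calls would overwrite each other’s $res0, which is the concurrency problem mentioned earlier.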

Supporting Cast

At this point, it’s worth taking a moment to examine the specifics of the framework class hierarchy.  A number of classes exist to wrap around Ruby objects and methods.  We’ve already seen a few of them (RubyMethod[R] and RubyObject), but it’s worth going into more detail as to their purpose and relation to one another. 

Note that these class names often conflict with existing classes in the JRuby implementation.  This odd coincidence is precipitated by the fact that the framework seems to deal with a lot of the same concepts as the JRuby runtime (go figure).  Rather than obfuscating my class naming to avoid conflict, I just assume that you will either make use of the enhanced Scala import feature (as I have in the implementation), or just avoid using the JRuby internal classes.

image

  • RubyObject - The root of the object hierarchy.  This abstract class is designed to encapsulate the core functionality of the generic object (roughly: send, -> and eval) as well as containing all of the implicit type conversions.  Most of the syntax-defining magic happens here (more on this later).
  • JRuby - This is the primary type interface between the developer and the framework.  Classes which wish to make use of Ruby integration must inherit from this trait.  This is where the Logger (for executed statements) is initialized and deactivated.  Within the corresponding object, all of the backend resources are managed.  This is where the actual ScriptEngineManager instance lives, as well as a set of utility methods to handle wrapping and unwrapping of framework-specific objects.
  • RubyWrapperObject - This implementation of RubyObject is designed to wrap around instances which already exist within the Ruby interpreter.  For example, if a Ruby method returns an instance of ActiveRecord::Base, it will be represented in Scala by a corresponding instance of RubyWrapperObject.  Note that objects which are equivalent in the Ruby interpreter are not guaranteed to be pointer-equivalent.  However, the equals(Object) method is well-defined within RubyObject, thus comparisons between RubyObject instances will return sane results.  The == method in Scala is defined in terms of equals(Object), so existing code will behave rationally.
  • RubyClass - With the exception of the JRuby trait, this is likely the only class within the framework which the developer will have to reference explicitly.  This class allows developer-defined Scala classes to wrap around existing Ruby classes, providing type-safe method calls and even extended functionality.  More on this feature later.
  • RubyMethod - We’ve already seen how this serves as a wrapper around calls to Ruby methods.  However, its default implementation assumes that the method is defined in the global namespace.  This is impractical for many method calls (such as dispatch on an object).
  • RubyInstanceMethod - This class solves the problem of object dispatch with RubyMethod.  All of the core functionality is identical to its superclass with the exception of the generated Ruby code.  Instead of just generating a method call passing parameters, this class will generate a method call on a given Ruby object.  Thus, this class depends upon RubyWrapperObject which maintains a reference to a corresponding Ruby instance.
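The difference between the last two classes comes down to the shape of the generated Ruby source.  As a rough sketch, again using plain Ruby eval rather than the scripting engine, instance dispatch might prefix the call with a receiver that has been parked in a global.  The helper call_on_receiver and the global $recv are names invented for this illustration; the framework’s actual generated code is not shown in this post.

```ruby
class Person
  attr_reader :name

  def initialize(name)
    @name = name
  end
end

# Hypothetical helper: same idea as global dispatch, except the generated
# source begins with a receiver held in a global variable.
def call_on_receiver(receiver, method_name, *params)
  $recv = receiver
  placeholders = params.each_index.map do |i|
    eval("$res#{i} = params[#{i}]")
    "$res#{i}"
  end

  source = "$recv.#{method_name}(#{placeholders.join(', ')})"
  [source, eval(source)]
end

source, value = call_on_receiver(Person.new("Test Person"), :name)
puts source
puts value
```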

Alternative Dispatch

Not every method call is made on the enclosing scope.  Sometimes it is necessary to call a method in an object to which you have a reference.  For example, a method may return an instance of some Ruby class.  This instance will be automatically wrapped by a Scala instance of RubyWrapperObject.  Since this Scala class doesn’t actually define any methods which correspond to the Ruby class, it is necessary to once more utilize the “symbols as methods” trick in method dispatch.  There are two ways to call a method on an object like this: the send[R](String) method (where R is the return type), and the -> (arrow) operator.

Using the arrow operator is a lot like normal method calls, except with symbols instead of method names.  Just like dispatch on the enclosing scope, the call is converted into an instance of RubyMethod (actually, an instance of RubyInstanceMethod) which can then be used as a standard Scala method.  The difference between using arrow and dispatching on the enclosing scope is that the syntax must be a little more contrived.

Parentheses have the second-highest priority of all the Scala operators (the dot operator (.) has the highest).  This means that if we simply “follow our nose” where the syntax is concerned, we will arrive at an order of invocation which leads to an undesirable result.  Consider the following sample:

val obj = 'create_person()
obj->'name()

The first call is a standard dispatch on the enclosing scope.  The second call is what is interesting to us.  Reading this line naturally (at least to old C/C++ programmers) we would arrive at the following sequence of events:

  1. Get a reference to the name method from the instance contained within obj
  2. Invoke the method, passing no parameters

Unfortunately, this is not how the compiler sees things.  Because parentheses bind tighter than the arrow operator, it actually resolves the expression in the following way:

  1. Get a reference to the name method contained within the enclosing scope
  2. Invoke the method, passing no parameters
  3. Invoke the -> method on the instance within obj, passing the result of name as a parameter

This is obviously not what we wanted.  Unfortunately, there’s no way to make the arrow operator bind tighter than parentheses.  This is a good thing from a language standpoint, but it causes problems for our syntax.

The solution is to enclose any “arrow dispatch” statement within parentheses so as to force the order of evaluation:

val obj = 'create_person()
(obj->'name)()

It looks a bit weird, but it’s the only way Scala will allow this to work.  This call now evaluates properly, calling the name method on the obj instance, passing no parameters.

There’s actually another problem associated with arrow dispatch in our DSL: Scala already has an implicit meaning for the arrow operator.  The following sample should look familiar to those of you who have worked with Scala in other applications:

val numbers = Map(1 -> "one", 2 -> "two", 3 -> "three")

By default, Scala defines the arrow operator as an alternative syntax for defining 2-tuples.  This is good for most things, but bad for us.  What we want is to define a new implicit type conversion which converts Any into a corresponding instance of RubyWrapperObject.  This would allow us to satisfy the syntax given above.  However, Scala’s 2-tuple syntax already defines an implicit type conversion for the Any type which deals with the arrow operator.  Rather than examining the context to attempt to disambiguate, the Scala compiler simply gives up and prints an error stating that the implicit type conversions are ambiguous.  This poses a bit of a problem and nearly killed the arrow operator idea in design.

The solution is actually to override Scala’s built-in conversion by defining our own conversion with the same name and signature but which provides us with the option of using our own arrow operator definition.  The behavior we want is to allow normal use of the arrow operator when dealing with Any -> Any, but convert to RubyWrapperObject and dispatch when dealing with Any -> Symbol.  After a little digging through the Scala standard library, I arrived at the following solution (defined in RubyObject):

implicit def any2ArrowAssoc[A](a:A) = new SpecialArrowAssoc(a)
 
private[scalaruby] class SpecialArrowAssoc[A](a:A) extends ArrowAssoc(a) {
  def ->(sym:Symbol) = (a match {
    case obj:RubyObject => obj
    case other => new RubyWrapperObject(other)
  })->sym
}

Notice that we extend Scala’s pre-existing ArrowAssoc[A] class (which handles the special 2-tuple syntax) and then overload the -> method to work differently with symbols.  This code now does precisely what we need.  By introducing this extra layer of indirection, as well as by overriding Scala’s existing conversion, we’re able to support the arrow syntax as shown in the above examples.

Sending Messages

There is one final form of dispatch which allows typed return values: send[R](String).  This is actually the method to which all the other dispatch forms delegate (as it is the most general).  This method is very similar to the Ruby send method which allows Smalltalk-style message passing on arbitrary objects.  The really important thing about this method though is that it will automatically cast the return value from the method to whatever type you specify, allowing you to define type-safe wrappers around existing Ruby methods in Scala:

def multiply(a:Int, b:Int) = send[Int]("multiply")(a, b)
 
val result:Int = multiply(123, 23)

send is effectively defined as a curried function since it takes a method name as a parameter and returns an instance of RubyMethod as a result.  This mimics the behavior of dispatch with symbol literals in that you can use send to generate type-safe partially-applied functions for corresponding Ruby methods.
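For comparison, here is a minimal sketch of the plain-Ruby mechanics this mirrors.  The Calculator class is a made-up stand-in (it isn't part of the framework); send and method are standard Ruby.  Object#send dispatches by name immediately, while Object#method hands back a Method object that can be held and called later, which is roughly the role RubyMethod plays on the Scala side:

```ruby
# Hypothetical class, used only for illustration.
class Calculator
  def multiply(a, b)
    a * b
  end
end

calc = Calculator.new

# Immediate dispatch by name, Smalltalk-style.
direct = calc.send(:multiply, 123, 23)

# Deferred dispatch: grab the method now, supply the arguments later.
multiply = calc.method(:multiply)
deferred = multiply.call(123, 23)

puts direct    # 2829
puts deferred  # 2829
```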

Note that send could just as easily have taken a symbol as a parameter, rather than a string.  However, the metaphor throughout the DSL is “symbols as methods”, thus string was used to avoid logical conflict.  Scala itself was perfectly happy passing symbol literals around in addition to treating them as methods.

Class Wrapping

The final bit of code in the example now so far above us serves as a sample of how one might wrap an existing Ruby class within Scala.  Person is actually a class defined in Ruby (as you can see from the Ruby sources).  It has a read/write attribute, name, as well as an overridden to_s method.  RubyObject already contains the logic for handling calls to toString() and proxying them to Ruby’s to_s, but the name attribute must be handled explicitly in code.

The goal is basically to provide a type-safe wrapper around the Person Ruby class.  We could just as easily dispatch on the automatically wrapped instance of RubyWrappedObject using either syntax described above, but an explicit wrapper is a bit nicer.  The compiler can check things for us, and we can even add methods to the class (at least, as far as Scala is concerned) in true Ruby “open class” style.  All that is necessary to accomplish this wrapper is to extend RubyClass and to define the delegating wrapper methods:

class Person(name:String) extends RubyClass('Person, name) {
  def name = send[String]("name")()
  def name_=(n:String) = 'name = n
}

We specify which Ruby class we are wrapping as the first parameter in the constructor for RubyClass.  The parameters which follow are passed directly to the constructor of the corresponding Ruby class.  This Ruby constructor is invoked automatically, instantiating the corresponding wrapped Ruby object in the background.  Notice that we specify the name of the Ruby class using a symbol.  This is the one place in the framework that we break with the “symbols as methods” metaphor.  The consequence is a nice, clean syntax for Ruby class wrapping.  Unfortunately, it also means that wrapping a class within a non-included namespace (e.g. ActiveRecord::Base) can be a little clunky.  The only way to do it is to explicitly invoke the Symbol(String) constructor.  (This is required because Scala symbol literals can only contain alpha-numerics and underscores.)

Once we have our wrapped class signature, it’s easy to define the delegate methods.  Scala encourages a blurring of field and method, similar to Ruby.  As such, it supports a very Ruby-esque syntax for accessor/mutator pairs.  This makes the wrapped syntax just a bit nicer.  For the accessor, we make a call to the send method, specifying the return type necessary for the wrapper.  The mutator allows us to be a bit more creative.

We don’t really need type-safe return values for a mutator.  We would normally just set the return type as Unit and ignore the result.  Thus we can once again use the symbol dispatch syntax.  Notice that this time we’re not directly treating a symbol as a method.  We’re apparently assigning a value to the symbol using the = operator (corresponds to the operator= assignment operator in C++).  This is possible through a separate implicit type conversion which generates a one-off utility instance:

private[scalaruby] class SpecialMethodAssign[R](sym:Symbol) {
  def intern_=(param:Any) = new RubyMethod[R](str2sym(sym2string(sym) + "="))(param)
}

As you can see, all this method does is generate a new symbol which includes the ‘=’ character and returns the result of dispatching on the corresponding Ruby method.  Note that a mutator in Ruby is simply a method whose name ends in “=” (the mutator for name is the method name=), thus appending “=” to the method name is the appropriate behavior.
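Seen from the Ruby side, the same trick is easy to sketch.  This is only an illustration: the Person class below is a stand-in with a read/write name attribute, and assign is a hypothetical helper rather than anything in the framework.  It builds the setter's name by appending “=” and then dispatches on it:

```ruby
# Stand-in Ruby class with a read/write attribute.
class Person
  attr_accessor :name

  def initialize(name)
    @name = name
  end
end

# Hypothetical helper: derive the setter name ("name" -> "name=") and call it.
def assign(target, attribute, value)
  setter = :"#{attribute}="
  target.public_send(setter, value)
end

person = Person.new("Daniel")
assign(person, :name, "Chris")
puts person.name
```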

Return Value Wrapping

There’s actually a slight problem involved in allowing Scala wrappers around existing Ruby types.  Well, not so much a problem as an inconsistency.  The problem is simply this: if a Ruby method creates an instance of a Ruby class for which there is a Scala wrapper and returns this value through the framework into Scala, one would expect this value would be wrapped into an instance of the Scala wrapper.  If you look in the example far above, there is an example of this in the create_person method.  The method creates an instance of Ruby class Person and returns it as a result.

Somehow, the framework must identify that there is a corresponding Scala wrapper and then properly create an instance.  This actually poses something of a dilemma in two ways.  Number one, Scala has no equivalent to Ruby’s ObjectSpace, so there’s no way to get a comprehensive list of all classes which have been defined.  Even if we could get this list, the corresponding Ruby class is specified in the constructor parameters to RubyClass, so there’s no way to obtain the information statically from outside the class.  Number two, we have to somehow create an instance of the Scala wrapper class without creating a corresponding instance of the wrapped Ruby class (since we already have one).  This means we need some sort of override in the RubyClass constructor.
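To make the contrast concrete, this is the sort of thing ObjectSpace makes possible in Ruby.  It's a minimal sketch with hypothetical class names (Wrapper and its two subclasses are invented for the example); ObjectSpace.each_object itself is standard Ruby:

```ruby
# Hypothetical base class and wrappers, for illustration only.
class Wrapper; end
class PersonWrapper < Wrapper; end
class CompanyWrapper < Wrapper; end

# Walk every live Class object and keep the strict subclasses of Wrapper.
# (klass < Wrapper is nil for unrelated classes and false for Wrapper itself.)
wrappers = ObjectSpace.each_object(Class).select { |klass| klass < Wrapper }

puts wrappers.map(&:name).sort.inspect
```

Nothing comparable is available to the Scala side of the framework, which is why the association has to be declared by hand.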

The best solution to all of these problems is to introduce the associate method.  The usage is demonstrated at the top of the example where we associate the Person Ruby class with the Person Scala wrapper class.  More specifically, we associate the Ruby class with a pass-by-name parameter which defines how to instantiate the Scala class.  This is an important distinction as it solves our second problem of instance creation.  The framework has no way of knowing what parameters must be passed to the Scala wrapper constructor, so the instantiation itself must be passed:

associate('Person)(new Person(""))

As I mentioned previously, this is a pass-by-name parameter which means that it will not be immediately evaluated, but rather on-demand somewhere in the body of associate.  The associate method actually takes this value and wraps it in an anonymous method which invokes the instantiation each time a value of Ruby type Person must be wrapped.  Just prior to invoking the constructor, an override is put in place within the RubyClass singleton object (not shown in the class hierarchy) to prevent the creation of a corresponding Ruby instance.  This is what allows the new instance of Scala class Person to correspond with an existing Ruby value.  Here again we’re sacrificing concurrency for a hacky work-around to a complex problem.  Any sort of “proper” implementation would have to solve this problem in a more elegant way.
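The deferred-instantiation idea translates to Ruby almost directly, since a block is also just unevaluated code.  The following is a rough analogy only, not the framework's implementation: FACTORIES, this Ruby associate and PersonWrapper are all names I've invented for the sketch.

```ruby
# Hypothetical registry; the names here are illustrative, not the framework's API.
FACTORIES = {}

# Store the block without running it; it is invoked only when a value needs wrapping.
def associate(class_name, &factory)
  FACTORIES[class_name] = factory
end

PersonWrapper = Struct.new(:name)

constructed = 0
associate(:Person) do
  constructed += 1
  PersonWrapper.new("")
end

puts constructed                  # 0, nothing has been built yet
first  = FACTORIES[:Person].call
second = FACTORIES[:Person].call
puts constructed                  # 2, one fresh wrapper per call
```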

It Never Ends!

This post, that is.  There’s so much more I could ramble on about (I never even talked about how exceptions are handled), but this entry is already far too long.  Hopefully the material presented here only serves to whet your appetite for slicker JRuby-Scala integration and all the benefits it can bring.  I’ve packaged up the framework presented here as a downloadable archive.  The package includes the Ruby engine for the Java Scripting API as well as a jar-complete build made from the JRuby SVN.  The project may work with JRuby 1.0, but I doubt it.  Anyway, JRuby 1.1 is due shortly, so why bother.  Remember that this is extremely untested and very experimental.  (I did warn you about the concurrency issues, right?)  If this is interesting to people, I may do a proper release into an OSS project somewhere.  For right now, I just don’t have the time.  :-(

I hope this entry gives you an idea of what’s involved in Scala DSL implementation, as well as an idea of where such a technique may be useful in your own projects.  After all, what would be better than everyone being able to write their own Rails-killer and define highly fluid APIs!

Should ORMs Insulate Developers from SQL?

25
Feb
2008

This is a question which is fundamental to any ORM design.  And really from a philosophical standpoint, how should ORMs deal with SQL?  Isn’t the whole point of the ORM to sit between the developer and the database as an all-encompassing, object oriented layer?

A long time ago in an office far, far away, a very smart cookie named Gavin King got to work on what would become the seminal reference implementation for object relational mapping frameworks the world over (or so Java developers would like to think).  This project was to be bundled with JBoss, possibly the most popular enterprise application server, and would support dozens of databases out of the box.  It was to offer heady benefits such as totally object-oriented database access, transparent multi-tier caching and a flexible transaction model.  At its core though, Hibernate was designed to resolve a single problem: application developers hate SQL.

No really, it’s true!  Bread-and-butter application developers really dislike accessing data with SQL.  This has led to endless conflict (and bad jokes) between application developers and database administrators.  Oftentimes the developer team would write a set of boilerplate lines in Java and then copy/paste these arbitrarily throughout their code, swapping in the relevant query as supplied by the DBA.  For obvious reasons, this would become very hard to maintain and just intensified the bad blood between developer and database.

If you think about it though, it’s a bit odd that this intense dislike would mutate from just hating the insanity of JDBC to hating JDBC, SQL and RDBMS in general.  SQL is a very nice, almost mathematical language which allows phenomenally powerful queries to be expressed simply and elegantly.  It abstracts developers from the headache of database-specific hashing APIs and algorithms which are almost filesystems in complexity.  The language was designed to make it as easy as possible to get data out of a relational database.  The fact that this effort backfired so utterly is a source of endless confusion to me.

But regardless, we were talking about ORMs.  When it was first introduced, Hibernate held out the promise that developers would never again have to wade knee deep through a sea of half-set SQL.  Instead, developers would pass around POJOs (Plain Old Java Objects), modifying their values like any other Java bean and then handing these objects off to the data mapper, which would handle the details of persistence.  Furthermore, Hibernate promised that developers would never again have to worry about which databases support which non-standard SQL extensions.  Since developers would never have to work with SQL, anything database-specific could be handled within the persistence manager deep in the bowels of Hibernate itself.

This all seems lovely and wonderful, but there’s a catch: it doesn’t work so well in practice.  Now before you stone me, I’m not talking about Hibernate specifically now, but ORMs in general.  It turns out to be completely impossible to interact with a relational database solely through an object-oriented filter.  This is easily seen with a simple example:

SELECT * FROM people WHERE age > 21 GROUP BY lastName

How in the world are you going to represent that in an object model?  Sure, maybe you can provide a little abstraction for the query details, but it starts to get complex if you try to handle things like grouping non-declaratively.  The developers working on Hibernate quickly realized this problem and came up with an innovative solution: write their own query language!  After all, SQL is too confusing, so why not invent an entirely new query language with the “feel” of SQL (to keep the DBAs happy) but without all of the database-specific wrinkles?

This query language is now called “HQL”, and as the name implies, it’s really SQL, but not quite.  Here’s how the aforementioned example would look in HQL (disclaimer: I’m not a Hibernate expert, so I may have gotten the syntax wrong):

FROM Person WHERE age > 21 GROUP BY lastName

Remarkably similar, that.  Executing this query in a Hibernate persistence manager yields an ordered list of Person entities pre-populated with data from the query.  It seems to make a lot of sense, but there are a number of problems with this approach.  First, it requires Hibernate to literally have its own compiler to translate HQL queries into database-specific SQL.  Second, it hasn’t really solved the core problem that many developers have with SQL: it’s a declarative query language.  As you can see, HQL is really just SQL in disguise, so it really doesn’t eliminate SQL from your database access, just dresses it in a funny hat.

Other ORMs have appeared over the years, taking alternative approaches to the problem of object-relational mapping, but none of them has quite eliminated the query language.  Even DSL-based ORMs like ActiveRecord fail to remove SQL entirely:

class Person < ActiveRecord::Base; end
 
Person.find(:all, :conditions => 'age > 21', :group => 'lastName')

It’s sort of SQL-free, but you can still see bits and pieces of a query language around the edges.  In fact, what ActiveRecord is actually doing here is building a proper SQL query around the SQL fragments which are passed as parameters.  It’s a system which is ripe for SQL injection, but surprisingly leads to very few problems in real-world applications.  This is the approach which is also taken by ActiveObjects for its database query API.
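To see why passing fragments around is both convenient and risky, here's a deliberately naive sketch of the general idea.  build_select is a hypothetical helper written for this example; it is not how ActiveRecord (or ActiveObjects) is actually implemented:

```ruby
# Hypothetical helper, NOT ActiveRecord's implementation.
# It simply wraps a full query around whatever fragments it is given.
def build_select(table, conditions: nil, group: nil)
  sql = "SELECT * FROM #{table}"
  sql += " WHERE #{conditions}" if conditions   # fragment is pasted in verbatim
  sql += " GROUP BY #{group}" if group
  sql
end

puts build_select("people", conditions: "age > 21", group: "lastName")

# Because the fragment is not escaped, hostile input changes the query's meaning.
user_input = "21 OR 1=1"
puts build_select("people", conditions: "age > #{user_input}")
```

The second call shows the injection risk: whatever ends up in the fragment becomes part of the statement.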

So ORMs in and of themselves seem to have failed to entirely eliminate SQL from the picture, but what about other frameworks?  There are a few quite recent efforts which seem to have nearly succeeded in eliminating the direct use of SQL completely from application code.  Ambition is perhaps the best (and most clever) example of this, though others like scala-rel are catching up fast.  Ambition is designed from the ground up to interact naturally with ActiveRecord, so the two combined perhaps represent the first “true” ORM: one which does not require the developer at any point to deal with any SQL whatsoever.
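To get a feel for the block-based style that such libraries aim for, here is the age-and-grouping query from earlier expressed against a plain in-memory array, using nothing but Ruby's Enumerable.  To be clear, this is not Ambition's API and it never touches a database; the Person struct and the sample rows are invented for the sketch:

```ruby
# Invented sample data; a Struct stands in for a mapped entity.
Person = Struct.new(:first_name, :last_name, :age)

people = [
  Person.new("Ann", "Smith", 34),
  Person.new("Bob", "Jones", 45),
  Person.new("Cy",  "Smith", 19),
  Person.new("Dee", "Smith", 52)
]

# Roughly "WHERE age > 21" followed by a grouping on last name.
adults_by_last_name = people
  .select { |person| person.age > 21 }
  .group_by(&:last_name)

adults_by_last_name.each do |last_name, members|
  puts "#{last_name}: #{members.map(&:first_name).join(', ')}"
end
```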

But was it really worthwhile?  As clever as things like Ambition are, is it really that much easier than just writing queries in SQL?  As Nathan Hamblen so eloquently said (when referring to a totally different topic):

…is the end of the ORM rainbow.  You get there, throw yourself a party and realize that important things are broken.

A quote taken out of context perhaps, but I think it applies to the “cult of SQL genocide” with as much validity.  In the end, by denying yourself access to the powerful and well-understood mechanism that is SQL, you’re just crippling your own application and forcing yourself to write more code instead of less.

So what’s the “right” approach?  Is there a happy medium between ActiveRecord+Ambition and full-blown SQL on Rails?  I think so, and that is the approach I have been trying to implement with ActiveObjects.  As I’m sure you know, ActiveObjects takes a lot of its inspiration from ActiveRecord, so the syntax for querying the database is very similar:

EntityManager em = ...
em.find(Person.class, Query.select().where("age > 21").group("lastName"));
 
// ...or
em.find(Person.class, "age > 21");   // no grouping

You still have the full power of SQL available to you.  You can still write complex, nested boolean conditionals and funky subqueries, but there’s no longer any need to be burdened with the whole of SQL’s verbosity.  As with vanilla ActiveRecord, this code intends to be a bit of a hand-holder, shielding innocent application developers from the fierce world of RDBMS.

Is this the right way to go?  I’m honestly not sure.  I’ve met a lot of developers that would give their left eye to never have to look at another SQL statement again (for developers already missing a right eye, this isn’t much of a stretch).  On the other hand, there are purists like myself who revel in the freedom afforded by a powerful, declarative language.  It’s hard to say which path is better, but at the end of the day, it’s really the question itself that matters.  Giving application developers the choice to select whichever approach they feel is most appropriate, that is the solution.

Adding Type Checking to Ruby

6
Feb
2008

What’s the first thing you think of when you consider the Ruby Language?  Dynamic types, right?  Ruby is famous (infamous?) for its extremely flexible type system, and as a so-called “scripting language”, the core of this mechanism is a lack of type checking.  This feature allows for some very concise expressions and a great deal of flexibility, but sometimes makes your code quite a bit harder to understand.  More importantly, it weakens the assurances that a certain method will actually work when passed a given value.

Several different solutions have been proposed to work around this limitation.  The canonical technique involves intensifying tests and increasing test coverage.  Ruby has some excellent unit test frameworks (such as RSpec) which serve to ease the pain associated with this approach, but no matter how you slice it, tests are a pain.  Having to rely on tests to take the place of type checking in the code assurance process can be extremely frustrating.

Another, less common technique is to simply perform dynamic type checks within the method itself.  Like so:

def create_name(fname, lname)
  raise "fname must be a String" unless fname.kind_of? String
  raise "lname must be a String" unless lname.kind_of? String
 
  fname + " " + lname
end

This code explicitly checks the dynamic kind of the parameter values to ensure that they are of type or subtype of String.  The issues with this sample should be relatively obvious.

Primarily, it’s ugly!  This sort of repetitious, boiler-plate conditional checking is exactly the sort of thing Ruby tries to avoid.  What’s more, the added bulk of all of these repetitive checks (assuming you perform one check per-parameter per-method) becomes far more unwieldy than just improving the RSpec test coverage.

While manually type checking may be a bad solution syntactically, it’s on the right track conceptually.  What we really want is some sort of assertion that the parameters are of a certain type, but that won’t overly bloat our existing code.  We need some sort of framework that will “weave in” (think AOP) its type assertions without getting in the way of our algorithms.

Well it turns out that someone’s already done this.  Eivind Eklund kindly pointed me to his type checking framework in a comment on a previous post.  The basic idea is to perform the type checking assertions, but to factor the work out into an API encapsulated by an intuitive DSL.  So rather than performing all those nasty unless statements as above, we could simply do something like this:

typesig String, String
def create_name(fname, lname)
  fname + " " + lname
end

It’s really as simple as that.  Passing the type values to the typesig method just prior to a method declaration gives the cue to the Types framework to perform some extra work on each call to that method.  Now we have the runtime assurances that the following code will not work (with a very intuitive error message):

create_name("Daniel", 123)

This will produce the following output:

ArgumentError: Arg 1 is of invalid type (expected String, got Fixnum)
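How might a declaration like typesig get “woven in” ahead of a method?  I haven't studied the internals of Types, so what follows is strictly my own minimal sketch of one plausible technique, not the framework's actual code.  MiniTypesig and the Names class are hypothetical; the sketch relies only on Ruby's standard method_added hook, instance_method and define_method:

```ruby
# Hypothetical illustration only; NOT the real Types framework.
module MiniTypesig
  # Remember the signature until the next method definition arrives.
  def typesig(*types)
    @pending_types = types
  end

  # Ruby calls this hook each time an instance method is defined.
  def method_added(name)
    types = @pending_types
    return super if types.nil?
    @pending_types = nil   # clear first so the redefinition below is not wrapped again

    original = instance_method(name)
    define_method(name) do |*args|
      types.each_with_index do |type, i|
        unless type === args[i]
          raise ArgumentError,
                "Arg #{i} is of invalid type (expected #{type}, got #{args[i].class})"
        end
      end
      original.bind(self).call(*args)
    end
    super
  end
end

class Names
  extend MiniTypesig

  typesig String, String
  def create_name(fname, lname)
    fname + " " + lname
  end
end

names = Names.new
puts names.create_name("Daniel", "Spiewak")

begin
  names.create_name("Daniel", 123)
rescue ArgumentError => e
  puts e.message
end
```

The design choice here is that the check wraps the original method once, at definition time, so the method body itself stays free of assertions.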

But the fun doesn’t stop there.  Ruby encourages the “duck typing” pattern, where algorithm developers concern themselves not with what the value is but rather what it does.  This means that the type checking really should be done based on what methods are available, not just the raw type.  It turns out that the Types framework supports this as well:

class Company
  def name
    "Blue Danube"
  end
end
 
class Person
  def name
    "Daniel Spiewak"
  end
end
 
typesig String, Type::Respond(:name)
def output(msg, value)
  puts msg + " " + value.name
end
 
c = Company.new
p = Person.new
 
output("The company name is: ", c)
output("The person is: ", p)
 
output("The programmer is: ", "a genius")    # error

Types can check not only the kind of the object but also to what methods it responds.  This is crucial to enabling its adoption into modern Ruby code bases, many of which rely heavily on this “duck typing” technique.
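Stripped of the DSL, a responds-to check presumably boils down to something like Ruby's built-in respond_to?.  Here's a hand-rolled sketch of that idea for comparison; describe is a hypothetical helper of mine and the Company class is just a stand-in, so don't read this as what Types does internally:

```ruby
# Stand-in class for illustration.
class Company
  def name
    "Blue Danube"
  end
end

# Hypothetical helper: accept anything that responds to :name, whatever its class.
def describe(msg, value)
  unless value.respond_to?(:name)
    raise ArgumentError, "expected something that responds to :name, got #{value.class}"
  end
  "#{msg} #{value.name}"
end

puts describe("The company name is:", Company.new)

begin
  describe("The programmer is:", "a genius")   # String has no name method
rescue ArgumentError => e
  puts "rejected: #{e.message}"
end
```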

You can think of the Types framework just like another layer in your testing architecture.  Obviously it’s not performing any sort of static type checking (since Ruby has no compile phase).  All it’s doing is providing that extra certainty that you’re never passing something weird from somewhere in your code, something that would break your algorithm.

So what’s the catch?  Well, obviously you need to have the Types framework installed.  It’s not as easy as just typing gem install types either, since the framework actually predates RubyGems.  You’ll have to download the framework and then copy around the types.rb file yourself.  But this is just deployment semantics.  The more interesting issues are the limitations of the code itself.

As far as I can tell, the only restriction on the framework is that it must be used within a proper class, not in the root scope.  This means that all of my examples above would have to be enclosed in a class, rather than just copy-pasted into a .rb file and run in place.  But other than this one limitation, the framework is incredibly flexible.  I really haven’t shown you the seriously interesting stuff in terms of the API (there are more examples at the top of the types.rb file).  In many ways, Types is actually more powerful than any static type checking mechanism could be (yes, I’m even including Scala in that evaluation).

I haven’t had a chance to use Types on any serious project myself, but I can see tremendous potential, particularly for companies with large-scale Ruby/Rails deployments or even smaller projects looking for just a bit tighter code assurance.  As far as I’m concerned, there shouldn’t be a non-trivial Ruby project attempted without this lovely library, Rails or no Rails.