Scala as a Scripting Language?

3 Nov 2008

I know, the title seems a bit…bizarre.  I don’t know about you, but when I think of Scala, I think of many of the same uses to which I apply Java.   Scala is firmly entrenched in my mind as a static, mid-level language highly applicable to things like large-scale applications and non-trivial architectures, but less so for tasks like file processing and system maintenance.   However, as I have been discovering, Scala is also extremely well suited to basic scripting tasks, things that I would normally solve in a dynamic language like Ruby.

One particular task which I came across quite recently was the parsing of language files into bloom filters, which were then stored on disk.  To me, this sounds like a perfect application for a scripting language.  It’s fairly simple, self-contained, involves a moderate degree of file processing, and should be designed, coded and then discarded as quickly as possible.   Dynamic languages have a tendency to produce working designs much faster than static ones, and given the fact that the use-case required access to a library written in Scala, JRuby seemed like the obvious choice (Groovy would have been a fine choice as well, but I’m more familiar with Ruby).  The result looked something like this:

require 'scala'
 
import com.codecommit.collection.BloomSet
 
import java.io.BufferedOutputStream
import java.io.FileOutputStream
 
WIDTH = 2000000
 
def compute_k(lines, width)
  # ...
end
 
def compute_m(lines)
  #...
end
 
Dir.foreach 'wordlists' do |fname|
  unless File.directory? "wordlists/#{fname}"
    count = 0
    File.open "wordlists/#{fname}" do |file|
      file.each { |line| count += 1 }
    end
 
    optimal_m = compute_m(count)
    optimal_k = compute_k(count, WIDTH)
 
    set = BloomSet.new(optimal_m, optimal_k)
 
    File.open "wordlists/#{fname}" do |file|
      file.each do |line|
        set += line.strip
      end
    end
 
    os = BufferedOutputStream.new FileOutputStream.new("gen/#{fname}")
    set.store os
    os.close
  end
end

As far as scripts go, this one isn’t too bad.  I’ve written some real whoppers for things like video encoding and incremental backups.  The main trick here is the fact that we need to make two separate passes over the same file in order to get the number of lines before constructing the set.  We could load the file into an array buffer in a single pass, count its length and then iterate over the array, placing each element in the bloom filter.   However, this really wouldn’t be too much faster than just hitting the file twice (we still need two separate passes) and it has the additional drawback of requiring a fair amount of memory.

All in all, this script is a fairly natural representation of my requirements.   I needed to loop over a number of word lists, push the results into separate bloom filters and then freeze-dry the state.  However, look at what we’ve actually done here.  Remember earlier where we were considering which language to use?  We wanted a language which could concisely and quickly express our intent.  For that decision making process, we just assumed that a dynamic language would serve better than one hampered by a static type system.  However, at no point in the above script do we actually do anything truly dynamic.  By that I mean: open classes, unfixed parameter types, method_missing, that sort of thing.   In fact, we haven’t really done anything that we couldn’t do in Scala:

import com.codecommit.collection.BloomSet
import java.io.{BufferedOutputStream, File, FileOutputStream}
import scala.io.Source
 
val WIDTH = 2000000
 
def computeK(lines: Int, width: Int) = // ...
 
def computeM(lines: Double) = // ...
 
for (file <- new File("wordlists").listFiles) {
  if (!file.isDirectory) {
    val src = Source.fromFile(file)
    val count = src.getLines.foldLeft(0) { (i, line) => i + 1 }
 
    val optimalM = computeM(count)
    val optimalK = computeK(count, optimalM)
 
    val init = new BloomSet[String](optimalM, optimalK)
 
    val set = src.reset.getLines.foldLeft(init) { _ + _.trim }
 
    val os = new BufferedOutputStream(new FileOutputStream("gen/" + file.getName))
    set.store(os)
    os.close()
  }
}

This is actually runnable Scala.  I’m not omitting boiler-plate or cheating in any similar respect.  If you copy this code into a .scala file and make sure that BloomSet is on your CLASSPATH (which you would have needed anyway for JRuby), you would be able to run the script uncompiled using the scala command.  Unlike Java, Scala actually includes an “interpreter” which can parse raw Scala sources and execute the representative program just as if it had been pre-compiled using scalac.   One of the perquisites of this approach is the ability to simply omit any main method or Application class.  In nearly every sense of the word, Scala is a scripting language…as well as an enterprise-ready Java-killer (let the flames begin).

Now that we’re fairly convinced that the above is valid Scala, let’s compare it with the original version of the script written using JRuby.  If we just go off LoC (Lines of Code), Scala actually wins here.  This was a more-than-slightly surprising discovery for me, given how often dynamic languages (and Ruby in particular) are touted as being more concise and expressive than static languages.  But of course, sheer LoC-brevity isn’t everything: we also should consider things like readability.  A few characters of Befunge can accomplish more than I can do in several lines of Scala, but that doesn’t mean I’ll be able to figure out what it means tomorrow morning.

On the readability score, I think Scala wins here too.  The file processing and set creation is all done in a highly functional style (using foldLeft).   At least to my eyes, this is a lot easier to follow than the imperative form in Ruby.  More importantly, I think it’s a bit harder to make silly mistakes.  When I wrote the Ruby version of the script, it took several tries before I solidly pinned down the exact incantation I was seeking.  The Scala version literally required only one revision after the initial prototype.   Granted, I had the Ruby version to go off of, but I think we would all agree that the scripts use some fairly different libraries and methodologies for accomplishing identical tasks.

So what is it that makes Scala so surprisingly well suited to the task of quick-and-dirty file processing and scripting?  After all, isn’t it just a fancy syntax wrapping around the plain-old-Java standard library?  While it is true that Scala has first-class access to Java libraries (as demonstrated in the script), that isn’t all that it offers.  I believe that Scala has two important features which make it so suitable for these tasks:

  • Type inference
  • Powerful core libraries

The first feature is of course evident wherever you look in the script.  With the exception of the two methods and the BloomSet constructor, we never actually declare a type anywhere in the script.  This gives the whole thing a very “dynamic feel” without actually sacrificing static type safety.   The first time you try this sort of language feature it is an almost euphoric experience (especially coming from highly-verbose languages like Java).

The second feature is a bit harder to see.  It is most evident in the way in which we handle file IO.  The directory listing is of course yet another application of the venerable java.io.File class, but the process of opening and reading the file line-by-line seems to be a lot easier than anything Java can muster.  This is made possible by Scala’s Source API.   Rather than fiddling with BufferedReader and the whole menagerie that goes along with it, we just get a new Source from a File instance and then use conventional Scala methods to iterate over its contents.  In fact, we’re actually applying a functional idiom (fold) rather than a standard imperative iteration.  Finally, when we’re done with our first pass, we don’t need to re-open the file from scratch (inviting initialization mistakes in our coding), we just reset the Source and start from the beginning once more.
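
To make the two-pass idea concrete without depending on BloomSet, here is a minimal sketch that counts the lines of a file and then resets the Source to collect the trimmed words into an ordinary Set.  The names TwoPassSketch and countThenCollect are made up for this illustration, the immutable Set merely stands in for the bloom filter, and the sketch assumes a current Scala standard library (where getLines and reset are written with parentheses and sources should be closed explicitly).  It is wrapped in an object with a main method rather than written as a bare script, so that it can also be compiled with scalac.

```scala
import java.io.{File, PrintWriter}
import scala.io.Source

object TwoPassSketch {
  // First pass counts the lines; second pass (after reset) collects trimmed words.
  // The Set is only a stand-in for the bloom filter used in the article.
  def countThenCollect(file: File): (Int, Set[String]) = {
    val first = Source.fromFile(file)
    try {
      val count = first.getLines().foldLeft(0) { (n, _) => n + 1 }

      // reset() hands back a Source positioned at the start of the same file
      val second = first.reset()
      try {
        val words = second.getLines().foldLeft(Set.empty[String]) { _ + _.trim }
        (count, words)
      } finally second.close()
    } finally first.close()
  }

  def main(args: Array[String]): Unit = {
    // Throwaway word list so the sketch is self-contained
    val tmp = File.createTempFile("wordlist", ".txt")
    tmp.deleteOnExit()
    val out = new PrintWriter(tmp)
    try out.print("alpha\nbeta \ngamma\n") finally out.close()

    val (count, words) = countThenCollect(tmp)
    println("lines: " + count + ", distinct words: " + words.size)
  }
}
```

The shape is the same as in the script above: one fold to count, a reset, and a second fold to build the collection.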

Using Scala as a scripting language comes with some pretty hefty benefits.   For one thing, you get immediate and idiomatic access to the mighty wealth of libraries which exist in Java.  Even for scripting, this sort of interoperability is invaluable.  JRuby does provide some excellent Java interop, but it simply can’t compare to what you get with Scala.  Further, Scala has a static type system to check you (for a script, the check happens when the source is compiled at launch, before any of it runs) to ensure that you haven’t done anything obviously bone-headed.  This too is nothing to sniff at.

Given the fact that Scala’s “scripting syntax” is just as concise as Ruby’s (sometimes more), it’s hard to see a reason not to employ it for around-the-server tasks.  Amusingly, the most compelling reason not to use Scala for scripting just might be its comment syntax.  Not having direct support for the magic “hash bang” (#!) incantation to define a file interpreter just means that Scala scripts have to go through some extra steps to be directly executable.  However, if immediately-executable scripts aren’t an issue, you may want to consider Scala as your scripting language of choice for your next non-trivial outing.  You may reap the rewards in ways you weren’t even expecting.

Is Scala Not “Functional Enough”?

20 Oct 2008

In one of Rich Hickey’s excellent presentations introducing Clojure, he mentions in passing that Scala “isn’t really a functional language”.  He says that Java and Scala are both cut from the same mold, and because Scala doesn’t force immutability it really shouldn’t qualify.  This viewpoint is something I’ve been hearing a lot of from various sources, people talking about how F# is really the only mainstream functional language, or how once Erlang takes off it will leave Scala in the dust.

When I first heard this sentiment voiced by Rich, I brushed it off as a little odd and only slightly self-serving (after all, if you don’t use Scala, there’s a better chance you will use Clojure).  Rich has his own opinions about a lot of things, but I have found with most that I can still understand his rationale, even if I don’t agree.  So, realizing that many of his other kooky ideas seemed to have some basis in reality, I decided to come back to his opinion on Scala and give it some deeper consideration.

The core of the argument made by Rich (and others) against Scala as a functional language goes something like this:

  • Mutable variables as first-class citizens of the language
  • Uncontrolled side-effects (ties in with the first point)
  • Mutable collections and other imperative libraries exist on equal footing
  • Object-oriented structures (class inheritance, overloading, etc)
  • Verbosity

Comparative Type Inference

If you’re coming from Java-land, the final point may have caught you a bit by surprise.  After all, Scala is vastly more concise than Java, so how could anyone possibly claim that it is “too verbose”?  Well, to answer that question, you have to compare Scala with the other side of the language jungle: the functional languages.  Here’s an explicitly-recursive function which sums a list of integers:

def sum(ls: List[Int]): Int = ls match {
  case hd :: tail => hd + sum(tail)
  case Nil => 0
}

That’s not too bad.  The use of pattern matching eliminates an entire class of runtime errors (selecting a non-existent element) and makes the code a lot cleaner than the equivalent Java.  However, compare this with the same function ported directly to SML (a functional language):

fun sum nil = 0
  | sum (hd :: tail) = hd + sum tail

One thing you’ll notice here is the complete lack of any type annotations.  Like most static functional languages, ML (and derivatives) has a form of type inference called “Hindley-Milner” (sometimes called “global type inference”).  Rather than just looking at a single expression to infer a type (like Scala), Hindley-Milner looks at the entire function and derives the most general (least restrictive) type which satisfies all expressions.  This means that everything can be statically type-checked with almost no need to declare types explicitly.

“Now, wait!” (you say), “You would never write a function just to sum a list; you should be using a fold.”  That’s true.  So let’s see how well these two languages do when the problem is solved in a more realistic fashion.  Once again, Scala first:

def sum(ls: List[Int]) = ls.foldLeft(0) { _ + _ }

Let’s see ML top that!

fun sum ls = foldl (op+) 0 ls

Then again, maybe we’ll just quit while we’re behind…

The fact is that Scala requires significantly more ceremony to accomplish some things which are trivial in pure-functional languages like ML and Haskell.  So while Scala may be a huge step up from Java and C++, it’s still a far cry from being the quickest and most readable way of expressing things in a functional style.

One obvious solution to this would be to just add Hindley-Milner type inference to Scala.  Well, this may be the “obvious” solution, but it doesn’t work.  Scala has an extremely powerful and complex type system, one with a number of properties which Hindley-Milner just can’t handle.  A full object-oriented inheritance hierarchy causes some serious problems with the “most general” inference of Hindley-Milner: just about everything becomes type Any (or close to it).  Also, method overloading can lead to ambiguities in the inferred types.  This is actually a problem even in the venerable Haskell, which imposes hard limitations on what functions can be in scope at any given point in time (so as to avoid two functions with the same name).

Simply put, Scala’s design forbids any type inference (that I know of) more sophisticated than local expression-level.  Don’t get me wrong, it’s still better than nothing, but a language with local type inference alone will never be as generally concise as a language with Hindley-Milner.
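
As a rough illustration of where local inference stops, the following sketch marks which annotations the compiler needs and which it can work out for itself.  The object and method names (InferenceSketch, sumFold, sumRec) are invented for this example; the behavior described in the comments is that of the Scala compiler as I understand it, not anything specific to this article’s code.

```scala
object InferenceSketch {
  // Parameter types must always be written out;
  // the result type (Int) is inferred from the body.
  def sumFold(ls: List[Int]) = ls.foldLeft(0)(_ + _)

  // A recursive method cannot have its result type inferred,
  // so the ": Int" annotation is mandatory here.
  def sumRec(ls: List[Int]): Int = ls match {
    case hd :: tail => hd + sumRec(tail)
    case Nil => 0
  }

  def main(args: Array[String]): Unit = {
    val xs = List(1, 2, 3, 4)   // inferred as List[Int]
    println(sumFold(xs))        // prints 10
    println(sumRec(xs))         // prints 10
  }
}
```

Local values and most result types come for free, but every parameter still carries an annotation, which is exactly the ceremony that the ML version avoids.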

Side Effects

One big ticket item in the litany of complaints against Scala is the admission of uncontrolled side effects.  It’s not hard to find an example which demonstrates this property:

val name = readLine
println("Hello, " + name)

This example alone shows how fundamental side-effects are within the Scala language.  All we have done here is made two function calls, one of them passing a String and receiving nothing as a result.  From a mathematical standpoint, this code snippet is virtually a no-op.  However, we all know that the println function has an additional side effect which involves sending text to standard out.  Coming from Java, this makes perfect sense and it’s probably hard to see why this would be considered a problem.  However, coming from Haskell, what we just wrote was a complete abomination.

You see, Haskell says that no function should ever have side effects unless they are explicitly declared using a special type constructor.  In fact, this is one of the areas where monads have had a huge impact on Haskell’s design.  Consider the following Haskell equivalent:

main :: IO ()
main = do
         name <- getLine
         putStrLn ("Hello, " ++ name)

Even if you don’t know Haskell, the above should be pretty readable.  The first line is the type declaration for the main “function” (it’s actually a value, but why quibble).  Haskell does have Hindley-Milner type inference, but I wanted to be extra-explicit.  You’ll notice that main is not of type void or Unit or anything similar, it is actually of type IO parameterized with Haskell’s form of Unit: ().  This is an extremely important point: IO is a monad which represents an action with side-effects returning a value which matches its type parameter (in this case, ()).  The little dance we perform using do-notation is just a bit of syntax sugar allowing us to compose two other IO values together in a specific order.  The getLine “function” is of type IO String, meaning that it somehow reads a String value by using side effects (in this case, reading from standard in).  Similarly, putStrLn is a function of type String -> IO ().  This means that it takes a String as a parameter and uses it to perform some side effects, from which it obtains no result value.  The do-notation takes these two monadic values and composes them together, forming one big value of type IO ().

Now, this may seem horribly over-complicated, especially when compared to the nice clean side effects that we have in Scala, but it’s actually quite mathematically elegant.  You see, the IO monad is how we represent actions with side effects.  In fact, the only (safe) way to have side effects in Haskell is to wrap them up inside monad instantiations like these.  Haskell’s type system allows you to actually identify and control side effects so that they remain contained within discrete sections of your code base.

This may not sound so compelling, but remember that functional programming is all about eliminating side effects.  You compute your result, you don’t just accidentally find yourself with a magic value at the end of a long run.  The ability to work with side effects as packaged values just like any other constant is extremely powerful.  More importantly, it is far closer to the “true” definition of functional programming than what we have in Scala.
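
To give a feel for “side effects as packaged values” from the Scala side, here is a toy sketch of an IO-like wrapper.  Everything in it (IoSketch, the IO class, unsafeRun, greet) is invented for illustration; it is not part of the Scala standard library and it is far simpler than Haskell’s IO.  Unlike Haskell, nothing here stops you from performing effects outside the wrapper; the sketch only shows that a description of an effect can be built and composed without being executed.

```scala
object IoSketch {
  // A value of type IO[A] describes a computation that, when run,
  // may perform side effects and then yields an A.
  final class IO[A](val unsafeRun: () => A) {
    def map[B](f: A => B): IO[B] =
      new IO(() => f(unsafeRun()))
    def flatMap[B](f: A => IO[B]): IO[B] =
      new IO(() => f(unsafeRun()).unsafeRun())
  }

  object IO {
    // The body is taken by name, so it is NOT evaluated here
    def apply[A](body: => A): IO[A] = new IO(() => body)
  }

  // Composes two effect descriptions in order, much like the do-block above
  def greet(readName: IO[String], write: String => IO[Unit]): IO[Unit] =
    for {
      name <- readName
      _    <- write("Hello, " + name)
    } yield ()

  def main(args: Array[String]): Unit = {
    val program = greet(IO("World"), s => IO(println(s)))
    // Nothing has been printed yet; program is just a value.
    program.unsafeRun()   // the effects happen only here
  }
}
```

The for-comprehension plays the role of do-notation, and the single call to unsafeRun marks the one place where effects are actually performed.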

Conclusion

I hate to say it, but Rich Hickey and the others are quite right: Scala isn’t a terribly functional language.  Variables, mutable data structures, side effects and constant type declarations all seem to conspire to remove that crown from Scala’s proverbial brow.  But let’s not forget one thing: Scala wasn’t designed to be a functional language.

That may sound like heresy, but it’s true.  Scala was created primarily as an experiment in language design, specifically focusing on type systems.  This is the one area where I think Scala excels far beyond the rest of the field.  Scala makes it possible to model many problems in an abstract way and then leverage the type system to prove correctness at compile time.  This approach is both revolutionary and an extremely natural way to solve problems.  The experience of using the type system in this fashion is a little difficult to describe (I’m still on the lookout for good examples), but trust me, you’ll like it when you see it.

Scala’s not really a functional language, and as Cedric Beust has pointed out, it’s not really the best object-oriented language either; so what is it good for?  Scala sits in a strange middle ground between the two worlds of functional and object-oriented programming.  While this does have some disadvantages like being forced to take second place in terms of type inference, it also lets you do some really interesting stuff like build a mutable ListBuffer with constant time conversion to an immutable List, or sometimes recognize the fact that fold is not the universal solution.  It’s an experiment to be sure, but one which I think has yielded some very powerful, very useful results…just not many of a purely functional nature.

Implicit Conversions: More Powerful than Dynamic Typing?

15 Sep 2008

One of the most surprising things I’ve ever read about Scala came in the form of a (mostly positive) review article.  This article went to some lengths comparing Scala to Java, JRuby and Groovy, discussing many of its advantages and disadvantages relative to those languages.  Everyone seems to be writing articles to this effect these days, so the comparison in and of itself was not surprising.  What was interesting was an off-hand comment discussing Scala’s “dynamic typing” and how it aids in the development of domain specific languages.

Now this article had just finished a long-winded presentation of type inference and compilation steps, so I’m quite certain that the author was aware of Scala’s type system.  The more likely target of the “dynamic typing” remark would be Scala’s implicit conversions mechanism.  I have heard this language feature described many times as being a way of “dynamically” adding members to an existing class.  While it would be incorrect to say that this feature constitutes a dynamic type system, it is true that it may be used to satisfy many of the same design patterns.  Consider the facetious example of a string “reduction” method, one which produces an acronym based on the upper-case characters within the string:

val acronym = "Microsoft Certified Systems Engineer".reduce
println(acronym)            // MCSE

The immediate problem with this snippet is the fact that string literals are of type java.lang.String, a class which comes pre-defined by the language.  The only way to ensure that the above syntax works properly is to “add” the reduce method to the String class separate from its definition.  In a language such as Ruby or Groovy which have dynamic type systems, we could simply open the class definition and add a new method at runtime.  However, in Scala we have to be a bit more tricky.  We can’t actually add methods to an existing class, but we can define a new class which contains the desired method.  Once we have that, we can define an implicit conversion from our target class to our new class.  The Scala compiler sees this and performs the appropriate magic behind the scenes.  In code, it looks like this:

class MyRichString(str: String) {
  def reduce = str.toCharArray.foldLeft("") { (t, c) =>
    t + (if (c.isUpperCase) c.toString else "")
  }
}
 
implicit def str2MyRichString(str: String) = new MyRichString(str)
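
For readers on a more recent compiler, the same wrapper-plus-conversion pattern can be written more compactly as an implicit class (a feature added in Scala 2.10, well after this was written).  The following is only a sketch: AcronymSketch, AcronymOps and acronym are names made up for illustration, and the method is deliberately not called reduce, because current versions of the standard library already offer a reduce on strings and reusing the name could conflict with it.

```scala
object AcronymSketch {
  // Declares the wrapper class and the String => AcronymOps conversion in one go
  implicit class AcronymOps(private val str: String) {
    // Keeps only the upper-case characters of the wrapped string
    def acronym: String = str.filter(_.isUpper)
  }

  def main(args: Array[String]): Unit = {
    println("Microsoft Certified Systems Engineer".acronym)   // MCSE
  }
}
```

The String class itself is untouched; the compiler simply inserts the conversion wherever acronym is called on a String while AcronymOps is in scope.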

This contrasts quite dramatically with the Ruby implementation of the same concept via open classes (somewhat less-graciously known as “Monkey Patching”):

class String
  def reduce
    arr = unpack('c*').select { |c| (65..90).include? c }
    arr.pack 'c*'
  end
end
 
puts 'HyperText Transfer Protocol'.reduce       # HTTP

No visible type conversion is taking place here, all we did is add a method to an existing class and trust that the runtime can figure out the rest.  Indeed, for this application, we don’t really need anything else.  However, as anyone with experience implementing internal domain-specific languages will tell you, seldom is life as simple as adding a few methods to an existing class.  Consider a more complicated scenario where we need to overload the < operator on integers to operate on String values, returning true if the length of the string is less than the integer value, otherwise false.  In Scala, we would once again make use of the implicit conversion mechanism, this time with an even more concise syntax:

implicit def lessThanOverload(i: Int) = new {
  def <(str: String) = str.length < i
}

In fact, we don’t even need to go this far.  It is possible to create an implicit conversion from String to Int defined on the length of the String.  This would allow existing method implementations within the Int class to operate upon String values:

implicit def str2Int(str: String) = str.length
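
A minimal, self-contained sketch of the first of these two conversions, written in implicit-class form and assuming Scala 2.10 or later.  LessThanSketch and IntVsString are illustrative names only, and the explicit Boolean result type is my addition.  The comparison follows the definition given above: true when the length of the string is less than the integer.

```scala
object LessThanSketch {
  // Adds a String-accepting overload of < to Int by way of a wrapper class
  implicit class IntVsString(private val i: Int) {
    def <(str: String): Boolean = str.length < i
  }

  def main(args: Array[String]): Unit = {
    println(5 < "abc")   // true:  "abc".length (3) is less than 5
    println(2 < "abc")   // false: 3 is not less than 2
    println(1 < 2)       // true:  plain Int comparison is unaffected
  }
}
```

Because the conversion is only considered when none of Int’s own < methods accepts the argument, integer-to-integer comparison keeps its normal meaning.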

As a matter of interest, this particular situation can be managed by one of the most convoluted and verbose languages on the market, C++:

bool operator<(const int &i, const std::string &str)
{
    return str.length() < i;
}

Despite the seemingly-dynamic nature of the problem, the statically typed language camp seems well represented in terms of solutions.  Ironically, this sort of problem is one which will be exceedingly difficult to solve in a language like Ruby.  This is primarily because method overloading is an innately static device.  That’s not to say that overloading is impossible in a dynamically typed language (Groovy), but it’s not easy.  To see why, let’s consider the most natural implementation of our operator problem in Ruby:

class Fixnum
  def <(str)
    str.size < self
  end
end

Intuitively, this may seem like the right way to approach the problem, but the results of such an implementation would be disastrous.  At the very least, the first time anyone attempted to perform a < comparison targeting an integer, the interpreter will overflow the call stack.  In fact, any time any code uses the less-than operator on an instance of Fixnum, the interpreter will crash.  The reason for this is the invocation of < upon str.size within our “overloaded” definition.  This call creates a very tight recursive loop which will very quickly eat through all available stack frames.  We can avoid this problem by reversing the comparison like so:

class Fixnum
  def <(str)
    self > str.size
  end
end

Now we don’t have to worry about stack overflow, but in the process we have accidentally redefined integer-to-integer comparison in a very strange way:

irb(main):006:0> 123 < 'test'
=> true
irb(main):007:0> 123 < 123
=> true

Clearly, more effort is going to be required if we are to put to rest our little dilemma.  As it turns out, the final solution is surprisingly ugly and verbose:

class Fixnum
  alias_method :__old_less_than__, '<'.to_sym
  def <(target)
    if target.kind_of? String
      __old_less_than__ target.size
    else
      __old_less_than__ target
    end
  end
end

Whatever happened to Ruby as a “more elegant” language?  The unfortunate truth is that in order to emulate method overloading based on input type, we must hold onto the old method implementation while we implement a type-sensitive facade in its place.  The alias_method invocation literally copies the old less-than operator implementation and provides us with a way of referencing it within our later redefinition.  And what happens if someone else happens to monkey patch Fixnum and (for whatever reason) uses the identifier “__old_less_than__”?  Well, then we have problems.  It’s like the old days of Lisp macros and endless identifier collisions.

It is true that this was an example specifically contrived to make Ruby look bad.  I could have implemented the overload using Groovy’s meta-classes and been reasonably certain that everything would work out fine, but that’s not the point.  The point is that there are a surprising number of situations where static typing serves not only to check for errors but also to allow extension patterns which would be otherwise impossible (or very, very difficult).  Dynamic typing isn’t the panacea of extensibility that its proponents make it out to be, sometimes it isn’t quite up to the task.

In fact (and this is where we come to my Digg-friendly point), I would submit that Scala (and to a lesser extent, C++) have created a mechanism for controlled extensibility which is more powerful than Ruby’s open classes design.  That’s not to say that there aren’t situations which are easily solved using open classes and entirely intractable using only implicit conversions, but in my experience these scenarios are very rare.  In fact, I believe that it is far more common to run against a problem like my contrived overload which is greatly simplified through the use of static typing.

Ironically enough, some of Ruby’s greatest pundits are starting to come around to the belief that a more controlled and well-defined model of class extension is required.  ParseTree is a Ruby framework which provides mechanisms for dynamically manipulating the AST of an expression prior to evaluation.  Conceptually, it is very similar to Lisp’s macros and peripherally related to .NET’s expression trees (used in LINQ).  ParseTree is used by a number of complex Ruby domain-specific languages, including Ambition, a fact which is extremely telling of how great the need is for just such a tool.  Having myself attempted a domain-specific language for constructing queries, I can state categorically that to do such a thing solely on the basis of open classes would be nearly impossible.  Even if successful, such a framework would be extremely volatile, sensitive to the slightest change in the Ruby core library, either caused by update or by other packages injecting their own meddlesome implementations into runtime classes.

Lex Spoon (co-author of Programming in Scala) once said that any language which seriously targeted domain-specific languages would have to create some sort of implicit conversion mechanism.  At the time, I was skeptical, convinced that Ruby (and similar) would always have the upper-hand in the area of class extension due to their dynamic treatment of modules and classes.  However, after some serious dabbling in the field of internal domain-specific languages, I’m beginning to come ’round to his point of view.  Implicit conversions are far from a weak imitation of Scala’s dynamically typed “betters”, they are a powerful and controlled way of extending types far beyond anything which can be easily accomplished through open classes.

How Do You Apply Polyglotism?

18 Aug 2008

For the past two years or so, there has been an increasing meme across the developer blogosphere encouraging the application of the polyglot methodology.  For those of you who have been living under a rock, the idea behind polyglot programming is that each section of a given project should use whatever language happens to be most applicable to the problem in question.  This makes for a great topic for arm-chair bloggers, leading to endless pontification and flame-wars on forum after forum, but it seems to be a bit more difficult to apply in the real world.

The fact is that very few companies are open to the idea of diversity in language selection.  Just look at Google, one of the most open-minded and developer-friendly companies around.  They employ some of the smartest people I know, programmers who have actually invented languages with wide-scale adoption.  However, this same company mandates the use of a very small set of languages including Python, Java, C++ and JavaScript.  If a company like Google can’t even bring itself to dabble in language diversity, what hope do we have for the Apples of the world?

A few months ago, I received an internal email from the startup company where I work.  This email was putting forth a new policy which would restrict all future developments to one of two languages: PHP or Java.  In fact, this policy went on to push for the eventual rewrite of all legacy projects which had been written in other languages including Objective-C, Ruby, Python and a fair number of shell scripts.  I was utterly flabbergasted (to say the least).  A few swift emails later, we were able to come to a more moderate position, but the prevailing attitude remains extremely focused on minimizing the choice of languages.

To my knowledge, this sort of policy is fairly common in the industry.  Companies (particularly those employing consultants) seem to prefer to keep the technologies employed to a minimum, focusing on the least-common denominator so as to reduce the requirements for incoming developer skill sets.  This is rather distressing to me, because I get a great deal of pleasure out of solving problems differently using alternative languages.  For example, I would have loved to build the clustering system at my company using the highly-scalable actor model with Scala, but the idea was shot down right out of the gate because it involved a non-mainstream language.  To be fair to my colleagues, the overall design involved was given more serious consideration, but it was always within the confines of Java, rather than the original actor-driven concept.

There is actually another aspect to this question: assuming you are allowed to use a variety of languages to "get the job done", how do you apply them?  Ola Bini has talked about the various layers of a system, but this is harder to see in practice than it would seem.  How do you define where to "draw the line" between using Java and Scala, or even the more dramatic differences between Java and JRuby or Groovy?  Of course, we can base our decision strictly on lines of code, but in that case, Scala would trump Java every time.  For that matter, Ruby would probably beat out the two of them, and I’m certainly not writing my next large-scale enterprise app exclusively in a dynamic language.

I realize this is something of a cop-out post, just asking a question and never arriving at a satisfactory conclusion, but I would really like to know how other developers approach this issue.  What criteria do you weigh in making the decision to go with a particular language?  What sorts of languages work well for which tasks?  And above all, how do you convince your boss that this is the right way to go?  The floor is open, please enlighten me!  :-)

Case Classes Are Cool

11
Aug
2008

Of all of Scala’s many features, case classes have probably taken the most flak over the past year or so.  Not immutable data structures or even structural types, but rather a minor variation on a standard object-oriented construct.  This is more than a little surprising, especially considering how much work case classes can save when properly employed.

Quick Primer

Before we get into why they’re so nice, we should probably look at what they are and how to use them.  Syntactically, case classes are standard classes with a special modifier: case.  This modifier signals the compiler to assume certain things about the class and to define certain boiler-plate based on those assumptions.  Specifically:

  • Constructor parameters become public “fields” (Scala-style, which means that they really just have an associated accessor/mutator method pair)
  • Methods toString(), equals() and hashCode() are defined based on the constructor fields
  • A companion object containing:
    • An apply() constructor based on the class constructor
    • An extractor based on constructor fields

What this means is that we can write code like the following:

case class Person(firstName: String, lastName: String)
 
val me = Person("Daniel", "Spiewak")
val first = me.firstName
val last = me.lastName
 
if (me == Person(first, last)) {
  println("Found myself!")
  println(me)
}

The output of the above is as follows:

Found myself!
Person(Daniel,Spiewak)

Notice that we’re glossing over the issue of pattern matching and extractors for the moment.  To the regular-Joe object-oriented developer, the really interesting bits are the equals() method and the automatic conversion of the constructor parameters into fields.  Considering how many times I have built “Java Bean” classes solely for the purpose of wrapping data up in a nice neat package, it is easy to see where this sort of syntax sugar could be useful.
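To make the savings concrete, here is a rough, hand-written approximation of what the single-line Person declaration provides.  This is only a sketch: ManualPerson is a hypothetical name, and the real compiler-generated code differs in its details (the exact hashCode() computation, for one).

```scala
// Hand-written stand-in for: case class Person(firstName: String, lastName: String)
// ManualPerson is a hypothetical name used only for this illustration.
class ManualPerson(val firstName: String, val lastName: String) {
  // Structural equality based on the constructor fields
  override def equals(other: Any): Boolean = other match {
    case that: ManualPerson =>
      firstName == that.firstName && lastName == that.lastName
    case _ => false
  }

  // Must agree with equals(); reusing the tuple's hashCode is an arbitrary choice
  override def hashCode: Int = (firstName, lastName).hashCode

  override def toString: String = "ManualPerson(" + firstName + "," + lastName + ")"
}

object ManualPerson {
  // Lets us write ManualPerson("a", "b") without the `new` keyword
  def apply(firstName: String, lastName: String): ManualPerson =
    new ManualPerson(firstName, lastName)

  // Extractor, so that instances can be taken apart in a pattern match
  def unapply(p: ManualPerson): Option[(String, String)] =
    Some((p.firstName, p.lastName))
}

val manual = ManualPerson("Daniel", "Spiewak")
println(manual == ManualPerson("Daniel", "Spiewak"))
println(manual)
```

Roughly twenty lines of ceremony, all of which collapse into one line once the case modifier is applied.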

However, the earlier description of constructor fields does deserve some qualification: the compiler hasn’t actually generated both the accessors and the mutators for the constructor fields, only the accessors.  This comes back to Scala’s convention of “immutability first”.  As we all know, Scala is more than capable of expressing standard imperative idioms with all of their mutable gore, but it tries to encourage the use of a more functional style.  In a sense, case classes are really more of a counterpart to the data constructors of algebraic data types in languages like ML or Haskell than they are to Java Beans.  Nevertheless, it is still possible to make use of the syntax sugar provided by case classes without giving up mutability:

case class Person(var firstName: String, var lastName: String)
 
val me = Person("Daniel", "Spiewak")
me.firstName = "Christopher"   // call to a mutator

By prefixing each constructor field with the var keyword, we are effectively instructing the compiler to generate a mutator as well as an accessor method.  It does require a bit more syntactic bulk than the immutable default, but it also provides more flexibility.  Note that we may also use this var-prefixed parameter syntax on standard classes to define constructor fields, but the compiler will only auto-generate an equals() (as well as hashCode() and toString()) method on a case class.
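The difference between a var-prefixed standard class and a var-prefixed case class is easiest to see side by side.  The following is a small sketch; PlainPerson and CasePerson are hypothetical names chosen just for the comparison.

```scala
// Standard class: var parameters give us accessors and mutators, nothing more
class PlainPerson(var firstName: String, var lastName: String)

// Case class: same fields, plus generated equals(), hashCode() and toString()
case class CasePerson(var firstName: String, var lastName: String)

val plainA = new PlainPerson("Daniel", "Spiewak")
val plainB = new PlainPerson("Daniel", "Spiewak")
val caseA = CasePerson("Daniel", "Spiewak")
val caseB = CasePerson("Daniel", "Spiewak")

// Inherited reference equality: two distinct instances are never equal
val plainEqual = plainA == plainB

// Generated structural equality: equal fields mean equal instances
val caseEqualBefore = caseA == caseB

caseA.firstName = "Christopher"   // call to a mutator

// Equality tracks the current field values, so the two now differ
val caseEqualAfter = caseA == caseB

println(plainEqual)        // false
println(caseEqualBefore)   // true
println(caseEqualAfter)    // false
```

One caveat with the mutable variant: because hashCode() is derived from the same fields, mutating an instance that is already sitting in a hash-based collection can make it impossible to find again.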

Why Are They Useful?

All of this sounds quite nice, so why are case classes so heavily maligned?  Cedric Beust, the creator of the TestNG framework, even went so far as to call case classes “…a failed experiment”.

From my understanding of Scala’s history, case classes were added in an attempt to support pattern matching, but after thinking about the consequences of the points I just gave, it’s hard for me to see case classes as anything but a failure. Not only do they fail to capture the powerful pattern matching mechanisms that Prolog and Haskell have made popular, but they are actually a step backward from an OO standpoint, something that I know Martin [Odersky] feels very strongly about and that is a full part of Scala’s mission statement.

Well, he’s right…at least as far as the pattern matching bit is concerned.  Case classes are almost essential for useful pattern matching.  I say “almost” because it is possible to have pattern matching in Scala without ever using a single case class, thanks to the powerful extractor mechanism.  Case classes just provide some nice, auto-generated magic to speed things along, as well as allowing the compiler to do a bit more checking than would be otherwise possible.
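As a quick illustration of pattern matching with no case class in sight, here is a minimal sketch of a hand-written extractor.  The FullName object and the greet function are hypothetical, invented for this example; the only thing the language requires is an unapply method with a suitable return type.

```scala
// A hand-written extractor over plain strings; no case class is involved.
// FullName is a hypothetical name for this illustration.
object FullName {
  // Succeeds only when the input consists of exactly two space-separated words
  def unapply(raw: String): Option[(String, String)] = {
    val parts = raw.trim.split(" ")
    if (parts.length == 2) Some((parts(0), parts(1))) else None
  }
}

def greet(raw: String): String = raw match {
  case FullName(first, last) => "Hello, " + first + " of the " + last + " family"
  case _                     => "Hello, stranger"
}

println(greet("Daniel Spiewak"))
println(greet("Cher"))
```

The match works, but all of the destructuring logic had to be written by hand, which is exactly the work a case class would have generated for a simple data container.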

The point that I think Cedric (and others) have missed entirely is that case classes are far more than just a means to get at pattern matching.  Even the most stringent object-oriented developer has to admit that a slick syntax for declaring a data container (like a bean) would be a nice thing to have.  What’s more, Scala’s automatic generation of a companion object for every case class lends itself very nicely to some convenient abstractions.  Consider a scenario I ran into a few months back:

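import org.eclipse.swt.SWT
import org.eclipse.swt.layout.FillLayout
import org.eclipse.swt.widgets.{Composite, Display, Shell, TabFolder, TabItem}
 
class MainWindow(parent: Shell) extends Composite(parent, SWT.NONE) {
  private lazy val display = parent.getDisplay
 
  // tab title -> companion object of the panel's case class, used as a factory
  private val panels = Map("Foreground" -> ForegroundPanel, 
                           "Background" -> BackgroundPanel, 
                           "Font" -> FontPanel)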
 
  setLayout(new FillLayout())
 
  val folder = new TabFolder(this, SWT.BORDER)
  for ((text, make) <- panels) {
    val item = new TabItem(folder, SWT.NONE)
    val panel = make(folder)
 
    item.setText(text)
    item.setControl(panel)
  }
 
  def this() = this(new Shell(new Display()))
 
  def open(): Unit = {
    parent.open()
    layout()
 
    while (!parent.isDisposed) {
      if (!display.readAndDispatch()) {
        display.sleep()
      }
    }
  }
}
 
case class ForegroundPanel(parent: Composite) extends Composite(parent, SWT.NONE) {
  ...
}
 
case class BackgroundPanel(parent: Composite) extends Composite(parent, SWT.NONE) {
  ...
}
 
case class FontPanel(parent: Composite) extends Composite(parent, SWT.NONE) {
  ...
}

If you ignore the SWT boiler-plate, the really interesting bits here are the Map of panels and the initialization loop for the TabItem(s).  In essence, I am making use of a cute little trick with the companion objects of each of the panel case classes.  The compiler automatically generates each of these objects so that it extends the function type (Composite)=>ForegroundPanel, where ForegroundPanel is replaced by the case class in question.  Because each of these classes extends Composite, the inferred type of panels will be Map[String, (Composite)=>Composite] (actually, I’m cheating a bit and not giving the precise inference, only its effective equivalent).

This definition allows the iteration over the elements of panels, generating a new instance by using the value element as a function taking a Composite and returning a new Composite instance: the desired child panel.  It’s all statically typed without giving up either the convenience of a natural configuration syntax (in the panels declaration) or the familiarity of a class definition for each panel.  This sort of thing would certainly be possible without case classes, but more work would be required on my part to properly declare each companion object by hand.
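For anyone who would like to try the trick without pulling in SWT, here is a stripped-down sketch of the same idea.  Everything in it is a stand-in: the Panel trait, the String parent and the describe method are hypothetical replacements for Composite and friends.  The explicit String => Panel annotations are also my addition; they spell out the function type that the companion objects are being used as.

```scala
// Hypothetical stand-ins for the SWT panels: each one just records its parent's name
trait Panel { def describe: String }

case class ForegroundPanel(parent: String) extends Panel {
  def describe: String = "foreground panel inside " + parent
}

case class BackgroundPanel(parent: String) extends Panel {
  def describe: String = "background panel inside " + parent
}

// Each companion object is used as a plain function from String to Panel
val makeForeground: String => Panel = ForegroundPanel
val makeBackground: String => Panel = BackgroundPanel

// Tab title -> factory, mirroring the panels Map in MainWindow
val panels = Map("Foreground" -> makeForeground,
                 "Background" -> makeBackground)

// Build one panel per entry, just as the TabItem loop does
val built = for ((title, make) <- panels)
  yield title + ": " + make("main window").describe

built.foreach(println)
```

The configuration stays declarative, and adding a new panel is a matter of declaring one more case class and one more Map entry.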

Conclusion

I think the reason that a lot of staid object-oriented developers tend to frown on case classes is their close connection to pattern matching, a more powerful relative of the much-despised switch/case mechanism.  What these developers fail to realize is that case classes are really much more than that, freeing us from the boiler-plate tyranny of endless getter/setter declarations and the manual labor of proper equals() and toString() methods.  Case classes are the object-oriented developer’s best friend; it’s just that no one seems to realize it yet.