
Scala for Java Refugees Part 6: Getting Over Java

11 Feb 2008

Thus follows the sixth and final installment of my prolific “Scala for Java Refugees” series.  After this post, I will continue to write about Scala as the spirit moves, but I don’t think I’ll do another full-length series focused entirely on this one topic.  It’s a surprisingly exhausting thing to do, tying yourself to a single subject for so long.  (insert piteous moans from my keyboard)  Anyway, enough of my whining…

To be honest, I’ve been looking forward to this article from day one of the series.  This is the article where we get to open the door on all sorts of wonderful Scala-specific goodies.  So far, the focus has been mostly on areas where Scala’s semantics more or less match Java’s.  In this article, we’ll look at some of the many ways in which Scala surpasses its lineage.  It’s time to get over that old girlfriend of yours and join me in the new tomorrow!

Class Extensions

There has been some chit-chat around the Java communal fireplace about adding class extensions to Java 7.  The basic idea is that classes need not have fixed members, but that methods can be woven into the class or instance depending on imports.  This is similar to the concept of “open classes” supported by highly dynamic languages like Ruby:

class String
  def print_self
    puts self
  end
end
 
"Daniel Spiewak".print_self    # prints my name

This funky-looking sample is actually adding a new method to the String class (the one already defined by the language) and making it available to all instances of String in the defining scope.  Actually, once this class definition executes, the print_self method will be available to all instances of String within any scope, but let’s not complicate matters.

Obviously, Java class extensions have to be a bit more controlled.  Things are statically typed, and what’s more there are some hard and fast rules about namespaces and fully-qualified class names.  The compiler will actually prevent me from creating a class with the same fully qualified name as another.  The main proposal seems to be some sort of variation on static imports, with the use case being things like the Collections.sort(List) method.

As one would expect from a language not tied to such heavy legacy baggage, Scala has managed to solve the problem of class extensions in a very elegant (and type-safe) way.  Actually, the solution took a lot of inspiration from C#, but that’s not important right now.  The ultimate answer to the problem of class extensions is…implicit type conversions.

Scala allows you to define methods which take a value of one type as a parameter and return a value of a different type as the result.  This in and of itself isn’t so unique until you add the real magic.  By declaring such a method implicit (implicit is a modifier keyword), you tell the compiler to automatically use it in situations where a value of type A is called for but a value of type B was passed, provided the method converts a B into an A.

Maybe an example would clear this up:

implicit def str2int(str:String):Int = Integer.parseInt(str)
def addTwo(a:Int, b:Int) = a + b
 
addTwo("123", 456)

Notice the type “error” in the final line.  With this call, we are passing a String value to a method which is expecting an Int.  Normally this is frowned upon, but before the compiler throws a fit, it takes one last look around and discovers the str2int method.  This method takes a String, returns an Int and most importantly is declared implicit.  The compiler makes the assumption that however this method works, it somehow converts arbitrary String values into Ints.  With this bit of knowledge in hand, it is able to implicitly insert the method call to str2int into the output binary, causing this sample to compile and return 579 as the final value.
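To make that rewrite concrete, here is a small sketch that spells out the call the compiler inserts for us.  It assumes a recent Scala 2 or Scala 3 compiler, where defining an implicit conversion without the scala.language.implicitConversions import only produces a feature warning, so the import is included to keep things quiet:

```scala
import scala.language.implicitConversions

implicit def str2int(str: String): Int = Integer.parseInt(str)
def addTwo(a: Int, b: Int) = a + b

// What we write: a String where an Int is expected
val implicitResult = addTwo("123", 456)

// Roughly what the compiler produces after inserting the conversion
val explicitResult = addTwo(str2int("123"), 456)

println(implicitResult)   // 579
println(explicitResult)   // 579
```

Keep in mind that the compiler only checks types here; it cannot know whether the conversion will succeed.  A call like addTwo("abc", 456) still compiles, but Integer.parseInt throws a NumberFormatException at run time.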

Now if that were all implicit type conversions were capable of, they would still be pretty amazing.  But fortunately for us, the cleverness doesn’t end there.  The Scala compiler doesn’t only look for a conversion when an assignment or method parameter demands a different type; it also looks for one when you call a method that the value’s own type doesn’t declare, searching for a conversion to some type which does declare it.  This is where implicit type conversions become the enabling factor for extension methods.

Let’s imagine that I want to duplicate my Ruby example above in pure Scala.  My end goal is to “add” the printSelf() method to the String class.  This method should be usable from any String instance within the enclosing scope (thus enabling the literal/call syntax we had in Ruby).  To accomplish these ends, we’re going to need two things: a composing class containing the extension method and an implicit type conversion.  Observe:

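import scala.language.implicitConversions
 
class MyString(str:String) {
  // the "extension" method we want every String to appear to have
  def printSelf(): Unit = {
    println(str)
  }
}
 
// converts any String into a MyString wherever one is needed
implicit def str2mystring(str:String): MyString = new MyString(str)
 
"Daniel Spiewak".printSelf()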

I’d call that powerful.  Here the compiler sees that String does not declare the printSelf() method.  Once again, before it blows up it looks around for an implicit type conversion which might yield something with a printSelf() method.  Conveniently enough, there is just such a conversion method declared in scope.  The compiler adds the call, and we’re none the wiser.  As far as we’re concerned, we just called the printSelf() method on a String literal, when actually we were invoking the method on a composing instance which wraps around String.

Implicit conversion methods are just that, fully-functional methods.  There are no limitations (that I know of) on what you can do in these methods as opposed to “normal” methods.  This allows for conversions of arbitrary complexity (though most are usually quite simple).  Oh, and you should note that the compiler checks for conversions based solely on type information; the name is not significant.  The convention I used here (typea2typeb) is just that, a convention.  You can call your implicit conversion methods whatever you feel like.
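Because only the types matter, the conversion in this sketch is deliberately given an arbitrary name.  The Shouter class, its shout method and the anyNameYouLike conversion are all hypothetical, invented just for illustration:

```scala
import scala.language.implicitConversions

// Hypothetical wrapper class supplying the "extension" method
class Shouter(str: String) {
  def shout: String = str.toUpperCase + "!"
}

// The name is irrelevant; the compiler only looks at the String => Shouter type
implicit def anyNameYouLike(str: String): Shouter = new Shouter(str)

val loud = "hello".shout
println(loud)   // HELLO!
```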

Operator Overloading

Moving right along in our whirlwind tour of random Scala coolness, we come to the forgotten island of operator overloading.  Founded by mathematicians accustomed to dealing with different operational semantics for identical operators, operator overloading was abandoned years ago by the developer community after the fiasco that was/is C++.  Until I saw Scala, I had assumed that the technique had gone the way of lazy evaluation (another Scala feature) and pointer arithmetic.

Some languages like Ruby support operator overloading in a very limited way, but even they tend to discourage it for all but the most hard-core use cases.  Scala on the other hand is really much closer to how mathematicians envisioned operator overloading in Turing-complete languages.  The distinction is simple: in Scala, method names can contain arbitrary symbols.

This may seem like a trivial point, but it turns out to be very powerful.  One of the leading problems with operator overloading in languages like C++ and Ruby is that you cannot define new operators.  You have a limited set of operators with hard-coded call semantics (less-so in Ruby).  These operators may be overloaded within carefully defined boundaries, but that’s all.  Neither Ruby nor C++ succeed in elevating operator overloading to the level of a generally useful technique.

Scala avoids this trap by lifting the restriction against arbitrary operators.  In Scala, you can call your operators whatever you want because there is no special logic for dealing with them hard-coded into the compiler.  A few little things, like * taking precedence over +, are fixed (Scala derives an operator’s precedence from its first character), but the important stuff remains flexible.
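To illustrate, here is a minimal sketch using a hypothetical two-dimensional Vec class.  Both + and * are ordinary methods, yet because precedence comes from an operator’s first character, * still binds tighter than +:

```scala
// Hypothetical 2D vector; + and * are ordinary methods with symbolic names
case class Vec(x: Int, y: Int) {
  def +(other: Vec): Vec = Vec(x + other.x, y + other.y)
  def *(factor: Int): Vec = Vec(x * factor, y * factor)
}

val v = Vec(1, 2)
val w = Vec(3, 4)

// Parsed as v + (w * 2) because * has higher precedence than +
val result = v + w * 2
println(result)   // Vec(7,10)
```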

So let’s imagine that I wanted to define an insertion operator in Scala similar to the infamous << in C++.  I could go about it in this way:

import java.io.PrintStream
import scala.language.implicitConversions
 
implicit def ps2richps(ps:PrintStream): RichPrintStream = new RichPrintStream(ps)
class RichPrintStream(ps:PrintStream) {
  // method with a symbolic name
  def <<(a:Any) = {
    ps.print(a.toString())
    ps.flush()
    ps
  }
}
 
val endl = '\n'
 
System.out << "Daniel" << ' ' << "Spiewak" << endl

Wow!  We can actually write code as ugly as C++ even in fancy, new-fangled languages!  Obviously we have an implicit type conversion here (see above if you weren’t paying attention the first time).  More interesting is the <<(Any) method declared within the RichPrintStream class.  This is actually a proper method.  There’s no magic associated with it nor any funky limitations to bite you in the behind when you least expect it.

Looking down a bit further in the code, we see the “nicely” chained PrintStream invocations using the <<(Any) method and the implicit conversion from PrintStream to RichPrintStream.  It may not look like it, but these are actually method calls just like the bog-standard var.method(params) syntax.  The line could just as easily have looked like this:

System.out.<<("Daniel").<<(' ').<<("Spiewak").<<(endl)

Of course, I’m not sure why we would prefer the second syntax as opposed to the first.  This just illustrates the flexible nature of Scala’s invocation syntax.  You can actually extend this concept to other methods.  For example:

class Factory {
  def construct(str:String) = "Boo: " + str
}
 
val fac = new Factory()
 
fac construct "Daniel"
// is the same as...
fac.construct("Daniel")

With methods which only take a single parameter, Scala allows the developer to replace the . with a space and omit the parentheses, enabling the operator syntax shown in our insertion operator example.  This syntax is used in other places in the Scala API, such as constructing Range instances:

val firstTen:Range = 0 to 9

Here again, to(Int) is a vanilla method declared inside a class (there’s actually some more implicit type conversions here, but you get the drift).
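The same infix style works for any single-argument method, including ones from the standard library.  In this short sketch, each pair of values is computed by the same method call written two ways:

```scala
// Infix and dotted forms of the same calls
val a = 3 max 7
val b = 3.max(7)

val starts = "Daniel" startsWith "Dan"
val startsDotted = "Daniel".startsWith("Dan")

// 'to' is just a method returning an inclusive Range
val firstTen = 0 to 9
val sameTen = 0.to(9)

println(a)              // 7
println(firstTen.sum)   // 45
```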

Tuples

Mathematics defines a structure in which two or more values are held in an ordered list of length n (where n is the number of values in the “list”).  This construct is called an n-tuple (or just “tuple”).  This is obviously a construct which is easily emulated in code through the use of an array or similar.  However, the syntax for such constructions has always been bulky and unwieldy, eliminating raw tuples from the stock toolset of most developers.  Shame, really.

Tuples are fundamentally a way of pairing discrete pieces of data in some sort of meaningful way.  Theoretically, they can be applied to many different scenarios such as returning multiple values from a method or examining key-value pairs from a map as a single, composite entity.  Really the only thing preventing programmers from exploiting the power of such simple constructs is the lack of an equivalently simple syntax.  At least, until now…

val keyValue = ("S123 Phoney Ln", "Daniel Spiewak")
 
println(keyValue._1)   // S123 Phoney Ln
println(keyValue._2)   // Daniel Spiewak

In this example, keyValue is a so-called 2-tuple.  Scala declares such values to be of type (String, String).  And yes, that is an actual type which you can use in declarations, type parameters, even class inheritance.  Under the surface, the (String, String) type syntax is actually a bit of syntax sugar wrapping around the Tuple2[String, String] type.  In fact, there are 22 different “n-tuple types” declared by Scala, one for each value of n up to (and including) 22.
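Since the parenthesized form is only sugar, the two spellings in this small sketch describe the same type and produce equal values:

```scala
// (String, String) and Tuple2[String, String] are the same type
val sugared: (String, String) = ("key", "value")
val desugared: Tuple2[String, String] = Tuple2("key", "value")

println(sugared == desugared)   // true
println(desugared._2)           // value
```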

Tuples don’t have to be all the same type either.  Here are a few tuples mapping between integer literals and their String literal equivalents:

val tuple1 = (1, "1")
val tuple2 = (2, "2")
val tuple3 = (3, "3")
 
val (i, str) = tuple1
 
println(i)     // 1
println(str)   // "1"

What ho!  The simultaneous declaration of values i and str is another bit of Scala syntax sugar one can use in dealing with tuples.  As expected, i gets the first value in the tuple, str gets the second.  Scala’s type-inference mechanism kicks in here and infers i to be of type Int and str to be of type String.  This is inferred from the type of the tuple1 value, which is (Int, String).

So what are they good for?  Well it turns out Scala allows you to put tuples to good use in a lot of ways.  For example, returning multiple values from a method:

class Circle {
  private val radius = 3
 
  def center():(Int, Int) = {
    var x = 0
    var y = 0
    // ...
    (x, y)
  }
}

Scala has no need for clumsy wrappers like Java’s Point class.  Effectively, the center() method is returning two separate values, paired using a tuple.  This example also showcases how we can use the tuple type syntax to specify explicit types for methods, variables and such.
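As a further sketch of the multiple-return-value pattern, here is a hypothetical minMax helper which returns both extremes of an array at once; the caller destructures the result with the syntax we saw earlier:

```scala
// Hypothetical helper returning two results in a single (Int, Int) tuple.
// Assumes a non-empty array.
def minMax(values: Array[Int]): (Int, Int) = {
  var lo = values(0)
  var hi = values(0)
  for (v <- values) {
    if (v < lo) lo = v
    if (v > hi) hi = v
  }
  (lo, hi)
}

val (smallest, largest) = minMax(Array(3, 1, 4, 1, 5, 9, 2, 6))
println(smallest)   // 1
println(largest)    // 9
```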

The Map API can also benefit from a little tuple love.  After all, what are maps but effectively sets of key-value tuples?  This next example shows tuples in two places, both the map iterator and the Map() initialization syntax:

val tastiness = Map("Apple" -> 5, "Pear" -> 3, "Orange" -> 8, "Mango" -> 7, "Pineapple" -> 8)
 
println("On a scale from 1-10:")
tastiness.foreach { tuple:(String, Int) =>
  val (fruit, value) = tuple
 
  println("    " + fruit + " : " + value)
}

Remember our old friend foreach?  (think back to the first article)  We’ll look at the semantics of this a bit more in a second, but the important thing to focus on here is the map initialization syntax.  Scala defines Map as an object with an apply() method taking a variable-length (varargs) list of tuples.  Got that?

The declaration for the object with just this method might look like this:

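import scala.collection.mutable
 
object Map {
  // simplified sketch: the real library version builds an immutable Map
  def apply[A,B](tuples:(A, B)*):scala.collection.Map[A,B] = {
    val back = new mutable.HashMap[A,B]
    tuples.foreach(t => back += t)    // iterate over the varargs tuples and add each to back
    back
  }
}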

The apply() method is going to return a new Map parameterized against whatever the type of the specified tuples happens to be (in this case String and Int).  The * character on the end of the parameter type just specifies the parameter as varargs, similar to Java’s “...” notation.

So all the way back to our tastiness example, the first line could be read as: declare a new value tastiness and assign it the return value from the expression Map.apply(…) where the arguments are tuples.  The -> operator is just another way of declaring a tuple in code, similar to the (valueA, valueB) syntax we saw earlier; it is itself an ordinary method, made available on any value through an implicit conversion in Scala’s Predef.
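A short sketch of these pieces, using only the standard library: -> builds an ordinary tuple, an existing sequence of tuples can be passed to the varargs apply() with the : _* ascription, and foreach can pattern-match each tuple directly instead of destructuring it inside the body:

```scala
// -> is just another way to write a 2-tuple
val arrow = "Apple" -> 5
val tuple = ("Apple", 5)
println(arrow == tuple)   // true

// An existing sequence of tuples expanded into the varargs parameter
val pairs = Seq("Apple" -> 5, "Pear" -> 3)
val fromArrows = Map("Apple" -> 5, "Pear" -> 3)
val fromSeq = Map(pairs: _*)
println(fromArrows == fromSeq)   // true

// Pattern-matching each (key, value) tuple in the function literal
fromArrows.foreach { case (fruit, value) =>
  println("    " + fruit + " : " + value)
}
```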

Higher-Order Functions

Contrary to popular opinion, the term “higher-order function” doesn’t refer to some sort of elitist club to which you must gain entrance before you can understand.  I know it may seem that way sometimes, but trust me when I say that higher-order functions are really quite easy and surprisingly useful.

Taking a few steps back (so to speak), it’s worth pointing out that any Java developer with a modicum of experience has employed the patterns allowed by higher-order functions, knowingly or unknowingly.  For example, this is how you declare listeners on a JButton using Swing:

JButton button = new JButton("Push Me");
button.addActionListener(new ActionListener() {
    public void actionPerformed(ActionEvent e) {
        System.out.println("You pushed me!");
    }
});
add(button);

This example passes an instance of an anonymous inner class to the addActionListener() method.  The sole purpose of this inner class is to encapsulate the actionPerformed(ActionEvent) method in an object which can be passed around.  Effectively, this pattern is a form of higher-order function.  addActionListener() accepts a single argument (called a functional) which is itself a function delegate encapsulating a block of statements (in this case, one println()).

Of course, this isn’t really a higher-order function since Java doesn’t allow functional values.  You can’t just pass a method to another method and expect something to happen (other than a compiler error).  This sort of anonymous inner class delegate instance pattern is really like a distant cousin to proper functionals.

Let’s assume for one blissful moment that we could rewrite Swing to take full advantage of Scala’s syntax.  Let’s pretend that we changed the addActionListener() method so that it actually would accept a true functional as the parameter, rather than this ActionListener garbage.  The above example could then condense down to something like this:

val button = new JButton("Push Me")
button.addActionListener((e:ActionEvent) => {
  println("You pushed me!")
})
add(button)

Instead of a bulky anonymous inner class wrapping around our block of statements, we pass an anonymous method (a method without a name declared in-place similar to anonymous inner classes).  This method takes a single parameter of type ActionEvent and when called performs a simple println().  It is effectively the same as the Java example, except with one tenth the boiler-plate.

We can actually condense this example down even farther.  We can take advantage of some of the flexibility in Scala’s syntax when dealing with function parameters and remove some of those nasty parentheses (after all, it’s Scala, not LISP):

val button = new JButton("Push Me")
button.addActionListener { e:ActionEvent =>
  println("You pushed me!")
}
add(button)

Concise and intuitive, with no nasty surprises like only being able to access final variables (Scala anonymous methods can access any variable/value within their enclosing scope).  In fact, what we have here is currently the focus of a great deal of controversy within the Java language community.  This, dear friends, is a closure.

Wikipedia’s definition falls a little bit short in terms of clarity, so let me summarize: a closure is exactly what it looks like, a block of code embedded within an enclosing block which logically represents a function (or method, the terms are roughly analogous) and which can refer to the variables of the scope in which it was defined.  This is the type of construct which people like Neal Gafter are pushing for inclusion into Java 7.  This addition would enable code similar to the above Scala example to be written in pure Java.
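Here is a minimal sketch of that capturing behavior.  The function literal refers to, and even updates, a local var from its enclosing scope, something a Java anonymous inner class cannot do (it may only read final locals):

```scala
var clicks = 0

// The function literal closes over (captures) the local variable clicks
val onClick = () => clicks += 1

onClick()
onClick()
onClick()
println(clicks)   // 3
```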

Most of the closures proposals though have a single, overwhelming point of opposition: cryptic syntax.  As I’ve said many times, Java is tied to a great deal of legacy baggage, especially syntactically.  This baggage prevents it from evolving naturally beyond a certain point.  Scala on the other hand has virtually no history, thus the designers were able to create a clean, well-considered syntax which reflects the needs of most developers.  You’ve seen how Scala allows you to declare and pass functionals, but what about the receiving end?  Does the syntax bulk up under the surface?

Here’s a simple example which iterates over an array, calling a functional for each element:

def iterate(array:Array[String], fun:(String)=>Unit) = {
  for (i <- 0 to (array.length - 1)) {    // anti-idiom array iteration
    fun(array(i))
  }
}
 
val a = Array("Daniel", "Chris", "Joseph", "Renee")
iterate(a, (s:String) => println(s))

See?  The syntax is so natural you almost miss it.  Starting at the top, we look at the type of the fun parameter and we see the (type1, …)=>returnType syntax which indicates a functional type.  In this case, fun will be a functional which takes a single parameter of type String and returns Unit (effectively void, so whatever value the function body produces is simply discarded).  Two lines down in the function, we see the syntax for actually invoking the functional.  fun is treated just as if it were a method available within the scope; the call syntax is identical.  Veterans of the C/C++ dark-ages will recognize this syntax as being reminiscent of how function pointers were handled back-in-the-day.  The difference is, no memory leaks to worry about, and no over-verbosity introduced by too many star symbols.
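To show the receiving end with more than one parameter, here is a sketch of a hypothetical combine method whose op parameter has the function type (Int, Int) => Int.  The caller decides what op actually does:

```scala
// op is a functional: it takes two Ints and returns an Int
def combine(a: Int, b: Int, op: (Int, Int) => Int): Int = op(a, b)

val sum = combine(2, 3, (x: Int, y: Int) => x + y)
val product = combine(2, 3, (x, y) => x * y)   // parameter types inferred
val larger = combine(2, 3, _ max _)            // placeholder syntax for (x, y) => x max y

println(sum)       // 5
println(product)   // 6
println(larger)    // 3
```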

At the bottom of the example, we see another (slightly different) syntax for specifying an anonymous method.  In this case, the method is just a single expression, so we don’t need all the cruft entailed by a proper block.  So we drop the braces altogether and instead write the method on a single line, declaring parameters and handling them within.

We’re not done though.  Scala provides still more flexibility in the syntax for these higher-order function things.  In the iterate invocation, we’re creating an entire anonymous method just to make another call to the println(String) method.  Considering println(String) is itself a method which takes a String and returns Unit, one would think we could compress this down a bit.  As it turns out, we can:

iterate(a, println)

By omitting the parentheses and just specifying the method name, we’re telling the Scala compiler that we want to use println as a functional value, passing it to the iterate method.  Thus instead of creating a new method just to handle a single set of calls, we pass in an old method which already does what we want.  This is a pattern commonly seen in C and C++.  In fact, the syntax for passing a function as a functional value is precisely the same.  Seems that some things never change…

Now there is one outstanding dilemma here that the attentive will have picked up on: what about println() (accepting no parameters)?  Of course Scala allows zero-arg method invocations to optionally omit the parentheses for brevity’s sake.  What’s to prevent the compiler from assuming that instead of wanting the value of println(String) as a functional, perhaps we actually want the return value of println()?  Well the answer is that the Scala compiler is very smart.  It has no trouble with this particular sample in differentiating between the different cases and choosing the unambiguous answer.

But assuming that the compiler couldn’t figure it out, there’s still a syntax to force the compiler to accept a method name as a functional rather than an actual invocation (Scala calls these “partially applied functions”):

iterate(a, println _)

That dangling underscore there is not a weird typo introduced by WordPress.  No, it’s actually a weird construct introduced by Martin Odersky.  This underscore (preceded by a method name and a non-optional space) tells the compiler to look at println as a functional, rather than a method to be invoked.  Whenever you’re in doubt about whether you’re semantically passing a functional or a return value, try throwing in the underscore suffix.  If you can’t figure it out, the compiler probably can’t either.
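A brief sketch of both uses of the underscore: turning a method into a function value, and supplying only some of a method’s arguments (the sense in which such functions are “partially applied”).  square and add are hypothetical helpers:

```scala
def square(x: Int): Int = x * x
def add(a: Int, b: Int): Int = a + b

// Method name plus trailing underscore: the method itself as a function value
val squareFn = square _
val squares = List(1, 2, 3).map(squareFn)

// Supplying one argument and leaving the other as a typed placeholder
val addFive = add(5, _: Int)

println(squares)       // List(1, 4, 9)
println(addFive(10))   // 15
```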

I could go on talking about higher-order functions for days (and many people have), but I think I’ll just close with one final note.  A lot of features throughout the Scala API are designed as higher-order functions.  foreach(), the standard mechanism for iterating over any Iterable, is an excellent example of this:

val people = Array("Daniel", "Chris", "Joseph", "Renee")
 
people.foreach { name:String =>
  println("Person: " + name)
}

This is the idiomatic way to loop through an array in Scala.  In fact, as I said this is the idiom for looping through anything in Scala which is potentially “loopable”.  As you can now see, this is in fact a higher-order function taking an anonymous method as a parameter which it then calls once for each element in the array.  This makes sense from a logical standpoint.  After all, which is more “componentized”: manually managing a loop over a range of values, or asking the array for each value in turn?
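To demystify foreach a little, here is a simplified sketch of how such a method could be written for arrays.  eachElement is a hypothetical name, and the real library implementation differs in its details:

```scala
import scala.collection.mutable.ListBuffer

// Hypothetical stand-in for foreach: call f once per element, in order
def eachElement[A](array: Array[A])(f: A => Unit): Unit = {
  var i = 0
  while (i < array.length) {
    f(array(i))
    i += 1
  }
}

val people = Array("Daniel", "Chris", "Joseph", "Renee")
val seen = ListBuffer[String]()

// The second parameter list lets us pass the function in braces, just like foreach
eachElement(people) { name =>
  seen += "Person: " + name
}

println(seen.mkString(", "))
```

The loop bookkeeping lives in one place, and every caller only supplies the “what to do with each element” part.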

So Long, Farewell…

That about wraps it up for my introductory series on Scala.  I certainly hope this set of articles was sufficient information to get you on your feet in this tremendously powerful new language.

If you’re like me, something like this series will only whet your appetite (or dampen your spirits to the point of manic despair).  I strongly suggest you read Alex Blewitt’s excellent introduction to Scala (if you haven’t already).  Much of the material he talks about was covered in an article in this series, but he provides a different perspective and a degree of insight which is valuable in learning a new language.  There is also a wiki for the Scala language.  It has a frustrating lack of information on some (seemingly arbitrary) topics, but it can often be a source of explanation and usage examples that cannot be found elsewhere.

On a more “hard core” level, I have found the scaladoc API for the Scala runtime to be an invaluable resource in my own projects.  Finally, when all else fails, there’s always the official Scala documentation.  Included with this package is the (very heavy) Scala tour, which doesn’t seem to be linked from anywhere except the Nabble mailing-list archive.

I leave you with this parting thought:  You’ve seen Scala, how it works, the benefits it can bring and the total transparency of its interop with Java.  If you haven’t at least tried this language first hand, trust me, you’re missing out.

The End of the Ruby Fad?

23 Jan 2008

Well, it’s a new year; and apparently no sooner are the resolutions forgotten and the hangovers behind us than the internet en masse decides that we need a new language.  Ruby was indisputably the hip language of 2006 and 2007.  However, in an opinion shift so sudden as to make one’s head spin, the blogosphere seems to have rebelled against the hype and gone in search of a new mistress.

It seems more and more these days like people just don’t want to hear about Ruby.  Ruby posts to link sites like DZone or Reddit get voted down before they have a chance to see the light of day.  Pointless flames litter the blogs, decrying Ruby and alternately crowning Groovy, Scala, Java or even XML in its place.  The sad thing is that no one seems to have found the middle ground yet.

Personally, I’m with Raganwald on this one.  I started coding with Ruby back in 2002 (or was it 2001?  I can’t remember now).  It was actually at the recommendation of some random guy on a forum who said that Ruby was a nice and clean language with a lot of potential.  I was getting pretty tired of Java at that point, so I figured I’d give it a try.  Since then, Ruby has become part of my essential scripting toolset, finding applications everywhere from complex utility scripts, build systems and even hacky dynamic web pages for my server monitoring tools.  Based on the strength of the language, I’ve tried Rails a few times without success.  I mean, honestly, which is easier to remember?

link_to :page => ""

…or,

<a href=""></a>

The middle ground is really where Ruby belongs, where it flourishes.  It’s hardly a general-purpose language, so it could never replace Java and company.  With that said, it’s far easier to write an incremental backup script in Ruby than in Java.  And while Ruby may not be suitable for an enterprise level, high-traffic web application, it’s certainly up for some tasks within that application.  It’s also perfect for scripting and rapid prototyping against that application infrastructure.  Unfortunately the community as a whole seems either blind to its benefits or blinded by its hype.

It seems like it’s constantly an “all or nothing” attitude with these new languages.  Developers these days fall into two camps: those who have heard the hype and rejected what it stands for, and developers who are totally carried away by the emotion of the fad.  To the former camp, developers who straddle the middle ground are traitors to the cause and just as bad as the hyper fanatics.  But to the fanatics, the moderates are fence riders who refuse to fully embrace their destiny.  In short, it’s the moderates who catch flak from both camps.  Yet ironically, it’s the moderates who seem to be doing useful things with the technology, rather than wasting energy on five minute blog demos and eloquent rebuttals.

So on one hand, I’m glad to see the hype die.  It was frustrating having to deal with yet another bigoted “Rails Rocks, Java Suxz” rant every time I opened my RSS.  On the flip side, the backlash is equally annoying.  It certainly would be nice to have some balance around here, instead of breaking out the petroleum sulfite every time someone accidentally expresses an opinion.  Perhaps now that the bubble has burst, we’ll finally get to see the popularity of Ruby in its proper place.

You Should Be Excited About Java on Mac

3 Dec 2007

I realize the very last thing I said about Java on the Mac was extremely negative, and I think that still holds.  Apple screwed up, big time.  What I’m talking about is a grass-roots effort to port Sun’s FreeBSD version of Java 6 over to MacOS X.  I’m talking about SoyLatte.

For those of you playing catch-up, SoyLatte was started by a bright guy named Landon Fuller with the express purpose of providing a Java 6 implementation for MacOS X 10.5 (Leopard).  It does this by availing itself of Mac’s BSD roots.  Because Darwin borrows heavily from FreeBSD, many BSD-based applications can be easily ported to run in some form on MacOS X.  Sometimes all it takes is a recompilation linking against different libraries.  Obviously a full blown JVM is quite a bit more complicated than GNUChess, but the principle is the same.

The key phrase here is “run in some form”.  Mac applications are legendary for integration, sophistication and smoothness.  I have to admit that this reputation is well merited.  As a credit to Apple’s work on its (now outdated) JVM, this integration even extends to many applications written in Java.  Swing applications on Mac look native (because they’re using native Cocoa widgets, even more native than SWT’s Carbon implementation).  Java applications are also fully AppleScriptable, have access to core services like the application menu, services, file associations, the dock; the list just goes on and on.  This kind of tight integration is exactly what James Gosling was talking about when he said that Apple wanted to do the Java port on Mac themselves.  This kind of tight integration is very difficult for third-parties to accomplish.

Just to choose a comparatively trivial example, consider Swing (I did say comparatively).  Swing/AWT on Mac is peered by Cocoa and extremely performant, a trick that Sun failed to turn on Windows for half a dozen releases (partially succeeding in Java 6).  Swing/AWT on FreeBSD is backed by X11 and can be pseudo-peered using GTK+ widgets.  Now Apple does have a version of X11 bundled with MacOS X, and GTK applications do appear passably Mac-ish, but it’s still not the environment to which Mac users are accustomed.  Even solving this non-blocker issue will require thousands of man-hours and some really clever engineering.

But the point is: it’s happening.  There’s so much momentum behind this project it’s unbelievable.  LandonF is rapidly becoming the equivalent of a blogger household name (think “DHH”), and people are practically lining up to get in on the action.  Now I honestly don’t know how much contributor interest the project is seeing - there are quite a few hoops to jump through - but I know the attention from the community has been staggering.  It’s enough to make me want to buy a Mac just to help out.  :-)

What we’re seeing is virtually unprecedented.  The open-source community is taking a version of Java and independently porting it to another operating system.  Yes in the past we’ve seen projects like GNU Classpath and Harmony which have done clean-room Java implementations, but that’s a totally different problem.  Neither GNU Classpath nor Harmony is JCK certified, which means that they’re technically not “Java”.  If Landon and company succeed in getting the FreeBSD Java 6 fully ported to Mac, they probably won’t face this issue.  Because it’s basically Sun’s implementation, and thanks to the new community control over the JCK certification process, SoyLatte could be the first third-party Java distribution to actually be “Java”.  This makes it a viable alternative for developers and (more importantly) big companies which are gun-shy on third-party ports.

So this is an incredibly exciting piece of work.  SoyLatte may be one of the most significant open-source projects in the last few years, and we’re seeing it unfold right before our eyes.  If you have a Mac and any extensive knowledge of Java, you should really consider helping out.  Here’s your chance to do something really significant for the future of Java on one of the most rapidly growing platforms on the market.  Those of us on the sidelines can only sit and cheer, marveling at history in the making and admiring the power of the community.

In Search of a Better Build System

26
Nov
2007

There’s a consistent problem with developing applications of any reasonable size, a problem which dates back to before the early days of C.  The problem is that any application of significance will be composed of several source files.  In fact, real-world applications are often composed of thousands if not hundreds of thousands of files.  Back in the day, it was felt (for some reason which escapes me) that it would be poor practice to type “gcc -Wall -c -o filename.o filename.c” several thousand times every time the app needed to be recompiled.

So from very early on, developers have been writing tools to aid in the build process.  Most of these tools were ad hoc and specialized.  The most common example which springs to mind is a simple shell script which handles the compilation:

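#!/bin/sh

# Compile every .c file in the current directory into a .o file
for f in *.c; do
  name=${f%.c}                      # strip the .c extension
  gcc -Wall -c -o "${name}.o" "$f"  # -c: compile only, do not link
done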

The limitations of such an approach should be obvious.  For one thing, you can only use this script on a single directory which contains all source files.  This is very rarely the case.  More importantly, there is no linking or dependency checking taking place.  This means that with the exception of very simple applications, this script will outright fail every time.  Of course, you could modify the script extensively to hard-code the dependency information, check for file modification, etc.  However, this would be a long, dull process which would have to be repeated for every application you write.  Not a very productive way to spend your time.
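To get a feel for that dull process, here is a minimal sketch of the “modified” script: hard-coded dependencies plus modification-time checks, then a link step.  It assumes a shell whose test builtin supports -nt (bash, dash and ksh all do), and a hypothetical project in which every .c file includes a header called common.h and all objects link into a program called app; both names are illustrative, not from any real project.

```shell
#!/bin/sh
# Sketch: incremental build by hand. Hypothetical project layout:
# every .c file includes common.h, and all objects link into ./app.
# Assumes `test` supports -nt (bash, dash and ksh do).

CC=${CC:-gcc}   # compiler can be overridden from the environment

# out_of_date TARGET DEP... : succeed (status 0) if TARGET is missing
# or any DEP has a newer modification time than TARGET.
out_of_date() {
  target=$1
  shift
  [ -e "$target" ] || return 0
  for dep in "$@"; do
    if [ "$dep" -nt "$target" ]; then
      return 0
    fi
  done
  return 1
}

objects=""
for f in *.c; do
  [ -e "$f" ] || continue            # no .c files: glob stays unexpanded
  obj="${f%.c}.o"
  objects="$objects $obj"
  # Hard-coded dependency list: the source file plus common.h
  if out_of_date "$obj" "$f" common.h; then
    echo "compiling $f"
    $CC -Wall -c -o "$obj" "$f" || exit 1
  fi
done

# Link step: relink only when some object is newer than the program.
# $objects is deliberately unquoted so it splits into separate names
# (which means file names must not contain spaces).
if [ -n "$objects" ] && out_of_date app $objects; then
  echo "linking app"
  $CC -o app $objects || exit 1
fi
```

Even this toy version hard-codes its dependency list and link target, and every project would need its own tweaked copy, which is exactly the bookkeeping Make took over.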

And so Make was born.  Make has a number of advantages over the hand-scripted method.  It allows for (fairly) easy dependency specification, it only recompiles modified files, it ensures everything happens in the proper order, it’s more maintainable, etc.  However, at its core, Make is essentially a wrapper around the hand-script method.  As such, it suffers from many of the same limitations, such as a cryptic syntax and a dependence on the underlying shell.  Make is a far cry from hand scripting everything, but it’s hardly the silver bullet developers were looking for.

So as the years went by, more and more solutions were devised, though few of them caught on to the extent that Make had.  To this day, Make is still the de facto standard for C and C++ build systems.  Its dominance is so pervasive that I have even found Java applications which are built using Make, though thankfully these are few and far between.  With Java on the scene, attention turned to a new effort which attempted to unseat Make as the reigning champion of the build tools: Ant.

Ant based its syntax on XML, breaking completely with Make’s shell roots and focusing on the task rather than the dependency.  Because of this clean break, and because the Ant interpreter itself is written in Java, Ant is entirely platform agnostic.  An Ant build script written for a Java project can be run on any platform, anywhere (as long as Java is installed).  This immediately gave it a huge boost over Make, as it finally gave developers on platforms without the advantages afforded by a real shell, such as Windows and MacOS 9 (and earlier), a build tool that worked for them just as well as for everyone else.  Ant’s rise to dominance in the field of Java build systems was so rapid and so completely unchallenged that it remains the “proper” way to build a Java application.  Every Java developer has Ant installed, and as such it has become something of a lingua franca in build script land.

Unfortunately, Ant, like Make, suffers from a number of shortcomings.  Its XML syntax, for one, while instantly recognizable and familiar to 99% of developers on this planet, poses problems with verbosity and expressiveness.  For example, Ant doesn’t provide any real mechanism for variables (its properties cannot be changed once set), bona fide procedures, or any way to execute arbitrary code without a clean break into Java (a custom Ant Task).  While this tends to keep build scripts somewhat uniform and easily understandable (when you’ve seen one build.xml, you’ve seen them all), it also forms a crippling limitation in many ways.  I’ve used Ant a lot in my time, and let me tell you it can be a real pain.  For simple builds (javac a bunch of source files, copy one or two resources, zip the result, etc.) it’s quite sufficient, but headaches set in when dealing with anything complex like chained builds, subprojects or library dependency management.  You can make it work, but it’s not pretty.

The Maven project was started to try to address some of these problems (among other things).  Maven provides full-stack dependency management (even resolving and downloading third-party libraries), build management, conventions enforcement, IDE interop and so on.  A number of people would say that Ant is completely superseded by Maven, and that Maven is the only way to go for any new Java project.  Unfortunately, like so many successful Java projects of its day (a few examples Spring to mind), Maven refused to maintain its focus.  Instead of being an incredibly simple build system and dependency management tool, Maven has tried to branch out and become the all-encompassing tool to solve everything.  I know I haven’t even scratched the surface of its capabilities in my limited exposure, but I can say that I’ve seen enough.  Maven is amazing, but way too invasive for my tastes.  It has a knack for making the simple things cryptic, the hard things harder and the complex things impossible.  (By impossible I mean without resorting to hackery like invoking Ant from within Maven or calling out to a shell script.)  Now I realize flame wars have been fought on this very subject, but I have to conclude that Maven is just too much and too overwhelming for easy use (and hence, wide adoption).

Fortunately for me, the rise of dynamic languages has brought about some wider options.  Ruby in particular has become a favorite for many different build tool projects (Rake, Raven, etc).  Most interesting is the effort underway to provide a “non-sucky Maven”.  The Buildr project, currently in incubation at Apache, is basically seeking to be a build system which enables trivial application of the most common case (builds and dependencies), as well as the flexibility of a Turing-complete language (Ruby) to make possible just about any build task, no matter how esoteric.

Buildr’s promise is indeed alluring, and at first glance it seems to deliver.  The DSL syntax of the build file is intuitive and easy to grasp.  Alongside this, it bundles the full power of Maven, allowing it to be a drop-in replacement for any pre-existing Maven project.  Well, almost.  Buildr doesn’t allow for things like dependency checkout from a source code repository.  It also retains one of Maven’s biggest failings in that it unduly enforces a rigid directory structure.  While it is possible to override this restriction, Buildr’s documentation isn’t exactly clear on how, and to be honest I still haven’t figured out how to get some things working.  Buildr is promising, but not perfect.

Another flaw shared by all of the new, “avant garde” build tools is that none of them can be assumed to be installed on every developer’s machine.  Back in the days of Make and Ant, every developer knew that every other developer could handle a Makefile or a build.xml file and use it to get a fully functional build out the other end.  Unfortunately, while Maven has made tremendous strides in popularity, it is still nowhere near as ubiquitous as Ant.  Buildr is even less common, additionally requiring the separate installation of a full Ruby runtime, as well as the “buildr” gem.  These considerations are less significant for a small commercial product, where all of the developers are in close contact and outside interference is rare.  However, for an open-source project, standardization in the required tools is critical; otherwise new developers would have no way to contribute.
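Until one of these tools is universal, a project can at least make missing prerequisites obvious to newcomers instead of letting the build fail halfway.  The following is a minimal POSIX sh sketch, assuming a hypothetical wrapper script (say, check-tools.sh) that receives the required commands as arguments, for example ruby and buildr, on the assumption that the buildr gem puts a buildr command on the PATH.  It relies only on the standard command -v builtin and is not something Buildr itself ships.

```shell
#!/bin/sh
# Sketch of a hypothetical prerequisite check, e.g. saved as check-tools.sh
# Usage: sh check-tools.sh ruby buildr

# missing_tools TOOL... : print each TOOL that is not found on the PATH;
# succeed (status 0) only if every tool was found.
missing_tools() {
  status=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "$tool"
      status=1
    fi
  done
  return $status
}

# The exit status of the assignment is that of the command substitution.
if ! missing=$(missing_tools "$@"); then
  echo "Cannot build yet; please install the following first:" >&2
  echo "$missing" >&2
  exit 1
fi
echo "All required build tools found."
```

Running it before the real build turns a cryptic failure into a short shopping list, which softens (though does not solve) the adoption problem described here.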

Unfortunately it’s a bit of a Catch-22.  Even assuming Buildr manages to make good on its promise of being “a build system that doesn’t suck”, it has to gain in popularity before it will be accepted as the de facto standard for Java project builds.  But to gain popularity, Buildr must be accepted by the community.  It’s a tightly knit, closed circle driven by managers remaining content with “the way it’s always been done” and developers refusing to chance the success of their project on the hope that all parties concerned will also show forward thinking in their tool set.  I really don’t envy the Buildr project leads in their task to promote their tool.

So where does this leave me?  Practically speaking, I’m still going to stick with Ant.  Buildr may be interesting, and Maven may be powerful, but neither is the standard yet.  Perhaps even more importantly: I’m lazy.  I know Ant.  I know it very well and I would have to see some pretty clear benefits (and an easy introductory road) to switch my build system of choice.  Right now, I don’t see either of those.  Maven is infamous for having a very steep learning curve.  And as I mentioned before, Buildr’s documentation leaves something to be desired.

I want to help usher in the next era of Java development as much as the next guy.  But I’m not willing to sacrifice the now for the sake of a future which may take a totally different form when it arrives.  So I will (reluctantly) stick with my old standby.  Perhaps someday I’ll come across a build tool which really impresses me enough to make me switch.  Until then you can continue to listen to me whine and complain about the difficulties of whipping Ant into shape.  How lucky for both of us.  :-S

SWT Cocoa Call for Volunteers

17
Nov
2007

Steve Northover (father of SWT) posted news on his blog today that the SWT team is looking for help with the SWT Cocoa port.  As far as I know, they’ve been working on this for about two weeks, and they’ve already managed to get Eclipse to start - but only through the use of not-so-subtle hackery.

Unfortunately for us (the unwary users), Steve says that they’ve run out of time to work on the Cocoa port, given the pace of the 3.4 stream and its rapidly approaching feature freeze (only a few months away).  A Cocoa port of SWT is something that would benefit everyone, but most especially those who use Eclipse on Mac.  At the risk of sounding melodramatic, it’s probably the most important SWT effort currently in progress.  Apple’s been pretty clear that Cocoa is the future of UI on the Mac.  Most pundits agree that if Carbon isn’t gone in the next MacOS X release, it certainly will be in the one after that.  It is absolutely vital to the future of Eclipse that this port is carried through at the highest caliber.

So, if you’re an Objective-C maven, or just want to help out on a very interesting project, ring up the developers on the platform-swt-dev mailing list.  I’d help out myself, but I know nothing about Objective-C.  SWT’s a great project to contribute to, even if you don’t completely agree with its purpose.  May the hacking begin!