
The Plague of Polyglotism

28 Apr 2008

For those of you who don’t know, polyglotism is not some weird religion but actually a growing trend in the programming industry.  In essence, it is the concept that one should not be confined to a single language for a given system or even a specific application.  With polyglot programming, a single project could use dozens of different languages, each for a different task to which they are uniquely well-suited.

As a basic example, we could write a Wicket component which makes use of Ruby’s RedCloth library for working with Textile.  Because of Scala’s flexible syntax, we can use it to perform the interop between Wicket and Ruby using an internal DSL:

class TextileLabel(id:String, model:IModel) extends WebComponent(id, model) with JRuby {
  require("textile_utils")
 
  override def onComponentTagBody(stream:MarkupStream, openTag:ComponentTag) {
    replaceComponentTagBody(stream, openTag, 
        'textilize(model.getObject().toString()))
  }
}
# textile_utils.rb
require 'redcloth'
 
def textilize(text)
  doc = RedCloth.new text
  doc.to_html
end

Warning: Untested code

We’re actually using three languages here, even though we only have source for two of them.  The Wicket library itself is written in Java, our component is written in Scala and we work with the RedCloth library in Ruby.    This is hardly the best example of polyglotism, but it suffices for a simple illustration.  The general idea is that you would apply this concept to a more serious project and perform more significant tasks in each of the various languages.

The Bad News

This is all well and good, but there’s a glaring problem with this philosophy of design: not everyone knows every language.  You may be a language aficionado, picking up everything from Scheme to Objective-C, but it’s only a very small percentage of developers who share that passion.  Many projects are composed of developers without extensive knowledge of diverse languages.  In fact, even with a really good sampling of talent, it’s doubtful you’ll have more than one or two people fluent in more than two languages.  And unfortunately, there’s this pesky concern we all love called “maintainability”.

Let’s pretend that Slava Pestov comes into your project as a consultant and decides that he’s going to apply the polyglot programming philosophy.  He writes a good portion of your application in Java, Lisp and some language called Factor, pockets his consultant’s fee and then moves on.  Now the code he wrote may have been phenomenally well-designed and really perfect for the task, but you’re going to have a very hard time finding a developer who can maintain it.  Let’s say that six months down the road, you decide that your widget really needs a red push button, rather than a green radio selector.  Either you need a developer who knows Factor (hint: there aren’t very many), or you need a developer who’s willing to learn it.  The thing is that most developers with the knowledge and motivation to learn a language have either already done so, or are familiar enough with the base concepts as to be capable of jumping right in.  These developers fall into that limited group of people fluent in many different languages, and as such are a rare find.

Now I’m not picking on Factor in any way, it’s a very interesting language, but it still isn’t very widespread in terms of developer expertise.  That’s really what this all comes down to: developer expertise.  Every time you make a language choice, you limit the pool of developers who are even capable of grokking your code.  If I decide to build an application in Java, even assuming that’s the only language I use, I have still eliminated maybe 20% of all developers from ever touching the project.  If I make the decision to use Ruby for some parts of the application, while still using Java for the others, I’ve now factored that 80% down to maybe 35% (developers who know Java and Ruby).  Once I throw in Scala, that cuts it down still further (maybe at 15% now).  If I add a fourth language – for example, Haskell – I’ve now narrowed the field so far that it’s doubtful I’ll find anyone capable of handling all aspects within a reasonable price range.  It’s the same problem as with framework choice, except that frameworks are much easier to learn than languages.

The polyglot ideal was really devised by a bunch of nerdy folks like me.  I love languages and would like nothing better than to get paid to learn half a dozen new ones (assuming I’m coming into a project with a strange combination I haven’t seen before).  However, as I understand the industry, that’s not a common sentiment.  So a very loud minority of developers (/me waves) has managed to forge a very hot methodology, one which excludes almost all of the hard-working developer community.  If I didn’t know better, I would be tempted to say that it was a self-serving industry ploy to foster exclusivity in the job market.

I want to work on multi-language projects as much as anyone, but I really don’t think it’s the best thing right now.  I’m working on a project now which has an aspect for which Scala would be absolutely perfect, but since I’m the only developer on hand who is remotely familiar with the language, I’m probably going to end up recommending against its adoption.  Consider carefully the ramifications of trying new languages on your own projects, you may not be doing future developers any favors by going down that path.

Screencast: Introduction to the Scala Developer Tools

21 Apr 2008

Virtually everyone who has visited the Scala project page has seen the info page for the Scala plugin for Eclipse.  There are a few screenshots, an update site and very little instruction on how to proceed from there.  Those of you who have actually installed this plugin can vouch for how terribly it works as well as the remarkable lack of usefulness in its functionality.  It’s basically a very crude syntax highlighting editor for Scala embedded into Eclipse.  It has the ability to run programs and compile them within the IDE, but that’s about all.  Worse than that, it seems to make everything else about Eclipse less stable; somehow crashing random, unrelated plugins (such as DLTK).  Needless to say, it’s often a race to see how fast we can remove the Scala Eclipse plugin from our systems.

What is far less widely known is that there is a second Eclipse plugin which offers support for Scala development.  Basically, the guys at LAMP decided that it wasn’t worth trying to build out the original plugin any further.  Instead, they started from scratch and created a whole new implementation.  The result is entitled the “Scala Developer Tools” (or SDT, if you’re into short and phonetically confusing acronyms).  Basically, this plugin is a very unstable, very experimental attempt to build a first-class IDE for Scala on top of Eclipse.  Obviously, they still have a ways to go:

[Screenshot: the Scala Developer Tools editor in Eclipse]

In case you were wondering, no, that isn’t my default editor font.  To say the least, the plugin suffers from an annoying plethora of UI-related bugs.  Behavior is inconsistent, and oftentimes changing a value doesn’t seem to be permanent (it took me several tries to get the syntax highlighting to stop shifting before my very eyes).  To make matters worse, it seems that installing the plugin in the first place is a bit like playing a game of hopscotch using un-anchored floats in the middle of a pool.  The update site has a nasty habit of throwing a 404 about 50% of the time.  You know what they say: if at first you don’t succeed…

The good news is that once you get the plugin installed, the preferences beaten into submission, and the UI bugs safely ignored, things become quite nice indeed.  The new editor is vastly improved over the old one, and it’s easy to see tremendous potential in the project.  Things are actually getting to a point where I would consider using the plugin rather than my current jEdit setup.

Of course, it’s hard to get a good idea of how a tool works until you see it in action.  That’s why I took the time to put together a small screencast which illustrates some of the highlights of the new editor.  I made no attempt to hide the bugs which cropped up during my testing, so this should give you a fair approximation of the current state of the plugin and whether it’s worth trying for your own projects.  The screencast has been produced at a reasonably high resolution (1024×732) in both Flash and downloadable AVI format.  Enjoy!

[Screencast preview image]

Defining Scala Design Idioms

14 Apr 2008

With any new language comes a certain amount of uncertainty as to what is “the right way” to do things.  Now I’m not just talking about recursion vs iteration or object-oriented vs straight procedural.  What I’m referring to is the design idioms which govern everything from naming conventions to array deconstruction.

These idioms are highly language-specific and can even differ between languages with otherwise similar lexical elements.  Consider the following C++, Java, Ruby and Scala examples:

C++:

vector<string> first_names;
first_names.push_back("Daniel");
first_names.push_back("Chris");
first_names.push_back("Joseph");
 
for (vector<string>::iterator i = first_names.begin(); i != first_names.end(); ++i) {
    cout << *i << endl;
}

Java:

String[] firstNames = {"Daniel", "Chris", "Joseph"};
 
for (String name : firstNames) {
    System.out.println(name);
}

Ruby:

first_names = ['Daniel', 'Chris', 'Joseph']
 
first_names.each do |name|
  puts name
end

Scala:

val firstNames = Array("Daniel", "Chris", "Joseph")
 
for (name <- firstNames) {
  println(name)
}

As a matter of interest, the Scala example could also be shortened to the following:

val firstNames = Array("Daniel", "Chris", "Joseph")
firstNames.foreach(println)

All of these samples perform essentially the same task: traverse an array of strings and print each value to stdout.  Of course, the C++ example is actually using a vector rather than an array due to the evil nature of C/C++ arrays, but it comes to the same thing.  Passing over the differences in syntax between these four languages, what really stands out are the different ways in which the task is performed.  C++ and Java are both using iterators, while Ruby and Scala are making use of higher order functions.  Ruby and C++ both use lowercase variables separated by underscores, while Java and Scala share the camelCase convention.

This is a bit of a trivial example, but it does open the door to a much more interesting discussion: what are these idioms in Scala’s case?  Scala is a very new language which has yet to see truly widespread adoption.  More than that, Scala is fundamentally different from what has come before.  Certainly it draws inspiration from many languages – most notably Java, C# and Haskell – but even languages which are direct descendants can impose entirely different idioms.  Just look at the differences between the Java and the C++ examples above.  The practical, day-to-day implications of this become even more apparent when you consider object-oriented constructs:

// Book.h
class Book {
    std::string title;
    Author *author;
 
public:
    Book(std::string);
 
    std::string get_title();
    Author* get_author();
    void set_author(Author*);
};
 
// Book.cpp
Book::Book(std::string t) : title(t), author(0) {}
 
std::string Book::get_title() {
    return title;
}
 
Author* Book::get_author() {
    return author;
}
 
void Book::set_author(Author *a) {
    author = a;
}

The equivalent in Java:

public class Book {
    private String title;
    private Author author;
 
    public Book(String title) {
        this.title = title;
    }
 
    public String getTitle() {
        return title;
    }
 
    public Author getAuthor() {
        return author;
    }
 
    public void setAuthor(Author author) {
        this.author = author;
    }
}

And the (much shorter) Ruby code:

class Book
  attr_reader :title, :author
  attr_writer :author
 
  def initialize(title)
    @title = title
  end
end

This code uses standard, accepted idioms for class design in each of the three languages.  Notice how in C++ we avoid name-clashes between method formals and data members?  This is because the compiler tends to get a little confused if we try to combine name shadowing and the initialization syntax (the bit that follows the : in the constructor).  This contrasts strongly with Java which by convention encourages shadowing of fields by method formals.  It’s a very strong convention to do what I’ve done here, using this.fieldName to disambiguate between formals and fields.  Ruby stands apart from both of these languages by enforcing the use of prefix symbols on variable names to define the container.  There really can’t be any shadowing of instance variables by formals since all instance variables must be prefixed with the @ symbol.

The question here is: what conventions are applicable in Scala?  At first blush, it’s tempting to write a class which looks like this:

class Book(val title:String) {
  var author:Author = null
}

Because every variable/value in Scala is actually a method, the accessor/mutator pairs are already generated for us (similar to how it is in Ruby with attributes).  The problem here is of course redefining an accessor or mutator.  For example, we may want to perform a check in the author mutator to ensure that the new value is not null.  In C++ and Java, we would just add an if statement to the correct method and leave it at that.  Ruby is more interesting because of the auto-generated methods, but it’s still fairly easy to do:

class Book
  def author=(author)
    @author = author unless author.nil?
  end
end

Scala poses a different problem.  In our example above, we’ve essentially created a class with two public fields, something which is sternly frowned upon in most object-oriented languages.  If we had done this in Java, it would be impossible to implement the null check without changing the public interface of the class.  Fortunately Scala is more flexible.  Here’s a naive implementation of Book in Scala which performs the appropriate check:

class Book(val title:String) {
  private var author:Author = null
 
  def author = author
 
  def author_=(author:Author) {
    if (author != null) {
      this.author = author
    }
  }
}

Unfortunately, there are some fairly significant issues with the above code.  First off, it won’t compile.  Scala lacks separate namespaces for fields and methods, so you can’t just give a method the same name as a field and expect it to work.  This merging of namespaces actually allows a lot of interesting design and is on the whole a good thing, but it puts a serious crimp in our design.  The second problem with this example is closely related to the first.  Assuming the code did compile, at runtime execution would enter the author_= method, presumably pass the conditional and then execute the this.author = author statement.  However, the Scala compiler will interpret this line in the following way:

this.author_=(author)

That’s right, infinite recursion.  Usually this issue isn’t apparent because the compiler error will take precedence over the runtime design flaw, but it’s still something which merits notice.  Obviously, using the same names for variables as we do for methods and their formals just isn’t going to work here.  It may be the convention in Java, but we’ll have to devise something new for Scala.

Over the last few months, I’ve read a lot of Scala written by other people.  Design strategies to solve problems like these range all over the map, from totally separate names to prepending or appending characters, etc.  The community just doesn’t seem to be able to standardize on any one “right way”.  Personally, I favor the prepended underscore solution and would really like to see it become the convention, but that’s just me:

class Book(val title:String) {
  private var _author:Author = null   // notice the leading underscore
 
  def author = _author
 
  def author_=(author:Author) {
    if (author != null) {
      _author = author
    }
  }
}
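To see the guarded mutator in action, here’s a quick usage sketch.  The Author class is a minimal stand-in invented for illustration, and the Book definition simply repeats the one above so the snippet stands on its own:

```scala
// Minimal stand-in for the Author type referenced above
class Author(val name: String)

class Book(val title: String) {
  private var _author: Author = null   // backing field with the leading underscore

  def author = _author

  def author_=(author: Author): Unit = {
    if (author != null) {
      _author = author
    }
  }
}

val book = new Book("Programming in Scala")
book.author = new Author("Martin Odersky")   // sugar for book.author_=(...)
book.author = null                           // silently rejected by the guard
println(book.author.name)                    // still prints "Martin Odersky"
```

The nice part is that the assignment syntax compiles down to a call to author_=, so client code is none the wiser that a guard (or any other logic) is involved.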

The Scala community really needs to get together on this and other issues related to conventional design idioms.  I see a lot of code that’s written in Java, but with C or C++ idioms.  This sort of thing is rare nowadays, but was quite common in the early days of the language.  People weren’t sure what worked best in Java, so they tried to apply old techniques to the new syntax, often with disastrously unreadable results.  As Scala moves into the mainstream, we have to come to some sort of consensus as to what our code should look like and what conventions apply.  If we don’t, then the language will be forced to struggle through many of the same problems which plagued Java only a decade ago: new developers trying to get a handle on this foreign language, misapplying familiar constructs along the way.

Universal Type Inference is a Bad Thing

9 Apr 2008

Recently, Scala has been wowing developers with its concise syntax and powerful capabilities, but perhaps its most impressive feature is local type inference.  When the intended type for an element is obvious, the Scala compiler is able to make the inference and there is no need for any additional type annotations. 

While this is extremely useful, it falls quite a bit short of the type inference mechanisms available in other languages such as Haskell and ML.  Many people have pointed out that Scala doesn’t go as far as it could in its inference, forcing the use of type annotations when the type could easily be inferred by the compiler.  To understand this claim, it’s necessary to consider an example from a language which does have such universal type inference.  Consider the following function:

fun sum nil init = init 
 | sum (hd::tail) init = hd + (sum tail init)

For those of you not familiar with ML or its derivatives, this is a simple curried function which traverses a given list, adding the values together with an initial value given by init.  Obviously, it’s a contrived example since we could easily accomplish the above using a fold, but bear with me. 

The function has the following type signature: int list -> int -> int.  That is to say, the ML compiler is able to infer the only possible type which satisfies this function.  At no point do we actually annotate any specific nominal type.  Clarity aside, this seems like a pretty nifty language feature.  The ML compiler actually considers the whole function when inferring type, so its inferences can be that much more comprehensive.  Scala of course has type inference, but it is far less universal.  Consider the equivalent function to the above:

def sum(list:List[Int])(init:Int):Int = list match {
    case hd::tail => hd + sum(tail)(init)
    case Nil => init
  }

Obviously, this is a lot more verbose than the ML example.  Scala’s type inference mechanism is local only, which means that it considers the inference on an expression-by-expression basis.  Thus, the compiler has no way of inferring the type of the list parameter since it doesn’t consider the full context of the method.  Note that the compiler really couldn’t say much about this method even if it did have universal type inference because (unlike ML) Scala allows overloaded methods and operators.
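As a rough sketch of what local inference does and doesn’t cover: Scala happily infers the types of vals and the result types of non-recursive methods from their defining expressions, but parameter types must always be annotated, and a recursive method additionally needs an explicit result type:

```scala
val names = List("Daniel", "Chris", "Joseph")   // inferred as List[String]
def double(x: Int) = x * 2                      // result type Int inferred from the body

// Parameter types are never inferred, and because this method is
// recursive, the result type must be written out as well:
def length(list: List[Int]): Int = list match {
  case _ :: tail => 1 + length(tail)
  case Nil => 0
}
```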

Because of the relative verbosity of the Scala example when compared to ML, it’s tempting to claim that ML’s type inference is superior.  But while ML’s type inference may be more powerful than Scala’s, I think it is simultaneously more dangerous and less useful.  Let’s assume that I was trying to write the sum function from above, but I accidentally swapped the + operator for a :: on the second line:

fun sum nil init = init 
 | sum (hd::tail) init = hd :: (sum tail init)

The ML compiler is perfectly happy with this function.  Such a small distinction between the two, but in the case of the second (incorrect) function, the type signature will be inferred as the following: 'a list -> 'a list -> 'a list (the 'a notation is equivalent to the mathematical α).  We’ve gone from a curried function taking an int list and an int and returning an int value to a function taking two list values with arbitrary element types and returning a new list with the same type; and all we did was change two characters.

By contrast, the same mistake in Scala will lead to a compiler error:

def sum(list:List[Int])(init:Int):Int = list match {
    case hd::tail => hd :: sum(tail)(init)    // error
    case Nil => init
  }

The error given will be due to the fact that the right-associative cons operator (::) is not defined for type Int.  In short, the compiler has figured out that we screwed up somewhere and throws an error.  This is very important, especially for more complicated functions.  I can’t tell you how many times I’ve sat and stared at a relatively short function in ML for literally hours before figuring out that the problem was in a simple typo, confusing the type inference.  Of course, ML does allow type annotations just like Scala, but it’s considered to be better practice to just allow the compiler to infer things for itself.

ML’s type inference ensures consistency, while Scala’s type inference ensures correctness.  Obviously, “correctness” is a word which gets bandied about with horrific flippancy, but I think in this case it’s merited.  The only thing that the ML type inference will guarantee is that all of your types match.  It will look through your function and ensure that everything is internally consistent.  Since both the correct and incorrect versions of our sum function are consistent, ML is fine with the result.  Scala on the other hand is more restrictive, which leads to better assurance at compile-time that what you just did was the “right thing”.  I would argue that it’s part of the responsibility of the type checker to catch typos such as the one in the example, but languages with universal type inference just can’t do this.

Type inference can be an incredibly nice feature, and it is very tempting to just assume that “more is better” and jump whole-heartedly into languages like ML.  The problem is that languages with such universal type inference don’t provide the same safety that languages with local or no type inference can provide.  It’s just too easy to make a mistake and then not realize it until the compiler throws some (apparently) unrelated error in a completely different section of the code.  In some sense, type inference weakens the type system by no longer providing the same assurance about a block of code.  It’s important to realize where to draw the line as a language designer; to realize how far is too far, and when to step back.

The “Option” Pattern

7 Apr 2008

As I’ve gotten to know Scala better, I’ve begun to appreciate its simple power in ways which have caught me by surprise.  When I picked up the language, I was expecting to be wowed by things like type inference and a more concise syntax.  I wasn’t expecting to fall in love with Option.

The Option Monad

In case you don’t know, Option is a class in the Scala core libraries.  It is what object-oriented developers would call “a simple container”.  It simply wraps around an instance of some type as specified in its type parameter.  A simple application would be in a naive integer division function:

def div(a:Int)(b:Int):Option[Int] = if (b <= 0) None
  else if (a < b) Some(0)
  else Some(1 + div(a - b)(b).get)

Pretty straightforward stuff.  This method repeatedly subtracts the divisor from the dividend until the remainder is strictly less than the divisor.  Of course, the fact that I wrote this as a pure function using currying, recursion and complex expressions obfuscates the meaning somewhat, but you get the drift.  What’s really interesting here is the use of Option to encapsulate the result.  Here’s how we could use this method to perform some useful(?) calculations:

div(25)(5)     // => Some(5)
div(150)(2)    // => Some(75)
div(13)(4)     // => Some(3)

Nothing earth-shattering in the mathematical realm, but it provides a useful illustration.  Each return value is wrapped in an instance of class Some, which is a subclass of Option.  This doesn’t seem very useful until we consider what happens when we try to divide values which break the algorithm:

div(13)(0)     // => None
div(25)(-5)    // => None

Instead of getting an integer result wrapped in an enclosing class, we get an instance of a totally different class which doesn’t appear to encapsulate any value at all.  None is still a subclass of Option, but unlike Some it does not represent any specific value.  In fact, it would be more accurate to say that it represents the absence of a value.  This makes a lot of sense seeing as there really is no sane value for the first computation, and the second is simply incomputable with the given algorithm.

Retrieving a value from an instance of Option can be done in one of two ways.  The first technique is demonstrated in the div method itself (calling the no-args get method).  This is nice because it’s terse, but it’s not really the preferred way of doing things.  After all, what happens if the value in question is actually an instance of None?  (the answer is: Scala throws an exception)  This really doesn’t seem all that compelling as a means of encapsulating return values.  That is why pattern matching is more frequently employed:

div(13)(0) match {
  case Some(x) => println(x)
  case None => println("Problems")
}
// => prints "Problems"
 
div(25)(5) match {
  case Some(x) => println(x)
  case None => println("Problems")
}
// => prints "5"

Pattern matching allows us to deconstruct Option values in a type-safe manner without the risk of trying to access a value which really isn’t there.  Granted, the pattern matching syntax is a bit more verbose than just calling a get polymorphically, but it’s more about the principle of the thing.  It’s easy to see how this could be quite elegant in a non-trivial example.
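When a sensible default value exists, there is also a middle ground between get and a full pattern match: Option defines getOrElse, which yields the wrapped value for Some and the supplied default for None.  A quick sketch, reusing the div function from above:

```scala
def div(a: Int)(b: Int): Option[Int] =
  if (b <= 0) None
  else if (a < b) Some(0)
  else Some(1 + div(a - b)(b).get)

div(25)(5).getOrElse(-1)    // => 5
div(13)(0).getOrElse(-1)    // => -1, no exception thrown
```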

Compared to null

This is very similar to a common pattern in C++ and Java.  Oftentimes a method needs to return either a value or nothing, depending on various conditions.  More importantly, some internal state may be uninitialized, so a common “default” value for this state would be null.  Consider the following lazy initialization:

public class Example {
    private String value;
 
    public String getValue() {
        if (value == null) {
             value = queryDatabase();
        }
 
        return value;
    }
}
 
// ...
Example ex = new Example();
System.out.println(ex.getValue());

Well, that’s all well and good, but there are two problems with this code.  Number one, there’s always the potential for stray null pointer exceptions.  This is certainly less of a concern in Java than it was back in the days of C and C++, but they can still be annoying.  However, even if we assume that we’re all good programmers who always check potentially-null values prior to use, there’s still the problem of primitive types.  Let’s change our example just a bit to see where this causes issues:

public class Example {
    private int value;
 
    public int getValue() {
        if (value == ???) {    // er...?
             value = queryDatabase();
        }
 
        return value;
    }
}
 
// ...
Example ex = new Example();
System.out.println(ex.getValue());

If you’re following along at home, your compiler will probably complain at this point saying that what you just wrote was not valid Java.  If your compiler didn’t complain, then you have some more serious issues that need to be addressed.

Primitive values cannot be null because they are true primitives (an int is actually a bona-fide integer value sitting in a register somewhere at the hardware level).  This too is a holdover from the days of C and C++, but it’s something we have to deal with.  One of the consequences of this is that there is no reasonable “non-value” for primitive types.  Many people have tried clever little tricks to get around this, but most of them lead to horrible and strange results:

public class Example {
    private Integer value = null;
 
    public int getValue() {
        // forgot to init...
 
        return value;
    }
}

This code will fail at runtime with a NullPointerException oddly originating from the return statement in getValue().  I can’t tell you how many times I’ve spent hours sifting through code I thought was perfectly safe before finally isolating a stray null value which the compiler happily attempted to autobox.

It’s worth briefly mentioning that a common “non-value” for integers is something negative, but this breaks down when you can have legitimate values which fall into that range.  In short, there’s really no silver bullet within the Java language, so we have to turn elsewhere for inspiration.

Option in Java

I was actually working on an algorithm recently which required just such a solution.  In this case, the primitive value was a boolean, so there wasn’t even a conventional non-value to jump to.  I hemmed and hawed for a while before eventually deciding to implement a simple Option monad within Java.  The rest of the API is remarkably functional for something written in Java (immutable state everywhere), so I figured that a few monadic types would feel right at home.  Here’s what I came up with:

public interface Option<T> {
    public T get();
}
 
public final class Some<T> implements Option<T> {
    private final T value;
 
    public Some(T value) {
        this.value = value;
    }
 
    public T get() {
        return value;
    }
}
 
public final class None<T> implements Option<T> {
 
    public None() {}
 
    public T get() {
        throw new UnsupportedOperationException("Cannot resolve value on None");
    }
}

The usage for this code looks like this:

public class Example {
    private Option<Boolean> value = new None<Boolean>();
 
    public boolean getValue() {
        if (value instanceof None) {
            value = queryDatabase();
        }
 
        return value.get();
    }
}

Once again, Java has demonstrated how needlessly verbose and annoying its syntax can be.  In case you were wondering, the generics are necessary on None primarily because Java has such a poor type system.  Effectively, null is an untyped value which may be assigned to any class type.  Java has no concept of a Nothing type which is a subtype of anything.  Thus, there’s no way to provide a default parameterization for None and the developer must specify.
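For contrast, this is exactly the gap that Scala’s Nothing type fills: None is a single object extending Option[Nothing], and because Nothing is a subtype of every type (and Option is covariant in its type parameter), that one instance is assignable to any Option[T] without specifying a parameterization:

```scala
val flag: Option[Boolean] = None   // no type parameter needed
val name: Option[String] = None    // the very same singleton object

println(flag.isEmpty)              // prints "true"
```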

Now this is certainly not the cleanest API we could have written, and it’s definitely not a very good demonstration of how monads can be applied to Java, but it gets the job done.  If you’re interested, there’s a lot of good information out there on how to do something like this better.  The point was not to create a pure monad though; the point was to create something that solved the problem at hand.

Conclusion

Once you start thinking about structuring your code to use Option in languages which have built-in support for it, you’ll find yourself dreaming about such patterns in other, less fortunate languages.  It’s really sort of bizarre how much this little device can open your mind to new possibilities.  Take my code, and give it a try in your project.  Better yet, implement something on your own which solves the problem more elegantly!  The stodgy old Java “best practices” could use a little fresh air.

P.S. Yes, I know that the original implementation of this was actually the Maybe monad in Haskell.  I picked Option instead mainly because a) I like the name better, and b) it’s Scala, so it’s far more approachable than Haskell.