
The Need for a Common Compiler Framework

23 Jun 2008

In recent years, we have seen a dramatic rise in the number of languages used in mainstream projects.  In particular, languages which run on the JVM or CLR have become quite popular (probably because sane people hate dealing with x86 assembly).  Naturally, such languages prefer to interoperate with other languages built on these core platforms, particularly Java and C# (respectively).  Collectively, years of effort have been put into devising and implementing better ways of working with libraries written in these “parent languages”.  The problem is that such efforts are crippled by one fundamental limitation: circular dependencies.

Let’s take Scala as an example.  Of all of the JVM languages, this one probably has the potential for the tightest integration with Java.  Even Groovy, which is renowned for its integration, still falls short in many key areas.  (generics, anyone?)  With Scala, every class is a Java class, every method is a Java method, and there is no API which cannot be accessed from Java as natively as any other.  For example, I can write a simple linked list implementation in Scala and then use it in Java without any fuss whatsoever (warning: untested sample):

class LinkedList[T] {
  private var root: Node = _
 
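  def add(data: T): LinkedList[T] = {
    val insert = Node(data, null)
 
    if (root == null) {
      root = insert
    } else {
      // walk to the tail so earlier elements are not overwritten
      var last = root
      while (last.next != null) {
        last = last.next
      }
      last.next = insert
    }
 
    this
  }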
 
  def get(index: Int) = {
    def walk(node: Node, current: Int): T = {
      if (node == null) {
        throw new IndexOutOfBoundsException(index.toString)
      }
 
      if (current < index) {
        walk(node.next, current + 1)
      } else {
        node.data
      }
    }
 
    if (index < 0) {
      throw new IndexOutOfBoundsException(index.toString)
    }
 
    walk(root, 0)
  }
 
  def size = {
    def walk(node: Node): Int = if (node == null) 0 else 1 + walk(node.next)
 
    walk(root)
  }
 
  private case class Node(data: T, var next: Node)
}

Once this class is compiled, we can use it in our Java code just as if it were written within the language itself:

public class Driver {
    public static void main(String[] args) {
        LinkedList<String> list = new LinkedList<String>();
 
        for (String arg : args) {
            list.add(arg);
        }
 
        System.out.println("List has size: " + list.size());
 
        for (int i = 0; i < list.size(); i++) {
            System.out.println(list.get(i).trim());
        }
    }
}

Impressively seamless interoperability!  We actually could have gotten really fancy and thrown in some operator overloading.  Obviously, Java wouldn’t have been able to use the operators themselves, but it still would have been able to call them just like normal Java instance methods.  Using Scala in this way, we can get all the advantages of its concise syntax and slick design without really abandoning our Java code base.

The problem comes in when we try to satisfy more complex cases.  Groovy proponents often trot out the example of a Java class inherited by a Groovy class which is in turn inherited by another Java class.  In Scala, that would look something like this:

// Shape.java
public abstract class Shape {
    public abstract void draw(Canvas c);
}
 
// Rectangle.scala
class Rectangle(val width: Int, val height: Int) extends Shape {
  override def draw(c: Canvas) {
    // ...
  }
}
 
// Square.java
public class Square extends Rectangle {
    public Square(int size) {
        super(size, size);
    }
}

Unfortunately, this isn’t exactly possible in Scala.  Well, I take that back.  We can cheat a bit and first compile Shape using javac, then compile Rectangle using scalac and finally Square using javac, but that would be quite nasty indeed.  What’s worse is such a technique would completely fall over if the Canvas class were to have a dependency on Rectangle, something which isn’t too hard to imagine.  In short, Scala is bound by the limitations of a separate compiler, as are most languages on the JVM.
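To see why the cycle is the real issue, it helps to collapse it into a single language.  The sketch below is only an illustration: Canvas is a hypothetical stand-in with a made-up drawRectangle method, and Rectangle is written in Java rather than Scala so that one javac invocation sees everything at once.  Shape needs Canvas, Canvas needs Rectangle and Rectangle needs Shape, yet a single compiler handles this without any ordering tricks because it resolves all of the names before emitting any bytecode.  Split the same classes between javac and scalac and no ordering of the two invocations works.

```java
// Hypothetical single-language version of the Shape/Canvas/Rectangle cycle.
abstract class Shape {
    abstract void draw(Canvas c);
}

class Canvas {
    final StringBuilder log = new StringBuilder();

    // Canvas refers to Rectangle directly, which closes the cycle:
    // Shape -> Canvas -> Rectangle -> Shape.
    void drawRectangle(Rectangle r) {
        log.append("rect ").append(r.width).append('x').append(r.height);
    }
}

class Rectangle extends Shape {
    final int width;
    final int height;

    Rectangle(int width, int height) {
        this.width = width;
        this.height = height;
    }

    @Override
    void draw(Canvas c) {
        c.drawRectangle(this);
    }
}

class CycleDemo {
    public static void main(String[] args) {
        Canvas canvas = new Canvas();
        new Rectangle(3, 4).draw(canvas);
        System.out.println(canvas.log);
    }
}
```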

Groovy solves this problem by building their own Java compiler into groovyc, thus allowing the compilation of both Java and Groovy sources within the same process.  This solves the problem of circular references because neither set of sources is completely compiled before the other.  It’s a nice solution, and one which Scala will be adopting in an upcoming release of its compiler.  However, it doesn’t really solve everything.

Consider a more complex scenario.  Imagine we have Java class Shape, which is extended by Scala class Rectangle and Groovy class Circle.  Imagine also that class Canvas has a dependency on both Rectangle and Circle, perhaps for some special graphics optimizations.  Suddenly we have a three-way circular dependency and no way of resolving it without a compiler which can handle all three languages: Java, Groovy and Scala.  This is starting to become a bit more interesting.

Of course, we can solve this problem in the same way we solved the Groovy-Java dependence problem: just add support to the compiler!  Unfortunately, it may have been trivial to implement a Java compiler as part of groovyc, but Scala is a much more difficult language from a compiler’s point of view.  But even supposing that we do create an integrated Scala compiler, we still haven’t solved the problem.  It’s not difficult to imagine throwing another language into the mix; Clojure, for example.  Do we keep going, tacking languages onto our once-Groovy compiler until we support everything usable on the JVM?  It should be obvious why this is a bad plan.

A more viable solution would be to create a common compiler framework, one which would be used as the basis for all JVM languages.  This framework would have common abstractions for things like name resolution and type checking.  Instead of creating an entire compiler from scratch, each language would simply extend this core framework and implement its own front-end as some sort of module.  In this way, it would be easy to build up a custom set of modules which solve the needs of your project.  Since the compilers are modular and based on the same core framework, they would be able to handle simultaneous compilation of all JVM languages involved, effectively solving the circular dependency problem in a generalized fashion.
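As a rough sketch of what "simultaneous compilation" could mean in such a framework, consider a core which first collects the declarations from every language module and only then resolves references against the combined set.  Everything here is hypothetical (SourceUnit, LanguageModule and CompilerCore are names invented for illustration, and a real front-end would parse sources rather than take class names as strings), but it shows why the order of the modules stops mattering:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// A source file reduced to the two facts the core needs in this sketch:
// the class it declares and the classes it refers to.
final class SourceUnit {
    final String className;
    final List<String> references;

    SourceUnit(String className, String... references) {
        this.className = className;
        this.references = Arrays.asList(references);
    }
}

// Stand-in for a per-language front-end plugged into the shared core.
final class LanguageModule {
    final String language;
    final List<SourceUnit> units;

    LanguageModule(String language, SourceUnit... units) {
        this.language = language;
        this.units = Arrays.asList(units);
    }
}

final class CompilerCore {
    // Returns every "referrer -> target" pair that could not be resolved.
    static List<String> unresolvedReferences(List<LanguageModule> modules) {
        // Phase 1: every module declares its classes before anything is resolved.
        Set<String> declared = new HashSet<>();
        for (LanguageModule module : modules) {
            for (SourceUnit unit : module.units) {
                declared.add(unit.className);
            }
        }

        // Phase 2: resolve references against the shared set of declarations.
        List<String> unresolved = new ArrayList<>();
        for (LanguageModule module : modules) {
            for (SourceUnit unit : module.units) {
                for (String ref : unit.references) {
                    if (!declared.contains(ref)) {
                        unresolved.add(unit.className + " -> " + ref);
                    }
                }
            }
        }
        return unresolved;
    }
}

class FrameworkDemo {
    public static void main(String[] args) {
        List<LanguageModule> modules = Arrays.asList(
            new LanguageModule("java",
                new SourceUnit("Shape", "Canvas"),
                new SourceUnit("Canvas", "Rectangle", "Circle")),
            new LanguageModule("scala", new SourceUnit("Rectangle", "Shape", "Canvas")),
            new LanguageModule("groovy", new SourceUnit("Circle", "Shape", "Canvas")));

        // Empty when all three languages are compiled together.
        System.out.println(CompilerCore.unresolvedReferences(modules));
    }
}
```

Dropping the Scala and Groovy modules from the list leaves Canvas with two dangling references, which is exactly the situation a stand-alone javac finds itself in.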

The framework could even make things easier on would-be compiler implementors by handling common operations like bytecode emission.  Fundamentally, all of these tightly-integrated languages are just different front-ends to a common backend: the JVM.  I haven’t looked at the sources, but I would imagine that there is a lot of work which had to be done in each compiler to solve problems which were already handled in another.

Of course, all this is purely speculative.  Everyone builds their compiler in a slightly different way (slightly => radically in the case of languages like Scala) and I wouldn’t imagine that it would be easy to build this sort of common compiler backend.  However, the technology is in place.  We already have nice module systems like OSGi, and we’re certainly no strangers to the work involved in building up a proper CLASSPATH for a given project.  Why should this be any different?

It’s not without precedent either.  GCC defines a common backend for a number of compilers, such as G++, GCJ and even an Objective-C compiler.  Granted, it’s neither as high-level nor as modular as we would need to solve circular dependencies, but it’s something to go on.

It will be interesting to see where the JVM language sphere is headed next.  The rapid emergence of so many new languages is leading to problems which will have to be addressed before the polyglot methodology will be truly accepted by the industry.  Some of the smartest people in the development community are working toward solutions; and whether they take my idea of a modular framework or not, somewhere along the line the problem of simultaneous compilation must be solved.

The Plague of Polyglotism

28 Apr 2008

For those of you who don’t know, polyglotism is not some weird religion but actually a growing trend in the programming industry.  In essence, it is the concept that one should not be confined to a single language for a given system or even a specific application.  With polyglot programming, a single project could use dozens of different languages, each for a task to which it is uniquely well-suited.

As a basic example, we could write a Wicket component which makes use of Ruby’s RedCloth library for working with Textile.  Because of Scala’s flexible syntax, we can use it to perform the interop between Wicket and Ruby using an internal DSL:

// TextileLabel.scala
class TextileLabel(id:String, model:IModel) extends WebComponent(id, model) with JRuby {
  require("textile_utils")
 
  override def onComponentTagBody(markupStream:MarkupStream, openTag:ComponentTag) {
    replaceComponentTagBody(markupStream, openTag, 
        'textilize(model.getObject().toString()))
  }
}
 
# textile_utils.rb
require 'redcloth'
 
def textilize(text)
  doc = RedCloth.new text
  doc.to_html
end

Warning: Untested code

We’re actually using three languages here, even though we only have source for two of them.  The Wicket library itself is written in Java, our component is written in Scala and we work with the RedCloth library in Ruby.    This is hardly the best example of polyglotism, but it suffices for a simple illustration.  The general idea is that you would apply this concept to a more serious project and perform more significant tasks in each of the various languages.

The Bad News

This is all well and good, but there’s a glaring problem with this philosophy of design: not everyone knows every language.  You may be a language aficionado, picking up everything from Scheme to Objective-C, but it’s only a very small percentage of developers who share that passion.  Many projects are composed of developers without extensive knowledge of diverse languages.  In fact, even with a really good sampling of talent, it’s doubtful you’ll have more than one or two people fluent in more than two languages.  And unfortunately, there’s this pesky concern we all love called “maintainability”.

Let’s pretend that Slava Pestov comes into your project as a consultant and decides that he’s going to apply the polyglot programming philosophy.  He writes a good portion of your application in Java, Lisp and some language called Factor, pockets his consultant’s fee and then moves on.  Now the code he wrote may have been phenomenally well-designed and really perfect for the task, but you’re going to have a very hard time finding a developer who can maintain it.  Let’s say that six months down the road, you decide that your widget really needs a red push button, rather than a green radio selector.  Either you need a developer who knows Factor (hint: there aren’t very many), or you need a developer who’s willing to learn it.  The thing is that most developers with the knowledge and motivation to learn a language have either already done so, or are familiar enough with the base concepts as to be capable of jumping right in.  These developers fall into that limited group of people fluent in many different languages, and as such are a rare find.

Now I’m not picking on Factor in any way; it’s a very interesting language, but it still isn’t very widespread in terms of developer expertise.  That’s really what this all comes down to: developer expertise.  Every time you make a language choice, you limit the pool of developers who are even capable of grokking your code.  If I decide to build an application in Java, even assuming that’s the only language I use, I have still eliminated maybe 20% of all developers from ever touching the project.  If I make the decision to use Ruby for some parts of the application, while still using Java for the others, I’ve now cut that 80% down to maybe 35% (developers who know both Java and Ruby).  Once I throw in Scala, that cuts it down still further (maybe to 15% now).  If I add a fourth language - for example, Haskell - I’ve now narrowed the field so far that it’s doubtful I’ll find anyone capable of handling all aspects within a reasonable price range.  It’s the same problem as with framework choice, except that frameworks are much easier to learn than languages.

The polyglot ideal was really devised by a bunch of nerdy folks like me.  I love languages and would like nothing better than to get paid to learn half a dozen new ones (assuming I’m coming into a project with a strange combination I haven’t seen before).  However, as I understand the industry, that’s not a common sentiment.  So a very loud minority of developers (/me waves) has managed to forge a very hot methodology, one which excludes almost all of the hard-working developer community.  If I didn’t know better, I would be tempted to say that it was a self-serving industry ploy to foster exclusivity in the job market.

I want to work on multi-language projects as much as anyone, but I really don’t think it’s the best thing right now.  I’m working on a project now which has an aspect for which Scala would be absolutely perfect, but since I’m the only developer on hand who is remotely familiar with the language, I’m probably going to end up recommending against its adoption.  Consider carefully the ramifications of trying new languages on your own projects; you may not be doing future developers any favors by going down that path.

Defining Scala Design Idioms

14 Apr 2008

With any new language comes a certain amount of uncertainty as to what is “the right way” to do things.  Now I’m not just talking about recursion vs iteration or object-oriented vs straight procedural.  What I’m referring to is the design idioms which govern everything from naming conventions to array deconstruction.

These idioms are highly language-specific and can even differ between languages with otherwise similar lexical elements.  Consider the following C++, Java, Ruby and Scala examples:

C++:

vector<string> first_names;
first_names.push_back("Daniel");
first_names.push_back("Chris");
first_names.push_back("Joseph");
 
for (vector<string>::iterator i = first_names.begin(); i != first_names.end(); ++i) {
    cout << *i << endl;
}

Java:

String[] firstNames = {"Daniel", "Chris", "Joseph"};
 
for (String name : firstNames) {
    System.out.println(name);
}

Ruby:

first_names = ['Daniel', 'Chris', 'Joseph']
 
first_names.each do |name|
  puts name
end

Scala:

val firstNames = Array("Daniel", "Chris", "Joseph")
 
for (name <- firstNames) {
  println(name)
}

As a matter of interest, the Scala example could also be shortened to the following:

val firstNames = Array("Daniel", "Chris", "Joseph")
firstNames.foreach(println)

All of these samples perform essentially the same task: traverse an array of strings and print each value to stdout.  Of course, the C++ example is actually using a vector rather than an array due to the evil nature of C/C++ arrays, but it comes to the same thing.  Passing over the differences in syntax between these four languages, what really stands out are the different ways in which the task is performed.  C++ and Java both drive the traversal from the outside with an explicit loop (an iterator in the C++ case, the enhanced for loop in Java), while Ruby and Scala hand a block or function to the collection and let it do the traversal.  Ruby and C++ both use lowercase variables separated by underscores, while Java and Scala share the camelCase convention.
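The two traversal styles can be put side by side in a single language.  The following is a small sketch rather than anything from the samples above: it assumes Java 8 or later (for forEach and method references), and the class and method names are made up for the comparison.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class IterationStyles {
    // External iteration: the caller drives the loop through an Iterator.
    static List<String> withIterator(List<String> names) {
        List<String> seen = new ArrayList<>();
        Iterator<String> it = names.iterator();
        while (it.hasNext()) {
            seen.add(it.next());
        }
        return seen;
    }

    // Internal iteration: the collection drives the loop and calls the
    // function it was handed once per element.
    static List<String> withHigherOrderFunction(List<String> names) {
        List<String> seen = new ArrayList<>();
        names.forEach(seen::add);
        return seen;
    }

    public static void main(String[] args) {
        List<String> firstNames = Arrays.asList("Daniel", "Chris", "Joseph");
        System.out.println(withIterator(firstNames));
        System.out.println(withHigherOrderFunction(firstNames));
    }
}
```

Both methods visit the same elements in the same order; the only difference is who owns the loop.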

This is a bit of a trivial example, but it does open the door to a much more interesting discussion: what are these idioms in Scala’s case?  Scala is a very new language which has yet to see truly wide-spread adoption.  More than that, Scala is fundamentally different from what has come before.  Certainly it draws inspiration from many languages - most notably Java, C# and Haskell - but even languages which are direct descendants can impose entirely different idioms.  Just look at the differences between the Java and the C++ examples above.  The practical, day-to-day implications of this become even more apparent when you consider object-oriented constructs:

// Book.h
#include <string>
 
class Author;   // forward declaration; Book only holds a pointer to Author
 
class Book {
    std::string title;
    Author *author;
 
public:
    Book(std::string);
 
    std::string get_title();
    Author* get_author();
    void set_author(Author*);
};
 
// Book.cpp
#include "Book.h"
 
Book::Book(std::string t) : title(t), author(0) {}
 
std::string Book::get_title() {
    return title;
}
 
Author* Book::get_author() {
    return author;
}
 
void Book::set_author(Author *a) {
    author = a;
}

The equivalent in Java:

public class Book {
    private String title;
    private Author author;
 
    public Book(String title) {
        this.title = title;
    }
 
    public String getTitle() {
        return title;
    }
 
    public Author getAuthor() {
        return author;
    }
 
    public void setAuthor(Author author) {
        this.author = author;
    }
}

And the (much shorter) Ruby code:

class Book
  attr_reader :title, :author
  attr_writer :author
 
  def initialize(title)
    @title = title
  end
end

This code uses standard, accepted idioms for class design in each of the three languages.  Notice how in C++ we avoid name clashes between method formals and data members?  The language does allow a constructor parameter to share its name with the member it initializes (title(title) is well-defined in the initialization syntax, the bit that follows the : in the constructor), but inside the body the formal shadows the member and it is easy to assign the wrong one, so distinct names are the usual convention.  This contrasts strongly with Java which by convention encourages shadowing of fields by method formals.  It’s a very strong convention to do what I’ve done here, using this.fieldName to disambiguate between formals and fields.  Ruby stands apart from both of these languages by enforcing the use of prefix symbols on variable names to define the container.  There really can’t be any shadowing of instance variables by formals since all instance variables must be prefixed with the @ symbol.

The question here is: what conventions are applicable in Scala?  At first blush, it’s tempting to write a class which looks like this:

class Book(val title:String) {
  var author:Author = null
}

Because every variable/value in Scala is actually a method, the accessor/mutator pairs are already generated for us (similar to how it is in Ruby with attributes).  The problem here is of course redefining an accessor or mutator.  For example, we may want to perform a check in the author mutator to ensure that the new value is not null.  In C++ and Java, we would just add an if statement to the correct method and leave it at that.  Ruby is more interesting because of the auto-generated methods, but it’s still fairly easy to do:

class Book
  def author=(author)
    @author = author unless author.nil?
  end
end
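For comparison, the Java version of the same check is a small change inside the existing setter.  This is a sketch based on the Java Book class from earlier; the empty Author class is a placeholder added only so that the example compiles on its own.

```java
// Placeholder so the sketch is self-contained; Author is never defined in the article.
class Author {}

class Book {
    private final String title;
    private Author author;

    public Book(String title) {
        this.title = title;
    }

    public String getTitle() {
        return title;
    }

    public Author getAuthor() {
        return author;
    }

    // The signature is unchanged, so callers are unaffected by the new check.
    public void setAuthor(Author author) {
        if (author != null) {
            this.author = author;
        }
    }
}
```

Because callers were already going through setAuthor, the public interface of the class stays exactly as it was.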

Scala poses a different problem.  In our example above, we’ve essentially created a class with two public fields, something which is sternly frowned upon in most object-oriented languages.  If we had done this in Java, it would be impossible to implement the null check without changing the public interface of the class.  Fortunately Scala is more flexible.  Here’s a naive implementation of Book in Scala which performs the appropriate check:

class Book(val title:String) {
  private var author:Author = null
 
  def author = author
 
  def author_=(author:Author) {
    if (author != null) {
      this.author = author
    }
  }
}

Unfortunately, there are some fairly significant issues with the above code.  First off, it won’t compile.  Scala lacks separate namespaces for fields and methods, so you can’t just give a method the same name as a field and expect it to work.  This merging of namespaces actually allows a lot of interesting design and is on the whole a good thing, but it puts a serious crimp in our design.  The second problem with this example is closely related to the first.  Assuming the code did compile, at runtime execution would enter the author_= method, presumably pass the conditional and then execute the this.author = author statement.  However, the Scala compiler will interpret this line in the following way:

this.author_=(author)

That’s right, infinite recursion.  Usually this issue isn’t apparent because the compiler error will take precedence over the runtime design flaw, but it’s still something which merits notice.  Obviously, using the same names for variables as we do for methods and their formals just isn’t going to work here.  It may be the convention in Java, but we’ll have to devise something new for Scala.

Over the last few months, I’ve read a lot of Scala written by other people.  Design strategies to solve problems like these range all over the map, from totally separate names to prepending or appending characters, etc.  The community just doesn’t seem to be able to standardize on any one “right way”.  Personally, I favor the prepended underscore solution and would really like to see it become the convention, but that’s just me:

class Book(val title:String) {
  private var _author:Author = null   // notice the leading underscore
 
  def author = _author
 
  def author_=(author:Author) {
    if (author != null) {
      _author = author
    }
  }
}

The Scala community really needs to get together on this and other issues related to conventional design idioms.  I see a lot of code that’s written in Java, but with C or C++ idioms.  This sort of thing is rare nowadays, but was quite common in the early days of the language.  People weren’t sure what worked best in Java, so they tried to apply old techniques to the new syntax, often with disastrously unreadable results.  As Scala moves into the mainstream, we have to come to some sort of consensus as to what our code should look like and what conventions apply.  If we don’t, then the language will be forced to struggle through many of the same problems which plagued Java only a decade ago: new developers trying to get a handle on this foreign language, misapplying familiar constructs along the way.

ActiveObjects 0.8 Released

22 Mar 2008

Happy Easter everyone, we’ve got a new release!  ActiveObjects 0.8 is simmering at low heat (and footprint) on the servers even as I type.  This is probably the most significant milestone we’ve released thus far in that the 1.0 stream is now basically feature-complete.  I won’t be adding any new features to this release, just polishing and testing the heck out of the old ones.  We’ve already got a suite of 91 tests and I’ve got lots of ideas for more.  Expect this framework to get very, very stable in the next few months.

Features

Despite an annoyingly cramped schedule in the last couple months, I have managed to implement a few long-requested features.  The big ticket item for this release is pluggable cache implementations.  For those of you not “in the know”, ActiveObjects has a multi-tiered cache system composed of several layers and delegates.  This isn’t entirely unusual for an ORM, which is why I hadn’t mentioned it before.  The three basic layers:

  • Entity Cache - A SoftReference map from id/type pairs to actual entity values.  This means that if you run a query which returns an entity for people {id=1} and then run a second query which returns an entity for the same row, the two instances will be identical.  This is what enables the additional cache layers to function properly (and it saves on memory).
  • Value Cache - This is where all the field values for a given row are stored.  This will store both field values from the database as well as entity values (corresponding to foreign key fields).  Each entity has a separate instance of this cache.  Most of the memory required by ActiveObjects is taken up here.  With a long-running application, quite a bit of the database can actually get paged into memory.  This is a really good thing, since it drastically reduces round-trips to the database.  Thanks to the SoftReference(s) in the entity cache, running out of memory due to stale cache entries isn’t a problem.
  • Relations Cache - This is probably the most complex of all of the cache layers.  This cache literally caches any one-to-one, one-to-many or many-to-many result sets and spits them out on demand.  This reduces round-trips still more to the point where certain applications can be running completely from memory, even querying on complex relations.  This in and of itself isn’t that complex; the hard part is knowing when to expire specific cache entries.  This cache layer is essentially three different indexes to the same data, allowing cache expiry for a number of different circumstances.  Naturally, these structures don’t represent well as a single map of key / value pairs.  (more on this in a bit)
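The idea behind the first of these layers can be sketched in a few lines.  To be clear, this is not the ActiveObjects implementation: EntityCacheSketch, getOrCreate and Person are hypothetical names, the type/id key is encoded as a plain string for brevity, and the sketch assumes Java 8 or later.  It only illustrates how a SoftReference map makes two lookups for the same row return the identical instance while still letting the garbage collector reclaim entries under memory pressure.

```java
import java.lang.ref.SoftReference;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical entity type used only for the demonstration.
final class Person {
    final int id;

    Person(int id) {
        this.id = id;
    }
}

final class EntityCacheSketch {
    // Values are softly referenced, so the collector may clear them
    // when memory runs low instead of the cache growing without bound.
    private final Map<String, SoftReference<Object>> entities = new HashMap<>();

    synchronized <T> T getOrCreate(Class<T> type, int id, Supplier<T> factory) {
        String key = type.getName() + "#" + id;   // id/type pair as a single key
        SoftReference<Object> ref = entities.get(key);
        Object cached = (ref == null) ? null : ref.get();

        if (cached == null) {
            // Either never cached or already reclaimed: build a fresh instance.
            cached = factory.get();
            entities.put(key, new SoftReference<>(cached));
        }
        return type.cast(cached);
    }
}

class EntityCacheDemo {
    public static void main(String[] args) {
        EntityCacheSketch cache = new EntityCacheSketch();
        Person first = cache.getOrCreate(Person.class, 1, () -> new Person(1));
        Person second = cache.getOrCreate(Person.class, 1, () -> new Person(1));

        // Both lookups yield the same object while it is strongly reachable.
        System.out.println(first == second);
    }
}
```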

As of the 0.8 release, the value cache and relations cache layers have been unified under a single controller.  They’re still separate caches, but this unification allows for something I’ve been wanting to implement since 0.2: Memcached support.

High-volume applications often run into the problem of limited memory across the various server nodes.  Solutions like Terracotta can help tremendously, but unfortunately this doesn’t solve everything.  One answer is to provide every node in the cluster with a shared, distributed, in-memory map of key / value pairs.  This allows the application to page data into cluster memory and retrieve it extremely quickly.  Memcached is effectively this.  Naturally, it would be very useful if an ORM could automatically make use of just such a distributed memory cache and thus share cached rows between nodes.  Hibernate has had this for a long time, and now ActiveObjects does as well.

By making use of the MemcachedCache class in the activeobjects-memcached module (separate from the main distribution), it is possible to point ActiveObjects at an existing Memcached cluster and cause it to store the value cache there, rather than in local memory.  Once the relevant call has been made to EntityManager (setting up the cache), all cached rows will be stored in Memcached and shared between every instance of ActiveObjects running in your cluster.

Caveats

Of course, nothing is perfect.  For the moment, the major issue is that the relations cache doesn’t work properly against Memcached.  I had originally planned to just release it running against the local RAM, but this causes issues with proper cache expiry.  As such, the relations cache is disabled when running ActiveObjects against Memcached.  This doesn’t really impair functionality other than forcing ActiveObjects to hit the database every time a relationship is queried.  Since this is what most ORMs do anyway, it’s not too horrible.

I plan to have this issue resolved in time for the 0.9 release.  The biggest problem is figuring out how to flatten the multi-index structure into a single map.  Once I can do that, rewriting things for Memcached should be a snap.  (btw, if anyone wants to help out with patches in this area, you’re more than welcome!)

Databases Galore

One of the major focuses of this release was testing on different platforms.  This process uncovered an embarrassing number of bugs and glitches in some of the supposedly “supported” databases.  With the exception of Oracle, every bug I’ve been able to identify has been fixed.  So if you’ve tried ActiveObjects in the past and run into problems on non-MySQL platforms, now would be the time to give it another shot!

Speaking of Oracle, we’ve made some tremendous progress toward full support of this ornery database.  To be fair, I’m not the one responsible.  One of our community members has been tirelessly submitting patches and running tests.  We didn’t have time to get things fully sorted for this release (so don’t try AO 0.8 with Oracle unless you’re willing to risk server meltdown), but you can expect dramatic improvements in 0.9.  Once these fixes are in place, we should be able to safely claim that we support most of the database market (in terms of product adoption).

Final Thoughts

This really is an amazing release, well worth the download.  Most of the core API has remained the same, so things should be basically plug-n-play for existing applications.  The user-facing API is now in freeze and (unless some serious bug arises) will experience no changes between now and 1.0.  If you’ve been considering trying ActiveObjects for your project, now would be an excellent time.  I can’t promise no bugs, but if you point out my idiocy, I’ll certainly work to fix it!

Oh, as an aside, ActiveObjects is now in a Maven2 repository.  (thanks to some POM reworking by Nathan Hamblen)  To make use of this somewhat dubious feature, add the java.net Maven2 repository to your POM and then insert the following dependency:

<dependency>
    <groupId>net.java.dev.activeobjects</groupId>
    <artifactId>activeobjects</artifactId>
    <version>0.8</version>
</dependency>

After that, the magic of Maven will take over.  Of course, you’ll need to add the dependency for the appropriate database driver and (optionally) connection pool.  Those dependencies are left as an exercise to the reader.  Note, activeobjects-memcached isn’t in Maven yet, but it will be for the 0.9 release (either that or integrated into the core, I haven’t decided yet).

I certainly hope you enjoy this latest release.  As always, feel free to drop by the users list if you have any questions or (more likely) uncover any problems.

Defining High, Mid and Low-Level Languages

27 Feb 2008

I’ve been writing quite a bit recently about the differences between languages.  Mostly I’ve just been whining about how annoying it is that everyone keeps searching for the “one language to rule them all”, the Aryan Language if you will.  Over the course of some of these articles, I’ve made some rather loosely defined references to terms like “general purpose” and “mid-level” when trying to describe these languages. 

Several people have (rightly) called me out on these terms, arguing that I haven’t really defined what they mean, so I shouldn’t be using them to try to argue a certain point.  In the case of “general purpose language”, I have to admit that I tend to horribly misuse the term and any instances within my writing should be discarded without thought.  However, I think with a little bit of reflection, we can come to some reasonable definitions for high-, mid- and low-level languages.  To that end, I present the “Language Spectrum of Science!” (cue reverb)

[Figure: Language Spectrum of Science]

This scale is admittedly arbitrary and rather loosely defined in and of itself, but I think it should be a sufficient visual aid in conveying my point.  In case you hadn’t guessed, red languages are low-level, green languages are high-level and that narrow strip of yellow represents the mid-level languages.  Obviously I’m leaving out a large number of languages which could be represented with equal validity, but I only have a finite number of pixels in page-width.

The scale is also somewhat myopic.  It defines Ruby as the highest of the high-level languages.  Very few could argue the other side of the scale since there’s not really anything lower than the hardware, but claiming that Ruby is the most high-level language in history seems somewhat odd.  In truth, I picked Ruby as the super high-level language mainly because it’s a) more dynamic than both JavaScript and Perl, b) more prone to RAD frameworks like Rails and c) it’s the most significant high-level language which I’m really familiar with.

It’s also important to note that languages aren’t really points on the spectrum, but rather they span ranges which are more or less wide, depending on the capabilities.  These ranges may overlap considerably (as in the case of Java and Scala) or may be entirely disjoint (Assembly and Ruby).  In short, the scale is somewhat blurry and shouldn’t be taken as a canonical reference.

Low-Level

Of all of the categories, it’s probably easiest to define what it means to be a low-level language.  Machine code is low level because it runs directly on the processor.  Low-level languages are appropriate for writing operating systems or firmware for micro-controllers.  They can do just about anything with a little bit of work, but obviously you wouldn’t want to write the next major web framework in one of them (I can see it now, “Assembly on Rails”).

Characteristics

  • Direct memory management
  • Little-to-no abstraction from the hardware
  • Register access
  • Statements usually have an obvious correspondence with clock cycles
  • Superb performance

C is actually a very interesting language in this category (C++ even more so) because of how broad its range happens to be.  C gives you direct access to memory locations (and, through compiler-specific inline assembly, to registers), but it also has a number of constructs which allow significant abstraction from the hardware itself.  Really, C and C++ are probably the broadest-spectrum languages in existence, which makes them quite interesting from a theoretical standpoint.  In practice, though, I find both C and C++ too low-level for anything “enterprisey”.

Mid-Level

This is where things start getting vague.  Most high-level languages are well defined, as are low-level languages, but mid-level languages tend to be a bit difficult to box.  I really define the category by the size of application I would be willing to write using a given language.  I would have no problem writing and maintaining a large desktop application in a mid-level language (such as Java), whereas to do so in a low-level language (like Assembly) would lead to unending pain.

This is really the level at which virtual machines start to become commonplace.  Java, Scala, C#, etc. all use a virtual machine to provide an execution environment.  Thus, many mid-level languages don’t compile directly down to the metal (at least, not right away) but represent a blurring between interpreted and compiled languages.  Mid-level languages are almost always implemented in terms of low-level languages (e.g. the HotSpot JVM is written largely in C++).

Characteristics

  • High-level abstractions such as objects (or first-class functions)
  • Static typing
  • Extremely commonplace (mid-level languages are by far the most widely used)
  • Virtual machines
  • Garbage collection
  • Easy to reason about program flow

High-Level

High-level languages are really interesting if you think about it.  They are essentially mid-level languages which just take the concepts of abstraction and high-level constructs to the extreme.  For example, Java is mostly object-oriented, but it still relies on primitives which are represented directly in memory.  Ruby on the other hand is completely object-oriented.  It has no primitives (outside of the runtime implementation) and everything can be treated as an object.
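To make “everything can be treated as an object” a little more concrete, here is a minimal sketch.  It is written in Python rather than Ruby, on the assumption that Python is a fair stand-in here since its integers are also full objects.  The variable names are purely illustrative.

```python
number = 255

# Integers are full objects: they have a class and methods of their own.
type_name = type(number).__name__       # name of the integer's class
bits = number.bit_length()              # a method called directly on an int
is_object = isinstance(number, object)  # every value descends from object

# Even the + operator is sugar for a method call on the left-hand object.
total = number.__add__(1)

# Functions, classes and None are objects as well.
also_objects = all(isinstance(x, object) for x in (len, int, None))

print(type_name, bits, is_object, total, also_objects)
```

Contrast this with Java, where `int` is a primitive and has to be boxed into an `Integer` before it can be treated as an object.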

In short, high-level languages are the logical semantic evolution of mid-level languages.  It makes a lot of sense when you consider the philosophy of simplification and increase of abstraction.  After all, people were n times more productive switching from C to Java with all of its abstractions.  If that really was the case, then can’t we just add more and more layers of abstraction to increase productivity exponentially?

High-level languages tend to be extremely dynamic.  Runtime flow is changed on the fly through the use of things like dynamic typing, open classes, etc.  This sort of technique provides a tremendous amount of flexibility in algorithm design.  However, this sort of mucking about with execution also tends to make the programs harder to reason about.  It can be very difficult to follow the flow of an algorithm written in Ruby.  This “obfuscation of flow” is precisely why I don’t think high-level languages like Ruby are suitable for large applications.  That’s just my opinion though.  :-)
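As a rough illustration of what open classes and dynamic typing look like in practice, here is a small sketch.  It uses Python instead of Ruby, assuming that assigning a function onto an existing class is close enough to Ruby’s class reopening to make the point.  `Greeter`, `Robot`, `shout` and `welcome` are all hypothetical names invented for the example.

```python
class Greeter:
    """A hypothetical class that starts out with no shout() method."""

    def __init__(self, name):
        self.name = name

    def greet(self):
        return f"Hello, {self.name}"


g = Greeter("world")              # instance created before the class changes
had_shout = hasattr(g, "shout")   # the method does not exist yet


def shout(self):
    return self.greet().upper() + "!"


Greeter.shout = shout             # modify the class while the program runs
shouted = g.shout()               # the existing instance picks up the method


class Robot:
    """Unrelated to Greeter; it merely happens to have a greet() method."""

    def greet(self):
        return "BEEP"


def welcome(thing):
    # Dynamic typing: anything with a greet() method is accepted.
    return thing.greet()


greetings = [welcome(g), welcome(Robot())]
print(had_shout, shouted, greetings)
```

Notice that nothing at the definition of `Greeter` tells you `shout` will ever exist.  That is exactly the flexibility, and exactly the “obfuscation of flow”, described above.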

Characteristics

  • Interpreted
  • Dynamic constructs (open classes, message-style methods, etc)
  • Poor performance
  • Concise code
  • Flexible syntax (good for internal DSLs)
  • Hybrid paradigm (object-oriented and functional)
  • Fanatic community

Oddly enough, high-level language developers seem to be much more passionate about their favorite language than low- or mid-level developers.  I’m not entirely sure why it has to be this way, but the trend has been far too universal to ignore (Python, Perl, Ruby, etc).  Ruby is of course the canonical example of this, primarily because of the skyrocketing popularity of Rails, but any high-level language has its fanatic evangelists.

What’s really interesting about many high-level languages is the tendency to fall into a hybrid paradigm category.  Python for example is extremely object-oriented, but also allows things like closures and first-class functions.  It’s not as powerful in this respect as a language like Scala (Python’s lambdas, for instance, are limited to a single expression), but nevertheless it is capable of representing most elements of a purely functional language.
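A short sketch of the closures and first-class functions mentioned above, in Python.  The helper names `make_counter` and `apply_twice` are made up for the example and are not from any library.

```python
def make_counter(start=0):
    """Return a closure that remembers and advances its own count."""
    count = start

    def increment(step=1):
        nonlocal count  # rebind the variable captured from make_counter
        count += step
        return count

    return increment


def apply_twice(func, value):
    """Accept a function as an ordinary argument and call it twice."""
    return func(func(value))


counter = make_counter(10)
first = counter()       # advances the captured count by 1
second = counter(5)     # the state survives between calls

doubled_twice = apply_twice(lambda x: x * 2, 3)
squares = list(map(lambda x: x * x, [1, 2, 3]))

print(first, second, doubled_twice, squares)
```

Here the same language that models `counter` as a function with hidden state would just as happily model it as a class with a `count` field, which is the hybrid nature in a nutshell.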

As an aside, high-level languages usually perform poorly compared with low- or even mid-level languages.  This is merely a function of the many layers of abstraction between the code and the machine itself.  One statement in Ruby may translate into literally thousands of machine instructions.  Of course, high-level languages are almost exclusively used in situations where such “raw-metal” performance is unnecessary, but it’s still a language trait worth remembering.

Conclusion

It’s important to remember that I’m absolutely not recommending one language or “level” over another for the general case.  The very reason we have such a wide gradient of language designs is that there is a need for all of them at some point.  The Linux kernel could never be written in Ruby, and I would never want to write an incremental backup system in Assembly.  All of these languages have their uses; it’s just a matter of identifying which language matches your current problem most closely.