Skip to content
Print

The Need for a Common Compiler Framework

23
Jun
2008

In recent years, we have seen a dramatic rise in the number of languages used in mainstream projects.  In particular, languages which run on the JVM or CLR have become quite popular (probably because sane people hate dealing with x86 assembly).  Naturally, such languages prefer to interoperate with other languages built on these core platforms, particularly Java and C# (respectively).  Collectively, years of effort have been put into devising and implementing better ways of working with libraries written in these “parent languages”.  The problem is that such efforts are crippled by one fundamental limitation: circular dependencies.

Let’s take Scala as an example.  Of all of the JVM languages, this one probably has the potential for the tightest integration with Java.  Even Groovy, which is renowned for its integration, still falls short in many key areas.  (generics, anyone?)  With Scala, every class is a Java class, every method is a Java method, and there is no API which cannot be accessed from Java as natively as any other.  For example, I can write a simple linked list implementation in Scala and then use it in Java without any fuss whatsoever (warning: untested sample):

class LinkedList[T] {
  private var root: Node = _
 
  def add(data: T) = {
    val insert = Node(data, null)
 
    if (root == null) {
      root = insert
    } else {
      root.next = insert
    }
 
    this
  }
 
  def get(index: Int) = {
    def walk(node: Node, current: Int): T = {
      if (node == null) {
        throw new IndexOutOfBoundsException(index.toString)
      }
 
      if (current < index) {
        walk(node.next, current + 1)
      } else {
        node.data
      }
    }
 
    if (index < 0) {
      throw new IndexOutOfBoundsException(index.toString)
    }
 
    walk(root, 0)
  }
 
  def size = {
    def walk(node: Node): Int = if (node == null) 0 else 1 + walk(node.next)
 
    walk(root)
  }
 
  private case class Node(data: T, var next: Node)
}

Once this class is compiled, we can use it in our Java code just as if it were written within the language itself:

public class Driver {
    public static void main(String[] args) {
        LinkedList<String> list = new LinkedList<String>();
 
        for (String arg : args) {
            list.add(arg);
        }
 
        System.out.println("List has size: " + list.size());
 
        for (int i = 0; i < list.size(); i++) {
            System.out.println(list.get(i).trim());
        }
    }
}

Impressively seamless interoperability!  We actually could have gotten really fancy and thrown in some operator overloading.  Obviously, Java wouldn’t have been able to use the operators themselves, but it still would have been able to call them just like normal Java instance methods.  Using Scala in this way, we can get all the advantages of its concise syntax and slick design without really abandoning our Java code base.

The problem comes in when we try to satisfy more complex cases.  Groovy proponents often trot out the example of a Java class inherited by a Groovy class which is in turn inherited by another Java class.  In Scala, that would be doing something like this:

public class Shape {
    public abstract void draw(Canvas c);
}
class Rectangle(val width: Int, val height: Int) extends Shape {
  override def draw(c: Canvas) {
    // ...
  }
}
public class Square extends Rectangle {
    public Square(int size) {
        super(size, size);
    }
}

Unfortunately, this isn’t exactly possible in Scala.  Well, I take that back.  We can cheat a bit and first compile Shape using javac, then compile Rectangle using scalac and finally Square using javac, but that would be quite nasty indeed.  What’s worse is such a technique would completely fall over if the Canvas class were to have a dependency on Rectangle, something which isn’t too hard to imagine.  In short, Scala is bound by the limitations of a separate compiler, as are most languages on the JVM.

Groovy solves this problem by building their own Java compiler into groovyc, thus allowing the compilation of both Java and Groovy sources within the same process.  This solves the problem of circular references because neither set of sources is completely compiled before the other.  It’s a nice solution, and one which Scala will be adopting in an upcoming release of its compiler.  However, it doesn’t really solve everything.

Consider a more complex scenario.  Imagine we have Java class Shape, which is extended by Scala class Rectangle and Groovy class Circle.  Imagine also that class Canvas has a dependency on both Rectangle and Circle, perhaps for some special graphics optimizations.  Suddenly we have a three-way circular dependency and no way of resolving it without a compiler which can handle all three languages: Java, Groovy and Scala.  This is starting to become a bit more interesting.

Of course, we can solve this problem in the same way we solved the Groovy-Java dependence problem: just add support to the compiler!  Unfortunately, it may have been trivial to implement a Java compiler as part of groovyc, but Scala is a much more difficult language from a compiler’s point of view.  But even supposing that we do create an integrated Scala compiler, we still haven’t solved the problem.  It’s not difficult to imagine throwing another language into the mix; Clojure, for example.  Do we keep going, tacking languages onto our once-Groovy compiler until we support everything usable on the JVM?  It should be obvious why this is a bad plan.

A more viable solution would be to create a common compiler framework, one which would be used as the basis for all JVM languages.  This framework would have common abstractions for things like name resolution and type checking.  Instead of creating an entire compiler from scratch, every language would simply extend this core framework and implement their own language as some sort of module.  In this way, it would be easy to build up a custom set of modules which solve the needs of your project.  Since the compilers are modular and based on the same core framework, they would be able to handle simultaneous compilation of all JVM languages involved, effectively solving the circular dependency problem in a generalized fashion.

The framework could even make things easier on would-be compiler implementors by handling common operations like bytecode emission.  Fundamentally, all of these tightly-integrated languages are just different front-ends to a common backend: the JVM.  I haven’t looked at the sources, but I would imagine that there is a lot of work which had to be done in each compiler to solve problems which were already handled in another.

Of course, all this is purely speculative.  Everyone builds their compiler in a slightly different way (slightly => radically in the case of languages like Scala) and I wouldn’t imagine that it would be easy to build this sort of common compiler backend.  However, the technology is in place.  We already have nice module systems like OSGi, and we’re certainly no strangers to the work involved in building up a proper CLASSPATH for a given project.  Why should this be any different?

It’s not without precedent either.  GCC defines a common backend for a number of compilers, such as G++, GCJ and even an Objective-C compiler.  Granted, it’s neither as high-level nor as modular as we would need to solve circular dependencies, but it’s something to go on.

It will be interesting to see where the JVM language sphere is headed next.  The rapid emergence of so many new languages is leading to problems which will have to be addressed before the polyglot methodology will be truly accepted by the industry.  Some of the smartest people in the development community are working toward solutions; and whether they take my idea of a modular framework or not, somewhere along the line the problem of simultaneous compilation must be solved.

Comments

  1. Reading half way through your post, I first thought of maven. You could define all the compilers you need with a particular artifact you want to compiler. Maven could then figure out which compiler to use for which source file. I guess it would only work with a linear dependency graph. As soon as you have circular dependencies, you probably need simultaneous compilation of all languages involved.

    David Linsin Monday, June 23, 2008 at 12:42 am
  2. I was thinking of Maven as well. Buildr actually has some really nice support for what you’re talking about. It figures out what compilers it needs based on your directory structure and what sources are available. After that point, it invokes these compilers in series as necessary to resolve dependencies. You are correct though that an acyclic (I assume that’s what you meant by “linear”) dependency graph is required.

    If a framework like what I described in the article is created, then I would certainly expect tools like Maven and Buildr to have really solid integration. It only makes sense. All of the technology is available to make software development a completely plugable, easy-to-setup process from the simplest to the most complex projects, no matter what language(s) are involved. It’s just a matter of getting everything to work together.

    Daniel Spiewak Monday, June 23, 2008 at 12:46 am
  3. Off the top of my head, I suppose one possibility is to introduce the concept of a linker to Java so you don’t need to resolve external references at compile time, but at link time.

    Feels unpleasant though, and likely to be non-trivial.

    Adrian Monday, June 23, 2008 at 1:29 am
  4. Yeah, that would be nasty. It would also require every JVM language to implement some sort of C header-like functionality (so that unlinked definitions can be stubbed). Ugliness aside, I’m not sure how well that sort of feature would work in some of these languages (like Clojure, which uses Lisp-like “definitions” which can’t be nicely stubbed).

    Daniel Spiewak Monday, June 23, 2008 at 1:54 am
  5. The current trunk version of the Scala compiler already supports mixed compilation of Java and Scala sources. At least, sortof.

    The approach it uses is moderately clever: It knows just enough about Java to determine what the classes generated by a source file look like. What their names are, what they extend, implement, etc. and what the signatures of their public methods and fields look like. This means that when compiling Scala files it can look at Java files to find their dependencies, but it doesn’t have to actually be able to compile the Java files. This way you can compile all your scala files first then compile all your Java files.

    This in turn is suggestive of a solution to the problem: We don’t need to have a full generic compiler framework. All we need is a common interface for performing this sort of operation: Pass a bunch of source files to a compiler and say “Do your best to figure out the signatures emitted by this”.

    In general it might prove to be the case that not all language features are possible to deal with this way. For example, because Scala has type inference for the return type of a method you might not be able to know its signature without knowing that of its dependencies. But it seems ok to have a larger set of compile errors for cross language work.

    David R. MacIver Monday, June 23, 2008 at 2:47 am
  6. So how does it work presently in the .Net universe?

    I’m pretty sure there is no problem mixing vb.net and C#, but of course they were both developed knowing they would be mixed/

    Languages like IronPython and the like would have the same problem then?

    Jeff Heon Monday, June 23, 2008 at 7:00 am
  7. Maybe this would work within the JVM-world or the CLR-world, but even there part of the problem is that mapping arbitrary programming languages onto them is hard – conforming to these runtimes means making sacrifices which many people aren’t prepared to (such as me, being one of those weirdo’s writing compilers targeting x86 directly – it’s not that hard, but I do plan on abstracting it out).

    But even on the JVM, part of the problem will be that language architects deep down tend to be anarchists – just look at the sheer number of parser generators. Sometimes it seems there are more parser generators than languages, since most serious language designers tends to write his/her own sooner or later, and often more than one, because so many of us hate the existing tools with a vengeance (I certainly do, including my own, so I keep striving for something nicer).

    GCC sort of gets there. LLVM might also sort of get there. But they are complex beasts, and making it possible to support multiple languages with them pages it harder to keep them lean.

    But frankly you don’t NEED a common compiler framework. All you need are a few narrow standardized interfaces to encapsulate calling a compiler, manipulating the resulting bytecode, and to query the classes. Java already have a lot of reflection support, so you’re sort-of halfway there.

    Libraries for some things like emitting bytecode could certainly be reused, though, but even there you’re likely to run into vastly different philosophies on what the right way to do it is between compiler implementors.

    Vidar

    Vidar Hokstad Monday, June 23, 2008 at 8:42 am
  8. > So how does it work presently in the .Net universe?
    Same problem (in the normal case). Each compile generates an assembly, and you can’t have circular dependencies between assemblies.

    You can create multi-language assemblies by building “modules” and then merging them, but IIRC you have the same inability to have circular references between different modules.

    But then how often do you really have such circular dependencies that cannot be broken by having Canvas (for example) dependent on RectangleBase and CircleBase rather than their concrete implementations.

    Richard Monday, June 23, 2008 at 9:37 am
  9. Hi Daniel, good write up! but I’m not sure what you mean about Groovy + generics falling short, AFAIK generics support is there already since 1.5, Besides Groovy 1.5+ also supports annotations and enums (haven’t checked yet if Scala supports them directly or has an alternative).

    What you describe here is called the joint-compiler, funny you talk about Java+Groovy+Scala compiler, Danno Ferrin (a.k.a. @shemnon) pitched the same idea ta JavaOne =) I believe there is already a compiler framework JSR (not exactly the fastest way to have it done), but it makes total sense (at least to me) to have a common ground to compile JVM languages together in a “single step”. Why not propose this topic to the upcoming DaVinci VM conference?

    Andres Almiray Monday, June 23, 2008 at 9:43 am
  10. I like the idea of a common compiler framework. It could be useful for more than just compilation – but for tool building in general. For instance, I’d like to be able to get a programmatic representation of the class hierarchy, or the dependency graph between files in either scala or java. Clearly the compiler is the arbiter of such information and it would be nice to have an API to gain access to it for tool construction.

    Does eclipse define such APIs for JDT?

    Robin Barooah Monday, June 23, 2008 at 9:56 am
  11. All these VM’s and IL’s make me feel like all we did was re-create the wheel of something already done just on a somewhat lower level.

    jon Monday, June 23, 2008 at 12:53 pm
  12. I think some group of people already MAKE a program that MAKEs complicated build processes involving multiple build programs easier to create and manage. You might find such a thing MAKEs the dependencies on different compilers not as much of an issue.

    Trevion Monday, June 23, 2008 at 4:44 pm
  13. Whew! Starting from the top and working my way down…

    @David

    Interesting notes about scalac in trunk/ I’ll have to build myself a copy and give it a try. What’s really interesting about this solution is it basically implements Adrian’s idea of a linker of sorts. It also solves probably the biggest problem with joint compilation, which is having to reimplement an already well-vetted compiler.

    The problem is that this still isn’t a very general solution: it only works for Scala and Java. Groovy and the rest are still left out in the cold. Granted, it would be possible to build such a “stub generator” for Groovy as well as Java, but then what about users who want to go the other way? More and more these days, people want to “mix and match” their languages. I’m almost completely convinced that the only way to accommodate this need is with a generic framework. This framework *could* be based on the generation of stubs and compiler delegation, but why not go all-in and make every compilation process potentially-joint?

    @Vidar

    Yeah, language designers are quite the rebellious lot. I suppose I had never thought about it before, but it makes sense. After all, why would you create your own language if you didn’t already have a bit of a non-conformist streak?

    The thing is, I’m really not sure if there’s much of a choice in this matter. Whether we go with a framework like I have outlined, or some other (preferably better) solution, it doesn’t matter. We really need to have some sort of interplay between the different compilers, otherwise would-be adopters are going to be permanently tearing their collective hair out over the pain involved. You’re quite correct that it is a difficult problem to solve. If all of the JVM languages shared a common paradigm then perhaps it would be simpler, but we have to be a bit more flexible than that.

    The good news is that some of the smartest people in the entire development community are involved in language design these days. I have every confidence in their ability to come up with something extraordinarily clever to solve this issue. It’s probably just a matter of them recognizing that the need exists.

    @Richard

    You’re right that *object-oriented* languages can avoid circular dependencies by abstracting out a common superclass and thus breaking the dependent chain. However, this is a *lot* more work and it often feels unnaturally contrived. Furthermore, not every JVM language sports object-oriented constructs. Clojure (for example) is pure-functional. A circular dependency could be solved as long as a maximum of one of the languages involved is non-OO. After that point, such a solution breaks down.

    @Andres

    I actually haven’t tried Groovy’s generics at all, I was simply parroting what I’ve heard others say. :-) Actually, I think it was Danno himself that mentioned having troubles with Groovy and generics, but it would have been some time ago. Hmm…

    I looked around for any compiler-related JSRs, but only found the Java 6 compiler API (JSR 299 or something like that). That’s just a mechanism for interacting with the compiler programmatically, it’s not really a *compiler* framework, designed for use in the construction of compilers. To which JSR were you referring?

    The Da Vinci Machine conference would be fun, but I’m afraid there’s just no way I could attend. However, if someone else thinks the idea is worthy of theft and re-presentation, I have no objections! :-) I think everyone would benefit if *some* solution to this problem were to be proposed.

    Daniel Spiewak Tuesday, June 24, 2008 at 1:04 am
  14. It was probably that JSR, I wasn’t so sure about it actual contents but knew it was compiler related. I guess now is the time for the JCP, openJDK and/or MVM groups to look into a compiler framework with better eyes. I’m in the Bay Area so a quick trip to Santa Clara is no big deal, I’m no compiler expert but I can surely give it a try. Or we can also send a message to Charles Nutter, he seems pretty excited about collaboration between JVM languages after all =)

    Andres Almiray Tuesday, June 24, 2008 at 4:16 pm
  15. Go for it! I’d be glad to help out where I can with the idea, but to be honest, you probably understand the problem domain much better than I do. :-)

    Daniel Spiewak Thursday, June 26, 2008 at 6:46 am

Post a Comment

Comments are automatically formatted. Markup are either stripped or will cause large blocks of text to be eaten, depending on the phase of the moon. Code snippets should be wrapped in <pre>...</pre> tags. Indentation within pre tags will be preserved, and most instances of "<" and ">" will work without a problem.

Please note that first-time commenters are moderated, so don't panic if your comment doesn't appear immediately.

*
*