Skip to content

What is Hindley-Milner? (and why is it cool?)

29
Dec
2008

Anyone who has taken even a cursory glance at the vast menagerie of programming languages should have at least heard the phrase “Hindley-Milner”.  F#, one of the most promising languages ever to emerge from the forbidding depths of Microsoft Research, makes use of this mysterious algorithm, as do Haskell, OCaml and ML before it.  There is even some research being undertaken to find a way to apply the power of HM to optimize dynamic languages like Ruby, JavaScript and Clojure.

However, despite widespread application of the idea, I have yet to see a decent layman’s-explanation for what the heck everyone is talking about.  How does the magic actually work?  Can you always trust the algorithm to infer the right types?  Further, why is Hindley-Milner is better than (say) Java?  So, while those of you who actually know what HM is are busy recovering from your recent aneurysm, the rest of us are going to try to figure this out.

Ground Zero

Functionally speaking, Hindley-Milner (or “Damas-Milner”) is an algorithm for inferring value types based on use.  It literally formalizes the intuition that a type can be deduced by the functionality it supports.  Consider the following bit of psuedo-Scala (not a flying toy):

def foo(s: String) = s.length
 
// note: no explicit types
def bar(x, y) = foo(x) + y

Just looking at the definition of bar, we can easily see that its type must be (String, Int)=>Int.  As humans, this is an easy thing for us to intuit.  We simply look at the body of the function and see the two uses of the x and y parameters.  x is being passed to foo, which expects a String.  Therefore, x must be of type String for this code to compile.  Furthermore, foo will return a value of type Int.  The + method on class Int expects an Int parameter; thus, y must be of type Int.  Finally, we know that + returns a new value of type Int, so there we have the return type of bar.

This process is almost exactly what Hindley-Milner does: it looks through the body of a function and computes a constraint set based on how each value is used.  This is what we were doing when we observed that foo expects a parameter of type String.  Once it has the constraint set, the algorithm completes the type reconstruction by unifying the constraints.  If the expression is well-typed, the constraints will yield an unambiguous type at the end of the line.  If the expression is not well-typed, then one (or more) constraints will be contradictory or merely unsatisfiable given the available types.

Informal Algorithm

The easiest way to see how this process works is to walk it through ourselves.  The first phase is to derive the constraint set.  We start by assigning each value (x and y) a fresh type (meaning “non-existent”).  If we were to annotate bar with these type variables, it would look something like this:

def bar(x: X, y: Y) = foo(x) + y

The type names are not significant, the important restriction is that they do not collide with any “real” type.  Their purpose is to allow the algorithm to unambiguously reference the yet-unknown type of each value.  Without this, the constraint set cannot be constructed.

Next, we drill down into the body of the function, looking specifically for operations which impose some sort of type constraint.  This is a depth-first traversal of the AST, which means that we look at operations with higher-precedence first.  Technically, it doesn’t matter what order we look; I just find it easier to think about the process in this way.  The first operation we come across is the dispatch to the foo method.  We know that foo is of type String=>Int, and this allows us to add the following constraint to our set:

X  \mapsto  String

The next operation we see is +, involving the y value.  Scala treats all operators as method dispatch, so this expression actually means “foo(x).+(y).  We already know that foo(x) is an expression of type Int (from the type of foo), so we know that + is defined as a method on class Int with type Int=>Int (I’m actually being a bit hand-wavy here with regards to what we do and do not know, but that’s an unfortunate consequence of Scala’s object-oriented nature).  This allows us to add another constraint to our set, resulting in the following:

X  \mapsto  String

Y  \mapsto  Int

The final phase of the type reconstruction is to unify all of these constraints to come up with real types to substitute for the X and Y type variables.  Unification is literally the process of looking at each of the constraints and trying to find a single type which satisfies them all.  Imagine I gave you the following facts:

  • Daniel is tall
  • Chris is tall
  • Daniel is red
  • Chris is blue

Now, consider the following constraints:

Person1 is tall

Person1 is red

Hmm, who do you suppose Person1 might be?  This process of combining a constraint set with some given facts can be mathematically formalized in the guise of unification.  In the case of type reconstruction, just substitute “types” for “facts” and you’re golden.

In our case, the unification of our set of type constraints is fairly trivial.  We have exactly one constraint per value (x and y), and both of these constraints map to concrete types.  All we have to do is substitute “String” for “X” and “Int” for “Y” and we’re done.

To really see the power of unification, we need to look at a slightly more complex example.  Consider the following function:

def baz(a, b) = a(b) :: b

This snippet defines a function, baz, which takes a function and some other parameter, invoking this function passing the second parameter and then “cons-ing” the result onto the second parameter itself.  We can easily derive a constraint set for this function.  As before, we start by coming up with type variables for each value.  Note that in this case, we not only annotate the parameters but also the return type.  I sort of skipped over this part in the earlier example since it only sufficed to make things more verbose.  Technically, this type is always inferred in this way.

def baz(a: X, b: Y): Z = a(b) :: b

The first constraint we should derive is that a must be a function which takes a value of type Y and returns some fresh type Y’ (pronounced like “why prime“).  Further, we know that :: is a function on class List[A] which takes a new element A and produces a new List[A].  Thus, we know that Y and Z must both be List[Y'].  Formalized in a constraint set, the result is as follows:


X  \mapsto  (Y=>Y’ )

Y  \mapsto  List[Y’ ]

Z  \mapsto  List[Y’ ]

Now the unification is not so trivial.  Critically, the X variable depends upon Y, which means that our unification will require at least one step:


X  \mapsto  ( List[Y’ ]=>Y’ )

Y  \mapsto  List[Y’ ]

Z  \mapsto  List[Y’ ]

This is the same constraint set as before, except that we have substituted the known mapping for Y into the mapping for X.  This substitution allows us to eliminate X, Y and Z from our inferred types, resulting in the following typing for the baz function:

def baz(a: List[Y']=>Y', b: List[Y']): List[Y'] = a(b) :: b

Of course, this still isn’t valid.  Even assuming that Y' were valid Scala syntax, the type checker would complain that no such type can be found.  This situation actually arises surprisingly often when working with Hindley-Milner type reconstruction.  Somehow, at the end of all the constraint inference and unification, we have a type variable “left over” for which there are no known constraints.

The solution is to treat this unconstrained variable as a type parameter.  After all, if the parameter has no constraints, then we can just as easily substitute any type, including a generic.  Thus, the final revision of the baz function adds an unconstrained type parameter “A” and substitutes it for all instances of Y’ in the inferred types:

def baz[A](a: List[A]=>A, b: List[A]): List[A] = a(b) :: b

Conclusion

…and that’s all there is to it!  Hindley-Milner is really no more complicated than all of that.  One can easily imagine how such an algorithm could be used to perform far more complicated reconstructions than the trivial examples that we have shown.

Hopefully this article has given you a little more insight into how Hindley-Milner type reconstruction works under the surface.  This variety of type inference can be of immense benefit, reducing the amount of syntax required for type safety down to the barest minimum.  Our “bar” example actually started with (coincidentally) Ruby syntax and showed that it still had all the information we needed to verify type-safety.  Just a bit of information you might want to keep around for the next time someone suggests that all statically typed languages are overly-verbose.

The Joy of Concatenative Languages Part 2: Innately Functional

15
Dec
2008

In part one of this series, I introduced the concept of a stack-based language and in particular the syntax and rough ideas behind Cat.  However, to anyone coming into concatenative land for the first time, my examples likely seemed both odd and unconvincing.  After all, why would you ever use point-free programming when everyone else seems to be sold on the idea of name binding?  More importantly, where do these languages fit in with our established menagerie of language paradigms?

The answer to the first question really depends on the situation.  I personally think that the best motivation for concatenative languages is their syntax.  If you want to create an internal DSL, there will be no language better suited to it than one which is concatenative, Cat, Factor or otherwise.  This is because stack-oriented languages can get away with almost no syntax whatsoever.  They say that Lisp is a syntax-free language, but this holds even more strongly for languages like Cat.  Well, that and you don’t have to deal with all the parentheses…

The second question is (I think) the more interesting one: how do we classify these languages and what sort of methodologies should we apply?  At first glance, Cat (and other languages like it) seem to be quite imperative in nature.  After all, you have a single mutable stack that any function can modify.  However, if you turn your head sideways and blink twice, you begin to realize that concatenative languages are really much closer to the functional side of the oyster.

Consider the following Cat program:

define plus { + }
define minus { - }
 
7 2 3
plus minus

Trivial, but to the point.  This program first adds the numbers 2 and 3, then subtracts the result from 7.  Thus, the final result is a value of 2 on the stack.  The only twist is that we have defined functions plus and minus to do the dirty work for us.  This wasn’t strictly necessary, but I wanted to emphasize that + and - really are functions.  We could express the exact same program in Scala:

def plus(a: Int, b: Int) = a + b
def minus(a: Int, b: Int) = a - b
 
minus(7, plus(2, 3))

Do you see how the consecutive invocations of plus and minus in Cat became composed invocations in Scala?  This is where the term “concatenative language” derives from: the whole program is just a series of function compositions.  Wikipedia’s article on Cat has a very nice, mathematical description:

Two adjacent terms in Cat imply the composition of functions that generate stacks, so the Cat program f g is equivalent to the mathematical expressions and , where x is the stack input to the expression.

Strictly speaking, a concatenative language could be implemented without a stack, but such an implementation would likely be a bit harder to use than the average stack-based language.

Coming back to my original premise: concatenative languages are functional in nature.  Absolutely everything in Cat is a function.  Operators, words, even numeric literals like “3” are actually functions at the conceptual level.  Additionally, Cat, Joy and Factor all offer a mechanism for treating functions as first-class values:

2 3
[ + ]
apply

The square-bracket ([]) syntax is representative of a quotation.  Literally this mean “create a function of the enclosed words and place it as a value on the stack”.  We can pop this function off the stack and invoke it by using the apply word.  Incidentally, you may have noticed that this syntax is remarkably close to that which is used in if conditionals:

5 0 <
[ "strange math" ]
[ "all is well" ]
if

This syntax works because if isn’t conceptually a language primitive: it’s just another function which happens to take a boolean and two quotations off the stack.  For the sake of efficiency, Cat does indeed implement if as a primitive, but this was a deliberate optimization rather than an implementation forced by the language design.  Untyped Cat (see Part 3) is equivalent in power to the pure-untyped lambda calculus, and as our friend Alonzo Church showed us, if-style conditionals are easily accomplished:

TRUE = λa . λb . a
FALSE = λa . λb . b

IF = λp . λt . λe . p t e

Yeah, maybe we’re drifting a bit off-point here…

Higher-Order Programming

So if Cat is just another functional programming language, then we should be able to implement all of those higher-order design patterns that we’ve come to know and love in languages like Scala and ML.  To see how, let’s look at implementing some simple list manipulation functions in Cat.  The easiest would be to start with append, which pops two lists off of the stack and pushes a new list which is the end-to-end concatenation of the two originals:

define append {
  empty
  [ pop ]
  [
    uncons
    [append] dip
    cons
  ]
  if
}

This function first starts by checking to see if the top list is empty.  If so, then just pop it off the stack and leave the other right where it is.  Appending an empty list should always yield the original list.  However, if the head list is not empty, then we need to work a bit.  First, we decompose it into its tail and head, which are pushed onto the stack in order by the uncons function.  Next, we need to recursively append the tail with our second list on the stack.  However, the head of the list from uncons is in the way on top of the stack.  We could use stack manipulation to move things around and get our lists up to the head of the stack, but dip provides us with a handy, higher-order shortcut.  We temporarily remove the top of the stack, invoke the quotation “[append]” against the remainder and then push the old top back on top of the result.

The dip operation is surprisingly powerful, making it possible to completely live without either variables or multiple stacks.  Any non-trivial Cat program will need to make use of this handy function at some level.

Once we have the old head and the new appended-list on the stack, all we need to do is put them back together using cons.  This function leaves a new list on the stack in place of the old list and head element.  This Cat program is almost precisely analogous to the following ML:

fun append ls nil = ls
  | append ls (hd :: tail) = hd :: (append ls tail)

Personally, I find the ML a lot easier to read, but that’s just me.  Obviously it’s a lot shorter, but as it turns out, our Cat implementation, while intuitive, was sub-optimal.  Cat already implements append in the guise of the cat function, and it is far more concise than what I showed:

define cat {
  swap [cons] rfold
}

It’s almost frightening how short this is: only three words.  It’s not as if rfold is doing anything mysterious either; it’s just a simple right-fold function that takes a list, an initial value and a quotation, producing a result by traversing the entire list.  We can use something similar back in ML-land, achieving an implementation which is arguably equivalent in subjective elegance:

val append = foldr (op::)

Moving on, we can also implement a length function in Cat, this time using fold to tighten things up:

define length {
  0 [ pop 1 + ] fold
}

You’ll notice that we have to mess around a bit in the quotation in order to avoid the first “parameter”, the current element of the list (which we do not need).  Expressing this in ML yields a very similar degree of cruft:

val length = foldl (fn (n, _) => n + 1) 0

Conclusion

The important take-away from this tangled morass of an article is the fact that Cat is a highly functional language, capable of easily keeping up with some of the stalwart champions of the paradigm.  More significantly, this is a trait which is shared by all concatenative languages.  Rather than throwing away all of the old wisdom learned in language design, stack-based languages build on it by providing an alternative view into the world of functions.

In the next (and final) article of the series, we will take a brief look at the challenges of applying a type system to a concatenative language and the fascinating techniques used by Cat to achieve just that.

Is Scala Not “Functional Enough”?

20
Oct
2008

In one of Rich Hickey’s excellent presentations introducing Clojure, he mentions in passing that Scala “isn’t really a functional language”.  He says that Java and Scala are both cut from the same mold, and because Scala doesn’t force immutability it really shouldn’t qualify.  These viewpoint is something I’ve been hearing a lot of from various sources, people talking about how F# is really the only mainstream functional language, or how once Erlang takes off it will leave Scala in the dust.

When I first heard this sentiment voiced by Rich, I brushed it off as a little odd and only slightly self-serving (after all, if you don’t use Scala, there’s a better chance you will use Clojure).  Rich has his own opinions about a lot of things, but I have found with most that I can still understand his rationale, even if I don’t agree.  So, realizing that many of his other kooky ideas seemed to have some basis in reality, I decided to come back to his opinion on Scala and give it some deeper consideration.

The core of the argument made by Rich (and others) against Scala as a functional language goes something like this:

  • Mutable variables as first-class citizens of the language
  • Uncontrolled side-effects (ties in with the first point)
  • Mutable collections and other imperative libraries exist on equal footing
  • Object-oriented structures (class inheritance, overloading, etc)
  • Verbosity

Comparative Type Inference

If you’re coming from Java-land, the final point may have caught you a bit by surprise.  After all, Scala is vastly more concise than Java, so how could anyone possibly claim that it is “too verbose”?  Well, to answer that question, you have to compare Scala with the other side of the language jungle: the functional languages.  Here’s an explicitly-recursive function which sums a list of integers:

def sum(ls: List[Int]): Int = ls match {
  case hd :: tail => hd + sum(tail)
  case Nil => 0
}

That’s not too bad.  The use of pattern matching eliminates an entire class of runtime errors (selecting a non-existent element) and makes the code a lot cleaner than the equivalent Java.  However, compare this with the same function ported directly to SML (a functional language:

fun sum nil = 0
  | sum (hd :: tail) = hd + sum tail

One thing you’ll notice here is the complete lack of any type annotations.  Like most static functional languages, ML (and derivatives) has a form of type inference called “Hindley - Milner” (sometimes called “global type inference”).  Rather than just looking at a single expression to infer a type (like Scala), Hindley - Milner looks at the entire function and derives the most general (least restrictive) type which satisfies all expressions.  This means that everything can be statically type-checked with almost no need to declare types explicitly.

“Now, wait!” (you say), “You would never write a function just to sum a list; you should be using a fold.”  That’s true.  So let’s see how well these two languages do when the problem is solved in a more realistic fashion.  Once again, Scala first:

def sum(ls: List[Int]) = ls.foldLeft(0) { _ + _ }

Let’s see ML top that!

fun sum ls = foldl (op+) 0 ls

Then again, maybe we’ll just quit while we’re behind…

The fact is that Scala requires significantly more ceremony to accomplish some things which are trivial in pure-functional languages like ML and Haskell.  So while Scala may be a huge step up from Java and C++, it’s still a far cry from being the quickest and most readable way of expressing things in a functional style.

One obvious solution to this would be to just add Hindley - Milner type inference to Scala.  Well, this may be the “obvious” solution, but it doesn’t work.  Scala has an extremely powerful and complex type system, one with a number of properties which Hindley - Milner just can’t handle.  A full object-oriented inheritance hierarchy causes some serious problems with the “most general” inference of Hindley - Milner: just about everything becomes type Any (or close to it).  Also, method overloading can lead to ambiguities in the inferred types.  This is actually a problem even in the venerable Haskell, which imposes hard limitations on what functions can be in scope at any given point in time (so as to avoid two functions with the same name).

Simply put, Scala’s design forbids any type inference (that I know of) more sophisticated than local expression-level.  Don’t get me wrong, it’s still better than nothing, but a language with local type inference alone will never be as generally concise as a language with Hindley - Milner.

Side Effects

One big ticket item in the litany of complaints against Scala is the admission of uncontrolled side effects.  It’s not hard to find an example which demonstrates this property:

val name = readLine
println("Hello, " + name)

This example alone shows how fundamental side-effects are within the Scala language.  All we have done here is made two function calls, one of them passing a String and receiving nothing as a result.  From a mathematical standpoint, this code snippet is virtually a no-op.  However, we all know that the println function has an additional side effect which involves sending text to standard out.  Coming from Java, this makes perfect sense and it’s probably hard to see why this would be considered a problem.  However, coming from Haskell, what we just wrote was a complete abomination.

You see, Haskell says that no function should ever have side effects unless they are explicitly declared using a special type constructor.  In fact, this is one of the areas where monads have had a huge impact on Haskell’s design.  Consider the following Haskell equivalent:

main :: IO ()
main = do
         name <- getLine
         putStrLn ("Hello, " ++ name)

Even if you don’t know Haskell, the above should be pretty readable.  The first line is the type declaration for the main “function” (it’s actually a value, but why quibble).  Haskell does have Hindley - Milner type inference, but I wanted to be extra-explicit.  You’ll notice that main is not of type void or Unit or anything similar, it is actually of type IO parameterized with Haskell’s form of Unit: ().  This is an extremely important point: IO is a monad which represents an action with side-effects returning a value which matches its type parameter (in this case, ()).  The little dance we perform using do-notation is just a bit of syntax sugar allowing us to compose two other IO values together in a specific order.  The getLine “function” is of type IO String, meaning that it somehow reads a String value by using side effects (in this case, reading from standard in).  Similarly, putStrLn is a function of type String -> IO ().  This means that it takes a String as a parameter and uses it to perform some side effects, from which it obtains no result value.  The do-notation takes these two monadic values and composes them together, forming one big value of type IO ().

Now, this may seem horribly over-complicated, especially when compared to the nice clean side effects that we have in Scala, but it’s actually quite mathematically elegant.  You see, the IO monad is how we represent actions with side effects.  In fact, the only (safe) way to have side effects in Haskell is to wrap them up inside monad instantiations like these.  Haskell’s type system allows you to actually identify and control side effects so that they remain contained within discrete sections of your code base.

This may not sound so compelling, but remember that functional programming is all about eliminating side effects.  You compute your result, you don’t just accidentally find yourself with a magic value at the end of a long run.  The ability to work with side effects as packaged values just like any other constant is extremely powerful.  More importantly, it is far closer to the “true” definition of functional programming than what we have in Scala.

Conclusion

I hate to say it, but Rich Hickey and the others are quite right: Scala isn’t a terribly functional language.  Variables, mutable data structures, side effects and constant type declarations all seem to conspire to remove that crown from Scala’s proverbial brow.  But let’s not forget one thing: Scala wasn’t designed to be a functional language.

That may sound like heresy, but it’s true.  Scala was created primarily as an experiment in language design, specifically focusing on type systems.  This is the one area where I think Scala excels far beyond the rest of the field.  Scala makes it possible to model many problems in an abstract way and then leverage the type system to prove correctness at compile time.  This is approach is both revolutionary and an extremely natural way to solve problems.  The experience of using the type system in this fashion is a little difficult to describe (I’m still on the lookout for good examples), but trust me, you’ll like it when you see it.

Scala’s not really a functional language, and as Cedric Beaust has pointed out, it’s not really the best object-oriented language either; so what is it good for?  Scala sits in a strange middle ground between the two worlds of functional and object-oriented programming.  While this does have some disadvantages like being forced to take second place in terms of type inference, it also lets you do some really interesting stuff like build a mutable ListBuffer with constant time conversion to an immutable List, or sometimes recognize the fact that fold is not the universal solution.  It’s an experiment to be sure, but one which I think has yielded some very powerful, very useful results…just not many of a purely functional nature.