Skip to content
Print

Scala Collections for the Easily Bored Part 2: One at a Time

28
Jul
2008

As I hinted previously, this series is intended to delve into Scala’s extensive collections API and the many ways in which it can make your life easier.  Probably the most important operations you could ever perform on collections are those which examine each element, one at a time.  After all, what’s a more common array idiom than looping over all values?

In that vein, this article starts by looking at foreach, the imperative programmer’s bread-and-butter when it comes to types like Array and List.  But rather than just stopping there, we also will look at more powerful, higher-order operations like fold, map and the ever-mysterious: flatMap.

Iterating

As I said above, looping over every item in a collection is probably the most heavily used operation in a programmer’s repertoire.  In fact, this pattern is so common in imperative languages that Java 5 saw the introduction of a special construct at the language level, just to make things a little easier.  For example:

String[] names = { "Daniel", "Chris", "Joseph" };
for (String name : names) {
    System.out.println(name);
}

This code should be old hat to anyone coming from a Java background.  Under the surface, this code is compiled into a while-loop with an iterator and an increment operation.  The code steps through the array, assigning each successive element to name.  Most statically typed languages have a construct similar to this.  For example, C# offers the foreach statement, which is almost precisely similar to Java’s enhanced for-loop, but with a slightly different syntax.  However, some languages (like Ruby) idiomatically eschew loops and rely instead on higher-order methods.  Translating the above into Ruby yields the following:

names = ['Daniel', 'Chris', 'Joseph']
names.each do |name|
  puts name
end

In this case, we aren’t using a loop of any kind, but rather creating a block (Ruby’s name for a closure) which takes a single parameter and passes it to the built-in puts method.  This block is then passed as an object to the each method of class Array, which calls the block once for each element in series.  Certainly, there is a language construct which encapsulates this, but using the each method directly is considered more “Ruby”.

The same approach is taken in Scala.  Rather than define a special construct for iteration, Scala simply provides the syntactic tools needed to construct higher-order methods like Ruby’s each.  Every collection in Scala’s library defines (or inherits) a foreach method (taking after its C# ancestry) which behaves precisely like Ruby’s each.  To show how, we will translate our example once more, this time into Scala:

val names = Array("Daniel", "Chris", "Joseph")
names.foreach { name =>
  println(name)
}

Here we define an anonymous method (Scala’s name for a closure) which takes a single parameter.  As in Ruby, this closure is passed to foreach and invoked for each array element.  In this way, foreach is a a so-called “higher-order” method, due to the fact that it accepts a parameter which is itself another method.  Scala’s implementation of this concept is actually a bit more general than Ruby’s, allowing us to shorten our example into the following:

val names = Array("Daniel", "Chris", "Joseph")
names.foreach(println)

This time, instead of creating an anonymous method to pass to foreach, we simply pass the println method itself.  The only criterion that foreach imposes on its parameter is that it is a method which accepts a single parameter of type String (the element type of the array).  Since we already have just such a method (println), there is no need to define another simply for encapsulation.

Unfortunately, despite its flexibility, foreach is not always the most syntactically concise way to iterate over a collection.  There are times that we just want to use a syntax which is similar to the for-loops available in other languages.  Well, fear not!  Scala’s for-comprehensions are more than capable of providing just such a syntax.  Consider the example of imperatively summing the values of a list:

val nums = List(1, 2, 3, 4, 5)
 
var sum = 0
for (n <- nums) {
  sum += n
}

At compile-time, the above is actually translated into an equivalent call to foreach, passing the body of the loop as the anonymous method.  This means that any class which correctly declares a foreach method may be used in a for-comprehension in this way, greatly reducing the syntactic overhead.

Folding

Looping is nice, but sometimes there are situations where it is necessary to somehow combine or examine every element in a collection, producing a single value as a result.  An example of this could be our previous example of summing a list.  Using foreach, we had to make use of a mutable variable (sum) and produce the result as a side effect.  This is fine for hybrid languages like Scala, but some languages actually lack mutable variables altogether.  In the previous post, I mentioned ML, which is a pure-functional language (almost).  Since pure-functional languages lack mutable state, the gods of computing needed to come up with some other way to accommodate this particular case.  The solution is called “folding”.

Folding a collection is literally the process of looking at each element in addition to a current accumulator and returning some value.  To make things more clear, let’s re-write our list summing example in a functional style:

val nums = List(1, 2, 3, 4, 5)
val sum = nums.foldLeft(0) { (total, n) =>
  total + n
}

It may seem a bit disguised, but this too is a loop, just like foreach.  For each element in the list (starting from the left), the foldLeft method will call the anonymous method we passed to it, providing both the current element, as well as the total accumulated from the previous call.  Since there was no previous call when the first element is processed, we must specify an initial value for total – in this case, 0.  Literally, the above can be flattened into the following:

val sum = 0 + 1 + 2 + 3 + 4 + 5

Of course, we would never want to hard-code a list in this fashion, but it serves as a sufficient illustration of the functionality.  I know from experience that when you first discover fold it’s difficult to see why anyone would want to use a construct so limited.  After all, doesn’t it just serve to obscure the true meaning of the code?  Well, take my word for it, fold is an almost indispensable tool…once you get to know it a little better.  Try keeping an eye out for times in your own code where a fold might be useful instead of a conventional loop.  You’ll be surprised how often it can be used to solve a problem, sometimes one not even intuitively related to accumulation.

There’s no special language-level syntax for fold, but Scala’s powerful operator overloading mechanism has allowed the designers of the collections API to create a special operator of rather dubious readability.  To illustrate, here’s our “summing a list” example once more:

val nums = List(1, 2, 3, 4, 5)
(0 /: nums) {_+_}

Yeah, I can’t read it either.  This example is semantically equivalent to the previous fold, but its meaning is a bit obfuscated by a) the bizarre right-associative operator, and b) a mysterious cameo by Scala’s ubiquitous underscore.  While I don’t have a problem using the underscore syntax in my own code, I don’t think that the fold operator improves anything other than number of characters.  I suppose it’s a matter of taste though.

Reduce

Fold has a closely related operation in Scala called “reduce” which can be extremely helpful in merging the elements of a sequence where leading or trailing values might be a problem.  Consider the ever-popular example of transforming a list of String(s) into a single, comma-delimited value:

val names = List("Daniel", "Chris", "Joseph")
val str = names.foldLeft("") { (acc, n) =>
  acc + ", " + n
}
 
println(str)

If you compile and run this code, you will actually arrive at a result which looks like the following:

, Daniel, Chris, Joseph

This is because folding a list requires a leading value, but this means that we have an extra separator running wild.  We could try a foldRight, but this would merely move the same problem to the tail of the string.  Interestingly enough, this problem of leading/trailing separators is hardly specific to folding.  I can’t tell you how many times I ran into this issue when constructing highly-imperative query synthesis algorithms for ActiveObjects.

The easiest way to solve this problem in Scala is to simply use a reduce, rather than a fold.  As a rule, any collection which defines foldLeft will also define reduceLeft (and the corresponding right methods).  Reduce distinguishes itself from fold in that it does not require an initial value to “prime the sequence” as it were.  Rather, it starts with the very first element in the sequence and moves on to the end.  Thus, the following code will produce the desired result for our names example:

val names = List("Daniel", "Chris", "Joseph")
val str = names.reduceLeft[String] { (acc, n) =>
  acc + ", " + n
}
 
println(str)

There are of course a few small problems with this approach.  Firstly, it is not as general as a fold.  Reduce is designed primarily for the iterate/accumulate pattern, whereas fold may be applied to many problems (such as searching a list).  Also, the reduce method will throw an exception if the target collection is empty.  Finally, Scala’s type inference isn’t quite clever enough to figure out what’s going on without the explicit [String] type parameter (since our result is of type String).  Still, even with all these shortcomings, reduce can be a very powerful tool in the right hands.

Mapping

We’ve seen how fold can be an extremely useful tool for applying a computation to each element in a collection and arriving at a single result, but what if we want to apply a method to every element in a collection in-place (as it were), creating a new collection of the same type with the modified elements?  Coming from an imperative background, this probably sounds a little abstract, but like fold, map can be extremely useful in many common scenarios.  Consider the example of transforming a list of String(s) into a list of Int(s):

val strs = List("1", "2", "3", "4", "5")
val nums = strs.map { s =>
  s.toInt
}
 
nums == List(1, 2, 3, 4, 5)   // true

The final expression in this snippet is just to illustrate what really happens to the list elements when map is called.  Literally, the map method walks through each element in the list, calls the provided method and then stores the result in the corresponding index of a new list.  (list is immutable, remember?)  If you think about it, this is very similar to looping with foreach except that at each iteration we produce a value which is saved for future use.

Another common application of this technique might be to cast an entire array from one type to another.  I often make use of XMLRPC, which has the unfortunate property of stripping all type information from its composite types.  Thus, I often find myself writing code like this:

public void rpcMethod(Object[] params) {
    String[] strParams = new String[params.length];
    for (int i = 0; i < params.length; i++) {
        strParams[i] = (String) params[i];
    }
}

It’s a lot of trouble to go through, but I really don’t know any better way.  We can’t just cast the array to String[], since the array itself is not of type String[], only its elements match that type.  Java doesn’t support higher-order operations such as map, but fortunately Scala does.  We can translate the above into a functional style and gain tremendously in both readability and conciseness:

def rpcMethod(params: Array[Object]) {
  val strParams = params.map { _.asInstanceOf[String] }
}

For the sake of brevity, you’ll notice that I made use of the underscore syntax as a placeholder for the anonymous method parameter.  This syntax works remarkably well for short operations like the above, where all we need to do is take the input value and cast it to a new type.

As it turns out, mapping over a collection is a phenomenally common operation, perhaps even more so than fold.  For that reason, the creators of Scala decided that it was worth adding a special syntax sugar built into the powerful for-comprehension mechanism.  With a little bit of tweaking, we can transform our casting example into an arguably more readable form:

def rpcMethod(params: Array[Object]) {
  val strParams = for (p <- params) yield p.asInstanceOf[String]
}

At compile-time, these two forms are literally equivalent.  In some ways it is a matter of taste as to which is better.  I personally tend to favor directly calling map for simple, non-combinatorial operations like this, but to each his own.

Binding

Actually, the name “bind” comes from Haskell.  Scala’s term for this operation is “flatMap” because the operation may be viewed as a combination of the map and flatten methods.  Of all of the techniques discussed so far, this one probably has the richest theoretical implications.  Coming straight from the menacing jungles of category theory and the perplexing wasteland of monads, flatMap is both intriguing and apparently useless (at first glance anyway).

Like map, flatMap walks through every element in a collection and applies a given function, saving the value for later use.  However, unlike map, flatMap expects the return type of the specified function to be the same as the enclosing collection with an optionally different type parameter.  If we look at this in terms of our number-converting example from previously, this means that our anonymous method must not return a value of type Int, but rather of type List[Int].  Once flatMap has all of the resultant List[Int] values, it flattens them into a single list containing all of the elements from each of the inner lists.

Ok, that was utterly un-helpful.  Maybe the method signature would be more illuminating:

class List[A] {   // don't try this at home
  def flatMap[B](f: (A)=>List[B]): List[B]
}

Other than forcing order of evaluation, I can’t personally think of too many common cases where this sort of operation is useful.  However, one contrived example does spring to mind.  Consider once more the example of converting a list of String(s) into a list of Int(s), but this time assume that some of the String elements might not nicely convert into integer values:

val strs = List("1", "two", "3", "four", "five")
val nums = strs.flatMap { s =>
  try {
    List(s.toInt)
  } catch {
    case _ => Nil
  }
}
 
nums == List(1, 3)    // true

Recall that in Scala, everything is an expression – including try/catch blocks – therefore, everything has a value.  This code literally walks through the entire list and tries to convert each element into an integer and wrap the result in a List.  If the conversion fails (for whatever reason), an empty list is returned (Nil).  Because we return an empty list for those elements which cannot be converted, flatMap literally resolves those results out of existence, leading to a List which only contains two Int(s).  For the monadically uninclined among us, this is precisely the reason why Nil is referred to as the “zero” of the List monad.  However, that’s a topic for an entirely different series

Conclusion

Ok, so this article was a bit longer than I really wanted to run, but there’s a lot of material to cover!  Scala’s collections framework shows how even operations steeped in mountains of theory can still prove useful in solving common problems.  Now, every time I use collections in Java (or even Ruby), I find myself reaching for many of these same methods, only to find them either unavailable or less powerful than I would like.  Scala provides a welcome refuge for all those of us who desire more powerful collection types in our daily programming.

Be with us next time for filter, forall, exists and more!

Comments

  1. Can you really not think of a good use case for flatMap? It’s useful whenever you want to follow links in things. In particular whenever you have code like

    for ( x for (y person.pets)

    Or

    people.flatMap(_.pets)

    if you like that sort of thing.

    Another example is using this recursively:

    def descendants(file : File) = {
    val children = file.listFiles;
    children ++ children.flatMap(descendants(_));
    }

    (actually this code doesn’t work, as I should check if children is null because the behaviour for non-directories is stupid, but I couldn’t be bothered)

    etc.

    David R. MacIver Monday, July 28, 2008 at 7:17 am
  2. Ugh html. The for loop code there seems to have got fucked up. Sorry about that.

    David R. MacIver Monday, July 28, 2008 at 7:41 am
  3. “Under the surface, this code is getting an Iterator for the array and then performing the appropriate loop operations, ”

    You may want to double check that. Arrays in an enhanced for loop are just normal array for iterations, no iterator created.

    Nils Monday, July 28, 2008 at 9:04 am
  4. @Nils

    Hmm, seems you’re right. I tested using the following code:

    public class Test {
    public static void main(String[] args) {
    for (String s : args) {
    s.length();
    }
    }
    }

    And the result was this:

    0: aload_0
    1: astore_1
    2: aload_1
    3: arraylength
    4: istore_2
    5: iconst_0
    6: istore_3
    7: iload_3
    8: iload_2
    9: if_icmpge 29
    12: aload_1
    13: iload_3
    14: aaload
    15: astore 4
    17: aload 4
    19: invokevirtual #2; //Method java/lang/String.length:()I
    22: pop
    23: iinc 3, 1
    26: goto 7
    29: return

    Looks like the enhanced for-loop on arrays actually compiles down to a while loop with an int iterator. This could technically be an optimization, but it does make sense that this would be the case, rather than the use of a full iterator object. Will correct the article.

    @David

    Yes, legitimately I can’t think of a common use-case. That’s not to say that they don’t exist (you seem to have a few handy), I just haven’t run into them yet or haven’t fully recognized them for what they are. flatMap is one of those functions that I can see its theoretical utility, but haven’t quite figured out how to apply it to my every-day development cycle. The comment in the article about not knowing of a use-case was intended to elicit comments such as yours. :-)

    Daniel Spiewak Monday, July 28, 2008 at 9:38 am
  5. Hi Daniel

    Thanks for providing another excellent resource for those of us trying to learn scala. People following your series may be interested in trying the exercises over on Tony’s blog.

    http://blog.tmorris.net/revised-scala-exercises/

    Jeff Tuesday, July 29, 2008 at 2:08 am
  6. Personally, I think that *anyone* interested in Scala on any sort of serious level should mosey on down to Tony’s blog. He tends to cover topics which have more deeply theoretical implications, which is always interesting. :-) Good stuff, that.

    Daniel Spiewak Tuesday, July 29, 2008 at 8:32 am
  7. One use for flatMap would be any time that the individual items in the input list may produce multiple items in the output list. For example, suppose you had a list of numbers and you wanted to make a big list of all their prime factors:

    /** Returns the List of prime factors of the integer n. */
    def primeFactors(n: int) = {
    ~~~~/** Finds prime factors of n, given that no factors are below min */
    ~~~~def primeFactorsGivenMin(n: int, min: int): List[int] = if (min > Math.sqrt(n)) {
    ~~~~~~~~// n must be prime as it has no factors smaller than sqrt(n)
    ~~~~~~~~List(n)
    ~~~~} else {
    ~~~~~~~~if (n % min == 0) {
    ~~~~~~~~~~~~// min must be prime as it is the smallest factor of n
    ~~~~~~~~~~~~min :: primeFactorsGivenMin(n / min, min)
    ~~~~~~~~} else {
    ~~~~~~~~~~~~primeFactorsGivenMin(n, min + 1)
    ~~~~~~~~}
    ~~~~}
    ~~~~primeFactorsGivenMin(n, 2)
    }

    val someNumbers = List(5, 7, 9, 64, 79, 100)

    someNumbers.flatMap(primeFactors)

    OUTPUT:
    List[int] = List(5, 7, 3, 3, 2, 2, 2, 2, 2, 2, 79, 2, 2, 5, 5)

    (Sorry for the tildes, but I can’t stand code that isn’t indented properly).

    Michael Chermside Tuesday, July 29, 2008 at 12:54 pm
  8. A valid example, but how often do you find yourself doing things like that in a real-world project? flatMap works very well for when a transformation must be applied which yields a List of any different size, be it smaller or larger. It also works well on other monads, such as Option, for when operations must be performed on the enclosed value (if it exists) and the result of the operation is itself an instance of the monad. I don’t really see much of this in my daily programming, however.

    Daniel Spiewak Tuesday, July 29, 2008 at 1:02 pm
  9. Daniel:

    Okay, that’s certainly true. I keep “each item of the input list may return multiple items in the output list” in my mind as the canonical usage, but in practice my most common use is “each item in the input list produces one or zero items in the output list” (which was your original example).

    Michael Chermside Wednesday, July 30, 2008 at 4:53 am
  10. in the sample code above:

    val nums = List(1, 2, 3, 4, 5)
    val sum = nums.foldLeft(0) { (n, total) =>
    n + total
    }

    from the meaning of the argument names “n” and “total”, you seemed to get the order of the argument wrong. i think it should be:

    val nums = List(1, 2, 3, 4, 5)
    val sum = nums.foldLeft(0) { (total, n) =>
    n + total
    }

    the code worked because the initial value and elements in the list have the same type and all the block does is summing. change the block to “n * 10 + total” to see the difference.

    walter chang Monday, August 4, 2008 at 1:47 am
  11. @walter

    Quite correct! I always get so dyslexic with fold and which direction its arguments go. Obviously if they are different types then it’s not an issue, but otherwise, and especially with commutative operations, it just gets too darn confusing. :-)

    Daniel Spiewak Monday, August 4, 2008 at 2:35 am
  12. Looks like the link to part 3 of this series is broken.
    Expected:
    http://www.codecommit.com/blog/scala/scala-collections-for-the-easily-bored-part-3
    Actual:
    http://www.codecommit.com/scala-collections-for-the-easily-bored-part-3

    In case you can’t tell, I’ve been writing a lot of unit test today :)

    Kevin Albrecht Saturday, October 11, 2008 at 10:13 pm
  13. Indeed, it is incorrect. :-) Fixed!

    Daniel Spiewak Saturday, October 11, 2008 at 10:25 pm
  14. I am student of Scala and have looked for tutorial material to help in my leaning process.

    After a lengthy search and several attempts to find something I can understand, I found this series of articles on Scala very helpful; thanks for great material expressed in a language and format that is easy to follow and learn.

    I hope after reading these articles I’ll be ready for Tony Martin’s blog that contains very interesting material, monads for example, but that at the moment I am unable to understand.

    J.F. Zarama Sunday, January 4, 2009 at 12:57 am
  15. Hi Daniel!
    I just want to ask if you still can’t think of a good flatMap example, I think I read something related sometime ago via your twitter account. BTW : Did you know if you google “scala flatMap” this page is ranked first (hu?) its so famous its even mentioned here:
    http://weblogs.java.net/blog/cayhorstmann/archive/2010/05/16/xml-processing-scala

    regards andreas

    Andreas S. Tuesday, May 18, 2010 at 11:46 am
  16. @Andreas

    I have very much reversed my position on flatMap. Now I find myself using it all over the place.

    One task that it is particularly good for is recursive traversal of a tree structure returning a flat list of results. For example, imagine you have an n-ary tree (e.g. an XML document) from which you want to extract all of the items which fit a certain criterion. The best way to do this is recursively traverse the tree, flatMapping over the children at each step.

    An even simpler (but equally-common) use of flatMap is projection of a sequence. For example, imagine that you have a List[Any] which contains both Int(s) and String(s). You want to project on this sequence to get just the String values. The naïve way to do this is to use filter:

    val xs: List[Any] = ...
    val filtered = xs filter { _.isInstanceOf[String] }

    Unfortunately, in this example, filtered will be of type List[Any]. When you use filter, the type system isn’t made aware of what you’re doing, and so it can’t help you by narrowing the type. The better approach is to use flatMap:

    val xs: List[Any] = ...
    val filtered = xs flatMap {
      case s: String => Some(s)
      case _ => None
    }

    In this snippet, `filtered` will be of type List[String]. By using flatMap, we have effectively informed the type system that we’re filtering based on type, and so the type system is able to narrow things down for us, providing a more specific type. Note that this case is even more suitably solved in Scala 2.8 by the partialMap method.

    The point is that flatMap is hardly as useless as I claimed in the article. I’m leaving the old text for historical reasons (as you pointed out, people have quoted it now). However, I categorically disagree with what I wrote. :-)

    Daniel Spiewak Tuesday, May 18, 2010 at 12:09 pm
  17. I’m in the early stages of learning Scala and found this article very accessible and useful. But after reading the whole article and the entire comment stream I was surprised at the very end (and happy) to find your reversal on the utility of flatMap. Rather than keeping this plot twist for the very end of the page for newbies like me who might not know better … maybe you could modify the original text and put a short “see below” comment?

    Jon Sorenson Saturday, August 28, 2010 at 1:30 pm
  18. This was mentioned earlier but I find flatMap to work beautiful when the passed function returns Option. No need for a if or match to see if it contains something or not. I use this all the time.

    Cristian Vrabie Tuesday, June 21, 2011 at 3:20 am
  19. A good example of fllatmap might be say you have a list of lines read from a file. And you want to turn it into a list of words in the file. so, here both the input and output could be Seq[String]

    Arun Ramakrishnan Thursday, July 21, 2011 at 3:57 pm
  20. “Scala’s type inference isn’t quite clever enough to figure out what’s going on without the explicit [String] type parameter (since our result is of type String).”

    I am using scala 2.9 and this doesn’t seem to be the case, as shown below

    scala> names reduceLeft { _ + “,” + _}
    res70: java.lang.String = Daniel,Chris,Joseph

    numan Saturday, October 15, 2011 at 9:36 am

Post a Comment

Comments are automatically formatted. Markup are either stripped or will cause large blocks of text to be eaten, depending on the phase of the moon. Code snippets should be wrapped in <pre>...</pre> tags. Indentation within pre tags will be preserved, and most instances of "<" and ">" will work without a problem.

Please note that first-time commenters are moderated, so don't panic if your comment doesn't appear immediately.

*
*