Tribulations of CanBuildFrom

Tuesday 30 May 2017

Julien Richard-Foy

CanBuildFrom is probably the most infamous abstraction of the current collections. It is mainly criticised for making scary type signatures.

Our ongoing collections redesign (blog post, GitHub repo) is an opportunity to try alternative designs. This post explains the (many!) problems solved by CanBuildFrom and the alternative solutions implemented in the new collections.

Transforming the elements of a collection

It’s useful to think of String as a collection of Char elements: you can then use the common collection operations like ++, find, etc. on String values.

However the map method is challenging because this one transforms the Char elements into something that might or might not be Chars. Then, what should be the return type of the map method on String values? Ideally, we want to get back a String if we transform each Char into another Char, but we want to get some Seq[B] if we transform each Char into a different type B. And this is the way it currently works:

Welcome to Scala 2.12.2 (OpenJDK 64-Bit Server VM, Java 1.8.0_131).
Type in expressions for evaluation. Or try :help.

scala> "foo".map(c => c.toInt)
res1: scala.collection.immutable.IndexedSeq[Int] = Vector(102, 111, 111)

scala> "foo".map(c => c.toUpper)
res2: String = FOO

This feature is not limited to the map method: flatMap, collect, concat and a few others also work the same. Moreover, String is not the only collection type that needs this feature: BitSet and Map are other examples.

The current collections rely on CanBuildFrom to implement this feature. The map method is defined as follows:

def map[B, That](f: Char => B)(implicit bf: CanBuildFrom[String, B, That]): That

When the implicit CanBuildFrom parameter is resolved it fixes the return type That. The resolution is driven by the actual B type: if B is Char then That is fixed to String, otherwise it is immutable.IndexedSeq.

The drawback of this solution is that the type signature of the map method looks cryptic.

In the new design we solve this problem by defining two overloads of the map method: one that handles Char to Char transformations, and one that handles other transformations. The type signatures of these map methods are straightforward:

def map(f: Char => Char): String
def map[B](f: Char => B): Seq[B]

Then, if you call map with a function that returns a Char, the first overload is selected and you get a String. Otherwise, the second overload is selected and you get a Seq[B]. Before Scala 2.12 such a solution would not have worked well: users would have been required to explicitly write the type of the argument of the supplied f function. In Scala 2.12 type inference has been improved so that it is not anymore necessary.

Thus, we got rid of the cryptic method signatures while still supporting the feature of returning a different type of result according to the type of the transformation function.

Collections’ type constructors with different arities

The collections are hierarchically organized. Essentially, the most generic collection is Iterable[A], and then we have three main kinds of collections: Seq[A], Set[A] and Map[K, V].

It is worth noting that Map[K, V] takes two type parameters (K and V) whereas the other collection types take only one type parameter. This makes it difficult to generically define, at the level of Iterable[A], operations that will return a Map[K, V] when specialized.

For instance, consider again the case of the map method. We want to generically define it on Iterable[A], but which return type should we use? When this method will be inherited by List[A] we want its return type to be List[B], but when it will be inherited by HashMap[K, V], we want its return type to be HashMap[L, W]. It is clear that we want to abstract over the type constructor of the concrete collections, but the difficulty is that they don’t always take the same number of type parameters.

That’s a second problem solved by CanBuildFrom in the current collections. Look again at the type signature of the (generic) map method on Iterable[A]:

def map[B, That](f: A => B)(implicit bf: CanBuildFrom[Repr, B, That]): That

The return type That is inferred from the resolved CanBuildFrom instance at call-site. Both the Repr and B types actually drive the implicit resolution: when Repr is List[_] the parameter That is fixed to List[B], and when Repr is HashMap[_, _] and B is a tuple (K, V) then That is fixed to HashMap[K, V].

In the new design we solve this problem by defining two “branches” in the hierarchy:

  • IterableOps for collections whose type constructor takes one parameter,
  • MapOps for collections whose type constructor takes two parameters.

Here is a simplified version of IterableOps:

trait IterableOps[A, CC[_]] {
  def map[B](f: A => B): CC[B]
}

The CC type parameter stands for Collection type Constructor. Then, the List[A] concrete collection extends IterableOps[A, List] to set its correct self-type constructor.

Similarly, here is a simplified version of MapOps:

trait MapOps[K, V, CC[_, _]] extends IterableOps[(K, V), Iterable] {
  def map[L, W](f: ((K, V)) => (L, W)): CC[L, W]
}

And then the HashMap[K, V] concrete collection extends MapOps[K, V, HashMap] to set its correct self-type constructor. Note that MapOps extends IterableOps: consequently it inherits from its map method, which will be selected when the transformation function passed to map does not return a tuple.

Sorted collections

The third challenge is about sorted collections (like TreeSet and TreeMap, for instance). These collections define their order of iteration according to an ordering relationship for the type of their elements.

As a consequence, when you transform the type of the elements (e.g. by using the – now familiar! – map method), an implicit ordering instance for the new type of elements has to be available.

With CanBuildFrom, the solution relies (again) on the implicit resolution mechanism: the implicit CanBuildFrom[TreeSet[_], X, TreeSet[X]] instance is available for some type X only if an implicit Ordering[X] instance is also available.

In the new design we solve this problem by introducing a new branch in the hierarchy. This one defines transformation operations that require an ordering instance for the element type of the resulting collection:

trait SortedIterableOps[A, CC[_]] {
  def map[B : Ordering](f: A => B): CC[B]
}

However, as mentioned in the previous section, we need to also abstract over the kind of the type constructor of the concrete collections. Consequently we have in total four branches:

kind not sorted sorted
CC[_] IterableOps SortedIterableOps
CC[_, _] MapOps SortedMapOps

In summary, instead of having one map method that supports all the use cases described in this section and the previous ones, we specialized the hierarchy to have overloads of the map method, each one supporting a specific use case. The benefit is that the type signatures immediately tell you the story: you don’t have to have a look at the actual implicit resolution to know the result you will get from calling map.

Implicit builders

In the current collections, the fact that CanBuildFrom instances are available in the implicit scope is useful to implement, separately from the collections, generic operations that work with any collection type.

Examples of use cases are:

In the new design we are still experimenting with solutions to support these features. So far the decision is to not put implicit builders in the collections implementation. We might provide them as an optional dependency instead, but it seems that most of these use cases could be supported even without implicit builders: you could just use an existing collection instance and navigate through its companion object (providing the builder), or you could just use the companion object directly to get a builder.

breakOut escape hatch

As we have previously seen, in the current collections when we want to transform some collection into a new collection, we rely on an available implicit CanBuildFrom instance to get a builder for the target collection. The implicit search is driven by the type of the initial collection and the type of elements of the target collection. The available implicit instances have been designed to make sense in the most common cases.

However, sometimes this default behavior is not what you want. For instance, consider the following program:

val xs: List[Int] = 1 :: 2 :: 3 :: Nil
val xsWithSquares: Map[Int, Int] =
  xs.map(x => (x, x * x))

If you try to compile it you will get a compile error because the implicitly resolved builder produces a List[(Int, Int)] instead of the desired Map[Int, Int]. We could convert this List[(Int, Int)] into a Map[Int, Int] but that would be inefficient for large collections.

We can fix this issue by using the breakOut escape hatch:

val xs: List[Int] = 1 :: 2 :: 3 :: Nil
val xsWithSquares: Map[Int, Int] =
  xs.map(x => (x, x * x))(collection.breakOut)

breakOut selects a CanBuildFrom instance irrespective of the initial collection type. This requires the target type to be known, in this case via an explicit type ascription.

In the new design we have no direct equivalent of breakOut. The solution of the above example consists in using a View to avoid the construction of an intermediate collection:

val xs: List[Int] = 1 :: 2 :: 3 :: Nil
val xsWithSquares: Map[Int, Int] =
  xs.view.map(x => (x, x * x)).to(Map)

In practice, we expect that most usages of breakOut could be adapted to the new design by using a View followed by an explicit to call. However, this is an area that remains to explore.

Summary

In this article we have reviewed the features built on top of CanBuildFrom and explained the design decision we made for the new collections to support most of these features without CanBuildFrom.