Let Them Be Lazy!

Tuesday 28 November 2017

Julien Richard-Foy

One of the main changes in the new design of the collections is that operations have “view based” default implementations rather than “builder based”. This blog article explains what this means as well as what it means for a collection to be lazy or strict, how the view based design allows us to support both types of collections, and finally why this is an improvement over the current design of the collections.

More background information about the collections redesign can be found in this blog post. The source code is available in this GitHub repository.

Background: Lazy vs Strict Collections

We can classify collections into two categories: those that evaluate their elements when they are created, and those that evaluate their elements when they are needed only.

An example of the first category is List[A]: when we create a list we compute all its elements, when we transform a list (e.g. using map) we compute all the elements of the resulting list. We say that list is a strict collection type.

An example of the second category is Stream[A]: we can create and transform streams without computing their elements. Those might never be computed unless we iterate through them. We say that streams are lazy.

Laziness allows us to create fancy infinite collections without blowing the memory: Stream.continually(42). In practice, lazy collections are particularly useful to describe successive transformation operations without evaluating intermediate transformations. Typically, if we want to transform the first n elements of a list using map we will probably write something like xs.take(n).map(f). Unfortunately, this expression will create an intermediate collection (resulting from the .take(n) call) although we are only interested in the result of the map call. Creating this intermediate collection can be an expensive operation that we can avoid by using a view, which is a lazy collection: xs.view.take(n).map(f).

What’s Wrong With Views?

Unfortunately, the current support of views in the standard library suffers from two issues.

Operations Default Implementations Break Laziness

In the current collections, the default implementations of operations that return new collections (e.g. map, filter, ++) are based on strict builders. Consequently, when these operations are inherited by lazy collections they break their laziness. Therefore, these operations have to carefully be overridden by lazy collections to disable the default behaviour and preserve laziness instead. From a maintainer point of view, this is a headache.

CanBuildFrom Doesn’t Work With Views

Unfortunately, views not only cause trouble for the standard library maintainers but also end users. Indeed, the primary mechanism to extend the collections with new operations, namely CanBuildFrom, is incompatible with views. This means that users that use an implicit CanBuildFrom instance to build a collection will get a runtime exception if that instance has been resolved for a view. This is because CanBuildFrom, as its name suggests, also uses a strict “builder based” approach. Therefore no sensible implementation of its methods can be provided for lazy collections.

Summary

We have seen that on one hand views are useful to avoid unnecessary computations but the other hand they give pain to maintainers and are dangerous for end users.

The collections redesign is an opportunity to fix these issues, as explained in the remainder of this article.

“View Based” Default Implementations

Essentially, in the new collections we reversed the situation. Instead of having strict by default operations, they are now lazy by default and concrete collection types decide whether to evaluate their result or not. For instance, here is the implementation of map:

trait IterableOps[A, CC[_], C] {
  def map[B](f: A => B): CC[B] = fromIterable(View.Map(this, f))
}

(where CC[_] is List or Vector)

We can break down this implementation into two steps.

First, we create a lazy View that “records” that the map operation has been applied. Constructing such a view is a cheap operation, here is the implementation of View.Map:

object View {
  case class Map[A, B](underlying: Iterable[A], f: A => B) extends View[B] {
    def iterator() = underlying.iterator().map(f)
  }
}

As you can see, unless we actually iterate on the view we don’t evaluate the elements of the underlying collection or the mapped elements.

Second, we build the resulting collection from the view by calling fromIterable. This operation is abstract in IterableOps and is implemented by leaf classes of the hierarchy (e.g. List, Vector, etc.). Here is the implementation of fromIterable in List, for instance:

trait List[A] extends IterableOps[A, List, List[A]] {
  def fromIterable[E](it: Iterable[E]): List[E] = (List.newBuilder[E]() ++= it).result()
}

This implementation creates a new list and adds all the elements of the passed Iterable[E] to it. By doing so, it necessarily computes the elements of the source collection and the resulting collection: List is a strict collection type.

Now let’s have a look at the fromIterable implementation in View:

trait View[A] extends IterableOps[A, View, View[A]] {
  def fromIterable[E](it: Iterable[E]): View[E] =
    new View[E] {
      def iterator() = it.iterator()
    }
}

Here we return a view that will only evaluate its elements (by delegating to the underlying collection elements) when it is actually iterated: View is a lazy collection type.

Let’s take a step back and consider again the implementation of the map operation:

trait IterableOps[A, CC[_], C] {
  def map[B](f: A => B): CC[B] = fromIterable(View.Map(this, f))
}

The first part of the implementation creates a lazy view of the transformed elements, and the second part of the implementation (the call to fromIterable) turns this view into the resulting collection (which can be done either in a lazy or strict way, as we’ve just seen).

Note that in practice, the map operation has been overridden in List (for performance reasons) so we don’t actually use the default implementation, but the reasoning presented above still applies to most operations.

Conclusion

We have classified collections type into two categories: strict collections eagerly evaluate their elements (examples of such collections are List, Vector, etc.), whereas lazy collections evaluate their elements only when they are needed (examples are Stream and View).

In the new collections, the default implementations of transformation operations preserve laziness: when they are applied to a strict collection they are strict, and when they are applied to a lazy collection they are lazy. To achieve this we switched from a builder based approach to a view based approach to create collections.

More precisely, the new pattern to implement an operation consists in decoupling the logic of the operation and the evaluation mode of the underlying collection type. The logic of the operation is captured in a View and the evaluation mode is defined by the fromIterable method implementation in leaf collection types.