Wednesday 13 June 2018
One more article about the standard collections, really? Indeed, during the last 18 months a lot of work has been done on the collections side and we’ve published several blog articles and given several talks to explain the various changes or challenges we were facing. This article summarizes what is going to change from an end-user perspective.
In case you’ve thoroughly followed our previous blog posts and talks, you might not learn much from this article. Otherwise, this is the perfect opportunity to catch up on the topic in a few minutes!
The next section presents the changes that are internal to the collections implementation but that might have some visible impact on the surface. Then, I will show why I think that the removal of CanBuildFrom made the API more beginner friendly. Next, I will introduce some new operations available in the collections. Finally, I will mention the main deprecations and the motivations behind them.
Under The Hood: A Cleaner Ground
The most important change in the new collections framework is that transformation operations (such as filter) are now implemented in a way that works with both strict collections (such as List) and non-strict collections (such as views). This was not the case before: the previous implementations were strict (they eagerly evaluated the collection elements) and had to be overridden by non-strict collection types. You can find more details about that in this blog post.
The good news is that the new design is more correct, in the sense that you can now implement custom non-strict collection types without having to worry about re-implementing a ton of operations. (Some operations, such as groupBy, still eagerly evaluate the collection elements; this will be clearly documented.) Another benefit is that transformation operations defined outside of the collections (for instance, in a third-party library) now work with non-strict collections as well.
Speaking of non-strict collections, the View type has been redesigned and views should behave in a more predictable way. Also, Stream has been deprecated in favor of LazyList (see the last section).
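To make the strict versus non-strict distinction concrete, here is a minimal sketch, assuming Scala 2.13. The object name StrictnessDemo and the helper double are made up for illustration. It applies the same map operation to a List and to a LazyList and records which elements the function has actually visited.

```scala
import scala.collection.mutable.ListBuffer

object StrictnessDemo {
  // Log of the elements the mapping function has been applied to.
  private val visited = ListBuffer.empty[Int]

  private def double(x: Int): Int = { visited += x; x * 2 }

  // Strict collection: map applies the function to every element right away.
  val strictResult: List[Int] = List(1, 2, 3).map(double)
  val visitedByStrictMap: List[Int] = visited.toList

  visited.clear()

  // Non-strict collection: map only records the transformation to perform.
  val lazyResult: LazyList[Int] = LazyList(1, 2, 3).map(double)
  val visitedBeforeForcing: List[Int] = visited.toList

  // Elements are computed on demand, here only as far as take(2) needs.
  val firstTwo: List[Int] = lazyResult.take(2).toList
  val visitedAfterForcing: List[Int] = visited.toList
}
```

In this sketch the call site of map is identical in both cases; only the collection type decides when the function runs.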
I think the most visible change for end-users is that transformation operations do not use CanBuildFrom anymore. I believe this will be quite visible despite our previous efforts to hide CanBuildFrom from the API documentation of the collections.
Indeed, if you take a look at the List API, the signature shown for the map operation does not mention CanBuildFrom. However, if you use this operation in your code, then your IDE reveals its actual signature, which takes an implicit CanBuildFrom parameter.
The type signature shown in the API documentation has been “simplified” to make it more approachable, but I believe that this probably confuses users more than it helps. This is especially true of the map operation in the TreeMap[A, B] API. The signature shown there makes no sense: the result type can not be a TreeMap with a single type argument, since TreeMap takes two type parameters (the type of keys and the type of values). Also, the function f actually takes a key-value pair as parameter, not just a key (as incorrectly indicated by the type A).
CanBuildFrom was used for good reasons. In particular, the result type in the actual signature was computed according to the type of the source collection and the type of elements of the new collection. The case of TreeMap is compelling: if you transform your key-value pairs into other key-value pairs for which the type of keys has an implicit Ordering instance, then map returns a TreeMap, but if there is no such Ordering instance then the best collection type that can be returned is Map. And if you transform the key-value pairs into something that is not even a pair, then the best collection type that can be returned is Iterable. These three cases were supported by a single operation implementation, and CanBuildFrom was used to abstract over the various possible return types.
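The idea of letting an implicit parameter compute the result type can be sketched with a toy model. The code below is not the 2.12 implementation: ToyCBF, listCBF and toyMap are hypothetical names, and the model only handles List. It is meant to show how the compiler, not the caller, picks the type To.

```scala
import scala.collection.mutable

object ToyCanBuildFrom {
  // Hypothetical, simplified stand-in for the 2.12 type class: for a source
  // collection type From and an element type Elem, an instance fixes the
  // result type To and knows how to build it.
  trait ToyCBF[From, Elem, To] {
    def newBuilder(): mutable.Builder[Elem, To]
  }

  // Mapping over a List yields a List of the new element type.
  implicit def listCBF[A, B]: ToyCBF[List[A], B, List[B]] =
    new ToyCBF[List[A], B, List[B]] {
      def newBuilder(): mutable.Builder[B, List[B]] = List.newBuilder[B]
    }

  // The result type To is decided by the implicit instance that the
  // compiler finds for the source type and the new element type.
  def toyMap[A, B, To](source: List[A])(f: A => B)(
      implicit cbf: ToyCBF[List[A], B, To]): To = {
    val builder = cbf.newBuilder()
    source.foreach(a => builder += f(a))
    builder.result()
  }

  val lengths: List[Int] = toyMap(List("a", "bb", "ccc"))(_.length)
}
```

The cost of this flexibility is visible in the signature of toyMap: three type parameters and an implicit parameter list for what is conceptually a simple operation.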
In the new collections we wanted to have simpler type signatures so that we can show their actual signature in the API documentation, and auto-completion provided by IDEs is not scary. We achieve that by using overloading, as explained in more detail in this blog article.
In practice, this means that the new TreeMap has three overloads of the map operation. Their type signatures are the actual ones, and they essentially translate “in types” what I’ve written above about the possible result types of map according to the type of elements returned by the transformation function f. We believe that the new API is simpler to understand.
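As a rough illustration of how the overloads select the result type, here is a small sketch assuming Scala 2.13. The sample data and the name TreeMapMapDemo are made up. It exercises two of the three cases described above; the case where the new key type has no Ordering is deliberately left out.

```scala
import scala.collection.immutable.TreeMap

object TreeMapMapDemo {
  val ages: TreeMap[String, Int] = TreeMap("alice" -> 30, "bob" -> 25)

  // Pairs whose key type has an Ordering: the result is still a TreeMap.
  val upperCased: TreeMap[String, Int] =
    ages.map { case (name, age) => (name.toUpperCase, age) }

  // Elements that are not pairs: the result is a plain Iterable.
  val labels: Iterable[String] =
    ages.map { case (name, age) => s"$name is $age" }
}
```

The type annotations are there to show the static result type in each case; they are not needed for the code to compile.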
New And Noteworthy
We have introduced a few new operations. The following sections present some of them.
A common pattern with the old 2.12 collections is to use groupBy followed by mapValues to transform the groups. For instance, this is how we can index the names of a collection of users by their age:
    case class User(name: String, age: Int)

    def namesByAge(users: Seq[User]): Map[Int, Seq[String]] =
      users.groupBy(_.age).mapValues(users => users.map(_.name))
There is a subtlety in this code. The static return type is Map, but the Map implementation actually returned is lazy and evaluates its elements each time it is traversed (i.e. the users => users.map(_.name) function is evaluated each time the Map is traversed).
In the new collections the return type of mapValues is a view rather than a Map, to clearly indicate that its contents are evaluated each time it is traversed.
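The re-evaluation can be observed with a counter. This is a minimal sketch assuming Scala 2.13, where the view is obtained explicitly with .view.mapValues. The name MapValuesViewDemo and the sample map are made up for illustration.

```scala
object MapValuesViewDemo {
  // Counts how many times the value-transforming function runs.
  var calls = 0

  // Building the view does not apply the function yet.
  val scaled = Map("a" -> 1, "b" -> 2).view.mapValues { v => calls += 1; v * 10 }
  val callsBeforeTraversal: Int = calls

  // Each traversal of the view applies the function again.
  val firstCopy: Map[String, Int] = scaled.toMap
  val callsAfterFirstTraversal: Int = calls
  val secondCopy: Map[String, Int] = scaled.toMap
  val callsAfterSecondTraversal: Int = calls
}
```

Calling toMap once and keeping the result is one way to pay for the transformation a single time.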
Furthermore, we have introduced an operation named groupMap that both groups elements and transforms the groups. The above code can be rewritten as follows to take advantage of groupMap:
def namesByAge(users: Seq[User]): Map[Int, Seq[String]] = users.groupMap(_.age)(_.name)
The returned Map is strict: it eagerly evaluates its elements once. Also, the fact that it is implemented as a single operation makes it possible to apply some optimizations that make it ~1.3x faster than the version that uses groupBy followed by mapValues.
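For readers who want to try groupMap directly, here is a self-contained version of the example, assuming Scala 2.13. The sample users and the name GroupMapDemo are hypothetical.

```scala
object GroupMapDemo {
  final case class User(name: String, age: Int)

  // Groups users by age and keeps only the names in each group.
  def namesByAge(users: Seq[User]): Map[Int, Seq[String]] =
    users.groupMap(_.age)(_.name)

  // Hypothetical sample data, for illustration only.
  val sample: Seq[User] =
    Seq(User("Alice", 30), User("Bob", 25), User("Carol", 30))

  val index: Map[Int, Seq[String]] = namesByAge(sample)
}
```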
InPlace Transformation Operations
Mutable collections have a couple of new operations for transforming their elements in place: instead of returning a new collection (like map and filter do) they mutate the source collection. These operations are suffixed with InPlace. For instance, to remove users whose name starts with the letter J from a buffer and then increment the age of the remaining users, one can now write:
    val users = ArrayBuffer(…)
    users
      .filterInPlace(user => !user.name.startsWith("J"))
      .mapInPlace(user => user.copy(age = user.age + 1))
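A runnable variant of the snippet above, assuming Scala 2.13, could look like the following. The buffer contents are made up for illustration, as is the name InPlaceDemo.

```scala
import scala.collection.mutable.ArrayBuffer

object InPlaceDemo {
  final case class User(name: String, age: Int)

  // Hypothetical sample data, for illustration only.
  val users: ArrayBuffer[User] =
    ArrayBuffer(User("Julia", 30), User("Alice", 41), User("Bob", 25))

  // Both operations mutate `users` and return it, so they can be chained.
  users
    .filterInPlace(user => !user.name.startsWith("J"))
    .mapInPlace(user => user.copy(age = user.age + 1))
}
```

After the chain has run, the same buffer instance holds the filtered and updated users; no intermediate collection is returned to the caller.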
Deprecations For Less Confusion
A consequence of cleaning and simplifying the collections framework is that several types or operations have been deprecated in Scala 2.13 and will be removed in 2.14.
Iterable Is The Top Collection Type
We felt that having a distinction between Traversable and Iterable was not worth it, so we removed Traversable (it is now an alias to Iterable). Iterable[A] is now the collection type at the top of the hierarchy. Its only abstract member is def iterator: Iterator[A].
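Since iterator is the only abstract member, a custom collection can be quite small. The sketch below assumes Scala 2.13. Countdown is a hypothetical collection type invented for this example.

```scala
object CountdownDemo {
  // Hypothetical collection: counts down from `start` to 1.
  // Only `iterator` is implemented; everything else is inherited.
  final class Countdown(start: Int) extends Iterable[Int] {
    def iterator: Iterator[Int] = Iterator.range(start, 0, -1)
  }

  // Inherited operations work without any further code.
  val evens: Iterable[Int] = new Countdown(5).filter(_ % 2 == 0)
  val total: Int = new Countdown(5).sum
}
```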
LazyList Is Preferred Over Stream
Stream is deprecated in favor of LazyList. As its name suggests, LazyList is a linked list whose elements are lazily evaluated. An important semantic difference with Stream is that in LazyList both the head and the tail are lazy, whereas in Stream only the tail is lazy.
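The laziness of the head can be checked with a flag. This is a minimal sketch assuming Scala 2.13, using LazyList.cons, which takes both arguments by name. The name LazyHeadDemo is made up for illustration.

```scala
object LazyHeadDemo {
  // Flipped when the head expression is actually evaluated.
  var headEvaluated = false

  // Constructing the list does not evaluate the head expression.
  val numbers: LazyList[Int] =
    LazyList.cons({ headEvaluated = true; 1 }, LazyList.empty[Int])
  val evaluatedAfterConstruction: Boolean = headEvaluated

  // Asking for the head forces its evaluation.
  val first: Int = numbers.head
  val evaluatedAfterHead: Boolean = headEvaluated
}
```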
Insertion And Removal Operations Are Not Available On Generic Collections
In the old 2.12 framework, the scala.collection.Map type has a + and a - operation to add and remove entries. The semantics of these operations is to return a new collection with the added or removed entries, without changing the source collection.
These operations are then inherited by the mutable branch of the collections. But the mutable collection types also introduce their own insertion and removal operations, namely += and -=, which modify the source collection in place. This means that a mutable map has + and -, as well as += and -=.
Having all these operations can be handy in some cases but can also introduce confusion. If you want to use + or -, then you probably wanted to use an immutable collection type in the first place…
To fix this, the operations that return a new collection (such as +, - and updated) have been moved from the generic collection types to their immutable counterparts.
We think that by deprecating these insertion and removal operations from generic collection types and by having distinct operations between the mutable and immutable branches we make the situation clearer.
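The split between the two branches can be sketched as follows, assuming Scala 2.13. The map contents and the name InsertRemoveDemo are made up for illustration.

```scala
import scala.collection.{immutable, mutable}

object InsertRemoveDemo {
  // Immutable branch: + and - return a new collection, the source is untouched.
  val original: immutable.Map[String, Int] = immutable.Map("a" -> 1)
  val extended: immutable.Map[String, Int] = original + ("b" -> 2)
  val reduced: immutable.Map[String, Int] = extended - "a"

  // Mutable branch: += and -= update the collection in place.
  val counters: mutable.Map[String, Int] = mutable.Map("a" -> 1)
  counters += ("b" -> 2)
  counters -= "a"
}
```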
In summary, the changes for end-users are the following:
- non-strict collections (such as views) are safer to use and simpler to implement,
- type signatures of transformation operations (such as map) are simpler (no implicit CanBuildFrom parameter),
- new cool operations have been added,
- the type hierarchy is simpler (no Traversable),
- mutable collection types do not inherit immutable insertion and removal operations.