The Architecture of Scala Collections -- Integrating new sets and maps

As a second example you'll learn how to integrate a new kind of map into the collection framework. The idea is to implement a mutable map with String as the type of keys by a "Patricia trie"¹ The term Patricia is in fact an abbreviation for "Practical Algorithm to Retrieve Information Coded in Alphanumeric." The idea is to store a set or a map as a tree where subsequent character in a search key determines uniquely a descendant tree. For instance a Patricia trie storing the three strings "abc", "abd", "al", "all", "xy" would look like this:

To find the node corresponding to the string "abc" in this trie, simply follow the subtree labeled "a", proceed from there to the subtree labelled "b", to finally reach its subtree labelled "c". If the Patricia trie is used as a map, the value that's associated with a key is stored in the nodes that can be reached by the key. If it is a set, you simply store a marker saying that the node is present in the set.

import collection._

class PrefixMap[T]

extends mutable.Map[String, T]

with mutable.MapLike[String, T, PrefixMap[T]] {

var suffixes: immutable.Map[Char, PrefixMap[T]] = Map.empty

var value: Option[T] = None

def get(s: String): Option[T] =

if (s.isEmpty) value

else suffixes get (s(0)) flatMap (_.get(s substring 1))

def withPrefix(s: String): PrefixMap[T] =

if (s.isEmpty) this

else {

val leading = s(0)

suffixes get leading match {

case None =>

suffixes = suffixes + (leading -> empty)

case _ =>

}

suffixes(leading) withPrefix (s substring 1)

}

override def update(s: String, elem: T) =

withPrefix(s).value = Some(elem)

override def remove(s: String): Option[T] =

if (s.isEmpty) { val prev = value; value = None; prev }

else suffixes get (s(0)) flatMap (_.remove(s substring 1))

def iterator: Iterator[(String, T)] =

(for (v <- value.iterator) yield ("", v)) ++

(for ((chr, m) <- suffixes.iterator;

(s, v) <- m.iterator) yield (chr +: s, v))

def += (kv: (String, T)): this.type = { update(kv._1, kv._2); this }

def -= (s: String): this.type = { remove(s); this }

override def empty = new PrefixMap[T]

}

Patricia tries support very efficient lookups and updates. Another nice feature is that they support selecting a subcollection by giving a prefix. For instance, in the patricia tree above you can obtain the sub-collection of all keys that start with an "a" simply by following the "a" link from the root of the tree.

Based on these ideas we will now walk you through the implementation of a map that's implemented as a Patricia trie. We call the map a PrefixMap, which means that it provides a method withPrefix that selects a submap of all keys starting with a given prefix. We'll first define a prefix map with the keys shown in the running example:

scala> val m = PrefixMap("abc" -> 0, "abd" -> 1, "al" -> 2,

"all" -> 3, "xy" -> 4)

m: PrefixMap[Int] = Map((abc,0), (abd,1), (al,2), (all,3),

(xy,4))

scala> m withPrefix "a"

res14: PrefixMap[Int] = Map((bc,0), (bd,1), (l,2), (ll,3))

A prefix map node has two mutable fields: suffixes and value. The value field contains an optional value that's associated with the node. It is initialized to None. The suffixes field contains a map from characters to PrefixMap values. It is initialized to the empty map.

You might ask why did we pick an immutable map as the implementation type for suffixes? Would not a mutable map have been more standard, since PrefixMap as a whole is also mutable? The answer is that immutable maps that contain only a few elements are very efficient in both space and execution time. For instance, maps that contain fewer than 5 elements are represented as a single object. By contrast, the standard mutable map is a HashMap, which typically occupies around 80 bytes, even if it is empty. So if small collections are common, it's better to pick immutable over mutable. In the case of Patricia tries, we'd expect that most nodes except the ones at the very top of the tree would contain only a few successors. So storing these successors in an immutable map is likely to be more efficient.

Now have a look at the first method that needs to be implemented for a map: get. The algorithm is as follows: To get the value associated with the empty string in a prefix map, simply select the optional value stored in the root of the tree. Otherwise, if the key string is not empty, try to select the submap corresponding to the first character of the string. If that yields a map, follow up by looking up the remainder of the key string after its first character in that map. If the selection fails, the key is not stored in the map, so return with None. The combined selection over an option value is elegantly expressed using flatMap. When applied to an optional value, ov, and a closure, f, which in turn returns an optional value, ov flatMap f will succeed if both ov and f return a defined value. Otherwise ov flatMap f will return None.

The next two methods to implement for a mutable map are += and -=. In the implementation of PrefixMap, these are defined in terms of two other methods: update and remove.

The remove method is very similar to get, except that before returning any associated value, the field containing that value is set to None. The update method first calls withPrefix to navigate to the tree node that needs to be updated, then sets the value field of that node to the given value. The withPrefix method navigates through the tree, creating sub-maps as necessary if some prefix of characters is not yet contained as a path in the tree.

The last abstract method to implement for a mutable map is iterator. This method needs to produce an iterator that yields all key/value pairs stored in the map. For any given prefix map this iterator is composed of the following parts: First, if the map contains a defined value, Some(x), in the value field at its root, then ("", x) is the first element returned from the iterator. Furthermore, the iterator needs to traverse the iterators of all submaps stored in the suffixes field, but it needs to add a character in front of every key string returned by those iterators. More precisely, if m is the submap reached from the root through a character chr, and (s, v) is an element returned from m.iterator, then the root's iterator will return (chr +: s, v) instead. This logic is implemented quite concisely as a concatenation of two for expressions in the implementation of the iterator method in PrefixMap. The first for expression iterates over value.iterator. This makes use of the fact that Option values define an iterator method that returns either no element, if the option value is None, or exactly one element x, if the option value is Some(x).

import scala.collection.mutable.{Builder, MapBuilder}

import scala.collection.generic.CanBuildFrom

object PrefixMap extends {

def empty[T] = new PrefixMap[T]

def apply[T](kvs: (String, T)*): PrefixMap[T] = {

val m: PrefixMap[T] = empty

for (kv <- kvs) m += kv

}

def newBuilder[T]: Builder[(String, T), PrefixMap[T]] =

new MapBuilder[String, T, PrefixMap[T]](empty)

implicit def canBuildFrom[T]

: CanBuildFrom[PrefixMap[_], (String, T), PrefixMap[T]] =

new CanBuildFrom[PrefixMap[_], (String, T), PrefixMap[T]] {

def apply(from: PrefixMap[_]) = newBuilder[T]

def apply() = newBuilder[T]

}

Note that there is no newBuilder method defined in PrefixMap. There is no need to, because maps and sets come with default builders, which are instances of class MapBuilder. For a mutable map the default builder starts with an empty map and then adds successive elements using the map's += method. Mutable sets behave the same. The default builders for immutable maps and sets use the non-destructive element addition method +, instead of method +=.

However, in all these cases, to build the right kind of set or map, you need to start with an empty set or map of this kind. This is provided by the empty method, which is the last method defined in PrefixMap. This method simply returns a fresh PrefixMap.

We'll now turn to the companion object PrefixMap. In fact it is not strictly necessary to define this companion object, as class PrefixMap can stand well on its own. The main purpose of object PrefixMap is to define some convenience factory methods. It also defines a CanBuildFrom implicit to make typing work out better.

The two convenience methods are empty and apply. The same methods are present for all other collections in Scala's collection framework so it makes sense to define them here, too. With the two methods, you can write PrefixMap literals like you do for any other collection: