Scala's collections not scalable because of... 32-bit sizes?
Tue, 2011-02-15, 00:08
As you know, Scala's collections currently mimic Java's when it comes
to sizes: it's 32-bit Ints everywhere (Traversable.size,
Seq.apply, .slice, .zipWithIndex...).
The thing is, when Java came out about 15 years ago it was *good* when
you had 32 MB of RAM, so the 2+ billion element limit seemed very
high for collections. In contrast, tonight I've started configuring
my shiny new server with 24 GB of RAM.
2 billion elements already seems small to me (for scientific
computations, for instance), but I'll let you imagine how small it
will look in another 15 years.
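
To put rough numbers on this, here is a back-of-the-envelope sketch. It assumes 8-byte elements (Long or Double) and ignores JVM overhead entirely; the object and method names are made up for illustration.

```scala
object SizeLimits {
  // Largest size or index that an Int-based collection API can express.
  val maxIntSize: Long = Int.MaxValue.toLong

  // Number of 8-byte values (Long or Double) that fit in `ramBytes`,
  // ignoring JVM overhead (a simplifying assumption).
  def eightByteSlots(ramBytes: Long): Long = ramBytes / 8L

  // 24 GB, as on the server mentioned above.
  val ramBytes: Long = 24L * 1024L * 1024L * 1024L
  val slots: Long = eightByteSlots(ramBytes)

  def main(args: Array[String]): Unit = {
    println(s"Int.MaxValue       = $maxIntSize")
    println(s"8-byte slots in RAM = $slots")
    println(s"Fits in an Int?      ${slots <= maxIntSize}")
  }
}
```

Under those assumptions, 24 GB holds about 3.2 billion doubles, which is already more than a single Int-sized collection can address.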
As a result, to deal with very large amounts of data in Scala (on a
64-bit JVM, which is becoming commonplace) we're already facing
limitations. If it were only for Java arrays (whose size is unlikely
to ever be changed from int to long), we could create a LargeArray[T]
that would split its data into many sub-arrays of up to ~2 billion
elements each, but then it's near-impossible to fit such a collection
into Scala's standard collections
("def size: Long" cannot override "def size: Int").
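
For the sake of discussion, a minimal sketch of what such a LargeArray[T] could look like. None of this exists in the standard library; the chunkBits parameter and the power-of-two chunk size are my own assumptions, chosen so that index arithmetic is just a shift and a mask. Note that the class deliberately does not extend Seq or IndexedSeq, precisely because its "size: Long" could not override their "size: Int".

```scala
import scala.reflect.ClassTag

// Hypothetical sketch: a Long-indexed array backed by fixed-size chunks.
// Not part of the Scala library; names and defaults are assumptions.
final class LargeArray[T: ClassTag](val size: Long, chunkBits: Int = 30) {
  require(size >= 0, "size must be non-negative")
  require(chunkBits >= 1 && chunkBits <= 30, "chunkBits must be in 1..30")

  private val chunkSize: Int = 1 << chunkBits
  private val chunkMask: Long = chunkSize - 1L

  // Number of chunks, rounded up; must itself fit in an Int.
  private val chunkCountLong: Long = (size + chunkMask) >>> chunkBits
  require(chunkCountLong <= Int.MaxValue, "too many chunks for this chunkBits")

  // Every chunk is full-sized except possibly the last one.
  private val chunks: Array[Array[T]] =
    Array.tabulate(chunkCountLong.toInt) { i =>
      val remaining = size - (i.toLong << chunkBits)
      new Array[T](math.min(remaining, chunkSize.toLong).toInt)
    }

  private def checkIndex(index: Long): Unit =
    if (index < 0 || index >= size)
      throw new IndexOutOfBoundsException(index.toString)

  // High bits select the chunk, low bits select the slot inside it.
  def apply(index: Long): T = {
    checkIndex(index)
    chunks((index >>> chunkBits).toInt)((index & chunkMask).toInt)
  }

  def update(index: Long, value: T): Unit = {
    checkIndex(index)
    chunks((index >>> chunkBits).toInt)((index & chunkMask).toInt) = value
  }
}
```

Usage would look like "val a = new LargeArray[Double](3000000000L); a(2999999999L) = 1.0", given enough heap. A tiny chunkBits (say 2) makes it easy to exercise the chunk boundaries with small sizes.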
So it's gonna be hard and painful but I believe we just need one more
breaking change in Scala to make it not future-proof, but really just
present-proof and actually *scalable*.
The naive way to go would be to change all the index- and size-related
code from Int to Long and see everything collection-related break,
just about everywhere (that would be the "shortest suicide note in
There are a few things that could make it possible to have a smooth
transition, though (although for sure I'm not seeing the full picture):
- obviously, first annotate the Int -> Long modified types with some
@widenedin30 annotation; or use some "type SizeType = Long // Int" alias
- add some compiler magic to accept Ints with a warning and insert the
appropriate casts (in transition mode, which would be the default for
a few versions)
- ship a scala-refactoring-based tool with Scala that would help
mass-migrate code from the old to the new collection sizes
(http://scala-refactoring.org/), performing some clever usage analysis
where possible and failing with advice on manual migration where it's
not possible to modify the code automatically.
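
To make the "type SizeType" idea a bit more concrete, here is a rough sketch. Everything in it is hypothetical: SizeTypes, countWhere and toIntSize are names I made up, and the checked narrowing helper only stands in for the casts that the compiler (or a migration tool) would have to insert where old code still expects an Int.

```scala
object SizeTypes {
  // Hypothetical alias from the proposal: flip between Long and Int here.
  type SizeType = Long // Int

  // Code written against the alias compiles for either choice of SizeType.
  def countWhere[A](items: Iterator[A])(p: A => Boolean): SizeType = {
    var n: SizeType = 0
    items.foreach(a => if (p(a)) n += 1)
    n
  }

  // Checked narrowing for legacy call sites that still need an Int.
  // Math.toIntExact throws ArithmeticException on overflow instead of
  // silently truncating the value.
  def toIntSize(n: SizeType): Int = java.lang.Math.toIntExact(n)
}
```

Legacy code would then write "val n: Int = SizeTypes.toIntSize(coll.size)" and fail loudly on overflow, rather than wrap around the way a plain ".toInt" would.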
What do you guys think about all this ? (I know, Pandora box,
PS: I told Martin about this when I recently had the chance to meet him,
but I believe this truly belongs on the debate mailing list :-)