This page is no longer maintained — Please continue to the home page at www.scala-lang.org

An Alternative Map And Set Implementation For Very Large InMemory Data Collections

1 reply
Grey
Joined: 2009-01-03,
User offline. Last seen 42 years 45 weeks ago.
A while back the following project was announced on one of the Scala lists.

http://github.com/alex14n/CompactHashMap

Within the project's ReadMe, displayed at the above link, are a number of interesting discussion links.

Somewhere in there, and I may overstate a bit here, the Doug Lea said something to the effect "this is the only alternative impl to Java's standard Hash Maps I've seriously considered in years, and almost every month someone proposes some newfangled  _improved_ alternative."   In addition, as part of the consideration, it appears the code was put through a number of rigorous test suites (Googles' Java Test Suite), most of which appear to be associated with the project, this should ease on going maintenance of the code substantially.

In the end Sun decided not to adopt it.

However, I've found it priceless for one key aspect, its memory efficiency and performance with very large (millions) of entries.   One hears quite regularly about next generation concurrent applications designed for cores, cores, cores.  But all that computational horsepower is useless if data is not co-resident.  The new concurrent libraries will perforce drive new extreme in-memory data  set capable collection libraries, IMHO.

While I expect it unlikely (their distinct advantage is apparent in only rather specialized use cases, e.g.,  large data sets and some other), I thought I'd throw out the possibility that these be added to the standard Scala library as alternate implementations of the Map and Set ADTs.  There would be some work involved to meet the upcoming 2.8 signatures, but a communication with the author indicated he would be migrating to the 2.8 sigs.

R.


ijuma
Joined: 2008-08-20,
User offline. Last seen 22 weeks 2 days ago.
Re: An Alternative Map And Set Implementation For Very Large InM

Ray Racine gmail.com> writes:
> A while back the following project was announced on one of the Scala
> lists.http://github.com/alex14n/CompactHashMapWithin the project's ReadMe,
> displayed at the above link, are a number of interesting discussion
> links. Somewhere in there, and I may overstate a bit here, the Doug Lea said
> something to the effect "this is the only alternative impl to Java's standard
> Hash Maps I've seriously considered in years, and almost every month someone
> proposes some newfangled _improved_ alternative."

This was about the Java version as I understand:

http://github.com/alex14n/CompactHashMap/blob/master/java/FastHashMap.java

Also, if we're going that route, we should look at the version that Doug Lea
eventually came up with:

http://permalink.gmane.org/gmane.comp.java.openjdk.core-libs.devel/1962

> However, I've found it priceless for one key aspect, its memory efficiency
> and performance with very large (millions) of entries. One hears quite
> regularly about next generation concurrent applications designed for
> cores,cores, cores.

To avoid misunderstandings, it's worth pointing out that the HashMap in question
is not thread-safe.

Best,
Ismael

Copyright © 2012 École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland