This page is no longer maintained — Please continue to the home page at www.scala-lang.org

Best choice for asynchronous lazy map / cache in Scala?

9 replies

Mon, 2010-03-08, 17:03

Matthias

Joined: 2010-03-08,

Hi,
I have a typical use case:- An object offers map-like access to other objects which I will call values, in a key/value manner.- Those values are static and could in theory be generated on the fly each time they are requested. - In practice, generating the values can be so resource consuming that caching them is a requirement.- The cache must be thread safe.
My first try was to use a
new HashMap[X,Y] with SynchronizedMap[X,Y]
and then call "getOrElseUpdate". However, this approach has several problems.
Problem 1: Every access that causes a value to be generated will block all other threads trying to access values.Problem 2: The thread that generates a value is on the map's monitor. This can cause deadlocks when at least two maps of this kind are involved, and if those can be inter-dependent.
Currently, I am just about to write my own map trait for these kinds of use cases. I just want to ask you if you have a better idea. Is there any type which I can use that already does that? - I know, there are probably many many Java class libraries out there that provide that kind of functionality. But I don't really want to include a huge library with god-knows-how-many-classes just for such a relatively simple use cases. I'd rather write that trait by myself. But hey, maybe I missed a type in the basic Scala library that I could use, or someone has made such a type that I could easily integrate in my project without having to adapt a huge library.
Just to give you a brief idea of what I am going to do: I will create a trait that is similar to SynchronizedMap. However, it does not simply synchronize every single access to the map. Those accesses that might block a thread (because they trigger a value generating process) will behave like this:
- Get a generator thread for the requested key, or create one if it does not exist.- If the generator thread already existed, join with it, waiting for it to finish, and after it finished return the value that it generated. - If the generator thread did not yet exist, start it, wait for it to finish, and return the value that it generated.
It is very important that the last two steps (waiting for the generator) do not keep the monitor (i.e., synchronize on) the map. When done this way, thread deadlocks cannot happen.
I've written similar classes in Java already, but I'd like to do it again in Scala. Unless anyone can point me to a good already existing implementation, that is.
Thanks! Madoc.

Mon, 2010-03-08, 17:17

#1

ounos

Joined: 2008-12-29,

Re: Best choice for asynchronous lazy map / cache in Scala?

As far as I know, the best async lazy map in scala is...the MapMaker of google collections: http://code.google.com/p/google-collections/
Lets not forget that a selling point of scala is the ability to leverage existing java libraries, instead of having to rewrite everything. Why not spend the time building something new?

Dimitris
2010/3/8 Matthias <madocdoyu [at] gmail [dot] com>

Hi,
I have a typical use case:- An object offers map-like access to other objects which I will call values, in a key/value manner.- Those values are static and could in theory be generated on the fly each time they are requested. - In practice, generating the values can be so resource consuming that caching them is a requirement.- The cache must be thread safe.
My first try was to use a
new HashMap[X,Y] with SynchronizedMap[X,Y]
and then call "getOrElseUpdate". However, this approach has several problems.
Problem 1: Every access that causes a value to be generated will block all other threads trying to access values.Problem 2: The thread that generates a value is on the map's monitor. This can cause deadlocks when at least two maps of this kind are involved, and if those can be inter-dependent.
Currently, I am just about to write my own map trait for these kinds of use cases. I just want to ask you if you have a better idea. Is there any type which I can use that already does that? - I know, there are probably many many Java class libraries out there that provide that kind of functionality. But I don't really want to include a huge library with god-knows-how-many-classes just for such a relatively simple use cases. I'd rather write that trait by myself. But hey, maybe I missed a type in the basic Scala library that I could use, or someone has made such a type that I could easily integrate in my project without having to adapt a huge library.
Just to give you a brief idea of what I am going to do: I will create a trait that is similar to SynchronizedMap. However, it does not simply synchronize every single access to the map. Those accesses that might block a thread (because they trigger a value generating process) will behave like this:
- Get a generator thread for the requested key, or create one if it does not exist.- If the generator thread already existed, join with it, waiting for it to finish, and after it finished return the value that it generated. - If the generator thread did not yet exist, start it, wait for it to finish, and return the value that it generated.
It is very important that the last two steps (waiting for the generator) do not keep the monitor (i.e., synchronize on) the map. When done this way, thread deadlocks cannot happen.
I've written similar classes in Java already, but I'd like to do it again in Scala. Unless anyone can point me to a good already existing implementation, that is.
Thanks! Madoc.

Mon, 2010-03-08, 17:37

#2

Dave Griffith

Joined: 2009-01-14,

Re: Best choice for asynchronous lazy map / cache in Scala?

val map = new HashMap[X,Future[Y]] with SynchronizedMap[X,Future[Y]]

map.getOrElseUpdate(/* code to build an appropriate future for your
calculation, probably by sending to an actor with the !! send */)}.get()

--Dave Griffith

Madoc wrote:
>
> Hi,
>
> I have a typical use case:
> - An object offers map-like access to other objects which I will call
> values, in a key/value manner.
> - Those values are static and could in theory be generated on the fly each
> time they are requested.
> - In practice, generating the values can be so resource consuming that
> caching them is a requirement.
> - The cache must be thread safe.
>
> My first try was to use a
>
> new HashMap[X,Y] with SynchronizedMap[X,Y]
>
> and then call "getOrElseUpdate". However, this approach has several
> problems.
>
> Problem 1: Every access that causes a value to be generated will block all
> other threads trying to access values.
> Problem 2: The thread that generates a value is on the map's monitor. This
> can cause deadlocks when at least two maps of this kind are involved, and
> if
> those can be inter-dependent.
>
> Currently, I am just about to write my own map trait for these kinds of
> use
> cases. I just want to ask you if you have a better idea. Is there any type
> which I can use that already does that? - I know, there are probably many
> many Java class libraries out there that provide that kind of
> functionality.
> But I don't really want to include a huge library with
> god-knows-how-many-classes just for such a relatively simple use cases.
> I'd
> rather write that trait by myself. But hey, maybe I missed a type in the
> basic Scala library that I could use, or someone has made such a type that
> I
> could easily integrate in my project without having to adapt a huge
> library.
>
> Just to give you a brief idea of what I am going to do: I will create a
> trait that is similar to SynchronizedMap. However, it does not simply
> synchronize every single access to the map. Those accesses that might
> block
> a thread (because they trigger a value generating process) will behave
> like
> this:
>
> - Get a generator thread for the requested key, or create one if it does
> not
> exist.
> - If the generator thread already existed, join with it, waiting for it to
> finish, and after it finished return the value that it generated.
> - If the generator thread did not yet exist, start it, wait for it to
> finish, and return the value that it generated.
>
> It is very important that the last two steps (waiting for the generator)
> do
> not keep the monitor (i.e., synchronize on) the map. When done this way,
> thread deadlocks cannot happen.
>
> I've written similar classes in Java already, but I'd like to do it again
> in
> Scala. Unless anyone can point me to a good already existing
> implementation,
> that is.
>
> Thanks!
> Madoc.
>
>

Mon, 2010-03-08, 18:27

#3

Matthias

Joined: 2010-03-08,

Re: Best choice for asynchronous lazy map / cache in Scala?

Wow, great idea! Now I have to really good approaches to work with. Thanks Dave!

Am 08.03.2010 um 17:29 schrieb Dave Griffith:

>
>
> val map = new HashMap[X,Future[Y]] with SynchronizedMap[X,Future[Y]]
>
> map.getOrElseUpdate(/* code to build an appropriate future for your
> calculation, probably by sending to an actor with the !! send */)}.get()
>
> --Dave Griffith

Mon, 2010-03-08, 18:57

#4

Dave Griffith

Joined: 2009-01-14,

Re: Best choice for asynchronous lazy map / cache in Scala?

No problem. Now I just have to try to remember it myself the next time I
need that pattern.

--Dave Griffith

Mon, 2010-03-08, 19:27

#5

ounos

Joined: 2008-12-29,

Re: Best choice for asynchronous lazy map / cache in Scala?

With this approach you still have the "all accesses to the map obstruct any other" bit though. That's why j.u.c.ConcurrentHashMap and friends was built.
Dimitris

2010/3/8 Dave Griffith <dave [dot] l [dot] griffith [at] gmail [dot] com>

No problem. Now I just have to try to remember it myself the next time I
need that pattern.

--Dave Griffith

--
View this message in context: http://old.nabble.com/-scala--Best-choice-for-asynchronous-lazy-map---cache-in-Scala--tp27823361p27824976.html
Sent from the Scala mailing list archive at Nabble.com.

Mon, 2010-03-08, 21:27

#6

Matthias

Joined: 2010-03-08,

Re: Best choice for asynchronous lazy map / cache in Scala?

I don't think so, because I imagine the sequence of action for one thread to be like this:
1. Obtain a monitor on the map.2. Get the Future instance, creating it if it does not exist. (Which is a trivial operation that does not risk any deadlocks.)3. Release the monitor on the map.4. Get the object from the Future instance, which may cause the object to be generated if it was not generated before.
The point is: Step 4 happens after the release of the monitor, therefore it cannot interfere with any other threads that want to get the map's monitor. While step 4 is being executed for key A, no matter how long it takes, other threads are not obstructed to enter steps 1-4 for any other key B.
However, at the moment I find MapMaker/ConcurrentHashMap more handy. I have to introduce less changes in my code to use it. But I view both approaches as equivalent; it depends on personal liking and the concrete use case which to choose.
I am thankful that the both of you presented me both ideas - one is a recommendation for a very good API that is definitely worth a look and that I will use, and the other makes elegant use of the built-in library, and had that eye opening "It's so clear and simple, why didn't I come up with that?" effect on me.
Madoc.
Am 08.03.2010 um 19:24 schrieb Dimitris Andreou:

With this approach you still have the "all accesses to the map obstruct any other" bit though. That's why j.u.c.ConcurrentHashMap and friends was built.
Dimitris

2010/3/8 Dave Griffith <dave [dot] l [dot] griffith [at] gmail [dot] com>

No problem. Now I just have to try to remember it myself the next time I
need that pattern.

--Dave Griffith

--
View this message in context: http://old.nabble.com/-scala--Best-choice-for-asynchronous-lazy-map---cache-in-Scala--tp27823361p27824976.html
Sent from the Scala mailing list archive at Nabble.com.

Mon, 2010-03-08, 21:37

#7

ounos

Joined: 2008-12-29,

Re: Best choice for asynchronous lazy map / cache in Scala?

Put some threads trying simultaneously to execute the steps 1-3, and watch the fireworks. Then be thankful that you were using a ConcurrentHashMap in your real code. :)
Dimitris
2010/3/8 Matthias Deja <madocdoyu [at] gmail [dot] com>

I don't think so, because I imagine the sequence of action for one thread to be like this:
1. Obtain a monitor on the map.2. Get the Future instance, creating it if it does not exist. (Which is a trivial operation that does not risk any deadlocks.) 3. Release the monitor on the map.4. Get the object from the Future instance, which may cause the object to be generated if it was not generated before.
The point is: Step 4 happens after the release of the monitor, therefore it cannot interfere with any other threads that want to get the map's monitor. While step 4 is being executed for key A, no matter how long it takes, other threads are not obstructed to enter steps 1-4 for any other key B.
However, at the moment I find MapMaker/ConcurrentHashMap more handy. I have to introduce less changes in my code to use it. But I view both approaches as equivalent; it depends on personal liking and the concrete use case which to choose.
I am thankful that the both of you presented me both ideas - one is a recommendation for a very good API that is definitely worth a look and that I will use, and the other makes elegant use of the built-in library, and had that eye opening "It's so clear and simple, why didn't I come up with that?" effect on me.
Madoc.
Am 08.03.2010 um 19:24 schrieb Dimitris Andreou:
With this approach you still have the "all accesses to the map obstruct any other" bit though. That's why j.u.c.ConcurrentHashMap and friends was built.
Dimitris

2010/3/8 Dave Griffith <dave [dot] l [dot] griffith [at] gmail [dot] com>

No problem. Now I just have to try to remember it myself the next time I
need that pattern.

--Dave Griffith

--
View this message in context: http://old.nabble.com/-scala--Best-choice-for-asynchronous-lazy-map---cache-in-Scala--tp27823361p27824976.html
Sent from the Scala mailing list archive at Nabble.com.

Mon, 2010-03-08, 21:47

#8

Matthias

Joined: 2010-03-08,

Re: Best choice for asynchronous lazy map / cache in Scala?

It may very well be that I am mistaken, but at the moment I have the impression that steps 1-3 are safe. Only one thread can be inside that block at any time. And the "heavy work" that may risk deadlocks does not happen inside those steps. Actually, I believe this is pretty close to what ConcurrentHashMap does internally. And it's pretty close to the custom made proprietary implementation that I use for several years already in my job.
But there is a risk, namely in step 2: "creating it if it does not exist". When one introduces code here that may block other threads, then the whole thing is useless. Step 2 must by all means only create the Future instance that is able to create the desired value, not the desired value itself. This is one thing that one has to bear in mind when using this technique. Your suggestion, Dimitris, is more safe in that this consideration was already placed in the implementation of ConcurrentHashMap. Therefore, I would regard your idea as less error prone.
Madoc.

2010/3/8 Dimitris Andreou <jim [dot] andreou [at] gmail [dot] com>

Put some threads trying simultaneously to execute the steps 1-3, and watch the fireworks. Then be thankful that you were using a ConcurrentHashMap in your real code. :)
Dimitris
2010/3/8 Matthias Deja <madocdoyu [at] gmail [dot] com>
I don't think so, because I imagine the sequence of action for one thread to be like this:
1. Obtain a monitor on the map.2. Get the Future instance, creating it if it does not exist. (Which is a trivial operation that does not risk any deadlocks.) 3. Release the monitor on the map.4. Get the object from the Future instance, which may cause the object to be generated if it was not generated before.
The point is: Step 4 happens after the release of the monitor, therefore it cannot interfere with any other threads that want to get the map's monitor. While step 4 is being executed for key A, no matter how long it takes, other threads are not obstructed to enter steps 1-4 for any other key B.
However, at the moment I find MapMaker/ConcurrentHashMap more handy. I have to introduce less changes in my code to use it. But I view both approaches as equivalent; it depends on personal liking and the concrete use case which to choose.
I am thankful that the both of you presented me both ideas - one is a recommendation for a very good API that is definitely worth a look and that I will use, and the other makes elegant use of the built-in library, and had that eye opening "It's so clear and simple, why didn't I come up with that?" effect on me.
Madoc.
Am 08.03.2010 um 19:24 schrieb Dimitris Andreou:
With this approach you still have the "all accesses to the map obstruct any other" bit though. That's why j.u.c.ConcurrentHashMap and friends was built.
Dimitris

2010/3/8 Dave Griffith <dave [dot] l [dot] griffith [at] gmail [dot] com>

No problem. Now I just have to try to remember it myself the next time I
need that pattern.

--Dave Griffith

--
View this message in context: http://old.nabble.com/-scala--Best-choice-for-asynchronous-lazy-map---cache-in-Scala--tp27823361p27824976.html
Sent from the Scala mailing list archive at Nabble.com.

Wed, 2010-03-10, 10:17

#9

ijuma

Joined: 2008-08-20,

Re: Best choice for asynchronous lazy map / cache in Scala?

On Mon, Mar 8, 2010 at 8:39 PM, Matthias wrote:
> It may very well be that I am mistaken, but at the moment I have the
> impression that steps 1-3 are safe. Only one thread can be inside that block
> at any time. And the "heavy work" that may risk deadlocks does not happen
> inside those steps. Actually, I believe this is pretty close to what
> ConcurrentHashMap does internally. And it's pretty close to the custom made
> proprietary implementation that I use for several years already in my job.
> But there is a risk, namely in step 2: "creating it if it does not exist".
> When one introduces code here that may block other threads, then the whole
> thing is useless. Step 2 must by all means only create the Future instance
> that is able to create the desired value, not the desired value itself. This
> is one thing that one has to bear in mind when using this technique. Your
> suggestion, Dimitris, is more safe in that this consideration was already
> placed in the implementation of ConcurrentHashMap. Therefore, I would regard
> your idea as less error prone.

ConcurrentHashMap has that name because it works well under
contention. It does not synchronize on reads in the common case and
uses lock-striping to reduce contention where it needs to synchronize.

Best,
Ismael

Copyright © 2012 École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland