This page is no longer maintained — Please continue to the home page at www.scala-lang.org

(Offtopic) FYI: Interesting discussion on JVM performance

14 replies
Ben Hutchison 3
Joined: 2009-11-02,
User offline. Last seen 42 years 45 weeks ago.

GC, allocation, cache effects, immutables, numerics!:
http://groups.google.com/group/jvm-languages/browse_thread/thread/2c10bd...

-Ben

Johannes Rudolph
Joined: 2008-12-17,
User offline. Last seen 29 weeks 19 hours ago.
Re: (Offtopic) FYI: Interesting discussion on JVM performance

On Thu, Nov 19, 2009 at 1:40 AM, Ben Hutchison wrote:
> FYI: Im finding this thread quite interesting and relevant to Scala.
>
> GC, allocation, cache effects, immutables, numerics!:
> http://groups.google.com/group/jvm-languages/browse_thread/thread/2c10bd...

Me, too. IMO it's more relevant to the dynamic languages because they
suffer more from boxing issues than Scala does. Specialization can get
away with much of the boxing. Scala could even emulate value-types to
a certain degree. Still, one big boxing issue is returning more than
one (primitive) values from a method (e.g. returning tuples). For
particular cases when inlining and EA can't optimize away the box
allocation the proposed runtime tagged tuples/pairs could help Scala
as well.

ijuma
Joined: 2008-08-20,
User offline. Last seen 22 weeks 2 days ago.
Re: (Offtopic) FYI: Interesting discussion on JVM performance

Johannes Rudolph googlemail.com> writes:
> Me, too. IMO it's more relevant to the dynamic languages because they
> suffer more from boxing issues than Scala does.

Scala does _a lot_ of boxing currently.

> Specialization can get away with much of the boxing.

This is the hope, but have you checked in practice? In particular, have you
checked collections like maps, sets and lists, are the internal structures
stored differently? If you want to avoid boxing, it's not enough to specialize
the methods, you have to specialize the fields too. I haven't checked the latest
SID or the implementation to see if it actually does that. Does it?

> Still, one big boxing issue is returning more than
> one (primitive) values from a method (e.g. returning tuples). For
> particular cases when inlining and EA can't optimize away the box
> allocation the proposed runtime tagged tuples/pairs could help Scala
> as well.

Scala could also use what John Rose calls tuples in the JVM[1]. That would allow
one to implement more efficient HashMaps for example.

Best,
Ismael

[1] http://blogs.sun.com/jrose/entry/tuples_in_the_vm

Johannes Rudolph
Joined: 2008-12-17,
User offline. Last seen 29 weeks 19 hours ago.
Re: Re: (Offtopic) FYI: Interesting discussion on JVM performa

On Thu, Nov 19, 2009 at 9:35 AM, Ismael Juma wrote:
> Johannes Rudolph googlemail.com> writes:
>> Me, too. IMO it's more relevant to the dynamic languages because they
>> suffer more from boxing issues than Scala does.
>
> Scala does _a lot_ of boxing currently.

Yes. But whereas Scala could optimize way much of them, because it can
know from the static types, dynamic languages can hardly be optimized
without the proposed changes.

>
>> Specialization can get away with much of the boxing.
>
> This is the hope, but have you checked in practice? In particular, have you
> checked collections like maps, sets and lists, are the internal structures
> stored differently? If you want to avoid boxing, it's not enough to specialize
> the methods, you have to specialize the fields too. I haven't checked the latest
> SID or the implementation to see if it actually does that. Does it?

Scala 2.8 r18784 (two months old):

from

class Test[@specialized T](v:T)

[[syntax trees at end of jvm]]// Scala source: test.scala
package {
class Test extends java.lang.Object with ScalaObject {
protected[this] val v: java.lang.Object = _;
def this(v: java.lang.Object): Test = {
Test.this.v = v;
Test.super.this();
()
}
};
class Test$mcI$sp extends Test {
protected[this] val v$mcI$sp: Int = _;
def this(v$mcI$sp: Int): Test$mcI$sp = {
Test$mcI$sp.this.v$mcI$sp = v$mcI$sp;
Test$mcI$sp.super.this(scala.Int.box(v$mcI$sp));
()
}
};
// [ ...]
}

so Scala actually does specialize fields but stores them in a boxed
version as well (probably to comply to the generic interface). I
wonder if this is really needed, if the field is private.

>> Still, one big boxing issue is returning more than
>> one (primitive) values from a method (e.g. returning tuples). For
>> particular cases when inlining and EA can't optimize away the box
>> allocation the proposed runtime tagged tuples/pairs could help Scala
>> as well.
>
> Scala could also use what John Rose calls tuples in the JVM[1]. That would allow
> one to implement more efficient HashMaps for example.

That's what I meant actually. I just wanted to stress, that you could
emulate the 2-tuples in HashMaps by two primitive arrays right now
whereas you cannot efficiently emulate multiple return values (perhaps
TLS? How expensive would that be?).

ijuma
Joined: 2008-08-20,
User offline. Last seen 22 weeks 2 days ago.
Re: (Offtopic) FYI: Interesting discussion on JVM performance

Johannes Rudolph googlemail.com> writes:
> so Scala actually does specialize fields but stores them in a boxed
> version as well (probably to comply to the generic interface). I
> wonder if this is really needed, if the field is private.

Interesting. That implementation doesn't seem like it avoids boxing on updates.

> That's what I meant actually. I just wanted to stress, that you could
> emulate the 2-tuples in HashMaps by two primitive arrays right now

It's not the same. It's what primitive collections like Trove rely on, but you
incur additional cache misses. It's still worth it for primitive maps, but it's
less obvious for non-primitive ones. If you look at the implementation of the
Python dictionary (in C), it uses open addressing but uses a struct for the
entries. That is superior to using two separate arrays.

> whereas you cannot efficiently emulate multiple return values (perhaps
> TLS? How expensive would that be?).

I think multiple return values are the case where there is not much to worry
about[1]. I'd think that scalar replacement would take care of that for the
common case.

Best,
Ismael

[1] I am assuming that Scala will fix the issue where tuples get stored as a
field when you do val (a, b) = computeAAndB in the class body.

Iulian Dragos 2
Joined: 2009-02-10,
User offline. Last seen 42 years 45 weeks ago.
Re: Re: (Offtopic) FYI: Interesting discussion on JVM performa


On Thu, Nov 19, 2009 at 10:26 AM, Ismael Juma <mlists [at] juma [dot] me [dot] uk> wrote:
Johannes Rudolph <johannes.rudolph <at> googlemail.com> writes:
> so Scala actually does specialize fields but stores them in a boxed
> version as well (probably to comply to the generic interface).  I
> wonder if this is really needed, if the field is private.

Interesting. That implementation doesn't seem like it avoids boxing on updates.

It does, or if it doesn't it's a bug. Note that specialization is still work in progress, though very close to completion. The paper cited in the SID does describe how fields are specialized.

What may look strange is that a field has two accessors, one is the generic one (using boxing/unboxing), the other the specialized one. Callers will choose the specific one whenever the static type allows it, avoiding boxing. The generic one is rewritten to go through the specialized one to avoid 'double vision' (so to say). I think this is what made you think that updates don't avoid boxing.

Of course, I'm interested to hear of any behavior that is unexpected/undesirable.

thanks,
iulian
 

> That's what I meant actually. I just wanted to stress, that you could
> emulate the 2-tuples in HashMaps by two primitive arrays right now

It's not the same. It's what primitive collections like Trove rely on, but you
incur additional cache misses. It's still worth it for primitive maps, but it's
less obvious for non-primitive ones. If you look at the implementation of the
Python dictionary (in C), it uses open addressing but uses a struct for the
entries. That is superior to using two separate arrays.

> whereas you cannot efficiently emulate multiple return values (perhaps
> TLS? How expensive would that be?).

I think multiple return values are the case where there is not much to worry
about[1]. I'd think that scalar replacement would take care of that for the
common case.

Best,
Ismael

[1] I am assuming that Scala will fix the issue where tuples get stored as a
field when you do val (a, b) = computeAAndB in the class body.




--
« Je déteste la montagne, ça cache le paysage »
Alphonse Allais
ijuma
Joined: 2008-08-20,
User offline. Last seen 22 weeks 2 days ago.
Re: (Offtopic) FYI: Interesting discussion on JVM performance

Hi Iulian,

Iulian Dragos epfl.ch> writes:
> It does, or if it doesn't it's a bug.

Great.

> Note that specialization is still work in progress,

Indeed, and the bit that Johannes posted is from two months ago.

> though very close to completion.

Good to hear that.

> The paper cited in the SID does describe how fields are specialized.

Cool, I'll check it out. I read a paper that was released a while ago, but I
haven't read the SID yet.

By the way, does the SID talk about specialization of collections (whether
HashMap and the like will be specialized, I expect Arrays and Function objects
with 1 and 2 arities to be specialized)?

> What may look strange is that a field has two accessors, one is the
> generic one (using boxing/unboxing), the other the specialized one.
> Callers will choose the specific one whenever the static type allows it,
> avoiding boxing. The generic one is rewritten to go through the specialized
> one to avoid 'double vision' (so to say).

Thanks for the explanation.

> I think this is what made you think that updates don't avoid boxing.

Looking at the code again, it seems like what confused me was the constructor:

class Test$mcI$sp extends Test {
protected[this] val v$mcI$sp: Int = _;
def this(v$mcI$sp: Int): Test$mcI$sp = {
Test$mcI$sp.this.v$mcI$sp = v$mcI$sp;
Test$mcI$sp.super.this(scala.Int.box(v$mcI$sp));
()
}
};

I am more used to looking at bytecode than Scala syntax trees, so I'll check
that (and with a more recent build) to see what is actually happening. :)

Best,
Ismael

Johannes Rudolph
Joined: 2008-12-17,
User offline. Last seen 29 weeks 19 hours ago.
Re: Re: (Offtopic) FYI: Interesting discussion on JVM performa

On Thu, Nov 19, 2009 at 1:17 PM, Ismael Juma wrote:
> I am more used to looking at bytecode than Scala syntax trees, so I'll check
> that (and with a more recent build) to see what is actually happening. :)

I'm more used to bytecodes as well, the good thing with the syntax
trees is that you've got all the stuff together and don't have to look
into all of the classes scalac generates.

And FYI: I tried a newer build r19722 from today, but it failed to
compile my example and threw an assertion.

Johannes Rudolph
Joined: 2008-12-17,
User offline. Last seen 29 weeks 19 hours ago.
Re: Re: (Offtopic) FYI: Interesting discussion on JVM performa

On Thu, Nov 19, 2009 at 2:12 PM, Johannes Rudolph
wrote:
> And FYI: I tried a newer build r19722 from today, but it failed to
> compile my example and threw an assertion.
I tried again and now the compilation works fine... Perhaps I had some
class-files lying around which the compiler picked up.

Johannes Rudolph
Joined: 2008-12-17,
User offline. Last seen 29 weeks 19 hours ago.
Re: Re: (Offtopic) FYI: Interesting discussion on JVM performa

On Thu, Nov 19, 2009 at 11:57 AM, Iulian Dragos wrote:
> Of course, I'm interested to hear of any behavior that is
> unexpected/undesirable.

I see two issues with the initialization of class variables.

1. Generic variable initializations are called twice. The first time
in the specialized constructor and then in the generic constructor
again. See this example:

class Test[@specialized T](v:T) {
val elems:T = {
println(v)
v
}
}
object TestUser{
def main(args:Array[String]){
new Test(5)
}
}

which when compiled (now r19722) and run prints 5 twice.

2. And regarding the same point: The object might need twice as much
space as before to accomodate both the generic and the specialized
data. For example:

class Test[@specialized T](v:T)(implicit mf:Manifest[T]) {
val elems:Array[T] = new Array[T](50)
}

creates the array two times.

Though, AFAICS the one in the generic version is completely hidden and
can't be accessed at all. Why is it, that the generic class is handled
specially as the supertype of all the generic classes? Wouldn't it be
more consequent to generate a specialized version for the generic case
and make the accessors in the base class abstract?

Johannes Rudolph
Joined: 2008-12-17,
User offline. Last seen 29 weeks 19 hours ago.
Re: Re: (Offtopic) FYI: Interesting discussion on JVM performa

A funny little detail is the specialization on the BoxedUnit
non-primitive type. That seems at least a little bit odd (not to say
.. err.. totally void). Is there a reason for this behaviour? In the
light of my last email this is odd even twice: The specialized version
needs twice the space than the generic one to store boxes containing
plainly nothing. :)

On Thu, Nov 19, 2009 at 11:57 AM, Iulian Dragos wrote:
>
>
> On Thu, Nov 19, 2009 at 10:26 AM, Ismael Juma wrote:
>>
>> Johannes Rudolph googlemail.com> writes:
>> > so Scala actually does specialize fields but stores them in a boxed
>> > version as well (probably to comply to the generic interface).  I
>> > wonder if this is really needed, if the field is private.
>>
>> Interesting. That implementation doesn't seem like it avoids boxing on
>> updates.
>
> It does, or if it doesn't it's a bug. Note that specialization is still work
> in progress, though very close to completion. The paper cited in the SID
> does describe how fields are specialized.
>
> What may look strange is that a field has two accessors, one is the generic
> one (using boxing/unboxing), the other the specialized one. Callers will
> choose the specific one whenever the static type allows it, avoiding boxing.
> The generic one is rewritten to go through the specialized one to avoid
> 'double vision' (so to say). I think this is what made you think that
> updates don't avoid boxing.
>
> Of course, I'm interested to hear of any behavior that is
> unexpected/undesirable.
>
> thanks,
> iulian
>
>>
>> > That's what I meant actually. I just wanted to stress, that you could
>> > emulate the 2-tuples in HashMaps by two primitive arrays right now
>>
>> It's not the same. It's what primitive collections like Trove rely on, but
>> you
>> incur additional cache misses. It's still worth it for primitive maps, but
>> it's
>> less obvious for non-primitive ones. If you look at the implementation of
>> the
>> Python dictionary (in C), it uses open addressing but uses a struct for
>> the
>> entries. That is superior to using two separate arrays.
>>
>> > whereas you cannot efficiently emulate multiple return values (perhaps
>> > TLS? How expensive would that be?).
>>
>> I think multiple return values are the case where there is not much to
>> worry
>> about[1]. I'd think that scalar replacement would take care of that for
>> the
>> common case.
>>
>> Best,
>> Ismael
>>
>> [1] I am assuming that Scala will fix the issue where tuples get stored as
>> a
>> field when you do val (a, b) = computeAAndB in the class body.
>>
>
>
>
> --
> « Je déteste la montagne, ça cache le paysage »
> Alphonse Allais
>

Iulian Dragos 2
Joined: 2009-02-10,
User offline. Last seen 42 years 45 weeks ago.
Re: Re: (Offtopic) FYI: Interesting discussion on JVM performa


On Thu, Nov 19, 2009 at 3:58 PM, Johannes Rudolph <johannes [dot] rudolph [at] googlemail [dot] com> wrote:
A funny little detail is the specialization on the BoxedUnit
non-primitive type. That seems at least a little bit odd (not to say
.. err.. totally void). Is there a reason for this behaviour? In the
light of my last email this is odd even twice: The specialized version
needs twice the space than the generic one to store boxes containing
plainly nothing. :)

I'm not sure I understand your question. Are you asking why Unit is considered a primitive type for specialization? It is precisely for avoiding uses of BoxedUnit. Let's look at how Function1 is (roughly) defined:

trait Function1[+R, -T1] {
  def apply(x: T1): R
}

Suppose now we have a specialized collection that defines foreach:

def foreach(f: A => Unit) = ...

and we use it at a primitive type, say Int: Coll[Int]. If Unit was not specialized, foreach would need to go through the generic apply, and box both the argument and the return value.

Note that specializing on Unit does not make matters worse. In the generic version of Function1 Units are anyway boxed and returned by 'apply'.

I agree with you it makes no sense to specialize for Unit if the original type parameter is not a return type. Currently Unit is not treated specially, but i guess it makes sense to skip those combinations.

cheers,
iulian

 

On Thu, Nov 19, 2009 at 11:57 AM, Iulian Dragos <iulian [dot] dragos [at] epfl [dot] ch> wrote:
>
>
> On Thu, Nov 19, 2009 at 10:26 AM, Ismael Juma <mlists [at] juma [dot] me [dot] uk> wrote:
>>
>> Johannes Rudolph <johannes.rudolph <at> googlemail.com> writes:
>> > so Scala actually does specialize fields but stores them in a boxed
>> > version as well (probably to comply to the generic interface).  I
>> > wonder if this is really needed, if the field is private.
>>
>> Interesting. That implementation doesn't seem like it avoids boxing on
>> updates.
>
> It does, or if it doesn't it's a bug. Note that specialization is still work
> in progress, though very close to completion. The paper cited in the SID
> does describe how fields are specialized.
>
> What may look strange is that a field has two accessors, one is the generic
> one (using boxing/unboxing), the other the specialized one. Callers will
> choose the specific one whenever the static type allows it, avoiding boxing.
> The generic one is rewritten to go through the specialized one to avoid
> 'double vision' (so to say). I think this is what made you think that
> updates don't avoid boxing.
>
> Of course, I'm interested to hear of any behavior that is
> unexpected/undesirable.
>
> thanks,
> iulian
>
>>
>> > That's what I meant actually. I just wanted to stress, that you could
>> > emulate the 2-tuples in HashMaps by two primitive arrays right now
>>
>> It's not the same. It's what primitive collections like Trove rely on, but
>> you
>> incur additional cache misses. It's still worth it for primitive maps, but
>> it's
>> less obvious for non-primitive ones. If you look at the implementation of
>> the
>> Python dictionary (in C), it uses open addressing but uses a struct for
>> the
>> entries. That is superior to using two separate arrays.
>>
>> > whereas you cannot efficiently emulate multiple return values (perhaps
>> > TLS? How expensive would that be?).
>>
>> I think multiple return values are the case where there is not much to
>> worry
>> about[1]. I'd think that scalar replacement would take care of that for
>> the
>> common case.
>>
>> Best,
>> Ismael
>>
>> [1] I am assuming that Scala will fix the issue where tuples get stored as
>> a
>> field when you do val (a, b) = computeAAndB in the class body.
>>
>
>
>
> --
> « Je déteste la montagne, ça cache le paysage »
> Alphonse Allais
>



--
Johannes

-----------------------------------------------
Johannes Rudolph
http://virtual-void.net



--
« Je déteste la montagne, ça cache le paysage »
Alphonse Allais
Jon Harrop
Joined: 2009-04-28,
User offline. Last seen 42 years 45 weeks ago.
Re: Re: (Offtopic) FYI: Interesting discussion on JVM performa

On Thursday 19 November 2009 09:00:52 Johannes Rudolph wrote:
> On Thu, Nov 19, 2009 at 9:35 AM, Ismael Juma wrote:
> > Johannes Rudolph googlemail.com> writes:
> >> Me, too. IMO it's more relevant to the dynamic languages because they
> >> suffer more from boxing issues than Scala does.
> >
> > Scala does _a lot_ of boxing currently.
>
> Yes. But whereas Scala could optimize way much of them...

No, Scala cannot optimize away unnecessary boxing because the JVM itself is
incapable of doing do: it doesn't have value types. That's why the CLR is 32x
faster at filling a float->float hash table than any JVM-based language, for
example:

http://groups.google.com/group/comp.lang.java.programmer/msg/1fb78952197...

odersky
Joined: 2008-07-29,
User offline. Last seen 45 weeks 6 days ago.
Re: Re: (Offtopic) FYI: Interesting discussion on JVM performa

On Wed, Dec 23, 2009 at 11:24 PM, Jon Harrop wrote:
> On Thursday 19 November 2009 09:00:52 Johannes Rudolph wrote:
>> On Thu, Nov 19, 2009 at 9:35 AM, Ismael Juma wrote:
>> > Johannes Rudolph googlemail.com> writes:
>> >> Me, too. IMO it's more relevant to the dynamic languages because they
>> >> suffer more from boxing issues than Scala does.
>> >
>> > Scala does _a lot_ of boxing currently.
>>
>> Yes. But whereas Scala could optimize way much of them...
>
> No, Scala cannot optimize away unnecessary boxing because the JVM itself is
> incapable of doing do: it doesn't have value types. That's why the CLR is 32x
> faster at filling a float->float hash table than any JVM-based language, for
> example:
>
Of course it can. have a look at specialized. And yes, we are getting the same
order of magnitude improvements from applying it.

Cheers

Jon Harrop
Joined: 2009-04-28,
User offline. Last seen 42 years 45 weeks ago.
Re: Re: (Offtopic) FYI: Interesting discussion on JVM performa

On Wednesday 23 December 2009 21:22:57 martin odersky wrote:
> On Wed, Dec 23, 2009 at 11:24 PM, Jon Harrop wrote:
> > No, Scala cannot optimize away unnecessary boxing because the JVM itself
> > is incapable of doing do: it doesn't have value types. That's why the CLR
> > is 32x faster at filling a float->float hash table than any JVM-based
> > language, for example:
>
> Of course it can. have a look at specialized. And yes, we are getting the
> same order of magnitude improvements from applying it.

So Scala's hash table is 32x faster than Java's?

Copyright © 2012 École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland