Size of standard library and possible tricks to reduce its size
So, the library jar is getting kinda bulky, especially after the addition of parallel collections. I've been noticing that there's a whole lot of repetitive trait-stub instantiations -- scala/collection/Iterator$$anon$* are great examples of these (they're all different subtypes of Iterator itself) -- and I wonder if anyone has thought to create some library-private abstract classes to flatten some of the bytecode bloat.
For instance, my personal library has the following tree of private-use types tucked away in a package object. You'll note that they are basically a parallel to the corresponding traits, inheriting from the largest abstract superclass available "with" the more specific trait.
Effectively, these let me create multiple custom subtypes of the corresponding traits, while the swath of trait stubs is compiled to bytecode exactly once (at the abstract class level). So instead of consuming 21k of bytecode for every subtype of Iterator that I need, each subtype takes only about 1k. I get similar savings from the other classes in the tree.
These abstract classes for my use take up ~240k of classfile space with 2.9.1, but that hit is only taken once, not upon creation of every subtype. This saves a pretty significant chunk of bytecode, and the same trick would likely work well in the standard library. Thoughts?
package object collection {
  import scala.collection._
  private[collection] abstract class AbstractTraversableOnce[T] extends TraversableOnce[T]
  private[collection] abstract class AbstractTraversable[T] extends AbstractTraversableOnce[T] with Traversable[T]
  private[collection] abstract class AbstractTraversableProxy[T] extends AbstractTraversable[T] with TraversableProxy[T]
  private[collection] abstract class AbstractIterable[T] extends AbstractTraversable[T] with Iterable[T]
  private[collection] abstract class AbstractIterableProxy[T] extends AbstractIterable[T] with IterableProxy[T]
  private[collection] abstract class AbstractIterator[T] extends AbstractTraversableOnce[T] with Iterator[T]
  private[collection] abstract class AbstractMap[A, B] extends AbstractIterable[(A, B)] with Map[A, B]
  private[collection] abstract class AbstractMapProxy[A, B] extends AbstractMap[A, B] with MapProxy[A, B]
  private[collection] abstract class AbstractSeq[T] extends AbstractIterable[T] with Seq[T]
  private[collection] abstract class AbstractSeqProxy[T] extends AbstractSeq[T] with SeqProxy[T]
  private[collection] abstract class AbstractSet[T] extends AbstractIterable[T] with Set[T]
  private[collection] abstract class AbstractSetProxy[T] extends AbstractSet[T] with SetProxy[T]
}
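The idea can be sketched with ordinary Scala classes. In this sketch, `AbstractIter`, `Countdown`, and `Repeat` are hypothetical names, not library types; `AbstractIter` plays the role of the proposed `AbstractIterator`. It mixes in `Iterator` once, so under the 2.9-era trait encoding the forwarders for `Iterator`'s concrete methods would be compiled into that one class, and each concrete iterator would carry only its own `hasNext` and `next()`.

```scala
// Hypothetical stand-in for the proposed AbstractIterator: the Iterator
// trait is mixed in exactly once, here.
abstract class AbstractIter[A] extends Iterator[A]

// Concrete iterators define only hasNext and next(); map, take, toList, etc.
// are inherited through AbstractIter rather than mixed in again.
final class Countdown(from: Int) extends AbstractIter[Int] {
  private var current = from
  def hasNext: Boolean = current > 0
  def next(): Int = {
    if (!hasNext) throw new NoSuchElementException("countdown exhausted")
    val result = current
    current -= 1
    result
  }
}

final class Repeat[A](elem: A, times: Int) extends AbstractIter[A] {
  private var remaining = times
  def hasNext: Boolean = remaining > 0
  def next(): A = {
    if (!hasNext) throw new NoSuchElementException("repeat exhausted")
    remaining -= 1
    elem
  }
}
```

Callers still see plain iterators, e.g. `new Countdown(5).map(_ * 2).take(2).toList`.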
Re: Re: Size of standard library and possible tricks to reduce
On Thu, Dec 1, 2011 at 8:43 AM, Pavel Pavlov wrote:
> Note that all redirection stubs will disappear from all classes (and traits)
> which implement trait Foo as first supertrait (super-supertrait etc.), only
> one copy of these stubs will remain - in the class Foo$AC.
> So I doubt the library size has any chance to be increased.
I agree, it was this aspect which was so immediately appealing. All
those forwarders add a ton of weight. Also, my (unsubstantiated)
guess is HotSpot will do much better at optimizing an instance
method calling a static method in the same class vs. a forwarder passing 'this'
to a separate class. At least, I seriously doubt it'll do worse.
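To make the proposal quoted above concrete, here is a hand-written Scala model of the JVM-level encoding; it is a sketch, not compiler output, and all names are hypothetical. `FooImpl` stands in for the generated `Foo$class` (method bodies as static-style methods receiving the instance as `self`), and `FooAC` stands in for the proposed `Foo$AC`. The `Direct*` classes each carry their own copy of the forwarders, as classes mixing in a trait do under the 2.9 encoding; the `Shared*` classes inherit a single copy from `FooAC`.

```scala
// Models the JVM interface emitted for `trait Foo` (abstract methods only).
trait Foo {
  def name: String
  def greet: String
  def shout: String
}

// Models Foo$class: method bodies as static-style methods taking `self`.
object FooImpl {
  def greet(self: Foo): String = "hello, " + self.name
  // Calls back through the interface so an override of greet is respected.
  def shout(self: Foo): String = self.greet.toUpperCase
}

// Current scheme: every class mixing in Foo gets its own forwarders.
final class DirectA(val name: String) extends Foo {
  def greet: String = FooImpl.greet(this)
  def shout: String = FooImpl.shout(this)
}
final class DirectB(val name: String) extends Foo {
  def greet: String = FooImpl.greet(this)
  def shout: String = FooImpl.shout(this)
}

// Proposed scheme: the forwarders exist once, in Foo$AC ...
abstract class FooAC extends Foo {
  def greet: String = FooImpl.greet(this)
  def shout: String = FooImpl.shout(this)
}
// ... and classes whose first supertrait is Foo simply inherit them.
final class SharedA(val name: String) extends FooAC
final class SharedB(val name: String) extends FooAC
```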
Re: Re: Size of standard library and possible tricks to reduce
As regards JIT/HotSpot optimizations and overall performance:
1) Having one forwarder method instead of many leads to faster warm-up of the forwarder(s), and thus to earlier compilation and optimization of these forwarders by the JIT.
2) More of the polymorphic/megamorphic call sites (virtual/interface calls to forwarders) become monomorphic at run time, which directly improves the aggressiveness of inlining and other optimizations.
3) Because of (2) and the well-known optimistic optimization strategy of HotSpot's JIT, the number of deoptimization/repeated recompilation cases may decrease somewhat.
4) At the hardware level, pollution of the code cache and the branch prediction cache should decrease somewhat.
5) Small static methods (getters/setters) will be inlined by HotSpot into the forwarders early, at the forwarder's first compilation, if they reside in the same class.
Re: Re: Size of standard library and possible tricks to reduce
On Thu, Dec 1, 2011 at 5:25 PM, Todd Vierling <tv [at] duh [dot] org> wrote:
The point is the other way around, avoiding splicing in implementations in all subtypes.
It wouldn't need to emit the Foo$AC if it's a purely virtual trait.
Re: Re: Size of standard library and possible tricks to reduce
On Thu, Dec 1, 2011 at 11:29 AM, √iktor Ҡlang wrote:
>> I'm wary of this increasing size in the standard library, though it
>> would indeed help the case of user code.
>>
>> It makes me wonder if perhaps an annotation to enable this behavior
>> per-trait might be a better choice.
>
> The point is the other way around, avoiding splicing in implementations in
> all subtypes.
Yes, I know. I like the idea, but I wonder if it will cause more
strangeness than it solves. Proof of concept would probably be the
only way to find out.
> It wouldn't need to emit the Foo$AC if it's a purely virtual trait.
True, but nearly all traits in the standard library have at least one
implemented method. I would definitely prefer overloading Foo$class
instead, though, since it will already be in use (for static methods)
and would mean one less class file added to the mix.
Re: Re: Size of standard library and possible tricks to reduce
To keep the inheritance hierarchy consistent, it can be done in a slightly different manner:
1) The compiler generates "abstract class Foo$class extends Foo" with the instance stubs and the static implementation methods, as proposed above.
2) Scala's "class C extends Foo with Bar { ... }" is translated to Java's "class C extends Foo$class implements Foo, Bar { ... }"
instead of "class C extends Object implements Foo, Bar { ... }" as is done now.
This way we get symmetric interface inheritance at the JVM level, "C <: Foo, C <: Bar", instead of "C <: Foo$class, Foo$class <: Foo, C <: Bar" as I first proposed.
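The revised translation can also be modeled by hand. This sketch uses hypothetical names and writes out in Scala source roughly what the compiler would emit: `FooClass` plays the role of the generated `abstract class Foo$class extends Foo`, holding the instance stubs, with the static implementation bodies in its companion. `C` has the JVM-level shape `class C extends Foo$class implements Foo, Bar`, so `FooClass` is its superclass while `C` is still a subtype of both `Foo` and `Bar`.

```scala
// Models the JVM interfaces for traits Foo and Bar.
trait Foo {
  def name: String
  def greet: String
}
trait Bar {
  def id: Int
}

// Models the generated `abstract class Foo$class extends Foo`:
// instance stubs in the class ...
abstract class FooClass extends Foo {
  def greet: String = FooClass.greet(this)
}
// ... and static implementation bodies in the companion.
object FooClass {
  def greet(self: Foo): String = "hello, " + self.name
}

// Models the translation of `class C extends Foo with Bar`:
// JVM `class C extends Foo$class implements Foo, Bar`.
final class C(val name: String, val id: Int) extends FooClass with Foo with Bar
```

At the Scala level the user would still write `class C extends Foo with Bar`; only the generated superclass changes.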
Re: Re: Size of standard library and possible tricks to reduce
Such a scheme also solves the previously discussed problem with scaladoc and private classes in the collections hierarchy:
Foo$class is a superclass of C only at the JVM level. At the Scala level it is completely invisible, just as the interface ScalaObject is invisible now.
Re: Re: Size of standard library and possible tricks to reduce
On Thu, Dec 1, 2011 at 9:43 AM, Pavel Pavlov wrote:
> Foo$class is superclass of C only at JVM level. At Scala level, it is
> invisible at all, just like interface ScalaObject is invisible now.
Oh, and that reminds me, it might give us a shot at a subset of these:
https://issues.scala-lang.org/browse/SI-2296
Basically, we get burned on accessing protected members in Java
classes because the JVM requires that you be in a subclass of the actual
class, whereas in Scala the code performing the access, if defined in a
trait, will not be in a subclass but called through a forwarder.
Re: Re: Size of standard library and possible tricks to reduce
On Thu, Dec 1, 2011 at 2:48 AM, Pavel Pavlov wrote:
> What do you think of this?
Offhand, I think that's a freaking great idea.
Re: Size of standard library and possible tricks to reduce its s
On 9 Nov, 16:58, Todd Vierling wrote:
> The compiler may not be able to figure out, efficiently,
> where the "best" place to insert the abstract class layer is.
I'm wondering if the compiler can do such a thing at all. I haven't
read through the whole Scala spec, but in the case of Java some
standards clearly specify how certain language constructs map
to the classes generated by the compiler (how to name anonymous classes,
for example). If that is also the case for Scala, I don't think we can
freely generate additional classes just because it might be more
efficient in terms of memory and disk usage.
Moreover, such automatically generated classes might need to be
hidden somehow. When you look at the inheritance tree, you don't want to see
classes you don't inherit from or ones that never appeared in any code.
Roman
Re: Size of standard library and possible tricks to reduce its
On Tue, Oct 18, 2011 at 2:19 PM, Todd Vierling wrote:
> So, the library jar is getting kinda bulky, especially after the addition of
> parallel collections. I've been noticing that there's a whole lot of
> repetitive trait-stub instantiations -- scala/collection/Iterator$$anon$*
> are great examples of these (they're all different subtypes of Iterator
> itself) -- and I wonder if anyone has thought to create some library-private
> abstract classes to flatten some of the bytecode bloat.
https://lampsvn.epfl.ch/trac/scala/changeset/20311
https://issues.scala-lang.org/browse/SI-2876
https://lampsvn.epfl.ch/trac/scala/changeset/20490
That was for views. The issue in 2876 doesn't look so bad from this
vantage, but as I recall there was a more serious compiler bug which
was never resolved which made the whole process too tedious.
It's probably doable, but in my experience one tends to discover new
or (worse) old and unfixed compiler bugs when attempting these things,
so you need to be prepared for battle.
Re: Size of standard library and possible tricks to reduce its
On Tue, Oct 18, 2011 at 11:31 PM, Paul Phillips <paulp [at] improving [dot] org> wrote:
I believe it's most likely a linearization problem. That is, I believe the change in 20311 affected linearization so that the wrong method (the one raising the UnsupportedOperationException) was called. Linearization can be tricky in a library that uses traits in intricate ways, and the Scala collections, views in particular, are an example of that. Calling it a compiler bug is a bit premature without further evidence.
-- Martin
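A small, hypothetical illustration of the kind of linearization change Martin describes: the trait that comes first in a class's linearization supplies the winning override, so introducing a shared abstract class that mixes traits in a different order can silently change which implementation runs. `Base`, `Strict`, `Lazy`, and the classes below are made up for the example; they are not the view classes touched by changeset 20311.

```scala
trait Base {
  def force: String = "base"
}
// Plays the role of a member that raises UnsupportedOperationException.
trait Strict extends Base {
  override def force: String = throw new UnsupportedOperationException("strict")
}
trait Lazy extends Base {
  override def force: String = "lazy"
}

// Before: linearization is Before, Lazy, Strict, Base, ...
// so Lazy's override wins.
class Before extends Strict with Lazy

// After moving Lazy into a shared abstract class used as the superclass:
// linearization is After, Strict, AbstractLazy, Lazy, Base, ...
// so Strict's override now wins and force throws.
abstract class AbstractLazy extends Lazy
class After extends AbstractLazy with Strict
```

Replacing only the first parent (`extends T1 with T2` becoming `extends AbstractT1 with T2`, where `AbstractT1 extends T1` adds no members) keeps the relative order of the traits; the problem above comes from moving `Lazy` from last position to first.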
Re: Size of standard library and possible tricks to reduce its
On Tue, Oct 18, 2011 at 2:50 PM, martin odersky wrote:
> I believe it's most likely a linearization problem. That is, I believe the
> change in 20311 affected linearization so that the wrong method (the one
> raising the UnsupportedOperationException) was called. Linearization can be
> tricky in a library that uses traits in intricate ways and Scala collections
> and in particular views are an example of that. Calling it a compiler bug is
> a bit premature without further evidence.
The bug I was talking about is this.
https://issues.scala-lang.org/browse/SI-2897
I trust I'm allowed to call that a bug.
Re: Size of standard library and possible tricks to reduce its
On Wed, Oct 19, 2011 at 12:22 AM, Paul Phillips <paulp [at] improving [dot] org> wrote:
You certainly are. But I do not see yet what it has to do with the problem that was reported earlier. That problem manifested itself with an UnsupportedOperationException at runtime whereas 2897 is a crash at compile time. Also, I do not see how locally defined traits as shown in 2897 enter the picture. There's no use I can see of them in the collection library. But given enough time, I am sure we'll figure it out.
-- Martin
Re: Size of standard library and possible tricks to reduce its
"This week, on Behind the Bytecode... remember 200K meant something?"
https://lampsvn.epfl.ch/trac/scala/changeset/20311
Date: Wed Dec 23 17:57:36 2009 +0000
Created team of private[collection] abstract classes
and traits in scala.collection.views. Factored boilerplate
and base Transformed traits out of *ViewLike classes.
Executive summary and motivation:
4812029 Dec 23 09:47 scala-library.jar // before
4604150 Dec 23 09:24 scala-library.jar // after
Direct size savings of 4.5%. Review by odersky.
% date
Tue Oct 18 14:39:09 PDT 2011
% ls -l lib/scala-library.jar
-rw-r--r-- 1 paulp admin 9824041 Oct 16 09:02 lib/scala-library.jar
Re: Size of standard library and possible tricks to reduce its
Should I take your responses to mean that it would be worth pulling HEAD and making a candidate diff, if it trimmed things notably? My aim would only be to provide library-private abstract classes -- no new traits, as was done in the changesets you linked -- which should, in theory, not further confuse the compiler. (I hope. At least in my private library, it's working fine... fingers crossed.)
Re: Size of standard library and possible tricks to reduce its
On Wed, Oct 19, 2011 at 12:26 AM, Todd Vierling <tv [at] duh [dot] org> wrote:
Yes, I think that would be very worthwhile doing.
Thanks!
-- Martin
Re: Size of standard library and possible tricks to reduce its
On Tue, Oct 18, 2011 at 3:26 PM, Todd Vierling wrote:
> Based on my snooping around, I can see that there would be substantial
> savings just by providing Iterator, Traversable, and Iterable. Those three
> traits account for a pretty big chunk of anon class trait stubs.
> Should I take your responses to mean that it would be worth pulling HEAD and
> making a candidate diff, if it trimmed things notably?
Yes, absolutely. (But also take my responses to mean: please see that
the test suite completely passes from scratch, that is "ant all.clean
test" gets all the way to the part where it says yay.)
Re: Size of standard library and possible tricks to reduce its
$ ls -l dists/scala-2.10.0.r25850-b20111018225736/lib/scala-library.jar
-rw-r--r-- 1 tvierling tvierling 8697074 2011-10-18 23:13 dists/scala-2.10.0.r25850-b20111018225736/lib/scala-library.jar
$ ls -l build/pack/lib/scala-library.jar
-rw-r--r-- 1 tvierling tvierling 7825478 2011-10-19 01:48 build/pack/lib/scala-library.jar
The main changes so far: adding AbstractIterator, AbstractIterable, AbstractSet, and some *ViewLike.AbstractTransformed.
Looks like I can probably trim another 200-400k from what I see remaining so far in the collections tree. Will post more status later in the week after giving it some hefty workouts with both the tests and eyeball-glazing proofreading (mainly to ensure that the Abstract* types don't accidentally leak out into public API through methods' return type signatures).
Some of this leads me to wonder whether certain specific types (e.g., AbstractIterator) should be part of the public API after all, as convenience classes for the user. For now, they're confined to private[collection], but if they prove useful and not a speed burden, that's another debate we can have after-the-fact.
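The leak Todd is checking for arises when a public method's inferred result type is the private class itself. One way to rule it out is to give each public factory method an explicit result type that names the public trait. A minimal sketch with hypothetical names (`Iterators`, `AbstractIter`, `countdown`):

```scala
object Iterators {
  // Hypothetical library-private layer, like the private[collection]
  // Abstract* classes in this thread.
  private abstract class AbstractIter[A] extends Iterator[A]

  // The explicit Iterator[Int] result type keeps AbstractIter out of the
  // public signature; left to inference, the result type would name the
  // private class.
  def countdown(from: Int): Iterator[Int] = new AbstractIter[Int] {
    private var current = from
    def hasNext: Boolean = current > 0
    def next(): Int = {
      if (!hasNext) throw new NoSuchElementException("countdown exhausted")
      val result = current
      current -= 1
      result
    }
  }
}
```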
Re: Size of standard library and possible tricks to reduce its
Another hour of tweaking brought this to:
$ ls -l build/pack/lib/scala-library.jar
-rw-r--r-- 1 tvierling tvierling 7229411 2011-10-19 11:39 build/pack/lib/scala-library.jar
Yep, that's a ~17% drop so far. Also, by marking a few Abstract*s as private[scala] rather than private[collection], I trimmed scalap, swing, and the compiler a tiny bit too.
There's still more to trim and other tasks to do. After I'm done injecting all the 'Abstract...' layers, I'll review it for possible type linearization issues, run the ant tests, then post it for review and community benchmarking (hell, I'm not sure what to test!) on my webserver: a vanilla dist, the source diff, and a dist with the diff applied.
While doing this, I ran across some cases outside of scala.collection that were pretty useful as well. In particular, the compiler, upon seeing { (x) => foo }, emits an anon class that extends scala.runtime.AbstractFunction1; but explicit inheritances from (A => B) instantiate all the trait stubs in Function1 (and there are a lot of them, thanks to @specialized). So I was able to trim even more by making use of AbstractFunction1 explicitly in a few places.
More news to come...
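The Function1 case can be sketched with two named function classes (hypothetical names). `scala.runtime.AbstractFunction1` is the library class that, per the message above, the compiler used as the parent of anonymous function classes; extending it explicitly instead of `Int => Int` lets a hand-written function class inherit Function1's concrete members rather than receive its own forwarders. How much this saves depends on the compiler's trait encoding; Scala 2.12 and later also compile lambdas differently.

```scala
import scala.runtime.AbstractFunction1

// Extends the Function1 trait directly: under the 2.9-era trait encoding,
// this class receives its own forwarders for Function1's concrete methods
// (many of them generated by @specialized).
final class AddOneViaTrait extends (Int => Int) {
  def apply(x: Int): Int = x + 1
}

// Same behavior, but those members are inherited from the shared
// AbstractFunction1 class instead.
final class AddOneViaAbstract extends AbstractFunction1[Int, Int] {
  def apply(x: Int): Int = x + 1
}
```

Both classes remain ordinary `Int => Int` values, so they compose with `andThen` and `compose` as usual.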
Re: Size of standard library and possible tricks to reduce its
Yes, this is a known issue. Great that you're looking into it too!
Best,
Ismael
Re: Size of standard library and possible tricks to reduce its
I forgot to mention one pretty big caveat with this size-optimization work: It's going to make scaladoc a little more annoying.
Today, there are some cases where library-private classes exist in a class/trait hierarchy, but they're fairly rare. These changes will make that happen quite a bit more, meaning that some of the supertypes in the type declaration won't be clickable. In practice, all the types are available via the expandable linearization, but as you probably know, that list is pretty huge for a collection class.
I'm open to opinions on how this will affect users in practice. A workaround would be to continue to include the trait immediately after the abstract class type, even if the abstract class is an instantiation of exactly that trait ("extends AbstractSeq[A] with Seq[A] with ..."). This means a little more verbosity in the code, but no bytecode changes in practice -- would you all prefer to see it done this way?
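The proposed workaround only changes the declaration. In this hypothetical sketch, `Bag`, `AbstractBag`, and the `ListBag*` classes stand in for `Seq`, `AbstractSeq`, and a concrete collection. Repeating `Bag` right after `AbstractBag` names the public trait in the declaration, which is what the workaround relies on for scaladoc. The linearization and behavior stay the same because `AbstractBag` already extends `Bag`.

```scala
// Public trait with concrete methods (stands in for Seq).
trait Bag[A] {
  def items: List[A]
  def size: Int = items.length
  def contains(a: A): Boolean = items.contains(a)
}

// Library-private layer in the real proposal (stands in for AbstractSeq).
abstract class AbstractBag[A] extends Bag[A]

// Only the abstract class is named; Bag is reached through AbstractBag.
final class ListBagShort[A](val items: List[A]) extends AbstractBag[A]

// The workaround: repeat the trait right after the abstract class.
// Bag is already a parent of AbstractBag, so the linearization is unchanged.
final class ListBagExplicit[A](val items: List[A]) extends AbstractBag[A] with Bag[A]
```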
Re: Size of standard library and possible tricks to reduce its
On Wed, Oct 19, 2011 at 6:49 AM, Todd Vierling wrote:
> Oh, of course. Well, after throwing only a couple hours of tweaking at it:
Great!
I am hopeful that the reduced indirection will also lead to improved
runtime performance. It will be interesting to do some benchmarking.
Best,
Ismael
Re: Size of standard library and possible tricks to reduce its
Hi Todd,
Very encouraging results so far!
Martin suggested [1] that one reason for the slowdowns in compilation with Scala 2.9.0 might have been the longer base type sequence for commonly used types after the addition of the GenXxx traits. A bunch of optimizations followed that restored (nay, bettered!) performance for 2.9.1. But you ought to verify that building the compiler with your new compiler isn't slower than before.
-jason
[1] http://www.scala-lang.org/node/9869#comment-42374