
Scala's collections not scalable because of... 32-bit sizes?

Olivier Chafik

Hi all,

As you know, Scala's collections currently mimic Java's when it comes
to sizes; it's 32-bit ints everywhere: Traversable.size,
Seq.apply, .slice, .zipWithIndex...

The thing is, when Java came out about 15 years ago, 32 MB of RAM was
*good*, so a limit of 2+ billion elements seemed very high for
collections. In contrast, tonight I've started configuring my shiny
new server with 24 GB of RAM.
2 billion elements already seems small to me (for scientific
computations, for instance), but I'll let you imagine how small it
will look in another 15 years.

As a result, to deal with very large amounts of data in Scala (on a
64-bit JVM, which is becoming commonplace) we're already facing
limitations. If it were only for Java arrays (whose size is unlikely
ever to change from int to long), we could create a LargeArray[T]
that would split its data into many 2G+ sub-arrays, but then it's
near-impossible to fit such a collection into Scala's standard
collections ("def size: Long" cannot override "def size: Int").

So it's gonna be hard and painful, but I believe we just need one
more breaking change in Scala to make it not merely future-proof,
but really just present-proof and actually *scalable*.

The naive way to go would be to change all the index- and
size-related code from Int to Long and see everything
collection-related break, just about everywhere (that would be the
"shortest suicide note in history", cf.
http://stackoverflow.com/questions/1722726/is-the-scala-2-8-collections-library-a-case-of-the-longest-suicide-note-in-histo).

There are a few things that could make a smooth transition possible,
though (although for sure I'm not seeing the full picture here):
- obviously, first annotate the Int -> Long widened types with some
@widenedin30 annotation, or use some "type SizeType = Long // Int"
- add some compiler magic to accept Ints with a warning and insert
the appropriate conversions (in transition mode, which would be the
default for a few versions)
- ship a scala-refactoring-based tool with Scala
(http://scala-refactoring.org/) that would help mass-migrate code
from the old to the new collection sizes, performing some clever
usage analysis where possible and failing with advice on manual
migration where the code cannot be modified automatically.

What do you guys think about all this? (I know, Pandora's box,
sorry...)
--
zOlive
http://ochafik.com

PS: told Martin about this when I recently had the chance to meet him,
but I believe this truly belongs to the debate mailing list :-)

Jesper Nordenberg
Re: Scala's collections not scalable because of... 32 bits sizes

Yes, it's unfortunate that Scala's collection classes assume Int
indexing (even up to Traversable). For example, it's common to have
files larger than 2 GB, which means you can't create a Seq[Byte]
representing the content of such a file. I don't think it's much
better to make Long the default indexing type, though; instead, the
topmost collection classes should be index type agnostic, or generic
over the indexing type, i.e. Seq[T] <: ISeq[Int, T].
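A rough sketch of that idea (ISeq and the subtrait names here are
made up, not an actual API):

// The topmost trait is generic over the index/size type I.
trait ISeq[I, +T] {
  def size: I
  def apply(i: I): T
}

// Today's collections would fix I = Int...
trait Seq32[+T] extends ISeq[Int, T]

// ...while e.g. a Seq[Byte] over a file larger than 2 GB could
// fix I = Long instead.
trait Seq64[+T] extends ISeq[Long, T]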

/Jesper Nordenberg

olivier.chafik wrote 2011-02-15 00:08:
> [...]

Olivier Chafik
Re: Scala's collections not scalable because of... 32 bits sizes
2011/2/15 Jesper Nordenberg <megagurka [at] yahoo [dot] com>
Yes, it's unfortunate that Scala's collection classes assume Int indexing (even up to Traversable). For example, it's common to have files larger than 2 GB, which means you can't create a Seq[Byte] representing the content of such a file.

Indeed, (not-so-)large files are already off-range for collections... 
I don't think it's much better to make Long the default indexing type, though; instead, the topmost collection classes should be index type agnostic, or generic over the indexing type, i.e. Seq[T] <: ISeq[Int, T].

I agree Long is no better than Int in terms of abstraction / purity
of design, but performance-wise it might be a bad idea to put too
much abstraction in there (specialization of the size/index type
might help, but with other drawbacks such as code bloat)... Besides,
putting abstractions at the service of different needs is generally
a good thing, but when it's just to accommodate a technical issue
I'm less convinced.

Long is big enough for the foreseeable future (isn't it?), on 64-bit
architectures it shouldn't involve too bad a performance loss (if
any), and the resulting API wouldn't gain any extra complexity.

Cheers
--
zOlive
http://ochafik.com
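PS: for what it's worth, the specialization idea might look roughly
like this (a hypothetical sketch; the trait name is made up):

trait GenIndexed[@specialized(Int, Long) I, @specialized T] {
  def size: I
  def apply(i: I): T
}

The compiler then generates unboxed variants for the Int and Long
index types - which is exactly the code bloat mentioned above.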


H-star Development
Re: Re: Scala's collections not scalable because of... 32 bits

if i were the scala team, i would just c&p the whole collection package
and replace Int by Long. or scrap the 32 bit version. at some point,
computers will have so much memory that the wasted space won't matter
for small collections, and you'll need the longs for big ones.

On 15.02.2011 19:09, Olivier Chafik wrote:
> [...]

fanf
Re: Re: Scala's collections not scalable because of... 32 bits

On 15/02/2011 22:24, HamsterofDeath wrote:
> if i were the scala team, i would just c&p the whole collection package
> and replace Int by Long. or scrap the 32 bit version. at some point,
> computers will have so much memory that the wasted space won't matter
> for small collections, and you'll need the longs for big ones.

But in the meantime, you have a bunch of embedded devices which are
still 32-bit and don't have so much memory.

I don't have figures, so this may not be significant at all, but
perhaps such a move would actually be the fabled "shortest suicide
note in history", at least for Scala on Android phones.

H-star Development
Re: Re: Scala's collections not scalable because of... 32 bits

for those medieval things, you can use a crippled scala lib or the
old 32 bit version

On 16.02.2011 07:39, Francois Armand wrote:
> [...]

paulbutcher
Re: Re: Scala's collections not scalable because of... 32 bits

Speaking as someone who's in the process of developing on Android with Scala, I would like to strongly vote against this course of action.

On 16 Feb 2011, at 17:33, HamsterofDeath wrote:
> for those medieval things, you can use a crippled scala lib or the old 32
> bit version
> [...]

--
paul.butcher->msgCount++

Snetterton, Castle Combe, Cadwell Park...
Who says I have a one track mind?

http://www.paulbutcher.com/
LinkedIn: http://www.linkedin.com/in/paulbutcher
MSN: paul [at] paulbutcher [dot] com
AIM: paulrabutcher
Skype: paulrabutcher

H-star Development
Re: Re: Scala's collections not scalable because of... 32 bits

so you WANT to use a full 64 bit scala lib? great ;)

On 16.02.2011 18:37, Paul Butcher wrote:
> Speaking as someone who's in the process of developing on Android with Scala, I would like to strongly vote against this course of action.
> [...]

Russ P.
Re: Re: Scala's collections not scalable because of... 32 bits
I know next to nothing about the internals of Scala or any programming language, but I won't let that stop me from making a suggestion. Why not just have two versions of the collection libraries available for download? The 64-bit version could be the "default" version, and the 32-bit version would be available for embedded devices or other 32-bit machines. The difference in source code between the two versions would be exactly one line:

type Intx = Int // for 32-bit libraries

type Intx = Long // for 64-bit libraries
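A sketch of how such an alias might thread through the library (names
hypothetical; one source tree, two builds differing only in the alias):

object Sizes {
  type Intx = Long // flip to Int for the 32-bit build
}
import Sizes.Intx

trait MySeq[+T] {
  def size: Intx
  def apply(i: Intx): T
}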

--Russ P.


On Tue, Feb 15, 2011 at 10:39 PM, Francois Armand <fanf42 [at] gmail [dot] com> wrote:
> [...]


Jörg W Mittag
Re: Scala's collections not scalable because of... 32 bits sizes

olivier.chafik wrote:
> As you know, Scala's collections currently mimic Java's when it comes
> to sizes; it's 32-bit ints everywhere [...]
>
> [...] 2 billion elements already seems small to me (for scientific
> computations, for instance), but I'll let you imagine how small it
> will look in another 15 years.

Just as a data point: the ISO CLI specification allows array indices
to be either 32 or 64 bits. Microsoft chose 32, Novell chose 64,
because Mono is actually quite widely used for scientific computing,
and their customers *are* using arrays with 10 billion entries. And
that was years ago.

> What do you guys think about all this? (I know, Pandora's box,
> sorry...)

As was already pointed out by others in this thread, the average
computer is getting smaller, slower and narrower, not bigger. Most of
my non-technical friends have *upgraded* their 64-bit, dual-core,
4 gig machines to 32-bit, single-core, 256 MB ones. For them, 26 bits
would be plenty.

Ideally, there would be a way to make them both larger and smaller at
the same time.

jwm

ichoran
Re: Re: Scala's collections not scalable because of... 32 bits
2011/2/16 Jörg W Mittag <JoergWMittag+Usenet [at] googlemail [dot] com>
As was already pointed out by others in this thread, the average
computer is getting smaller, slower and narrower, not bigger. Most of
my non-technical friends have *upgraded* their 64-bit, dual-core,
4 gig machines to 32-bit, single-core, 256 MB ones. For them, 26 bits
would be plenty.

Ideally, there would be a way to make them both larger and smaller at
the same time.

I think you make a good point about people selecting lower-performance devices.

I am afraid that the only way to support both sides, without either risking all sorts of subtle bugs or failing to meet some people's needs, would be to fork the collection library and have mixed-in 64-bit or 32-bit traits, much like the mutable/immutable split.

Alternatively, at the risk of failing to meet a few people's needs, one could write a much smaller 64-bit library that contained the key methods and presented 32-bit views of subsets of the data, where you'd have the power of the full collections library (see the sketch below).

Simply switching to Long seems unwise because of increased memory consumption and difficulties with Java interoperability--many things expect ints, not longs, including the JVM itself with sizes of primitive arrays.  (You'd have to stitch several together in large array-based collections.)
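A sketch of that "32-bit view" idea (names hypothetical; LongSeq
stands in for whatever the small Long-indexed library would provide):

// A Long-indexed collection exposing Int-indexed windows that plug
// into the standard collections.
trait LongSeq[T] {
  def longLength: Long
  def apply(i: Long): T

  // A window of at most Int.MaxValue elements starting at offset,
  // usable with the full standard collections API.
  def view32(offset: Long, len: Int): IndexedSeq[T] = {
    val self = this
    new IndexedSeq[T] {
      def length: Int = len
      def apply(i: Int): T = self(offset + i)
    }
  }
}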

  --Rex

Aydjen
Re: Re: Scala's collections not scalable because of... 32 bits

Hi,

I am a bit hesitant to write this ... but ... does anybody have good numbers to back this all up?

To add another anecdote to the discussion, I for one certainly don't see a move to slower and smaller devices. If it were so, shouldn't we also forget about all this fancy concurrent stuff, like computations that scale to dozens (hundreds?) of processors? Customers going back to slower single-core machines of their own free will would certainly be one easy (and unexpected) solution to the concurrency problem...

I have a hard time imagining that widening the collections' size attribute to 64 bits will cause a noticeable increase in memory consumption. Adding a field to one of the collection classes should have a bigger effect...

Maybe I am being ignorant here, so if you can share any insights, I am happy to learn.

Kind regards
Andreas

Rex Kerr wrote:
> [...]

Bernd Johannes
Re: [Bulk] Re: Re: Scala's collections not scalable because of.

Hi folks,

as we are in scala-debate I dare to add my well... esoteric thoughts.

Can't we have something that frees us from the hassle of talking to
the machine about the sizes of its registers? On the one hand we
strive for abstractions that touch the sky, while we are counting
bits with the fingers of the other hand. It seems so... absurd.

Maybe it's not feasible in the "real world" - but byte, int, long,
short and whatever else (I think Microsoft has invented some more
primitive types) really just talk about ranges of natural numbers.
For me that would be the way to go: tell the machine you want natural
numbers - and if we want to make the compiler happy we can add some
promises about the range to expect, so that this clever tool can come
up with some optimizations on its own. And even better: it can
cleverly choose this optimization based on the target platform.
In the end, int etc. is exactly that: an implicit promise to keep
within that range. But it's tragic that this promise cannot be
changed or adjusted without breaking everybody's code.

Well - I guess this kind of abstraction will never come true because it might
be very difficult to achieve (or painfully slow) - and it surely would have
its own very special pitfalls as well.

Greetings
Bernd

On Wednesday, 16 February 2011, 19:56:11, Russ Paielli wrote:
> [...]

ichoran
Re: [Bulk] Re: Re: Scala's collections not scalable because of.
On Wed, Feb 16, 2011 at 3:31 PM, Bernd Johannes <bjohanns [at] bacon [dot] de> wrote:
Hi folks,

as we are in scala-debate I dare to add my well... esoteric thoughts.

Can't we have something that frees us from the hassle of talking to the
machine about sizes of registers?

Depends.  Do you want things to be fast?  Do you want them to be small?  Then no, you can't.

Otherwise, sure!  Many highest-level languages have arbitrary precision integers as the default (e.g. Python).  Scala supports BigInt pretty transparently too.  You can use it whenever and wherever you want!
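For instance (plain Scala, nothing hypothetical here):

// BigInt mixes with ordinary Int literals via implicit conversions
// and isn't limited by any machine word size:
val huge = BigInt(2).pow(100) + 1
println(huge % 7) // prints 3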

Hardware support for small integers (1, 2, 4, or 8 bytes) makes them perform *extremely* well.  You can usually do six operations on such integers in the time that it takes to execute one if statement or jump one pointer or do anything else that allows for abstraction.  If we had hardware support for arbitrary precision arithmetic (including hardware support for variable-size structures and efficient memory layout for said structures), then we could abstract away this detail and not worry.  But right now we don't.

Whether you care or not really depends on your use case.  But deciding that users of a language should not expect good performance in basic components of the language (e.g. collections library) will drive away a significant subset of users for whom that is important.

  --Rex

Naftoli Gugenheim
Re: Re: Scala's collections not scalable because of... 32 bits


On Wed, Feb 16, 2011 at 3:28 PM, Andreas Flierl <andreas [at] flierl [dot] eu> wrote:
To add another anecdote to the discussion, I for one certainly don't see a move to slower and smaller devices. If it were so, shouldn't we also forget about all this fancy concurrent stuff, like computations that scale to dozens (hundreds?) of processors? Customers going back to slower single-core machines out of free will would certainly be one easy (and unexpected) solution to the concurrency problem...

I don't understand your question. Many people are going to have powerful machines, and many people are going to have small devices. It's not an either-or!
Adding parallel collections doesn't make Scala run less well on small devices. Why not let multi-cores benefit?
But you can't change a behavior in a way that will negatively impact a significant set of devices!

Aydjen
Re: Re: Scala's collections not scalable because of... 32 bits

Hi,

Naftoli Gugenheim wrote:

> On Wed, Feb 16, 2011 at 3:28 PM, Andreas Flierl wrote:
> To add another anecdote to the discussion, I for one certainly don't see a move to slower and smaller devices. If it were so, shouldn't we also forget about all this fancy concurrent stuff, like computations that scale to dozens (hundreds?) of processors? Customers going back to slower single-core machines out of free will would certainly be one easy (and unexpected) solution to the concurrency problem...
>
> I don't understand your question. Many people are going to have powerful machines, and many people are going to have small devices. It's not an either or!

You're absolutely right.

> Adding parallel collections doesn't make Scala run less well on small devices. Why not let multi-cores benefit?

It was a rhetorical question. I admit it was a really bad comparison.

> But you can't change a behavior that will negatively impact a significant set of devices!

Yes, correct, but does it really increase memory usage significantly? How many millions of collection instances are live in the average smartphone app?

Kind regards
Andreas

phlegmaticprogrammer
Re: Re: Scala's collections not scalable because of... 32 bits
Also, the newest smartphones are already dual-core and have 1GB of RAM. Not that small at all ...
- Steven Obua
On 16.02.2011, at 22:34, Naftoli Gugenheim wrote:
> [...]


Naftoli Gugenheim
Re: Re: Scala's collections not scalable because of... 32 bits
Let me ask a different question.
In how many places is Int used that would become Long?


On Wed, Feb 16, 2011 at 4:46 PM, Andreas Flierl <andreas [at] flierl [dot] eu> wrote:
> [...]

Aydjen
Re: Re: Scala's collections not scalable because of... 32 bits

Naftoli Gugenheim wrote:

> Let me ask a different question.
> In how many places is Int used that would become Long?

Many. Though nearly all of them would seem to be method parameters or return values, which would be allocated on the stack.

Let's take List as an example. List gets its size number from, uhm... SeqLike, which defines it to be length... which is supplied by LinearSeq, which implements it by traversing the list and counting the elements. Again, if size were a Long, nothing would be allocated on the heap, if I understand things right.
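Schematically, that length computation is just a traversal with a
local counter (a sketch in the spirit of LinearSeq, not the actual
library source):

// Count elements by walking the list; the count lives on the stack.
def lengthOf[T](xs: List[T]): Int = {
  var these = xs
  var len = 0
  while (these.nonEmpty) { len += 1; these = these.tail }
  len
}

Widening len (and the return type) to Long would change nothing on
the heap.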

So, inside the library, for this case, I don't see any increased memory consumption at all. Of course if you store collection sizes somewhere (like another collection or object fields), that would cause increased memory consumption.

Please correct me if I'm wrong.

Kind regards
Andreas

Ismael Juma
Re: Re: Scala's collections not scalable because of... 32 bits
On Wednesday, February 16, 2011 10:02:53 PM UTC, Andreas Flierl wrote:

So, inside the library, for this case, I don't see any increased memory consumption at all. Of course if you store collection sizes somewhere (like another collection or object fields), that would cause increased memory consumption.

In many cases changing it to Long would not cause much of an issue. However, if you actually want to support array-based collections with more elements than presently supported, then you need to use a data structure that potentially uses multiple JVM arrays internally. That would require a decent amount of implementation effort.

Personally, I would be happy if changes were made to the Scala library so that others could implement collections with support for Long indices without ugly hacks. A second step would be to update the Scala collection implementations to also handle this. Who knows, maybe we'll have JVM arrays with 64-bit indices by then.

Best,
Ismael

dcsobral
Re: Re: Scala's collections not scalable because of... 32 bits
2011/2/16 Jörg W Mittag <JoergWMittag+Usenet [at] googlemail [dot] com>
> What do you guys think about all this? (I know, Pandora's box,
> sorry...)

As was already pointed out by others in this thread, the average
computer is getting smaller, slower and narrower, not bigger. Most of
my non-technical friends have *upgraded* their 64 bit, dual-core, 4
gig machines to 32 bit, single core, 256 MB ones. For them, 26 bits
would be plenty enough.

I've heard that before, and it's a fallacy. That's like saying the average computer was getting smaller back when personal computers were first introduced. Yes, compared to the big iron of the time they were smaller and much less capable, but they kept increasing in capacity all the same.
If your non-technical friends get a new phone this year, they'll be getting a dual-core with 1 GB, unless they wait until the end of the year and bag one of the early quad-cores that might be released by then. And if they get a tablet -- something quite likely these days -- they'll already be getting a more capable computer.
On the other hand, what they DO with these new computers of theirs is check their Twitter, Foursquare, Facebook and Gmail, which means that, as a practical matter, they are using cloud computing, with thousands of computers meeting their processing needs. Interestingly, two of the four examples I gave use Scala.
And, back to phones, I have 8 GB of storage in mine, and that's straight out of the factory. That's already past the 31-bit limit. Sure, I don't have enough _RAM_ for all that, but if I created a disk-backed IndexedSeq, I'd already face the index problem.

Design something with 32 bits in mind today, and be prepared to go back to the drawing board two years from now.

--
Daniel C. Sobral

I travel to the future all the time.
Bernd Johannes
Re: Scala's collections not scalable because of... 32 bits sizes

On Wednesday, 16 February 2011, 22:27:56, Rex Kerr wrote:
> On Wed, Feb 16, 2011 at 3:31 PM, Bernd Johannes wrote:
> > Hi folks,
> >
> > as we are in scala-debate I dare to add my well... esoteric thoughts.
> >
> > Can't we have something that frees us from the hassle of talking to the
> > machine about sizes of registers?
>
> Depends. Do you want things to be fast? Do you want them to be small?
> Then no, you can't.

I know... but I hope that someone can come up with a clever idea at some point
in the future.

> Whether you care or not really depends on your use case. But deciding that
> users of a language should not expect good performance in basic components
> of the language (e.g. collections library) will drive away a significant
> subset of users for whom that is important.

Of course we can't just say: OK, let's use BigInts everywhere. And
for Scala it was decided long ago that we don't get an abstraction
over the "internal sizes" of collection indexes... because the API
tells us that it uses Int... period.

But I was thinking along the lines of compiler cleverness (no real
Scala code - just sketching my thoughts):

Library:
class Array[T] {
  def apply(i: Naturalnumber): T
  def size: Naturalnumber
}

Now the compiler could provide "fitting" projections onto the given
hardware / runtime environment according to the actual use case of
the client (compile-time optimization).

Client:
// 0 .. 10 represents a from .. to notation - just to make up an example
val a = new Array[String](0 .. 10)
var i: Int = 0
a(i) = "hello"
i = 65538
a(i) = "world" // index out of bounds

Internal target representations (transparent to the client):

class Array[String] {
  def apply(i: Byte): String
  def size: Byte
}

or

class Array[String] {
  def apply(i: Int): String
  def size: Byte
}

The compiler could choose, based on the use pattern (is it too costly
to constantly cast from Int to Byte?), which of the representations
fits the bill.

Client:
val b = new Array[Int](33'333'333'333'333 .. 33'333'333'333'355)

Intermediate internal optimization (transparent to the client):

class Array[Int] {
  def apply(i: Long): Int
  def size: Byte
}

While the type parameter is almost irrelevant to the library binaries
(except for primitive type specialization), the primitive type in the
API signature is not.
So the library binaries (and the fitting runtime environment) must
either provide every acceptable primitive type in advance for the
compiler to choose from, or provide a mechanism to derive such a
representation from a binary "template" (or require the client to
always compile and deploy the whole library along with its client
code... but that's not really an option).

The whole idea (which is an old one... not invented by me :-) rests
on the definition of required value ranges by the client (here the
index from .. to). So the compiler can select hardware
representations that can handle the range. Anything outside the range
does not have to fit into the representation, because it is already
an error in itself (index out of bounds // broken promise). So the
compiler can safely choose Byte as the index and size representation
in the first example, and Long as the index and Byte as the size in
the second one (and the hypothetical language could provide syntax
for the client to choose the representation explicitly - for those
uses where the compiler cannot derive the range promise).

I agree - that's not easy at all. Scala's "specialized" annotation
goes a first step in that direction (targeting the type parameter).
But I think on the current JVM such API flexibility is not feasible
at all. There has to be some support from the execution environment
to be able to develop such a feature with reasonable effort and
success.

But I have some hope, that we will see something like that in the future.

Greetings
Bernd

Igor Malinin
Re: Scala's collections not scalable because of... 32 bits sizes

At least on today's JVM you don't need this, as you can't have more
pointers than fit into 32 bits. Remember, the size of a pointer is
fixed to 32 bits by the JVM spec, as it should fit one stack variable
slot. So, you cannot have more in-memory objects...

Still, the idea can be useful for sparse and not-in-memory
collections.

On 2011-02-17 0:21, ijuma wrote:
> In many cases changing it to long would not cause much of an issue.
> However, if you actually want to support array-based collections with
> more elements than presently supported, then you need to use a data
> structure that potentially uses multiple JVM arrays internally. That
> would require a decent amount of implementation effort.

Johannes Rudolph 2
Re: Re: Scala's collections not scalable because of... 32 bits

On Fri, Feb 18, 2011 at 11:17 AM, Igor Malinin wrote:
> At least on today's JVM you don't need this, as you can't have more
> pointers than fit into 32 bits. Remember, the size of a pointer is fixed
> to 32 bits by the JVM spec, as it should fit one stack variable slot.
> So, you cannot have more in-memory objects...

I don't think this is correct. Where is that defined?
And what would be the purpose of CompressedOops in that case?

Igor Malinin
Re: Re: Scala's collections not scalable because of... 32 bits
On 2011-02-18 12:38, Johannes Rudolph wrote:
On Fri, Feb 18, 2011 at 11:17 AM, Igor Malinin <igorzep [at] gmail [dot] com> wrote:
At least in the todays JVM you don't need this, as you can't have more
pointers than fit into 32 bit. Remember, size of pointer is fixed to 32 bit
by the JVM spec., as it should fit one stack variable slot. So, you cannot
have more in-memory objects...
I don't think this is correct. Where is that defined?
And what would be the purpose of CompressedOops in that case?

From
http://wikis.sun.com/display/HotSpotInternals/CompressedOops

This quote:
"This allows applications to address up to four billion objects (not bytes), or a heap size of up to about 32Gb. At the same time, data structure compactness is competitive with ILP32 mode."

So, we still cannot have more OBJECTS than fit into 32 bits... The only bad thing is that int in Java is signed, so we lose one bit here, but 31 vs. 32 bits is not as big a difference as 32 vs. 64.
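The arithmetic behind those numbers: a compressed oop is a 32-bit
value scaled by the default 8-byte object alignment, so

val maxObjects   = 1L << 32       //  4294967296 objects
val maxHeapBytes = maxObjects * 8 // 34359738368 bytes = 32 GB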
Johannes Rudolph 2
Re: Re: Scala's collections not scalable because of... 32 bits

On Fri, Feb 18, 2011 at 11:48 AM, Igor Malinin wrote:
> This quote:
> "This allows applications to address up to four billion objects (not bytes),
> or a heap size of up to about 32Gb. At the same time, data structure
> compactness is competitive with ILP32 mode."
>
> So, we still cannot have more OBJECTS than fit into 32 bits... The only bad
> thing is that int in Java is signed, so we lose one bit here, but 31 vs. 32
> bits is not as big a difference as 32 vs. 64.

Interesting, but that sounds more like an implementation detail of
CompressedOops than something required by the specification. If you
switch off CompressedOops (I don't know if that's possible in
HotSpot) you should be able to address the complete address space.

(e.g. see IBM's JVM:
http://publib.boulder.ibm.com/infocenter/javasdk/v6r0/index.jsp?topic=/c...

"Use -Xcompressedrefs in any of these situations:
When your Java applications does not need more than a 25 GB Java heap.")

IIRC the sizes of stack and local variable slots are not fixed to
one particular bit size in the specification of the JVM.

Igor Malinin
Re: Re: Scala's collections not scalable because of... 32 bits

On 2011-02-18 13:15, Johannes Rudolph wrote:
> On Fri, Feb 18, 2011 at 11:48 AM, Igor Malinin wrote:
>> [...]
> Interesting, but that sounds more like an implementation detail of
> CompressedOops than something required by the specification. If you
> switch off CompressedOops (I don't know if that's possible in
> HotSpot) you should be able to address the complete address space.
>
> (e.g. see IBM's JVM:
> http://publib.boulder.ibm.com/infocenter/javasdk/v6r0/index.jsp?topic=/c...
>
> "Use -Xcompressedrefs in any of these situations:
> When your Java applications does not need more than a 25 GB Java heap.")
>
> IIRC the sizes of stack and local variable slots are not fixed to
> one particular bit size in the specification of the JVM.
OK, I rechecked the spec, and it seems that you are correct: the
implementation bit size is not fixed in it. But on the other hand,
long/double are fixed to 64 bits and are required to occupy two local
variable slots on the stack, so on such a 64-bit JVM we would use
double the stack memory necessary, even for native 64-bit variables.
Hmm... JVM bytecode is too optimized for 32-bit architectures...

Jesper Nordenberg 2
Re: Scala's collections not scalable because of... 32 bits sizes

No, it isn't. JVM bytecode is not optimized at all; it's a sort of
high-level machine code. The optimized target machine code probably
won't look anything like the bytecode, and variables placed on the
JVM stack might be placed in CPU registers or optimized away
completely. So the fact that a long/double takes two stack slots in
the JVM doesn't really mean anything for performance.

/Jesper Nordenberg

On 18 Feb, 16:09, Igor Malinin wrote:
> [...]

Igor Malinin
Re: Scala's collections not scalable because of... 32 bits sizes

It is. I remember the days when there was no target-machine-optimized
code. There were only interpreters, and today many embedded JVMs
still have only an interpreter. This high-level machine code is
optimized for 32 bits. It would not require two-slot allocation for
longs and doubles if it were not so optimized and were truly
high-level; the VM itself would be able to determine the required
machine-specific size on the stack. IIRC this is actually done in
.NET IL, where every type always takes exactly one local variable
slot.

It is fair to call it optimized just because it favors JVM
implementations on 32-bit architectures; at least for interpreters
this is true. Optimized doesn't always mean for performance. Surely
for a highly optimizing native compiler it is not hard to eliminate
the waste of space in this 'high-level' representation.

On 2011-02-18 17:59, Jesper Nordenberg wrote:
> [...]
>

Ismael Juma
Re: Scala's collections not scalable because of... 32 bits sizes
On Friday, February 18, 2011 10:17:34 AM UTC, Igor Malinin wrote:
At least on today's JVM you don't need this, as you can't have more
pointers than fit into 32 bits. Remember, the size of a pointer is
fixed to 32 bits by the JVM spec, as it should fit one stack variable
slot. So, you cannot have more in-memory objects...

I see that this was clarified eventually, but such a limitation would make no sense at all in the JVM. Note that there are people who use heaps of several hundred GB with the Azul JVM.

Best,
Ismael

Ismael Juma
Re: Re: Scala's collections not scalable because of... 32 bits
On Friday, February 18, 2011 11:15:00 AM UTC, Johannes Rudolph wrote:

Interesting, but that sounds more like an implementation detail of
CompressedOops than necessarily by specification. If you switch off
CompressedOops (don't know if that's possible in Hotspot) you should
be able to address the complete address space.

CompressedOops is only enabled if Xmx is below a certain value (and even then it's only on by default from JDK 6u23 onwards). Also, they've made it possible for CompressedOops to address more than 32 GB by changing the object alignment, but the user must pass an option for this as it has trade-offs (some memory may be wasted due to fragmentation).

Best,
Ismael
