This page is no longer maintained — Please continue to the home page at www.scala-lang.org

REPL: null-values in parallel ranges?

170 replies

Thu, 2011-03-31, 21:42

Bernd Johannes

Joined: 2011-01-28,

Hi scalaists

this evening I started playing a little with the parallel collections and this
crossed my way...
This is most likely only a REPL display effect - but it felt disturbing
somehow (thinking about races and uninitialized values):

-----------------
bjo@noordx:~/Tools/scala-2.9.0.RC1> bin/scala -J-Xmx3000m
Welcome to Scala version 2.9.0.RC1 (Java HotSpot(TM) 64-Bit Server VM, Java
1.6.0_24).
Type in expressions to have them evaluated.
Type :help for more information.

scala> val big = (1 to 10000000)
big: scala.collection.immutable.Range.Inclusive = Range(1, 2, 3, 4, 5, 6, 7,
8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27,
28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46,
47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65,
66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84,
85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102,
103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117,
118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132,
133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147,
148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162,
163, 164, 165, 166, 167, 168, 169, 170,...
scala> val bigger = big.map(e => -e)
bigger: scala.collection.immutable.IndexedSeq[Int] = Vector(-1, -2, -3, -4,
-5, -6, -7, -8, -9, -10, -11, -12, -13, -14, -15, -16, -17, -18, -19, -20,
-21, -22, -23, -24, -25, -26, -27, -28, -29, -30, -31, -32, -33, -34, -35,
-36, -37, -38, -39, -40, -41, -42, -43, -44, -45, -46, -47, -48, -49, -50,
-51, -52, -53, -54, -55, -56, -57, -58, -59, -60, -61, -62, -63, -64, -65,
-66, -67, -68, -69, -70, -71, -72, -73, -74, -75, -76, -77, -78, -79, -80,
-81, -82, -83, -84, -85, -86, -87, -88, -89, -90, -91, -92, -93, -94, -95,
-96, -97, -98, -99, -100, -101, -102, -103, -104, -105, -106, -107, -108,
-109, -110, -111, -112, -113, -114, -115, -116, -117, -118, -119, -120, -121,
-122, -123, -124, -125, -126, -127, -128, -129, -130, -131, -132, -133, -134,
-135, -136, -137, -138, -139, -140, -141,...
scala> val bigpar = big.par
bigpar: scala.collection.parallel.immutable.ParRange = ParRange(1, 2, 3, 4, 5,
6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25,
26, 251, 27, 252, 28, 253, 29, 30, null, null, null, null, null, null, null,
null, null, null, null, null, null, null, null, null, null, null, null, null,
null, null, null, null, null, null, null, null, null, null, null, 511, 261,
46, 512, 47, 513, 262, 48, 514, 49, 515, 50, 263, 51, 516, 52, 264, 53, 517,
54, 518, 55, 265, 520, 56, 521, 57, 522, 58, 523, 59, 524, 60, 525, 266, 61,
526, 62, 527, 267, 528, 529, 530, 268, 531, 63, 269, 532, 533, 64, 270, 534,
535, 65, 536, 271, 537, 66, 538, 272, 539, 67, 540, 273, 541, 68, 542, 274,
543, 69, 544, 545, 275, 546, 547, 70, 548, 276, 549, 71, 126, 550, 277, 551,
72, 552, 127, 278, 553, 73,...
------------------
needless to say that I cannot reproduce this faithfully... most of the time it
looks unsuspicious. But as I saw it once I retried several times - this was
the second time I got this - directly after the start of the REPL session (the
transcript is unaltered)

I'm not sure if this qualifies as bug, as I was not able to get a hold on the
"nulls" during range processing. So it's maybe only a REPL print thingy.

Greetings
Bernd

Thu, 2011-03-31, 22:07

#1

H-star Development

Joined: 2010-04-14,

Re: REPL: null-values in parallel ranges?

what happens if you do this inside a scala program and use tostring?

Am 31.03.2011 22:42, schrieb Bernd Johannes:
> Hi scalaists
>
> this evening I started playing a little with the parallel collections and this
> crossed my way...
> This is most likely only a REPL display effect - but it felt disturbing
> somehow (thinking about races and uninitialized values):
>
> -----------------
> bjo@noordx:~/Tools/scala-2.9.0.RC1> bin/scala -J-Xmx3000m
> Welcome to Scala version 2.9.0.RC1 (Java HotSpot(TM) 64-Bit Server VM, Java
> 1.6.0_24).
> Type in expressions to have them evaluated.
> Type :help for more information.
>
> scala> val big = (1 to 10000000)
> big: scala.collection.immutable.Range.Inclusive = Range(1, 2, 3, 4, 5, 6, 7,
> 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27,
> 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46,
> 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65,
> 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84,
> 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102,
> 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117,
> 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132,
> 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147,
> 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162,
> 163, 164, 165, 166, 167, 168, 169, 170,...
> scala> val bigger = big.map(e => -e)
> bigger: scala.collection.immutable.IndexedSeq[Int] = Vector(-1, -2, -3, -4,
> -5, -6, -7, -8, -9, -10, -11, -12, -13, -14, -15, -16, -17, -18, -19, -20,
> -21, -22, -23, -24, -25, -26, -27, -28, -29, -30, -31, -32, -33, -34, -35,
> -36, -37, -38, -39, -40, -41, -42, -43, -44, -45, -46, -47, -48, -49, -50,
> -51, -52, -53, -54, -55, -56, -57, -58, -59, -60, -61, -62, -63, -64, -65,
> -66, -67, -68, -69, -70, -71, -72, -73, -74, -75, -76, -77, -78, -79, -80,
> -81, -82, -83, -84, -85, -86, -87, -88, -89, -90, -91, -92, -93, -94, -95,
> -96, -97, -98, -99, -100, -101, -102, -103, -104, -105, -106, -107, -108,
> -109, -110, -111, -112, -113, -114, -115, -116, -117, -118, -119, -120, -121,
> -122, -123, -124, -125, -126, -127, -128, -129, -130, -131, -132, -133, -134,
> -135, -136, -137, -138, -139, -140, -141,...
> scala> val bigpar = big.par
> bigpar: scala.collection.parallel.immutable.ParRange = ParRange(1, 2, 3, 4, 5,
> 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25,
> 26, 251, 27, 252, 28, 253, 29, 30, null, null, null, null, null, null, null,
> null, null, null, null, null, null, null, null, null, null, null, null, null,
> null, null, null, null, null, null, null, null, null, null, null, 511, 261,
> 46, 512, 47, 513, 262, 48, 514, 49, 515, 50, 263, 51, 516, 52, 264, 53, 517,
> 54, 518, 55, 265, 520, 56, 521, 57, 522, 58, 523, 59, 524, 60, 525, 266, 61,
> 526, 62, 527, 267, 528, 529, 530, 268, 531, 63, 269, 532, 533, 64, 270, 534,
> 535, 65, 536, 271, 537, 66, 538, 272, 539, 67, 540, 273, 541, 68, 542, 274,
> 543, 69, 544, 545, 275, 546, 547, 70, 548, 276, 549, 71, 126, 550, 277, 551,
> 72, 552, 127, 278, 553, 73,...
> ------------------
> needless to say that I cannot reproduce this faithfully... most of the time it
> looks unsuspicious. But as I saw it once I retried several times - this was
> the second time I got this - directly after the start of the REPL session (the
> transcript is unaltered)
>
> I'm not sure if this qualifies as bug, as I was not able to get a hold on the
> "nulls" during range processing. So it's maybe only a REPL print thingy.
>
> Greetings
> Bernd
>
>

Thu, 2011-03-31, 22:37

#2

Bernd Johannes

Joined: 2011-01-28,

Re: REPL: null-values in parallel ranges?

Am Donnerstag, 31. März 2011, 22:56:25 schrieb HamsterofDeath:
> what happens if you do this inside a scala program and use tostring?
>

That's the first thing I tried: test it inside a function in the REPL - and as
I said: I wasn't able to provoke a NPE or something alike in 1000 loops on 6
cores (each loop processed a range from 0 .. 10000000 with map and a ParRange
with map and foreach with toString)

And I am sure all of this has already seen severe testing from the scala team.
I suppose this to be a special behaviour from the REPL printout...

Greetings
Bernd

> Am 31.03.2011 22:42, schrieb Bernd Johannes:
> > Hi scalaists
> >
> > this evening I started playing a little with the parallel collections and
> > this crossed my way...
> > This is most likely only a REPL display effect - but it felt disturbing
> > somehow (thinking about races and uninitialized values):
> >
> > -----------------
> > bjo@noordx:~/Tools/scala-2.9.0.RC1> bin/scala -J-Xmx3000m
> > Welcome to Scala version 2.9.0.RC1 (Java HotSpot(TM) 64-Bit Server VM,
> > Java 1.6.0_24).
> > Type in expressions to have them evaluated.
> > Type :help for more information.
> >
> > scala> val big = (1 to 10000000)
> > big: scala.collection.immutable.Range.Inclusive = Range(1, 2, 3, 4, 5, 6,
> > 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25,
> > 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43,
...
> > 169, 170,...
> > scala> val bigger = big.map(e => -e)
> > bigger: scala.collection.immutable.IndexedSeq[Int] = Vector(-1, -2, -3,
> > -4, -5, -6, -7, -8, -9, -10, -11, -12, -13, -14, -15, -16, -17, -18,
...
> > -139, -140, -141,...
> > scala> val bigpar = big.par
> > bigpar: scala.collection.parallel.immutable.ParRange = ParRange(1, 2, 3,
> > 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22,
> > 23, 24, 25, 26, 251, 27, 252, 28, 253, 29, 30, null, null, null, null,
> > null, null, null, null, null, null, null, null, null, null, null, null,
> > null, null, null, null, null, null, null, null, null, null, null, null,
> > null, null, null, 511, 261, 46, 512, 47, 513, 262, 48, 514, 49, 515, 50,
> > 263, 51, 516, 52, 264, 53, 517, 54, 518, 55, 265, 520, 56, 521, 57, 522,
...
> > 553, 73,...
> > ------------------
> > needless to say that I cannot reproduce this faithfully... most of the
> > time it looks unsuspicious. But as I saw it once I retried several times
> > - this was the second time I got this - directly after the start of the
> > REPL session (the transcript is unaltered)
> >
> > I'm not sure if this qualifies as bug, as I was not able to get a hold on
> > the "nulls" during range processing. So it's maybe only a REPL print
> > thingy.
> >
> > Greetings
> > Bernd

Thu, 2011-03-31, 23:07

#3

extempore

Joined: 2008-12-17,

Re: REPL: null-values in parallel ranges?

On 3/31/11 1:42 PM, Bernd Johannes wrote:
> I'm not sure if this qualifies as bug, as I was not able to get a hold on the
> "nulls" during range processing. So it's maybe only a REPL print thingy.

It isn't taking me long trying .par in the repl to see all kinds of
crazy stuff. The main thing is that foreach seems to return things in a
random order and iterator doesn't. That doesn't sound like something
intentional, but I try not to assume too much. But it is likely at the
heart of what you are seeing too.

scala> def fx = (1 to 5).par
fx: scala.collection.parallel.immutable.ParRange

scala> fx foreach println
1
3
4
2
5

scala> fx foreach println
2
3
1
5
4

scala> fx.iterator foreach println
1
2
3
4
5

scala> fx.iterator foreach println
1
2
3
4
5

Thu, 2011-03-31, 23:17

#4

Ismael Juma 2

Joined: 2011-01-22,

Re: REPL: null-values in parallel ranges?

On Thu, Mar 31, 2011 at 11:03 PM, Paul Phillips wrote:
> It isn't taking me long trying .par in the repl to see all kinds of crazy
> stuff. The main thing is that foreach seems to return things in a random
> order and iterator doesn't. That doesn't sound like something intentional,
> but I try not to assume too much. But it is likely at the heart of what you
> are seeing too.

I came across this today too while playing with the REPL and intended
to investigate it before filing a bug (if it was indeed a bug). Seems
like you've already found the root cause of the issue. Good. :)

Best,
Ismael

Thu, 2011-03-31, 23:27

#5

extempore

Joined: 2008-12-17,

Re: REPL: null-values in parallel ranges?

Well, I guess it is intentional. This has rather severe consequences
for something like a pre-existing method which is not written expecting
that things implementing Seq[T] may now use a random order for foreach.
Maybe I was naive all this time thinking that the reproducible quality
of foreach on a sequence was something I could count on, and I should
have been calling "iterator" all the time. Either way, there's a lot of
code out there.

Thu, 2011-03-31, 23:27

#6

Ismael Juma 2

Joined: 2011-01-22,

Re: REPL: null-values in parallel ranges?

On Thu, Mar 31, 2011 at 11:11 PM, Paul Phillips wrote:
> Well, I guess it is intentional. This has rather severe consequences for
> something like a pre-existing method which is not written expecting that
> things implementing Seq[T] may now use a random order for foreach. Maybe I
> was naive all this time thinking that the reproducible quality of foreach on
> a sequence was something I could count on, and I should have been calling
> "iterator" all the time. Either way, there's a lot of code out there.

Oh I see. The foreach is executed in parallel. I had noticed this
elsewhere, but I didn't connect the dots that it was the reason for
the weird output of the REPL after a seq.par.map(identity). Like you,
I am concerned about the implications!

Best,
Ismael

Thu, 2011-03-31, 23:37

#7

extempore

Joined: 2008-12-17,

Re: REPL: null-values in parallel ranges?

On 3/31/11 3:09 PM, Ismael Juma wrote:
> I came across this today too while playing with the REPL and intended
> to investigate it before filing a bug (if it was indeed a bug). Seems
> like you've already found the root cause of the issue. Good. :)

It's fixed in trunk now. I'm trying not to be nervous about what other
code I've written which might be vulnerable to surprises here.

Fri, 2011-04-01, 06:57

#8

Bernd Johannes

Joined: 2011-01-28,

Re: REPL: null-values in parallel ranges?

Am Freitag, 1. April 2011, 00:22:28 schrieb Ismael Juma:
> On Thu, Mar 31, 2011 at 11:11 PM, Paul Phillips wrote:
> > Well, I guess it is intentional. This has rather severe consequences for
> > something like a pre-existing method which is not written expecting that
> > things implementing Seq[T] may now use a random order for foreach. Maybe
> > I was naive all this time thinking that the reproducible quality of
> > foreach on a sequence was something I could count on, and I should have
> > been calling "iterator" all the time. Either way, there's a lot of code
> > out there.
>
> Oh I see. The foreach is executed in parallel. I had noticed this
> elsewhere, but I didn't connect the dots that it was the reason for
> the weird output of the REPL after a seq.par.map(identity). Like you,
> I am concerned about the implications!
>
> Best,
> Ismael

Maybe this should be clarified in the documentation - either Seq foreaches in
the order of the sequence and thus cannot be parallelized (that is obvious) or
it doesn't and can act in parallel.

But it's true - an unordered foreach on Seq is not intuitive. That's a
conflict that has to be resolved - best before the 2.9 final.

When backward compatibility and "transparency of parallel actions" is the main
focus this would call for a strictly serial execution of foreach() as it has
been that way in the past.
Then something like "unorderedforeach()" [ugly I know - but at least not
surprising] would be the option (and a review of the scala library internals -
which foeach call can be savely promoted to unorderedforeach and which one
can't).

Or scala could ressort to a bunch of behaviour / compatibility properties with
something like "AllowUnorderedForeach = true / false". But given that those
properties are global in nature it's certainly not the best way to go.

Who claims that parallel stuff is easy? ;-)
Greetings
Bernd

Fri, 2011-04-01, 07:07

#9

bmaso

Joined: 2009-10-04,

Re: REPL: null-values in parallel ranges?

On Thu, Mar 31, 2011 at 10:48 PM, Bernd Johannes <bjohanns [at] bacon [dot] de> wrote:

Am Freitag, 1. April 2011, 00:22:28 schrieb Ismael Juma:
> On Thu, Mar 31, 2011 at 11:11 PM, Paul Phillips <paulp [at] improving [dot] org> wrote:
> > Well, I guess it is intentional. This has rather severe consequences for
> > something like a pre-existing method which is not written expecting that
> > things implementing Seq[T] may now use a random order for foreach. Maybe
> > I was naive all this time thinking that the reproducible quality of
> > foreach on a sequence was something I could count on, and I should have
> > been calling "iterator" all the time. Either way, there's a lot of code
> > out there.
>
> Oh I see. The foreach is executed in parallel. I had noticed this
> elsewhere, but I didn't connect the dots that it was the reason for
> the weird output of the REPL after a seq.par.map(identity). Like you,
> I am concerned about the implications!
>
> Best,
> Ismael

Maybe this should be clarified in the documentation - either Seq foreaches in
the order of the sequence and thus cannot be parallelized (that is obvious) or
it doesn't and can act in parallel.

But it's true - an unordered foreach on Seq is not intuitive. That's a
conflict that has to be resolved - best before the 2.9 final.

When backward compatibility and "transparency of parallel actions" is the main
focus this would call for a strictly serial execution of foreach() as it has
been that way in the past.
Then something like "unorderedforeach()" [ugly I know - but at least not
surprising] would be the option (and a review of the scala library internals -
which foeach call can be savely promoted to unorderedforeach and which one
can't).

"forevery()"?

Or scala could ressort to a bunch of behaviour / compatibility properties with
something like "AllowUnorderedForeach = true / false". But given that those
properties are global in nature it's certainly not the best way to go.

Who claims that parallel stuff is easy? ;-)
Greetings
Bernd

--
Best regards,
Brian Maso
(949) 395-8551
brian [at] blumenfeld-maso [dot] com

Fri, 2011-04-01, 12:37

#10

Matthew Pocock 3

Joined: 2010-07-30,

Re: REPL: null-values in parallel ranges?

Could foreach take an implicit 'can be out of order' hint? I expect that usually these foreach calls should be broken into a parallel map followed by a sequential foreach that only processes the side effect. However, there is a lot of existing code.

On 1 Apr 2011 07:04, "Brian Maso" <brian [at] blumenfeld-maso [dot] com> wrote:

On Thu, Mar 31, 2011 at 10:48 PM, Bernd Johannes <bjohanns [at] bacon [dot] de> wrote:
>
> Am Freitag, 1. April...

"forevery()"?

>
>
> Or scala could ressort to a bunch of behaviour / compatibility properties with
> something lik...

--
Best regards,
Brian Maso
(949) 395-8551
brian [at] blumenfeld-maso [dot] com

Fri, 2011-04-01, 22:17

#11

dcsobral

Joined: 2009-04-23,

Re: REPL: null-values in parallel ranges?

I don't understand what's the trouble here. You said you wanted "out of order" the instance you called ".par" on your collection. If your code depends on things being sequential, don't pass parallel collections to it.

On Fri, Apr 1, 2011 at 08:26, Matthew Pocock <turingatemyhamster [at] gmail [dot] com> wrote:

Could foreach take an implicit 'can be out of order' hint? I expect that usually these foreach calls should be broken into a parallel map followed by a sequential foreach that only processes the side effect. However, there is a lot of existing code.

On 1 Apr 2011 07:04, "Brian Maso" <brian [at] blumenfeld-maso [dot] com> wrote:

On Thu, Mar 31, 2011 at 10:48 PM, Bernd Johannes <bjohanns [at] bacon [dot] de> wrote:
>
> Am Freitag, 1. April...

"forevery()"?

>
>
> Or scala could ressort to a bunch of behaviour / compatibility properties with
> something lik...

--
Best regards,
Brian Maso
(949) 395-8551
brian [at] blumenfeld-maso [dot] com

--
Daniel C. Sobral

I travel to the future all the time.

Fri, 2011-04-01, 22:47

#12

extempore

Joined: 2008-12-17,

Re: REPL: null-values in parallel ranges?

On 4/1/11 2:14 PM, Daniel Sobral wrote:
> I don't understand what's the trouble here.

Thread is still there.

> You said you wanted "out of
> order" the instance you called ".par" on your collection. If your code
> depends on things being sequential, don't pass parallel collections to it.

If you "want" out of order then why do the parallel collections bother
having iterator preserve the order? I thought I just asked for out of
order, why is it wasting all that energy?

More relevantly, nobody passed a parallel collection to anybody's code
unless "guy from the past" counts. (Keep in mind that "guy from the
past" committed this transgression *before the parallel collections
existed*.) A pre-existing method written to operate on generic
traversables was passively called by another pre-existing method. The
person who called ".par" just called ".par". It seems unreasonable to
require omniscience and time travel powers from the users.

Fri, 2011-04-01, 23:17

#13

Alex Cruise

Joined: 2008-12-17,

Re: REPL: null-values in parallel ranges?

I think a lot of people have a mental model of parallel collections as "just like sequential, only hopefully faster." Whether this is similar to the intent of the designers of 2.9's parallel collections is an interesting question.
I think it's safe to say that most people would expect the result of xs.map(f) to be indistinguishable from xs.par.map(f), and it would be surprising if this were not the case in 2.9. In the absence of side effects, we don't particularly care (or, indeed, have any way to know) about the order in which f is applied to x's, as long as the sausage that comes out the other end is tasty.
But when side effects are the whole point, I think many people really will care about the order in which the actions are taken, so for whatever it's worth, I think I'll go along with the "keep foreach sequential, and come up something new that works like foreach but makes you sign an explicit waiver that you won't get upset when things happen out of order" camp. At least until convinced otherwise. :)
-0xe1a

Sat, 2011-04-02, 00:37

#14

dcsobral

Joined: 2009-04-23,

Re: REPL: null-values in parallel ranges?

On Fri, Apr 1, 2011 at 18:45, Paul Phillips <paulp [at] improving [dot] org> wrote:

On 4/1/11 2:14 PM, Daniel Sobral wrote:

I don't understand what's the trouble here.

Thread is still there.

You said you wanted "out of
order" the instance you called ".par" on your collection. If your code
depends on things being sequential, don't pass parallel collections to it.

If you "want" out of order then why do the parallel collections bother having iterator preserve the order? I thought I just asked for out of order, why is it wasting all that energy?

More relevantly, nobody passed a parallel collection to anybody's code unless "guy from the past" counts. (Keep in mind that "guy from the past" committed this transgression *before the parallel collections existed*.) A pre-existing method written to operate on generic traversables was passively called by another pre-existing method. The person who called ".par" just called ".par". It seems unreasonable to require omniscience and time travel powers from the users.

The REPL user did, in a sense. Regarding r24658, I think the "fix", as a general rule, should have been a special treatment for parallel collections, instead of changing the code to special case around parallel collections: if ParIterable, special case it; otherwise, act as normal.

--
Daniel C. Sobral

I travel to the future all the time.

Sat, 2011-04-02, 02:27

#15

Seth Tisue

Joined: 2008-12-16,

Re: REPL: null-values in parallel ranges?

>>>>> "Daniel" == Daniel Sobral writes:

Daniel> I don't understand what's the trouble here. You said you wanted
Daniel> "out of order" the instance you called ".par" on your
Daniel> collection. If your code depends on things being sequential,
Daniel> don't pass parallel collections to it.

Barbara Liskov called for you. She seems angry! I think you better
call her back.

Sat, 2011-04-02, 09:27

#16

Bernd Johannes

Joined: 2011-01-28,

Re: REPL: null-values in parallel ranges?

Am Samstag, 2. April 2011, 00:13:39 schrieb Alex Cruise:
> I think a lot of people have a mental model of parallel collections as
> "just like sequential, only hopefully faster." Whether this is similar to
> the intent of the designers of 2.9's parallel collections is an
> interesting question.

My first concern goes towards documentation: if foreach on Seq in a parallel
context is going to be evaluated out of order (i.e. parallelized) there should
be a reminder in the documentation for Seq.foreach(). I.e. if somebody is
going to write something which accepts a Seq and is going to execute side
effects on this Seq by means of foreach has to cope with out of order
execution because the Seq might be a parallelized Seq.

> But when side effects are the whole point, I think many people really will
> care about the order in which the actions are taken, so for whatever it's
> worth, I think I'll go along with the "keep foreach sequential, and come up
> something new that works like foreach but makes you sign an explicit
> waiver that you won't get upset when things happen out of order" camp. At
> least until convinced otherwise. :)

I support this. In my experience (as a member of regression test camps) many
errors in production code can be traced back to unexpected "out of order"
execution or unexpected sort odering of things. It's amacing how often I have
seen code break on the expectation that a "select * from whatever" imposes
some order naturally. And its even more amacing how long this error can go
unnoticed... Had the inventor of SQL specified that "select" always requires a
"order by" whereas "unorderedselect" takes no order at all we would have been
spared from some of the most severe errors.

With this experience in the past I recommend to keep the "ordered" behaviour
of foreach() because breaking this will introduce a whole bunch of errors in
parallel context. Errors for which scala is not to be blamed... but it's
terribly error prune to switch from guaranteed ordered / sequencial to
unordered behaviour in parallel context.

I would even go as far as to recommend an artifical "unordered" behaviour of
the sequencial foreach if the scala team decides to keep the parallel
behaviour.

Greetings
Bernd

Sat, 2011-04-02, 10:57

#17

Tony Morris 2

Joined: 2009-03-20,

Re: REPL: null-values in parallel ranges?

On 02/04/11 18:24, Bernd Johannes wrote:
> Am Samstag, 2. April 2011, 00:13:39 schrieb Alex Cruise:
>> I think a lot of people have a mental model of parallel collections as
>> "just like sequential, only hopefully faster." Whether this is similar to
>> the intent of the designers of 2.9's parallel collections is an
>> interesting question.
> My first concern goes towards documentation: if foreach on Seq in a parallel
> context is going to be evaluated out of order (i.e. parallelized) there should
> be a reminder in the documentation for Seq.foreach(). I.e. if somebody is
> going to write something which accepts a Seq and is going to execute side
> effects on this Seq by means of foreach has to cope with out of order
> execution because the Seq might be a parallelized Seq.
>
>> But when side effects are the whole point, I think many people really will
>> care about the order in which the actions are taken, so for whatever it's
>> worth, I think I'll go along with the "keep foreach sequential, and come up
>> something new that works like foreach but makes you sign an explicit
>> waiver that you won't get upset when things happen out of order" camp. At
>> least until convinced otherwise. :)
When side-effects are the "whole point" (which is far, far, far, far,
far, far rarer than is often alleged), it would be wise to stop the
silly buggers and admit that you have just lost parallelisation.

Sat, 2011-04-02, 12:17

#18

H-star Development

Joined: 2010-04-14,

Re: REPL: null-values in parallel ranges?

if you need a sequencial foreach, make your parallel collection a
sequential one. according to the documentation, this conversion is
basically free.
for side effect free code, the order doesn't matter at all. i don't see
a big problem here.

Am 02.04.2011 10:24, schrieb Bernd Johannes:
> Am Samstag, 2. April 2011, 00:13:39 schrieb Alex Cruise:
>> I think a lot of people have a mental model of parallel collections as
>> "just like sequential, only hopefully faster." Whether this is similar to
>> the intent of the designers of 2.9's parallel collections is an
>> interesting question.
> My first concern goes towards documentation: if foreach on Seq in a parallel
> context is going to be evaluated out of order (i.e. parallelized) there should
> be a reminder in the documentation for Seq.foreach(). I.e. if somebody is
> going to write something which accepts a Seq and is going to execute side
> effects on this Seq by means of foreach has to cope with out of order
> execution because the Seq might be a parallelized Seq.
>
>> But when side effects are the whole point, I think many people really will
>> care about the order in which the actions are taken, so for whatever it's
>> worth, I think I'll go along with the "keep foreach sequential, and come up
>> something new that works like foreach but makes you sign an explicit
>> waiver that you won't get upset when things happen out of order" camp. At
>> least until convinced otherwise. :)
> I support this. In my experience (as a member of regression test camps) many
> errors in production code can be traced back to unexpected "out of order"
> execution or unexpected sort odering of things. It's amacing how often I have
> seen code break on the expectation that a "select * from whatever" imposes
> some order naturally. And its even more amacing how long this error can go
> unnoticed... Had the inventor of SQL specified that "select" always requires a
> "order by" whereas "unorderedselect" takes no order at all we would have been
> spared from some of the most severe errors.
>
> With this experience in the past I recommend to keep the "ordered" behaviour
> of foreach() because breaking this will introduce a whole bunch of errors in
> parallel context. Errors for which scala is not to be blamed... but it's
> terribly error prune to switch from guaranteed ordered / sequencial to
> unordered behaviour in parallel context.
>
> I would even go as far as to recommend an artifical "unordered" behaviour of
> the sequencial foreach if the scala team decides to keep the parallel
> behaviour.
>
> Greetings
> Bernd
>
>

Sat, 2011-04-02, 14:37

#19

Bernd Johannes

Joined: 2011-01-28,

Re: REPL: null-values in parallel ranges - Seq.foreach acting u

Am Samstag, 2. April 2011, 13:14:11 schrieb HamsterofDeath:
> if you need a sequencial foreach, make your parallel collection a
> sequential one. according to the documentation, this conversion is
> basically free.
> for side effect free code, the order doesn't matter at all. i don't see
> a big problem here.

It's true - it's just about operations where the order of sequencial traversal
matters (either globally or locally). That's why I don't see big issues with
map() and the like.

Maybe I am naiv - or the average coding quality has improved a lot beyond that
what I used to see during my testing years (that's been a while in the past
from now).
With the obvious and simple example below I want to illustrate where I fear
that parallelized foreach will wreak havoc. It's not obvious during
compilation - maybe it goes unnoticed on small test systems... but it will
fail in production environment at some point in time.

And that's what I am concerned about. It's not obvious, not fast failing and
may be introduced almost "silently" many many lines of code away...

Just my 5 cents...
Greetings
Bernd

scala> def writeResult(res: scala.collection.immutable.Seq[String]): Unit = {
| // normally does things like derive filename, cleanup old versions
| // opens streams, error handling and the like and then writes
| // the content
| res.foreach((s) => println(s))
| }
writeResult: (res: scala.collection.immutable.Seq[String])Unit

// just a make up result (and almost always in result files order matters)
scala> val a = List("a", "b", "c", "d", "e")
a: List[java.lang.String] = List(a, b, c, d, e)

// that one is ok
scala> writeResult(a)
a
b
c
d
e

// that one might be introduced because the Seq is a bigger one
// and there are many cores awaiting work...
scala> val pa = a.par
pa: scala.collection.parallel.immutable.ParSeq[java.lang.String] =
ParVector(a, b, c, d, e)

// after some processing the Seq is to be saved...
scala> writeResult(pa)
a
b
d
e
c

// an unexpected behaviour that spreads from the point of par
// introduction into
// formerly proper working pieces of code which silently assumed that
// foreach is acting faithfully on the sequence of the Seq

Sat, 2011-04-02, 15:07

#20

H-star Development

Joined: 2010-04-14,

Re: REPL: null-values in parallel ranges - Seq.foreach acting u

if the order matters, you cannot use multiple threads. every
multithreading coding knows that. or am i wrong?

Am 02.04.2011 15:34, schrieb Bernd Johannes:
> Am Samstag, 2. April 2011, 13:14:11 schrieb HamsterofDeath:
>> if you need a sequencial foreach, make your parallel collection a
>> sequential one. according to the documentation, this conversion is
>> basically free.
>> for side effect free code, the order doesn't matter at all. i don't see
>> a big problem here.
> It's true - it's just about operations where the order of sequencial traversal
> matters (either globally or locally). That's why I don't see big issues with
> map() and the like.
>
> Maybe I am naiv - or the average coding quality has improved a lot beyond that
> what I used to see during my testing years (that's been a while in the past
> from now).
> With the obvious and simple example below I want to illustrate where I fear
> that parallelized foreach will wreak havoc. It's not obvious during
> compilation - maybe it goes unnoticed on small test systems... but it will
> fail in production environment at some point in time.
>
> And that's what I am concerned about. It's not obvious, not fast failing and
> may be introduced almost "silently" many many lines of code away...
>
> Just my 5 cents...
> Greetings
> Bernd
>
> scala> def writeResult(res: scala.collection.immutable.Seq[String]): Unit = {
> | // normally does things like derive filename, cleanup old versions
> | // opens streams, error handling and the like and then writes
> | // the content
> | res.foreach((s) => println(s))
> | }
> writeResult: (res: scala.collection.immutable.Seq[String])Unit
>
> // just a make up result (and almost always in result files order matters)
> scala> val a = List("a", "b", "c", "d", "e")
> a: List[java.lang.String] = List(a, b, c, d, e)
>
> // that one is ok
> scala> writeResult(a)
> a
> b
> c
> d
> e
>
> // that one might be introduced because the Seq is a bigger one
> // and there are many cores awaiting work...
> scala> val pa = a.par
> pa: scala.collection.parallel.immutable.ParSeq[java.lang.String] =
> ParVector(a, b, c, d, e)
>
> // after some processing the Seq is to be saved...
> scala> writeResult(pa)
> a
> b
> d
> e
> c
>
> // an unexpected behaviour that spreads from the point of par
> // introduction into
> // formerly proper working pieces of code which silently assumed that
> // foreach is acting faithfully on the sequence of the Seq
>

Sat, 2011-04-02, 15:27

#21

roland.kuhn

Joined: 2011-02-21,

Re: REPL: null-values in parallel ranges - Seq.foreach acting u

… from which follows that parallel collections cannot be subtypes of Seq.

It has been said several times and is obviously true, so why not acknowledge the fact?

(pesky old me)

On Apr 2, 2011, at 16:00, HamsterofDeath wrote:

> if the order matters, you cannot use multiple threads. every
> multithreading coding knows that. or am i wrong?
>
> Am 02.04.2011 15:34, schrieb Bernd Johannes:
>> Am Samstag, 2. April 2011, 13:14:11 schrieb HamsterofDeath:
>>> if you need a sequencial foreach, make your parallel collection a
>>> sequential one. according to the documentation, this conversion is
>>> basically free.
>>> for side effect free code, the order doesn't matter at all. i don't see
>>> a big problem here.
>> It's true - it's just about operations where the order of sequencial traversal
>> matters (either globally or locally). That's why I don't see big issues with
>> map() and the like.
>>
>> Maybe I am naiv - or the average coding quality has improved a lot beyond that
>> what I used to see during my testing years (that's been a while in the past
>> from now).
>> With the obvious and simple example below I want to illustrate where I fear
>> that parallelized foreach will wreak havoc. It's not obvious during
>> compilation - maybe it goes unnoticed on small test systems... but it will
>> fail in production environment at some point in time.
>>
>> And that's what I am concerned about. It's not obvious, not fast failing and
>> may be introduced almost "silently" many many lines of code away...
>>
>> Just my 5 cents...
>> Greetings
>> Bernd
>>
>> scala> def writeResult(res: scala.collection.immutable.Seq[String]): Unit = {
>> | // normally does things like derive filename, cleanup old versions
>> | // opens streams, error handling and the like and then writes
>> | // the content
>> | res.foreach((s) => println(s))
>> | }
>> writeResult: (res: scala.collection.immutable.Seq[String])Unit
>>
>> // just a make up result (and almost always in result files order matters)
>> scala> val a = List("a", "b", "c", "d", "e")
>> a: List[java.lang.String] = List(a, b, c, d, e)
>>
>> // that one is ok
>> scala> writeResult(a)
>> a
>> b
>> c
>> d
>> e
>>
>> // that one might be introduced because the Seq is a bigger one
>> // and there are many cores awaiting work...
>> scala> val pa = a.par
>> pa: scala.collection.parallel.immutable.ParSeq[java.lang.String] =
>> ParVector(a, b, c, d, e)
>>
>> // after some processing the Seq is to be saved...
>> scala> writeResult(pa)
>> a
>> b
>> d
>> e
>> c
>>
>> // an unexpected behaviour that spreads from the point of par
>> // introduction into
>> // formerly proper working pieces of code which silently assumed that
>> // foreach is acting faithfully on the sequence of the Seq
>>
>

Sat, 2011-04-02, 16:07

#22

H-star Development

Joined: 2010-04-14,

Re: REPL: null-values in parallel ranges - Seq.foreach acting u

a parallel sequence can replace a sequencial sequence (feels weird
writing that) for every method except foreach. maybe there should be an
ordered-foreach-able trait and a foreachable-random-order-trait? then it
could be added to any collection that can do it.

Am 02.04.2011 16:28, schrieb Roland Kuhn:
> … from which follows that parallel collections cannot be subtypes of Seq.
>
> It has been said several times and is obviously true, so why not acknowledge the fact?
>
> (pesky old me)
>
> On Apr 2, 2011, at 16:00, HamsterofDeath wrote:
>
>> if the order matters, you cannot use multiple threads. every
>> multithreading coding knows that. or am i wrong?
>>
>> Am 02.04.2011 15:34, schrieb Bernd Johannes:
>>> Am Samstag, 2. April 2011, 13:14:11 schrieb HamsterofDeath:
>>>> if you need a sequencial foreach, make your parallel collection a
>>>> sequential one. according to the documentation, this conversion is
>>>> basically free.
>>>> for side effect free code, the order doesn't matter at all. i don't see
>>>> a big problem here.
>>> It's true - it's just about operations where the order of sequencial traversal
>>> matters (either globally or locally). That's why I don't see big issues with
>>> map() and the like.
>>>
>>> Maybe I am naiv - or the average coding quality has improved a lot beyond that
>>> what I used to see during my testing years (that's been a while in the past
>>> from now).
>>> With the obvious and simple example below I want to illustrate where I fear
>>> that parallelized foreach will wreak havoc. It's not obvious during
>>> compilation - maybe it goes unnoticed on small test systems... but it will
>>> fail in production environment at some point in time.
>>>
>>> And that's what I am concerned about. It's not obvious, not fast failing and
>>> may be introduced almost "silently" many many lines of code away...
>>>
>>> Just my 5 cents...
>>> Greetings
>>> Bernd
>>>
>>> scala> def writeResult(res: scala.collection.immutable.Seq[String]): Unit = {
>>> | // normally does things like derive filename, cleanup old versions
>>> | // opens streams, error handling and the like and then writes
>>> | // the content
>>> | res.foreach((s) => println(s))
>>> | }
>>> writeResult: (res: scala.collection.immutable.Seq[String])Unit
>>>
>>> // just a make up result (and almost always in result files order matters)
>>> scala> val a = List("a", "b", "c", "d", "e")
>>> a: List[java.lang.String] = List(a, b, c, d, e)
>>>
>>> // that one is ok
>>> scala> writeResult(a)
>>> a
>>> b
>>> c
>>> d
>>> e
>>>
>>> // that one might be introduced because the Seq is a bigger one
>>> // and there are many cores awaiting work...
>>> scala> val pa = a.par
>>> pa: scala.collection.parallel.immutable.ParSeq[java.lang.String] =
>>> ParVector(a, b, c, d, e)
>>>
>>> // after some processing the Seq is to be saved...
>>> scala> writeResult(pa)
>>> a
>>> b
>>> d
>>> e
>>> c
>>>
>>> // an unexpected behaviour that spreads from the point of par
>>> // introduction into
>>> // formerly proper working pieces of code which silently assumed that
>>> // foreach is acting faithfully on the sequence of the Seq
>>>

Sat, 2011-04-02, 16:07

#23

dcsobral

Joined: 2009-04-23,

Re: REPL: null-values in parallel ranges?

On Sat, Apr 2, 2011 at 05:24, Bernd Johannes <bjohanns [at] bacon [dot] de> wrote:

With this experience in the past I recommend to keep the "ordered" behaviour
of foreach() because breaking this will introduce a whole bunch of errors in
parallel context. Errors for which scala is not to be blamed... but it's
terribly error prune to switch from guaranteed ordered / sequencial to
unordered behaviour in parallel context.

I would even go as far as to recommend an artifical "unordered" behaviour of
the sequencial foreach if the scala team decides to keep the parallel
behaviour.

In which case what will happen is that people will come complain that they changed their collection to parallel, but nothing happened! At which point we'll tell them, "No, it's not just changing the collection to parallel. You'll have to rewrite your whole code as well."
I someone uses a parallel collection, he _expects_ things to be parallel. Sure, they'll break code that worked for serial collection in new and interesting ways, but that is easily fixed: don't use parallel collections. You can even call .seq before sending a collection to that part of the code, and call .par on anything getting back.
--
Daniel C. Sobral

I travel to the future all the time.

Sat, 2011-04-02, 16:17

#24

dcsobral

Joined: 2009-04-23,

Re: REPL: null-values in parallel ranges?

On Fri, Apr 1, 2011 at 22:19, Seth Tisue <seth [at] tisue [dot] net> wrote:

>>>>> "Daniel" == Daniel Sobral <dcsobral [at] gmail [dot] com> writes:

Daniel> I don't understand what's the trouble here. You said you wanted
Daniel> "out of order" the instance you called ".par" on your
Daniel> collection. If your code depends on things being sequential,
Daniel> don't pass parallel collections to it.

Barbara Liskov called for you. She seems angry! I think you better
call her back.

Tell her to call me back when programmers start coding to a language's specification, instead of coding to what the compiler does.
Non-parallelism is presumed throughout Scala (and most languages) with few exceptions. The Liskov's Principle thing to do is completely separate parallel and sequential collections, or, otherwise, make sure ALL Iterable/Seq/Set/Map methods are sequential, and add a whole new set of methods for parallel operations.
So I see three choices:
1. Violate Liskov's principle.2. Make it so a parallel collection have different interfaces/class hierarchy than sequential ones. 3. Add parallel methods and make sure all others are sequential.
Do you see any other choice? I'd be happy to hear of it. Otherwise, what do _you_ think should be done?
--
Daniel C. Sobral

I travel to the future all the time.

Sat, 2011-04-02, 16:27

#25

dcsobral

Joined: 2009-04-23,

Re: REPL: null-values in parallel ranges - Seq.foreach acting u

On Sat, Apr 2, 2011 at 11:28, Roland Kuhn <google [at] rkuhn [dot] info> wrote:

… from which follows that parallel collections cannot be subtypes of Seq.

It has been said several times and is obviously true, so why not acknowledge the fact?

A sequence simply means one can expect to tell member keep position relative to each other. One _can_ have a sequence and make operations on it and still have these operations processed in parallel.
What I don't understand is why people want to remove this desirable property just so people can actually call ".par" on something and see nothing happen?
I'll grant that foreach, which cannot do anything without side-effects, is a prime candidate to be left sequential... aside from the unfortunate fact that all Traversable methods are based on it.

(pesky old me)

On Apr 2, 2011, at 16:00, HamsterofDeath <h-star [at] gmx [dot] de> wrote:

> if the order matters, you cannot use multiple threads. every
> multithreading coding knows that. or am i wrong?
>
> Am 02.04.2011 15:34, schrieb Bernd Johannes:
>> Am Samstag, 2. April 2011, 13:14:11 schrieb HamsterofDeath:
>>> if you need a sequencial foreach, make your parallel collection a
>>> sequential one. according to the documentation, this conversion is
>>> basically free.
>>> for side effect free code, the order doesn't matter at all. i don't see
>>> a big problem here.
>> It's true - it's just about operations where the order of sequencial traversal
>> matters (either globally or locally). That's why I don't see big issues with
>> map() and the like.
>>
>> Maybe I am naiv - or the average coding quality has improved a lot beyond that
>> what I used to see during my testing years (that's been a while in the past
>> from now).
>> With the obvious and simple example below I want to illustrate where I fear
>> that parallelized foreach will wreak havoc. It's not obvious during
>> compilation - maybe it goes unnoticed on small test systems... but it will
>> fail in production environment at some point in time.
>>
>> And that's what I am concerned about. It's not obvious, not fast failing and
>> may be introduced almost "silently" many many lines of code away...
>>
>> Just my 5 cents...
>> Greetings
>> Bernd
>>
>> scala> def writeResult(res: scala.collection.immutable.Seq[String]): Unit = {
>> | // normally does things like derive filename, cleanup old versions
>> | // opens streams, error handling and the like and then writes
>> | // the content
>> | res.foreach((s) => println(s))
>> | }
>> writeResult: (res: scala.collection.immutable.Seq[String])Unit
>>
>> // just a make up result (and almost always in result files order matters)
>> scala> val a = List("a", "b", "c", "d", "e")
>> a: List[java.lang.String] = List(a, b, c, d, e)
>>
>> // that one is ok
>> scala> writeResult(a)
>> a
>> b
>> c
>> d
>> e
>>
>> // that one might be introduced because the Seq is a bigger one
>> // and there are many cores awaiting work...
>> scala> val pa = a.par
>> pa: scala.collection.parallel.immutable.ParSeq[java.lang.String] =
>> ParVector(a, b, c, d, e)
>>
>> // after some processing the Seq is to be saved...
>> scala> writeResult(pa)
>> a
>> b
>> d
>> e
>> c
>>
>> // an unexpected behaviour that spreads from the point of par
>> // introduction into
>> // formerly proper working pieces of code which silently assumed that
>> // foreach is acting faithfully on the sequence of the Seq
>>
>

--
Daniel C. Sobral

I travel to the future all the time.

Sat, 2011-04-02, 16:37

#26

ichoran

Joined: 2009-08-14,

Re: REPL: null-values in parallel ranges?

On Fri, Apr 1, 2011 at 5:45 PM, Paul Phillips <paulp [at] improving [dot] org> wrote:

More relevantly, nobody passed a parallel collection to anybody's code unless "guy from the past" counts. (Keep in mind that "guy from the past" committed this transgression *before the parallel collections existed*.) A pre-existing method written to operate on generic traversables was passively called by another pre-existing method. The person who called ".par" just called ".par". It seems unreasonable to require omniscience and time travel powers from the users.

That's what unit tests are for, isn't it? If the par collections can't pass _every_ unit test for the regular collections then either
(1) the parallel collections need to be fixed, or
(2) the unit tests need to be fixed, or
(3) you're asking for trouble

So, the collections library should be retrofitted to be .par-safe.

Then, we need a rule of explicit escalation:
No sequential collection will be promoted to a parallel one without an explicit call.

And then old code can be protected relatively easily until unit tests (or others) can be run.

--Rex

Sat, 2011-04-02, 16:47

#27

ichoran

Joined: 2009-08-14,

Re: REPL: null-values in parallel ranges?

On Sat, Apr 2, 2011 at 10:56 AM, Daniel Sobral <dcsobral [at] gmail [dot] com> wrote:

On Fri, Apr 1, 2011 at 22:19, Seth Tisue <seth [at] tisue [dot] net> wrote:
>>>>> "Daniel" == Daniel Sobral <dcsobral [at] gmail [dot] com> writes:

Daniel> I don't understand what's the trouble here. You said you wanted
Daniel> "out of order" the instance you called ".par" on your
Daniel> collection. If your code depends on things being sequential,
Daniel> don't pass parallel collections to it.

Barbara Liskov called for you. She seems angry! I think you better
call her back.

Tell her to call me back when programmers start coding to a language's specification, instead of coding to what the compiler does.
Non-parallelism is presumed throughout Scala (and most languages) with few exceptions. The Liskov's Principle thing to do is completely separate parallel and sequential collections, or, otherwise, make sure ALL Iterable/Seq/Set/Map methods are sequential, and add a whole new set of methods for parallel operations.
So I see three choices:
1. Violate Liskov's principle.

Bad.

2. Make it so a parallel collection have different interfaces/class hierarchy than sequential ones.

Unnecessary, though I guess one could add implicit conversions that made it pretty transparent except where it mattered.

3. Add parallel methods and make sure all others are sequential.

Yes.

Parallel collections should add methods when it makes a difference.

There aren't _that_ many that even make sense to parallelize and where out-of-order simultaneous execution matters. For seq:
- collect
- corresponds
- count
- exists
- filter / filterNot / withFilter
- find
- flatMap
- forall
- foreach
- groupBy
- map / reverseMap
- partition
- reduce
- segmentLength
- sortBy / sortWith
- transpose

The relevant methods are easy to identify--if they take a function and they do not intrinsically start at one end of the collection and go from there (e.g. firstIndexOf), then they need their own method since it is very easy to pass in a closure that has mutable state and thereby mess everything up in parallel. Not only do you have all the "well, this didn't happen in order" problems, you also have the "ack, this underlying structure isn't thread safe" problem (which can be solved by introducing the "ack, everything is synchronized and it runs so slow" problem).

I'm sorry that I haven't been paying much attention until we're already into release candidates, but I really don't see how one can avoid making separate parallel methods for these things without mysteriously and unexpectedly breaking all sorts of code that's out there.

I strongly advise that these things be named as new methods, or that they all go into a trait that is NOT shared by the parallel collections hierarchy so that people can realize when they use them that they have to re-think what the code might be doing.

--Rex

Sat, 2011-04-02, 17:27

#28

Matthew Pocock 3

Joined: 2010-07-30,

Re: REPL: null-values in parallel ranges?

On 2 April 2011 16:44, Rex Kerr <ichoran [at] gmail [dot] com> wrote:

The relevant methods are easy to identify--if they take a function and they do not intrinsically start at one end of the collection and go from there (e.g. firstIndexOf),

I'm not sure it's quite that black and white. I can imagine parallel implementations of firstInstanceOf that use a mixture of divide-and-conquer, and 'kill all workers to the right of me if I evaluate to true'. If your predicate had side-effects (which is an evil thing to do, but without effects checking and purity annotations, how can you enforce this?) then we're back to nondeterminism.

I'm sorry that I haven't been paying much attention until we're already into release candidates, but I really don't see how one can avoid making separate parallel methods for these things without mysteriously and unexpectedly breaking all sorts of code that's out there.

I much prefer the model of having the exact same methods but having sequential and parallel implementations that are clearly flagged as such, and ideally where there is some type-level way to see if your collection is sequential or parallel - that is, every collection instance will derive from exactly one of SequentialCollection/ParallelCollection.

--Rex

--
Matthew Pocockmailto: turingatemyhamster [at] gmail [dot] comgchat: turingatemyhamster [at] gmail [dot] com msn: matthew_pocock [at] yahoo [dot] co [dot] ukirc.freenode.net: drdozer(0191) 2566550

Sat, 2011-04-02, 17:37

#29

Matthew Pocock 3

Joined: 2010-07-30,

Re: REPL: null-values in parallel ranges - Seq.foreach acting u

On 2 April 2011 15:50, HamsterofDeath <h-star [at] gmx [dot] de> wrote:

a parallel sequence can replace a sequencial sequence (feels weird
writing that) for every method except foreach.

Not exactly. If the function you pass to map or the predicate you pass to filter accesses shared mutable state or performs IO, then these are just as affected by out-of-order execution as foreach. In the pathological case where you call foreach with pure code, you won't notice any difference in sequential and parallel execution.
Matthew

--
Matthew Pocockmailto: turingatemyhamster [at] gmail [dot] comgchat: turingatemyhamster [at] gmail [dot] com msn: matthew_pocock [at] yahoo [dot] co [dot] ukirc.freenode.net: drdozer(0191) 2566550

Sat, 2011-04-02, 17:57

#30

H-star Development

Joined: 2010-04-14,

Re: REPL: null-values in parallel ranges - Seq.foreach acting u

what is this "mutable" you are talking about? sounds like witchery. surely you are not going to put a mutable state into a function that is going to be passed around to unknown callers in unknown threads. because that would be insane. you might as well use a dynamically typed language and complain about methods randomly appearing and disappearing.

seriously:
if your function is not threadsafe, don't use it in a multithreading environment. when i said "could replace", i implicitly meant "if using side effect free functions"

Am 02.04.2011 18:22, schrieb Matthew Pocock:

3v_qB08XfsETmN+JaFfu [at] mail [dot] gmail [dot] com" type="cite">

On 2 April 2011 15:50, HamsterofDeath <h-star [at] gmx [dot] de" rel="nofollow">h-star [at] gmx [dot] de> wrote:

a parallel sequence can replace a sequencial sequence (feels weird
writing that) for every method except foreach.

Not exactly. If the function you pass to map or the predicate you pass to filter accesses shared mutable state or performs IO, then these are just as affected by out-of-order execution as foreach. In the pathological case where you call foreach with pure code, you won't notice any difference in sequential and parallel execution.
Matthew

--
Matthew Pocock mailto: turingatemyhamster [at] gmail [dot] com" target="_blank" rel="nofollow">turingatemyhamster [at] gmail [dot] com gchat: turingatemyhamster [at] gmail [dot] com" target="_blank" rel="nofollow">turingatemyhamster [at] gmail [dot] com msn: matthew_pocock [at] yahoo [dot] co [dot] uk" target="_blank" rel="nofollow">matthew_pocock [at] yahoo [dot] co [dot] uk irc.freenode.net: drdozer (0191) 2566550

Sat, 2011-04-02, 18:17

#31

Seth Tisue

Joined: 2008-12-16,

Re: REPL: null-values in parallel ranges?

>>>>> "Daniel" == Daniel Sobral writes:

Daniel> Non-parallelism is presumed throughout Scala (and most
Daniel> languages) with few exceptions. The Liskov's Principle thing to
Daniel> do is completely separate parallel and sequential collections,
Daniel> or, otherwise, make sure ALL Iterable/Seq/Set/Map methods are
Daniel> sequential [...]

Overkill -- only Seq is at issue here.

Daniel> Do you see any other choice? I'd be happy to hear of
Daniel> it. Otherwise, what do _you_ think should be done?

Unless someone has new arguments to offer, I'm with Rex and Paul.
Seq.foreach should be sequential. A Seq is a Seq. It's not negotiable.
This is bedrock.

If enough people want it, there could be a variant of foreach that
allows out-of-order execution.

Sat, 2011-04-02, 18:37

#32

Matthew Pocock 3

Joined: 2010-07-30,

Re: REPL: null-values in parallel ranges - Seq.foreach acting u

On 2 April 2011 17:52, HamsterofDeath <h-star [at] gmx [dot] de> wrote:

what is this "mutable" you are talking about? sounds like witchery. surely you are not going to put a mutable state into a function that is going to be passed around to unknown callers in unknown threads. because that would be insane.

I would be very surprised if there isn't scala code 'in the wild' that does exactly this. It probably works just fine for them right now because collections are processed sequentially. I'd expect IO side-effects to be the most common case.

seriously:
if your function is not threadsafe, don't use it in a multithreading environment. when i said "could replace", i implicitly meant "if using side effect free functions"

In the absence of an effects system, it isn't always immediately obvious if your code will give different orders if the collection is processed in a different order, particularly for less experienced programmers. If there is some legacy or 3rd party code that now gets pushed into a parallel collection method, how could we expect people to know if it is threadsafe or not?
Matthew
--
Matthew Pocockmailto: turingatemyhamster [at] gmail [dot] comgchat: turingatemyhamster [at] gmail [dot] com msn: matthew_pocock [at] yahoo [dot] co [dot] ukirc.freenode.net: drdozer(0191) 2566550

Sat, 2011-04-02, 18:57

#33

ichoran

Joined: 2009-08-14,

Re: REPL: null-values in parallel ranges?

On Sat, Apr 2, 2011 at 12:16 PM, Matthew Pocock <turingatemyhamster [at] gmail [dot] com> wrote:

On 2 April 2011 16:44, Rex Kerr <ichoran [at] gmail [dot] com> wrote:
I'm sorry that I haven't been paying much attention until we're already into release candidates, but I really don't see how one can avoid making separate parallel methods for these things without mysteriously and unexpectedly breaking all sorts of code that's out there.

I much prefer the model of having the exact same methods but having sequential and parallel implementations that are clearly flagged as such, and ideally where there is some type-level way to see if your collection is sequential or parallel - that is, every collection instance will derive from exactly one of SequentialCollection/ParallelCollection.

That's fine, but the entire existing collections hierarchy needs to stay at the same level and _not_ have ParallelCollection's out-of-order methods mixed in. So we need a new hierarchy to implement this:

PossiblyParSeq (generic)
Seq (sequential)
ParSeq (parallel)

so people using existing methods will be guaranteed that the evaluation order will be the same, but can specifically say that they know that they're okay.

(Maybe "collection.scalable" could provide the common ancestor for both parallel and sequential operations. You then could import collection.scalable._ to auto-enable parallel out-of-order operations without any additional changes to your code. Which, unmodified, may well not work--but at least you have to actively do something that declares that you think your code is okay, instead of randomly being hit with stuff that doesn't work.)

--Rex

Sat, 2011-04-02, 20:27

#34

ScottC 2

Joined: 2011-01-30,

Re: REPL: null-values in parallel ranges?

If foreach is executed in parallel in some cases, that is going to
cause a lot of bugs from users in a couple related ways:
* Code that took a generic sequence and expected the order to be
consistent.
* Code that builds up _any_ data structure that is not thread-safe.

Someone may have some legacy code that traverses a sequence and put
them into a Btree. If they are now passed a parallel collection
(which they knew nothing about at the time of the original code), and
were a bit careless, that code will likely explode. It may have been
bad code in the first place, but that is entirely besides the point.
The semantics of an API are changing in a way that will break existing
code. The compiler doesn't yell at you and break with "sorry, that
is not a pure function!"

How much code out there using foreach on a sequence was written in a
way that won't break if executed with parallel threads?

Users of parallel collections can call .seq before handing off to
elsewhere, but this is not type safe and error prone. Are parallel
collections meant to be type safe with respect to ordinary
collections? Or should consumers add boilerplate to check for
parallel or call .seq for every one they receive? Should all legacy
code be converted to check?

I'm far too new to Scala to know what can and can't be done with types
here, but my gut tells me that legacy code shouldn't break or require
change, and new code shouldn't require boilerplate type checks or
conversions in order to be safe.

On Apr 2, 11:03 am, Daniel Sobral wrote:
> On Sat, Apr 2, 2011 at 05:24, Bernd Johannes wrote:
>
> > With this experience in the past I recommend to keep the "ordered"
> > behaviour
> > of foreach() because breaking this will introduce a whole bunch of errors
> > in
> > parallel context. Errors for which scala is not to be blamed... but it's
> > terribly error prune to switch from guaranteed ordered / sequencial to
> > unordered behaviour in parallel context.
>
> > I would even go as far as to recommend an artifical "unordered" behaviour
> > of
> > the sequencial foreach if the scala team decides to keep the parallel
> > behaviour.
>
> In which case what will happen is that people will come complain that they
> changed their collection to parallel, but nothing happened! At which point
> we'll tell them, "No, it's not just changing the collection to parallel.
> You'll have to rewrite your whole code as well."
>
> I someone uses a parallel collection, he _expects_ things to be parallel.
> Sure, they'll break code that worked for serial collection in new and
> interesting ways, but that is easily fixed: don't use parallel collections.
> You can even call .seq before sending a collection to that part of the code,
> and call .par on anything getting back.
>
> --
> Daniel C. Sobral
>
> I travel to the future all the time.

Sat, 2011-04-02, 21:07

#35

odersky

Joined: 2008-07-29,

Re: REPL: null-values in parallel ranges?

On Fri, Apr 1, 2011 at 12:11 AM, Paul Phillips <paulp [at] improving [dot] org> wrote:

Well, I guess it is intentional. This has rather severe consequences for something like a pre-existing method which is not written expecting that things implementing Seq[T] may now use a random order for foreach. Maybe I was naive all this time thinking that the reproducible quality of foreach on a sequence was something I could count on, and I should have been calling "iterator" all the time. Either way, there's a lot of code out there.

True. Parallel collections are much more of a game changer than people realize so far. In particular, we will have to introduce quite a few seq's to make things work as before. I agree that documentation needs to be updated to take this into account. We were thinking for a while on introducing new operations for parallel operations and leaving foreach sequential, but in the end decided not to. If you write

for { x <- pxs } ...

where pxs is a parallel collection you expect the loop to be parallel.
So foreach needs to be parallel.

Let's face it: We will not get the genefits of automatic multi-core execution by continuing in our old ways. The Fortress language makes every collection parallel by default. Scala is not quite as radical, but I believe it's ultimately correct not to sacrifice convenience for the parallel case just so that naive sequential code continues to work unchanged.

But yes, it is a worry that we'll have to think about this issue now everywhere. Hopefully, it will translate into a much stronger push to be fully functional.

Cheers

-- Martin

Sat, 2011-04-02, 21:27

#36

Viktor Klang

Joined: 2008-12-17,

Re: REPL: null-values in parallel ranges?

The biggest risk I see is that people will use non-thread-safe mutable state in their closures, and then things will work in 99.99% of all cases but corrupt things silently.

On Sat, Apr 2, 2011 at 10:00 PM, martin odersky <martin [dot] odersky [at] epfl [dot] ch> wrote:

On Fri, Apr 1, 2011 at 12:11 AM, Paul Phillips <paulp [at] improving [dot] org> wrote:

Well, I guess it is intentional. This has rather severe consequences for something like a pre-existing method which is not written expecting that things implementing Seq[T] may now use a random order for foreach. Maybe I was naive all this time thinking that the reproducible quality of foreach on a sequence was something I could count on, and I should have been calling "iterator" all the time. Either way, there's a lot of code out there.

True. Parallel collections are much more of a game changer than people realize so far. In particular, we will have to introduce quite a few seq's to make things work as before. I agree that documentation needs to be updated to take this into account. We were thinking for a while on introducing new operations for parallel operations and leaving foreach sequential, but in the end decided not to. If you write

for { x <- pxs } ...

where pxs is a parallel collection you expect the loop to be parallel.
So foreach needs to be parallel.

Let's face it: We will not get the genefits of automatic multi-core execution by continuing in our old ways. The Fortress language makes every collection parallel by default. Scala is not quite as radical, but I believe it's ultimately correct not to sacrifice convenience for the parallel case just so that naive sequential code continues to work unchanged.

But yes, it is a worry that we'll have to think about this issue now everywhere. Hopefully, it will translate into a much stronger push to be fully functional.

Cheers

-- Martin

--
Viktor Klang,
Relentless is more
Work:   Scalable Solutions
Code:   github.com/viktorklang
Follow: twitter.com/viktorklang
Read:   klangism.tumblr.com

Sat, 2011-04-02, 21:37

#37

Tony Morris

Joined: 2008-12-19,

Re: REPL: null-values in parallel ranges?

Viktor, scala has no effect tracking. This is essentially what you said. This issue is not specific to parallel code. We all deal with this already, somehow.

That is not to say "parallel foreach" is a good idea. It isn't.

On 03/04/2011 6:11 AM, "√iktor Ҡlang" <viktor [dot] klang [at] gmail [dot] com> wrote:

Sat, 2011-04-02, 22:07

#38

Jonas Bonér

Joined: 2008-12-19,

Re: REPL: null-values in parallel ranges?

p.p1 {margin: 0.0px 0.0px 0.0px 0.0px; font: 16.0px Arial}

I don't think parallel collections should "lie" and provide a leaky abstraction over concurrency. When someone asks for parallel execution I think it is best that they get what they are asking for. Sure it might break sequential code, but threads does that; they turn determinism into non-determinism. Let's embrace parallelism at its core and reap the benefits of it rather than trying to force it into something that it is not. It is a matter of education; documentation needs to be a lot better. People needs to understand that concurrency and parallelism is here to stay and a lot of the time it still ain't that pretty.

2011/4/2 Tony Morris <tmorris [at] tmorris [dot] net>

Viktor, scala has no effect tracking. This is essentially what you said. This issue is not specific to parallel code. We all deal with this already, somehow.

That is not to say "parallel foreach" is a good idea. It isn't.
On 03/04/2011 6:11 AM, "√iktor Ҡlang" <viktor [dot] klang [at] gmail [dot] com> wrote:

--
Jonas Bonér

Specialist at Large
work: http://scalablesolutions.se
code: http://akka.io
blog: http://jonasboner.com
twtr: @jboner

Sat, 2011-04-02, 22:57

#39

ichoran

Joined: 2009-08-14,

Re: REPL: null-values in parallel ranges?

On Sat, Apr 2, 2011 at 4:00 PM, martin odersky <martin [dot] odersky [at] epfl [dot] ch> wrote:

On Fri, Apr 1, 2011 at 12:11 AM, Paul Phillips <paulp [at] improving [dot] org> wrote:
Well, I guess it is intentional. This has rather severe consequences for something like a pre-existing method which is not written expecting that things implementing Seq[T] may now use a random order for foreach. Maybe I was naive all this time thinking that the reproducible quality of foreach on a sequence was something I could count on, and I should have been calling "iterator" all the time. Either way, there's a lot of code out there.

True. Parallel collections are much more of a game changer than people realize so far. In particular, we will have to introduce quite a few seq's to make things work as before. I agree that documentation needs to be updated to take this into account. We were thinking for a while on introducing new operations for parallel operations and leaving foreach sequential, but in the end decided not to.

That's unfortunate, because it makes it more difficult to supply parameters regarding the parallelism that you want. You probably don't want all parallel operations to use the same number of processors if your machine might be doing something else than executing your Scala code, or even if your Scala program might be running several different operation simultaneously on a machine with many cores. I guess you could make it an argument to .par.

If you write

for { x <- pxs } ...

where pxs is a parallel collection you expect the loop to be parallel.

Agreed, if you know that pxs is a parallel collection.

But if, two years ago, I wrote
for {x <- xs } ...
and stuck it in a library, I'm pretty sure I was _not_ expecting it to be parallel.

So you cannot possibly trust that code--unless you've inspected it yourself--to work sensibly in a parallel context. It might break horribly and obviously every time, or it might break subtly on rare occasions, or it might not break at all. The point is that you have no way to know which it is without looking at every use of the offending operations.

If you don't perform that extensive code review, the burden is _entirely on you_ as the user of the library to remember every single time that you use it that you'd better .seq whatever you pass in or Bad Things might happen.

This is a bad design, unless you're going to rewrite all my old libraries for me.

So foreach needs to be parallel.

foreach needs to be _able to be_ parallel, but it need not always have the option of being parallel, if used in the context of sequential collections.

Let's face it: We will not get the genefits of automatic multi-core execution by continuing in our old ways.

Agreed. This is the reason why throwing parallel collections automatically into code written assuming sequential collections is a bad idea, and the library should discourage it.

The Fortress language makes every collection parallel by default. Scala is not quite as radical

It, as designed now, basically as radical. Every base collection is now parallel by default. If you use standard garden-variety Seq, Iterable, Map, etc., you are required to assume that it is parallel or your code will randomly break when reused. Unlike with Fortress, however, this new reality is partially disguised--it looks like parallel collections are an optional add-on, not something that must be treated in every case.

If you want, say, a mutable sequence that is sequential, you cannot use collection.mutable.Seq. You have to decide which implementation you want, and use, say, collection.mutable.ArrayBuffer.

, but I believe it's ultimately correct not to sacrifice convenience for the parallel case just so that naive sequential code continues to work unchanged.

But yes, it is a worry that we'll have to think about this issue now everywhere. Hopefully, it will translate into a much stronger push to be fully functional.

Unfortunately, the library is not nearly powerful enough for side-effect-free code to perform acceptably in very many instances.

For example, Scala's collections have no "filterIndex" method--something which returns not the elements but the indices of the elements. You might

    xs.zipWithIndex.filter(xi => p(xi._1)).map(_._2)

or you might

    { var i = -1; xs.view.filter(x => {i += 1; p(x) }).map(_ => i).force }

or perhaps

    { var i = 0; val b = xs.genericBuilder[Int]; xs.foreach(x => { if (p(x)) { b += i }; i += 1 }); b.result }

Unfortunately, the second is twice as fast for relatively simple p(x) (like a test of a string length), and the third is three to four times as fast. So, what do I want to do? Throw two to four cores at it with the slightly nicer syntax, or be able to write it to use fewer cores? I'm pretty sure that I'd at least like to be able to _choose_.

So for two very important reasons, namely correctness of running of old code, and performance of new code using parallel-unsafe operations, I think that the collections library should exist in the hierarchy in such a way that you can be sure you're not using parallel collections in any operations where it's easy and useful access shared mutable state or have side effects.

That is,

scala.collection.scalable.Seq // Use this if parallel collections are ok
scala.collection.Seq         // We're sure it's sequential
scala.collection.parallel.ParSeq // We're sure it's parallel

Or, at the very least,

scala.collection.Seq // Must assume that it may be parallel!
scala.collection.sequential.Seq   // Sure to be sequential
scala.collection.parallel.Seq     // Sure to be parallel

with the latter done in such a way that an import collection.sequential._ will yield the old behavior in all cases.

I think it's better still to use different method names, but I agree that the for case is non-ideal, since although it could use the parallel version if the collection was typed appropriately, it would be too wasteful to go searching for it at runtime.

--Rex

Sun, 2011-04-03, 01:17

#40

extempore

Joined: 2008-12-17,

Re: REPL: null-values in parallel ranges?

I looked around trunk a little bit for stuff which is broken because of
this. Here are the first couple examples I found.

// more "dynamic" hashcodes
scala> val x = (1 to 10).par
x: scala.collection.parallel.immutable.ParRange = ParRange(1, 2, 3, 4,
5, 6, 7, 8, 9, 10)

scala> x.hashCode
res0: Int = 1137090302

scala> x.hashCode
res1: Int = -555212555

scala> x.hashCode
res2: Int = -1216588130

// grab your partner, do-sa-do
scala> (1 to 10 par, 1 to 10).zipped foreach ((x, y) => println(x + " "
+ y))
3 2
6 3
2 4
7 6
1 1
8 7
4 5
5 9
9 8
10 10

Sun, 2011-04-03, 02:27

#41

Joshua.Suereth

Joined: 2008-09-02,

Re: REPL: null-values in parallel ranges?

hashCode seems like a deal breaker...
So I may not have been paying attention. Are parallel collections "invasive" in the sense that code needs to be sprinkled everywhere throughout the std library to ensure correctness, or can we limit the fixes just to the parallel collections themselves?

On Sat, Apr 2, 2011 at 8:12 PM, Paul Phillips <paulp [at] improving [dot] org> wrote:

I looked around trunk a little bit for stuff which is broken because of this. Here are the first couple examples I found.

// more "dynamic" hashcodes
scala> val x = (1 to 10).par
x: scala.collection.parallel.immutable.ParRange = ParRange(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)

scala> x.hashCode
res0: Int = 1137090302

scala> x.hashCode
res1: Int = -555212555

scala> x.hashCode
res2: Int = -1216588130

// grab your partner, do-sa-do
scala> (1 to 10 par, 1 to 10).zipped foreach ((x, y) => println(x + " " + y))
3 2
6 3
2 4
7 6
1 1
8 7
4 5
5 9
9 8
10 10

Sun, 2011-04-03, 03:17

#42

Bill Venners

Joined: 2008-12-18,

Re: REPL: null-values in parallel ranges?

Hi All,

On Sat, Apr 2, 2011 at 8:44 AM, Rex Kerr wrote:
>
> On Sat, Apr 2, 2011 at 10:56 AM, Daniel Sobral wrote:
>>
>> On Fri, Apr 1, 2011 at 22:19, Seth Tisue wrote:
>>>
>>> >>>>> "Daniel" == Daniel Sobral writes:
>>>
>>> Daniel> I don't understand what's the trouble here. You said you wanted
>>> Daniel> "out of order" the instance you called ".par" on your
>>> Daniel> collection. If your code depends on things being sequential,
>>> Daniel> don't pass parallel collections to it.
>>>
>>> Barbara Liskov called for you. She seems angry! I think you better
>>> call her back.
>>
>> Tell her to call me back when programmers start coding to a language's
>> specification, instead of coding to what the compiler does.
>> Non-parallelism is presumed throughout Scala (and most languages) with few
>> exceptions. The Liskov's Principle thing to do is completely separate
>> parallel and sequential collections, or, otherwise, make sure ALL
>> Iterable/Seq/Set/Map methods are sequential, and add a whole new set of
>> methods for parallel operations.
>> So I see three choices:
>> 1. Violate Liskov's principle.
>
> Bad.
>
I think the problem here is the written "contract" for foreach was
different than how most people thought of it, including me. I've known
.par was coming for a good long time and one part of my brain knew
that meant that foreach would execute in parallel. Nevertheless at the
same time another part thought of a side-effecting for expression as
an analogue to Java's for loop. Those two parts never got together and
compared notes.

I had to go look at the Scaladoc for foreach in Seq to see what it
said. Sure enough, it doesn't say execution will be in Seq order:

Applies a function f to all elements of this sequence.

Looks like Martin never promised us a rose garden. Foreach also
doesn't even make an in-order promise on iterator:

Applies a function f to all values produced by this iterator.

So if we take this text as the contract, then Barbara Liskov should be
happy. It never was promised it would be in order. This addition of
.par is not a breaking change as far as the contract goes, but
programs may break because people had a different contract in their
heads when they wrote code. So that's one problem. The other is that
people may continue to think that way. This looks too much like
for-like things in other languages that do things in order, and in
Scala that was usually the case prior to 2.9:

scala> def printElements(seq: Seq[Any]) {
| for(e <- seq)
| println (e)
| }
printElements: (seq: Seq[Any])Unit

scala> printElements(List("Hello", "ordered", "world1"))
Hello
ordered
world1

One question I have is: is there a way to use a for expression with a
Seq and be sure to get things in Seq order? From the contract of
Iterator, this doesn't seem to be a guaranteed way to do it:

for(e <- seq.iterator)
println (e)

And if it is not guaranteed to go in Seq order if it is just a Seq,
then this may not be guaranteed to work either in 2.9:

for(e <- seq.seq)
println (e)

So this a there do is to way?

Sorry, I meant: so is there a way to do this?

Bill
----
Bill Venners
Artima, Inc.
http://www.artima.com

Sun, 2011-04-03, 03:17

#43

extempore

Joined: 2008-12-17,

Re: REPL: null-values in parallel ranges?

On 4/2/11 6:18 PM, Josh Suereth wrote:
> hashCode seems like a deal breaker...
>
> So I may not have been paying attention. Are parallel collections
> "invasive" in the sense that code needs to be sprinkled everywhere
> throughout the std library to ensure correctness, or can we limit the
> fixes just to the parallel collections themselves?

They are "invasive" in that sense. To use the hashCode example, here's
why the .par hashCode is non-deterministic.

// SeqLike.scala
override def hashCode() = {
val h = new util.MurmurHash[A](Seq.hashSeed)
this.foreach(h)
h.hash
}

Easy enough to fix of course. But only with the aforementioned
sprinkling. When the method was written one could pretty reasonably
expect the same outcome on each run.

I'm having trouble imagining happy outcomes if we go forward with this.

Sun, 2011-04-03, 03:27

#44

Joshua.Suereth

Joined: 2008-09-02,

Re: REPL: null-values in parallel ranges?

Yeah... you just convinced me.
foreach should stay sequential.. *OR* (crazy thought) Parallel collections should *not* derive from the current non-parallel collections. (I'm looking at you sequential-order-assuming Traversable base class).
Instead there can be some other types of interfaces that make sense in a parallel world that both parallel collections and non-parallel collections can derive from.
If you want parallelism, you're stuck with map/flatMap/filter. You want sequential, you convert back to a regular collection and win.
- Josh

On Sat, Apr 2, 2011 at 10:11 PM, Paul Phillips <paulp [at] improving [dot] org> wrote:

On 4/2/11 6:18 PM, Josh Suereth wrote:

hashCode seems like a deal breaker...

So I may not have been paying attention. Are parallel collections
"invasive" in the sense that code needs to be sprinkled everywhere
throughout the std library to ensure correctness, or can we limit the
fixes just to the parallel collections themselves?

They are "invasive" in that sense. To use the hashCode example, here's why the .par hashCode is non-deterministic.

// SeqLike.scala
override def hashCode() = {
val h = new util.MurmurHash[A](Seq.hashSeed)
this.foreach(h)
h.hash
}

Easy enough to fix of course. But only with the aforementioned sprinkling. When the method was written one could pretty reasonably expect the same outcome on each run.

I'm having trouble imagining happy outcomes if we go forward with this.

Sun, 2011-04-03, 03:37

#45

Paul Phillips 2

Joined: 2011-02-10,

Re: REPL: null-values in parallel ranges?

On 4/2/11 7:10 PM, Bill Venners wrote:
> One question I have is: is there a way to use a for expression with a
> Seq and be sure to get things in Seq order? From the contract of
> Iterator, this doesn't seem to be a guaranteed way to do it:
>
> for(e<- seq.iterator)
> println (e)

The documentation for reverseIterator guarantees them in reverse order,
so the optimal solution is

for (e <- seq.reverseIterator.toList.reverseIterator)
println (e)

Problem solved.

Sun, 2011-04-03, 03:47

#46

Joshua.Suereth

Joined: 2008-09-02,

Re: REPL: null-values in parallel ranges?

I sleep easier at night knowing this is the solution.

On Sat, Apr 2, 2011 at 10:21 PM, Paul Phillips <paul [dot] phillips [at] gmail [dot] com> wrote:

On 4/2/11 7:10 PM, Bill Venners wrote:

One question I have is: is there a way to use a for expression with a
Seq and be sure to get things in Seq order? From the contract of
Iterator, this doesn't seem to be a guaranteed way to do it:

for(e<- seq.iterator)
println (e)

The documentation for reverseIterator guarantees them in reverse order, so the optimal solution is

for (e <- seq.reverseIterator.toList.reverseIterator)
println (e)

Problem solved.

Sun, 2011-04-03, 03:57

#47

extempore

Joined: 2008-12-17,

Re: REPL: null-values in parallel ranges?

On 4/2/11 1:00 PM, martin odersky wrote:
> But yes, it is a worry that we'll have to think about this issue now
> everywhere. Hopefully, it will translate into a much stronger push to be
> fully functional.

For the record I don't mind worrying about it everywhere from now on.
I'm all for it. I just don't want to spend the rest of my life fixing
bugs in pre-existing code.

> If you write
>
> for { x <- pxs } ...
>
> where pxs is a parallel collection you expect the loop to be parallel.
> So foreach needs to be parallel.

That tells me the foreach at the base of the parallel collections has to
be parallel, but not that the foreach we already have has to be. If
that's the main motivation then we can re-root the parallel collections
around a parallel foreach. I already pushed foreach backward for
TraversableOnce, I'd be glad to push it sideways for this one.

There's another interesting tie-in here, which I already had on my
agenda for a while, ever since looking too closely at slice. There is a
real issue with having the same trait provide default implementations
for both mutable and immutable collections. The default implementation
is either outright wrong or grossly inefficient half the time. So the
side which was not chosen as the favorite side has to be sure every
method implementation is overridden (and many bugs have been the result
of not achieving this.) But what is the point of having default
implementations if they are menaces which must be guarded against half
the time?

Example. What is the default implementation of tail? There are pretty
much two choices.

// if this is a mutable seq, you're now sharing mutable cells
def tail = pointer to the second element of this collection
// if this is an immutable seq, you're now copying a billion
// elements for no reason
def tail = copy this entire collection except the first

For almost every method in TraversableLike this same issue arises. (The
methods where one can safely use the same implementation are generally
folds and other chew-up-the-sequence methods, and those are in
TraversableOnce.) The conclusion I drew is that the mutable and
immutable collections should not share any code past TraversableOnce.
They should (according to this theory) share a pure interface in
TraversableLike and each have their own default implementations.

So back to the parallel foreach problem: same thing.

trait MutableTraversableLike extends TraversableMethods
trait ImmutableTraversableLike extends TraversableMethods
trait ParallelTraversableLike extends TraversableMethods

(I won't speculate on whether it should really be
ParallelImmutableTraversableLike and so on or how these might compose
unless I see a glimmer of hope we could go this way.)

Sun, 2011-04-03, 04:07

#48

Joshua.Suereth

Joined: 2008-09-02,

Re: REPL: null-values in parallel ranges?

+1

On Sat, Apr 2, 2011 at 10:49 PM, Paul Phillips <paulp [at] improving [dot] org> wrote:

On 4/2/11 1:00 PM, martin odersky wrote:

But yes, it is a worry that we'll have to think about this issue now
everywhere. Hopefully, it will translate into a much stronger push to be
fully functional.

For the record I don't mind worrying about it everywhere from now on. I'm all for it. I just don't want to spend the rest of my life fixing bugs in pre-existing code.

If you write

for { x <- pxs } ...

where pxs is a parallel collection you expect the loop to be parallel.
So foreach needs to be parallel.

That tells me the foreach at the base of the parallel collections has to be parallel, but not that the foreach we already have has to be. If that's the main motivation then we can re-root the parallel collections around a parallel foreach. I already pushed foreach backward for TraversableOnce, I'd be glad to push it sideways for this one.

There's another interesting tie-in here, which I already had on my agenda for a while, ever since looking too closely at slice. There is a real issue with having the same trait provide default implementations for both mutable and immutable collections. The default implementation is either outright wrong or grossly inefficient half the time. So the side which was not chosen as the favorite side has to be sure every method implementation is overridden (and many bugs have been the result of not achieving this.) But what is the point of having default implementations if they are menaces which must be guarded against half the time?

Example. What is the default implementation of tail? There are pretty much two choices.

// if this is a mutable seq, you're now sharing mutable cells
def tail = pointer to the second element of this collection
// if this is an immutable seq, you're now copying a billion
// elements for no reason
def tail = copy this entire collection except the first

For almost every method in TraversableLike this same issue arises. (The methods where one can safely use the same implementation are generally folds and other chew-up-the-sequence methods, and those are in TraversableOnce.) The conclusion I drew is that the mutable and immutable collections should not share any code past TraversableOnce. They should (according to this theory) share a pure interface in TraversableLike and each have their own default implementations.

So back to the parallel foreach problem: same thing.

trait MutableTraversableLike extends TraversableMethods
trait ImmutableTraversableLike extends TraversableMethods
trait ParallelTraversableLike extends TraversableMethods

(I won't speculate on whether it should really be ParallelImmutableTraversableLike and so on or how these might compose unless I see a glimmer of hope we could go this way.)

Sun, 2011-04-03, 04:17

#49

Bill Venners

Joined: 2008-12-18,

Re: REPL: null-values in parallel ranges?

Hi Paul,

I'm not sure how to implement it, but I wonder if the following
behavior would make sense: have for expression rewriting rules for
foreach take into account whether the generator mixes in the Parallel
marker trait.

scala> import scala.collection.Parallel
import scala.collection.Parallel

scala> val ls = List(1, 2, 3, 4)
ls: List[Int] = List(1, 2, 3, 4)

scala> ls.isInstanceOf[Parallel]
res3: Boolean = false

scala> val pls = ls.par
pls: scala.collection.parallel.immutable.ParSeq[Int] = ParVector(1, 2, 3, 4)

scala> pls.isInstanceOf[Parallel]
res5: Boolean = true

If I do this:

for (e <- ls) println(e)

This would somehow cause a an in-order execution, because the type of
ls doesn't mix in Parallel.

But if I do this:

for (e <- pls) println(e)

It would happen in Parallel, because the type of generator pls mixes
in Parallel. That way if people have a method that takes a Seq[T],
say, and uses that in a for expression, then because Seq[T] isn't a
subtype of Parallel, it would do it in order (somehow). That means all
old code should not break.

def oldMethod(seq: Seq[String]) { for (e <- seq) println(e) }

But new code that really wants parallel execution could just do .par
on that Seq[T], which would somehow kick it into parallel execution.

def newMethod(pSeq: Seq[String] with Parallel) { for (e <- pSeq) println(e) }

By saying the method takes a Seq[String] with Parallel, it is
announcing that it might do foreaches in parallel.

I think it would be fine to let flatMap, filter, withFilter, map be
rewritten in the same way, because most times people wouldn't have
side effects in those. But foreach is all about the side effects.

Bill

On Sat, Apr 2, 2011 at 7:49 PM, Paul Phillips wrote:
> On 4/2/11 1:00 PM, martin odersky wrote:
>>
>> But yes, it is a worry that we'll have to think about this issue now
>> everywhere. Hopefully, it will translate into a much stronger push to be
>> fully functional.
>
> For the record I don't mind worrying about it everywhere from now on. I'm
> all for it. I just don't want to spend the rest of my life fixing bugs in
> pre-existing code.
>
>> If you write
>>
>> for { x <- pxs } ...
>>
>> where pxs is a parallel collection you expect the loop to be parallel.
>> So foreach needs to be parallel.
>
> That tells me the foreach at the base of the parallel collections has to be
> parallel, but not that the foreach we already have has to be. If that's the
> main motivation then we can re-root the parallel collections around a
> parallel foreach. I already pushed foreach backward for TraversableOnce,
> I'd be glad to push it sideways for this one.
>
> There's another interesting tie-in here, which I already had on my agenda
> for a while, ever since looking too closely at slice. There is a real issue
> with having the same trait provide default implementations for both mutable
> and immutable collections. The default implementation is either outright
> wrong or grossly inefficient half the time. So the side which was not
> chosen as the favorite side has to be sure every method implementation is
> overridden (and many bugs have been the result of not achieving this.) But
> what is the point of having default implementations if they are menaces
> which must be guarded against half the time?
>
> Example. What is the default implementation of tail? There are pretty much
> two choices.
>
> // if this is a mutable seq, you're now sharing mutable cells
> def tail = pointer to the second element of this collection
> // if this is an immutable seq, you're now copying a billion
> // elements for no reason
> def tail = copy this entire collection except the first
>
> For almost every method in TraversableLike this same issue arises. (The
> methods where one can safely use the same implementation are generally folds
> and other chew-up-the-sequence methods, and those are in TraversableOnce.)
> The conclusion I drew is that the mutable and immutable collections should
> not share any code past TraversableOnce. They should (according to this
> theory) share a pure interface in TraversableLike and each have their own
> default implementations.
>
> So back to the parallel foreach problem: same thing.
>
> trait MutableTraversableLike extends TraversableMethods
> trait ImmutableTraversableLike extends TraversableMethods
> trait ParallelTraversableLike extends TraversableMethods
>
> (I won't speculate on whether it should really be
> ParallelImmutableTraversableLike and so on or how these might compose unless
> I see a glimmer of hope we could go this way.)
>
>

Sun, 2011-04-03, 04:27

#50

Bill Venners

Joined: 2008-12-18,

Re: REPL: null-values in parallel ranges?

Hi Paul,

Actually I said that wrongly. I meant that if I really want to do a
parallel foreach on a Seq type that doesn't extend Parallel, I could
write it like this:

for (e <- ls.par) {
// some thread-safe side-effecting code
}

The example I showed was a method that took a Parallel Seq already.

Bill

On Sat, Apr 2, 2011 at 8:03 PM, Bill Venners wrote:
> Hi Paul,
>
> I'm not sure how to implement it, but I wonder if the following
> behavior would make sense: have for expression rewriting rules for
> foreach take into account whether the generator mixes in the Parallel
> marker trait.
>
> scala> import scala.collection.Parallel
> import scala.collection.Parallel
>
> scala> val ls = List(1, 2, 3, 4)
> ls: List[Int] = List(1, 2, 3, 4)
>
> scala> ls.isInstanceOf[Parallel]
> res3: Boolean = false
>
> scala> val pls = ls.par
> pls: scala.collection.parallel.immutable.ParSeq[Int] = ParVector(1, 2, 3, 4)
>
> scala> pls.isInstanceOf[Parallel]
> res5: Boolean = true
>
> If I do this:
>
> for (e <- ls) println(e)
>
> This would somehow cause a an in-order execution, because the type of
> ls doesn't mix in Parallel.
>
> But if I do this:
>
> for (e <- pls) println(e)
>
> It would happen in Parallel, because the type of generator pls mixes
> in Parallel. That way if people have a method that takes a Seq[T],
> say, and uses that in a for expression, then because Seq[T] isn't a
> subtype of Parallel, it would do it in order (somehow). That means all
> old code should not break.
>
> def oldMethod(seq: Seq[String]) { for (e <- seq) println(e) }
>
> But new code that really wants parallel execution could just do .par
> on that Seq[T], which would somehow kick it into parallel execution.
>
> def newMethod(pSeq: Seq[String] with Parallel) { for (e <- pSeq) println(e) }
>
> By saying the method takes a Seq[String] with Parallel, it is
> announcing that it might do foreaches in parallel.
>
> I think it would be fine to let flatMap, filter, withFilter, map be
> rewritten in the same way, because most times people wouldn't have
> side effects in those. But foreach is all about the side effects.
>
> Bill
>
> On Sat, Apr 2, 2011 at 7:49 PM, Paul Phillips wrote:
>> On 4/2/11 1:00 PM, martin odersky wrote:
>>>
>>> But yes, it is a worry that we'll have to think about this issue now
>>> everywhere. Hopefully, it will translate into a much stronger push to be
>>> fully functional.
>>
>> For the record I don't mind worrying about it everywhere from now on. I'm
>> all for it. I just don't want to spend the rest of my life fixing bugs in
>> pre-existing code.
>>
>>> If you write
>>>
>>> for { x <- pxs } ...
>>>
>>> where pxs is a parallel collection you expect the loop to be parallel.
>>> So foreach needs to be parallel.
>>
>> That tells me the foreach at the base of the parallel collections has to be
>> parallel, but not that the foreach we already have has to be. If that's the
>> main motivation then we can re-root the parallel collections around a
>> parallel foreach. I already pushed foreach backward for TraversableOnce,
>> I'd be glad to push it sideways for this one.
>>
>> There's another interesting tie-in here, which I already had on my agenda
>> for a while, ever since looking too closely at slice. There is a real issue
>> with having the same trait provide default implementations for both mutable
>> and immutable collections. The default implementation is either outright
>> wrong or grossly inefficient half the time. So the side which was not
>> chosen as the favorite side has to be sure every method implementation is
>> overridden (and many bugs have been the result of not achieving this.) But
>> what is the point of having default implementations if they are menaces
>> which must be guarded against half the time?
>>
>> Example. What is the default implementation of tail? There are pretty much
>> two choices.
>>
>> // if this is a mutable seq, you're now sharing mutable cells
>> def tail = pointer to the second element of this collection
>> // if this is an immutable seq, you're now copying a billion
>> // elements for no reason
>> def tail = copy this entire collection except the first
>>
>> For almost every method in TraversableLike this same issue arises. (The
>> methods where one can safely use the same implementation are generally folds
>> and other chew-up-the-sequence methods, and those are in TraversableOnce.)
>> The conclusion I drew is that the mutable and immutable collections should
>> not share any code past TraversableOnce. They should (according to this
>> theory) share a pure interface in TraversableLike and each have their own
>> default implementations.
>>
>> So back to the parallel foreach problem: same thing.
>>
>> trait MutableTraversableLike extends TraversableMethods
>> trait ImmutableTraversableLike extends TraversableMethods
>> trait ParallelTraversableLike extends TraversableMethods
>>
>> (I won't speculate on whether it should really be
>> ParallelImmutableTraversableLike and so on or how these might compose unless
>> I see a glimmer of hope we could go this way.)
>>
>>
>
>
>
> --
> Bill Venners
> Artima, Inc.
> http://www.artima.com
>

Copyright © 2012 École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland