REPL: null-values in parallel ranges?

Hi scalaists

this evening I started playing a little with the parallel collections and this
crossed my way...
This is most likely only a REPL display effect - but it felt disturbing
somehow (thinking about races and uninitialized values):

-----------------
bjo@noordx:~/Tools/scala-2.9.0.RC1> bin/scala -J-Xmx3000m
Welcome to Scala version 2.9.0.RC1 (Java HotSpot(TM) 64-Bit Server VM, Java
1.6.0_24).
Type in expressions to have them evaluated.
Type :help for more information.

scala> val big = (1 to 10000000)
big: scala.collection.immutable.Range.Inclusive = Range(1, 2, 3, 4, 5, 6, 7,
8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27,
28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46,
47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65,
66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84,
85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102,
103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117,
118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132,
133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147,
148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162,
163, 164, 165, 166, 167, 168, 169, 170,...
scala> val bigger = big.map(e => -e)
bigger: scala.collection.immutable.IndexedSeq[Int] = Vector(-1, -2, -3, -4,
-5, -6, -7, -8, -9, -10, -11, -12, -13, -14, -15, -16, -17, -18, -19, -20,
-21, -22, -23, -24, -25, -26, -27, -28, -29, -30, -31, -32, -33, -34, -35,
-36, -37, -38, -39, -40, -41, -42, -43, -44, -45, -46, -47, -48, -49, -50,
-51, -52, -53, -54, -55, -56, -57, -58, -59, -60, -61, -62, -63, -64, -65,
-66, -67, -68, -69, -70, -71, -72, -73, -74, -75, -76, -77, -78, -79, -80,
-81, -82, -83, -84, -85, -86, -87, -88, -89, -90, -91, -92, -93, -94, -95,
-96, -97, -98, -99, -100, -101, -102, -103, -104, -105, -106, -107, -108,
-109, -110, -111, -112, -113, -114, -115, -116, -117, -118, -119, -120, -121,
-122, -123, -124, -125, -126, -127, -128, -129, -130, -131, -132, -133, -134,
-135, -136, -137, -138, -139, -140, -141,...
scala> val bigpar = big.par
bigpar: scala.collection.parallel.immutable.ParRange = ParRange(1, 2, 3, 4, 5,
6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25,
26, 251, 27, 252, 28, 253, 29, 30, null, null, null, null, null, null, null,
null, null, null, null, null, null, null, null, null, null, null, null, null,
null, null, null, null, null, null, null, null, null, null, null, 511, 261,
46, 512, 47, 513, 262, 48, 514, 49, 515, 50, 263, 51, 516, 52, 264, 53, 517,
54, 518, 55, 265, 520, 56, 521, 57, 522, 58, 523, 59, 524, 60, 525, 266, 61,
526, 62, 527, 267, 528, 529, 530, 268, 531, 63, 269, 532, 533, 64, 270, 534,
535, 65, 536, 271, 537, 66, 538, 272, 539, 67, 540, 273, 541, 68, 542, 274,
543, 69, 544, 545, 275, 546, 547, 70, 548, 276, 549, 71, 126, 550, 277, 551,
72, 552, 127, 278, 553, 73,...
------------------
needless to say that I cannot reproduce this faithfully... most of the time it
looks unsuspicious. But as I saw it once I retried several times - this was
the second time I got this - directly after the start of the REPL session (the
transcript is unaltered)

I'm not sure if this qualifies as bug, as I was not able to get a hold on the
"nulls" during range processing. So it's maybe only a REPL print thingy.

Greetings
Bernd

Re: REPL: null-values in parallel ranges?

On 3/31/11 1:42 PM, Bernd Johannes wrote:
> I'm not sure if this qualifies as bug, as I was not able to get a hold on the
> "nulls" during range processing. So it's maybe only a REPL print thingy.

It isn't taking me long trying .par in the repl to see all kinds of
crazy stuff. The main thing is that foreach seems to return things in a
random order and iterator doesn't. That doesn't sound like something
intentional, but I try not to assume too much. But it is likely at the
heart of what you are seeing too.

scala> def fx = (1 to 5).par
fx: scala.collection.parallel.immutable.ParRange

scala> fx foreach println
1
3
4
2
5

scala> fx foreach println
2
3
1
5
4

scala> fx.iterator foreach println
1
2
3
4
5

scala> fx.iterator foreach println
1
2
3
4
5

Re: REPL: null-values in parallel ranges?

> It isn't taking me long trying .par in the repl to see all kinds of
> crazy stuff.  The main thing is that foreach seems to return things in a
> random order and iterator doesn't.  That doesn't sound like something
> intentional, but I try not to assume too much.  But it is likely at the
> heart of what you are seeing too.
>

We used to have `pforeach` being implemented in parallel, and
`foreach` sequential, but this was changed recently.
I'm still not sure if that was the best thing to do, but, yes, it was
intentional.

> scala> def fx = (1 to 5).par
> fx: scala.collection.parallel.immutable.ParRange
>
> scala> fx foreach println
> 1
> 3
> 4
> 2
> 5
>
> scala> fx foreach println
> 2
> 3
> 1
> 5
> 4
>
> scala> fx.iterator foreach println
> 1
> 2
> 3
> 4
> 5
>
> scala> fx.iterator foreach println
> 1
> 2
> 3
> 4
> 5

Re: Re: REPL: null-values in parallel ranges?

I think the question was whether the difference between foreach and iterator is intentional.

-- Jim


On Mon, Apr 4, 2011 at 1:41 AM, Aleksandar Prokopec <aleksandar [dot] prokopec [at] gmail [dot] com> wrote:

> It isn't taking me long trying .par in the repl to see all kinds of
> crazy stuff.  The main thing is that foreach seems to return things in a
> random order and iterator doesn't.  That doesn't sound like something
> intentional, but I try not to assume too much.  But it is likely at the
> heart of what you are seeing too.
>

We used to have `pforeach` being implemented in parallel, and
`foreach` sequential, but this was changed recently.
I'm still not sure if that was the best thing to do, but, yes, it was
intentional.

> scala> def fx = (1 to 5).par
> fx: scala.collection.parallel.immutable.ParRange
>
> scala> fx foreach println
> 1
> 3
> 4
> 2
> 5
>
> scala> fx foreach println
> 2
> 3
> 1
> 5
> 4
>
> scala> fx.iterator foreach println
> 1
> 2
> 3
> 4
> 5
>
> scala> fx.iterator foreach println
> 1
> 2
> 3
> 4
> 5

Re: REPL: null-values in parallel ranges?

Well, I guess it is intentional. This has rather severe consequences
for something like a pre-existing method which is not written expecting
that things implementing Seq[T] may now use a random order for foreach.
Maybe I was naive all this time thinking that the reproducible quality
of foreach on a sequence was something I could count on, and I should
have been calling "iterator" all the time. Either way, there's a lot of
code out there.

Re: REPL: null-values in parallel ranges?



On Fri, Apr 1, 2011 at 12:11 AM, Paul Phillips <paulp [at] improving [dot] org> wrote:
Well, I guess it is intentional.  This has rather severe consequences for something like a pre-existing method which is not written expecting that things implementing Seq[T] may now use a random order for foreach.  Maybe I was naive all this time thinking that the reproducible quality of foreach on a sequence was something I could count on, and I should have been calling "iterator" all the time.  Either way, there's a lot of code out there.

True. Parallel collections are much more of a game changer than people realize so far. In particular, we will have to introduce quite a few seq's to make things work as before. I agree that documentation needs to be updated to take this into account. We were thinking for a while on introducing new operations for parallel operations and leaving foreach sequential, but in the end decided not to. If you write

for  { x <- pxs } ...

where pxs is a parallel collection you expect the loop to be parallel.
So foreach needs to be parallel.

Let's face it: We will not get the genefits of automatic multi-core execution by continuing in our old ways.  The Fortress language makes every collection parallel by default. Scala is not quite as radical, but I believe it's ultimately correct not to sacrifice convenience for the parallel case just so that naive sequential code continues to work unchanged.

But yes, it is a worry that we'll have to think about this issue now everywhere. Hopefully, it will translate into a much stronger push to be fully functional.

Cheers

 -- Martin

Re: REPL: null-values in parallel ranges?

On 4/2/11 1:00 PM, martin odersky wrote:
> But yes, it is a worry that we'll have to think about this issue now
> everywhere. Hopefully, it will translate into a much stronger push to be
> fully functional.

For the record I don't mind worrying about it everywhere from now on.
I'm all for it. I just don't want to spend the rest of my life fixing
bugs in pre-existing code.

> If you write
>
> for { x <- pxs } ...
>
> where pxs is a parallel collection you expect the loop to be parallel.
> So foreach needs to be parallel.

That tells me the foreach at the base of the parallel collections has to
be parallel, but not that the foreach we already have has to be. If
that's the main motivation then we can re-root the parallel collections
around a parallel foreach. I already pushed foreach backward for
TraversableOnce, I'd be glad to push it sideways for this one.

There's another interesting tie-in here, which I already had on my
agenda for a while, ever since looking too closely at slice. There is a
real issue with having the same trait provide default implementations
for both mutable and immutable collections. The default implementation
is either outright wrong or grossly inefficient half the time. So the
side which was not chosen as the favorite side has to be sure every
method implementation is overridden (and many bugs have been the result
of not achieving this.) But what is the point of having default
implementations if they are menaces which must be guarded against half
the time?

Example. What is the default implementation of tail? There are pretty
much two choices.

// if this is a mutable seq, you're now sharing mutable cells
def tail = pointer to the second element of this collection
// if this is an immutable seq, you're now copying a billion
// elements for no reason
def tail = copy this entire collection except the first

For almost every method in TraversableLike this same issue arises. (The
methods where one can safely use the same implementation are generally
folds and other chew-up-the-sequence methods, and those are in
TraversableOnce.) The conclusion I drew is that the mutable and
immutable collections should not share any code past TraversableOnce.
They should (according to this theory) share a pure interface in
TraversableLike and each have their own default implementations.

So back to the parallel foreach problem: same thing.

trait MutableTraversableLike extends TraversableMethods
trait ImmutableTraversableLike extends TraversableMethods
trait ParallelTraversableLike extends TraversableMethods

(I won't speculate on whether it should really be
ParallelImmutableTraversableLike and so on or how these might compose
unless I see a glimmer of hope we could go this way.)

Re: REPL: null-values in parallel ranges?

On 03/04/2011 04:49, Paul Phillips wrote:
>
> For the record I don't mind worrying about it everywhere from now on.
> I'm all for it. I just don't want to spend the rest of my life fixing
> bugs in pre-existing code.

I'm all with that. Without the certainty that using parallel collection
won't break existing code, I just won't take the chance. And if other
take it with my code, I see an interesting times coming.

Moreover I think that (at least for me), preservation of order of things
in Seq is not something I test really often... When I want unordered
collection, I try to make it explicit with Sets.

Most of the time, it's not a matter, but you never know where side
effects will come from...

[...]
> That tells me the foreach at the base of the parallel collections has to
> be parallel, but not that the foreach we already have has to be. If
> that's the main motivation then we can re-root the parallel collections
> around a parallel foreach. I already pushed foreach backward for
> TraversableOnce, I'd be glad to push it sideways for this one.
>
> There's another interesting tie-in here, which I already had on my
> agenda for a while, ever since looking too closely at slice. There is a
> real issue with having the same trait provide default implementations
> for both mutable and immutable collections.
[...]

I'm all with you here (for what it matters...).
With slices, you showed that even in a piece of code like the Scala
standard library, developed, looked and tested by lots of competent
people, that strategy leads to big errors - it's just too hard to keep
in mind all the implicit requirements and contracts of a complex code.

Cheers,

Re: REPL: null-values in parallel ranges?

+1 (I knew you could say it so much more convincingly)

While the old types continue to keep their behavior, the "default" Map etc. could change to parallel, making for a pretty compile time error when passing them to legacy code. Problem solved.

On Apr 3, 2011, at 4:49, Paul Phillips wrote:

> On 4/2/11 1:00 PM, martin odersky wrote:
>> But yes, it is a worry that we'll have to think about this issue now
>> everywhere. Hopefully, it will translate into a much stronger push to be
>> fully functional.
>
> For the record I don't mind worrying about it everywhere from now on. I'm all for it. I just don't want to spend the rest of my life fixing bugs in pre-existing code.
>
>> If you write
>>
>> for { x <- pxs } ...
>>
>> where pxs is a parallel collection you expect the loop to be parallel.
>> So foreach needs to be parallel.
>
> That tells me the foreach at the base of the parallel collections has to be parallel, but not that the foreach we already have has to be. If that's the main motivation then we can re-root the parallel collections around a parallel foreach. I already pushed foreach backward for TraversableOnce, I'd be glad to push it sideways for this one.
>
> There's another interesting tie-in here, which I already had on my agenda for a while, ever since looking too closely at slice. There is a real issue with having the same trait provide default implementations for both mutable and immutable collections. The default implementation is either outright wrong or grossly inefficient half the time. So the side which was not chosen as the favorite side has to be sure every method implementation is overridden (and many bugs have been the result of not achieving this.) But what is the point of having default implementations if they are menaces which must be guarded against half the time?
>
> Example. What is the default implementation of tail? There are pretty much two choices.
>
> // if this is a mutable seq, you're now sharing mutable cells
> def tail = pointer to the second element of this collection
> // if this is an immutable seq, you're now copying a billion
> // elements for no reason
> def tail = copy this entire collection except the first
>
> For almost every method in TraversableLike this same issue arises. (The methods where one can safely use the same implementation are generally folds and other chew-up-the-sequence methods, and those are in TraversableOnce.) The conclusion I drew is that the mutable and immutable collections should not share any code past TraversableOnce. They should (according to this theory) share a pure interface in TraversableLike and each have their own default implementations.
>
> So back to the parallel foreach problem: same thing.
>
> trait MutableTraversableLike extends TraversableMethods
> trait ImmutableTraversableLike extends TraversableMethods
> trait ParallelTraversableLike extends TraversableMethods
>
> (I won't speculate on whether it should really be ParallelImmutableTraversableLike and so on or how these might compose unless I see a glimmer of hope we could go this way.)
>

Re: REPL: null-values in parallel ranges?

Hi Paul,

I'm not sure how to implement it, but I wonder if the following
behavior would make sense: have for expression rewriting rules for
foreach take into account whether the generator mixes in the Parallel
marker trait.

scala> import scala.collection.Parallel
import scala.collection.Parallel

scala> val ls = List(1, 2, 3, 4)
ls: List[Int] = List(1, 2, 3, 4)

scala> ls.isInstanceOf[Parallel]
res3: Boolean = false

scala> val pls = ls.par
pls: scala.collection.parallel.immutable.ParSeq[Int] = ParVector(1, 2, 3, 4)

scala> pls.isInstanceOf[Parallel]
res5: Boolean = true

If I do this:

for (e <- ls) println(e)

This would somehow cause a an in-order execution, because the type of
ls doesn't mix in Parallel.

But if I do this:

for (e <- pls) println(e)

It would happen in Parallel, because the type of generator pls mixes
in Parallel. That way if people have a method that takes a Seq[T],
say, and uses that in a for expression, then because Seq[T] isn't a
subtype of Parallel, it would do it in order (somehow). That means all
old code should not break.

def oldMethod(seq: Seq[String]) { for (e <- seq) println(e) }

But new code that really wants parallel execution could just do .par
on that Seq[T], which would somehow kick it into parallel execution.

def newMethod(pSeq: Seq[String] with Parallel) { for (e <- pSeq) println(e) }

By saying the method takes a Seq[String] with Parallel, it is
announcing that it might do foreaches in parallel.

I think it would be fine to let flatMap, filter, withFilter, map be
rewritten in the same way, because most times people wouldn't have
side effects in those. But foreach is all about the side effects.

Bill

On Sat, Apr 2, 2011 at 7:49 PM, Paul Phillips wrote:
> On 4/2/11 1:00 PM, martin odersky wrote:
>>
>> But yes, it is a worry that we'll have to think about this issue now
>> everywhere. Hopefully, it will translate into a much stronger push to be
>> fully functional.
>
> For the record I don't mind worrying about it everywhere from now on. I'm
> all for it.  I just don't want to spend the rest of my life fixing bugs in
> pre-existing code.
>
>> If you write
>>
>> for  { x <- pxs } ...
>>
>> where pxs is a parallel collection you expect the loop to be parallel.
>> So foreach needs to be parallel.
>
> That tells me the foreach at the base of the parallel collections has to be
> parallel, but not that the foreach we already have has to be.  If that's the
> main motivation then we can re-root the parallel collections around a
> parallel foreach.  I already pushed foreach backward for TraversableOnce,
> I'd be glad to push it sideways for this one.
>
> There's another interesting tie-in here, which I already had on my agenda
> for a while, ever since looking too closely at slice.  There is a real issue
> with having the same trait provide default implementations for both mutable
> and immutable collections.  The default implementation is either outright
> wrong or grossly inefficient half the time.  So the side which was not
> chosen as the favorite side has to be sure every method implementation is
> overridden (and many bugs have been the result of not achieving this.) But
> what is the point of having default implementations if they are menaces
> which must be guarded against half the time?
>
> Example.  What is the default implementation of tail? There are pretty much
> two choices.
>
>  // if this is a mutable seq, you're now sharing mutable cells
>  def tail = pointer to the second element of this collection
>  // if this is an immutable seq, you're now copying a billion
>  // elements for no reason
>  def tail = copy this entire collection except the first
>
> For almost every method in TraversableLike this same issue arises.  (The
> methods where one can safely use the same implementation are generally folds
> and other chew-up-the-sequence methods, and those are in TraversableOnce.)
> The conclusion I drew is that the mutable and immutable collections should
> not share any code past TraversableOnce. They should (according to this
> theory) share a pure interface in TraversableLike and each have their own
> default implementations.
>
> So back to the parallel foreach problem: same thing.
>
>  trait MutableTraversableLike extends TraversableMethods
>  trait ImmutableTraversableLike extends TraversableMethods
>  trait ParallelTraversableLike extends TraversableMethods
>
> (I won't speculate on whether it should really be
> ParallelImmutableTraversableLike and so on or how these might compose unless
> I see a glimmer of hope we could go this way.)
>
>

Re: REPL: null-values in parallel ranges?

Hi Paul,

Actually I said that wrongly. I meant that if I really want to do a
parallel foreach on a Seq type that doesn't extend Parallel, I could
write it like this:

for (e <- ls.par) {
// some thread-safe side-effecting code
}

The example I showed was a method that took a Parallel Seq already.

Bill

On Sat, Apr 2, 2011 at 8:03 PM, Bill Venners wrote:
> Hi Paul,
>
> I'm not sure how to implement it, but I wonder if the following
> behavior would make sense: have for expression rewriting rules for
> foreach take into account whether the generator mixes in the Parallel
> marker trait.
>
> scala> import scala.collection.Parallel
> import scala.collection.Parallel
>
> scala> val ls = List(1, 2, 3, 4)
> ls: List[Int] = List(1, 2, 3, 4)
>
> scala> ls.isInstanceOf[Parallel]
> res3: Boolean = false
>
> scala> val pls = ls.par
> pls: scala.collection.parallel.immutable.ParSeq[Int] = ParVector(1, 2, 3, 4)
>
> scala> pls.isInstanceOf[Parallel]
> res5: Boolean = true
>
> If I do this:
>
> for (e <- ls) println(e)
>
> This would somehow cause a an in-order execution, because the type of
> ls doesn't mix in Parallel.
>
> But if I do this:
>
> for (e <- pls) println(e)
>
> It would happen in Parallel, because the type of generator pls mixes
> in Parallel. That way if people have a method that takes a Seq[T],
> say, and uses that in a for expression, then because Seq[T] isn't a
> subtype of Parallel, it would do it in order (somehow). That means all
> old code should not break.
>
> def oldMethod(seq: Seq[String]) { for (e <- seq) println(e) }
>
> But new code that really wants parallel execution could just do .par
> on that Seq[T], which would somehow kick it into parallel execution.
>
> def newMethod(pSeq: Seq[String] with Parallel) { for (e <- pSeq) println(e) }
>
> By saying the method takes a Seq[String] with Parallel, it is
> announcing that it might do foreaches in parallel.
>
> I think it would be fine to let flatMap, filter, withFilter, map be
> rewritten in the same way, because most times people wouldn't have
> side effects in those. But foreach is all about the side effects.
>
> Bill
>
> On Sat, Apr 2, 2011 at 7:49 PM, Paul Phillips wrote:
>> On 4/2/11 1:00 PM, martin odersky wrote:
>>>
>>> But yes, it is a worry that we'll have to think about this issue now
>>> everywhere. Hopefully, it will translate into a much stronger push to be
>>> fully functional.
>>
>> For the record I don't mind worrying about it everywhere from now on. I'm
>> all for it.  I just don't want to spend the rest of my life fixing bugs in
>> pre-existing code.
>>
>>> If you write
>>>
>>> for  { x <- pxs } ...
>>>
>>> where pxs is a parallel collection you expect the loop to be parallel.
>>> So foreach needs to be parallel.
>>
>> That tells me the foreach at the base of the parallel collections has to be
>> parallel, but not that the foreach we already have has to be.  If that's the
>> main motivation then we can re-root the parallel collections around a
>> parallel foreach.  I already pushed foreach backward for TraversableOnce,
>> I'd be glad to push it sideways for this one.
>>
>> There's another interesting tie-in here, which I already had on my agenda
>> for a while, ever since looking too closely at slice.  There is a real issue
>> with having the same trait provide default implementations for both mutable
>> and immutable collections.  The default implementation is either outright
>> wrong or grossly inefficient half the time.  So the side which was not
>> chosen as the favorite side has to be sure every method implementation is
>> overridden (and many bugs have been the result of not achieving this.) But
>> what is the point of having default implementations if they are menaces
>> which must be guarded against half the time?
>>
>> Example.  What is the default implementation of tail? There are pretty much
>> two choices.
>>
>>  // if this is a mutable seq, you're now sharing mutable cells
>>  def tail = pointer to the second element of this collection
>>  // if this is an immutable seq, you're now copying a billion
>>  // elements for no reason
>>  def tail = copy this entire collection except the first
>>
>> For almost every method in TraversableLike this same issue arises.  (The
>> methods where one can safely use the same implementation are generally folds
>> and other chew-up-the-sequence methods, and those are in TraversableOnce.)
>> The conclusion I drew is that the mutable and immutable collections should
>> not share any code past TraversableOnce. They should (according to this
>> theory) share a pure interface in TraversableLike and each have their own
>> default implementations.
>>
>> So back to the parallel foreach problem: same thing.
>>
>>  trait MutableTraversableLike extends TraversableMethods
>>  trait ImmutableTraversableLike extends TraversableMethods
>>  trait ParallelTraversableLike extends TraversableMethods
>>
>> (I won't speculate on whether it should really be
>> ParallelImmutableTraversableLike and so on or how these might compose unless
>> I see a glimmer of hope we could go this way.)
>>
>>
>
>
>
> --
> Bill Venners
> Artima, Inc.
> http://www.artima.com
>

Re: REPL: null-values in parallel ranges?

Hi Again,

In thinking further, I think the way this would have to work is the
extra method idea that was proposed earlier, and Martin said they
considered but rejected. The approach would be:

Tighten the foreach contract to require in-order execution.

Add a pareach method or something that does foreach but in a parallel
way. Maybe this is in trait Parallel so all Parallel things have to
implement it. And this is the method a for expression would translate
to invocations of if the generator type is Parallel, not invocations
of foreach.

Bill

On Sat, Apr 2, 2011 at 8:07 PM, Bill Venners wrote:
> Hi Paul,
>
> Actually I said that wrongly. I meant that if I really want to do a
> parallel foreach on a Seq type that doesn't extend Parallel, I could
> write it like this:
>
> for (e <- ls.par) {
>  // some thread-safe side-effecting code
> }
>
> The example I showed was a method that took a Parallel Seq already.
>
> Bill
>
> On Sat, Apr 2, 2011 at 8:03 PM, Bill Venners wrote:
>> Hi Paul,
>>
>> I'm not sure how to implement it, but I wonder if the following
>> behavior would make sense: have for expression rewriting rules for
>> foreach take into account whether the generator mixes in the Parallel
>> marker trait.
>>
>> scala> import scala.collection.Parallel
>> import scala.collection.Parallel
>>
>> scala> val ls = List(1, 2, 3, 4)
>> ls: List[Int] = List(1, 2, 3, 4)
>>
>> scala> ls.isInstanceOf[Parallel]
>> res3: Boolean = false
>>
>> scala> val pls = ls.par
>> pls: scala.collection.parallel.immutable.ParSeq[Int] = ParVector(1, 2, 3, 4)
>>
>> scala> pls.isInstanceOf[Parallel]
>> res5: Boolean = true
>>
>> If I do this:
>>
>> for (e <- ls) println(e)
>>
>> This would somehow cause a an in-order execution, because the type of
>> ls doesn't mix in Parallel.
>>
>> But if I do this:
>>
>> for (e <- pls) println(e)
>>
>> It would happen in Parallel, because the type of generator pls mixes
>> in Parallel. That way if people have a method that takes a Seq[T],
>> say, and uses that in a for expression, then because Seq[T] isn't a
>> subtype of Parallel, it would do it in order (somehow). That means all
>> old code should not break.
>>
>> def oldMethod(seq: Seq[String]) { for (e <- seq) println(e) }
>>
>> But new code that really wants parallel execution could just do .par
>> on that Seq[T], which would somehow kick it into parallel execution.
>>
>> def newMethod(pSeq: Seq[String] with Parallel) { for (e <- pSeq) println(e) }
>>
>> By saying the method takes a Seq[String] with Parallel, it is
>> announcing that it might do foreaches in parallel.
>>
>> I think it would be fine to let flatMap, filter, withFilter, map be
>> rewritten in the same way, because most times people wouldn't have
>> side effects in those. But foreach is all about the side effects.
>>
>> Bill
>>
>> On Sat, Apr 2, 2011 at 7:49 PM, Paul Phillips wrote:
>>> On 4/2/11 1:00 PM, martin odersky wrote:
>>>>
>>>> But yes, it is a worry that we'll have to think about this issue now
>>>> everywhere. Hopefully, it will translate into a much stronger push to be
>>>> fully functional.
>>>
>>> For the record I don't mind worrying about it everywhere from now on. I'm
>>> all for it.  I just don't want to spend the rest of my life fixing bugs in
>>> pre-existing code.
>>>
>>>> If you write
>>>>
>>>> for  { x <- pxs } ...
>>>>
>>>> where pxs is a parallel collection you expect the loop to be parallel.
>>>> So foreach needs to be parallel.
>>>
>>> That tells me the foreach at the base of the parallel collections has to be
>>> parallel, but not that the foreach we already have has to be.  If that's the
>>> main motivation then we can re-root the parallel collections around a
>>> parallel foreach.  I already pushed foreach backward for TraversableOnce,
>>> I'd be glad to push it sideways for this one.
>>>
>>> There's another interesting tie-in here, which I already had on my agenda
>>> for a while, ever since looking too closely at slice.  There is a real issue
>>> with having the same trait provide default implementations for both mutable
>>> and immutable collections.  The default implementation is either outright
>>> wrong or grossly inefficient half the time.  So the side which was not
>>> chosen as the favorite side has to be sure every method implementation is
>>> overridden (and many bugs have been the result of not achieving this.) But
>>> what is the point of having default implementations if they are menaces
>>> which must be guarded against half the time?
>>>
>>> Example.  What is the default implementation of tail? There are pretty much
>>> two choices.
>>>
>>>  // if this is a mutable seq, you're now sharing mutable cells
>>>  def tail = pointer to the second element of this collection
>>>  // if this is an immutable seq, you're now copying a billion
>>>  // elements for no reason
>>>  def tail = copy this entire collection except the first
>>>
>>> For almost every method in TraversableLike this same issue arises.  (The
>>> methods where one can safely use the same implementation are generally folds
>>> and other chew-up-the-sequence methods, and those are in TraversableOnce.)
>>> The conclusion I drew is that the mutable and immutable collections should
>>> not share any code past TraversableOnce. They should (according to this
>>> theory) share a pure interface in TraversableLike and each have their own
>>> default implementations.
>>>
>>> So back to the parallel foreach problem: same thing.
>>>
>>>  trait MutableTraversableLike extends TraversableMethods
>>>  trait ImmutableTraversableLike extends TraversableMethods
>>>  trait ParallelTraversableLike extends TraversableMethods
>>>
>>> (I won't speculate on whether it should really be
>>> ParallelImmutableTraversableLike and so on or how these might compose unless
>>> I see a glimmer of hope we could go this way.)
>>>
>>>
>>
>>
>>
>> --
>> Bill Venners
>> Artima, Inc.
>> http://www.artima.com
>>
>
>
>
> --
> Bill Venners
> Artima, Inc.
> http://www.artima.com
>

Re: REPL: null-values in parallel ranges?

+1

On Sat, Apr 2, 2011 at 10:49 PM, Paul Phillips <paulp [at] improving [dot] org> wrote:
On 4/2/11 1:00 PM, martin odersky wrote:
But yes, it is a worry that we'll have to think about this issue now
everywhere. Hopefully, it will translate into a much stronger push to be
fully functional.

For the record I don't mind worrying about it everywhere from now on. I'm all for it.  I just don't want to spend the rest of my life fixing bugs in pre-existing code.

If you write

for  { x <- pxs } ...

where pxs is a parallel collection you expect the loop to be parallel.
So foreach needs to be parallel.

That tells me the foreach at the base of the parallel collections has to be parallel, but not that the foreach we already have has to be.  If that's the main motivation then we can re-root the parallel collections around a parallel foreach.  I already pushed foreach backward for TraversableOnce, I'd be glad to push it sideways for this one.

There's another interesting tie-in here, which I already had on my agenda for a while, ever since looking too closely at slice.  There is a real issue with having the same trait provide default implementations for both mutable and immutable collections.  The default implementation is either outright wrong or grossly inefficient half the time.  So the side which was not chosen as the favorite side has to be sure every method implementation is overridden (and many bugs have been the result of not achieving this.) But what is the point of having default implementations if they are menaces which must be guarded against half the time?

Example.  What is the default implementation of tail? There are pretty much two choices.

 // if this is a mutable seq, you're now sharing mutable cells
 def tail = pointer to the second element of this collection
 // if this is an immutable seq, you're now copying a billion
 // elements for no reason
 def tail = copy this entire collection except the first

For almost every method in TraversableLike this same issue arises.  (The methods where one can safely use the same implementation are generally folds and other chew-up-the-sequence methods, and those are in TraversableOnce.) The conclusion I drew is that the mutable and immutable collections should not share any code past TraversableOnce. They should (according to this theory) share a pure interface in TraversableLike and each have their own default implementations.

So back to the parallel foreach problem: same thing.

 trait MutableTraversableLike extends TraversableMethods
 trait ImmutableTraversableLike extends TraversableMethods
 trait ParallelTraversableLike extends TraversableMethods

(I won't speculate on whether it should really be ParallelImmutableTraversableLike and so on or how these might compose unless I see a glimmer of hope we could go this way.)


Re: REPL: null-values in parallel ranges?

On Sat, Apr 2, 2011 at 4:00 PM, martin odersky <martin [dot] odersky [at] epfl [dot] ch> wrote:

On Fri, Apr 1, 2011 at 12:11 AM, Paul Phillips <paulp [at] improving [dot] org> wrote:
Well, I guess it is intentional.  This has rather severe consequences for something like a pre-existing method which is not written expecting that things implementing Seq[T] may now use a random order for foreach.  Maybe I was naive all this time thinking that the reproducible quality of foreach on a sequence was something I could count on, and I should have been calling "iterator" all the time.  Either way, there's a lot of code out there.

True. Parallel collections are much more of a game changer than people realize so far. In particular, we will have to introduce quite a few seq's to make things work as before. I agree that documentation needs to be updated to take this into account. We were thinking for a while on introducing new operations for parallel operations and leaving foreach sequential, but in the end decided not to.

That's unfortunate, because it makes it more difficult to supply parameters regarding the parallelism that you want.  You probably don't want all parallel operations to use the same number of processors if your machine might be doing something else than executing your Scala code, or even if your Scala program might be running several different operation simultaneously on a machine with many cores.  I guess you could make it an argument to .par.
 
If you write

for  { x <- pxs } ...

where pxs is a parallel collection you expect the loop to be parallel.

Agreed, if you know that pxs is a parallel collection.

But if, two years ago, I wrote
  for {x <- xs } ...
and stuck it in a library, I'm pretty sure I was _not_ expecting it to be parallel.

So you cannot possibly trust that code--unless you've inspected it yourself--to work sensibly in a parallel context.  It might break horribly and obviously every time, or it might break subtly on rare occasions, or it might not break at all.  The point is that you have no way to know which it is without looking at every use of the offending operations.

If you don't perform that extensive code review, the burden is _entirely on you_ as the user of the library to remember every single time that you use it that you'd better .seq whatever you pass in or Bad Things might happen.

This is a bad design, unless you're going to rewrite all my old libraries for me.
 
So foreach needs to be parallel.

foreach needs to be _able to be_ parallel, but it need not always have the option of being parallel, if used in the context of sequential collections.
 
Let's face it: We will not get the genefits of automatic multi-core execution by continuing in our old ways.

Agreed.  This is the reason why throwing parallel collections automatically into code written assuming sequential collections is a bad idea, and the library should discourage it.
 
The Fortress language makes every collection parallel by default. Scala is not quite as radical

It, as designed now, basically as radical.  Every base collection is now parallel by default.  If you use standard garden-variety Seq, Iterable, Map, etc., you are required to assume that it is parallel or your code will randomly break when reused.  Unlike with Fortress, however, this new reality is partially disguised--it looks like parallel collections are an optional add-on, not something that must be treated in every case.

If you want, say, a mutable sequence that is sequential, you cannot use collection.mutable.Seq.  You have to decide which implementation you want, and use, say, collection.mutable.ArrayBuffer.
 
, but I believe it's ultimately correct not to sacrifice convenience for the parallel case just so that naive sequential code continues to work unchanged. 

But yes, it is a worry that we'll have to think about this issue now everywhere. Hopefully, it will translate into a much stronger push to be fully functional.

Unfortunately, the library is not nearly powerful enough for side-effect-free code to perform acceptably in very many instances.

For example, Scala's collections have no "filterIndex" method--something which returns not the elements but the indices of the elements.  You might

    xs.zipWithIndex.filter(xi => p(xi._1)).map(_._2)

or you might

    { var i = -1; xs.view.filter(x => {i += 1; p(x) }).map(_ => i).force }

or perhaps

    { var i = 0; val b = xs.genericBuilder[Int]; xs.foreach(x => { if (p(x)) { b += i }; i += 1 }); b.result }

Unfortunately, the second is twice as fast for relatively simple p(x) (like a test of a string length), and the third is three to four times as fast.  So, what do I want to do?  Throw two to four cores at it with the slightly nicer syntax, or be able to write it to use fewer cores?  I'm pretty sure that I'd at least like to be able to _choose_.

So for two very important reasons, namely correctness of running of old code, and performance of new code using parallel-unsafe operations, I think that the collections library should exist in the hierarchy in such a way that you can be sure you're not using parallel collections in any operations where it's easy and useful access shared mutable state or have side effects.

That is,

scala.collection.scalable.Seq  // Use this if parallel collections are ok
  scala.collection.Seq         // We're sure it's sequential
  scala.collection.parallel.ParSeq  // We're sure it's parallel

Or, at the very least,

scala.collection.Seq  // Must assume that it may be parallel!
  scala.collection.sequential.Seq   // Sure to be sequential
  scala.collection.parallel.Seq     // Sure to be parallel

with the latter done in such a way that an import collection.sequential._ will yield the old behavior in all cases.

I think it's better still to use different method names, but I agree that the for case is non-ideal, since although it could use the parallel version if the collection was typed appropriately, it would be too wasteful to go searching for it at runtime.

  --Rex

Re: REPL: null-values in parallel ranges?

On Sat, Apr 2, 2011 at 2:39 PM, Rex Kerr <ichoran [at] gmail [dot] com> wrote:
But if, two years ago, I wrote   for {x <- xs } ...
and stuck it in a library, I'm pretty sure I was _not_ expecting it to be parallel.

So you cannot possibly trust that code--unless you've inspected it yourself--to work sensibly in a parallel context.  It might break horribly and obviously every time, or it might break subtly on rare occasions, or it might not break at all.  The point is that you have no way to know which it is without looking at every use of the offending operations.

This is The Big One as far as I'm concerned.  We may have learned not to care too much about binary compatibility, but we should still care about backward compatibility in its other, easier form.  Grizzled, battle-scarred APIs are sitting out there in the field expecting that when someone sends them a 2003-vintage bottle of Seq it won't explode into a million pieces when you uncork it.
Toning down the metaphor a bit... It's really, really easy to come up with examples of useful, idiomatic methods that accept collections, foreach over them, and expect that actions will be taken in the order of the elements.  I don't think there's much Scala code in the wild that uses iterator instead of foreach specifically to avoid this kind of implementation detail--leaving aside the question of whether that particular distinction is a *good* signal of one's desire for sequentiality.
-0xe1a

Re: REPL: null-values in parallel ranges?

I looked around trunk a little bit for stuff which is broken because of
this. Here are the first couple examples I found.

// more "dynamic" hashcodes
scala> val x = (1 to 10).par
x: scala.collection.parallel.immutable.ParRange = ParRange(1, 2, 3, 4,
5, 6, 7, 8, 9, 10)

scala> x.hashCode
res0: Int = 1137090302

scala> x.hashCode
res1: Int = -555212555

scala> x.hashCode
res2: Int = -1216588130

// grab your partner, do-sa-do
scala> (1 to 10 par, 1 to 10).zipped foreach ((x, y) => println(x + " "
+ y))
3 2
6 3
2 4
7 6
1 1
8 7
4 5
5 9
9 8
10 10

Re: REPL: null-values in parallel ranges?

that is convincing that something has to be fixed. the question is what.
making foreach sequencial pretty much makes any parallel collection
sequencial again because many methods are based on foreach.
creating a branched (singlethreadtravlike, parthreadtravlike) hierachy
would make it impossible to use parallel collections where the api
designer wanted to use single threaded ones but on the other hand
sacrifice compatibility with old code. the old collections would all be
single threaded ones, and in old code you could not use parallel ones,
even if you knew it would work.

the solution could be something like:
some collection methods could be annotated with @OnlySeq (like "zipped"
in the broken example), which would add a compile time conversion and a
runtime assertion. in the case of zipped, it's clear you want to keep
the order of both ranges so you annotate the method with @OnlySeq. the
compiler knows using parallel ranges makes no sense here -> compiler
warning and/or compiler inserted conversion to seq if you do it anyway.
if the collection is not explicitly declared as parallel, the compiler
cannot warn -> runtime type assertion

still , you'd have to be careful, but everyone knows that.

Am 03.04.2011 02:12, schrieb Paul Phillips:
> I looked around trunk a little bit for stuff which is broken because
> of this. Here are the first couple examples I found.
>
> // more "dynamic" hashcodes
> scala> val x = (1 to 10).par
> x: scala.collection.parallel.immutable.ParRange = ParRange(1, 2, 3, 4,
> 5, 6, 7, 8, 9, 10)
>
> scala> x.hashCode
> res0: Int = 1137090302
>
> scala> x.hashCode
> res1: Int = -555212555
>
> scala> x.hashCode
> res2: Int = -1216588130
>
>
> // grab your partner, do-sa-do
> scala> (1 to 10 par, 1 to 10).zipped foreach ((x, y) => println(x + "
> " + y))
> 3 2
> 6 3
> 2 4
> 7 6
> 1 1
> 8 7
> 4 5
> 5 9
> 9 8
> 10 10
>
>

Re: REPL: null-values in parallel ranges?

Hi All,

On Sat, Apr 2, 2011 at 11:55 PM, HamsterofDeath wrote:
> that is convincing that something has to be fixed. the question is what.
> making foreach sequencial pretty much makes any parallel collection
> sequencial again because many methods are based on foreach.
>
I think that's OK, because those implementations can be changed so
they don't rely on foreach, either prior to 2.9.0.final or after. The
penalty before that is they wouldn't be as fast as the parallel
foreach versions, but they'd be as fast as they are now in 2.8. I can
live with that.

I slept on this and fleshed this proposal out a bit more in my mind.
Traversable is at the top of the hierarchy, and its abstract method is
foreach. So there's nothing to define the order other than foreach
itself. So there I think it's contract would need to stay the same:

Applies a function f to all elements of this collection.

Or, state that it will run sequentially not in parallel, but without
specifying any order:

Applies a function f to all elements of this collection sequentially
in an undefined order (but not in parallel).

Iterable however adds the iterator method, and that defines an order.
The contract of foreach in both Iterator could be changed from:

Applies a function f to all values produced by this iterator.

to:

Applies a function f to all values produced by this iterator
sequentially in the order returned by repeated invocations of the next
method.

And the contract of Iterable's foreach could be changed from:

Applies a function f to all elements of this iterable collection.

to:

Applies a function f to all elements of this iterable collection
sequentially in the order returned by repeated invocations of the next
method on the iterator returned by the iterator method.

Add a foreachParallel method to Iterator that has the contract:

Applies a function f to all elements of this collection in parallel.

And add a foreachParallel method to Iterable that has the contract:

Applies a function f to all elements of this iterable collection in parallel.

The rules for for expression translation could be left alone, at least
for now. What that would mean is that anytime you don't have a yield,
which would result in a foreach called, you'd get sequential behavior,
at least for the foreach part. That again could mean some things might
run more slowly than if they were parallelized, but I'm OK with that.
They'd run as fast as they are running now. And if someone wants to
really parallelize a side effecting loop, they could just invoke
foreachParallel passing in a function rather than using a for
expression.

I don't have a problem with map, flatMap, filter, withFilter, and any
other "data transformation" style higher order functions on
collections doing things sequentially or in parallel depending on the
type of the object. Most likely some people have written code that
does a map, for example, and passes a transformation function to it
that has a side effect. That code will break and I think we should
allow that to happen. It should be very little code I'd guess, because
people usually just pass data transformers to methods like map. And it
is worth it.

But foreach is not about data transformation. It's about side effects.
Because it doesn't have a result, if it doesn't have a side effect it
has no impact on the program other than heating up the CPU. And to do
that kind of loop in parallel means you have to write thread safe
code. No one has been doing that, and it would be very hard to do in
the future. And it would be non-intuitive because this looks like
sequential-in-order for loops in any other language anyone would be
coming from:

for (arg <- args)
println(arg)

I think this really ought to keep working like it always has, printing
out args in iterator order, no matter the type of args. Yesterday I
was asking how I would get sequential behavior out of a for
expression. Well the above bit of code should be how. That's how
everyone expects to get an in-order execution.

Narrowing the contracts of foreach in Traversable, Iterator, and
Iterable could break some code, but I doubt much. What it in effect
does is change it to the contract that's been in people's heads, which
they've been programming to. I think much more code could break, and
sometimes in hard -to-catch, subtle ways, if a collection you are
passed does foreach in parallel.

I don't know the details of how parallel collections are implemented,
but I'm hopeful that the places where data transformation methods are
implemented in terms of foreach, an invocation of foreachParallel
could easily be substituted. Then you'd get your same parallel
behavior back for those method that we have now in RC1.

What do folks think? Would that address the problem sufficiently?

Bill

> creating a branched (singlethreadtravlike, parthreadtravlike) hierachy
> would make it impossible to use parallel collections where the api
> designer wanted to use single threaded ones but on the other hand
> sacrifice compatibility with old code. the old collections would all be
> single threaded ones, and in old code you could not use parallel ones,
> even if you knew it would work.
>
> the solution could be something like:
> some collection methods could be annotated with @OnlySeq (like "zipped"
> in the broken example), which would add a compile time conversion and a
> runtime assertion. in the case of zipped, it's clear you want to keep
> the order of both ranges so you annotate the method with @OnlySeq. the
> compiler knows using parallel ranges makes no sense here -> compiler
> warning and/or compiler inserted conversion to seq if you do it anyway.
> if the collection is not explicitly declared as parallel, the compiler
> cannot warn -> runtime type assertion
>
> still , you'd have to be careful, but everyone knows that.
>
>
>
>
> Am 03.04.2011 02:12, schrieb Paul Phillips:
>> I looked around trunk a little bit for stuff which is broken because
>> of this.  Here are the first couple examples I found.
>>
>> // more "dynamic" hashcodes
>> scala> val x = (1 to 10).par
>> x: scala.collection.parallel.immutable.ParRange = ParRange(1, 2, 3, 4,
>> 5, 6, 7, 8, 9, 10)
>>
>> scala> x.hashCode
>> res0: Int = 1137090302
>>
>> scala> x.hashCode
>> res1: Int = -555212555
>>
>> scala> x.hashCode
>> res2: Int = -1216588130
>>
>>
>> // grab your partner, do-sa-do
>> scala> (1 to 10 par, 1 to 10).zipped foreach ((x, y) => println(x + "
>> " + y))
>> 3 2
>> 6 3
>> 2 4
>> 7 6
>> 1 1
>> 8 7
>> 4 5
>> 5 9
>> 9 8
>> 10 10
>>
>>
>
>

Re: REPL: null-values in parallel ranges?

Hi All,

Forgot to mention one thing. The scala.collection.Parallel trait would
declare the foreachParallel method, so that code can call foreach or
foreachParallel, depending on whether the collection is parallel one:

coll match {
case pColl: Parallel => pColl.foreachParallel(threadSafeF)
case _ => coll.foreach(threadSafeF)
}

So in side effecting code, it is explicit whether it is sequential or
parallel, but in data transformation code, you'd just say:

col.map(f)

And frankly, most of the time I think people would say:

col.foreach(f)

Because it is hard to write a thread safe f.

Bill

On Sun, Apr 3, 2011 at 8:20 AM, Bill Venners wrote:
> Hi All,
>
> On Sat, Apr 2, 2011 at 11:55 PM, HamsterofDeath wrote:
>> that is convincing that something has to be fixed. the question is what.
>> making foreach sequencial pretty much makes any parallel collection
>> sequencial again because many methods are based on foreach.
>>
> I think that's OK, because those implementations can be changed so
> they don't rely on foreach, either prior to 2.9.0.final or after. The
> penalty before that is they wouldn't be as fast as the parallel
> foreach versions, but they'd be as fast as they are now in 2.8. I can
> live with that.
>
> I slept on this and fleshed this proposal out a bit more in my mind.
> Traversable is at the top of the hierarchy, and its abstract method is
> foreach. So there's nothing to define the order other than foreach
> itself. So there I think it's contract would need to stay the same:
>
> Applies a function f to all elements of this collection.
>
> Or, state that it will run sequentially not in parallel, but without
> specifying any order:
>
> Applies a function f to all elements of this collection sequentially
> in an undefined order (but not in parallel).
>
> Iterable however adds the iterator method, and that defines an order.
> The contract of foreach in both Iterator could be changed from:
>
> Applies a function f to all values produced by this iterator.
>
> to:
>
> Applies a function f to all values produced by this iterator
> sequentially in the order returned by repeated invocations of the next
> method.
>
> And the contract of Iterable's foreach could be changed from:
>
> Applies a function f to all elements of this iterable collection.
>
> to:
>
> Applies a function f to all elements of this iterable collection
> sequentially in the order returned by repeated invocations of the next
> method on the iterator returned by the iterator method.
>
> Add a foreachParallel method to Iterator that has the contract:
>
> Applies a function f to all elements of this collection in parallel.
>
> And add a foreachParallel method to Iterable that has the contract:
>
> Applies a function f to all elements of this iterable collection in parallel.
>
> The rules for for expression translation could be left alone, at least
> for now. What that would mean is that anytime you don't have a yield,
> which would result in a foreach called, you'd get sequential behavior,
> at least for the foreach part. That again could mean some things might
> run more slowly than if they were parallelized, but I'm OK with that.
> They'd run as fast as they are running now. And if someone wants to
> really parallelize a side effecting loop, they could just invoke
> foreachParallel passing in a function rather than using a for
> expression.
>
> I don't have a problem with map, flatMap, filter, withFilter, and any
> other "data transformation" style higher order functions on
> collections doing things sequentially or in parallel depending on the
> type of the object. Most likely some people have written code that
> does a map, for example, and passes a transformation function to it
> that has a side effect. That code will break and I think we should
> allow that to happen. It should be very little code I'd guess, because
> people usually just pass data transformers to methods like map. And it
> is worth it.
>
> But foreach is not about data transformation. It's about side effects.
> Because it doesn't have a result, if it doesn't have a side effect it
> has no impact on the program other than heating up the CPU. And to do
> that kind of loop in parallel means you have to write thread safe
> code. No one has been doing that, and it would be very hard to do in
> the future. And it would be non-intuitive because this looks like
> sequential-in-order for loops in any other language anyone would be
> coming from:
>
> for (arg <- args)
>  println(arg)
>
> I think this really ought to keep working like it always has, printing
> out args in iterator order, no matter the type of args. Yesterday I
> was asking how I would get sequential behavior out of a for
> expression. Well the above bit of code should be how. That's how
> everyone expects to get an in-order execution.
>
> Narrowing the contracts of foreach in Traversable, Iterator, and
> Iterable could break some code, but I doubt much. What it in effect
> does is change it to the contract that's been in people's heads, which
> they've been programming to. I think much more code could break, and
> sometimes in hard -to-catch, subtle ways, if a collection you are
> passed does foreach in parallel.
>
> I don't know the details of how parallel collections are implemented,
> but I'm hopeful that the places where data transformation methods are
> implemented in terms of foreach, an invocation of foreachParallel
> could easily be substituted. Then you'd get your same parallel
> behavior back for those method that we have now in RC1.
>
> What do folks think? Would that address the problem sufficiently?
>
> Bill
>
>> creating a branched (singlethreadtravlike, parthreadtravlike) hierachy
>> would make it impossible to use parallel collections where the api
>> designer wanted to use single threaded ones but on the other hand
>> sacrifice compatibility with old code. the old collections would all be
>> single threaded ones, and in old code you could not use parallel ones,
>> even if you knew it would work.
>>
>> the solution could be something like:
>> some collection methods could be annotated with @OnlySeq (like "zipped"
>> in the broken example), which would add a compile time conversion and a
>> runtime assertion. in the case of zipped, it's clear you want to keep
>> the order of both ranges so you annotate the method with @OnlySeq. the
>> compiler knows using parallel ranges makes no sense here -> compiler
>> warning and/or compiler inserted conversion to seq if you do it anyway.
>> if the collection is not explicitly declared as parallel, the compiler
>> cannot warn -> runtime type assertion
>>
>> still , you'd have to be careful, but everyone knows that.
>>
>>
>>
>>
>> Am 03.04.2011 02:12, schrieb Paul Phillips:
>>> I looked around trunk a little bit for stuff which is broken because
>>> of this.  Here are the first couple examples I found.
>>>
>>> // more "dynamic" hashcodes
>>> scala> val x = (1 to 10).par
>>> x: scala.collection.parallel.immutable.ParRange = ParRange(1, 2, 3, 4,
>>> 5, 6, 7, 8, 9, 10)
>>>
>>> scala> x.hashCode
>>> res0: Int = 1137090302
>>>
>>> scala> x.hashCode
>>> res1: Int = -555212555
>>>
>>> scala> x.hashCode
>>> res2: Int = -1216588130
>>>
>>>
>>> // grab your partner, do-sa-do
>>> scala> (1 to 10 par, 1 to 10).zipped foreach ((x, y) => println(x + "
>>> " + y))
>>> 3 2
>>> 6 3
>>> 2 4
>>> 7 6
>>> 1 1
>>> 8 7
>>> 4 5
>>> 5 9
>>> 9 8
>>> 10 10
>>>
>>>
>>
>>
>
>
>
> --
> Bill Venners
> Artima, Inc.
> http://www.artima.com
>

Re: REPL: null-values in parallel ranges?

Hi,
I am glad we are discussing all this because I am expecting severe
trouble ahead with the current outline.

> creating a branched (singlethreadtravlike, parthreadtravlike) hierachy
> would make it impossible to use parallel collections where the api
> designer wanted to use single threaded ones but on the other hand

I think it's a good and highly appreciable thing to have easy to use
parallel collections which can be easily introduced instead of
sequencial ones.
For what it's worth I don't think that "transparent" parallel
collections instead of sequencial collections is the way to go.
Sneaking in parallel behaviour means sneaking in errors as the code
being forcefully parallelized might never have been intended for
parallelization.

I would try to go for a design where all hitherto existing ordered /
sequencial traits / classes are a special case of unordered tolerant /
parallelizable collections. That is Seq and co. is narrowing down the
degree of freedom to sequencial behaviour. This way existing code
could continue to operate as expected.

A redesigner could review code in question and relax the type
signature of methods to allow for parallel stuff to come in. That way
the contract is clear and there is an easy road to adopt
parallelization. (I'm sure there are drawback to this approach as well
- but at least there is no automagical parallelization which will
cause automagic indeterministic errors).

The main problem I currently see is that we're a bit late with this
discussion as RC1 is already released. But better late than never...

Greetings
Bernd

Re: REPL: null-values in parallel ranges?

On 03/04/11 16:55, HamsterofDeath wrote:
> something has to be fixed. the question is what.
> ...
> many methods are based on foreach.

I think I know where the problem is.

Tony Morris
http://tmorris.net/

RE: REPL: null-values in parallel ranges?

Scalaz's foreach is called |>|
This could be intuitively extended to a parallel version ("parallelForeach", "foreachParallel" == UGH!) of ||>||
Is it about time we started adding symbolically-named methods into the core library? Often their intent is just as clear as the English equivalent
Chris

> Date: Sun, 3 Apr 2011 17:00:23 +1000
> From: tonymorris [at] gmail [dot] com
> To: scala-user [at] googlegroups [dot] com
> Subject: Re: [scala-user] REPL: null-values in parallel ranges?
>
> On 03/04/11 16:55, HamsterofDeath wrote:
> > something has to be fixed. the question is what.
> > ...
> > many methods are based on foreach.
>
> I think I know where the problem is.
>
>
> Tony Morris
> http://tmorris.net/
>
>

Re: REPL: null-values in parallel ranges?

Hi Chris,

On Mon, Apr 4, 2011 at 12:34 AM, Chris Marshall wrote:
> Scalaz's foreach is called |>|
> This could be intuitively extended to a parallel version ("parallelForeach",
> "foreachParallel" == UGH!) of ||>||
>
"||>||" == &%*#$@%!

There's the comfy with operators sub-culture and the not-as-comfy with
operators subculture. Anyway, as someone else said, I think the
important question is whether there should be another foreach method
at all. Only if the answer is yes does it matter what its name is.

Bill

> Is it about time we started adding symbolically-named methods into the core
> library? Often their intent is just as clear as the English equivalent
> Chris
>
>> Date: Sun, 3 Apr 2011 17:00:23 +1000
>> From: tonymorris [at] gmail [dot] com
>> To: scala-user [at] googlegroups [dot] com
>> Subject: Re: [scala-user] REPL: null-values in parallel ranges?
>>
>> On 03/04/11 16:55, HamsterofDeath wrote:
>> > something has to be fixed. the question is what.
>> > ...
>> > many methods are based on foreach.
>>
>> I think I know where the problem is.
>>
>>
>> Tony Morris
>> http://tmorris.net/
>>
>>
>

RE: REPL: null-values in parallel ranges?

Frankly, I think that a second method's name (probably) having to come from the unwieldy foreachParallel/parallelForeach is a strong indication that there is something wrong with the fundamental design (i.e. if the path of a second method is chosen).

> Date: Mon, 4 Apr 2011 01:03:18 -0700
> Subject: Re: [scala-user] REPL: null-values in parallel ranges?
> From: bill [at] artima [dot] com
>
> Anyway, as someone else said, I think the
> important question is whether there should be another foreach method
> at all. Only if the answer is yes does it matter what its name is.
>
> Bill

Re: REPL: null-values in parallel ranges?

Frankly, I think that's absurd. The second name doesn't "have to" come from anywhere, and the name choice says nothing about the fundamental design.

The reality is that the contract for what is called "foreach", as a matter of expectation and of implementation (prior to parallel collections), even though not explicitly documented, is "forEachInSequence". Had such a name been chosen instead of "foreach", that would not have made the design any more or less wrong, "fundamentally" or otherwise. Ditto for giving what is by contract "forEachWithoutRegardToSequence" an "unwieldy" name or a short one.

If you think there's something wrong with the fundamental design then you should say what, rather than resting on totally bogus inferences.

-- Jim


On Mon, Apr 4, 2011 at 1:19 AM, Chris Marshall <oxbow_lakes [at] hotmail [dot] com> wrote:
Frankly, I think that a second method's name (probably) having to come from the unwieldy foreachParallel/parallelForeach is a strong indication that there is something wrong with the fundamental design (i.e. if the path of a second method is chosen).

> Date: Mon, 4 Apr 2011 01:03:18 -0700
> Subject: Re: [scala-user] REPL: null-values in parallel ranges?
> From: bill [at] artima [dot] com
>
> Anyway, as someone else said, I think the
> important question is whether there should be another foreach method
> at all. Only if the answer is yes does it matter what its name is.
>
> Bill

RE: REPL: null-values in parallel ranges?

My point was that, when we have to distinguish two almost-identical things by clumsy names (foreachParallel is clumsy, I confidently assert!), we might take a step back and ask ourselves why one entity can be asked to do two such almost-identical things in the first place. The problem here is quite that we have ended up with a design whereby we might intuitively expect that a for-comprehension is parallel and yet that we have thus broken (in a non-obvious and non-detectable way) some legacy code (because a parallel collection is also a sequential one by design).
That is, our parallel iterable is doing two things at the same time (it's a separation of concerns problem). I think this is *nicely demonstrated* (or illustrated) by the naming issue of some putative second method (and hence not absurd in the least).
Chris

From: Jim [at] Balter [dot] name
Date: Mon, 4 Apr 2011 02:17:02 -0700
Subject: Re: [scala-user] REPL: null-values in parallel ranges?
To: oxbow_lakes [at] hotmail [dot] com
CC: scala-user [at] googlegroups [dot] com

Frankly, I think that's absurd. The second name doesn't "have to" come from anywhere, and the name choice says nothing about the fundamental design.

The reality is that the contract for what is called "foreach", as a matter of expectation and of implementation (prior to parallel collections), even though not explicitly documented, is "forEachInSequence". Had such a name been chosen instead of "foreach", that would not have made the design any more or less wrong, "fundamentally" or otherwise. Ditto for giving what is by contract "forEachWithoutRegardToSequence" an "unwieldy" name or a short one.

If you think there's something wrong with the fundamental design then you should say what, rather than resting on totally bogus inferences.

-- Jim


On Mon, Apr 4, 2011 at 1:19 AM, Chris Marshall <oxbow_lakes [at] hotmail [dot] com> wrote:
Frankly, I think that a second method's name (probably) having to come from the unwieldy foreachParallel/parallelForeach is a strong indication that there is something wrong with the fundamental design (i.e. if the path of a second method is chosen).

> Date: Mon, 4 Apr 2011 01:03:18 -0700
> Subject: Re: [scala-user] REPL: null-values in parallel ranges?
> From: bill [at] artima [dot] com
>
> Anyway, as someone else said, I think the
> important question is whether there should be another foreach method
> at all. Only if the answer is yes does it matter what its name is.
>
> Bill

Re: REPL: null-values in parallel ranges?

I already addressed all of this handwaving with specifics."forEachInSequence" and "forEachWithoutRegardToSequence" are non-identical in exactly the way that their different names say.

In a side-effect-free, time-order-free universe, we would not have invented necessarily-sequential collections first and then only later realized that we wanted not-necessarily-sequential collections ... we would have invented only the latter and would have defined necessarily-sequential operations for them. But the fact is that we don't live in such a universe so there isn't anything fundamentally wrong with the design of having both sorts of collections -- certainly not inferable from grossly misrepresentative nonsense as you have written.

-- Jim



On Mon, Apr 4, 2011 at 2:32 AM, Chris Marshall <oxbow_lakes [at] hotmail [dot] com> wrote:
My point was that, when we have to distinguish two almost-identical things by clumsy names (foreachParallel is clumsy, I confidently assert!), we might take a step back and ask ourselves why one entity can be asked to do two such almost-identical things in the first place. The problem here is quite that we have ended up with a design whereby we might intuitively expect that a for-comprehension is parallel and yet that we have thus broken (in a non-obvious and non-detectable way) some legacy code (because a parallel collection is also a sequential one by design).
That is, our parallel iterable is doing two things at the same time (it's a separation of concerns problem). I think this is *nicely demonstrated* (or illustrated) by the naming issue of some putative second method (and hence not absurd in the least).
Chris

From: Jim [at] Balter [dot] name
Date: Mon, 4 Apr 2011 02:17:02 -0700
Subject: Re: [scala-user] REPL: null-values in parallel ranges?
To: oxbow_lakes [at] hotmail [dot] com
CC: scala-user [at] googlegroups [dot] com

Frankly, I think that's absurd. The second name doesn't "have to" come from anywhere, and the name choice says nothing about the fundamental design.

The reality is that the contract for what is called "foreach", as a matter of expectation and of implementation (prior to parallel collections), even though not explicitly documented, is "forEachInSequence". Had such a name been chosen instead of "foreach", that would not have made the design any more or less wrong, "fundamentally" or otherwise. Ditto for giving what is by contract "forEachWithoutRegardToSequence" an "unwieldy" name or a short one.

If you think there's something wrong with the fundamental design then you should say what, rather than resting on totally bogus inferences.

-- Jim


On Mon, Apr 4, 2011 at 1:19 AM, Chris Marshall <oxbow_lakes [at] hotmail [dot] com> wrote:
Frankly, I think that a second method's name (probably) having to come from the unwieldy foreachParallel/parallelForeach is a strong indication that there is something wrong with the fundamental design (i.e. if the path of a second method is chosen).

> Date: Mon, 4 Apr 2011 01:03:18 -0700
> Subject: Re: [scala-user] REPL: null-values in parallel ranges?
> From: bill [at] artima [dot] com
>
> Anyway, as someone else said, I think the
> important question is whether there should be another foreach method
> at all. Only if the answer is yes does it matter what its name is.
>
> Bill


RE: REPL: null-values in parallel ranges?

I'm bagging "grossly misrepresentative nonsense" for both my new motto and my epitaph.

From: Jim [at] Balter [dot] name
Date: Mon, 4 Apr 2011 02:55:44 -0700
Subject: Re: [scala-user] REPL: null-values in parallel ranges?
To: oxbow_lakes [at] hotmail [dot] com
CC: scala-user [at] googlegroups [dot] com

I already addressed all of this handwaving with specifics."forEachInSequence" and "forEachWithoutRegardToSequence" are non-identical in exactly the way that their different names say.

In a side-effect-free, time-order-free universe, we would not have invented necessarily-sequential collections first and then only later realized that we wanted not-necessarily-sequential collections ... we would have invented only the latter and would have defined necessarily-sequential operations for them. But the fact is that we don't live in such a universe so there isn't anything fundamentally wrong with the design of having both sorts of collections -- certainly not inferable from grossly misrepresentative nonsense as you have written.

-- Jim



Re: REPL: null-values in parallel ranges?

Seems appropriate to me. Your argument is like saying that the fact that "contents of address register" and "contents of data register" are nearly identical in name and operation demonstrates a fundamental design error in LISP -- a cons cell should not have two such similar operations; separation of concerns and all that. Since you're so focused on the length of names and how "unwieldy" they are, you should be satisfied if Scala goes with feio and feuo.

-- Jim

On Mon, Apr 4, 2011 at 3:32 AM, Chris Marshall <oxbow_lakes [at] hotmail [dot] com> wrote:
I'm bagging "grossly misrepresentative nonsense" for both my new motto and my epitaph.

From: Jim [at] Balter [dot] name
Date: Mon, 4 Apr 2011 02:55:44 -0700
Subject: Re: [scala-user] REPL: null-values in parallel ranges?
To: oxbow_lakes [at] hotmail [dot] com
CC: scala-user [at] googlegroups [dot] com

I already addressed all of this handwaving with specifics."forEachInSequence" and "forEachWithoutRegardToSequence" are non-identical in exactly the way that their different names say.

In a side-effect-free, time-order-free universe, we would not have invented necessarily-sequential collections first and then only later realized that we wanted not-necessarily-sequential collections ... we would have invented only the latter and would have defined necessarily-sequential operations for them. But the fact is that we don't live in such a universe so there isn't anything fundamentally wrong with the design of having both sorts of collections -- certainly not inferable from grossly misrepresentative nonsense as you have written.

-- Jim




Re: REPL: null-values in parallel ranges?

I'm starting to loose track of what the actual issue is in all the proposals for fixing it. To my mind, the problem is two-fold:
1. writing parallel code that touches mutable state and talks to IO is hard
2. lots of existing code exists that assumes (implicitly or explicitly) that  a. foreach will proceed in series  b. foreach will process each element in the same order each time it is called
Now, 1 is one of those things that are fundamental truths about programming. We can do things with a side-effects system, encourage monadic computation and so on, but ultimately we have to put some of the responsibility for getting this right on the coder.
As for 2, it is potentially very serious (silent/pernicious bugs, no obvious cause, faults in legacy code that used to work) but also within our gift to address these through how we structure the APIs. My preferred option would be to inject an extra trait above the existing collections, so that we have a parallel and serial hierarchy, and to keep the two using the exact same method names for map/foreach/... . If this is put into the right place, all existing code not only gets serial instances but the types enforce that they must get serial instances. If we mix in the 'what kind of collection it is' traits correctly, we allow code to be polymorphic over serial/parallel implementations, if that's what the coder asks for. Similarly, the ordering issue should be flagged by a trait - I can well imagine mutable implementations of set/tree where every lookup potentially causes the foreach order to alter as commonly-accessed keys are brought towards the root.
Matthew

-- Matthew Pocockmailto: turingatemyhamster [at] gmail [dot] comgchat: turingatemyhamster [at] gmail [dot] com msn: matthew_pocock [at] yahoo [dot] co [dot] ukirc.freenode.net: drdozer(0191) 2566550

Re: REPL: null-values in parallel ranges?

On 04/04/11 18:19, Chris Marshall wrote:
> indication that there is something wrong with the fundamental design
Just smile Chris :)

Re: REPL: null-values in parallel ranges?

On Mon, Apr 4, 2011 at 9:34 AM, Chris Marshall wrote:
> Scalaz's foreach is called |>|
> This could be intuitively extended to a parallel version ("parallelForeach",
> "foreachParallel" == UGH!) of ||>||
> Is it about time we started adding symbolically-named methods into the core
> library? Often their intent is just as clear as the English equivalent

At this stage, the only interesting question is whether there are one
or two names that encompass the sequential and parallel natures of
foreach. Let's not distract this (important!) discussion with a side
debate over what the names might be and how long they are.

-jason

RE: REPL: null-values in parallel ranges?

But it seems feasible (at least to me) that opinions on that matter are swayed by the possible names that such a secondary method may take!

> Date: Mon, 4 Apr 2011 09:44:50 +0200
> Subject: Re: [scala-user] REPL: null-values in parallel ranges?
> From: jzaugg [at] gmail [dot] com
> To: oxbow_lakes [at] hotmail [dot] com
> CC: tmorris [at] tmorris [dot] net; scala-user [at] googlegroups [dot] com
>
> On Mon, Apr 4, 2011 at 9:34 AM, Chris Marshall <oxbow_lakes [at] hotmail [dot] com> wrote:
> > Scalaz's foreach is called |>|
> > This could be intuitively extended to a parallel version ("parallelForeach",
> > "foreachParallel" == UGH!) of ||>||
> > Is it about time we started adding symbolically-named methods into the core
> > library? Often their intent is just as clear as the English equivalent
>
> At this stage, the only interesting question is whether there are one
> or two names that encompass the sequential and parallel natures of
> foreach. Let's not distract this (important!) discussion with a side
> debate over what the names might be and how long they are.
>
> -jason

Re: REPL: null-values in parallel ranges?

On 02/04/11 23:02, Jonas Bonér wrote:
>
> I don't think parallel collections should "lie" and provide a leaky abstraction over concurrency. When someone asks for parallel execution I think it is best that they get what they are asking for. Sure it might break sequential code, but threads does that; they turn determinism into non-determinism. Let's embrace parallelism at its core and reap the benefits of it rather than trying to force it into something that it is not. It is a matter of education; documentation needs to be a lot better. People needs to understand that concurrency and parallelism is here to stay and a lot of the time it still ain't that pretty.
>

I agree that people should get used to the fact that parallelism is
out there. The only downside here is that if one really wants to be
sure the code always executes correctly, a user has to write:

for (x <- xs.seq) ...

all the time.

The problem is that someone might not know he's asking for parallel
execution.

> ...
> def oldMethod(seq: Seq[String]) { for (e <- seq) println(e) }
>
> But new code that really wants parallel execution could just do .par
> on that Seq[T], which would somehow kick it into parallel execution.
>
> def newMethod(pSeq: Seq[String] with Parallel) { for (e <- pSeq) println(e) }
>
> By saying the method takes a Seq[String] with Parallel, it is
> announcing that it might do foreaches in parallel.
>
> I think it would be fine to let flatMap, filter, withFilter, map be
> rewritten in the same way, because most times people wouldn't have
> side effects in those. But foreach is all about the side effects.
>
> Bill

I agree something like a compiler rewrite might possibly be a good
idea.
However, adding an additional `pareach` method means for-
comprehensions no longer work.
A couple of alternatives I see are:
- have `foreach` being sequential, and add a method `inPar` to
parallel collections, which returns a thin wrapper around the
collection such that its `foreach` is executed in parallel (forwards
to `pareach`). The compiler could then rewrite the for-comprehension
like this if it detects that the generator isInstanceOf[Parallel] at
compile time:

for (x <- xs) { ... }

<>

--> for (x <- xs.inPar) { ... }

(now it calls `foreach` of the thin wrapper which forwards to
`pareach`)

This does not solve the problem with map, flatMap, filter, and
friends, but it is a start - something like this could work for them.

- have a compiler rewrite rule which adds a `.seq` call to the
generator if it knows it is a collection, but not an instance of
`Parallel` at compile time:

for (x <- Iterable(1, 2, 3)) { ... } becomes for (x <- Iterable(1, 2,
3).seq)

but:

for (x <- Iterable(1, 2, 3).par) { ... } stays what it is

- a non-rewriting approach - have a parallel block which envelops the
for-comprehension, ensuring that everything inside executes in
parallel:

parallel {
for ( ... ) {
// ...
}
}

This requires the user to be more explicit. The downside is that
nested methods called in the for-body might be affected.

Re: Re: REPL: null-values in parallel ranges?

Hi Aleksandar,

On Mon, Apr 4, 2011 at 12:51 AM, Aleksandar Prokopec
wrote:
> On 02/04/11 23:02, Jonas Bonér wrote:
>>
>> I don't think parallel collections should "lie" and provide a leaky abstraction over concurrency. When someone asks for parallel execution I think it is best that they get what they are asking for. Sure it might break sequential code, but threads does that; they turn determinism into non-determinism. Let's embrace parallelism at its core and reap the benefits of it rather than trying to force it into something that it is not. It is a matter of education; documentation needs to be a lot better. People needs to understand that concurrency and parallelism is here to stay and a lot of the time it still ain't that pretty.
>>
>
> I agree that people should get used to the fact that parallelism is
> out there. The only downside here is that if one really wants to be
> sure the code always executes correctly, a user has to write:
>
> for (x <- xs.seq) ...
>
> all the time.
>
> The problem is that someone might not know he's asking for parallel
> execution.
>
>> ...
>> def oldMethod(seq: Seq[String]) { for (e <- seq) println(e) }
>>
>> But new code that really wants parallel execution could just do .par
>> on that Seq[T], which would somehow kick it into parallel execution.
>>
>> def newMethod(pSeq: Seq[String] with Parallel) { for (e <- pSeq) println(e) }
>>
>> By saying the method takes a Seq[String] with Parallel, it is
>> announcing that it might do foreaches in parallel.
>>
>> I think it would be fine to let flatMap, filter, withFilter, map be
>> rewritten in the same way, because most times people wouldn't have
>> side effects in those. But foreach is all about the side effects.
>>
>> Bill
>
> I agree something like a compiler rewrite might possibly be a good
> idea.
> However, adding an additional `pareach` method means for-
> comprehensions no longer work.
>
Actually after sleeping on it last night, I emailed a different note
this morning in which I changed my tune. I suggested the rules for
translating for expressions to invocations of foreach, withFilter,
map, and flatMap should stay the same as they are now. For expressions
are about more than just collections.

Now to your suggestion that adding a separate pareach method would
make for expressions not work, I'd say they still work like they
always did, but just that foreach is not something that will be done
in parallel in parallel collections, whereas map, flatMap, and
withFilter (and a bunch of other higher order methods on collections)
will be done in parallel on parallel collections. That would mean you
could never get a parallel foreach out of a for expression, and I
don't think that's so bad. A parallel foreach requires a thread-safe
side-effecting function, which is hard to write. So making the method
name long and (to some) ugly, such as parallelForeach or
foreachParallel, discourages its use for frivolous endeavors, and so
does not providing a way to get a parallel foreach out of a for
expression.

Bill

> A couple of alternatives I see are:
> - have `foreach` being sequential, and add a method `inPar` to
> parallel collections, which returns a thin wrapper around the
> collection such that its `foreach` is executed in parallel (forwards
> to `pareach`). The compiler could then rewrite the for-comprehension
> like this if it detects that the generator isInstanceOf[Parallel] at
> compile time:
>
> for (x <- xs) { ... }
>
> <>
>
> --> for (x <- xs.inPar) { ... }
>
> (now it calls `foreach` of the thin wrapper which forwards to
> `pareach`)
>
> This does not solve the problem with map, flatMap, filter, and
> friends, but it is a start - something like this could work for them.
>
> - have a compiler rewrite rule which adds a `.seq` call to the
> generator if it knows it is a collection, but not an instance of
> `Parallel` at compile time:
>
> for (x <- Iterable(1, 2, 3)) { ... } becomes for (x <- Iterable(1, 2,
> 3).seq)
>
> but:
>
> for (x <- Iterable(1, 2, 3).par) { ... } stays what it is
>
> - a non-rewriting approach - have a parallel block which envelops the
> for-comprehension, ensuring that everything inside executes in
> parallel:
>
> parallel {
>  for ( ... ) {
>    // ...
>  }
> }
>
> This requires the user to be more explicit. The downside is that
> nested methods called in the for-body might be affected.
>
>

Re: REPL: null-values in parallel ranges?

New to Scala. I expect the below suggestion won't be well liked. But
maybe it provokes some better ideas:

If splitting foreach in two, a sequential foreach and a parallel
parforeach, for expressions may be limited in their parallel potential
unless the compiler/generator is too 'magical' for its own good.

If parallelism is meant to be a new fundamental aspect in the
language, I think it needs to be exposed in more than a method name.

you can write:

for (arg <- args)
println(arg)

and your body will have guaranteed sequential execution, and you can
write the below to allow parallel execution but not force it:

pfor (arg <- args)
println(arg)

My experience with concurrent programming suggests that there will be
many times a user wishes to guarantee sequential execution, and others
where they want to get the most parallelism as possible. This is
both for users and library writers. Sometimes as a user, I don't
want to eat up multiple threads, even if it is faster in isolation.
Sometimes sequential execution has higher throughput in aggregate with
other work (but higher latency). Other times, parallelism is
essential.

The above addition to for comprehensions makes the choice very
explicit. How fundamental should this be? I have no idea, but if
concurrency is going to be at the heart of the language then a
language construct might make sense. Underneath the covers, the two
generators could do many things slightly differently -- map and
flatMap as well as foreach.

On Apr 4, 1:12 am, Bill Venners wrote:
> Hi Aleksandar,
>
> On Mon, Apr 4, 2011 at 12:51 AM, Aleksandar Prokopec
>
>
>
>
>
>
>
> wrote:
> > On 02/04/11 23:02, Jonas Bonér wrote:
>
> >> I don't think parallel collections should "lie" and provide a leaky abstraction over concurrency. When someone asks for parallel execution I think it is best that they get what they are asking for. Sure it might break sequential code, but threads does that; they turn determinism into non-determinism. Let's embrace parallelism at its core and reap the benefits of it rather than trying to force it into something that it is not. It is a matter of education; documentation needs to be a lot better. People needs to understand that concurrency and parallelism is here to stay and a lot of the time it still ain't that pretty.
>
> > I agree that people should get used to the fact that parallelism is
> > out there. The only downside here is that if one really wants to be
> > sure the code always executes correctly, a user has to write:
>
> > for (x <- xs.seq) ...
>
> > all the time.
>
> > The problem is that someone might not know he's asking for parallel
> > execution.
>
> >> ...
> >> def oldMethod(seq: Seq[String]) { for (e <- seq) println(e) }
>
> >> But new code that really wants parallel execution could just do .par
> >> on that Seq[T], which would somehow kick it into parallel execution.
>
> >> def newMethod(pSeq: Seq[String] with Parallel) { for (e <- pSeq) println(e) }
>
> >> By saying the method takes a Seq[String] with Parallel, it is
> >> announcing that it might do foreaches in parallel.
>
> >> I think it would be fine to let flatMap, filter, withFilter, map be
> >> rewritten in the same way, because most times people wouldn't have
> >> side effects in those. But foreach is all about the side effects.
>
> >> Bill
>
> > I agree something like a compiler rewrite might possibly be a good
> > idea.
> > However, adding an additional `pareach` method means for-
> > comprehensions no longer work.
>
> Actually after sleeping on it last night, I emailed a different note
> this morning in which I changed my tune. I suggested the rules for
> translating for expressions to invocations of foreach, withFilter,
> map, and flatMap should stay the same as they are now. For expressions
> are about more than just collections.
>
> Now to your suggestion that adding a separate pareach method would
> make for expressions not work, I'd say they still work like they
> always did, but just that foreach is not something that will be done
> in parallel in parallel collections, whereas map, flatMap, and
> withFilter (and a bunch of other higher order methods on collections)
> will be done in parallel on parallel collections. That would mean you
> could never get a parallel foreach out of a for expression, and I
> don't think that's so bad. A parallel foreach requires a thread-safe
> side-effecting function, which is hard to write. So making the method
> name long and (to some) ugly, such as parallelForeach or
> foreachParallel, discourages its use for frivolous endeavors, and so
> does not providing a way to get a parallel foreach out of a for
> expression.
>
> Bill
>
>
>
>
>
>
>
>
>
> > A couple of alternatives I see are:
> > - have `foreach` being sequential, and add a method `inPar` to
> > parallel collections, which returns a thin wrapper around the
> > collection such that its `foreach` is executed in parallel (forwards
> > to `pareach`). The compiler could then rewrite the for-comprehension
> > like this if it detects that the generator isInstanceOf[Parallel] at
> > compile time:
>
> > for (x <- xs) { ... }
>
> > <>
>
> > --> for (x <- xs.inPar) { ... }
>
> > (now it calls `foreach` of the thin wrapper which forwards to
> > `pareach`)
>
> > This does not solve the problem with map, flatMap, filter, and
> > friends, but it is a start - something like this could work for them.
>
> > - have a compiler rewrite rule which adds a `.seq` call to the
> > generator if it knows it is a collection, but not an instance of
> > `Parallel` at compile time:
>
> > for (x <- Iterable(1, 2, 3)) { ... } becomes for (x <- Iterable(1, 2,
> > 3).seq)
>
> > but:
>
> > for (x <- Iterable(1, 2, 3).par) { ... } stays what it is
>
> > - a non-rewriting approach - have a parallel block which envelops the
> > for-comprehension, ensuring that everything inside executes in
> > parallel:
>
> > parallel {
> >  for ( ... ) {
> >    // ...
> >  }
> > }
>
> > This requires the user to be more explicit. The downside is that
> > nested methods called in the for-body might be affected.
>
> --
> Bill Venners
> Artima, Inc.http://www.artima.com

Re: Re: REPL: null-values in parallel ranges?

in scala, using the language itself it preferred to introducing new keywords

-------- Original-Nachricht --------
> Datum: Mon, 4 Apr 2011 21:05:15 -0700 (PDT)
> Von: ScottC
> An: scala-user
> Betreff: [scala-user] Re: REPL: null-values in parallel ranges?

> New to Scala. I expect the below suggestion won't be well liked. But
> maybe it provokes some better ideas:
>
> If splitting foreach in two, a sequential foreach and a parallel
> parforeach, for expressions may be limited in their parallel potential
> unless the compiler/generator is too 'magical' for its own good.
>
> If parallelism is meant to be a new fundamental aspect in the
> language, I think it needs to be exposed in more than a method name.
>
> you can write:
>
> for (arg <- args)
> println(arg)
>
> and your body will have guaranteed sequential execution, and you can
> write the below to allow parallel execution but not force it:
>
> pfor (arg <- args)
> println(arg)
>
> My experience with concurrent programming suggests that there will be
> many times a user wishes to guarantee sequential execution, and others
> where they want to get the most parallelism as possible. This is
> both for users and library writers. Sometimes as a user, I don't
> want to eat up multiple threads, even if it is faster in isolation.
> Sometimes sequential execution has higher throughput in aggregate with
> other work (but higher latency). Other times, parallelism is
> essential.
>
> The above addition to for comprehensions makes the choice very
> explicit. How fundamental should this be? I have no idea, but if
> concurrency is going to be at the heart of the language then a
> language construct might make sense. Underneath the covers, the two
> generators could do many things slightly differently -- map and
> flatMap as well as foreach.
>
>
> On Apr 4, 1:12 am, Bill Venners wrote:
> > Hi Aleksandar,
> >
> > On Mon, Apr 4, 2011 at 12:51 AM, Aleksandar Prokopec
> >
> >
> >
> >
> >
> >
> >
> > wrote:
> > > On 02/04/11 23:02, Jonas Bonér wrote:
> >
> > >> I don't think parallel collections should "lie" and provide a leaky
> abstraction over concurrency. When someone asks for parallel execution I
> think it is best that they get what they are asking for. Sure it might break
> sequential code, but threads does that; they turn determinism into
> non-determinism. Let's embrace parallelism at its core and reap the benefits of it
> rather than trying to force it into something that it is not. It is a matter
> of education; documentation needs to be a lot better. People needs to
> understand that concurrency and parallelism is here to stay and a lot of the
> time it still ain't that pretty.
> >
> > > I agree that people should get used to the fact that parallelism is
> > > out there. The only downside here is that if one really wants to be
> > > sure the code always executes correctly, a user has to write:
> >
> > > for (x <- xs.seq) ...
> >
> > > all the time.
> >
> > > The problem is that someone might not know he's asking for parallel
> > > execution.
> >
> > >> ...
> > >> def oldMethod(seq: Seq[String]) { for (e <- seq) println(e) }
> >
> > >> But new code that really wants parallel execution could just do .par
> > >> on that Seq[T], which would somehow kick it into parallel execution.
> >
> > >> def newMethod(pSeq: Seq[String] with Parallel) { for (e <- pSeq)
> println(e) }
> >
> > >> By saying the method takes a Seq[String] with Parallel, it is
> > >> announcing that it might do foreaches in parallel.
> >
> > >> I think it would be fine to let flatMap, filter, withFilter, map be
> > >> rewritten in the same way, because most times people wouldn't have
> > >> side effects in those. But foreach is all about the side effects.
> >
> > >> Bill
> >
> > > I agree something like a compiler rewrite might possibly be a good
> > > idea.
> > > However, adding an additional `pareach` method means for-
> > > comprehensions no longer work.
> >
> > Actually after sleeping on it last night, I emailed a different note
> > this morning in which I changed my tune. I suggested the rules for
> > translating for expressions to invocations of foreach, withFilter,
> > map, and flatMap should stay the same as they are now. For expressions
> > are about more than just collections.
> >
> > Now to your suggestion that adding a separate pareach method would
> > make for expressions not work, I'd say they still work like they
> > always did, but just that foreach is not something that will be done
> > in parallel in parallel collections, whereas map, flatMap, and
> > withFilter (and a bunch of other higher order methods on collections)
> > will be done in parallel on parallel collections. That would mean you
> > could never get a parallel foreach out of a for expression, and I
> > don't think that's so bad. A parallel foreach requires a thread-safe
> > side-effecting function, which is hard to write. So making the method
> > name long and (to some) ugly, such as parallelForeach or
> > foreachParallel, discourages its use for frivolous endeavors, and so
> > does not providing a way to get a parallel foreach out of a for
> > expression.
> >
> > Bill
> >
> >
> >
> >
> >
> >
> >
> >
> >
> > > A couple of alternatives I see are:
> > > - have `foreach` being sequential, and add a method `inPar` to
> > > parallel collections, which returns a thin wrapper around the
> > > collection such that its `foreach` is executed in parallel (forwards
> > > to `pareach`). The compiler could then rewrite the for-comprehension
> > > like this if it detects that the generator isInstanceOf[Parallel] at
> > > compile time:
> >
> > > for (x <- xs) { ... }
> >
> > > <>
> >
> > > --> for (x <- xs.inPar) { ... }
> >
> > > (now it calls `foreach` of the thin wrapper which forwards to
> > > `pareach`)
> >
> > > This does not solve the problem with map, flatMap, filter, and
> > > friends, but it is a start - something like this could work for them.
> >
> > > - have a compiler rewrite rule which adds a `.seq` call to the
> > > generator if it knows it is a collection, but not an instance of
> > > `Parallel` at compile time:
> >
> > > for (x <- Iterable(1, 2, 3)) { ... } becomes for (x <- Iterable(1, 2,
> > > 3).seq)
> >
> > > but:
> >
> > > for (x <- Iterable(1, 2, 3).par) { ... } stays what it is
> >
> > > - a non-rewriting approach - have a parallel block which envelops the
> > > for-comprehension, ensuring that everything inside executes in
> > > parallel:
> >
> > > parallel {
> > >  for ( ... ) {
> > >    // ...
> > >  }
> > > }
> >
> > > This requires the user to be more explicit. The downside is that
> > > nested methods called in the for-body might be affected.
> >
> > --
> > Bill Venners
> > Artima, Inc.http://www.artima.com

Re: Re: REPL: null-values in parallel ranges?

...to some extent.

The following can easily be implemented in the language as a library,
but are not:
* if/else
* try/catch
* finally
* while
* for, to some degree (it would be much more useful to alter the last
call to flatMap in a comprehension and not map).

On 05/04/11 16:53, Dennis Haupt wrote:
> in scala, using the language itself it preferred to introducing new keywords
>
>
> -------- Original-Nachricht --------
>> Datum: Mon, 4 Apr 2011 21:05:15 -0700 (PDT)
>> Von: ScottC
>> An: scala-user
>> Betreff: [scala-user] Re: REPL: null-values in parallel ranges?
>> New to Scala. I expect the below suggestion won't be well liked. But
>> maybe it provokes some better ideas:
>>
>> If splitting foreach in two, a sequential foreach and a parallel
>> parforeach, for expressions may be limited in their parallel potential
>> unless the compiler/generator is too 'magical' for its own good.
>>
>> If parallelism is meant to be a new fundamental aspect in the
>> language, I think it needs to be exposed in more than a method name.
>>
>> you can write:
>>
>> for (arg <- args)
>> println(arg)
>>
>> and your body will have guaranteed sequential execution, and you can
>> write the below to allow parallel execution but not force it:
>>
>> pfor (arg <- args)
>> println(arg)
>>
>> My experience with concurrent programming suggests that there will be
>> many times a user wishes to guarantee sequential execution, and others
>> where they want to get the most parallelism as possible. This is
>> both for users and library writers. Sometimes as a user, I don't
>> want to eat up multiple threads, even if it is faster in isolation.
>> Sometimes sequential execution has higher throughput in aggregate with
>> other work (but higher latency). Other times, parallelism is
>> essential.
>>
>> The above addition to for comprehensions makes the choice very
>> explicit. How fundamental should this be? I have no idea, but if
>> concurrency is going to be at the heart of the language then a
>> language construct might make sense. Underneath the covers, the two
>> generators could do many things slightly differently -- map and
>> flatMap as well as foreach.
>>
>>
>> On Apr 4, 1:12 am, Bill Venners wrote:
>>> Hi Aleksandar,
>>>
>>> On Mon, Apr 4, 2011 at 12:51 AM, Aleksandar Prokopec
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> wrote:
>>>> On 02/04/11 23:02, Jonas Bonér wrote:
>>>>> I don't think parallel collections should "lie" and provide a leaky
>> abstraction over concurrency. When someone asks for parallel execution I
>> think it is best that they get what they are asking for. Sure it might break
>> sequential code, but threads does that; they turn determinism into
>> non-determinism. Let's embrace parallelism at its core and reap the benefits of it
>> rather than trying to force it into something that it is not. It is a matter
>> of education; documentation needs to be a lot better. People needs to
>> understand that concurrency and parallelism is here to stay and a lot of the
>> time it still ain't that pretty.
>>>> I agree that people should get used to the fact that parallelism is
>>>> out there. The only downside here is that if one really wants to be
>>>> sure the code always executes correctly, a user has to write:
>>>> for (x <- xs.seq) ...
>>>> all the time.
>>>> The problem is that someone might not know he's asking for parallel
>>>> execution.
>>>>> ...
>>>>> def oldMethod(seq: Seq[String]) { for (e <- seq) println(e) }
>>>>> But new code that really wants parallel execution could just do .par
>>>>> on that Seq[T], which would somehow kick it into parallel execution.
>>>>> def newMethod(pSeq: Seq[String] with Parallel) { for (e <- pSeq)
>> println(e) }
>>>>> By saying the method takes a Seq[String] with Parallel, it is
>>>>> announcing that it might do foreaches in parallel.
>>>>> I think it would be fine to let flatMap, filter, withFilter, map be
>>>>> rewritten in the same way, because most times people wouldn't have
>>>>> side effects in those. But foreach is all about the side effects.
>>>>> Bill
>>>> I agree something like a compiler rewrite might possibly be a good
>>>> idea.
>>>> However, adding an additional `pareach` method means for-
>>>> comprehensions no longer work.
>>> Actually after sleeping on it last night, I emailed a different note
>>> this morning in which I changed my tune. I suggested the rules for
>>> translating for expressions to invocations of foreach, withFilter,
>>> map, and flatMap should stay the same as they are now. For expressions
>>> are about more than just collections.
>>>
>>> Now to your suggestion that adding a separate pareach method would
>>> make for expressions not work, I'd say they still work like they
>>> always did, but just that foreach is not something that will be done
>>> in parallel in parallel collections, whereas map, flatMap, and
>>> withFilter (and a bunch of other higher order methods on collections)
>>> will be done in parallel on parallel collections. That would mean you
>>> could never get a parallel foreach out of a for expression, and I
>>> don't think that's so bad. A parallel foreach requires a thread-safe
>>> side-effecting function, which is hard to write. So making the method
>>> name long and (to some) ugly, such as parallelForeach or
>>> foreachParallel, discourages its use for frivolous endeavors, and so
>>> does not providing a way to get a parallel foreach out of a for
>>> expression.
>>>
>>> Bill
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>> A couple of alternatives I see are:
>>>> - have `foreach` being sequential, and add a method `inPar` to
>>>> parallel collections, which returns a thin wrapper around the
>>>> collection such that its `foreach` is executed in parallel (forwards
>>>> to `pareach`). The compiler could then rewrite the for-comprehension
>>>> like this if it detects that the generator isInstanceOf[Parallel] at
>>>> compile time:
>>>> for (x <- xs) { ... }
>>>> <>
>>>> --> for (x <- xs.inPar) { ... }
>>>> (now it calls `foreach` of the thin wrapper which forwards to
>>>> `pareach`)
>>>> This does not solve the problem with map, flatMap, filter, and
>>>> friends, but it is a start - something like this could work for them.
>>>> - have a compiler rewrite rule which adds a `.seq` call to the
>>>> generator if it knows it is a collection, but not an instance of
>>>> `Parallel` at compile time:
>>>> for (x <- Iterable(1, 2, 3)) { ... } becomes for (x <- Iterable(1, 2,
>>>> 3).seq)
>>>> but:
>>>> for (x <- Iterable(1, 2, 3).par) { ... } stays what it is
>>>> - a non-rewriting approach - have a parallel block which envelops the
>>>> for-comprehension, ensuring that everything inside executes in
>>>> parallel:
>>>> parallel {
>>>> for ( ... ) {
>>>> // ...
>>>> }
>>>> }
>>>> This requires the user to be more explicit. The downside is that
>>>> nested methods called in the for-body might be affected.
>>> --
>>> Bill Venners
>>> Artima, Inc.http://www.artima.com

Re: Re: REPL: null-values in parallel ranges?

This is just an idea, but is there value in a parallel-comprehension?

par(x <- xs) x.foo

On Tue, Apr 5, 2011 at 12:14 PM, Tony Morris <tonymorris [at] gmail [dot] com> wrote:
...to some extent.

The following can easily be implemented in the language as a library,
but are not:
* if/else
* try/catch
* finally
* while
* for, to some degree (it would be much more useful to alter the last
call to flatMap in a comprehension and not map).


On 05/04/11 16:53, Dennis Haupt wrote:
> in scala, using the language itself it preferred to introducing new keywords
>
>
> -------- Original-Nachricht --------
>> Datum: Mon, 4 Apr 2011 21:05:15 -0700 (PDT)
>> Von: ScottC <scott [dot] carey [at] gmail [dot] com>
>> An: scala-user <scala-user [at] googlegroups [dot] com>
>> Betreff: [scala-user] Re: REPL: null-values in parallel ranges?
>> New to Scala.  I expect the below suggestion won't be well liked.  But
>> maybe it provokes some better ideas:
>>
>> If splitting foreach in two, a sequential foreach and a parallel
>> parforeach, for expressions may be limited in their parallel potential
>> unless the compiler/generator is too 'magical' for its own good.
>>
>> If parallelism is meant to be a new fundamental aspect in the
>> language, I think it needs to be exposed in more than a method name.
>>
>> you can write:
>>
>>   for (arg <- args)
>>     println(arg)
>>
>> and your body will have guaranteed sequential execution, and you can
>> write the below to allow parallel execution but not force it:
>>
>>   pfor (arg <- args)
>>     println(arg)
>>
>> My experience with concurrent programming suggests that there will be
>> many times a user wishes to guarantee sequential execution, and others
>> where they want to get the most parallelism as possible.   This is
>> both for users and library writers.   Sometimes as a user, I don't
>> want to eat up multiple threads, even if it is faster in isolation.
>> Sometimes sequential execution has higher throughput in aggregate with
>> other work (but higher latency).  Other times, parallelism is
>> essential.
>>
>> The above addition to for comprehensions makes the choice very
>> explicit.  How fundamental should this be?  I have no idea, but if
>> concurrency is going to be at the heart of the language then a
>> language construct might make sense.  Underneath the covers, the two
>> generators could do many things slightly differently -- map and
>> flatMap as well as foreach.
>>
>>
>> On Apr 4, 1:12 am, Bill Venners <b [dot] [dot] [dot] [at] artima [dot] com> wrote:
>>> Hi Aleksandar,
>>>
>>> On Mon, Apr 4, 2011 at 12:51 AM, Aleksandar Prokopec
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> <aleksandar [dot] proko [dot] [dot] [dot] [at] gmail [dot] com> wrote:
>>>> On 02/04/11 23:02, Jonas Bonér wrote:
>>>>> I don't think parallel collections should "lie" and provide a leaky
>> abstraction over concurrency. When someone asks for parallel execution I
>> think it is best that they get what they are asking for. Sure it might break
>> sequential code, but threads does that; they turn determinism into
>> non-determinism. Let's embrace parallelism at its core and reap the benefits of it
>> rather than trying to force it into something that it is not. It is a matter
>> of education; documentation needs to be a lot better. People needs to
>> understand that concurrency and parallelism is here to stay and a lot of the
>> time it still ain't that pretty.
>>>> I agree that people should get used to the fact that parallelism is
>>>> out there. The only downside here is that if one really wants to be
>>>> sure the code always executes correctly, a user has to write:
>>>> for (x <- xs.seq) ...
>>>> all the time.
>>>> The problem is that someone might not know he's asking for parallel
>>>> execution.
>>>>> ...
>>>>> def oldMethod(seq: Seq[String]) { for (e <- seq) println(e) }
>>>>> But new code that really wants parallel execution could just do .par
>>>>> on that Seq[T], which would somehow kick it into parallel execution.
>>>>> def newMethod(pSeq: Seq[String] with Parallel) { for (e <- pSeq)
>> println(e) }
>>>>> By saying the method takes a Seq[String] with Parallel, it is
>>>>> announcing that it might do foreaches in parallel.
>>>>> I think it would be fine to let flatMap, filter, withFilter, map be
>>>>> rewritten in the same way, because most times people wouldn't have
>>>>> side effects in those. But foreach is all about the side effects.
>>>>> Bill
>>>> I agree something like a compiler rewrite might possibly be a good
>>>> idea.
>>>> However, adding an additional `pareach` method means for-
>>>> comprehensions no longer work.
>>> Actually after sleeping on it last night, I emailed a different note
>>> this morning in which I changed my tune. I suggested the rules for
>>> translating for expressions to invocations of foreach, withFilter,
>>> map, and flatMap should stay the same as they are now. For expressions
>>> are about more than just collections.
>>>
>>> Now to your suggestion that adding a separate pareach method would
>>> make for expressions not work, I'd say they still work like they
>>> always did, but just that foreach is not something that will be done
>>> in parallel in parallel collections, whereas map, flatMap, and
>>> withFilter (and a bunch of other higher order methods on collections)
>>> will be done in parallel on parallel collections. That would mean you
>>> could never get a parallel foreach out of a for expression, and I
>>> don't think that's so bad. A parallel foreach requires a thread-safe
>>> side-effecting function, which is hard to write. So making the method
>>> name long and (to some) ugly, such as parallelForeach or
>>> foreachParallel, discourages its use for frivolous endeavors, and so
>>> does not providing a way to get a parallel foreach out of a for
>>> expression.
>>>
>>> Bill
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>> A couple of alternatives I see are:
>>>> - have `foreach` being sequential, and add a method `inPar` to
>>>> parallel collections, which returns a thin wrapper around the
>>>> collection such that its `foreach` is executed in parallel (forwards
>>>> to `pareach`). The compiler could then rewrite the for-comprehension
>>>> like this if it detects that the generator isInstanceOf[Parallel] at
>>>> compile time:
>>>> for (x <- xs) { ... }
>>>> <<if xs is <: Parallel at compile time >>
>>>> --> for (x <- xs.inPar) { ... }
>>>> (now it calls `foreach` of the thin wrapper which forwards to
>>>> `pareach`)
>>>> This does not solve the problem with map, flatMap, filter, and
>>>> friends, but it is a start - something like this could work for them.
>>>> - have a compiler rewrite rule which adds a `.seq` call to the
>>>> generator if it knows it is a collection, but not an instance of
>>>> `Parallel` at compile time:
>>>> for (x <- Iterable(1, 2, 3)) { ... } becomes for (x <- Iterable(1, 2,
>>>> 3).seq)
>>>> but:
>>>> for (x <- Iterable(1, 2, 3).par) { ... } stays what it is
>>>> - a non-rewriting approach - have a parallel block which envelops the
>>>> for-comprehension, ensuring that everything inside executes in
>>>> parallel:
>>>> parallel {
>>>>  for ( ... ) {
>>>>    // ...
>>>>  }
>>>> }
>>>> This requires the user to be more explicit. The downside is that
>>>> nested methods called in the for-body might be affected.
>>> --
>>> Bill Venners
>>> Artima, Inc.http://www.artima.com


--
Tony Morris
http://tmorris.net/





--
Viktor Klang,
Director of Research and Development
Scalable Solutions

Code:   github.com/viktorklang
Follow: twitter.com/viktorklang
Read:   klangism.tumblr.com

Re: Re: REPL: null-values in parallel ranges?

An implicit with a default which communicates what kinds of
auto-parallelizing are acceptable seems like it could have something for
everyone, even if it's kind of scary to contemplate. Scariness aside,
the idea of guiding for-comprehension desugaring with my choice of
imports[1][2] is pretty awesome.

[1] Though not doable as described anytime soon, because right now the
compiler translates for comprehensions during parsing and has no idea
what implicits are applicable.

[2] Also not doable anytime soon because martin is having a mild heart
attack reading this idea.

Re: Re: REPL: null-values in parallel ranges?

Hi, Bill,

> Now to your suggestion that adding a separate pareach method would
> make for expressions not work, I'd say they still work like they
> always did, but just that foreach is not something that will be done
> in parallel in parallel collections, whereas map, flatMap, and
> withFilter (and a bunch of other higher order methods on collections)
> will be done in parallel on parallel collections.

Of course, that's what I've meant - one can still call `pareach`, but
cannot use it in a nice way as before.

> That would mean you
> could never get a parallel foreach out of a for expression, and I
> don't think that's so bad. A parallel foreach requires a thread-safe
> side-effecting function, which is hard to write. So making the method
> name long and (to some) ugly, such as parallelForeach or
> foreachParallel, discourages its use for frivolous endeavors, and so
> does not providing a way to get a parallel foreach out of a for
> expression.
>
> Bill
>

I see merit in this - and I agree. Foreach is about side-effecting, and
most of the time it's sideeffecting in a non-thread-safe way. Just for
the record, the method you mention was already part of the framework,
but was removed some revisions back (used to be called `pforeach`, but I
like `pareach` much more).

But, even changing this, does not mean that the legacy code will be
safe. The `pareach` alternative is still a rule of the thumb - `map`,
`flatMap` and others still only usually do not contain sideeffecting
expressions. In cases where they do, things may still fall apart.

Alex

Re: Re: REPL: null-values in parallel ranges?

I see two possible ways out of this:

1) Fiddle with foreach, so that it and for-loops are sequential for parallel collections. You still need a parallel foreach, for instance in situations like this:

(1 until N) pforeach { i => a(i) = a(i) * scale }

2) Change the inheritance hierarchy. The problem is that our current design weakens the contract for Traversable, Iterable, Seq and friends. Their operations might now be executed in parallel. It's most glaring in foreach, but there are other operations where this can make a difference. What's more, there's no way to specify sequential execution in the type system except going down to actual implementation classes. This is a bit having it backwards. First, sequential Seq is conceptionally a subclass of potentially parallel ParSeq. But right now ParSeq <: Seq. Second, it's far more useful to state that something is sequential than that something is potentially parallel. But right now, we have explicit parallel collections, but no explicit sequential collections.

I see two possible redesigns:

a) Reverse the inheritance order, so that

   Traversable <: ParTraversable
   Iterable <: ParIterable
   Seq <: ParSeq
   ...

This means that all existing code will be safe from surprises, and no existing code can make use of parallel execution. Note that this is conceptually sound, as Par... means "might be run in parallel", not "will be run in parallel". A problem with this approach is that, without further refactorings, sequential collections inherit some unnecessary plumbing operations from parallel collections.

b) Have both implementations side by side. E.g. have a base class collection.Seq with subclasses

  collection.immutable.Seq
  collection.mutable.Seq
  collection.parimmutable.Seq
  collection.parmutable.Seq

There is no inheritance relationship between the 4 different kinds of Seq. In this design, code that requires a collection.Seq can be passed a parallel Seq, but code the requires a mutable.Seq or immutable.Seq cannot.

Something to mull over...

Cheers

 -- Martin


Re: Re: REPL: null-values in parallel ranges?

Hi All,

I'd like to take a step back and state two meta-goals I'd hope for
about the introduction of parallel collections to Scala. One is that
great pain should be taken to avoid breaking existing code, but that
it is OK if some existing code breaks (so long as it isn't much) if
that's unavoidable to do things the "right way." The other is parallel
collections should be done the "right way." I think it is early enough
in Scala's adoption curve to do things the right way, and the feeling
I get from the Scala community is that's what people by and large
would want at this point (so long as it doesn't break much code).

I think the right way is that types Traversable, Iterable, Seq, Map,
and Set should allow sequential or parallel implementations. These are
the types I want to put in my method signatures, and I think the
default for Scala should be that parallel execution of the
higher-order methods is possible. This is what the future needs
because of multi-core.

Sequential should be the special case, not parallel, so the marker
trait should probably be Sequential, not Parallel. Thus if I need a
sequential map to be passed to my method, I'd put Map with Sequential.
But that would and should be rare. Usually I'd just say I want a Map,
and write it to make allow parallel or sequential collections passed.
And I think Scala makes that easy, so long as foreach is not run in
parallel, not matter what type of collection comes in.

Foreach's contract in Traversable should be tightened to say it will
run sequentially and a pforeach added. Since there would be no
Parallel marker trait to declare a pforeach in, you could instead
declare pforeach in Traversable with the contract "might be executed
in parallel." That way no matter what type you get, you can always
call pforeach when you really want that. It would only execute in
parallel if it is a parallel collection. (I.e., it's OK if a
thread-safe, side-effecting function gets executed sequentially from
time to time.)

Use of pforeach would be rare, because most times that's not what
people want. It is OK that you can't get a pforeach invocation out of
a for expression. So the rules for translating for expressions would
not change.

I'm not sure about this one, but it seems consistent to say that by
default we should get a parallel collection out of the concrete types,
and change it to seq only by a call to .seq, rather than the other way
around. So when I say List(1, 2, 3), I'd get a parallel list. To get a
sequential one, I'd say List(1, 2 ,3).seq. Perhaps that's too radical,
but it seems consistent with the goal of making it easier to take
advantage of multi-core.

I think this approach would not break much code because my sense is
that by and large, people don't pass side-effecting functions to
higher-order methods other than foreach. To foreach by and large
people do pass side-effecting functions. It would probably be good for
people to look around at their own code and try and ascertain the
extent to which this is true. But I think it would also be already
considered very unidiomatic to pass a function to map that also has a
side effect. Maybe this kind of code has snuck in inside for
expressions that have a yield yet increment a var counter. It is
possible people have written things like that, but I think we'd all
consider that unidiomatic, and I'd guess it nevertheless would be
rare.

Now, my whole theory is based on the premise that foreach is the
problem. Martin, you mentioned that "It's most glaring in foreach,
there are other operations where [executing in parallel] can make a
difference." What other operations are you thinking of?

Bill

On Mon, Apr 4, 2011 at 1:50 AM, martin odersky wrote:
> I see two possible ways out of this:
>
> 1) Fiddle with foreach, so that it and for-loops are sequential for parallel
> collections. You still need a parallel foreach, for instance in situations
> like this:
>
> (1 until N) pforeach { i => a(i) = a(i) * scale }
>
> 2) Change the inheritance hierarchy. The problem is that our current design
> weakens the contract for Traversable, Iterable, Seq and friends. Their
> operations might now be executed in parallel. It's most glaring in foreach,
> but there are other operations where this can make a difference. What's
> more, there's no way to specify sequential execution in the type system
> except going down to actual implementation classes. This is a bit having it
> backwards. First, sequential Seq is conceptionally a subclass of potentially
> parallel ParSeq. But right now ParSeq <: Seq. Second, it's far more useful
> to state that something is sequential than that something is potentially
> parallel. But right now, we have explicit parallel collections, but no
> explicit sequential collections.
>
> I see two possible redesigns:
>
> a) Reverse the inheritance order, so that
>
>    Traversable <: ParTraversable
>    Iterable <: ParIterable
>    Seq <: ParSeq
>    ...
>
> This means that all existing code will be safe from surprises, and no
> existing code can make use of parallel execution. Note that this is
> conceptually sound, as Par... means "might be run in parallel", not "will be
> run in parallel". A problem with this approach is that, without further
> refactorings, sequential collections inherit some unnecessary plumbing
> operations from parallel collections.
>
> b) Have both implementations side by side. E.g. have a base class
> collection.Seq with subclasses
>
>   collection.immutable.Seq
>   collection.mutable.Seq
>   collection.parimmutable.Seq
>   collection.parmutable.Seq
>
> There is no inheritance relationship between the 4 different kinds of Seq.
> In this design, code that requires a collection.Seq can be passed a parallel
> Seq, but code the requires a mutable.Seq or immutable.Seq cannot.
>
> Something to mull over...
>
> Cheers
>
>  -- Martin
>
>
>

Re: Re: REPL: null-values in parallel ranges?

> I think the right way is that types Traversable, Iterable, Seq, Map,
> and Set should allow sequential or parallel implementations. These are
> the types I want to put in my method signatures, and I think the
> default for Scala should be that parallel execution of the
> higher-order methods is possible. This is what the future needs
> because of multi-core.
>
> Sequential should be the special case, not parallel, so the marker
> trait should probably be Sequential, not Parallel. Thus if I need a
> sequential map to be passed to my method, I'd put Map with Sequential.
> But that would and should be rare. Usually I'd just say I want a Map,
> and write it to make allow parallel or sequential collections passed.
> And I think Scala makes that easy, so long as foreach is not run in
> parallel, not matter what type of collection comes in.
>

I agree on this, but I have the feeling that concepts such as a Map, Seq
or Set are already used too often.
Perhaps a more general concept could envelop both sequential and
parallel collections, e.g. be a supertrait of both Seq-s and ParSeq-s.
If this is the way to go, I lack the vocabulary to think of the name.

> I think this approach would not break much code because my sense is
> that by and large, people don't pass side-effecting functions to
> higher-order methods other than foreach.

From what I've seen, this is true - people don't often use side-effects
for `map`, `flatMap` and others.

However, the issue is not just whether or not it breaks a lot of code,
but in what ways it breaks it. I've had a test case once that it took me
almost a week to debug, before realizing a ListBuffer was being filled
in an innocent looking for loop using `++=`. That was when I added
`pforeach`.
The code may only be broken once in this way, and it will cause a lot of
grief. A singular hidden use of `map` in some obscure piece of code may.
And these kinds of bugs are rarely reproducable.

I believe that if it were about breaking all the existing code out there
provided that the compiler generates an error message for the breakage,
the price would still be much smaller to pay than to introduce a bug
that occurs almost never but is almost impossible to track down.

Also, I've done quite some refactorings of the standard library to
ensure `.seq` gets called in the right places before executing the `for`
loop. And I missed the REPL, and I've never refactored a couple of
occurences of this in the xml package.

> Now, my whole theory is based on the premise that foreach is the
> problem. Martin, you mentioned that "It's most glaring in foreach,
> there are other operations where [executing in parallel] can make a
> difference." What other operations are you thinking of?

Any other operation taking a predicate, mapping function, grouping
function, etc. (find, count, groupBy, flatMap,...) could side-effect and
potentially cause the same kind of nondeterministic bug which can be
detected only at some other point in code.

Re: Re: REPL: null-values in parallel ranges?

On 4/4/11 9:32 AM, Aleksandar Prokopec wrote:
> Also, I've done quite some refactorings of the standard library to
> ensure `.seq` gets called in the right places before executing the `for`
> loop.

I know it. When I went looking for more likely trouble I kept finding
your .seqs bugblocking me.

> And I missed the REPL

I'd say it's a good thing you did, or it'd be that much longer before
this discussion was being had.

Re: Re: REPL: null-values in parallel ranges?

Agreed, and removing the `pforeach` was a good thing for the same reason.

>> And I missed the REPL
>
> I'd say it's a good thing you did, or it'd be that much longer before
> this discussion was being had.
>

Re: Re: REPL: null-values in parallel ranges?

On Mon, Apr 4, 2011 at 11:33 AM, Bill Venners <bill [at] artima [dot] com> wrote:
Now, my whole theory is based on the premise that foreach is the
problem. Martin, you mentioned that "It's most glaring in foreach,
there are other operations where [executing in parallel] can make a
difference." What other operations are you thinking of?

I have a non-negligible amount of code that looks like this:

  class HighPerformance {
    var index: Int = 0
    . . .
  }

  val i = new Incrementer  // You can imagine what this does
  someseq.map(s => { val hp = hpGenerator(s); hp.index = i++; hp })

instead of:

  someseq.zipWithIndex.map{ case (s,i) => val hp = hpGenerator(s); hp.index = i; hp })

though I admit that I could instead:

  val b = new SomethingBuilder
  val i = new Incrementer
  someseq.foreach(s => { val hp = hpGenerator(s); hp.index = i++; b += hp })
  b.result()

I also use this sort of thing with folds in place of

  (0 /: someseq.zipWithIndex)((l,r) => l + r._1.whatever + f(r._2))

which is, sadly, sometimes not very fast.  Instead:

  val i = new Incrementer
  (0 /: someseq)((l,r) => l + r.whatever + f(i++))

which is nearly as fast as and a lot more maintainable than

  var acc = 0
  var i = 0
  val sit = someseq.iterator
  while (sit.hasNext) {
    acc += sit.next.whatever + f(i)
    i += 1
  }

I've previously given an example of filterIndex; you can easily imagine (and I have used) this sort of indexing to achieve a combination of span and take(n); and so on.

You can also use this to gain history-dependence in your operations.  So, for example, if you want to find an item depending on the previous item, you could

  myseq.sliding(2).find(x => p(x(0), x(1)).map(_(1))

(note that zipped doesn't have a find, or you could use it on (myseq.tail, myseq)).  Instead,

  val prev = MOption[TheType](null)  // You can imagine this, too
  myseq.find(x => { val ans = prev.isDefined && p(prev.get,x); prev = x; ans })

which works nearly as well as the corresponding while loop.  (Actually, I have a TestWithPrevious class that I would use instead, but you get the idea.)

And so on.

Now, if you say, "Well, but that's only if you really care about performance!" I would say, "If you don't care about performance _why use parallel collections_!"

Anyway, there are lots of opportunities to use mutable state in collections to improve performance, sometimes by factors of two and sometimes by factors of ten (especially once you start taking garbage collection into account, which you often do, since you can in parallel generate garbage so much faster).  With a few utility classes, you can make this use of mutable state very safe, if not _quite_ as safe as immutable-only, unless the operation is parallelized.  There should be some very inexpensive (or free (i.e. compile-time)) way to safely support these sorts of things.

  --Rex

Re: Re: REPL: null-values in parallel ranges?



On Mon, Apr 4, 2011 at 5:33 PM, Bill Venners <bill [at] artima [dot] com> wrote:
Hi All,

I'd like to take a step back and state two meta-goals I'd hope for
about the introduction of parallel collections to Scala. One is that
great pain should be taken to avoid breaking existing code, but that
it is OK if some existing code breaks (so long as it isn't much) if
that's unavoidable to do things the "right way." The other is parallel
collections should be done the "right way." I think it is early enough
in Scala's adoption curve to do things the right way, and the feeling
I get from the Scala community is that's what people by and large
would want at this point (so long as it doesn't break much code).

I think the right way is that types Traversable, Iterable, Seq, Map,
and Set should allow sequential or parallel implementations. These are
the types I want to put in my method signatures, and I think the
default for Scala should be that parallel execution of the
higher-order methods is possible. This is what the future needs
because of multi-core.


If we were starting out, I'd agree with you. But the fact is that far too much code exists already that treats Seq, Iterable and friends as sequential. I believe we can't change that contract anymore. So we need to live with the naming convention that, e.g., Iterable is always sequential whereas ParIterable can be processed in parallel. I wish I could change that, but believe it's too late for that.

Sequential should be the special case, not parallel, so the marker
trait should probably be Sequential, not Parallel. Thus if I need a
sequential map to be passed to my method, I'd put Map with Sequential. 
But that would and should be rare. Usually I'd just say I want a Map,
and write it to make allow parallel or sequential collections passed.
And I think Scala makes that easy, so long as foreach is not run in
parallel, not matter what type of collection comes in.

Foreach's contract in Traversable should be tightened to say it will
run sequentially and a pforeach added.

But that's what I think is the wrong way. The collections design explicitly says that to get parallel operations, you write cs.par, to get sequential ones you write pxs.seq. It does not introduce any new variants of existing operations that are parallel. Making foreach sequential and introducing pforeach is like admitting that this design works only 95% of the time. And it does not exclude errors because people do use side effects in for expressions as well as in for loops.
For instance:

  var sum = 0
  for (x <- xs) yield { sum += x; sum }

(Yes, I know about scanLeft, but many people will use the for expression instead).

 That is, if we introduce pforeach I guarantee that the next person will immediately demand that we also make map, flatMap, withFilter sequential and have pmap, pflatMap, pWithFilter. So muddling the waters would lead us down a slippery slide.
 
Cheers

 -- Martin

Re: Re: REPL: null-values in parallel ranges?

Hi Martin,

On Mon, Apr 4, 2011 at 9:19 AM, martin odersky wrote:
>
>
> On Mon, Apr 4, 2011 at 5:33 PM, Bill Venners wrote:
>>
>> Hi All,
>>
>> I'd like to take a step back and state two meta-goals I'd hope for
>> about the introduction of parallel collections to Scala. One is that
>> great pain should be taken to avoid breaking existing code, but that
>> it is OK if some existing code breaks (so long as it isn't much) if
>> that's unavoidable to do things the "right way." The other is parallel
>> collections should be done the "right way." I think it is early enough
>> in Scala's adoption curve to do things the right way, and the feeling
>> I get from the Scala community is that's what people by and large
>> would want at this point (so long as it doesn't break much code).
>>
>> I think the right way is that types Traversable, Iterable, Seq, Map,
>> and Set should allow sequential or parallel implementations. These are
>> the types I want to put in my method signatures, and I think the
>> default for Scala should be that parallel execution of the
>> higher-order methods is possible. This is what the future needs
>> because of multi-core.
>>
>
> If we were starting out, I'd agree with you. But the fact is that far too
> much code exists already that treats Seq, Iterable and friends as
> sequential. I believe we can't change that contract anymore. So we need to
> live with the naming convention that, e.g., Iterable is always sequential
> whereas ParIterable can be processed in parallel. I wish I could change
> that, but believe it's too late for that.
>
I wonder if it would help if instead of naming the types like
"scala.collection.ParMap" they were named like
"scala.collection.par.Map." The problem I have with these names as
Aleksandar described them is that they seem non-intuitive, and that's
un-scala-like. scala.collection.Map sounds like the general concept,
whereas scala.collection.ParMap sounds like a specialization, and
moreover the specialization it sounds like is "parallel Map". But Map
means sequential map and ParMap means possibly parallel Map, but could
be sequential. From the Liskov perspective, it does make sense that
Map extends ParMap, but from the English perspective it does not. All
RipeBananas are Bananas, but not all Bananas are RipeBananas. With Map
and ParMap its the opposite. And the type of a definitely parallel Map
I'm guessing would be ParMap with Parallel, which seems redundant and
misleading.

If it is par.Map instead of ParMap, then it sounds more natural from
an English perspective: like this kind of Map is a subtype of that
kind of Map. Also I can import scala.collection._ and use Map and
par.Map in my code, kind of like we can use mutable.Map and
immutable.Map. If I import scala.collection.par._, and I want to say a
definitely parallel Map, I simply write Map with Parallel, which looks
nicer. Actually I'm not sure this is important to be able to write
definitely parallel map, but seems like if I want to use these for
performance optimization, I'd want to be sure I'm actually using a
parallel one. If this were really useful, would it make sense to also
declare scala.collection.par.ParallelMap to mean definitely parallel
map?

The justification given for scala.collection.Map being the default
type can be that sequential collections are much simpler and safer to
use, so they should be the default. (Another big reason is backwards
compatibility, but even without that, I've come around to the view
that sequential should be the starting point for the reasons others on
this list have argued.)

Another concern that has been raised is the problem of type inference.
If I call into a method that was returning a Map and place the result
in a variable with no explicit type declaration, then later that
methods gets changed to return a ParMap (or par.Map), my code will
likely still compile, because the APIs are the same. This is
essentially an unintentional upcasting that to a possibly parallel
Map, which would compile but possibly create those scary bugs we're
worried about. I wonder if the compiler could optionally flag a
warning if a variable is being inferred to a possibly parallel type.
The way to get rid of the warning, other than telling he compiler not
to give it with a compiler option, would be to declare the type
explicitly in the code. That would make it more obvious to the writer
and later readers of the code that it's possibly parallel.

>> Sequential should be the special case, not parallel, so the marker
>> trait should probably be Sequential, not Parallel. Thus if I need a
>> sequential map to be passed to my method, I'd put Map with Sequential.
>>
>> But that would and should be rare. Usually I'd just say I want a Map,
>> and write it to make allow parallel or sequential collections passed.
>> And I think Scala makes that easy, so long as foreach is not run in
>> parallel, not matter what type of collection comes in.
>>
>> Foreach's contract in Traversable should be tightened to say it will
>> run sequentially and a pforeach added.
>
> But that's what I think is the wrong way. The collections design explicitly
> says that to get parallel operations, you write cs.par, to get sequential
> ones you write pxs.seq. It does not introduce any new variants of existing
> operations that are parallel. Making foreach sequential and introducing
> pforeach is like admitting that this design works only 95% of the time. And
> it does not exclude errors because people do use side effects in for
> expressions as well as in for loops.
> For instance:
>
>   var sum = 0
>   for (x <- xs) yield { sum += x; sum }
>
> (Yes, I know about scanLeft, but many people will use the for expression
> instead).
>
>  That is, if we introduce pforeach I guarantee that the next person will
> immediately demand that we also make map, flatMap, withFilter sequential and
> have pmap, pflatMap, pWithFilter. So muddling the waters would lead us down
> a slippery slide.
>
Water slides can be fun, but as BDFL you can also just say no. And I
think you'd hear other voices from the community that would back you
up in your no to pmap. I definitely would not want to see a pmap, even
though I would like to see a pforeach accompanying a sequential
foreach.

Bill

> Cheers
>
>  -- Martin
>
>

Re: REPL: null-values in parallel ranges?

I agree that the name ParMap may be somewhat misleading as a supertype
of Map. However, these are orthogonal to the class hierarchy changes.
Once we decide what is the most appropriate name, we can easily rename
the methods to something else.

As for the compiler warning that a return type is now a parallel
collection, I believe a library designer could use migration warnings
in these cases, or a similar mechanism, along with explicit type
annotations on the client's part.

On Apr 5, 7:39 am, Bill Venners wrote:
> Hi Martin,
>
>
>
> On Mon, Apr 4, 2011 at 9:19 AM, martin odersky wrote:
>
> > On Mon, Apr 4, 2011 at 5:33 PM, Bill Venners wrote:
>
> >> Hi All,
>
> >> I'd like to take a step back and state two meta-goals I'd hope for
> >> about the introduction of parallel collections to Scala. One is that
> >> great pain should be taken to avoid breaking existing code, but that
> >> it is OK if some existing code breaks (so long as it isn't much) if
> >> that's unavoidable to do things the "right way." The other is parallel
> >> collections should be done the "right way." I think it is early enough
> >> in Scala's adoption curve to do things the right way, and the feeling
> >> I get from the Scala community is that's what people by and large
> >> would want at this point (so long as it doesn't break much code).
>
> >> I think the right way is that types Traversable, Iterable, Seq, Map,
> >> and Set should allow sequential or parallel implementations. These are
> >> the types I want to put in my method signatures, and I think the
> >> default for Scala should be that parallel execution of the
> >> higher-order methods is possible. This is what the future needs
> >> because of multi-core.
>
> > If we were starting out, I'd agree with you. But the fact is that far too
> > much code exists already that treats Seq, Iterable and friends as
> > sequential. I believe we can't change that contract anymore. So we need to
> > live with the naming convention that, e.g., Iterable is always sequential
> > whereas ParIterable can be processed in parallel. I wish I could change
> > that, but believe it's too late for that.
>
> I wonder if it would help if instead of naming the types like
> "scala.collection.ParMap" they were named like
> "scala.collection.par.Map." The problem I have with these names as
> Aleksandar described them is that they seem non-intuitive, and that's
> un-scala-like. scala.collection.Map sounds like the general concept,
> whereas scala.collection.ParMap sounds like a specialization, and
> moreover the specialization it sounds like is "parallel Map". But Map
> means sequential map and ParMap means possibly parallel Map, but could
> be sequential. From the Liskov perspective, it does make sense that
> Map extends ParMap, but from the English perspective it does not. All
> RipeBananas are Bananas, but not all Bananas are RipeBananas. With Map
> and ParMap its the opposite. And the type of a definitely parallel Map
> I'm guessing would be ParMap with Parallel, which seems redundant and
> misleading.
>
> If it is par.Map instead of ParMap, then it sounds more natural from
> an English perspective: like this kind of Map is a subtype of that
> kind of Map. Also I can import scala.collection._ and use Map and
> par.Map in my code, kind of like we can use mutable.Map and
> immutable.Map. If I import scala.collection.par._, and I want to say a
> definitely parallel Map, I simply write Map with Parallel, which looks
> nicer. Actually I'm not sure this is important to be able to write
> definitely parallel map, but seems like if I want to use these for
> performance optimization, I'd want to be sure I'm actually using a
> parallel one. If this were really useful, would it make sense to also
> declare scala.collection.par.ParallelMap to mean definitely parallel
> map?
>
> The justification given for scala.collection.Map being the default
> type can be that sequential collections are much simpler and safer to
> use, so they should be the default. (Another big reason is backwards
> compatibility, but even without that, I've come around to the view
> that sequential should be the starting point for the reasons others on
> this list have argued.)
>
> Another concern that has been raised is the problem of type inference.
> If I call into a method that was returning a Map and place the result
> in a variable with no explicit type declaration, then later that
> methods gets changed to return a ParMap (or par.Map), my code will
> likely still compile, because the APIs are the same. This is
> essentially an unintentional upcasting that to a possibly parallel
> Map, which would compile but possibly create those scary bugs we're
> worried about. I wonder if the compiler could optionally flag a
> warning if a variable is being inferred to a possibly parallel type.
> The way to get rid of the warning, other than telling he compiler not
> to give it with a compiler option, would be to declare the type
> explicitly in the code. That would make it more obvious to the writer
> and later readers of the code that it's possibly parallel.
>
>
>
> >> Sequential should be the special case, not parallel, so the marker
> >> trait should probably be Sequential, not Parallel. Thus if I need a
> >> sequential map to be passed to my method, I'd put Map with Sequential.
>
> >> But that would and should be rare. Usually I'd just say I want a Map,
> >> and write it to make allow parallel or sequential collections passed.
> >> And I think Scala makes that easy, so long as foreach is not run in
> >> parallel, not matter what type of collection comes in.
>
> >> Foreach's contract in Traversable should be tightened to say it will
> >> run sequentially and a pforeach added.
>
> > But that's what I think is the wrong way. The collections design explicitly
> > says that to get parallel operations, you write cs.par, to get sequential
> > ones you write pxs.seq. It does not introduce any new variants of existing
> > operations that are parallel. Making foreach sequential and introducing
> > pforeach is like admitting that this design works only 95% of the time. And
> > it does not exclude errors because people do use side effects in for
> > expressions as well as in for loops.
> > For instance:
>
> >   var sum = 0
> >   for (x <- xs) yield { sum += x; sum }
>
> > (Yes, I know about scanLeft, but many people will use the for expression
> > instead).
>
> >  That is, if we introduce pforeach I guarantee that the next person will
> > immediately demand that we also make map, flatMap, withFilter sequential and
> > have pmap, pflatMap, pWithFilter. So muddling the waters would lead us down
> > a slippery slide.
>
> Water slides can be fun, but as BDFL you can also just say no. And I
> think you'd hear other voices from the community that would back you
> up in your no to pmap. I definitely would not want to see a pmap, even
> though I would like to see a pforeach accompanying a sequential
> foreach.
>
> Bill
>
> > Cheers
>
> >  -- Martin
>
> --
> Bill Venners
> Artima, Inc.http://www.artima.com

Re: Re: REPL: null-values in parallel ranges?



On Tue, Apr 5, 2011 at 9:18 AM, Aleksandar Prokopec <aleksandar [dot] prokopec [at] gmail [dot] com> wrote:
I agree that the name ParMap may be somewhat misleading as a supertype
of Map. However, these are orthogonal to the class hierarchy changes.
Once we decide what is the most appropriate name, we can easily rename
the methods to something else.

I think we should find a different name than ParSeq, ParMap for the supertraits of Seq and Map. E.g., ParVector means Vector with a parallel implementation, whereas for Seq, Map we want to say it might be parallel but need not be, which is different. How about PMap, PSeq, ... instead. Where the P could mean "Potentially Parallel". Not sure about that either but it seems an improvement over ParMap. Do people have another suggestions?

Cheers

 -- Martin

Copyright © 2013 École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland