
REPL: null-values in parallel ranges?

170 replies
odersky
Joined: 2008-07-29,
Re: Re: REPL: null-values in parallel ranges?


On Mon, Apr 4, 2011 at 4:37 PM, Ismael Juma <ismael [at] juma [dot] me [dot] uk> wrote:
On Mon, Apr 4, 2011 at 1:46 PM, Aleksandar Prokopec
<aleksandar [dot] prokopec [at] gmail [dot] com> wrote:
> (only Seqs shown above, but Sets and Maps follow the same pattern).

I have a question. If I do seq.par.map(identity), will the resulting
Seq be == to the source Seq (i.e. the order of the Seq will be
maintained even if multiple operations are executed in parallel)?

Yes. Orderedness of sequence elements is orthogonal to parallel evaluation.

-- Martin
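[A minimal REPL-style sketch of this guarantee. It assumes `.par` is available: built into the standard library before Scala 2.13, via the scala-parallel-collections module and its `CollectionConverters` import afterwards.]

```scala
import scala.collection.parallel.CollectionConverters._ // 2.13+ module import

object OrderDemo extends App {
  val xs = (1 to 1000).toVector
  // The mapping runs on several threads, but the result keeps the source order,
  // so converting back with .seq yields a collection == to the original.
  val ys = xs.par.map(identity).seq
  assert(ys == xs)
}
```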

Ismael Juma 2
Joined: 2011-01-22,
Re: Re: REPL: null-values in parallel ranges?

On Mon, Apr 4, 2011 at 3:43 PM, martin odersky wrote:
> Yes. Orderedness of sequence elements is orthogonal to parallel evaluation.

Great, that is what I expected.

Best,
Ismael

Aleksandar Prokopec
Joined: 2010-04-04,
Re: REPL: null-values in parallel ranges?

On 04/04/11 16:00, Matthew Pocock wrote:
> This looks good to me. It will need clearly documenting - users may
> assume that Par* is parallel, but with this hierarchy, Par* has the
> option of being parallel, and it's the implementations that decide. I'd
> personally have introduced another trait for things which definitely are
> going to be parallel with the Par prefix, and named these 'may or may
> not be parallel' things something else. However, I see good modelling
> reasons to keep things as you have them. Either way, I think your
> hierarchy is a reasonable way to provide parallelism without breaking
> every single line of existing collections code in the wild :D
>
> Matthew
>

True, in the diagram ParIterable, ParSeq and others could be renamed to
something more precise, but I'm unsure what the name would be.

Aleksandar Prokopec
Joined: 2010-04-04,
Re: Re: REPL: null-values in parallel ranges?

> I have a question. If I do seq.par.map(identity), will the resulting
> Seq be == to the source Seq (i.e. the order of the Seq will be
> maintained even if multiple operations are executed in parallel)?
>

Yes, unless there is a bug somewhere I've missed, it most definitely
should. Unit tests were written around this assumption.
In fact, here's a little background - == should pattern match on the
object type to figure out whether it's a Seq, then call `sameElements`.
Method `sameElements` does the following:
- if the argument is a sequential collection, it can't do anything too
smart, so it matches the elements sequentially
- if the argument is a parallel collection, it can split both
collections in orderly subsets and compare the subsets concurrently

Cheers,
Alex

Aleksandar Prokopec
Joined: 2010-04-04,
Re: REPL: null-values in parallel ranges?

>
>
> I'm more worried about visibility, races etc.
>
>

Can you elaborate further?

If you are using a collection known to be sequential at compile time,
then there should be no problems with data races.

If the collection is not known to have sequential type at compile time,
that is, it's possibly parallel, then the closures should not have
side-effects, or have thread-safe side-effects.
For instance, it's safe to use a concurrent map insertion in a
foreach, or some other HOF, while it isn't safe to use array buffer
appending.

Furthermore, once the parallel operation completes, threads are
synchronized. So it should be safe to use non-synchronized side-effects
too, as long as the changes are monotonic (e.g. writing `true` to a
non-volatile boolean variable). Although I would not recommend this.
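[A small sketch of the safe and unsafe variants described above, assuming the scala-parallel-collections `CollectionConverters` import on 2.13+; the unsafe variant is left commented out on purpose.]

```scala
import java.util.concurrent.ConcurrentHashMap
import scala.collection.parallel.CollectionConverters._

object SideEffectDemo extends App {
  val xs = (1 to 10000).toVector

  // Safe: ConcurrentHashMap.put is thread-safe, so a parallel foreach
  // with this side effect sees every element exactly once.
  val seen = new ConcurrentHashMap[Int, Boolean]()
  xs.par.foreach(x => seen.put(x, true))
  assert(seen.size == xs.size)

  // Unsafe (do NOT do this): ArrayBuffer is not thread-safe, so a
  // concurrent += can lose elements or throw.
  // val buf = scala.collection.mutable.ArrayBuffer[Int]()
  // xs.par.foreach(buf += _)   // buf.size may end up < xs.size
}
```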

Bill Venners
Joined: 2008-12-18,
Re: Re: REPL: null-values in parallel ranges?

Hi All,

I'd like to take a step back and state two meta-goals I'd hope for
about the introduction of parallel collections to Scala. One is that
great pain should be taken to avoid breaking existing code, but that
it is OK if some existing code breaks (so long as it isn't much) if
that's unavoidable to do things the "right way." The other is parallel
collections should be done the "right way." I think it is early enough
in Scala's adoption curve to do things the right way, and the feeling
I get from the Scala community is that's what people by and large
would want at this point (so long as it doesn't break much code).

I think the right way is that types Traversable, Iterable, Seq, Map,
and Set should allow sequential or parallel implementations. These are
the types I want to put in my method signatures, and I think the
default for Scala should be that parallel execution of the
higher-order methods is possible. This is what the future needs
because of multi-core.

Sequential should be the special case, not parallel, so the marker
trait should probably be Sequential, not Parallel. Thus if I need a
sequential map to be passed to my method, I'd put Map with Sequential.
But that would and should be rare. Usually I'd just say I want a Map,
and write it to allow parallel or sequential collections to be passed.
And I think Scala makes that easy, so long as foreach is not run in
parallel, no matter what type of collection comes in.

Foreach's contract in Traversable should be tightened to say it will
run sequentially and a pforeach added. Since there would be no
Parallel marker trait to declare a pforeach in, you could instead
declare pforeach in Traversable with the contract "might be executed
in parallel." That way no matter what type you get, you can always
call pforeach when you really want that. It would only execute in
parallel if it is a parallel collection. (I.e., it's OK if a
thread-safe, side-effecting function gets executed sequentially from
time to time.)

Use of pforeach would be rare, because most times that's not what
people want. It is OK that you can't get a pforeach invocation out of
a for expression. So the rules for translating for expressions would
not change.

I'm not sure about this one, but it seems consistent to say that by
default we should get a parallel collection out of the concrete types,
and change it to seq only by a call to .seq, rather than the other way
around. So when I say List(1, 2, 3), I'd get a parallel list. To get a
sequential one, I'd say List(1, 2 ,3).seq. Perhaps that's too radical,
but it seems consistent with the goal of making it easier to take
advantage of multi-core.

I think this approach would not break much code because my sense is
that by and large, people don't pass side-effecting functions to
higher-order methods other than foreach. To foreach by and large
people do pass side-effecting functions. It would probably be good for
people to look around at their own code and try and ascertain the
extent to which this is true. But I think it would also be already
considered very unidiomatic to pass a function to map that also has a
side effect. Maybe this kind of code has snuck in inside for
expressions that have a yield yet increment a var counter. It is
possible people have written things like that, but I think we'd all
consider that unidiomatic, and I'd guess it nevertheless would be
rare.

Now, my whole theory is based on the premise that foreach is the
problem. Martin, you mentioned that "It's most glaring in foreach,
there are other operations where [executing in parallel] can make a
difference." What other operations are you thinking of?

Bill

On Mon, Apr 4, 2011 at 1:50 AM, martin odersky wrote:
> I see two possible ways out of this:
>
> 1) Fiddle with foreach, so that it and for-loops are sequential for parallel
> collections. You still need a parallel foreach, for instance in situations
> like this:
>
> (1 until N) pforeach { i => a(i) = a(i) * scale }
>
> 2) Change the inheritance hierarchy. The problem is that our current design
> weakens the contract for Traversable, Iterable, Seq and friends. Their
> operations might now be executed in parallel. It's most glaring in foreach,
> but there are other operations where this can make a difference. What's
> more, there's no way to specify sequential execution in the type system
> except going down to actual implementation classes. This is a bit like having it
> backwards. First, sequential Seq is conceptually a subclass of potentially
> parallel ParSeq. But right now ParSeq <: Seq. Second, it's far more useful
> to state that something is sequential than that something is potentially
> parallel. But right now, we have explicit parallel collections, but no
> explicit sequential collections.
>
> I see two possible redesigns:
>
> a) Reverse the inheritance order, so that
>
>    Traversable <: ParTraversable
>    Iterable <: ParIterable
>    Seq <: ParSeq
>    ...
>
> This means that all existing code will be safe from surprises, and no
> existing code can make use of parallel execution. Note that this is
> conceptually sound, as Par... means "might be run in parallel", not "will be
> run in parallel". A problem with this approach is that, without further
> refactorings, sequential collections inherit some unnecessary plumbing
> operations from parallel collections.
>
> b) Have both implementations side by side. E.g. have a base class
> collection.Seq with subclasses
>
>   collection.immutable.Seq
>   collection.mutable.Seq
>   collection.parimmutable.Seq
>   collection.parmutable.Seq
>
> There is no inheritance relationship between the 4 different kinds of Seq.
> In this design, code that requires a collection.Seq can be passed a parallel
> Seq, but code that requires a mutable.Seq or immutable.Seq cannot.
>
> Something to mull over...
>
> Cheers
>
>  -- Martin
>
>
>

odersky
Joined: 2008-07-29,
Re: Re: REPL: null-values in parallel ranges?


On Mon, Apr 4, 2011 at 5:33 PM, Bill Venners <bill [at] artima [dot] com> wrote:
Hi All,

I'd like to take a step back and state two meta-goals I'd hope for
about the introduction of parallel collections to Scala. One is that
great pain should be taken to avoid breaking existing code, but that
it is OK if some existing code breaks (so long as it isn't much) if
that's unavoidable to do things the "right way." The other is parallel
collections should be done the "right way." I think it is early enough
in Scala's adoption curve to do things the right way, and the feeling
I get from the Scala community is that's what people by and large
would want at this point (so long as it doesn't break much code).

I think the right way is that types Traversable, Iterable, Seq, Map,
and Set should allow sequential or parallel implementations. These are
the types I want to put in my method signatures, and I think the
default for Scala should be that parallel execution of the
higher-order methods is possible. This is what the future needs
because of multi-core.


If we were starting out, I'd agree with you. But the fact is that far too much code exists already that treats Seq, Iterable and friends as sequential. I believe we can't change that contract anymore. So we need to live with the naming convention that, e.g., Iterable is always sequential whereas ParIterable can be processed in parallel. I wish I could change that, but believe it's too late for that.

Sequential should be the special case, not parallel, so the marker
trait should probably be Sequential, not Parallel. Thus if I need a
sequential map to be passed to my method, I'd put Map with Sequential. 
But that would and should be rare. Usually I'd just say I want a Map,
and write it to allow parallel or sequential collections to be passed.
And I think Scala makes that easy, so long as foreach is not run in
parallel, no matter what type of collection comes in.

Foreach's contract in Traversable should be tightened to say it will
run sequentially and a pforeach added.

But that's what I think is the wrong way. The collections design explicitly says that to get parallel operations, you write cs.par, to get sequential ones you write pxs.seq. It does not introduce any new variants of existing operations that are parallel. Making foreach sequential and introducing pforeach is like admitting that this design works only 95% of the time. And it does not exclude errors because people do use side effects in for expressions as well as in for loops.
For instance:

  var sum = 0
  for (x <- xs) yield { sum += x; sum }

(Yes, I know about scanLeft, but many people will use the for expression instead).

 That is, if we introduce pforeach I guarantee that the next person will immediately demand that we also make map, flatMap, withFilter sequential and have pmap, pflatMap, pWithFilter. So muddling the waters would lead us down a slippery slope.
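[The side-effect-free alternative to Martin's example above can be sketched with scanLeft, which produces the same running sums without any shared mutable state and is therefore safe regardless of collection kind:]

```scala
object RunningSumDemo extends App {
  val xs = List(1, 2, 3, 4)

  // Side-effecting version (breaks under parallel evaluation):
  //   var sum = 0
  //   for (x <- xs) yield { sum += x; sum }

  // Side-effect-free equivalent: scanLeft builds the running sums,
  // and .tail drops the initial accumulator value 0.
  val sums = xs.scanLeft(0)(_ + _).tail
  assert(sums == List(1, 3, 6, 10))
}
```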
 
Cheers

 -- Martin

ichoran
Joined: 2009-08-14,
Re: Re: REPL: null-values in parallel ranges?
On Mon, Apr 4, 2011 at 11:33 AM, Bill Venners <bill [at] artima [dot] com> wrote:
Now, my whole theory is based on the premise that foreach is the
problem. Martin, you mentioned that "It's most glaring in foreach,
there are other operations where [executing in parallel] can make a
difference." What other operations are you thinking of?

I have a non-negligible amount of code that looks like this:

  class HighPerformance {
    var index: Int = 0
    . . .
  }

  val i = new Incrementer  // You can imagine what this does
  someseq.map(s => { val hp = hpGenerator(s); hp.index = i++; hp })

instead of:

  someseq.zipWithIndex.map{ case (s,i) => val hp = hpGenerator(s); hp.index = i; hp }

though I admit that I could instead:

  val b = new SomethingBuilder
  val i = new Incrementer
  someseq.foreach(s => { val hp = hpGenerator(s); hp.index = i++; b += hp })
  b.result()

I also use this sort of thing with folds in place of

  (0 /: someseq.zipWithIndex)((l,r) => l + r._1.whatever + f(r._2))

which is, sadly, sometimes not very fast.  Instead:

  val i = new Incrementer
  (0 /: someseq)((l,r) => l + r.whatever + f(i++))

which is nearly as fast as and a lot more maintainable than

  var acc = 0
  var i = 0
  val sit = someseq.iterator
  while (sit.hasNext) {
    acc += sit.next.whatever + f(i)
    i += 1
  }

I've previously given an example of filterIndex; you can easily imagine (and I have used) this sort of indexing to achieve a combination of span and take(n); and so on.

You can also use this to gain history-dependence in your operations.  So, for example, if you want to find an item depending on the previous item, you could

  myseq.sliding(2).find(x => p(x(0), x(1))).map(_(1))

(note that zipped doesn't have a find, or you could use it on (myseq.tail, myseq)).  Instead,

  val prev = MOption[TheType](null)  // You can imagine this, too
  myseq.find(x => { val ans = prev.isDefined && p(prev.get,x); prev = x; ans })

which works nearly as well as the corresponding while loop.  (Actually, I have a TestWithPrevious class that I would use instead, but you get the idea.)

And so on.

Now, if you say, "Well, but that's only if you really care about performance!" I would say, "If you don't care about performance _why use parallel collections_!"

Anyway, there are lots of opportunities to use mutable state in collections to improve performance, sometimes by factors of two and sometimes by factors of ten (especially once you start taking garbage collection into account, which you often do, since you can in parallel generate garbage so much faster).  With a few utility classes, you can make this use of mutable state very safe, if not _quite_ as safe as immutable-only, unless the operation is parallelized.  There should be some very inexpensive (or free (i.e. compile-time)) way to safely support these sorts of things.

  --Rex

Aleksandar Prokopec
Joined: 2010-04-04,
Re: Re: REPL: null-values in parallel ranges?

> I think the right way is that types Traversable, Iterable, Seq, Map,
> and Set should allow sequential or parallel implementations. These are
> the types I want to put in my method signatures, and I think the
> default for Scala should be that parallel execution of the
> higher-order methods is possible. This is what the future needs
> because of multi-core.
>
> Sequential should be the special case, not parallel, so the marker
> trait should probably be Sequential, not Parallel. Thus if I need a
> sequential map to be passed to my method, I'd put Map with Sequential.
> But that would and should be rare. Usually I'd just say I want a Map,
> and write it to allow parallel or sequential collections to be passed.
> And I think Scala makes that easy, so long as foreach is not run in
> parallel, no matter what type of collection comes in.
>

I agree on this, but I have the feeling that concepts such as a Map, Seq
or Set are already used too often.
Perhaps a more general concept could envelop both sequential and
parallel collections, e.g. be a supertrait of both Seq-s and ParSeq-s.
If this is the way to go, I lack the vocabulary to think of the name.

> I think this approach would not break much code because my sense is
> that by and large, people don't pass side-effecting functions to
> higher-order methods other than foreach.

From what I've seen, this is true - people don't often use side-effects
for `map`, `flatMap` and others.

However, the issue is not just whether or not it breaks a lot of code,
but in what ways it breaks it. I once had a test case that took me
almost a week to debug, before realizing a ListBuffer was being filled
in an innocent-looking for loop using `++=`. That was when I added
`pforeach`.
The code may only be broken once in this way, and it will still cause a
lot of grief. A single hidden use of `map` in some obscure piece of code
may be enough. And these kinds of bugs are rarely reproducible.

I believe that even breaking all the existing code out there, provided
the compiler generates an error message for the breakage, would be a
much smaller price to pay than introducing a bug that occurs almost
never but is almost impossible to track down.

Also, I've done quite a few refactorings of the standard library to
ensure `.seq` gets called in the right places before executing the `for`
loop. And I missed the REPL, and I never refactored a couple of
occurrences of this in the xml package.

> Now, my whole theory is based on the premise that foreach is the
> problem. Martin, you mentioned that "It's most glaring in foreach,
> there are other operations where [executing in parallel] can make a
> difference." What other operations are you thinking of?

Any other operation taking a predicate, mapping function, grouping
function, etc. (find, count, groupBy, flatMap,...) could side-effect and
potentially cause the same kind of nondeterministic bug which can be
detected only at some other point in code.
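[A sketch of the hazard described above: a side-effecting predicate behaves deterministically on a sequential collection, but the same code parallelized would visit elements in nondeterministic order and race on the buffer. Assumes the scala-parallel-collections `CollectionConverters` import on 2.13+.]

```scala
import scala.collection.mutable.ArrayBuffer
import scala.collection.parallel.CollectionConverters._

object PredicateSideEffectDemo extends App {
  val xs = (1 to 100).toVector

  // Sequential: the predicate's side effect sees a deterministic visit order.
  val log = ArrayBuffer[Int]()
  val evens = xs.count { x => log += x; x % 2 == 0 }
  assert(evens == 50 && log.toList == xs.toList)

  // Parallel: the count itself stays deterministic...
  assert(xs.par.count(_ % 2 == 0) == 50)
  // ...but a log built inside the predicate would be filled in
  // nondeterministic order *and* race on the unsynchronized buffer.
}
```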

Bill Venners
Joined: 2008-12-18,
Re: Re: REPL: null-values in parallel ranges?

Hi Martin,

On Mon, Apr 4, 2011 at 9:19 AM, martin odersky wrote:
>
>
> On Mon, Apr 4, 2011 at 5:33 PM, Bill Venners wrote:
>>
>> Hi All,
>>
>> I'd like to take a step back and state two meta-goals I'd hope for
>> about the introduction of parallel collections to Scala. One is that
>> great pain should be taken to avoid breaking existing code, but that
>> it is OK if some existing code breaks (so long as it isn't much) if
>> that's unavoidable to do things the "right way." The other is parallel
>> collections should be done the "right way." I think it is early enough
>> in Scala's adoption curve to do things the right way, and the feeling
>> I get from the Scala community is that's what people by and large
>> would want at this point (so long as it doesn't break much code).
>>
>> I think the right way is that types Traversable, Iterable, Seq, Map,
>> and Set should allow sequential or parallel implementations. These are
>> the types I want to put in my method signatures, and I think the
>> default for Scala should be that parallel execution of the
>> higher-order methods is possible. This is what the future needs
>> because of multi-core.
>>
>
> If we were starting out, I'd agree with you. But the fact is that far too
> much code exists already that treats Seq, Iterable and friends as
> sequential. I believe we can't change that contract anymore. So we need to
> live with the naming convention that, e.g., Iterable is always sequential
> whereas ParIterable can be processed in parallel. I wish I could change
> that, but believe it's too late for that.
>
Would it be possible to get the old sequential behavior by changing or
adding imports in existing source files? When I upgraded ScalaTest to
2.8 I had to go through every almost source file and change the
imports at the top, because "import org.scalatest.junit", for example,
no longer imported the members of org.scalatest. It was a pain, but I
felt it was worth it to get things done the "right way."

Instead of adding packages like scala.parcollection,
scala.parcollection.mutable, and scala.parcollection.immutable and
placing the parallel variants there, could you put those in
scala.collection, scala.collection.mutable and
scala.collection.immutable, and put the sequential ones in
scala.seqcollection, scala.seqcollection.mutable, and
scala.seqcollection.immutable? Then all I'd need to do to get things
back the old way is go to every source file and change my import from
scala.collection.Map to scala.seqcollection.Map. If changing or adding
imports like that is all it would take to get my code to run like it
did before and get parallel collections done the "right way," then I'd
see that as worth it. I suspect the community would by and large see
that as worth it, and even those that didn't would eventually still
upgrade (though with some complaining).

>> Sequential should be the special case, not parallel, so the marker
>> trait should probably be Sequential, not Parallel. Thus if I need a
>> sequential map to be passed to my method, I'd put Map with Sequential.
>>
>> But that would and should be rare. Usually I'd just say I want a Map,
>> and write it to allow parallel or sequential collections to be passed.
>> And I think Scala makes that easy, so long as foreach is not run in
>> parallel, no matter what type of collection comes in.
>>
>> Foreach's contract in Traversable should be tightened to say it will
>> run sequentially and a pforeach added.
>
> But that's what I think is the wrong way. The collections design explicitly
> says that to get parallel operations, you write cs.par, to get sequential
> ones you write pxs.seq. It does not introduce any new variants of existing
> operations that are parallel. Making foreach sequential and introducing
> pforeach is like admitting that this design works only 95% of the time. And
> it does not exclude errors because people do use side effects in for
> expressions as well as in for loops.
> For instance:
>
>   var sum = 0
>   for (x <- xs) yield { sum += x; sum }
>
> (Yes, I know about scanLeft, but many people will use the for expression
> instead).
>
>  That is, if we introduce pforeach I guarantee that the next person will
> immediately demand that we also make map, flatMap, withFilter sequential and
> have pmap, pflatMap, pWithFilter. So muddling the waters would lead us down
> a slippery slope.
>
Some may ask for them, but I think there is no need for a pmap, pflatMap, etc. I
think foreach is inherently different because its result type is Unit,
so by definition the passed function needs to have a side effect to
have any "effect" on the program (other than heating up the CPU, which
it probably wouldn't even do after Hotspot optimizes it away). The for
expression with a yield that is given a function with a side effect is
probably out there in existing code, but I think it is unidiomatic
already and likely quite rare. That's code I think is OK to break,
especially if it can be fixed with a simple import change.

My sense is that people get the functional style of map and its kin.
It is a bit of a leap at first, but once they get it, they don't do
things like yield a side effect. Scala makes it very easy to avoid
that kind of thing.

Bill

> Cheers
>
>  -- Martin
>
>

Martin S. Weber
Joined: 2008-12-23,
Re: REPL: null-values in parallel ranges?

On 04/02/11 04:24, Bernd Johannes wrote:
> (...) I.e. if somebody is
> going to write something which accepts a Seq and is going to execute side
> effects on this Seq by means of foreach(...)

given foreach's signature(s):

(f: (A) => Unit) : Unit
[U] (f: (A) => U) : Unit

(mind the trailing ':Unit's) - *every* application of foreach is going to be
invoked for its side-effects.
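[A tiny sketch of the point: since both overloads return Unit and discard the function's result, a foreach call can only influence the program through side effects.]

```scala
object ForeachDemo extends App {
  val xs = List(1, 2, 3)

  // The result of `_ * 2` is computed and thrown away; r is just ().
  val r: Unit = xs.foreach(_ * 2)

  // To observe anything at all, the closure must side-effect:
  var total = 0
  xs.foreach(total += _)
  assert(total == 6)
}
```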

Regards

Joshua.Suereth
Joined: 2008-09-02,
Re: REPL: null-values in parallel ranges?

col.foreach(f => ()) // LOOK MAH, NO SIDE EFFECTS!

On Mon, Apr 4, 2011 at 1:00 PM, Martin S. Weber <martin [dot] weber [at] nist [dot] gov> wrote:
On 04/02/11 04:24, Bernd Johannes wrote:
(...) I.e. if somebody is
going to write something which accepts a Seq and is going to execute side
effects on this Seq by means of foreach(...)

given foreach's signature(s):

(f: (A) => Unit) : Unit
[U] (f: (A) => U) : Unit

(mind the trailing ':Unit's) - *every* application of foreach is going to be invoked for its side-effects.

Regards

H-star Development
Joined: 2010-04-14,
Re: REPL: null-values in parallel ranges?
Depends on what you define as a side effect. The function given to foreach might have an effect (rendering stuff on the screen) without changing any mutable state inside the program.
Another example is logging. While it does have an effect, the effect only applies to the "outside world".


On 04.04.2011 19:14, Josh Suereth wrote:
col.foreach(f => ()) // LOOK MAH, NO SIDE EFFECTS!

On Mon, Apr 4, 2011 at 1:00 PM, Martin S. Weber <martin [dot] weber [at] nist [dot] gov> wrote:
On 04/02/11 04:24, Bernd Johannes wrote:
(...) I.e. if somebody is
going to write something which accepts a Seq and is going to execute side
effects on this Seq by means of foreach(...)

given foreach's signature(s):

(f: (A) => Unit) : Unit
[U] (f: (A) => U) : Unit

(mind the trailing ':Unit's) - *every* application of foreach is going to be invoked for its side-effects.

Regards


extempore
Joined: 2008-12-17,
Re: Re: REPL: null-values in parallel ranges?

On 4/4/11 9:32 AM, Aleksandar Prokopec wrote:
> Also, I've done quite some refactorings of the standard library to
> ensure `.seq` gets called in the right places before executing the `for`
> loop.

I know it. When I went looking for more likely trouble I kept finding
your .seqs blocking me.

> And I missed the REPL

I'd say it's a good thing you did, or it'd be that much longer before
this discussion was being had.

Aleksandar Prokopec
Joined: 2010-04-04,
Re: Re: REPL: null-values in parallel ranges?

Agreed, and removing the `pforeach` was a good thing for the same reason.

>> And I missed the REPL
>
> I'd say it's a good thing you did, or it'd be that much longer before
> this discussion was being had.
>

Bernd Johannes
Joined: 2011-01-28,
Re: REPL: null-values in parallel ranges?

On Monday, 4 April 2011, 14:46:06, Aleksandar Prokopec wrote:
> We're in the process of solving this.
...
> If anyone sees a fault with this design, or any serious showstoppers,
> please say so.
>
+1
looks good to me and is conservative (viewed from the "outside" - I can't
comment on Scala internals).

Great!
Greetings
Bernd.

Bernd Johannes
Joined: 2011-01-28,
Re: REPL: null-values in parallel ranges?

On Monday, 4 April 2011, 16:22:46, Aleksandar Prokopec wrote:
...
> > 3.) a library writer should inform the users if he changes a return
> >
> > type from e.g. Seq to ParSeq because it can be that the user's program
> > will still be compiling, but now lacks the guarantees the sequential
> > collection variants give
>
> This is a good point.
>
> If he uses a ParSeq as a return type, the code calling a method may
> still compile due to type inference. Note, however, that this does not
> hurt existing libraries, since there aren't any yet. Furthermore, we've
> had issues like this and we've used migration warnings to make them more
> detectable.

Maybe users should be instructed to explicitly provide the type of collection
they intend to use if they don't want to write concurrency-aware code, thus
tightening the contract for collection providers to sequential behaviour.

val myXY: Seq[aType] = someCollectionBuilderWhichMayReturnParallels()
for (e <- myXY) {happily doing sequential stuff}

This way they have a lifetime guarantee for myXY to never fall into the
concurrent execution trap.

Or am I missing something?
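[A compilable sketch of Bernd's suggestion. `someCollectionBuilder` is a hypothetical factory standing in for his example; in the eventual hierarchy, where ParSeq is not a subtype of Seq, ascribing the sequential Seq type really is a static guarantee of sequential traversal.]

```scala
object AscriptionDemo extends App {
  // Hypothetical factory that could in principle hand back any Seq.
  def someCollectionBuilder(): Seq[Int] = (1 to 5).toList

  // Ascribing Seq pins the static type to the sequential branch of the
  // hierarchy, so the for loop below runs strictly in order.
  val myXY: Seq[Int] = someCollectionBuilder()
  var visited = List.empty[Int]
  for (e <- myXY) visited ::= e
  assert(visited.reverse == (1 to 5).toList)
}
```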

Greetings
Bernd

Ruediger Keller 2
Joined: 2010-04-30,
Re: Re: REPL: null-values in parallel ranges?

Bill,

it seems you want all Scala programs that don't specify "parallelness"
- and that includes all existing Scala programs - to be parallel.

But I cannot see that as any reasonable goal. Quite the opposite, I
think this is extremely dangerous and should be avoided at all costs.

It is always argued that using parallel collections should be easy,
just as easy as using the existing sequential collections. And I agree
completely. But that is not the same as changing the semantics of all
Scala code ever written. I think that writing something like ParSeq
instead of Seq is still as easy as it can get.

I'm strongly in favor of having to make explicit that parallel
collections are used in the code, by using specific types. Sure, side
effecting code is probably not idiomatic in Scala, but I guarantee
that there is lots and lots of code out there that is side effecting
and more is written with every day. And as Rex Kerr writes, that is
especially true for high performance code that would benefit the most
from the new parallel collections.

I think it is just not an option to break side effecting code without
a compile time error. This is not just a problem for existing code,
but it is also extremely error prone for newly written code, even if
stated a hundred times in the docs. As long as Scala's support for
side effecting code isn't removed I don't see it feasible to make
everything default to parallel. Just recall how hard multi threading
related bugs are to find and debug. This would IMHO lead to an
absolute nightmare and would surely hinder Scala's further adoption.

What is the problem with specifying with the types that you know what
you are doing and that you want parallel execution? Sure, we all want
the free lunch, giving us parallel execution without us thinking about
what we are doing. But I don't see that we will get that free lunch
with Scala as long as it supports side effecting code.

I think that for 95% of the code out there parallel execution for
performance is irrelevant and that will probably never change. Sure,
there is this 5% where it matters all the more, but that is no reason
to make something the default which is irrelevant and even dangerous
for the majority.

Just make the parallel collections as easy to use as possible - but not easier.

Regards,
Rüdiger

2011/4/4 Bill Venners :
> Hi Martin,
>
> On Mon, Apr 4, 2011 at 9:19 AM, martin odersky wrote:
>>
>>
>> On Mon, Apr 4, 2011 at 5:33 PM, Bill Venners wrote:
>>>
>>> Hi All,
>>>
>>> I'd like to take a step back and state two meta-goals I'd hope for
>>> about the introduction of parallel collections to Scala. One is that
>>> great pain should be taken to avoid breaking existing code, but that
>>> it is OK if some existing code breaks (so long as it isn't much) if
>>> that's unavoidable to do things the "right way." The other is parallel
>>> collections should be done the "right way." I think it is early enough
>>> in Scala's adoption curve to do things the right way, and the feeling
>>> I get from the Scala community is that's what people by and large
>>> would want at this point (so long as it doesn't break much code).
>>>
>>> I think the right way is that types Traversable, Iterable, Seq, Map,
>>> and Set should allow sequential or parallel implementations. These are
>>> the types I want to put in my method signatures, and I think the
>>> default for Scala should be that parallel execution of the
>>> higher-order methods is possible. This is what the future needs
>>> because of multi-core.
>>>
>>
>> If we were starting out, I'd agree with you. But the fact is that far too
>> much code exists already that treats Seq, Iterable and friends as
>> sequential. I believe we can't change that contract anymore. So we need to
>> live with the naming convention that, e.g., Iterable is always sequential
>> whereas ParIterable can be processed in parallel. I wish I could change
>> that, but believe it's too late for that.
>>
> Would it be possible to get the old sequential behavior by changing or
> adding imports in existing source files? When I upgraded ScalaTest to
> 2.8 I had to go through every almost source file and change the
> imports at the top, because "import org.scalatest.junit", for example,
> no longer imported the members of org.scalatest. It was a pain, but I
> felt it was worth it to get things done the "right way."
>
> If instead of adding packages like scala.parcollection,
> scala.parcollection.mutable, and scala.parcollection.immutable and
> placing parallel variants there, could you put those in
> scala.collection, scala.collection.mutable and
> scala.collection.immutable and put the sequential ones in
> scala.seqcollection, scala.seqcollection.mutable, and
> scala.seqcollection.immutable. Then all I'd need to do to get things
> back the old way is go to every source file and change my import from
> scala.collection.Map to scala.seqcollection.Map. If changing or adding
> imports like that is all it would take to get my code to run like it
> did before and get parallel collections done the "right way," then I'd
> see that as worth it. I suspect the community would by and large see
> that as worth it, and even those that didn't would eventually still
> upgrade (though with some complaining).
>
>>> Sequential should be the special case, not parallel, so the marker
>>> trait should probably be Sequential, not Parallel. Thus if I need a
>>> sequential map to be passed to my method, I'd put Map with Sequential.
>>>
>>> But that would and should be rare. Usually I'd just say I want a Map,
>>> and write it to make allow parallel or sequential collections passed.
>>> And I think Scala makes that easy, so long as foreach is not run in
>>> parallel, not matter what type of collection comes in.
>>>
>>> Foreach's contract in Traversable should be tightened to say it will
>>> run sequentially and a pforeach added.
>>
>> But that's what I think is the wrong way. The collections design explicitly
>> says that to get parallel operations, you write cs.par, to get sequential
>> ones you write pxs.seq. It does not introduce any new variants of existing
>> operations that are parallel. Making foreach sequential and introducing
>> pforeach is like admitting that this design works only 95% of the time. And
>> it does not exclude errors because people do use side effects in for
>> expressions as well as in for loops.
>> For instance:
>>
>>   var sum = 0
>>   for (x <- xs) yield { sum += x; sum }
>>
>> (Yes, I know about scanLeft, but many people will use the for expression
>> instead).
>>
>>  That is, if we introduce pforeach I guarantee that the next person will
>> immediately demand that we also make map, flatMap, withFilter sequential and
>> have pmap, pflatMap, pWithFilter. So muddling the waters would lead us down
>> a slippery slide.
>>
> Some may ask but I think there is no need for a pmap, pflatMap, etc. I
> think foreach is inherently different because its result type is Unit,
> so by definition the passed function needs to have a side effect to
> have any "effect" on the program (other than heating up the CPU, which
> it probably wouldn't even do after Hotspot optimizes it away). The for
> expression with a yield that is given a function with a side effect is
> probably out there in existing code, but I think it is unidiomatic
> already and likely quite rare. That's code I think is OK to break,
> especially if it can be fixed with a simple import change.
>
> My sense is that people get the functional style of map and its kin.
> It is a bit of a leap at first, but once they get it, they don't do
> things like yield a side effect. Scala makes it very easy to avoid
> that kind of thing.
>
> Bill
>
>> Cheers
>>
>>  -- Martin
>>
>>
>
>
>
> --
> Bill Venners
> Artima, Inc.
> http://www.artima.com
>

Bill Venners
Joined: 2008-12-18,
Re: Re: REPL: null-values in parallel ranges?

Hi Ruediger,

On Mon, Apr 4, 2011 at 11:10 AM, Ruediger Keller
wrote:
> Bill,
>
> it seems you want all Scala programs that don't specify "parallelness"
> - and that includes all existing Scala programs - to be parallel.
>
Actually, my suggestion was that if you don't specify
"sequentialness" explicitly, such as in a method parameter type, you
could get something that is either parallel or sequential. In other
words, type scala.collection.Map
would be a map that could be mutable or immutable, sequential or
parallel. The reason is simply that seems the most general level of
abstraction for Map types, and scala.collection.Map is the most
general Map type. If you needed something that is definitely
Sequential, you could say scala.collection.Map with Sequential in your
method parameter type, or import scala.seqcollection._ and just say
Map. Something like that.
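Sketched as code, the idea might look like this (the `Sequential` marker trait is hypothetical, as is `MyMap` standing in for scala.collection.Map; none of this is in the actual library):

```scala
// Hypothetical marker trait: mixing it in promises sequential semantics.
trait Sequential

// Stand-in for a general collection type whose implementations may be
// sequential or parallel.
trait MyMap[K, V] {
  def get(k: K): Option[V]
}

// An implementation that guarantees sequential behavior mixes in the marker.
class SeqMapImpl[K, V](underlying: Map[K, V]) extends MyMap[K, V] with Sequential {
  def get(k: K): Option[V] = underlying.get(k)
}

// A method that requires sequential semantics demands the marker in its type.
def lookupSequentially[K, V](m: MyMap[K, V] with Sequential, k: K): Option[V] =
  m.get(k)

println(lookupSequentially(new SeqMapImpl(Map("a" -> 1)), "a")) // Some(1)
```

A plain `MyMap[K, V]` would not be accepted by `lookupSequentially`, so the "definitely sequential" requirement is enforced at compile time.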

To me if we were starting from scratch, that would be a sensible way
to structure it if sequential and parallel are in the same hierarchy.
The problem with this is it means existing code could break. If it
were as simple as going through and changing imports, as I suggested
earlier, then it might be OK. But I think it isn't that simple on
further reflection, because existing APIs take all these general
scala.collection types and aren't prepared for parallel. They'd have
to go through all those methods and add a .seq to everything at the
beginning. Or lower the signature to be sequential, which would break
client code (which could be fixed with more import changes in the
client code, etc.) The import changes we had to do for 2.8 did not
ripple out to break client code. So never mind that idea.

> But I cannot see that as any reasonable goal. Quite the opposite, I
> think this is extremely dangerous and should be avoided at all costs.
>
Your argument is well reasoned. What it sounds like you're saying,
though, is that the "place" for parallel collections is as performance
optimizations. Usually 5% to 10% of a program's code accounts for 80%
of the execution time, so there's no point worrying about using
parallel collections except in that 5 to 10%. When you have a
performance problem, you would find that 5 to 10%, and plug in a few
parallel collections there to try and speed things up, measure the
result, and switch back to sequential if you don't get a performance
improvement. I think that should be an easy way to use parallel
collections at least, but I had the feeling people would want to use
them more generally.

If that's the case, then perhaps the lingua franca of APIs would be
the plain old sequential types that we're all using now, and inside a
method, if you want parallel collections for performance you call .par
on them right there. There could be an implicit conversion from each
parallel collection type to a sequential counterpart (but not in the
other direction), that would make it easy to pass a parallel
collection to a method that requires a sequential one. Parallel and
sequential collections need not even be in the same hierarchy in that
case. They could be in a (no pun intended) parallel hierarchy.
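Such a one-way conversion is not in the standard library, but a sketch of what it might look like (assuming `.par`/`.seq` as in Scala 2.9–2.12; `parToSeq` is a made-up name):

```scala
import scala.collection.parallel.ParSeq
import scala.language.implicitConversions

// Hypothetical one-way conversion: a parallel sequence is accepted
// anywhere a sequential Seq is required, but never the reverse.
implicit def parToSeq[A](ps: ParSeq[A]): Seq[A] = ps.seq

def total(xs: Seq[Int]): Int = xs.sum // an ordinary sequential API

val p = (1 to 10).par // a ParSeq[Int]
println(total(p))     // the conversion inserts .seq; prints 55
```

The asymmetry matters: a parallel collection can always be handed safely to sequential code, whereas the reverse direction would silently change execution semantics.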

I just looked through some of my own Scala code at for expressions,
trying to find one that would break if done in parallel. I found
surprisingly few because most had yield and the function didn't have
any (obvious) side effect. Some that didn't have yield were doing
things that were (or appeared) thread safe, like firing a message to
an actor. A few would break. But I'll tell you what I didn't like is
that I had to think about whether the passed function was thread-safe.
In general we wouldn't want people to have to think about that. I
think it is most often easy to write thread-safe functions for methods
like map and filter, because you just write a function with no side
effects. But as soon as your function calls into something else you're
not 100% familiar with, you need to think about whether that thing you
call has side effects. It's hard.

And that brings me back to my concern about a parallel foreach. When I
do switch to using a parallel collection, and I know I'm doing it, it
is much easier for me to make a thread safe function to pass to
something like map than foreach. Because foreach functions need to
have a side effect, I have to write thread-safe code in the
old-fashioned way. Import java.util.concurrent and wrap a counter in
an AtomicInteger, etc. So what that would cause me to do is try to
remember to almost never use foreach with a parallel collection, and
that I think makes them harder and more error prone to use.
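Concretely, the two styles compare like this (a sketch assuming `.par` is available; `AtomicInteger` is the java.util.concurrent class alluded to above):

```scala
import java.util.concurrent.atomic.AtomicInteger

val xs = (1 to 1000).par

// foreach forces a side effect, so the shared counter must be made
// thread-safe the old-fashioned way:
val evens = new AtomicInteger(0)
xs.foreach(x => if (x % 2 == 0) evens.incrementAndGet())
println(evens.get) // 500, regardless of scheduling

// The side-effect-free equivalent needs no synchronization at all:
println(xs.count(_ % 2 == 0)) // 500
```

With `count` (or `map`/`filter`/`sum`), thread safety comes for free because the passed function has no side effects; with `foreach` the burden falls back on the caller.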

Bill

> It is always argued that using parallel collections should be easy,
> just as easy as using the existing sequential collections. And I agree
> completely. But that is not the same as changing the semantics of all
> Scala code ever written. I think that writing something like ParSeq
> instead of Seq is still as easy as it can get.
>
> I'm strongly in favor of having to make explicit that parallel
> collections are used in the code, by using specific types. Sure, side
> effecting code is probably not idiomatic in Scala, but I guarantee
> that there is lots and lots of code out there that is side effecting
> and more is written with every day. And as Rex Kerr writes, that is
> especially true for high performance code that would benefit the most
> from the new parallel collections.
>
> I think it is just not an option to break side effecting code without
> a compile time error. This is not just a problem for existing code,
> but it is also extremely error prone for newly written code, even if
> stated a hundred times in the docs. As long as Scala's support for
> side effecting code isn't removed I don't see it feasible to make
> everything default to parallel. Just recall how hard multi threading
> related bugs are to find and debug. This would IMHO lead to an
> absolute nightmare and would surely hinder Scala's further adoption.
>
> What is the problem with specifying with the types that you know what
> you are doing and that you want parallel execution? Sure, we all want
> the free lunch, giving us parallel execution without us thinking about
> what we are doing. But I don't see that we will get that free lunch
> with Scala as long as it supports side effecting code.
>
> I think that for 95% of the code out there parallel execution for
> performance is irrelevant and that will probably never change. Sure,
> there is this 5% where it matters all the more, but that is no reason
> to make something the default which is irrelevant and even dangerous
> for the majority.
>
> Just make the parallel collections as easy to use as possible - but not easier.
>
> Regards,
> Rüdiger
>
>
> 2011/4/4 Bill Venners :
>> Hi Martin,
>>
>> On Mon, Apr 4, 2011 at 9:19 AM, martin odersky wrote:
>>>
>>>
>>> On Mon, Apr 4, 2011 at 5:33 PM, Bill Venners wrote:
>>>>
>>>> Hi All,
>>>>
>>>> I'd like to take a step back and state two meta-goals I'd hope for
>>>> about the introduction of parallel collections to Scala. One is that
>>>> great pain should be taken to avoid breaking existing code, but that
>>>> it is OK if some existing code breaks (so long as it isn't much) if
>>>> that's unavoidable to do things the "right way." The other is parallel
>>>> collections should be done the "right way." I think it is early enough
>>>> in Scala's adoption curve to do things the right way, and the feeling
>>>> I get from the Scala community is that's what people by and large
>>>> would want at this point (so long as it doesn't break much code).
>>>>
>>>> I think the right way is that types Traversable, Iterable, Seq, Map,
>>>> and Set should allow sequential or parallel implementations. These are
>>>> the types I want to put in my method signatures, and I think the
>>>> default for Scala should be that parallel execution of the
>>>> higher-order methods is possible. This is what the future needs
>>>> because of multi-core.
>>>>
>>>
>>> If we were starting out, I'd agree with you. But the fact is that far too
>>> much code exists already that treats Seq, Iterable and friends as
>>> sequential. I believe we can't change that contract anymore. So we need to
>>> live with the naming convention that, e.g., Iterable is always sequential
>>> whereas ParIterable can be processed in parallel. I wish I could change
>>> that, but believe it's too late for that.
>>>
>> Would it be possible to get the old sequential behavior by changing or
>> adding imports in existing source files? When I upgraded ScalaTest to
>> 2.8 I had to go through every almost source file and change the
>> imports at the top, because "import org.scalatest.junit", for example,
>> no longer imported the members of org.scalatest. It was a pain, but I
>> felt it was worth it to get things done the "right way."
>>
>> If instead of adding packages like scala.parcollection,
>> scala.parcollection.mutable, and scala.parcollection.immutable and
>> placing parallel variants there, could you put those in
>> scala.collection, scala.collection.mutable and
>> scala.collection.immutable and put the sequential ones in
>> scala.seqcollection, scala.seqcollection.mutable, and
>> scala.seqcollection.immutable. Then all I'd need to do to get things
>> back the old way is go to every source file and change my import from
>> scala.collection.Map to scala.seqcollection.Map. If changing or adding
>> imports like that is all it would take to get my code to run like it
>> did before and get parallel collections done the "right way," then I'd
>> see that as worth it. I suspect the community would by and large see
>> that as worth it, and even those that didn't would eventually still
>> upgrade (though with some complaining).
>>
>>>> Sequential should be the special case, not parallel, so the marker
>>>> trait should probably be Sequential, not Parallel. Thus if I need a
>>>> sequential map to be passed to my method, I'd put Map with Sequential.
>>>>
>>>> But that would and should be rare. Usually I'd just say I want a Map,
>>>> and write it to make allow parallel or sequential collections passed.
>>>> And I think Scala makes that easy, so long as foreach is not run in
>>>> parallel, not matter what type of collection comes in.
>>>>
>>>> Foreach's contract in Traversable should be tightened to say it will
>>>> run sequentially and a pforeach added.
>>>
>>> But that's what I think is the wrong way. The collections design explicitly
>>> says that to get parallel operations, you write cs.par, to get sequential
>>> ones you write pxs.seq. It does not introduce any new variants of existing
>>> operations that are parallel. Making foreach sequential and introducing
>>> pforeach is like admitting that this design works only 95% of the time. And
>>> it does not exclude errors because people do use side effects in for
>>> expressions as well as in for loops.
>>> For instance:
>>>
>>>   var sum = 0
>>>   for (x <- xs) yield { sum += x; sum }
>>>
>>> (Yes, I know about scanLeft, but many people will use the for expression
>>> instead).
>>>
>>>  That is, if we introduce pforeach I guarantee that the next person will
>>> immediately demand that we also make map, flatMap, withFilter sequential and
>>> have pmap, pflatMap, pWithFilter. So muddling the waters would lead us down
>>> a slippery slide.
>>>
>> Some may ask but I think there is no need for a pmap, pflatMap, etc. I
>> think foreach is inherently different because its result type is Unit,
>> so by definition the passed function needs to have a side effect to
>> have any "effect" on the program (other than heating up the CPU, which
>> it probably wouldn't even do after Hotspot optimizes it away). The for
>> expression with a yield that is given a function with a side effect is
>> probably out there in existing code, but I think it is unidiomatic
>> already and likely quite rare. That's code I think is OK to break,
>> especially if it can be fixed with a simple import change.
>>
>> My sense is that people get the functional style of map and its kin.
>> It is a bit of a leap at first, but once they get it, they don't do
>> things like yield a side effect. Scala makes it very easy to avoid
>> that kind of thing.
>>
>> Bill
>>
>>> Cheers
>>>
>>>  -- Martin
>>>
>>>
>>
>>
>>
>> --
>> Bill Venners
>> Artima, Inc.
>> http://www.artima.com
>>
>

Ruediger Keller 2
Joined: 2010-04-30,
Re: Re: REPL: null-values in parallel ranges?

Hi Bill,

I agree that a parallel foreach is dangerous because it is inherently
side-effecting. I would also prefer a separate pareach method. Making
foreach run in parallel is just asking for trouble.
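The separate method could even be retrofitted as an enrichment. A sketch, where `pareach` is the hypothetical name from this thread, not anything in the library (assumes `.par` is available):

```scala
// Hypothetical opt-in: foreach stays sequential, and parallel
// iteration must be requested by name.
implicit class ParEachOps[A](xs: Seq[A]) {
  def pareach(f: A => Unit): Unit = xs.par.foreach(f)
}

List(1, 2, 3).foreach(println)  // sequential, in order
List(1, 2, 3).pareach(println)  // explicitly parallel; order unspecified
```

That way the dangerous behavior is visible at the call site, while every existing foreach keeps its sequential contract.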

Regards,
Rüdiger

2011/4/4 Bill Venners :
> Hi Ruediger,
>
> On Mon, Apr 4, 2011 at 11:10 AM, Ruediger Keller
> wrote:
>> Bill,
>>
>> it seems you want all Scala programs that don't specify "parallelness"
>> - and that includes all existing Scala programs - to be parallel.
>>
> Actually it was if you don't specify "sequentialness" explicitly, such
> as in a method parameter type, you could get something that is either
> parallel or sequential. In other words, type scala.collection.Map
> would be a map that could be mutable or immutable, sequential or
> parallel. The reason is simply that seems the most general level of
> abstraction for Map types, and scala.collection.Map is the most
> general Map type. If you needed something that is definitely
> Sequential, you could say scala.collection.Map with Sequential in your
> method parameter type, or import scala.seqcollection._ and just say
> Map. Something like that.
>
> To me if we were starting from scratch, that would be a sensible way
> to structure it if sequential and parallel are in the same hierarcy.
> The problem with this is it means existing code could break. If it
> were as simple as going through and changing imports, as I suggested
> earlier, then it might be OK. But I think it isn't that simple on
> further reflection, because existing APIs take all these general
> scala.collection types and aren't prepared for parallel. They'd have
> to go through all those methods and add a .seq to everything at the
> beginning. Or lower the signature to be sequential, which would break
> client code (which could be fixed with more import changes in the
> client code, etc.) The import changes we had to do for 2.8 did not
> ripple out to break client code. So nevermind on that idea.
>
>> But I cannot see that as any reasonable goal. Quite the opposite, I
>> think this is extremely dangerous and should be avoided at all costs.
>>
> Your argument is well reasoned. What it sounds like you're saying,
> though, is that the "place" for parallel collections is as performance
> optimizations. Usually 5% to 10% of a programs code accounts for 80%
> of the execution time, so there's no point worrying about using
> parallel collections except in that 5 to 10%. When you have a
> performance problem, you would find that 5 to 10%, and plug in a few
> parallel collections there to try and speed things up, measure the
> result, and switch back to sequential if you don't get a performance
> improvement. I think that should be an easy way to use parallel
> collections at least, but I had the feeling people would want to use
> them more generally.
>
> If that's the case, then perhaps the lingua franca of APIs would be
> the plain old sequential types that we're all using now, and inside a
> method, if you want parallel collections for performance you call .par
> on them right there. There could be an implicit conversion from each
> parallel collection type to a sequential counterpart (but not in the
> other direction), that would make it easy to pass a parallel
> collection to a method that requires a sequential one. Parallel and
> sequential collections need not even be in the same hierarchy in that
> case. They could be in a (no pun intended) parallel hierarchy.
>
> I just looked through some of my own Scala code at for expressions,
> trying to find one that would break if done in parallel. I found
> surprisingly few because most had yield and the function didn't have
> any (obvious) side effect. Some that didn't have yield were doing
> things that were (or appeared) thread safe, like firing a message to
> an actor. A few would break. But I'll tell you what I didn't like is
> that I had to think about whether the passed function was thread-safe.
> In general we wouldn't want people to have to think about that. It
> think it is most often easy to write thread-safe functions for methods
> like map and filter, because you just write a function with no side
> effects. But as soon as your function calls into something else you're
> not 100% familiar with, you need to think about whether that thing you
> call has side effects. It's hard.
>
> And that brings me back to my concern about a parallel foreach. When I
> do switch to using a parallel collection, and I know I'm doing it, it
> is much easier for me to make a thread safe function to pass to
> something like map than foreach. Because foreach functions need to
> have a side effect, I have to write thread-safe code in the
> old-fashioned way. Import java.util.concurrent and wrap a counter in
> an AtomicInt, etc. So what that would cause me to do is try and
> remember to almost never use foreach with a parallel collection, and
> that I think makes them harder and more error prone to use.
>
> Bill
>
>> It is always argued that using parallel collections should be easy,
>> just as easy as using the existing sequential collections. And I agree
>> completely. But that is not the same as changing the semantics of all
>> Scala code ever written. I think that writing something like ParSeq
>> instead of Seq is still as easy as it can get.
>>
>> I'm strongly in favor of having to make explicit that parallel
>> collections are used in the code, by using specific types. Sure, side
>> effecting code is probably not idiomatic in Scala, but I guarantee
>> that there is lots and lots of code out there that is side effecting
>> and more is written with every day. And as Rex Kerr writes, that is
>> especially true for high performance code that would benefit the most
>> from the new parallel collections.
>>
>> I think it is just not an option to break side effecting code without
>> a compile time error. This is not just a problem for existing code,
>> but it is also extremely error prone for newly written code, even if
>> stated a hundred times in the docs. As long as Scala's support for
>> side effecting code isn't removed I don't see it feasible to make
>> everything default to parallel. Just recall how hard multi threading
>> related bugs are to find and debug. This would IMHO lead to an
>> absolute nightmare and would surely hinder Scala's further adoption.
>>
>> What is the problem with specifying with the types that you know what
>> you are doing and that you want parallel execution? Sure, we all want
>> the free lunch, giving us parallel execution without us thinking about
>> what we are doing. But I don't see that we will get that free lunch
>> with Scala as long as it supports side effecting code.
>>
>> I think that for 95% of the code out there parallel execution for
>> performance is irrelevant and that will probably never change. Sure,
>> there is this 5% where it matters all the more, but that is no reason
>> to make something the default which is irrelevant and even dangerous
>> for the majority.
>>
>> Just make the parallel collections as easy to use as possible - but not easier.
>>
>> Regards,
>> Rüdiger
>>
>>
>> 2011/4/4 Bill Venners :
>>> Hi Martin,
>>>
>>> On Mon, Apr 4, 2011 at 9:19 AM, martin odersky wrote:
>>>>
>>>>
>>>> On Mon, Apr 4, 2011 at 5:33 PM, Bill Venners wrote:
>>>>>
>>>>> Hi All,
>>>>>
>>>>> I'd like to take a step back and state two meta-goals I'd hope for
>>>>> about the introduction of parallel collections to Scala. One is that
>>>>> great pain should be taken to avoid breaking existing code, but that
>>>>> it is OK if some existing code breaks (so long as it isn't much) if
>>>>> that's unavoidable to do things the "right way." The other is parallel
>>>>> collections should be done the "right way." I think it is early enough
>>>>> in Scala's adoption curve to do things the right way, and the feeling
>>>>> I get from the Scala community is that's what people by and large
>>>>> would want at this point (so long as it doesn't break much code).
>>>>>
>>>>> I think the right way is that types Traversable, Iterable, Seq, Map,
>>>>> and Set should allow sequential or parallel implementations. These are
>>>>> the types I want to put in my method signatures, and I think the
>>>>> default for Scala should be that parallel execution of the
>>>>> higher-order methods is possible. This is what the future needs
>>>>> because of multi-core.
>>>>>
>>>>
>>>> If we were starting out, I'd agree with you. But the fact is that far too
>>>> much code exists already that treats Seq, Iterable and friends as
>>>> sequential. I believe we can't change that contract anymore. So we need to
>>>> live with the naming convention that, e.g., Iterable is always sequential
>>>> whereas ParIterable can be processed in parallel. I wish I could change
>>>> that, but believe it's too late for that.
>>>>
>>> Would it be possible to get the old sequential behavior by changing or
>>> adding imports in existing source files? When I upgraded ScalaTest to
>>> 2.8 I had to go through every almost source file and change the
>>> imports at the top, because "import org.scalatest.junit", for example,
>>> no longer imported the members of org.scalatest. It was a pain, but I
>>> felt it was worth it to get things done the "right way."
>>>
>>> If instead of adding packages like scala.parcollection,
>>> scala.parcollection.mutable, and scala.parcollection.immutable and
>>> placing parallel variants there, could you put those in
>>> scala.collection, scala.collection.mutable and
>>> scala.collection.immutable and put the sequential ones in
>>> scala.seqcollection, scala.seqcollection.mutable, and
>>> scala.seqcollection.immutable. Then all I'd need to do to get things
>>> back the old way is go to every source file and change my import from
>>> scala.collection.Map to scala.seqcollection.Map. If changing or adding
>>> imports like that is all it would take to get my code to run like it
>>> did before and get parallel collections done the "right way," then I'd
>>> see that as worth it. I suspect the community would by and large see
>>> that as worth it, and even those that didn't would eventually still
>>> upgrade (though with some complaining).
>>>
>>>>> Sequential should be the special case, not parallel, so the marker
>>>>> trait should probably be Sequential, not Parallel. Thus if I need a
>>>>> sequential map to be passed to my method, I'd put Map with Sequential.
>>>>>
>>>>> But that would and should be rare. Usually I'd just say I want a Map,
>>>>> and write it to make allow parallel or sequential collections passed.
>>>>> And I think Scala makes that easy, so long as foreach is not run in
>>>>> parallel, not matter what type of collection comes in.
>>>>>
>>>>> Foreach's contract in Traversable should be tightened to say it will
>>>>> run sequentially and a pforeach added.
>>>>
>>>> But that's what I think is the wrong way. The collections design explicitly
>>>> says that to get parallel operations, you write cs.par, to get sequential
>>>> ones you write pxs.seq. It does not introduce any new variants of existing
>>>> operations that are parallel. Making foreach sequential and introducing
>>>> pforeach is like admitting that this design works only 95% of the time. And
>>>> it does not exclude errors because people do use side effects in for
>>>> expressions as well as in for loops.
>>>> For instance:
>>>>
>>>>   var sum = 0
>>>>   for (x <- xs) yield { sum += x; sum }
>>>>
>>>> (Yes, I know about scanLeft, but many people will use the for expression
>>>> instead).
>>>>
>>>>  That is, if we introduce pforeach I guarantee that the next person will
>>>> immediately demand that we also make map, flatMap, withFilter sequential and
>>>> have pmap, pflatMap, pWithFilter. So muddling the waters would lead us down
>>>> a slippery slide.
>>>>
>>> Some may ask but I think there is no need for a pmap, pflatMap, etc. I
>>> think foreach is inherently different because its result type is Unit,
>>> so by definition the passed function needs to have a side effect to
>>> have any "effect" on the program (other than heating up the CPU, which
>>> it probably wouldn't even do after Hotspot optimizes it away). The for
>>> expression with a yield that is given a function with a side effect is
>>> probably out there in existing code, but I think it is unidiomatic
>>> already and likely quite rare. That's code I think is OK to break,
>>> especially if it can be fixed with a simple import change.
>>>
>>> My sense is that people get the functional style of map and its kin.
>>> It is a bit of a leap at first, but once they get it, they don't do
>>> things like yield a side effect. Scala makes it very easy to avoid
>>> that kind of thing.
>>>
>>> Bill
>>>
>>>> Cheers
>>>>
>>>>  -- Martin
>>>>
>>>>
>>>
>>>
>>>
>>> --
>>> Bill Venners
>>> Artima, Inc.
>>> http://www.artima.com
>>>
>>
>
>
>
> --
> Bill Venners
> Artima, Inc.
> http://www.artima.com
>

jibal
Joined: 2010-12-01,
Re: Re: REPL: null-values in parallel ranges?
On Mon, Apr 4, 2011 at 9:19 AM, martin odersky <martin [dot] odersky [at] epfl [dot] ch> wrote:


On Mon, Apr 4, 2011 at 5:33 PM, Bill Venners <bill [at] artima [dot] com> wrote:
Hi All,

I'd like to take a step back and state two meta-goals I'd hope for
about the introduction of parallel collections to Scala. One is that
great pain should be taken to avoid breaking existing code, but that
it is OK if some existing code breaks (so long as it isn't much) if
that's unavoidable to do things the "right way." The other is parallel
collections should be done the "right way." I think it is early enough
in Scala's adoption curve to do things the right way, and the feeling
I get from the Scala community is that's what people by and large
would want at this point (so long as it doesn't break much code).

I think the right way is that types Traversable, Iterable, Seq, Map,
and Set should allow sequential or parallel implementations. These are
the types I want to put in my method signatures, and I think the
default for Scala should be that parallel execution of the
higher-order methods is possible. This is what the future needs
because of multi-core.


If we were starting out, I'd agree with you. But the fact is that far too much code exists already that treats Seq, Iterable and friends as sequential. I believe we can't change that contract anymore. So we need to live with the naming convention that, e.g., Iterable is always sequential whereas ParIterable can be processed in parallel. I wish I could change that, but believe it's too late for that.

Sequential should be the special case, not parallel, so the marker
trait should probably be Sequential, not Parallel. Thus if I need a
sequential map to be passed to my method, I'd put Map with Sequential. 
But that would and should be rare. Usually I'd just say I want a Map,
and write it to allow parallel or sequential collections to be passed.
And I think Scala makes that easy, so long as foreach is not run in
parallel, no matter what type of collection comes in.
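To make the proposal concrete, here is a minimal sketch; the Sequential marker trait and both method names are hypothetical (the trait is only being proposed in this thread):

```scala
// Hypothetical marker trait, as proposed above; not part of the actual library.
trait Sequential

// Order-sensitive side effects: demand a sequential Map explicitly.
def printAll(m: Map[String, Int] with Sequential): Unit =
  for ((k, v) <- m) println(k + " -> " + v)

// Order-independent logic: accept any Map, parallel or sequential.
def total(m: Map[String, Int]): Int = m.values.sum
```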

Foreach's contract in Traversable should be tightened to say it will
run sequentially and a pforeach added.

But that's what I think is the wrong way.

Unless I'm misunderstanding the tenses here, foreach's contract already says that, as I just quoted from the documentation a little while ago:

" If a collection is an instance of an ordered collection class, traversing its elements with foreach will always visit elements in the same order, even for different runs of the program. If the class is not ordered, foreach can visit elements in different orders for different runs (but it will keep the same order in the same run).'"

A lot of code depends on that contract.
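A small illustration of the kind of code that leans on that guarantee (a sketch; `.par` is the 2.9 parallel-collections conversion):

```scala
import scala.collection.mutable.ListBuffer

val xs = Vector(1, 2, 3, 4)

// Relies on foreach visiting elements in order:
val buf = new ListBuffer[Int]
xs.foreach(buf += _)   // buf is ListBuffer(1, 2, 3, 4), deterministically

// Under xs.par.foreach the appends could interleave in any order,
// and ListBuffer is not thread-safe to begin with.
```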

-- Jim

 
The collections design explicitly says that to get parallel operations, you write cs.par, to get sequential ones you write pxs.seq. It does not introduce any new variants of existing operations that are parallel. Making foreach sequential and introducing pforeach is like admitting that this design works only 95% of the time. And it does not exclude errors because people do use side effects in for expressions as well as in for loops.
For instance:

  var sum = 0
  for (x <- xs) yield { sum += x; sum }

(Yes, I know about scanLeft, but many people will use the for expression instead).
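For reference, the running-sum example rewritten with scanLeft, which needs no mutable state and is therefore safe under either execution strategy:

```scala
val xs = List(1, 2, 3, 4)

// Side-effecting version from above: depends on sequential, in-order evaluation.
var sum = 0
val sums1 = for (x <- xs) yield { sum += x; sum }

// scanLeft version: no shared state, same result.
// scanLeft(0)(_ + _) produces List(0, 1, 3, 6, 10); tail drops the seed.
val sums2 = xs.scanLeft(0)(_ + _).tail

// Both are List(1, 3, 6, 10)
```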

 That is, if we introduce pforeach I guarantee that the next person will immediately demand that we also make map, flatMap, withFilter sequential and have pmap, pflatMap, pWithFilter. So muddling the waters would lead us down a slippery slide.
 
Cheers

 -- Martin


ScottC 2
Joined: 2011-01-30,
User offline. Last seen 42 years 45 weeks ago.
Re: REPL: null-values in parallel ranges?

New to Scala. I expect the below suggestion won't be well liked. But
maybe it provokes some better ideas:

If foreach is split in two, a sequential foreach and a parallel
parforeach, for expressions may be limited in their parallel potential
unless the compiler/generator is too 'magical' for its own good.

If parallelism is meant to be a new fundamental aspect in the
language, I think it needs to be exposed in more than a method name.

you can write:

for (arg <- args)
println(arg)

and your body will have guaranteed sequential execution, and you can
write the below to allow parallel execution but not force it:

pfor (arg <- args)
println(arg)

My experience with concurrent programming suggests that there will be
many times a user wishes to guarantee sequential execution, and others
where they want to get as much parallelism as possible. This is
both for users and library writers. Sometimes as a user, I don't
want to eat up multiple threads, even if it is faster in isolation.
Sometimes sequential execution has higher throughput in aggregate with
other work (but higher latency). Other times, parallelism is
essential.

The above addition to for comprehensions makes the choice very
explicit. How fundamental should this be? I have no idea, but if
concurrency is going to be at the heart of the language then a
language construct might make sense. Underneath the covers, the two
generators could do many things slightly differently -- map and
flatMap as well as foreach.
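For what it's worth, something close to pfor can be had as a plain library method under the 2.9 design, without a new keyword (a sketch; pfor is a made-up name):

```scala
// Hypothetical helper: an explicit opt-in to parallel traversal.
def pfor[A](xs: Iterable[A])(body: A => Unit): Unit =
  xs.par.foreach(body)   // body may run concurrently; it must be thread-safe

// Sequential, guaranteed:
for (arg <- List("a", "b", "c")) println(arg)

// Explicitly allowed to be parallel:
pfor(List("a", "b", "c")) { arg => println(arg) }
```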

On Apr 4, 1:12 am, Bill Venners wrote:
> Hi Aleksandar,
>
> On Mon, Apr 4, 2011 at 12:51 AM, Aleksandar Prokopec
>
>
>
>
>
>
>
> wrote:
> > On 02/04/11 23:02, Jonas Bonér wrote:
>
> >> I don't think parallel collections should "lie" and provide a leaky abstraction over concurrency. When someone asks for parallel execution I think it is best that they get what they are asking for. Sure it might break sequential code, but threads do that; they turn determinism into non-determinism. Let's embrace parallelism at its core and reap the benefits of it rather than trying to force it into something that it is not. It is a matter of education; documentation needs to be a lot better. People need to understand that concurrency and parallelism are here to stay and a lot of the time it still ain't that pretty.
>
> > I agree that people should get used to the fact that parallelism is
> > out there. The only downside here is that if one really wants to be
> > sure the code always executes correctly, a user has to write:
>
> > for (x <- xs.seq) ...
>
> > all the time.
>
> > The problem is that someone might not know he's asking for parallel
> > execution.
>
> >> ...
> >> def oldMethod(seq: Seq[String]) { for (e <- seq) println(e) }
>
> >> But new code that really wants parallel execution could just do .par
> >> on that Seq[T], which would somehow kick it into parallel execution.
>
> >> def newMethod(pSeq: Seq[String] with Parallel) { for (e <- pSeq) println(e) }
>
> >> By saying the method takes a Seq[String] with Parallel, it is
> >> announcing that it might do foreaches in parallel.
>
> >> I think it would be fine to let flatMap, filter, withFilter, map be
> >> rewritten in the same way, because most times people wouldn't have
> >> side effects in those. But foreach is all about the side effects.
>
> >> Bill
>
> > I agree something like a compiler rewrite might possibly be a good
> > idea.
> > However, adding an additional `pareach` method means for-
> > comprehensions no longer work.
>
> Actually after sleeping on it last night, I emailed a different note
> this morning in which I changed my tune. I suggested the rules for
> translating for expressions to invocations of foreach, withFilter,
> map, and flatMap should stay the same as they are now. For expressions
> are about more than just collections.
>
> Now to your suggestion that adding a separate pareach method would
> make for expressions not work, I'd say they still work like they
> always did, but just that foreach is not something that will be done
> in parallel on parallel collections, whereas map, flatMap, and
> withFilter (and a bunch of other higher order methods on collections)
> will be done in parallel on parallel collections. That would mean you
> could never get a parallel foreach out of a for expression, and I
> don't think that's so bad. A parallel foreach requires a thread-safe
> side-effecting function, which is hard to write. So making the method
> name long and (to some) ugly, such as parallelForeach or
> foreachParallel, discourages its use for frivolous endeavors, and so
> does the lack of a way to get a parallel foreach out of a for
> expression.
>
> Bill
>
>
>
>
>
>
>
>
>
> > A couple of alternatives I see are:
> > - have `foreach` being sequential, and add a method `inPar` to
> > parallel collections, which returns a thin wrapper around the
> > collection such that its `foreach` is executed in parallel (forwards
> > to `pareach`). The compiler could then rewrite the for-comprehension
> > like this if it detects that the generator isInstanceOf[Parallel] at
> > compile time:
>
> > for (x <- xs) { ... }
>
> > <>
>
> > --> for (x <- xs.inPar) { ... }
>
> > (now it calls `foreach` of the thin wrapper which forwards to
> > `pareach`)
>
> > This does not solve the problem with map, flatMap, filter, and
> > friends, but it is a start - something like this could work for them.
>
> > - have a compiler rewrite rule which adds a `.seq` call to the
> > generator if it knows it is a collection, but not an instance of
> > `Parallel` at compile time:
>
> > for (x <- Iterable(1, 2, 3)) { ... } becomes for (x <- Iterable(1, 2,
> > 3).seq)
>
> > but:
>
> > for (x <- Iterable(1, 2, 3).par) { ... } stays what it is
>
> > - a non-rewriting approach - have a parallel block which envelops the
> > for-comprehension, ensuring that everything inside executes in
> > parallel:
>
> > parallel {
> >  for ( ... ) {
> >    // ...
> >  }
> > }
>
> > This requires the user to be more explicit. The downside is that
> > nested methods called in the for-body might be affected.
>
> --
> Bill Venners
> > Artima, Inc. http://www.artima.com

extempore
Joined: 2008-12-17,
User offline. Last seen 35 weeks 3 days ago.
Re: Re: REPL: null-values in parallel ranges?

An implicit with a default which communicates what kinds of
auto-parallelizing are acceptable seems like it could have something for
everyone, even if it's kind of scary to contemplate. Scariness aside,
the idea of guiding for-comprehension desugaring with my choice of
imports[1][2] is pretty awesome.

[1] Though not doable as described anytime soon, because right now the
compiler translates for comprehensions during parsing and has no idea
what implicits are applicable.

[2] Also not doable anytime soon because martin is having a mild heart
attack reading this idea.
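Nothing like this exists today, but the spirit of the idea can at least be mimicked in library code, with an implicit strategy value chosen by an import rather than by changing the desugaring (all names below are invented):

```scala
import scala.collection.GenSeq

// Invented typeclass: decides how a sequence is traversed.
trait ExecStrategy {
  def prepare[A](xs: Seq[A]): GenSeq[A]
}
object Sequentially extends ExecStrategy {
  def prepare[A](xs: Seq[A]): GenSeq[A] = xs
}
object InParallel extends ExecStrategy {
  def prepare[A](xs: Seq[A]): GenSeq[A] = xs.par
}

def traverse[A](xs: Seq[A])(body: A => Unit)(implicit s: ExecStrategy): Unit =
  s.prepare(xs).foreach(body)

// The value in scope picks the behavior; swap it to go parallel.
implicit val strategy: ExecStrategy = Sequentially
```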

Bill Venners
Joined: 2008-12-18,
User offline. Last seen 31 weeks 5 days ago.
Re: Re: REPL: null-values in parallel ranges?

Hi Martin,

On Mon, Apr 4, 2011 at 9:19 AM, martin odersky wrote:
>
>
> On Mon, Apr 4, 2011 at 5:33 PM, Bill Venners wrote:
>>
>> Hi All,
>>
>> I'd like to take a step back and state two meta-goals I'd hope for
>> about the introduction of parallel collections to Scala. One is that
>> great pain should be taken to avoid breaking existing code, but that
>> it is OK if some existing code breaks (so long as it isn't much) if
>> that's unavoidable to do things the "right way." The other is parallel
>> collections should be done the "right way." I think it is early enough
>> in Scala's adoption curve to do things the right way, and the feeling
>> I get from the Scala community is that's what people by and large
>> would want at this point (so long as it doesn't break much code).
>>
>> I think the right way is that types Traversable, Iterable, Seq, Map,
>> and Set should allow sequential or parallel implementations. These are
>> the types I want to put in my method signatures, and I think the
>> default for Scala should be that parallel execution of the
>> higher-order methods is possible. This is what the future needs
>> because of multi-core.
>>
>
> If we were starting out, I'd agree with you. But the fact is that far too
> much code exists already that treats Seq, Iterable and friends as
> sequential. I believe we can't change that contract anymore. So we need to
> live with the naming convention that, e.g., Iterable is always sequential
> whereas ParIterable can be processed in parallel. I wish I could change
> that, but believe it's too late for that.
>
I wonder if it would help if instead of naming the types like
"scala.collection.ParMap" they were named like
"scala.collection.par.Map." The problem I have with these names as
Aleksandar described them is that they seem non-intuitive, and that's
un-scala-like. scala.collection.Map sounds like the general concept,
whereas scala.collection.ParMap sounds like a specialization, and
moreover the specialization it sounds like is "parallel Map". But Map
means sequential map and ParMap means possibly parallel Map, but could
be sequential. From the Liskov perspective, it does make sense that
Map extends ParMap, but from the English perspective it does not. All
RipeBananas are Bananas, but not all Bananas are RipeBananas. With Map
and ParMap it's the opposite. And the type of a definitely parallel Map
I'm guessing would be ParMap with Parallel, which seems redundant and
misleading.

If it is par.Map instead of ParMap, then it sounds more natural from
an English perspective: like this kind of Map is a subtype of that
kind of Map. Also I can import scala.collection._ and use Map and
par.Map in my code, kind of like we can use mutable.Map and
immutable.Map. If I import scala.collection.par._, and I want to say a
definitely parallel Map, I simply write Map with Parallel, which looks
nicer. Actually I'm not sure it's important to be able to write a
definitely parallel Map, but it seems like if I want to use these for
performance optimization, I'd want to be sure I'm actually using a
parallel one. If this were really useful, would it make sense to also
declare scala.collection.par.ParallelMap to mean definitely parallel
map?
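The analogy with today's package-based naming already works; the par package itself is hypothetical (2.9 uses the ParMap-style names instead):

```scala
import scala.collection._

// This already works: the package disambiguates the kind of Map.
val m: mutable.Map[String, Int]   = mutable.Map("a" -> 1)
val i: immutable.Map[String, Int] = immutable.Map("a" -> 1)

// Under the proposal, a possibly-parallel Map would read the same way:
// val p: par.Map[String, Int] = ...   // hypothetical
```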

The justification given for scala.collection.Map being the default
type can be that sequential collections are much simpler and safer to
use, so they should be the default. (Another big reason is backwards
compatibility, but even without that, I've come around to the view
that sequential should be the starting point for the reasons others on
this list have argued.)

Another concern that has been raised is the problem of type inference.
If I call into a method that returns a Map and place the result
in a variable with no explicit type declaration, and later that
method gets changed to return a ParMap (or par.Map), my code will
likely still compile, because the APIs are the same. This is
essentially an unintentional upcast to a possibly parallel
Map, which would compile but possibly create those scary bugs we're
worried about. I wonder if the compiler could optionally flag a
warning if a variable is being inferred to a possibly parallel type.
The way to get rid of the warning, other than telling the compiler not
to give it with a compiler option, would be to declare the type
explicitly in the code. That would make it more obvious to the writer
and later readers of the code that it's possibly parallel.

>> Sequential should be the special case, not parallel, so the marker
>> trait should probably be Sequential, not Parallel. Thus if I need a
>> sequential map to be passed to my method, I'd put Map with Sequential.
>>
>> But that would and should be rare. Usually I'd just say I want a Map,
>> and write it to allow parallel or sequential collections to be passed.
>> And I think Scala makes that easy, so long as foreach is not run in
>> parallel, no matter what type of collection comes in.
>>
>> Foreach's contract in Traversable should be tightened to say it will
>> run sequentially and a pforeach added.
>
> But that's what I think is the wrong way. The collections design explicitly
> says that to get parallel operations, you write cs.par, to get sequential
> ones you write pxs.seq. It does not introduce any new variants of existing
> operations that are parallel. Making foreach sequential and introducing
> pforeach is like admitting that this design works only 95% of the time. And
> it does not exclude errors because people do use side effects in for
> expressions as well as in for loops.
> For instance:
>
>   var sum = 0
>   for (x <- xs) yield { sum += x; sum }
>
> (Yes, I know about scanLeft, but many people will use the for expression
> instead).
>
>  That is, if we introduce pforeach I guarantee that the next person will
> immediately demand that we also make map, flatMap, withFilter sequential and
> have pmap, pflatMap, pWithFilter. So muddling the waters would lead us down
> a slippery slide.
>
Water slides can be fun, but as BDFL you can also just say no. And I
think you'd hear other voices from the community that would back you
up in your no to pmap. I definitely would not want to see a pmap, even
though I would like to see a pforeach accompanying a sequential
foreach.

Bill

> Cheers
>
>  -- Martin
>
>

H-star Development
Joined: 2010-04-14,
User offline. Last seen 2 years 26 weeks ago.
Re: Re: REPL: null-values in parallel ranges?

In Scala, using the language itself is preferred to introducing new keywords.

-------- Original Message --------
> Date: Mon, 4 Apr 2011 21:05:15 -0700 (PDT)
> From: ScottC
> To: scala-user
> Subject: [scala-user] Re: REPL: null-values in parallel ranges?

> New to Scala. I expect the below suggestion won't be well liked. But
> maybe it provokes some better ideas:
>
> If foreach is split in two, a sequential foreach and a parallel
> parforeach, for expressions may be limited in their parallel potential
> unless the compiler/generator is too 'magical' for its own good.
>
> If parallelism is meant to be a new fundamental aspect in the
> language, I think it needs to be exposed in more than a method name.
>
> you can write:
>
> for (arg <- args)
> println(arg)
>
> and your body will have guaranteed sequential execution, and you can
> write the below to allow parallel execution but not force it:
>
> pfor (arg <- args)
> println(arg)
>
> My experience with concurrent programming suggests that there will be
> many times a user wishes to guarantee sequential execution, and others
> where they want to get as much parallelism as possible. This is
> both for users and library writers. Sometimes as a user, I don't
> want to eat up multiple threads, even if it is faster in isolation.
> Sometimes sequential execution has higher throughput in aggregate with
> other work (but higher latency). Other times, parallelism is
> essential.
>
> The above addition to for comprehensions makes the choice very
> explicit. How fundamental should this be? I have no idea, but if
> concurrency is going to be at the heart of the language then a
> language construct might make sense. Underneath the covers, the two
> generators could do many things slightly differently -- map and
> flatMap as well as foreach.
>
>
> On Apr 4, 1:12 am, Bill Venners wrote:
> > Hi Aleksandar,
> >
> > On Mon, Apr 4, 2011 at 12:51 AM, Aleksandar Prokopec
> >
> >
> >
> >
> >
> >
> >
> > wrote:
> > > On 02/04/11 23:02, Jonas Bonér wrote:
> >
> > >> I don't think parallel collections should "lie" and provide a leaky
> abstraction over concurrency. When someone asks for parallel execution I
> think it is best that they get what they are asking for. Sure it might break
> sequential code, but threads do that; they turn determinism into
> non-determinism. Let's embrace parallelism at its core and reap the benefits of it
> rather than trying to force it into something that it is not. It is a matter
> of education; documentation needs to be a lot better. People need to
> understand that concurrency and parallelism are here to stay and a lot of the
> time it still ain't that pretty.
> >
> > > I agree that people should get used to the fact that parallelism is
> > > out there. The only downside here is that if one really wants to be
> > > sure the code always executes correctly, a user has to write:
> >
> > > for (x <- xs.seq) ...
> >
> > > all the time.
> >
> > > The problem is that someone might not know he's asking for parallel
> > > execution.
> >
> > >> ...
> > >> def oldMethod(seq: Seq[String]) { for (e <- seq) println(e) }
> >
> > >> But new code that really wants parallel execution could just do .par
> > >> on that Seq[T], which would somehow kick it into parallel execution.
> >
> > >> def newMethod(pSeq: Seq[String] with Parallel) { for (e <- pSeq)
> println(e) }
> >
> > >> By saying the method takes a Seq[String] with Parallel, it is
> > >> announcing that it might do foreaches in parallel.
> >
> > >> I think it would be fine to let flatMap, filter, withFilter, map be
> > >> rewritten in the same way, because most times people wouldn't have
> > >> side effects in those. But foreach is all about the side effects.
> >
> > >> Bill
> >
> > > I agree something like a compiler rewrite might possibly be a good
> > > idea.
> > > However, adding an additional `pareach` method means for-
> > > comprehensions no longer work.
> >
> > Actually after sleeping on it last night, I emailed a different note
> > this morning in which I changed my tune. I suggested the rules for
> > translating for expressions to invocations of foreach, withFilter,
> > map, and flatMap should stay the same as they are now. For expressions
> > are about more than just collections.
> >
> > Now to your suggestion that adding a separate pareach method would
> > make for expressions not work, I'd say they still work like they
> > always did, but just that foreach is not something that will be done
> > in parallel on parallel collections, whereas map, flatMap, and
> > withFilter (and a bunch of other higher order methods on collections)
> > will be done in parallel on parallel collections. That would mean you
> > could never get a parallel foreach out of a for expression, and I
> > don't think that's so bad. A parallel foreach requires a thread-safe
> > side-effecting function, which is hard to write. So making the method
> > name long and (to some) ugly, such as parallelForeach or
> > foreachParallel, discourages its use for frivolous endeavors, and so
> > does not providing a way to get a parallel foreach out of a for
> > expression.
> >
> > Bill
> >
> >
> >
> >
> >
> >
> >
> >
> >
> > > A couple of alternatives I see are:
> > > - have `foreach` being sequential, and add a method `inPar` to
> > > parallel collections, which returns a thin wrapper around the
> > > collection such that its `foreach` is executed in parallel (forwards
> > > to `pareach`). The compiler could then rewrite the for-comprehension
> > > like this if it detects that the generator isInstanceOf[Parallel] at
> > > compile time:
> >
> > > for (x <- xs) { ... }
> >
> > > <>
> >
> > > --> for (x <- xs.inPar) { ... }
> >
> > > (now it calls `foreach` of the thin wrapper which forwards to
> > > `pareach`)
> >
> > > This does not solve the problem with map, flatMap, filter, and
> > > friends, but it is a start - something like this could work for them.
> >
> > > - have a compiler rewrite rule which adds a `.seq` call to the
> > > generator if it knows it is a collection, but not an instance of
> > > `Parallel` at compile time:
> >
> > > for (x <- Iterable(1, 2, 3)) { ... } becomes for (x <- Iterable(1, 2,
> > > 3).seq)
> >
> > > but:
> >
> > > for (x <- Iterable(1, 2, 3).par) { ... } stays what it is
> >
> > > - a non-rewriting approach - have a parallel block which envelops the
> > > for-comprehension, ensuring that everything inside executes in
> > > parallel:
> >
> > > parallel {
> > >  for ( ... ) {
> > >    // ...
> > >  }
> > > }
> >
> > > This requires the user to be more explicit. The downside is that
> > > nested methods called in the for-body might be affected.
> >
> > --
> > Bill Venners
> > Artima, Inc. http://www.artima.com

Aleksandar Prokopec
Joined: 2010-04-04,
User offline. Last seen 42 years 45 weeks ago.
Re: REPL: null-values in parallel ranges?

I agree that the name ParMap may be somewhat misleading as a supertype
of Map. However, naming is orthogonal to the class hierarchy changes.
Once we decide what the most appropriate names are, we can easily
rename the types.

As for the compiler warning that a return type is now a parallel
collection, I believe a library designer could use migration warnings
in these cases, or a similar mechanism, along with explicit type
annotations on the client's part.

On Apr 5, 7:39 am, Bill Venners wrote:
> Hi Martin,
>
>
>
> On Mon, Apr 4, 2011 at 9:19 AM, martin odersky wrote:
>
> > On Mon, Apr 4, 2011 at 5:33 PM, Bill Venners wrote:
>
> >> Hi All,
>
> >> I'd like to take a step back and state two meta-goals I'd hope for
> >> about the introduction of parallel collections to Scala. One is that
> >> great pain should be taken to avoid breaking existing code, but that
> >> it is OK if some existing code breaks (so long as it isn't much) if
> >> that's unavoidable to do things the "right way." The other is parallel
> >> collections should be done the "right way." I think it is early enough
> >> in Scala's adoption curve to do things the right way, and the feeling
> >> I get from the Scala community is that's what people by and large
> >> would want at this point (so long as it doesn't break much code).
>
> >> I think the right way is that types Traversable, Iterable, Seq, Map,
> >> and Set should allow sequential or parallel implementations. These are
> >> the types I want to put in my method signatures, and I think the
> >> default for Scala should be that parallel execution of the
> >> higher-order methods is possible. This is what the future needs
> >> because of multi-core.
>
> > If we were starting out, I'd agree with you. But the fact is that far too
> > much code exists already that treats Seq, Iterable and friends as
> > sequential. I believe we can't change that contract anymore. So we need to
> > live with the naming convention that, e.g., Iterable is always sequential
> > whereas ParIterable can be processed in parallel. I wish I could change
> > that, but believe it's too late for that.
>
> I wonder if it would help if instead of naming the types like
> "scala.collection.ParMap" they were named like
> "scala.collection.par.Map." The problem I have with these names as
> Aleksandar described them is that they seem non-intuitive, and that's
> un-scala-like. scala.collection.Map sounds like the general concept,
> whereas scala.collection.ParMap sounds like a specialization, and
> moreover the specialization it sounds like is "parallel Map". But Map
> means sequential map and ParMap means possibly parallel Map, but could
> be sequential. From the Liskov perspective, it does make sense that
> Map extends ParMap, but from the English perspective it does not. All
> RipeBananas are Bananas, but not all Bananas are RipeBananas. With Map
> and ParMap it's the opposite. And the type of a definitely parallel Map
> I'm guessing would be ParMap with Parallel, which seems redundant and
> misleading.
>
> If it is par.Map instead of ParMap, then it sounds more natural from
> an English perspective: like this kind of Map is a subtype of that
> kind of Map. Also I can import scala.collection._ and use Map and
> par.Map in my code, kind of like we can use mutable.Map and
> immutable.Map. If I import scala.collection.par._, and I want to say a
> definitely parallel Map, I simply write Map with Parallel, which looks
> nicer. Actually I'm not sure it's important to be able to write a
> definitely parallel Map, but it seems like if I want to use these for
> performance optimization, I'd want to be sure I'm actually using a
> parallel one. If this were really useful, would it make sense to also
> declare scala.collection.par.ParallelMap to mean definitely parallel
> map?
>
> The justification given for scala.collection.Map being the default
> type can be that sequential collections are much simpler and safer to
> use, so they should be the default. (Another big reason is backwards
> compatibility, but even without that, I've come around to the view
> that sequential should be the starting point for the reasons others on
> this list have argued.)
>
> Another concern that has been raised is the problem of type inference.
> If I call into a method that returns a Map and place the result
> in a variable with no explicit type declaration, and later that
> method gets changed to return a ParMap (or par.Map), my code will
> likely still compile, because the APIs are the same. This is
> essentially an unintentional upcast to a possibly parallel
> Map, which would compile but possibly create those scary bugs we're
> worried about. I wonder if the compiler could optionally flag a
> warning if a variable is being inferred to a possibly parallel type.
> The way to get rid of the warning, other than telling the compiler not
> to give it with a compiler option, would be to declare the type
> explicitly in the code. That would make it more obvious to the writer
> and later readers of the code that it's possibly parallel.
>
>
>
> >> Sequential should be the special case, not parallel, so the marker
> >> trait should probably be Sequential, not Parallel. Thus if I need a
> >> sequential map to be passed to my method, I'd put Map with Sequential.
>
> >> But that would and should be rare. Usually I'd just say I want a Map,
> >> and write it to make allow parallel or sequential collections passed.
> >> And I think Scala makes that easy, so long as foreach is not run in
> >> parallel, not matter what type of collection comes in.
>
> >> Foreach's contract in Traversable should be tightened to say it will
> >> run sequentially and a pforeach added.
>
> > But that's what I think is the wrong way. The collections design explicitly
> > says that to get parallel operations, you write cs.par, to get sequential
> > ones you write pxs.seq. It does not introduce any new variants of existing
> > operations that are parallel. Making foreach sequential and introducing
> > pforeach is like admitting that this design works only 95% of the time. And
> > it does not exclude errors because people do use side effects in for
> > expressions as well as in for loops.
> > For instance:
>
> >   var sum = 0
> >   for (x <- xs) yield { sum += x; sum }
>
> > (Yes, I know about scanLeft, but many people will use the for expression
> > instead).
>
> >  That is, if we introduce pforeach I guarantee that the next person will
> > immediately demand that we also make map, flatMap, withFilter sequential and
> > have pmap, pflatMap, pWithFilter. So muddling the waters would lead us down
> > a slippery slide.
>
> Water slides can be fun, but as BDFL you can also just say no. And I
> think you'd hear other voices from the community that would back you
> up in your no to pmap. I definitely would not want to see a pmap, even
> though I would like to see a pforeach accompanying a sequential
> foreach.
>
> Bill
>
> > Cheers
>
> >  -- Martin
>
> --
> Bill Venners
> Artima, Inc. http://www.artima.com

Tony Morris 2
Joined: 2009-03-20,
User offline. Last seen 42 years 45 weeks ago.
Re: Re: REPL: null-values in parallel ranges?

...to some extent.

The following can easily be implemented in the language as a library,
but are not:
* if/else
* try/catch
* finally
* while
* for, to some degree (it would be much more useful to alter the last
call to flatMap in a comprehension and not map).
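Tony's point illustrated: while can be expressed as an ordinary method using by-name parameters (the classic example; whileLoop is not a library method):

```scala
// By-name parameters delay evaluation, so cond and body
// are re-evaluated on each iteration, just like the built-in while.
def whileLoop(cond: => Boolean)(body: => Unit): Unit =
  if (cond) { body; whileLoop(cond)(body) }

var i = 0
whileLoop(i < 3) {
  println(i)   // prints 0, 1, 2
  i += 1
}
```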

On 05/04/11 16:53, Dennis Haupt wrote:
> In Scala, using the language itself is preferred to introducing new keywords.
>
>
> -------- Original Message --------
>> Date: Mon, 4 Apr 2011 21:05:15 -0700 (PDT)
>> From: ScottC
>> To: scala-user
>> Subject: [scala-user] Re: REPL: null-values in parallel ranges?
>> New to Scala. I expect the below suggestion won't be well liked. But
>> maybe it provokes some better ideas:
>>
>> If foreach is split in two, a sequential foreach and a parallel
>> parforeach, for expressions may be limited in their parallel potential
>> unless the compiler/generator is too 'magical' for its own good.
>>
>> If parallelism is meant to be a new fundamental aspect in the
>> language, I think it needs to be exposed in more than a method name.
>>
>> you can write:
>>
>> for (arg <- args)
>> println(arg)
>>
>> and your body will have guaranteed sequential execution, and you can
>> write the below to allow parallel execution but not force it:
>>
>> pfor (arg <- args)
>> println(arg)
>>
>> My experience with concurrent programming suggests that there will be
>> many times a user wishes to guarantee sequential execution, and others
>> where they want to get as much parallelism as possible. This is
>> both for users and library writers. Sometimes as a user, I don't
>> want to eat up multiple threads, even if it is faster in isolation.
>> Sometimes sequential execution has higher throughput in aggregate with
>> other work (but higher latency). Other times, parallelism is
>> essential.
>>
>> The above addition to for comprehensions makes the choice very
>> explicit. How fundamental should this be? I have no idea, but if
>> concurrency is going to be at the heart of the language then a
>> language construct might make sense. Underneath the covers, the two
>> generators could do many things slightly differently -- map and
>> flatMap as well as foreach.
>>
>>
>> On Apr 4, 1:12 am, Bill Venners wrote:
>>> Hi Aleksandar,
>>>
>>> On Mon, Apr 4, 2011 at 12:51 AM, Aleksandar Prokopec
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> wrote:
>>>> On 02/04/11 23:02, Jonas Bonér wrote:
>>>>> I don't think parallel collections should "lie" and provide a leaky
>> abstraction over concurrency. When someone asks for parallel execution I
>> think it is best that they get what they are asking for. Sure it might break
>> sequential code, but threads does that; they turn determinism into
>> non-determinism. Let's embrace parallelism at its core and reap the benefits of it
>> rather than trying to force it into something that it is not. It is a matter
>> of education; documentation needs to be a lot better. People needs to
>> understand that concurrency and parallelism is here to stay and a lot of the
>> time it still ain't that pretty.
>>>> I agree that people should get used to the fact that parallelism is
>>>> out there. The only downside here is that if one really wants to be
>>>> sure the code always executes correctly, a user has to write:
>>>> for (x <- xs.seq) ...
>>>> all the time.
>>>> The problem is that someone might not know he's asking for parallel
>>>> execution.
>>>>> ...
>>>>> def oldMethod(seq: Seq[String]) { for (e <- seq) println(e) }
>>>>> But new code that really wants parallel execution could just do .par
>>>>> on that Seq[T], which would somehow kick it into parallel execution.
>>>>> def newMethod(pSeq: Seq[String] with Parallel) { for (e <- pSeq)
>> println(e) }
>>>>> By saying the method takes a Seq[String] with Parallel, it is
>>>>> announcing that it might do foreaches in parallel.
>>>>> I think it would be fine to let flatMap, filter, withFilter, map be
>>>>> rewritten in the same way, because most times people wouldn't have
>>>>> side effects in those. But foreach is all about the side effects.
>>>>> Bill
>>>> I agree something like a compiler rewrite might possibly be a good
>>>> idea.
>>>> However, adding an additional `pareach` method means for-
>>>> comprehensions no longer work.
>>> Actually after sleeping on it last night, I emailed a different note
>>> this morning in which I changed my tune. I suggested the rules for
>>> translating for expressions to invocations of foreach, withFilter,
>>> map, and flatMap should stay the same as they are now. For expressions
>>> are about more than just collections.
>>>
>>> Now to your suggestion that adding a separate pareach method would
>>> make for expressions not work, I'd say they still work like they
>>> always did, but just that foreach is not something that will be done
>>> in parallel in parallel collections, whereas map, flatMap, and
>>> withFilter (and a bunch of other higher order methods on collections)
>>> will be done in parallel on parallel collections. That would mean you
>>> could never get a parallel foreach out of a for expression, and I
>>> don't think that's so bad. A parallel foreach requires a thread-safe
>>> side-effecting function, which is hard to write. So making the method
>>> name long and (to some) ugly, such as parallelForeach or
>>> foreachParallel, discourages its use for frivolous endeavors, and so
>>> does not providing a way to get a parallel foreach out of a for
>>> expression.
>>>
>>> Bill
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>> A couple of alternatives I see are:
>>>> - have `foreach` being sequential, and add a method `inPar` to
>>>> parallel collections, which returns a thin wrapper around the
>>>> collection such that its `foreach` is executed in parallel (forwards
>>>> to `pareach`). The compiler could then rewrite the for-comprehension
>>>> like this if it detects that the generator isInstanceOf[Parallel] at
>>>> compile time:
>>>> for (x <- xs) { ... }
>>>> <<if xs is <: Parallel at compile time >>
>>>> --> for (x <- xs.inPar) { ... }
>>>> (now it calls `foreach` of the thin wrapper which forwards to
>>>> `pareach`)
>>>> This does not solve the problem with map, flatMap, filter, and
>>>> friends, but it is a start - something like this could work for them.
>>>> - have a compiler rewrite rule which adds a `.seq` call to the
>>>> generator if it knows it is a collection, but not an instance of
>>>> `Parallel` at compile time:
>>>> for (x <- Iterable(1, 2, 3)) { ... } becomes for (x <- Iterable(1, 2,
>>>> 3).seq)
>>>> but:
>>>> for (x <- Iterable(1, 2, 3).par) { ... } stays what it is
>>>> - a non-rewriting approach - have a parallel block which envelops the
>>>> for-comprehension, ensuring that everything inside executes in
>>>> parallel:
>>>> parallel {
>>>>  for ( ... ) {
>>>>    // ...
>>>>  }
>>>> }
>>>> This requires the user to be more explicit. The downside is that
>>>> nested methods called in the for-body might be affected.
>>> --
>>> Bill Venners
>>> Artima, Inc. http://www.artima.com

Viktor Klang
Joined: 2008-12-17,
User offline. Last seen 1 year 27 weeks ago.
Re: Re: REPL: null-values in parallel ranges?
This is just an idea, but is there value in a parallel-comprehension?

par(x <- xs) x.foo

On Tue, Apr 5, 2011 at 12:14 PM, Tony Morris <tonymorris [at] gmail [dot] com> wrote:





--
Viktor Klang,
Director of Research and Development
Scalable Solutions

Code:   github.com/viktorklang
Follow: twitter.com/viktorklang
Read:   klangism.tumblr.com

odersky
Joined: 2008-07-29,
User offline. Last seen 45 weeks 6 days ago.
Re: Re: REPL: null-values in parallel ranges?


On Tue, Apr 5, 2011 at 9:18 AM, Aleksandar Prokopec <aleksandar [dot] prokopec [at] gmail [dot] com> wrote:
I agree that the name ParMap may be somewhat misleading as a supertype
of Map. However, these are orthogonal to the class hierarchy changes.
Once we decide what is the most appropriate name, we can easily rename
the methods to something else.

I think we should find a different name than ParSeq, ParMap for the supertraits of Seq and Map. E.g., ParVector means Vector with a parallel implementation, whereas for Seq, Map we want to say it might be parallel but need not be, which is different. How about PMap, PSeq, ... instead, where the P could mean "Potentially Parallel"? Not sure about that either, but it seems an improvement over ParMap. Do people have other suggestions?

Cheers

 -- Martin

Peter C. Chapin 2
Joined: 2011-01-07,
User offline. Last seen 42 years 45 weeks ago.
Re: Re: REPL: null-values in parallel ranges?

On Tue, 5 Apr 2011, martin odersky wrote:

> I think we should find a different name than ParSeq, ParMap for the
> supertraits of Seq and Map. E.g., ParVector means Vector with a parallel
> implementation, whereas for Seq, Map we want to say it might be parallel
> but need not be, which is different. How about PMap, PSeq, ... instead.
> Where the P could mean "Potentially Parallel". Not sure about that either
> but it seems an improvement over ParMap. Do people have another
> suggestions?

Would some name involving "non deterministic" (or an abbreviation) make
sense? That doesn't seem quite right either although I suppose it would
depend on just what the semantic contract of the new traits becomes.

Peter

Viktor Klang
Joined: 2008-12-17,
User offline. Last seen 1 year 27 weeks ago.
Re: Re: REPL: null-values in parallel ranges?


On Tue, Apr 5, 2011 at 11:25 PM, martin odersky <martin [dot] odersky [at] epfl [dot] ch> wrote:



Parallelizable?
 





--
Viktor Klang,
Director of Research and Development
Scalable Solutions

Code:   github.com/viktorklang
Follow: twitter.com/viktorklang
Read:   klangism.tumblr.com

Bill Venners
Joined: 2008-12-18,
User offline. Last seen 31 weeks 5 days ago.
Re: Re: REPL: null-values in parallel ranges?

Hi Martin,

On Tue, Apr 5, 2011 at 2:25 PM, martin odersky wrote:
>
>
> On Tue, Apr 5, 2011 at 9:18 AM, Aleksandar Prokopec
> wrote:
>>
>> I agree that the name ParMap may be somewhat misleading as a supertype
>> of Map. However, these are orthogonal to the class hierarchy changes.
>> Once we decide what is the most appropriate name, we can easily rename
>> the methods to something else.
>>
> I think we should find a different name than ParSeq, ParMap for the
> supertraits of Seq and Map. E.g., ParVector means Vector with a parallel
> implementation, whereas for Seq, Map we want to say it might be parallel but
> need not be, which is different. How about PMap, PSeq, ... instead. Where
> the P could mean "Potentially Parallel". Not sure about that either but it
> seems an improvement over ParMap. Do people have another suggestions?
>
I'm not sure if you noticed my email from last night (if not, search
for RipeBananas). I suggested:

scala.collection.par.Map // possibly parallel map
scala.collection.par.ParMap // definitely parallel map, extends par.Map
scala.collection.Map // sequential map, extends par.Map

Not sure if you want a type that means definitely parallel map, or to
introduce another package name. But the concern I had about the
possibly parallel map having prefix like Par or P was that ParMap from
the English perspective sounds like it would extend Map, but it's the
other way around. The same would be true for PMap. I think if you put
whatever you come up with to mean possibly parallel as a subpackage
name that might help improve that.

Bill

> Cheers
>
>  -- Martin
>
>

Bill Venners
Joined: 2008-12-18,
User offline. Last seen 31 weeks 5 days ago.
Re: Re: REPL: null-values in parallel ranges - Seq.foreach acti

Hi Aleksandar,

I have an off-the-wall idea for parallel foreach. I do like the
consistency that letting foreach be parallel like all the other higher
order methods on parallel collections has, but I worry it is error
prone. I think it would be safer if by default foreach was sequential,
but there was some other way to get a parallel foreach if you want
one. pforeach is one way, but it feels inconsistent. The other problem
is pforeach can't be used from a for expression.

What if parallel collections by default did a sequential foreach, but
there was a method on parallel collections that you call to get
another parallel collection with everything the same but a parallel
foreach. So given a parallel array a, this would iterate sequentially:

for (i <- 1 until N)
  println(a(i))

But this would execute a parallel foreach:

for (i <- (1 until N).pfor)
  a(i) = a(i) * scale

Something like that. Sequential would still be the default on parallel
collections, which I think is frankly what people would really want
from foreach. It's simple to do the safe thing. But when you really
want a parallel foreach you can get it, and you don't have to tighten
the contract of foreach in Traversable or add pforeach to it. (Instead
you'd add a method like "pfor".) And you can use for expressions to
get parallel foreach.

Bill
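A rough sketch of what such a wrapper could look like: a thin view whose `foreach` runs the body concurrently, so a plain for expression over it becomes a parallel traversal. All names here (`pfor`, `PForView`, `sumTo`) are hypothetical, and a fixed thread pool stands in for whatever machinery a real implementation would use.

```scala
import java.util.concurrent.{Executors, TimeUnit}
import java.util.concurrent.atomic.AtomicInteger

// A thin view over a sequential collection whose foreach runs the
// body concurrently; for-expressions desugar to this foreach.
final class PForView[A](xs: Seq[A]) {
  def foreach(f: A => Unit): Unit = {
    val pool = Executors.newFixedThreadPool(4)
    try xs.foreach(x => pool.execute(() => f(x)))
    finally {
      pool.shutdown()
      pool.awaitTermination(1, TimeUnit.MINUTES) // wait for all bodies to finish
    }
  }
}

object PForDemo {
  implicit class PForSyntax[A](xs: Seq[A]) {
    def pfor: PForView[A] = new PForView[A](xs)
  }

  // The side effect below is commutative and atomic, so a parallel
  // traversal produces the same result as a sequential one.
  def sumTo(n: Int): Int = {
    val sum = new AtomicInteger(0)
    for (i <- (1 to n).pfor) sum.addAndGet(i)
    sum.get
  }

  def main(args: Array[String]): Unit = println(sumTo(100)) // 5050
}
```

The point of the sketch is only the shape of the API: the default `foreach` stays sequential, and parallelism is opted into per call site via `.pfor`.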

On Mon, Apr 4, 2011 at 3:41 AM, Aleksandar Prokopec
wrote:
> Just to recap:
>
> Also, it additionally implicitly promises that 2 elements will not be
> traversed at the same time.
>
> ParSeq's are special in the sense that while they have an order
> implied by element indices, they process them in no prespecified
> order, and concurrently.
> Various methods, such as firstIndexOf, take that ordering into account
> (the first element in the ordering is found).
>
> Alex
>
>
> On Apr 4, 12:35 pm, Jim Balter wrote:
>> And I should have included the rest of it:
>>
>> "If the class is not ordered, foreach can visit elements in different orders
>> for different runs (but it will keep the same order in the same run)."
>>
>> So `foreach` promises to be deterministic within a run of a program, even
>> when no order is defined on the collection.
>>

Seth Tisue
Joined: 2008-12-16,
User offline. Last seen 34 weeks 3 days ago.
Re: Re: REPL: null-values in parallel ranges?

>>>>> "martin" == martin odersky writes:

martin> I think we should find a different name than ParSeq, ParMap for
martin> the supertraits of Seq and Map. [...] Do people have another
martin> suggestions?

AnySeq, AnyMap

Bill Venners
Joined: 2008-12-18,
User offline. Last seen 31 weeks 5 days ago.
Re: Re: REPL: null-values in parallel ranges?

Hi Seth,

On Tue, Apr 5, 2011 at 4:10 PM, Seth Tisue wrote:
>>>>>> "martin" == martin odersky writes:
>
>  martin> I think we should find a different name than ParSeq, ParMap for
>  martin> the supertraits of Seq and Map. [...]  Do people have another
>  martin> suggestions?
>
> AnySeq, AnyMap
>
These are nice. I think I like them better than my suggestion. ParMap
could still be the definitely parallel map, if they are going to have
such a type. Both Map and ParMap could extend AnyMap, all members of
the same package.

Bill
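As a sketch of how that layering could look (only the names come from the thread; the trait bodies and `lookup` are invented for illustration and are not the real library hierarchy):

```scala
object NamingSketch {
  // The "possibly parallel" supertrait: callers that tolerate either
  // kind of implementation program against this one.
  trait AnyMap[K, V] { def get(k: K): Option[V] }
  trait SeqMap[K, V] extends AnyMap[K, V] // guaranteed sequential (the thread's `Map`)
  trait ParMap[K, V] extends AnyMap[K, V] // guaranteed parallel

  // An API that works with sequential and parallel maps alike.
  def lookup[K, V](m: AnyMap[K, V], key: K): Option[V] = m.get(key)

  def main(args: Array[String]): Unit = {
    val m = new SeqMap[String, Int] {
      def get(k: String): Option[Int] = if (k == "a") Some(1) else None
    }
    println(lookup(m, "a")) // Some(1)
  }
}
```

Reading it this way, `AnyMap` says nothing about evaluation order, while the subtraits each pin it down, which matches the English intuition Bill raises above.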

> --
> Seth Tisue | Northwestern University | http://tisue.net
> lead developer, NetLogo: http://ccl.northwestern.edu/netlogo/
>

Bernd Johannes
Joined: 2011-01-28,
User offline. Last seen 42 years 45 weeks ago.
Re: REPL: null-values in parallel ranges?

Am Mittwoch, 6. April 2011, 01:10:44 schrieb Seth Tisue:
> >>>>> "martin" == martin odersky writes:
> martin> I think we should find a different name than ParSeq, ParMap for
> martin> the supertraits of Seq and Map. [...] Do people have another
> martin> suggestions?
>
> AnySeq, AnyMap

+1
That one is good. Its intention is more obvious than PSeq and not as
misleading as ParSeq.

Greetings
Bernd

Naftoli Gugenheim
Joined: 2008-12-17,
User offline. Last seen 42 years 45 weeks ago.
Re: Re: REPL: null-values in parallel ranges?
I think someone had mentioned the word "scalable" as meaning "potentially parallel." Though it's a bit long.

2011/4/5 √iktor Ҡlang <viktor [dot] klang [at] gmail [dot] com>


On Tue, Apr 5, 2011 at 11:25 PM, martin odersky <martin [dot] odersky [at] epfl [dot] ch> wrote:


On Tue, Apr 5, 2011 at 9:18 AM, Aleksandar Prokopec <aleksandar [dot] prokopec [at] gmail [dot] com> wrote:
I agree that the name ParMap may be somewhat misleading as a supertype
of Map. However, these are orthogonal to the class hierarchy changes.
Once we decide what is the most appropriate name, we can easily rename
the methods to something else.

I think we should find a different name than ParSeq, ParMap for the supertraits of Seq and Map. E.g., ParVector means Vector with a parallel implementation, whereas for Seq, Map we want to say it might be parallel but need not be, which is different. How about PMap, PSeq, ... instead. Where the P could mean "Potentially Parallel". Not sure about that either but it seems an improvement over ParMap. Do people have another suggestions?

Parallelizable?
 

Cheers

 -- Martin




--
Viktor Klang,
Director of Research and Development
Scalable Solutions

Code:   github.com/viktorklang
Follow: twitter.com/viktorklang
Read:   klangism.tumblr.com


Aleksandar Prokopec
Joined: 2010-04-04,
User offline. Last seen 42 years 45 weeks ago.
Re: Re: REPL: null-values in parallel ranges?

+1

That sounds nice and familiar.

> martin> I think we should find a different name than ParSeq, ParMap for
> martin> the supertraits of Seq and Map. [...] Do people have another
> martin> suggestions?
>
> AnySeq, AnyMap
>

Aleksandar Prokopec
Joined: 2010-04-04,
User offline. Last seen 42 years 45 weeks ago.
Re: Re: REPL: null-values in parallel ranges - Seq.foreach acti
On 06/04/11 00:32, Bill Venners wrote:

Hi Bill,

A parallel foreach does make sense in quite a few situations, and it can be used within a `for`. For example:

val cmap = new java.util.concurrent.ConcurrentHashMap[Int, Int]

for (i <- (0 until N).par) { // this iterates over a ParRange
  cmap.put(i, i)
}

or:

@volatile var flag = false

for (i <- (0 until N).par) {
  if (check(i)) flag = true
}

In fact, any kind of properly synchronized side-effect access is ok, provided that the side-effect is commutative and associative. In the above example, that is the case. Having an AtomicInteger, AtomicDouble, ..., and incrementing it; adding a disjoint set of items to a properly synchronized set; removing a disjoint set of items to filter out some collection; doing a query and logging it somewhere within a synchronized block ... these are all examples where you want to traverse things in parallel.
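The AtomicInteger case can be made concrete with a small sketch. It uses plain threads rather than `.par`, purely to keep it self-contained, and `parallelSum` is an illustrative name, not library API; the point is that an atomic, commutative, associative update gives the same result under every interleaving.

```scala
import java.util.concurrent.atomic.AtomicInteger

object CommutativeSideEffect {
  // Sum the integers 0 until n from several threads at once.
  // AtomicInteger.addAndGet is an atomic update, and integer addition is
  // commutative and associative, so the result is interleaving-independent.
  def parallelSum(n: Int, nThreads: Int = 4): Int = {
    val sum = new AtomicInteger(0)
    val chunk = n / nThreads
    val threads = (0 until nThreads).map { t =>
      val lo = t * chunk
      val hi = if (t == nThreads - 1) n else lo + chunk
      new Thread(() => (lo until hi).foreach(i => sum.addAndGet(i)))
    }
    threads.foreach(_.start())
    threads.foreach(_.join())
    sum.get
  }

  def main(args: Array[String]): Unit =
    println(parallelSum(1000)) // always 499500
}
```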

The issue with a parallel foreach is not so much the issue of the out-of-order execution, as it is the issue of concurrent execution.
So, there is a very good use case for a parallel foreach - not all side-effects are exclusive with parallelism.

The problem, of course, as you've put it, is in the most common use-case. People have the mindset that foreach can be used with any kind of side-effecting closure.
I am still unsure whether this use case is common enough to introduce a non-uniformity which says that foreach behaves differently.

Alex

Maarten Hazewinkel
Joined: 2011-04-06,
User offline. Last seen 42 years 45 weeks ago.
Re: Re: REPL: null-values in parallel ranges?

> martin> I think we should find a different name than ParSeq, ParMap for
> martin> the supertraits of Seq and Map. [...] Do people have another
> martin> suggestions?
>
> AnySeq, AnyMap

I really like these names. Concise, meaningful and it reads nicely in the code as well.
+5

H-star Development
Joined: 2010-04-14,
User offline. Last seen 2 years 26 weeks ago.
Re: Re: REPL: null-values in parallel ranges?

it should be a name that a non-adept should understand and not be scared away. if i were new, i would wonder "what might a PMap be? and why is my perfectly normal map a pmap?"
but i don't have a better suggestion either :/

i assume we cannot name the single threaded collections S* and the parallel ones P*?


Bill Venners
Joined: 2008-12-18,
User offline. Last seen 31 weeks 5 days ago.
Re: Re: REPL: null-values in parallel ranges - Seq.foreach acti

Hi Alex,

On Wed, Apr 6, 2011 at 12:03 AM, Aleksandar Prokopec
wrote:
I'm not sure either. I have notions of how I'd like to use parallel
collections, but I don't know what I'll really think about how I want
to use them until I get some experience with them. Nor do I have any
idea how the community at large will end up wanting to use them.

One way I think people will want to use them is as performance
optimizations applied when there's a performance problem. In this use
case I think a parallel foreach like you have now is probably fine,
because the .par's would be applied very locally and rarely. This is
probably how I'll start out, by dipping my toe in by seeing if I can
improve performance issues.

But I have felt the promise of parallel collections is that they make
it much easier to write programs that take performance advantage of
multiple cores by using them more generally. Starting out I won't
change my Seq's to AnySeq's in type signatures. Until experience
proves that's the right thing to do I'd rather keep sequential types
the lingua franca of interfaces, and maybe we'll discover sequential is
the right way to do signatures. Don't know yet. But inside methods,
I'd like to try using parallel collections generally (a little bit at
a time, to see how it goes).

The reason is that (foreach excepted) I already program in a
thread-safe style when I call higher order methods on collections. I
didn't do it to be thread-safe, but just because that's how I think
about those methods. They are data transformers, and I pass in a
function that takes one kind of data and transforms it to another,
with no side effects. Or I pass in a predicate that evaluates input
and returns true or false, with no side effects. Now I may have in the
past written some existing code passed to a map, or in a for with
a yield, that has a side effect, because I wasn't really trying to
make sure I had no side effects before. It's possible, but if it
happened at all it didn't happen much. And I think it wouldn't be that
difficult for me to be careful not to have side effects in my
passed-in functions once I start using parallel collections more
generally (other than to foreach).

I do fairly regularly use for without yield (a for "loop"), because
sometimes I need to program side effects, and I want to avoid indexing
with while, or avoid recursion, etc. So I use a for loop, and in that
case, I'd prefer to not have to say .seq because my guess is that this
is the most common thing I'll want to do with a for loop. I.e., if
foreach is parallel like the other methods, the way it is in .RC1, I
can just do this when I want a sequential foreach on an otherwise
parallel collection:

for (e <- pList.seq)
  println(e)

That's not so terrible, but if a sequential foreach is what I want
to do most of the time, and a parallel foreach rarely, then I'd rather
have a way to get a type that has all higher order functions other
than foreach run in parallel, and foreach run sequentially, with a
method like .pfor I could call to get a fully parallel variant. And
the latter I would use more as a performance optimization:

@volatile var flag = false

// performance optimization
for (i <- (0 until N).par.pfor) {
  if (check(i)) flag = true
}
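`pfor` here is Bill's hypothetical method and does not exist in the library; the pattern itself, however, already works with the ordinary parallel foreach. A sketch with a made-up `check` predicate (and the 2.13+ module import noted in the code):

```scala
// On Scala 2.13+ the parallel collections are a separate module:
import scala.collection.parallel.CollectionConverters._

object ParallelFlag {
  // @volatile is only legal on a var, and the annotation requires a
  // field rather than a local variable.
  @volatile private var flag = false

  private def check(i: Int): Boolean = i % 7001 == 0 // made-up predicate

  def found(n: Int): Boolean = {
    flag = false
    // Every task only ever writes `true`, so concurrent writes cannot
    // conflict; @volatile makes the final value visible to the caller.
    for (i <- (0 until n).par)
      if (check(i)) flag = true
    flag
  }

  def main(args: Array[String]): Unit =
    println(found(10000)) // true: 0 and 7001 both satisfy check
}
```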

Otherwise, when I'm using parallel collections more generally,
whenever I do a for loop I'd need to either call .seq or use
java.util.concurrent, or volatile, etc. I think what I'm after as a
user is that I can program in the style I have been programming, just
being more careful to be sure I don't have side effects in functions
passed to non-foreach higher order functions, using parallel
collections instead of the regular ones. That also means that I don't
have to change how I think about for loops. I.e., I don't have to
worry about making those thread safe, unless I say something extra
like .pfor.

Bill
----
Bill Venners
Artima, Inc.
http://www.artima.com

Aleksandar Prokopec
Joined: 2010-04-04,
User offline. Last seen 42 years 45 weeks ago.
Re: Re: REPL: null-values in parallel ranges - Seq.foreach acti

Hi Bill,

These are good points about the uses of parallel collections.

Well, I guess the semantics of the `foreach` in parallel collections is
for the users to decide in the end.

From a uniformity and principle of least surprise point of view, I
think it's somewhat non-uniform.
From a use case point of view, you're right, `for` is mostly used for
side-effects.
From my point of view, this is a trivial refactoring, and it can be
changed anytime.

Cheers,
Alex


Bill Venners
Joined: 2008-12-18,
User offline. Last seen 31 weeks 5 days ago.
Re: Re: REPL: null-values in parallel ranges - Seq.foreach acti

Hi Alex,

One other way you could consider doing it is letting .par mean
parallel everything, including foreach, then offer an sfor method on
that to get a parallel collection with sequential foreach. So what I'd
have to say to get what (I think) I want is:

mySet.par.sfor

Bill

On Thu, Apr 7, 2011 at 1:22 AM, Aleksandar Prokopec
wrote:
> Hi Bill,
>
> These are good points about the uses of parallel collections.
>
> Well, I guess the semantics of the `foreach` in parallel collections is for
> the users to decide in the end.
>
> From a uniformity and principle of least surprise point of view, I think
> it's somewhat non-uniform.
> From a use case point of view, you're right, `for` is mostly used for
> side-effects.
> From my point of view, this is a trivial refactoring, and it can be changed
> anytime.
>
> Cheers,
> Alex
>
>
>> I'm not sure either. I have notions of how I'd like to use parallel
>> collections, but I don't know what I'll really think about how I want
>> to use them until I get some experience with them. Nor do I have any
>> idea how the community at large will end up wanting to use them.
>>
>> One way I think people will want to use them is as performance
>> optimizations applied when there's a performance problem. In this use
>> case I think a parallel foreach like you have now is probably fine,
>> because the .par's would be applied very locally and rarely. This is
>> probably how I'll start out, by dipping my toe in by seeing if I can
>> improve performance issues.
>>
>> But I have felt the promise of parallel collections is that they make
>> it much easier to write programs that take performance advantage of
>> multiple cores by using them more generally. Starting out I won't
>> change my Seq's to AnySeq's in type signatures. Until experience
>> proves that's the right thing to do I'd rather keep sequential types
>> the lingua franca of interfaces, and maybe we'll discover sequential
>> the right way to do signtatures. Don't know yet. But inside methods,
>> I'd like to try using parallel collections generally (a little bit at
>> a time, to see how it goes).
>>
>> The reason is that (foreach excepted) I already program in a
>> thread-safe style when I call higher order methods on collections. I
>> didn't do it to be thread-safe, but just because that's how I think
>> about those methods. They are data transformers, and I pass in a
>> function that takes one kind of data and transforms it to another,
>> with no side effects. Or I pass in a predicate that evaluates input
>> and returns true or false, with no side effects. Now I may have in the
>> past written some existing code passed to a map, or in a foreach with
>> a yield, that has a side effect, because I wasn't really trying to
>> make sure I had no side effects before. It's possible, but if it
>> happened at all it didn't happen much. And I think it wouldn't be that
>> difficult for me to be careful not to have side effects in my
>> passed-in functions once I start using parallel collections more
>> generally (other than to foreach).
>>
>> I do fairly regularly use for without yield (a for "loop"), because
>> sometimes I need to program side effects, and I want to avoid indexing
>> with while, or avoid recursion, etc. So I use a for loop, and in that
>> case, I'd prefer to not have to say .seq because my guess is that this
>> is the most common thing I'll want to do with a for loop. I.e., if
>> foreach is parallel like the other methods, the way it is in .RC1, I
>> can just do this when I want a sequential foreach on an otherwise
>> parallel collection:
>>
>> for (e<- pList.seq)
>>   println(e)
>>
>> That's not so terrible, but if a sequential foreach that's what I want
>> to do most of the time, and a parallel foreach rarely, then I'd rather
>> have a way to get a type that has all higher order functions other
>> than foreach run in parallel, and foreach run sequentially, with a
>> method like .pfor I could call to get a fully parallel variant. And
>> the latter I would use more as a performance optimization:
>>
>> @volatile val flag = false
>>
>> // performance optimization
>> for (i<- (0 until N).par.pfor) {
>>   if (check(i)) flag = true
>> }
>>
>> Otherwise, when I'm using parallel collections more generally,
>> whenever I do a for loop I'd need to either call .seq or use
>> java.util.concurrent, or volatile, etc. I think what I'm after as a
>> user is that I can program in the style I have been programming, just
>> being more careful to be sure I don't have side effects in functions
>> passed to non-foreach higher order functions, using parallel
>> collections instead of the regular ones. That also means that I don't
>> have to change how I think about for loops. I.e., I don't have to
>> worry about making those thread safe, unless I say something extra
>> like .pfor.
>>
>> Bill
>> ----
>> Bill Venners
>> Artima, Inc.
>> http://www.artima.com
>>
>
>

Aleksandar Prokopec
Joined: 2010-04-04,
User offline. Last seen 42 years 45 weeks ago.
Re: Re: REPL: null-values in parallel ranges - Seq.foreach acti

On 07/04/11 16:17, Bill Venners wrote:
> Hi Alex,
>
> One other way you could consider doing it is letting .par mean
> parallel everything, including foreach, then offer an sfor method on
> that to get a parallel collection with sequential foreach. So what I'd
> have to say to get what (I think) I want is:
>
> mySet.par.sfor
>
> Bill
>

Hi Bill,

That's certainly something I believe can be done, though it may require
a bit of plumbing to always have a proper return type of `sfor`.

Alex
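One hypothetical way that plumbing could look (none of these names are in the 2.9 API; this is a sketch of the idea only): a wrapper whose transformers stay parallel and re-wrap their result, while foreach delegates to the sequential view. Keeping the wrapper type through every transformer is exactly the return-type plumbing Aleksandar mentions.

```scala
import scala.collection.parallel.ParSeq

// Hypothetical result type of `sfor`: parallel methods, sequential foreach.
class SeqForeachSeq[A](val underlying: ParSeq[A]) {
  // Transformers run in parallel and re-wrap, preserving the property.
  def map[B](f: A => B): SeqForeachSeq[B] =
    new SeqForeachSeq(underlying.map(f))
  def filter(p: A => Boolean): SeqForeachSeq[A] =
    new SeqForeachSeq(underlying.filter(p))
  // foreach goes through the sequential view, so it runs in order.
  def foreach[U](f: A => U): Unit = underlying.seq.foreach(f)
}
```

A for loop without yield over such a wrapper desugars to its sequential foreach, which is the behavior Bill is after.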

Bill Venners
Joined: 2008-12-18,
User offline. Last seen 31 weeks 5 days ago.
Re: Re: REPL: null-values in parallel ranges - Seq.foreach acti

Hi Alex,

On Thu, Apr 7, 2011 at 8:50 AM, Aleksandar Prokopec
wrote:
> On 07/04/11 16:17, Bill Venners wrote:
>>
>> Hi Alex,
>>
>> One other way you could consider doing it is letting .par mean
>> parallel everything, including foreach, then offer an sfor method on
>> that to get a parallel collection with sequential foreach. So what I'd
>> have to say to get what (I think) I want is:
>>
>> mySet.par.sfor
>>
>> Bill
>>
>
> Hi Bill,
>
> That's certainly something I believe can be done, though it may require a
> bit of plumbing to always have a proper return type of `sfor`.
>
I could live with this. I think from the clarity perspective this is
better than pfor, because both .par and .par.sfor would mean exactly
what they look like. Downside is to some extent .par would be like
giving someone a loaded gun with the safety off by default.

Bill
----
Bill Venners
Artima, Inc.
http://www.artima.com

Naftoli Gugenheim
Joined: 2008-12-17,
User offline. Last seen 42 years 45 weeks ago.
Re: Re: REPL: null-values in parallel ranges - Seq.foreach acti


On Thu, Apr 7, 2011 at 4:22 AM, Aleksandar Prokopec <aleksandar [dot] prokopec [at] gmail [dot] com> wrote:
From a uniformity and principle of least surprise point of view, I think it's somewhat non-uniform.

What is the purpose of uniformity?
In terms of least surprise, I think a sequential foreach gives the least surprise.

From a use case point of view, you're right, `for` is mostly used for side-effects.

How about in 2.9 being more on the conservative side and "leaving" foreach sequential, and if there is demand for a parallel foreach (e.g. people are surprised by the non-uniformity), changing it post-2.9.
If you take the risk of upsetting people, make it parallel, and find out that it did indeed upset people, it will be a much bigger deal to change it back. Plus it could give the whole "parallel collections experiment" a bad name (perhaps among uninformed people, but that is still a factor).
In other words, since there is doubt about what to do, either way we go we may end up with regrets; better to risk regretting conservativeness and correct it later (make the change post-2.9) than to risk regretting haste.
Alex Cruise
Joined: 2008-12-17,
User offline. Last seen 2 years 26 weeks ago.
Re: Re: REPL: null-values in parallel ranges - Seq.foreach acti
On Thu, Apr 7, 2011 at 6:01 PM, Naftoli Gugenheim <naftoligug [at] gmail [dot] com> wrote:
How about in 2.9 be more on the conservative side and "leave" foreach sequential, and if there is demand for a parallel foreach (e.g. people are surprised by the non-uniformity) then change it post-2.9.
If you take a risk of upsetting people, make it parallel, and find out that it did indeed upset people, it will be much more of a deal to change it back. Plus it could give the whole "parallel collections experiment" a bad name (perhaps among uninformed people, but that is still a factor).
In other words, since there is a doubt what to do, which means that either way we go we may potentially have regret, better to risk regretting conservativeness and correct it (make the change later), than risk regretting haste.

Very well put, agreed without reservation.
-0xe1a 
Bill Venners
Joined: 2008-12-18,
User offline. Last seen 31 weeks 5 days ago.
Re: Re: REPL: null-values in parallel ranges - Seq.foreach acti

Hi Naftoli,

On Thu, Apr 7, 2011 at 6:01 PM, Naftoli Gugenheim wrote:
>
>
> On Thu, Apr 7, 2011 at 4:22 AM, Aleksandar Prokopec
> wrote:
>>
>> From a uniformity and principle of least surprise point of view, I think
>> it's somewhat non-uniform.
>
> What is the purpose of uniformity?
> In terms of least surprise, I think a sequential foreach gives the least
> surprise.
>
I think the value of uniformity is that the code does what it looks
like it does. If you say .par, you get parallel. For example, I think
this code looks like it would do a parallel foreach:

for (i <- (1 until 10000).par)
  a(i) = a(i) * scale
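That loop is indeed safe to run in parallel: each iteration writes a distinct array index, so no two tasks ever race. A self-contained sketch of it (not from the thread; the 2.13+ module import is noted in the code):

```scala
// On Scala 2.13+ the parallel collections are a separate module:
import scala.collection.parallel.CollectionConverters._

object ParallelScale {
  def scaled(n: Int, scale: Int): Array[Int] = {
    val a = Array.tabulate(n)(identity) // a(i) == i
    // Each index is written by exactly one task, so the writes never
    // conflict, and the join at the end of the parallel operation
    // publishes them to the calling thread.
    for (i <- (1 until n).par)
      a(i) = a(i) * scale
    a
  }

  def main(args: Array[String]): Unit =
    println(scaled(10000, 3).take(4).mkString(",")) // 0,3,6,9
}
```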

>> From a use case point of view, you're right, `for` is mostly used for
>> side-effects.
>
> How about in 2.9 be more on the conservative side and "leave" foreach
> sequential, and if there is demand for a parallel foreach (e.g. people are
> surprised by the non-uniformity) then change it post-2.9.
> If you take a risk of upsetting people, make it parallel, and find out that
> it did indeed upset people, it will be much more of a deal to change it
> back. Plus it could give the whole "parallel collections experiment" a bad
> name (perhaps among uninformed people, but that is still a factor).
> In other words, since there is a doubt what to do, which means that either
> way we go we may potentially have regret, better to risk regretting
> conservativeness and correct it (make the change later), than risk
> regretting haste.
>
Isn't it the other way around? If .par gives a sequential foreach in
2.9, then changing that to a parallel foreach later would break all
code written under the assumption foreach is sequential. But if .par
gives a parallel foreach in 2.9 and that turns out to be too
error-prone in practice, it could be changed to a sequential foreach
later without breaking code (other than slowing things down in some
cases). Maybe slowing things down would qualify as breaking code in
some situations, but that's far better than introducing concurrency
bugs. You really could never go from a sequential foreach to a
parallel one later with the same method name.

But I think the best thing to do in 2.9 is find some way to give
people both. For example, either let .par give a sequential foreach
with a .pfor method to get a parallel foreach, Or let .par give a
parallel foreach with a .sfor method to get a sequential foreach. If
that's the direction, the hard question is which should be the default.

I think .par meaning parallel everything provides the clearest code.
That for expression I showed earlier would do exactly what it looks
like it does, run that for loop in parallel. There's value in that.
And "coll.par.sfor" looks like it means "a parallel collection with a
sequential foreach" to me, which is exactly what it would mean. The
downside of that is the default you get with .par is the less safe of
the two variants. It is also possibly the less commonly desired of the
two. I think most times I use parallel collections I'll probably want
the sequential foreach, though I won't know for sure until I've
actually used these things for a good while.

If there's a solution that provides the clarity of .par giving
parallel everything including foreach with a subsequent .sfor giving a
parallel collection with a sequential foreach, but gives the
sequential one as the default, that would be ideal. Maybe there are
two method names, one of which would take you right to a parallel
collection with parallel foreach and the other to a parallel
collection with a sequential foreach, but I can't think of anything as
clear as .par.sfor. And frankly I think we could live with the default
being parallel everything so long as there's another step that would
produce the sequential variant. I think people would learn it pretty
quickly, and probably not forget to call an extra method. But the best
would be a solution that gives both clear code and a safer default.

Bill
----
Bill Venners
Artima, Inc.
http://www.artima.com

dcsobral
Joined: 2009-04-23,
User offline. Last seen 38 weeks 5 days ago.
Re: Re: REPL: null-values in parallel ranges - Seq.foreach acti
On Thu, Apr 7, 2011 at 22:01, Naftoli Gugenheim <naftoligug [at] gmail [dot] com> wrote:


On Thu, Apr 7, 2011 at 4:22 AM, Aleksandar Prokopec <aleksandar [dot] prokopec [at] gmail [dot] com> wrote:
From a uniformity and principle of least surprise point of view, I think it's somewhat non-uniform.

What is the purpose of uniformity?
In terms of least surprise, I think a sequential foreach gives the least surprise.

From a use case point of view, you're right, `for` is mostly used for side-effects.

How about in 2.9 be more on the conservative side and "leave" foreach sequential, and if there is demand for a parallel foreach (e.g. people are surprised by the non-uniformity) then change it post-2.9.
If you take a risk of upsetting people, make it parallel, and find out that it did indeed upset people, it will be much more of a deal to change it back. Plus it could give the whole "parallel collections experiment" a bad name (perhaps among uninformed people, but that is still a factor).
In other words, since there is a doubt what to do, which means that either way we go we may potentially have regret, better to risk regretting conservativeness and correct it (make the change later), than risk regretting haste.
Now that old code can't be passed a parallel collection, I think it is a mistake to let programmers who are explicitly going for parallel code get away with incorrect code. If a parallel sequence ever promises a sequential foreach, it will never be reverted.
So, what is the _gain_ of making foreach sequential? That it will allow people who explicitly ask for a parallel collection to do non-parallel things with it without asking for a non-parallel collection first? I fail to see the use in that.
It seems people are afraid that those trying to write parallel code will make mistakes, and that the solution to that is to make their code non-parallel.


--
Daniel C. Sobral

I travel to the future all the time.

Copyright © 2012 École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland