This page is no longer maintained — Please continue to the home page at www.scala-lang.org

2-ary grouping a la span (chop? fragment? groupedWhile?)

10 replies
ichoran
Joined: 2009-08-14,
User offline. Last seen 2 years 3 weeks ago.
I very frequently find myself needing to chop a collection into pieces based on adjacent elements.  One could, with list c and indicator function p(_,_) that returns true if the two elements belong together and false if not,

  (List(c.take(1)) /: (c zip c.tail))((l,r) => r match {
    case (i,j) => if (p(i,j)) (j :: l.head) :: l.tail else List(j) :: l
  })

which works like so (would be nicer if I reverseMap/reversed it):

scala> val c = List(1,2,3,4,1,2,3,1,2,1)
c: List[Int] = List(1, 2, 3, 4, 1, 2, 3, 1, 2, 1)

scala> def p(i: Int, j: Int) = (i <= j)
p: (i: Int,j: Int)Boolean

scala> val chopped = /* code above */
chopped: List[List[Int]] = List(List(1), List(2, 1), List(3, 2, 1), List(4, 3, 2, 1))

but this gets old very quickly, and isn't particularly efficient.

Do we think we can add this functionality to the library?  It's in the same spirit as grouped, but instead of cutting based on zero arguments (like grouped) or one argument (like recursive span/dropWhile, which is normally called "split" (e.g. with strings) but for some reason is missing from the library), it cuts based on the properties of the boundary (i.e. the two surrounding elements).

I have four questions:

(1) Whether or not it can go into the library, what should it be called?  (If I add the method via pimping, it'd be nice for it to have a consensus name.)

(2) What should the signature be?  I'd suggest (f: (A,A) => Boolean): C[C[A]] but one could think about making it more fold-like by dragging along some history variable that you could set to the previous element if that's what you wanted, but to something else (e.g. count) if that's what you'd prefer.  In that case, grouped would be a special case of the fold-with-grouping also.

(3) Can it be added to the library?

(4) Does the answer to (3) change if I submit a patch that adds it?  (Which I am willing to do, since I have written this so many times longhand already that writing a patch would actually save me time.)

If there's some level of consensus on (1)/(2) and no answer forthcoming for (3)/(4), I'll submit an enhancement request rather than asking here.

  --Rex

Kevin Wright 2
Joined: 2010-05-30,
User offline. Last seen 26 weeks 4 days ago.
Re: 2-ary grouping a la span (chop? fragment? groupedWhile?)
I've called this one shard in the past, and would have it partition a sequence into groups of elements that match some predicate.
So "aaaabbcdeefffgg".shard(_==_) would yield Seq("aaaa", "bb", "c", "d", "ee", "fff", "gg")
Must say though, I *do* like the name "chop", where I'd write my above operation as "....".chop(_ != _)

On 23 March 2011 17:34, Rex Kerr <ichoran [at] gmail [dot] com> wrote:
I very frequently find myself needing to chop a collection into pieces based on adjacent elements.  One could, with list c and indicator function p(_,_) that returns true if the two elements belong together and false if not,

  (List(c.take(1)) /: (c zip c.tail))((l,r) => r match {
    case (i,j) => if (p(i,j)) (j :: l.head) :: l.tail else List(j) :: l
  })

which works like so (would be nicer if I reverseMap/reversed it):

scala> val c = List(1,2,3,4,1,2,3,1,2,1)
c: List[Int] = List(1, 2, 3, 4, 1, 2, 3, 1, 2, 1)

scala> def p(i: Int, j: Int) = (i <= j)
p: (i: Int,j: Int)Boolean

scala> val chopped = /* code above */
chopped: List[List[Int]] = List(List(1), List(2, 1), List(3, 2, 1), List(4, 3, 2, 1))

but this gets old very quickly, and isn't particularly efficient.

Do we think we can add this functionality to the library?  It's in the same spirit as grouped, but instead of cutting based on zero arguments (like grouped) or one argument (like recursive span/dropWhile, which is normally called "split" (e.g. with strings) but for some reason is missing from the library), it cuts based on the properties of the boundary (i.e. the two surrounding elements).

I have four questions:

(1) Whether or not it can go into the library, what should it be called?  (If I add the method via pimping, it'd be nice for it to have a consensus name.)

(2) What should the signature be?  I'd suggest (f: (A,A) => Boolean): C[C[A]] but one could think about making it more fold-like by dragging along some history variable that you could set to the previous element if that's what you wanted, but to something else (e.g. count) if that's what you'd prefer.  In that case, grouped would be a special case of the fold-with-grouping also.

(3) Can it be added to the library?

(4) Does the answer to (3) change if I submit a patch that adds it?  (Which I am willing to do, since I have written this so many times longhand already that writing a patch would actually save me time.)

If there's some level of consensus on (1)/(2) and no answer forthcoming for (3)/(4), I'll submit an enhancement request rather than asking here.

  --Rex




--
Kevin Wright

gtalk / msn : kev [dot] lee [dot] wright [at] gmail [dot] comkev [dot] lee [dot] wright [at] gmail [dot] commail: kevin [dot] wright [at] scalatechnology [dot] com
vibe / skype: kev.lee.wrightquora: http://www.quora.com/Kevin-Wright
twitter: @thecoda

"My point today is that, if we wish to count lines of code, we should not regard them as "lines produced" but as "lines spent": the current conventional wisdom is so foolish as to book that count on the wrong side of the ledger" ~ Dijkstra
Jason Zaugg
Joined: 2009-05-18,
User offline. Last seen 38 weeks 5 days ago.
Re: 2-ary grouping a la span (chop? fragment? groupedWhile?)

On Wed, Mar 23, 2011 at 6:34 PM, Rex Kerr wrote:
> I very frequently find myself needing to chop a collection into pieces based
> on adjacent elements.  One could, with list c and indicator function p(_,_)
> that returns true if the two elements belong together and false if not,
>
>   (List(c.take(1)) /: (c zip c.tail))((l,r) => r match {
>     case (i,j) => if (p(i,j)) (j :: l.head) :: l.tail else List(j) :: l
>   })
>
> which works like so (would be nicer if I reverseMap/reversed it):
>
> scala> val c = List(1,2,3,4,1,2,3,1,2,1)
> c: List[Int] = List(1, 2, 3, 4, 1, 2, 3, 1, 2, 1)
>
> scala> def p(i: Int, j: Int) = (i <= j)
> p: (i: Int,j: Int)Boolean
>
> scala> val chopped = /* code above */
> chopped: List[List[Int]] = List(List(1), List(2, 1), List(3, 2, 1), List(4,
> 3, 2, 1))
>
> but this gets old very quickly, and isn't particularly efficient.
>
> Do we think we can add this functionality to the library?  It's in the same
> spirit as grouped, but instead of cutting based on zero arguments (like
> grouped) or one argument (like recursive span/dropWhile, which is normally
> called "split" (e.g. with strings) but for some reason is missing from the
> library), it cuts based on the properties of the boundary (i.e. the two
> surrounding elements).

In haskell, it's called groupBy, but this name is used for a different
method in Scala. Maybe 'clusterBy' could be used instead?

Prelude Data.List> let is = [1, 1, 2, 2, 2, 1]
Prelude Data.List> groupBy (==) is
[[1,1],[2,2,2],[1]]

In Scalaz, this is generalized to ListW#groupByM

def groupByM[M[_] : Monad](p: (A, A) => M[Boolean]): M[List[List[A]]]

If M == Identity, this should be the same as Haskell's groupBy.

You can also choose M == State, which would let you pass a counter,
along the lines of
http://stackoverflow.com/questions/4562850/basic-scalaz-state-question/4...

Unfortunately it's also buggy. We'll fix that soon:
https://github.com/scalaz/scalaz/issues/14

-jason

ichoran
Joined: 2009-08-14,
User offline. Last seen 2 years 3 weeks ago.
Re: 2-ary grouping a la span (chop? fragment? groupedWhile?)
On Wed, Mar 23, 2011 at 3:34 PM, Jason Zaugg <jzaugg [at] gmail [dot] com> wrote:

In Scalaz, this is generalized to ListW#groupByM

def groupByM[M[_] : Monad](p: (A, A) => M[Boolean]): M[List[List[A]]]

If M == Identity, this should be the same as Haskell's groupBy.

You can also choose M == State, which would let you pass a counter,
along the lines of
http://stackoverflow.com/questions/4562850/basic-scalaz-state-question/4566585#4566585

Unfortunately it's also buggy. We'll fix that soon:
https://github.com/scalaz/scalaz/issues/14


That's a nice one!  Unfortunately, this commits one to having monads around, which apparently the Scala standard library is not.  I've been playing with fold-like things with signatures similar to

  (b0: => B)(iterate: (B,A) => B)(p: (B,A) => Boolean)

where iterate is used to advance b0 until p fails; then it starts with b0 again.

Then you can recover clusterBy with
  (throw new Exception)((l,r) => r)(p)
and grouped with
  (0)((l,r) => l+1)((l,r) < n)
and can of course do various other things.  But it's enough harder to understand that I think the haskellian groupBy is probably a better library candidate.

I've so far been mostly unwilling to render my code incomprehensible to that many more people by using monads extensively; various tricks like

  var i = 0
  c.clusterBy((l,r) => {
    i += 1
    if (i<n) true
    else { i=0; false }
  })

seem to be enough for now.

  --Rex

Kevin Wright 2
Joined: 2010-05-30,
User offline. Last seen 26 weeks 4 days ago.
Re: 2-ary grouping a la span (chop? fragment? groupedWhile?)


On 23 Mar 2011 19:34, "Jason Zaugg" <jzaugg [at] gmail [dot] com> wrote:
>
> On Wed, Mar 23, 2011 at 6:34 PM, Rex Kerr <ichoran [at] gmail [dot] com> wrote:
> > I very frequently find myself needing to chop a collection into pieces based
> > on adjacent elements.  One could, with list c and indicator function p(_,_)
> > that returns true if the two elements belong together and false if not,
> >
> >   (List(c.take(1)) /: (c zip c.tail))((l,r) => r match {
> >     case (i,j) => if (p(i,j)) (j :: l.head) :: l.tail else List(j) :: l
> >   })
> >
> > which works like so (would be nicer if I reverseMap/reversed it):
> >
> > scala> val c = List(1,2,3,4,1,2,3,1,2,1)
> > c: List[Int] = List(1, 2, 3, 4, 1, 2, 3, 1, 2, 1)
> >
> > scala> def p(i: Int, j: Int) = (i <= j)
> > p: (i: Int,j: Int)Boolean
> >
> > scala> val chopped = /* code above */
> > chopped: List[List[Int]] = List(List(1), List(2, 1), List(3, 2, 1), List(4,
> > 3, 2, 1))
> >
> > but this gets old very quickly, and isn't particularly efficient.
> >
> > Do we think we can add this functionality to the library?  It's in the same
> > spirit as grouped, but instead of cutting based on zero arguments (like
> > grouped) or one argument (like recursive span/dropWhile, which is normally
> > called "split" (e.g. with strings) but for some reason is missing from the
> > library), it cuts based on the properties of the boundary (i.e. the two
> > surrounding elements).
>
> In haskell, it's called groupBy, but this name is used for a different
> method in Scala. Maybe 'clusterBy' could be used instead?
>

ooh, even better!  I'm sure I've seen the name mentioned before somewhere...

> Prelude Data.List> let is = [1, 1, 2, 2, 2, 1]
> Prelude Data.List> groupBy (==) is
> [[1,1],[2,2,2],[1]]
>
> In Scalaz, this is generalized to ListW#groupByM
>
> def groupByM[M[_] : Monad](p: (A, A) => M[Boolean]): M[List[List[A]]]
>
> If M == Identity, this should be the same as Haskell's groupBy.
>
> You can also choose M == State, which would let you pass a counter,
> along the lines of
> http://stackoverflow.com/questions/4562850/basic-scalaz-state-question/4566585#4566585
>
> Unfortunately it's also buggy. We'll fix that soon:
> https://github.com/scalaz/scalaz/issues/14
>
> -jason

ichoran
Joined: 2009-08-14,
User offline. Last seen 2 years 3 weeks ago.
Re: 2-ary grouping a la span (chop? fragment? groupedWhile?)
On Wed, Mar 23, 2011 at 3:28 PM, Kevin Wright <kev [dot] lee [dot] wright [at] gmail [dot] com> wrote:
I've called this one shard in the past, and would have it partition a sequence into groups of elements that match some predicate.
So "aaaabbcdeefffgg".shard(_==_) would yield Seq("aaaa", "bb", "c", "d", "ee", "fff", "gg")

I assume you've been doing this via pimping.  Do you have a way to get appropriate builders in there so that you have collections of collections of the same type in general (e.g. Vector[Vector[Int]]) but strings still end up wrapped as you showed above?

I haven't yet figured out a low-effort way to do it (i.e. without substantial code duplication), but it doesn't much matter to me for strings.  Still, it looks worth doing in the general case.  The collections library doesn't seem to include this use case--it either returns maps or iterators, the latter of which I suppose would be safe enough, but is a little inelegant.

I like a number of the names, but as you may have noticed from StackOverflow, it is hard for me to resist noticing the similarity to grouped(n: Int).

  --Rex

Paul Phillips 2
Joined: 2011-02-10,
User offline. Last seen 42 years 45 weeks ago.
Re: 2-ary grouping a la span (chop? fragment? groupedWhile?)

On 3/23/11 3:18 PM, Rex Kerr wrote:
> I assume you've been doing this via pimping. Do you have a way to
> get appropriate builders in there so that you have collections of
> collections of the same type in general (e.g. Vector[Vector[Int]])
> but strings still end up wrapped as you showed above?

> The collections library doesn't seem to include this use case--it
> either returns maps or iterators, the latter of which I suppose would
> be safe enough, but is a little inelegant.

I'm pretty sure I managed this the last time I tried it, and then having
done so decided in the general case we were better off without it. For
grouped and sliding for instance, there are just too many situations
where Foo[Foo[A]] is not what I want. I want some Foo[A]s. It's only
vaguely correlated with whether I want a Foo[Foo[A]] to deliver those
Foo[A]s.

So they come in an iterator because it's lazy and doesn't commit you to
much, and you can .toFoo the iterator.

WRT cluster/clusterBy, perhaps people remember

def cluster[A1 >: A : Equiv]: List[Repr]

from my last batch of proposed collections methods.

Kevin Wright 2
Joined: 2010-05-30,
User offline. Last seen 26 weeks 4 days ago.
Re: 2-ary grouping a la span (chop? fragment? groupedWhile?)


On 23 March 2011 22:49, Paul Phillips <paul [dot] phillips [at] gmail [dot] com> wrote:
On 3/23/11 3:18 PM, Rex Kerr wrote:
I assume you've been doing this via pimping.  Do you have a way to
get appropriate builders in there so that you have collections of
collections of the same type in general (e.g. Vector[Vector[Int]])
but strings still end up wrapped as you showed above?

The collections library doesn't seem to include this use case--it
either returns maps or iterators, the latter of which I suppose would
be safe enough, but is a little inelegant.

I'm pretty sure I managed this the last time I tried it, and then having done so decided in the general case we were better off without it.  For grouped and sliding for instance, there are just too many situations where Foo[Foo[A]] is not what I want.  I want some Foo[A]s.  It's only vaguely correlated with whether I want a Foo[Foo[A]] to deliver those Foo[A]s.

So they come in an iterator because it's lazy and doesn't commit you to much, and you can .toFoo the iterator.

WRT cluster/clusterBy, perhaps people remember

 def cluster[A1 >: A : Equiv]: List[Repr]

from my last batch of proposed collections methods.

I remember upvoting it too :)


--
Kevin Wright

gtalk / msn : kev [dot] lee [dot] wright [at] gmail [dot] comkev [dot] lee [dot] wright [at] gmail [dot] commail: kevin [dot] wright [at] scalatechnology [dot] com
vibe / skype: kev.lee.wrightquora: http://www.quora.com/Kevin-Wright
twitter: @thecoda

"My point today is that, if we wish to count lines of code, we should not regard them as "lines produced" but as "lines spent": the current conventional wisdom is so foolish as to book that count on the wrong side of the ledger" ~ Dijkstra
Kevin Wright 2
Joined: 2010-05-30,
User offline. Last seen 26 weeks 4 days ago.
Re: 2-ary grouping a la span (chop? fragment? groupedWhile?)


On 23 March 2011 22:18, Rex Kerr <ichoran [at] gmail [dot] com> wrote:
On Wed, Mar 23, 2011 at 3:28 PM, Kevin Wright <kev [dot] lee [dot] wright [at] gmail [dot] com> wrote:
I've called this one shard in the past, and would have it partition a sequence into groups of elements that match some predicate.
So "aaaabbcdeefffgg".shard(_==_) would yield Seq("aaaa", "bb", "c", "d", "ee", "fff", "gg")

I assume you've been doing this via pimping.  Do you have a way to get appropriate builders in there so that you have collections of collections of the same type in general (e.g. Vector[Vector[Int]]) but strings still end up wrapped as you showed above?


Kind of: http://stackoverflow.com/questions/5026055/mixing-in-generic-traits-in-parameterized-classes-without-duplicating-type-parame/5028335#5028335
It really depends on your definition of "low effort" :)
A significant part of the problem is getting higher kinded implicits to work across types of different arity.  Unless updated in 2.9, there's even a comment in the Manifest.scala source code to this effect :)
 

I haven't yet figured out a low-effort way to do it (i.e. without substantial code duplication), but it doesn't much matter to me for strings.  Still, it looks worth doing in the general case.  The collections library doesn't seem to include this use case--it either returns maps or iterators, the latter of which I suppose would be safe enough, but is a little inelegant.

I like a number of the names, but as you may have noticed from StackOverflow, it is hard for me to resist noticing the similarity to grouped(n: Int).

  --Rex




--
Kevin Wright

gtalk / msn : kev [dot] lee [dot] wright [at] gmail [dot] comkev [dot] lee [dot] wright [at] gmail [dot] commail: kevin [dot] wright [at] scalatechnology [dot] com
vibe / skype: kev.lee.wrightquora: http://www.quora.com/Kevin-Wright
twitter: @thecoda

"My point today is that, if we wish to count lines of code, we should not regard them as "lines produced" but as "lines spent": the current conventional wisdom is so foolish as to book that count on the wrong side of the ledger" ~ Dijkstra
Tony Morris 2
Joined: 2009-03-20,
User offline. Last seen 42 years 45 weeks ago.
Re: 2-ary grouping a la span (chop? fragment? groupedWhile?)

On 24/03/11 05:34, Jason Zaugg wrote:
> Unfortunately it's also buggy. We'll fix that soon:
> https://github.com/scalaz/scalaz/issues/14
Fixed.

https://github.com/scalaz/scalaz/commit/cffbb6e051d43ae094d6f5f33632e7f4...

Ben Hutchison
Joined: 2009-01-21,
User offline. Last seen 42 years 45 weeks ago.
Re: 2-ary grouping a la span (chop? fragment? groupedWhile?)

On Thu, Mar 24, 2011 at 4:34 AM, Rex Kerr wrote:
> I very frequently find myself needing to chop a collection into pieces based
> on adjacent elements.  One could, with list c and indicator function p(_,_)
> that returns true if the two elements belong together and false if not,

FWIW, +1 for standardizing this.

Ive (re)invented that very same method myself by pimping Seq, except that

1. I called it splitAt
2. my predicate returned true where the collection should be split.

-Ben

Copyright © 2012 École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland