This page is no longer maintained — Please continue to the home page at www.scala-lang.org

map + filter performance

10 replies
H-star Development
Joined: 2010-04-14,
User offline. Last seen 2 years 26 weeks ago.

what is the - in general - fastest way to both filter and convert a
collection at the same time?

flatMap (with some/none), collect, or something that is not in the
standard library?

Viktor Klang
Joined: 2008-12-17,
User offline. Last seen 1 year 27 weeks ago.
Re: map + filter performance


On Wed, Jan 25, 2012 at 9:56 PM, HamsterofDeath <h-star [at] gmx [dot] de> wrote:
what is the - in general - fastest way to both filter and convert a
collection at the same time?

collect 

flatMap (with some/none), collect, or something that is not in the
standard library?
--




--
Viktor Klang

Akka Tech LeadTypesafe - The software stack for applications that scale

Twitter: @viktorklang
dcsobral
Joined: 2009-04-23,
User offline. Last seen 38 weeks 5 days ago.
Re: map + filter performance

On Wed, Jan 25, 2012 at 18:56, HamsterofDeath wrote:
> what is the - in general - fastest way to both filter and convert a
> collection at the same time?
>
> flatMap (with some/none), collect, or something that is not in the
> standard library?

It's called collect if you want filter().map(). If you want
map().filter()... well, something else in this case.

H-star Development
Joined: 2010-04-14,
User offline. Last seen 2 years 26 weeks ago.
Re: map + filter performance

i don't really trust "isDefinedAt" performance wise because i don't know
what exactly it is doing.

does it simply return true/gets optimized away if i write

case x:Int => foo(x)

and no other cases are there, or does it do an instanceof check and a
method call no matter how many cases it has to check?

because if it does, this should be (by a small margin) faster:

def filterMap[T2](map:(T) => T2) = {
val newBuilder = t.companion.newBuilder[T2]
t.foreach(e => {
val mapped = map(e)
if (mapped != null) {
newBuilder += mapped
}
})
newBuilder.result()
}

Am 25.01.2012 22:01, schrieb Daniel Sobral:
> On Wed, Jan 25, 2012 at 18:56, HamsterofDeath wrote:
>> what is the - in general - fastest way to both filter and convert a
>> collection at the same time?
>>
>> flatMap (with some/none), collect, or something that is not in the
>> standard library?
> It's called collect if you want filter().map(). If you want
> map().filter()... well, something else in this case.
>
>

ichoran
Joined: 2009-08-14,
User offline. Last seen 2 years 3 weeks ago.
Re: map + filter performance

If we set collect as the reference and filter out half the items of a list, I get the following relative timings

foreach ArrayBuffer 0.5x
List.newBuilder     0.6x
foldLeft            0.6x
ArrayBuffer+toList  0.8x
collect             1.0x
filter/map          1.2x
flatMap             5.0x

If I do the same thing with vectors, I instead get

Vector.newBuilder   0.5x
foreach ArrayBuffer 0.6x
foldLeft            5.5x
Vector++ArrayBuffer 0.9x
collect             1.0x
filter/map          1.9x
flatMap             5.2x

So you basically want to use builders or ArrayBuffer if you want speed.  Collect is the best of the easy solutions.  (Assuming that you have an inexpensive isDefined.)

  --Rex


On Wed, Jan 25, 2012 at 3:56 PM, HamsterofDeath <h-star [at] gmx [dot] de> wrote:
what is the - in general - fastest way to both filter and convert a
collection at the same time?

flatMap (with some/none), collect, or something that is not in the
standard library?
--


H-star Development
Joined: 2010-04-14,
User offline. Last seen 2 years 26 weeks ago.
Re: map + filter performance
thx

Am 25.01.2012 22:27, schrieb Rex Kerr:
WKTgLmiL2KSOS7A [at] mail [dot] gmail [dot] com" type="cite">
If we set collect as the reference and filter out half the items of a list, I get the following relative timings

foreach ArrayBuffer 0.5x
List.newBuilder     0.6x
foldLeft            0.6x
ArrayBuffer+toList  0.8x
collect             1.0x
filter/map          1.2x
flatMap             5.0x

If I do the same thing with vectors, I instead get

Vector.newBuilder   0.5x
foreach ArrayBuffer 0.6x
foldLeft            5.5x
Vector++ArrayBuffer 0.9x
collect             1.0x
filter/map          1.9x
flatMap             5.2x

So you basically want to use builders or ArrayBuffer if you want speed.  Collect is the best of the easy solutions.  (Assuming that you have an inexpensive isDefined.)

  --Rex


On Wed, Jan 25, 2012 at 3:56 PM, HamsterofDeath <h-star [at] gmx [dot] de" rel="nofollow">h-star [at] gmx [dot] de> wrote:
what is the - in general - fastest way to both filter and convert a
collection at the same time?

flatMap (with some/none), collect, or something that is not in the
standard library?
--



Viktor Klang
Joined: 2008-12-17,
User offline. Last seen 1 year 27 weeks ago.
Re: map + filter performance


On Wed, Jan 25, 2012 at 10:27 PM, Rex Kerr <ichoran [at] gmail [dot] com> wrote:

If we set collect as the reference and filter out half the items of a list, I get the following relative timings

foreach ArrayBuffer 0.5x
List.newBuilder     0.6x
foldLeft            0.6x
ArrayBuffer+toList  0.8x
collect             1.0x
filter/map          1.2x
flatMap             5.0x

If I do the same thing with vectors, I instead get

Vector.newBuilder   0.5x
foreach ArrayBuffer 0.6x
foldLeft            5.5x
Vector++ArrayBuffer 0.9x
collect             1.0x
filter/map          1.9x
flatMap             5.2x

So you basically want to use builders or ArrayBuffer if you want speed.  Collect is the best of the easy solutions.  (Assuming that you have an inexpensive isDefined.)

Also, I wouldn't be surprised if PartialFunctions were optimized in the future.  


  --Rex


On Wed, Jan 25, 2012 at 3:56 PM, HamsterofDeath <h-star [at] gmx [dot] de> wrote:
what is the - in general - fastest way to both filter and convert a
collection at the same time?

flatMap (with some/none), collect, or something that is not in the
standard library?
--





--
Viktor Klang

Akka Tech LeadTypesafe - The software stack for applications that scale

Twitter: @viktorklang
edmondo1984
Joined: 2011-09-14,
User offline. Last seen 28 weeks 3 days ago.
R: Re: map + filter performance
Can you please add your code?? ThanksInviato da BlackBerry(R) Wireless HandheldFrom: Rex Kerr <ichoran [at] gmail [dot] com> Sender: scala-user [at] googlegroups [dot] com Date: Wed, 25 Jan 2012 16:27:50 -0500To: HamsterofDeath<h-star [at] gmx [dot] de>Cc: scala-user [at] googlegroups [dot] com<scala-user [at] googlegroups [dot] com>Subject: Re: [scala-user] map + filter performance

If we set collect as the reference and filter out half the items of a list, I get the following relative timings

foreach ArrayBuffer 0.5x
List.newBuilder     0.6x
foldLeft            0.6x
ArrayBuffer+toList  0.8x
collect             1.0x
filter/map          1.2x
flatMap             5.0x

If I do the same thing with vectors, I instead get

Vector.newBuilder   0.5x
foreach ArrayBuffer 0.6x
foldLeft            5.5x
Vector++ArrayBuffer 0.9x
collect             1.0x
filter/map          1.9x
flatMap             5.2x

So you basically want to use builders or ArrayBuffer if you want speed.  Collect is the best of the easy solutions.  (Assuming that you have an inexpensive isDefined.)

  --Rex


On Wed, Jan 25, 2012 at 3:56 PM, HamsterofDeath <h-star [at] gmx [dot] de> wrote:
what is the - in general - fastest way to both filter and convert a
collection at the same time?

flatMap (with some/none), collect, or something that is not in the
standard library?
--


ichoran
Joined: 2009-08-14,
User offline. Last seen 2 years 3 weeks ago.
Re: Re: map + filter performance


On Wed, Jan 25, 2012 at 5:21 PM, <edmondo [dot] porcu [at] gmail [dot] com> wrote:
Can you please add your code?? ThanksInviato da BlackBerry(R) Wireless HandheldFrom: Rex Kerr <ichoran [at] gmail [dot] com> Sender: scala-user [at] googlegroups [dot] com Date: Wed, 25 Jan 2012 16:27:50 -0500To: HamsterofDeath<h-star [at] gmx [dot] de>Cc: scala-user [at] googlegroups [dot] com<scala-user [at] googlegroups [dot] com> Subject: Re: [scala-user] map + filter performance

If we set collect as the reference and filter out half the items of a list, I get the following relative timings

foreach ArrayBuffer 0.5x
{ val a = new collection.mutable.ArrayBuffer[Int]; xs.foreach(x => if (x%2==0) a += x); a }
 
List.newBuilder     0.6x
{ val a = List.newBuilder[Int]; xs.foreach(x => if (x%2==0) a += x); a.result }
 
foldLeft            0.6x
(List[Int]() /: xs)((a,x) => if (x%2==0) (x/2)::a else a)
 
ArrayBuffer+toList  0.8x
{ val a = new collection.mutable.ArrayBuffer[Int]; xs.foreach(x => if (x%2==0) a += x); a.toList }

 
collect             1.0x
xs.collect{ case x if (x%2==0) => x/2 }
 
filter/map          1.2x
xs.filter(_%2==0).map(_/2)
 
flatMap             5.0x
xs.flatMap(x => if (x%2==0) Some(x/2) else None)

  --Rex
Derek Williams 3
Joined: 2011-08-12,
User offline. Last seen 42 years 45 weeks ago.
Re: Re: map + filter performance
Probably doesn't affect the performance too much, but all the fast ones are just filtering, no mapping.

On Wed, Jan 25, 2012 at 3:31 PM, Rex Kerr <ichoran [at] gmail [dot] com> wrote:


On Wed, Jan 25, 2012 at 5:21 PM, <edmondo [dot] porcu [at] gmail [dot] com> wrote:
Can you please add your code?? ThanksInviato da BlackBerry(R) Wireless HandheldFrom: Rex Kerr <ichoran [at] gmail [dot] com> Sender: scala-user [at] googlegroups [dot] com Date: Wed, 25 Jan 2012 16:27:50 -0500To: HamsterofDeath<h-star [at] gmx [dot] de>Cc: scala-user [at] googlegroups [dot] com<scala-user [at] googlegroups [dot] com> Subject: Re: [scala-user] map + filter performance

If we set collect as the reference and filter out half the items of a list, I get the following relative timings

foreach ArrayBuffer 0.5x
{ val a = new collection.mutable.ArrayBuffer[Int]; xs.foreach(x => if (x%2==0) a += x); a }
 
List.newBuilder     0.6x
{ val a = List.newBuilder[Int]; xs.foreach(x => if (x%2==0) a += x); a.result }
 
foldLeft            0.6x
(List[Int]() /: xs)((a,x) => if (x%2==0) (x/2)::a else a)
 
ArrayBuffer+toList  0.8x
{ val a = new collection.mutable.ArrayBuffer[Int]; xs.foreach(x => if (x%2==0) a += x); a.toList }

 
collect             1.0x
xs.collect{ case x if (x%2==0) => x/2 }
 
filter/map          1.2x
xs.filter(_%2==0).map(_/2)
 
flatMap             5.0x
xs.flatMap(x => if (x%2==0) Some(x/2) else None)

  --Rex



--
Derek Williams
ichoran
Joined: 2009-08-14,
User offline. Last seen 2 years 3 weeks ago.
Re: Re: map + filter performance
Whoops.  Good catch.  You're right; it doesn't make any difference to the results.
  --Rex

On Wed, Jan 25, 2012 at 7:00 PM, Derek Williams <derek [at] fyrie [dot] net> wrote:
Probably doesn't affect the performance too much, but all the fast ones are just filtering, no mapping.

On Wed, Jan 25, 2012 at 3:31 PM, Rex Kerr <ichoran [at] gmail [dot] com> wrote:


On Wed, Jan 25, 2012 at 5:21 PM, <edmondo [dot] porcu [at] gmail [dot] com> wrote:
Can you please add your code?? ThanksInviato da BlackBerry(R) Wireless HandheldFrom: Rex Kerr <ichoran [at] gmail [dot] com> Sender: scala-user [at] googlegroups [dot] com Date: Wed, 25 Jan 2012 16:27:50 -0500To: HamsterofDeath<h-star [at] gmx [dot] de>Cc: scala-user [at] googlegroups [dot] com<scala-user [at] googlegroups [dot] com> Subject: Re: [scala-user] map + filter performance

If we set collect as the reference and filter out half the items of a list, I get the following relative timings

foreach ArrayBuffer 0.5x
{ val a = new collection.mutable.ArrayBuffer[Int]; xs.foreach(x => if (x%2==0) a += x); a }
 
List.newBuilder     0.6x
{ val a = List.newBuilder[Int]; xs.foreach(x => if (x%2==0) a += x); a.result }
 
foldLeft            0.6x
(List[Int]() /: xs)((a,x) => if (x%2==0) (x/2)::a else a)
 
ArrayBuffer+toList  0.8x
{ val a = new collection.mutable.ArrayBuffer[Int]; xs.foreach(x => if (x%2==0) a += x); a.toList }

 
collect             1.0x
xs.collect{ case x if (x%2==0) => x/2 }
 
filter/map          1.2x
xs.filter(_%2==0).map(_/2)
 
flatMap             5.0x
xs.flatMap(x => if (x%2==0) Some(x/2) else None)

  --Rex



--
Derek Williams

Copyright © 2012 École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland