This page is no longer maintained — Please continue to the home page at www.scala-lang.org

Why is the pm method so much faster?

4 replies
David Pollak
Joined: 2008-12-16,
User offline. Last seen 42 years 45 weeks ago.
Folks,
I've been playing around with the performance of different types of "summing" code and I've run into something weird.  Here's the Scala code:
object Scope {
  def iter(list: List[Int], acc: Int): Int =
  if (list.isEmpty) acc else iter(list.tail, list.head + acc)

  def pm(list: List[Int], acc: Int): Int = list match {
    case Nil => acc
    case x :: xs => pm(xs, acc + x)
  }

  def looper(list: Seq[Int]): Int = {
    var acc = 0
    for (i <- list) acc += i
    acc
  }

  def timeIt[T](name: String, f: => T): T = {
    val start = System.currentTimeMillis
    val ret: T = f
    println(name+" took "+(System.currentTimeMillis - start))
    ret
  }

  def main(argv: Array[String]) {
    (1 to 100000).foreach{i => pm(List(1,2,3,4), 0); 
                          iter(List(1,2,3,4), 0);
                          looper(List(1,2,3,4))}

    val lst = (1 to 1000000).toList

    println(timeIt("looper", looper(lst)))
    println(timeIt("pm", pm(lst, 0)))
    println(timeIt("iter", iter(lst, 0)))
  }
}

On my machine, "pm" which does tail-recursive summing using pattern matching runs 6 times faster than iter which does tail-recursive isEmpty testing.
Here's the byte-code for each of the the methods:

public int pm(scala.List, int);
  Code:
   0:   aload_1
   1:   astore  4
   3:   getstatic       #3a3a3a; //Field scala/Nil$.MODULE$:Lscala/Nil$;
   6:   aload   4
   8:   astore  5
   10:  dup
   11:  ifnonnull       23
   14:  pop
   15:  aload   5
   17:  ifnull  31
   20:  goto    33
   23:  aload   5
   25:  invokevirtual   #424242; //Method java/lang/Object.equals:(Ljava/lang/Object;)Z
   28:  ifeq    33
   31:  iload_2
   32:  ireturn
   33:  aload   4
   35:  instanceof      #464646; //class scala/$colon$colon
   38:  ifeq    68
   41:  aload   4
   43:  checkcast       #464646; //class scala/$colon$colon
   46:  astore  6
   48:  aload   6
   50:  invokevirtual   #434343; //Method scala/$colon$colon.tl$1:()Lscala/List;
   53:  iload_2
   54:  aload   6
   56:  invokevirtual   #484848; //Method scala/$colon$colon.hd$1:()Ljava/lang/Object;
   59:  invokestatic    #505050; //Method scala/runtime/BoxesRunTime.unboxToInt:(Ljava/lang/Object;)I
   62:  iadd
   63:  istore_2
   64:  astore_1
   65:  goto    0
   68:  new     #4b4b4b; //class scala/MatchError
   71:  dup
   72:  aload   4
   74:  invokespecial   #4f4f4f; //Method scala/MatchError."<init>":(Ljava/lang/Object;)V
   77:  athrow

public int iter(scala.List, int);
  Code:
   0:   aload_1
   1:   invokeinterface #555555,  1; //InterfaceMethod scala/Seq.isEmpty:()Z
   6:   ifeq    11
   9:   iload_2
   10:  ireturn
   11:  aload_1
   12:  invokevirtual   #5e5e5e; //Method scala/List.tail:()Lscala/List;
   15:  aload_1
   16:  invokevirtual   #646464; //Method scala/List.head:()Ljava/lang/Object;
   19:  invokestatic    #505050; //Method scala/runtime/BoxesRunTime.unboxToInt:(Ljava/lang/Object;)I
   22:  iload_2
   23:  iadd
   24:  istore_2
   25:  astore_1
   26:  goto    0

The iter method looks a lot cleaner.  It's making a method call to isEmpty (dispatched via the interface Seq, so it's a tad slower than dispatching against the class, but HotSpot should figure this out.)  The calls to head and tail are slightly different than pm's calls the hd$1 and tl$1, but the extra method dispatch layer should, once again, be fixed by HotSpot.  All the class testing stuff should become costless in HotSpot.  So, why is iter so much slower than pm?
Thanks,
David
--
Lift, the simply functional web framework http://liftweb.net
Collaborative Task Management http://much4.us
Follow me: http://twitter.com/dpp
Git some: http://github.com/dpp
Carsten Saager
Joined: 2008-12-19,
User offline. Last seen 42 years 45 weeks ago.
Re: Why is the pm method so much faster?
My guess would be the call to isEmpty which is defined in the trait where the calls have an overhead

/Carsten

On Sat, Jan 3, 2009 at 4:15 PM, David Pollak <feeder [dot] of [dot] the [dot] bears [at] gmail [dot] com> wrote:
Folks,
I've been playing around with the performance of different types of "summing" code and I've run into something weird.  Here's the Scala code:
object Scope {
  def iter(list: List[Int], acc: Int): Int =
  if (list.isEmpty) acc else iter(list.tail, list.head + acc)

  def pm(list: List[Int], acc: Int): Int = list match {
    case Nil => acc
    case x :: xs => pm(xs, acc + x)
  }

  def looper(list: Seq[Int]): Int = {
    var acc = 0
    for (i <- list) acc += i
    acc
  }

  def timeIt[T](name: String, f: => T): T = {
    val start = System.currentTimeMillis
    val ret: T = f
    println(name+" took "+(System.currentTimeMillis - start))
    ret
  }

  def main(argv: Array[String]) {
    (1 to 100000).foreach{i => pm(List(1,2,3,4), 0); 
                          iter(List(1,2,3,4), 0);
                          looper(List(1,2,3,4))}

    val lst = (1 to 1000000).toList

    println(timeIt("looper", looper(lst)))
    println(timeIt("pm", pm(lst, 0)))
    println(timeIt("iter", iter(lst, 0)))
  }
}

On my machine, "pm" which does tail-recursive summing using pattern matching runs 6 times faster than iter which does tail-recursive isEmpty testing.
Here's the byte-code for each of the the methods:

public int pm(scala.List, int);
  Code:
   0:   aload_1
   1:   astore  4
   3:   getstatic       #3a3a3a; //Field scala/Nil$.MODULE$:Lscala/Nil$;
   6:   aload   4
   8:   astore  5
   10:  dup
   11:  ifnonnull       23
   14:  pop
   15:  aload   5
   17:  ifnull  31
   20:  goto    33
   23:  aload   5
   25:  invokevirtual   #424242; //Method java/lang/Object.equals:(Ljava/lang/Object;)Z
   28:  ifeq    33
   31:  iload_2
   32:  ireturn
   33:  aload   4
   35:  instanceof      #464646; //class scala/$colon$colon
   38:  ifeq    68
   41:  aload   4
   43:  checkcast       #464646; //class scala/$colon$colon
   46:  astore  6
   48:  aload   6
   50:  invokevirtual   #434343; //Method scala/$colon$colon.tl$1:()Lscala/List;
   53:  iload_2
   54:  aload   6
   56:  invokevirtual   #484848; //Method scala/$colon$colon.hd$1:()Ljava/lang/Object;
   59:  invokestatic    #505050; //Method scala/runtime/BoxesRunTime.unboxToInt:(Ljava/lang/Object;)I
   62:  iadd
   63:  istore_2
   64:  astore_1
   65:  goto    0
   68:  new     #4b4b4b; //class scala/MatchError
   71:  dup
   72:  aload   4
   74:  invokespecial   #4f4f4f; //Method scala/MatchError."<init>":(Ljava/lang/Object;)V
   77:  athrow

public int iter(scala.List, int);
  Code:
   0:   aload_1
   1:   invokeinterface #555555,  1; //InterfaceMethod scala/Seq.isEmpty:()Z
   6:   ifeq    11
   9:   iload_2
   10:  ireturn
   11:  aload_1
   12:  invokevirtual   #5e5e5e; //Method scala/List.tail:()Lscala/List;
   15:  aload_1
   16:  invokevirtual   #646464; //Method scala/List.head:()Ljava/lang/Object;
   19:  invokestatic    #505050; //Method scala/runtime/BoxesRunTime.unboxToInt:(Ljava/lang/Object;)I
   22:  iload_2
   23:  iadd
   24:  istore_2
   25:  astore_1
   26:  goto    0

The iter method looks a lot cleaner.  It's making a method call to isEmpty (dispatched via the interface Seq, so it's a tad slower than dispatching against the class, but HotSpot should figure this out.)  The calls to head and tail are slightly different than pm's calls the hd$1 and tl$1, but the extra method dispatch layer should, once again, be fixed by HotSpot.  All the class testing stuff should become costless in HotSpot.  So, why is iter so much slower than pm?
Thanks,
David
--
Lift, the simply functional web framework http://liftweb.net
Collaborative Task Management http://much4.us
Follow me: http://twitter.com/dpp
Git some: http://github.com/dpp

Alex Boisvert
Joined: 2008-12-16,
User offline. Last seen 42 years 45 weeks ago.
Re: Why is the pm method so much faster?
They run about the same speed on my system:

boisvert@boog:~/tmp$ dmesg | grep Intel | grep CPU
[    0.413331] CPU0: Intel(R) Core(TM)2 CPU         T7200  @ 2.00GHz stepping 06
[    0.500706] CPU1: Intel(R) Core(TM)2 CPU         T7200  @ 2.00GHz stepping 06

boisvert@boog:~/tmp$ uname -a
Linux boog 2.6.27-9-generic #1 SMP Thu Nov 20 21:57:00 UTC 2008 i686 GNU/Linux

boisvert@boog:~/tmp$ java -version
java version "1.6.0_10"
Java(TM) SE Runtime Environment (build 1.6.0_10-b33)
Java HotSpot(TM) Server VM (build 11.0-b15, mixed mode)

boisvert@boog:~/tmp$ scala Scope
looper took 10
1784293664
pm took 8
1784293664
iter took 10
1784293664

alex

On Sat, Jan 3, 2009 at 7:15 AM, David Pollak <feeder [dot] of [dot] the [dot] bears [at] gmail [dot] com> wrote:
Folks,
I've been playing around with the performance of different types of "summing" code and I've run into something weird.  Here's the Scala code:
object Scope {
  def iter(list: List[Int], acc: Int): Int =
  if (list.isEmpty) acc else iter(list.tail, list.head + acc)

  def pm(list: List[Int], acc: Int): Int = list match {
    case Nil => acc
    case x :: xs => pm(xs, acc + x)
  }

  def looper(list: Seq[Int]): Int = {
    var acc = 0
    for (i <- list) acc += i
    acc
  }

  def timeIt[T](name: String, f: => T): T = {
    val start = System.currentTimeMillis
    val ret: T = f
    println(name+" took "+(System.currentTimeMillis - start))
    ret
  }

  def main(argv: Array[String]) {
    (1 to 100000).foreach{i => pm(List(1,2,3,4), 0); 
                          iter(List(1,2,3,4), 0);
                          looper(List(1,2,3,4))}

    val lst = (1 to 1000000).toList

    println(timeIt("looper", looper(lst)))
    println(timeIt("pm", pm(lst, 0)))
    println(timeIt("iter", iter(lst, 0)))
  }
}

On my machine, "pm" which does tail-recursive summing using pattern matching runs 6 times faster than iter which does tail-recursive isEmpty testing.
Here's the byte-code for each of the the methods:

public int pm(scala.List, int);
  Code:
   0:   aload_1
   1:   astore  4
   3:   getstatic       #3a3a3a; //Field scala/Nil$.MODULE$:Lscala/Nil$;
   6:   aload   4
   8:   astore  5
   10:  dup
   11:  ifnonnull       23
   14:  pop
   15:  aload   5
   17:  ifnull  31
   20:  goto    33
   23:  aload   5
   25:  invokevirtual   #424242; //Method java/lang/Object.equals:(Ljava/lang/Object;)Z
   28:  ifeq    33
   31:  iload_2
   32:  ireturn
   33:  aload   4
   35:  instanceof      #464646; //class scala/$colon$colon
   38:  ifeq    68
   41:  aload   4
   43:  checkcast       #464646; //class scala/$colon$colon
   46:  astore  6
   48:  aload   6
   50:  invokevirtual   #434343; //Method scala/$colon$colon.tl$1:()Lscala/List;
   53:  iload_2
   54:  aload   6
   56:  invokevirtual   #484848; //Method scala/$colon$colon.hd$1:()Ljava/lang/Object;
   59:  invokestatic    #505050; //Method scala/runtime/BoxesRunTime.unboxToInt:(Ljava/lang/Object;)I
   62:  iadd
   63:  istore_2
   64:  astore_1
   65:  goto    0
   68:  new     #4b4b4b; //class scala/MatchError
   71:  dup
   72:  aload   4
   74:  invokespecial   #4f4f4f; //Method scala/MatchError."<init>":(Ljava/lang/Object;)V
   77:  athrow

public int iter(scala.List, int);
  Code:
   0:   aload_1
   1:   invokeinterface #555555,  1; //InterfaceMethod scala/Seq.isEmpty:()Z
   6:   ifeq    11
   9:   iload_2
   10:  ireturn
   11:  aload_1
   12:  invokevirtual   #5e5e5e; //Method scala/List.tail:()Lscala/List;
   15:  aload_1
   16:  invokevirtual   #646464; //Method scala/List.head:()Ljava/lang/Object;
   19:  invokestatic    #505050; //Method scala/runtime/BoxesRunTime.unboxToInt:(Ljava/lang/Object;)I
   22:  iload_2
   23:  iadd
   24:  istore_2
   25:  astore_1
   26:  goto    0

The iter method looks a lot cleaner.  It's making a method call to isEmpty (dispatched via the interface Seq, so it's a tad slower than dispatching against the class, but HotSpot should figure this out.)  The calls to head and tail are slightly different than pm's calls the hd$1 and tl$1, but the extra method dispatch layer should, once again, be fixed by HotSpot.  All the class testing stuff should become costless in HotSpot.  So, why is iter so much slower than pm?
Thanks,
David
--
Lift, the simply functional web framework http://liftweb.net
Collaborative Task Management http://much4.us
Follow me: http://twitter.com/dpp
Git some: http://github.com/dpp

David Pollak
Joined: 2008-12-16,
User offline. Last seen 42 years 45 weeks ago.
Re: Why is the pm method so much faster?
I did my test on Windows with a client VM.  I guess the server version does better optimizations.

On Sat, Jan 3, 2009 at 8:56 AM, Alex Boisvert <boisvert [at] intalio [dot] com> wrote:
They run about the same speed on my system:

boisvert@boog:~/tmp$ dmesg | grep Intel | grep CPU
[    0.413331] CPU0: Intel(R) Core(TM)2 CPU         T7200  @ 2.00GHz stepping 06
[    0.500706] CPU1: Intel(R) Core(TM)2 CPU         T7200  @ 2.00GHz stepping 06

boisvert@boog:~/tmp$ uname -a
Linux boog 2.6.27-9-generic #1 SMP Thu Nov 20 21:57:00 UTC 2008 i686 GNU/Linux

boisvert@boog:~/tmp$ java -version
java version "1.6.0_10"
Java(TM) SE Runtime Environment (build 1.6.0_10-b33)
Java HotSpot(TM) Server VM (build 11.0-b15, mixed mode)

boisvert@boog:~/tmp$ scala Scope
looper took 10
1784293664
pm took 8
1784293664
iter took 10
1784293664

alex

On Sat, Jan 3, 2009 at 7:15 AM, David Pollak <feeder [dot] of [dot] the [dot] bears [at] gmail [dot] com> wrote:
Folks,
I've been playing around with the performance of different types of "summing" code and I've run into something weird.  Here's the Scala code:
object Scope {
  def iter(list: List[Int], acc: Int): Int =
  if (list.isEmpty) acc else iter(list.tail, list.head + acc)

  def pm(list: List[Int], acc: Int): Int = list match {
    case Nil => acc
    case x :: xs => pm(xs, acc + x)
  }

  def looper(list: Seq[Int]): Int = {
    var acc = 0
    for (i <- list) acc += i
    acc
  }

  def timeIt[T](name: String, f: => T): T = {
    val start = System.currentTimeMillis
    val ret: T = f
    println(name+" took "+(System.currentTimeMillis - start))
    ret
  }

  def main(argv: Array[String]) {
    (1 to 100000).foreach{i => pm(List(1,2,3,4), 0); 
                          iter(List(1,2,3,4), 0);
                          looper(List(1,2,3,4))}

    val lst = (1 to 1000000).toList

    println(timeIt("looper", looper(lst)))
    println(timeIt("pm", pm(lst, 0)))
    println(timeIt("iter", iter(lst, 0)))
  }
}

On my machine, "pm" which does tail-recursive summing using pattern matching runs 6 times faster than iter which does tail-recursive isEmpty testing.
Here's the byte-code for each of the the methods:

public int pm(scala.List, int);
  Code:
   0:   aload_1
   1:   astore  4
   3:   getstatic       #3a3a3a; //Field scala/Nil$.MODULE$:Lscala/Nil$;
   6:   aload   4
   8:   astore  5
   10:  dup
   11:  ifnonnull       23
   14:  pop
   15:  aload   5
   17:  ifnull  31
   20:  goto    33
   23:  aload   5
   25:  invokevirtual   #424242; //Method java/lang/Object.equals:(Ljava/lang/Object;)Z
   28:  ifeq    33
   31:  iload_2
   32:  ireturn
   33:  aload   4
   35:  instanceof      #464646; //class scala/$colon$colon
   38:  ifeq    68
   41:  aload   4
   43:  checkcast       #464646; //class scala/$colon$colon
   46:  astore  6
   48:  aload   6
   50:  invokevirtual   #434343; //Method scala/$colon$colon.tl$1:()Lscala/List;
   53:  iload_2
   54:  aload   6
   56:  invokevirtual   #484848; //Method scala/$colon$colon.hd$1:()Ljava/lang/Object;
   59:  invokestatic    #505050; //Method scala/runtime/BoxesRunTime.unboxToInt:(Ljava/lang/Object;)I
   62:  iadd
   63:  istore_2
   64:  astore_1
   65:  goto    0
   68:  new     #4b4b4b; //class scala/MatchError
   71:  dup
   72:  aload   4
   74:  invokespecial   #4f4f4f; //Method scala/MatchError."<init>":(Ljava/lang/Object;)V
   77:  athrow

public int iter(scala.List, int);
  Code:
   0:   aload_1
   1:   invokeinterface #555555,  1; //InterfaceMethod scala/Seq.isEmpty:()Z
   6:   ifeq    11
   9:   iload_2
   10:  ireturn
   11:  aload_1
   12:  invokevirtual   #5e5e5e; //Method scala/List.tail:()Lscala/List;
   15:  aload_1
   16:  invokevirtual   #646464; //Method scala/List.head:()Ljava/lang/Object;
   19:  invokestatic    #505050; //Method scala/runtime/BoxesRunTime.unboxToInt:(Ljava/lang/Object;)I
   22:  iload_2
   23:  iadd
   24:  istore_2
   25:  astore_1
   26:  goto    0

The iter method looks a lot cleaner.  It's making a method call to isEmpty (dispatched via the interface Seq, so it's a tad slower than dispatching against the class, but HotSpot should figure this out.)  The calls to head and tail are slightly different than pm's calls the hd$1 and tl$1, but the extra method dispatch layer should, once again, be fixed by HotSpot.  All the class testing stuff should become costless in HotSpot.  So, why is iter so much slower than pm?
Thanks,
David
--
Lift, the simply functional web framework http://liftweb.net
Collaborative Task Management http://much4.us
Follow me: http://twitter.com/dpp
Git some: http://github.com/dpp




--
Lift, the simply functional web framework http://liftweb.net
Collaborative Task Management http://much4.us
Follow me: http://twitter.com/dpp
Git some: http://github.com/dpp
Alex Cruise
Joined: 2008-12-17,
User offline. Last seen 2 years 26 weeks ago.
Re: Why is the pm method so much faster?

Quoting David Pollak :

> I did my test on Windows with a client VM. I guess the server version does
> better optimizations.

IIUC the -server compiler (aka C2) was originally written by Cliff
Click, and has very little in common with the -client compiler. One
of the ongoing projects to improve Java's desktop performance involves
porting optimizations from -server to -client. :)

Basically, any run in which you're trying to achieve high throughput
should use -server, whereas for any run in which you want to minimize
startup time and/or memory usage, -client may be a better choice.

-0xe1a

----------------------------------------------------------------
This message was sent using IMP, the Internet Messaging Program.

Copyright © 2012 École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland