
More accurate positions in classfiles / in the debugger

2 replies
Johannes Rudolph 2
Joined: 2010-02-12,

Yesterday at the Zurich Scala Meetup, I talked with Daniel Kroeni
about how to improve the coarse granularity of positions when
debugging. As you know, the problem is that a classfile can only map a
bytecode instruction to a line of code, not to anything more precise
such as a position inside the line.
We thought about how to improve this situation and came up with an
almost stupidly simple scheme:
Why doesn't Scala just overload the existing infrastructure by saving
absolute offsets instead of line numbers in the LineNumberTable? The
obvious problem is that entries of that table are only 16 bits wide,
so you have to think about how to extend the scheme to files larger
than 2^16 characters. Our simple workaround would be to throw away
only as many low-order bits as necessary, so that positions stay as
accurate as possible. For example, for a file size between 2^16 and
2^17 you would address positions with a granularity of 2 characters,
for a file size between 2^17 and 2^18 with a granularity of 4
characters, and so on. Most files are below the 2^16 threshold, so you
would have perfectly accurate positions most of the time. The
encoding can always be decoded (up to that granularity), since the
debugger knows the size of the source file anyway. On the code
generation side, this seems to be a trivial change.
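
To make the arithmetic concrete, here is a minimal sketch of the encoding in Java. The class and method names (OffsetEncoding, shiftFor, encode, decode) are made up for illustration and are not part of scalac or any existing tool. The only outside fact it relies on is that the line_number field of a LineNumberTable entry is an unsigned 16-bit value.

```java
final class OffsetEncoding {
    // A LineNumberTable entry stores line_number as an unsigned 16-bit field.
    static final int FIELD_BITS = 16;

    // Number of low-order bits to drop so that every offset of a file with
    // fileSize characters fits into 16 bits (hypothetical helper).
    static int shiftFor(int fileSize) {
        int maxOffset = Math.max(fileSize - 1, 0);
        int significantBits = 32 - Integer.numberOfLeadingZeros(maxOffset);
        return Math.max(0, significantBits - FIELD_BITS);
    }

    // Value the compiler would write instead of a line number.
    static int encode(int offset, int fileSize) {
        if (offset < 0 || offset >= fileSize) {
            throw new IllegalArgumentException("offset outside of file");
        }
        return offset >>> shiftFor(fileSize);
    }

    // Offset the debugger recovers. It is exact for files of up to 2^16
    // characters, otherwise rounded down to a multiple of the granularity.
    static int decode(int tableValue, int fileSize) {
        return tableValue << shiftFor(fileSize);
    }
}
```

For a file of 100000 characters the shift is 1, so offset 70001 would be stored as 35000 and decoded back to 70000, i.e. with a granularity of 2 characters.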

On the client side, of course, you have to change how line number
information coming from the debugger is interpreted. When setting
breakpoints or clicking on stack traces, the tool has to know that
this alternative encoding was used for the file/class in order to jump
to the correct position. With this scheme you don't have to make any
intrusive changes to the debugging API of the JVM or add new
classfile attributes to carry the positions.
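
On the tool side, the extra work is roughly the following. This is a sketch in Java with hypothetical names (PositionDecoder, lineAndColumn, fromTableValue). It assumes the tool has the source text at hand and already knows that the class was compiled with the offset encoding.

```java
final class PositionDecoder {
    // 1-based line and column of a character offset in the source text.
    static int[] lineAndColumn(String source, int offset) {
        int line = 1;
        int column = 1;
        for (int i = 0; i < offset && i < source.length(); i++) {
            if (source.charAt(i) == '\n') {
                line++;
                column = 1;
            } else {
                column++;
            }
        }
        return new int[] { line, column };
    }

    // Interprets a "line number" reported by the debugger as an encoded
    // offset, undoing the shift that the file size implies.
    static int[] fromTableValue(String source, int tableValue) {
        int maxOffset = Math.max(source.length() - 1, 0);
        int significantBits = 32 - Integer.numberOfLeadingZeros(maxOffset);
        int shift = Math.max(0, significantBits - 16);
        return lineAndColumn(source, tableValue << shift);
    }
}
```

With the source "val x = 1\nval y = x + 1\n", a reported value of 14 maps to line 2, column 5, which is the position of `y`.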

Another drawback would be that stack traces become harder for humans
to read, because jumping to a position by offset is not the common way
to navigate files.

Another way of addressing could be to number the tokens instead of
using character offsets into the source file.

What do you think?

Kevin Wright 2
Joined: 2010-05-30,
Re: More accurate positions in classfiles / in the debugger
While I love the idea of indexing by tokens, this means that any tool capable of using the debug information would need to have access to a full parser.  There's also a backwards compatibility issue involved if future language evolutions change what's considered to be a token.
Another possibility would be to reformat the file according to some strictly-defined set of rules, and then continue to use line numbering.  The reformatted sources could then be output at the same time as the classes, and even used in creating source jar files so that existing maven-aware tools could continue to use them without needing modification.

--
Kevin Wright

mail / gtalk / msn : kev [dot] lee [dot] wright [at] gmail [dot] com
pulse / skype: kev.lee.wright
twitter: @thecoda

Iulian Dragos 2
Joined: 2009-02-10,
Re: More accurate positions in classfiles / in the debugger


On Wed, Nov 3, 2010 at 3:35 PM, Johannes Rudolph <johannes [dot] rudolph [at] googlemail [dot] com> wrote:
Yesterday at the Zurich Scala Meetup, I talked with Daniel Kroeni
about how to improve the coarse granularity of positions when
debugging. As you know, the problem is that a classfile can only map a
bytecode instruction to a line of code, not to anything more precise
such as a position inside the line.

I started working on improving the debugging experience for Scala programs, and one of the sore points is indeed line numbers. Ideally there would be support for more than just line numbers. I am looking at JSR 45 (Debugging Support for Other Languages). I am not sure yet whether that is enough, but maybe a Scala stratum could use your scheme.

We thought about how to improve this situation and came up with an
almost stupidly simple scheme:
Why doesn't Scala just overload the existing infrastructure by saving
absolute offsets instead of line numbers in the LineNumberTable? The
obvious problem is that entries of that table are only 16 bits wide,
so you have to think about how to extend the scheme to files larger
than 2^16 characters. Our simple workaround would be to throw away
only as many low-order bits as necessary, so that positions stay as
accurate as possible. For example, for a file size between 2^16 and
2^17 you would address positions with a granularity of 2 characters,
for a file size between 2^17 and 2^18 with a granularity of 4
characters, and so on. Most files are below the 2^16 threshold, so you
would have perfectly accurate positions most of the time. The
encoding can always be decoded (up to that granularity), since the
debugger knows the size of the source file anyway. On the code
generation side, this seems to be a trivial change.

I am not thrilled by this change, since it would render all existing tools unusable. So far there is no Scala debugger, but Java debuggers work more or less. With such a change we'd break all existing tools: debuggers, profilers, data coverage tools, etc.
I think the better way is to provide additional, Scala-specific debugging information in classfile attributes or annotations. As far as I know, JDI gives access neither to classfile attributes nor to annotations, but there are some tricks we could use to get to them. This way we'd preserve existing functionality and allow tools to do more when they know about Scala attributes.

With this scheme you don't have to make any
intrusive changes to the debugging API of the JVM or add new
classfile attributes to carry the positions.

There is more than just positions that you'd need for a good debugger. For instance, synthetic getters should be skipped when stepping in, the environment inside closures should be presented as locals instead of RefCells on the heap, etc. I'm not sure all this information can be recovered just by looking at the names, so richer debug information is needed.

What do you think?

It's great to see people interested in this! We could come up with a SID, summarizing what information is needed for debuggers.
iulian  


--
« Je déteste la montagne, ça cache le paysage »
Alphonse Allais

Copyright © 2012 École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland