This page is no longer maintained — Please continue to the home page at www.scala-lang.org

Problem with JavaTokenParsers

3 replies
Tomygun
I'm going crazy trying to parse a really simple text: a raw wiki script. After many problems trying to make it work, I reduced the code to the bare minimum that shows what appears to be a bug, unless there's something I'm missing.

import scala.util.parsing.combinator._

object Parser extends JavaTokenParsers {
    // A list is zero or more "+"-prefixed items
    def list: Parser[Any] = rep("+"~text)
    // ".*" also matches the empty string
    def text: Parser[Any] = ".*".r

    val source = "+elem1\n+elem2\n+elem3"

    def main(args: Array[String]): Unit = {
        println(source)
        println(parseAll(list, source))
    }
}

Outputs:

[3.7] parsed: List((+~elem1), (+~elem2), (+~elem3))

OK, everything works as expected, but if "rep("+"~text)" is replaced with "rep(text)" it goes into an infinite loop, forcing me to kill the process, which by the way reaches a whole GB of RAM.

I ran into this problem when trying to parse these lists scattered through regular text, and now I can't continue because of it. I really need help.

Thanks!


Roland Kuhn
Re: Problem with JavaTokenParsers

Hi Tomás,

On Fri, December 26, 2008 03:58, Tomás Lázaro wrote:
> [...]
> Ok everything works as expected but if "rep("+"~text) " is replaced with
> "rep(text)" it goes into an infinite loop forcing me to kill the process
> which by the way is reaching a whole GB of ram.
>
The problem is your regex, which happily accepts the empty string. Putting that into a "rep" is
asking for disaster ;-) Without having access to my normal Scala gear, I suspect that the regex
parser does not discard whitespace (the newline) like the literal "+" parser does, so you get
stuck at the end of the first line. I don't know what exactly you are trying to parse, but you
should be more specific with your regex. At least use '+' instead of '*', but you can also send me
a more specific example so I can help you better.
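
The empty-match behaviour described above can be reproduced with the standard library alone. The sketch below is an illustration, not the library's code: `repeat` is a hypothetical stand-in for "rep" that applies a regex at the current offset, under the assumption that whitespace is skipped before every match, and `maxSteps` is a cap added only so the demo terminates.

```scala
import scala.util.matching.Regex

object EmptyMatchDemo {
  val star: Regex = ".*".r // accepts the empty string
  val plus: Regex = ".+".r // needs at least one character

  // Hypothetical stand-in for a repetition combinator: apply `r` at the
  // current offset until it fails, skipping whitespace before each attempt.
  // `maxSteps` is a safety cap for this demo only.
  def repeat(r: Regex, input: String, maxSteps: Int = 10): List[String] = {
    var offset = 0
    var out = List.empty[String]
    var steps = 0
    var done = false
    while (!done && steps < maxSteps) {
      while (offset < input.length && input.charAt(offset).isWhitespace) offset += 1
      r.findPrefixOf(input.substring(offset)) match {
        case Some(m) =>
          out = m :: out
          offset += m.length // an empty match leaves the offset unchanged
        case None =>
          done = true
      }
      steps += 1
    }
    out.reverse
  }

  def main(args: Array[String]): Unit = {
    val source = "+elem1\n+elem2\n+elem3"
    println(star.findPrefixOf(""))        // ".*" succeeds on empty input
    println(plus.findPrefixOf(""))        // ".+" fails on empty input
    println(repeat(plus, source))         // stops once the input is used up
    println(repeat(star, source).length)  // only stops because of maxSteps
  }
}
```

With ".*" the first three steps return the three lines, and every later step succeeds with an empty match at the end of the input without consuming anything. A repetition that keeps collecting such results never finishes, which would be consistent with the memory growth reported in the original post.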

Ciao,

Roland

Stefan Ackermann
Re: Problem with JavaTokenParsers

I have replied directly instead of to the mailing list...
here is what I said...

Well if you just put rep(text) and text is virtually anything, how is it
supposed to be parsed?

This was indeed the problem. I got some debug output using:
def text: Parser[Any] = ".*".r ^^ {x => println(x);x}
It immediately showed me that the parser was being tried against empty
strings, which it happily accepted.
Change it to:
def text: Parser[Any] = ".+".r
and it works fine...
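
For reference, a minimal sketch of the reduced example with this change applied to "rep(text)". It assumes the scala-parser-combinators library is on the classpath (since Scala 2.11 it is a separate module rather than part of the standard library); the object name FixedParser is made up for this example.

```scala
import scala.util.parsing.combinator._

object FixedParser extends JavaTokenParsers {
  // ".+" needs at least one character, so `text` fails at the end of the
  // input and rep(text) stops instead of collecting empty matches.
  def text: Parser[String] = ".+".r
  def list: Parser[List[String]] = rep(text)

  def main(args: Array[String]): Unit = {
    val source = "+elem1\n+elem2\n+elem3"
    println(parseAll(list, source))
  }
}
```

Each "+elemN" line should come back as one list element, because "." does not match a newline and the newline itself is skipped as whitespace before the next match.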


Tomygun
Re: Problem with JavaTokenParsers
Wow that was unexpected... it's like an extremely non-greedy regex. Thanks to all, that got me working again.

(That example was a simplification, it does not make sense to parse any text anywhere.)

On Fri, Dec 26, 2008 at 9:04 AM, Stefan Ackermann-4 <stivo [dot] scala [at] gmail [dot] com> wrote:
> [...]
> Change it to:
> def text: Parser[Any] = ".+".r
> and it works fine...


Copyright © 2012 École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland