
parser combinators vs. regex question

25 replies
ArtemGr

I'm trying to do a CSV parser: http://gist.github.com/115557

A theoretical question:

The following regex: /(?xs) ("(.*?)"|) ; ("(.*?)"|) (?: \r\n | \z )/
has the nice property of ignoring any double quote in the file that isn't
followed by a semicolon, an end-of-line or an end-of-file.
In particular, I can pass "\"\"\";" and it will be a valid CSV file
containing a double quote in the first column.

If I'm not mistaken, this is called backtracking:
the regular expression engine sees that the second double quote is not
the proper match and skips to the third double quote, which is followed by
a semicolon.
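
For readers who want to reproduce this, here is a minimal sketch that runs the regex from the post through scala.util.matching.Regex (Java's regex engine). The names CsvRegexSketch and columns are made up for illustration, and the sketch only looks at the first line of the input.

```scala
object CsvRegexSketch {
  // Same pattern as in the post: (?x) ignores the whitespace inside the
  // pattern, (?s) lets '.' match line terminators as well.
  val Line = """(?xs) ("(.*?)"|) ; ("(.*?)"|) (?: \r\n | \z )""".r

  // Returns the two column values of the first matching line, if any.
  def columns(input: String): Option[(String, String)] =
    Line.findFirstMatchIn(input).map { m =>
      // When a column is empty, the inner group does not take part in the
      // match and group(n) is null, so it is mapped to "".
      (Option(m.group(2)).getOrElse(""), Option(m.group(4)).getOrElse(""))
    }
}
```

With the input "\"\"\";" the lazy `.*?` first tries the empty string, finds that the next quote is not followed by a semicolon, and grows by one character, so the first column comes out as a single double quote.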

I have tried to do something similar with parser combinators,
e.g. with
def stringInQuotes = "" | ('"' ~ rep ("(?s).".r) ~ '"'
^^ {case _ ~ chars ~ _ => chars.mkString ("")})
or
def stringInQuotes = opt ('"' ~ rep (elem("value", (c: Char) => true)) ~ '"')
^^ {case None => ""; case Some (_ ~ chars ~ _) => chars.mkString ("")}

but the first doesn't do what I expect
and the second goes into an infinite loop.

Is it possible to do that kind of backtracking with the current (2.7.4) release
of the combinators and, if so, what am I doing wrong?

ArtemGr
Re: parser combinators vs. regex question

ArtemGr writes:
> I have tried to do something similar with parser combinators,
> e.g. with
> def stringInQuotes = "" | ('"' ~ rep ("(?s).".r) ~ '"'
> ^^ {case _ ~ chars ~ _ => chars.mkString ("")})
> or
> def stringInQuotes = opt ('"' ~ rep (elem("value", (c: Char) => true)) ~ '"')
> ^^ {case None => ""; case Some (_ ~ chars ~ _) => chars.mkString ("")}
>
> but the first doesn't do what expected
> and the second goes into an infinite cycle.

As a side note, it would be good if the
implicit def regex(r: Regex): Parser[String]
method in RegexParsers produced a matcher with access to the matched
groups, instead of just a string.
That would make it possible to use the regex backtracking features locally inside
the parser combinators.

extempore
Re: parser combinators vs. regex question

On Thu, May 21, 2009 at 05:53:35PM +0000, ArtemGr wrote:
> The following regex: /(?xs) ("(.*?)"|) ; ("(.*?)"|) (?: \r\n | \z )/
> have a nice property of ignoring any double quotes in the file which aren't
> followed by either a semicolon, and end-of-line or an end-of-file.
> In particular, I can pass "\"\"\";" - and it will be a valid CSV file
> containing a double-quote in the first column.

For 2.8 the guard combinator has been added, so you could say:

'"' ~> rep(chars) <~ '"' <~ guard("\r\n" | EOF)

but in 2.7 you can achieve the same effect with double negation:

'"' ~> rep(chars) <~ '"' <~ not(not("\r\n" | EOF))

This is pseudocode but you should be able to get it going. Neither
guard nor not consume any input, they only place constraints on a match.
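
To make the "consume no input" point concrete, here is a toy model written for this page. It is not the scala.util.parsing.combinator API: the type P and the functions lit, guard and not are hypothetical stand-ins that only track how much input a parser leaves behind.

```scala
object GuardSketch {
  // A toy parser: on success it returns a result and the remaining input.
  type P[A] = String => Option[(A, String)]

  // Matches a literal prefix and consumes it.
  def lit(s: String): P[String] =
    in => if (in.startsWith(s)) Some((s, in.drop(s.length))) else None

  // Succeeds when p succeeds, but hands back the original input untouched.
  def guard[A](p: P[A]): P[A] =
    in => p(in).map { case (a, _) => (a, in) }

  // Succeeds (consuming nothing) exactly when p fails.
  def not[A](p: P[A]): P[Unit] =
    in => if (p(in).isEmpty) Some(((), in)) else None

  // Both variants accept ";rest" and leave all of it for the next parser.
  val viaGuard = guard(lit(";"))(";rest")
  val viaDoubleNot = not(not(lit(";")))(";rest")
}
```

In this model the only difference between the two is the result value: guard keeps what p produced, while the double negation discards it.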

ArtemGr
Re: parser combinators vs. regex question

Paul Phillips writes:
> For 2.8 the guard combinator has been added, so you could say:
>
> '"' ~> rep(chars) <~ '"' <~ guard("\r\n" | EOF)
>
> but in 2.7 you can achieve the same effect with double negation:
>
> '"' ~> rep(chars) <~ '"' <~ not(not("\r\n" | EOF))
>
> This is pseudocode but you should be able to get it going. Neither
> guard nor not consume any input, they only place constraints on a match.

Thanks for the hint.
I think the guard is a look-ahead construct,
it doesn't answer the backtracking question in any way.

Intuitively, since disjunction effectively backtracks
(i.e. if the first Parser fails, the disjunction should return to the same position
and try the second Parser), I think replacing

def stringInQuotes = """(?xs) ".*?" |""".r ^^ {
case qstr => if (qstr.length != 0) qstr.substring (1, qstr.length - 1) else ""}
def line = stringInQuotes ~ ';' ~ stringInQuotes ~ (CRLF | EOF) ^^ {
case col1 ~ _ ~ col2 ~ _ => col1 :: col2 :: Nil}

with

def chars1 = rep ("(?s).".r) ^^ {case chars_ => chars_ mkString ""}
def chars2 = rep (elem("value", (c: Char) => true)) ^^ {
case chars_ => chars_ mkString ""}
def col1 = ('"' ~ chars1 ~ "\";" ^^ {case _ ~ value ~ _ => value}
| ";" ^^ {case _ => ""})
def col2 = ('"' ~ chars1 ~ ("\"" ~ (CRLF | EOF)) ^^ {
case _ ~ value ~ _ => value}
| (CRLF | EOF) ^^ {case _ => ""})
def line = col1 ~ col2 ^^ {case v1 ~ v2 => v1 :: v2 :: Nil}

should just work.
However, like I said earlier, with chars1 it gives wrong results, e.g.

[1.1] failure: `";' expected but `' found
with input "\"qq\nqq\";"

and with chars2 it goes into an infinite loop...

extempore
Re: Re: parser combinators vs. regex question

On Thu, May 21, 2009 at 08:01:28PM +0000, ArtemGr wrote:
> [snip]

The degree to which you are overcomplicating this by trying to apply
regexp backtracking defies description. Your code is too difficult to
read, but what you're trying to do can be done in one or two lines.

> I think the guard is a look-ahead construct, it doesn't answer the
> backtracking question in any way.

What do you think happens when the guard fails? Backtracking is what the
combinators do if you use the backtracking ops (that's | among others.)
Attempting to reimplement backtracking with regexps inside the
combinator framework is like throwing away your sword so you can fight
off the marauders with a lampshade.

ArtemGr
Re: parser combinators vs. regex question

Paul Phillips writes:
> On Thu, May 21, 2009 at 08:01:28PM +0000, ArtemGr wrote:
> > [snip]
>
> The degree to which you are overcomplicating this by trying to apply
> regexp backtracking defies description.

I'm not "trying to apply" regex backtracking; I've already said it isn't
very convenient without access to subgroups.

> Your code is too difficult to
> read, but what you're trying to do can be done in one or two lines.

> > I think the guard is a look-ahead construct, it doesn't answer the
> > backtracking question in any way.
>
> What do you think happens when the guard fails? Backtracking is what the
> combinators do if you use the backtracking ops (that's | among others.)

That's what I implied in the previous post, by saying that disjunction should
use backtracking. Thanks for clarifying. That answers the theoretical
half of my question.

> Attempting to reimplement backtracking with regexps inside the
> combinator framework is like throwing away your sword so you can fight
> off the marauders with a lampshade.

I think the better comparison would be throwing away the spare parts of a
calculator in order to fight the marauders with the good old regex sword.
After all, the simple and intuitive regex
"""(?xs) ("(.*?)"|) ; ("(.*?)"|) (?: \r?\n | \z ) """
which took me one minute to write, works, while the parser combinators, after 12
to 16 hours of tweaking, either fail unexpectedly or go into an
infinite loop.

ArtemGr
Re: parser combinators vs. regex question

ArtemGr writes:
> Is it possible to do that kind of backtracking with the current (2.7.4)
> release of the combinators and if so, then what i'm doing wrong?

I have found an interesting comment in the javadoc of the disjunction method "Parser.|":

`p | q' succeeds if `p' succeeds or `q' succeeds
Note that `q' is only tried if `p's failure is non-fatal (i.e., back-tracking is allowed).

It implies that backtracking is not always allowed. I wonder what kind of parsers might produce a "fatal failure" and why?

Johannes Rudolph
Re: Re: parser combinators vs. regex question

A not-matching branch of an alternative is non-fatal.
For better error handling you might anticipate and introduce error
branches (alternatives) to give better error messages. These are
fatal, since you don't want parsing to continue in these cases.

On Fri, May 22, 2009 at 10:49 AM, ArtemGr wrote:
> ArtemGr writes:
>> Is it possible to do that kind of backtracking with the current (2.7.4)
>> release of the combinators and if so, then what i'm doing wrong?
>
> I have found an interesting comment
> in the javadoc of disjunction method "Parser.|":
>
> `p | q' succeeds if `p' succeeds or `q' succeeds
> Note that `q' is only tried if `p's failure is non-fatal
> (i.e., back-tracking is allowed).
>
> It implies that backtracking is not always allowed.
>
> I wonder what kind of parsers might produce a "fatal failure" and why?

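
The distinction between a non-fatal failure and a fatal error can be sketched with a toy model. This is not the library's implementation: the names FatalSketch, lit, err and or are invented here, and the sketch only assumes what the javadoc quoted above says, namely that the second alternative is tried only after a non-fatal failure.

```scala
object FatalSketch {
  sealed trait Result[+A]
  case class Success[A](value: A, rest: String) extends Result[A]
  // Non-fatal: an enclosing alternative may still try its other branch.
  case class Failure(msg: String) extends Result[Nothing]
  // Fatal: reported as is, no further alternatives are tried.
  case class Error(msg: String) extends Result[Nothing]

  type P[A] = String => Result[A]

  // Matches a literal prefix; a mismatch is an ordinary, non-fatal failure.
  def lit(s: String): P[String] =
    in => if (in.startsWith(s)) Success(s, in.drop(s.length))
          else Failure("expected " + s)

  // An explicit error branch: always fails fatally with the given message.
  def err(msg: String): P[Nothing] = _ => Error(msg)

  // Toy version of `p | q`: q runs on the same input, but only after a Failure.
  def or[A](p: P[A], q: P[A]): P[A] = in => p(in) match {
    case Failure(_) => q(in)
    case other      => other
  }
}
```

In this model `or(lit("a"), lit("b"))` backtracks to "b" when "a" does not match, whereas an `err` branch placed before `lit("b")` stops the search, which is the trade-off discussed in the replies.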
Randall R Schulz
Re: Re: parser combinators vs. regex question

On Friday May 22 2009, Johannes Rudolph wrote:
> A not-matching branch of an alternative is non-fatal.
> For better error handling you might anticipate and introduce error
> branches (alternatives) to give better error messages. These are
> fatal, since you don't want parsing to continue in these cases.

Unless, of course, you do. I tend to think that nothing is worse than a
prematurely terminated parse. It's at least as bad as poor error
messages.

There are typically some (often many) errors that don't render the rest
of the parse impossible or meaningless. When I write parsers, I always
try to include as many error productions as possible.

Randall Schulz

normen.mueller
Regex question

Hey,

I am trying to strip off potential leading/trailing quotation marks
("), but can't get my regex right:

val s1 = "\"hello world\""
val s2 = "hello world"

val StrValue = """[^"]((\w*\s*)*)""".r

(StrValue findFirstIn s1) foreach (println)
(StrValue findFirstIn s2) foreach (println)

val StrValue(_, ss1) = s1 // XXX Match Error
println(ss1)

val StrValue(ss2) = s2 // XXX Match Error
println(ss2)

Can anyone help me out, please?

Cheers,
--
Normen Müller

normen.mueller
Re: Regex question

I just noticed that

val Decimal = """(-)?(\d+)(\.\d*)?""".r
val Decimal(d) = "1.0"
println(d)

out of ``Programming in Scala'' also doesn't work. But I am pretty
sure it worked some time before 2.7.6. :\

On Sep 29, 2009, at 3:39 PM, Normen Müller wrote:

> He,
>
> I am trying to strip of potential leading/ tailing quotation marks
> ("), but don't get my regex right:
>
> val s1 = "\"hello world\""
> val s2 = "hello world"
>
> val StrValue = """[^"]((\w*\s*)*)""".r
>
> (StrValue findFirstIn s1) foreach (println)
> (StrValue findFirstIn s2) foreach (println)
>
> val StrValue(_, ss1) = s1 // XXX Match Error
> println(ss1)
>
> val StrValue(ss2) = s2 // XXX Match Error
> println(ss2)
>
> Can anyone help me out, please?
>
> Cheers,
> --
> Normen Müller
>

Cheers,
--
Normen Müller

Randall R Schulz
Re: Re: Regex question

On Tuesday September 29 2009, Normen Müller wrote:
> I just recognized that
>
> val Decimal = """(-)?(\d+)(\.\d*)?""".r
> val Decimal(d) = "1.0"
> println(d)
>
> out of ``Programming in Scala'' also doesn't work. But I am pretty
> sure it worked some time before 2.7.6. :\

On this machine (where I don't do development and hence don't
bother updating Scala) I have 2.7.4.

But your result is to be expected. There are three capturing groups in
your RE, so you have to bind three values in the combined match /
declaration syntax:

scala> val Decimal(signPart, intPart, fracPart) = "1.0"
signPart: String = null
intPart: String = 1
fracPart: String = .0
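
The same rule can be written as a small sketch: one binding per capturing group, and non-capturing groups `(?:...)` when a single value is wanted. DecimalSketch and DecimalWhole are names made up for this illustration.

```scala
object DecimalSketch {
  // Three capturing groups, so the extractor pattern needs three binders.
  val Decimal = """(-)?(\d+)(\.\d*)?""".r

  // One capturing group around the whole number; the fraction is non-capturing.
  val DecimalWhole = """(-?\d+(?:\.\d*)?)""".r

  // Groups that did not take part in the match come back as null.
  def parts(s: String): Option[(String, String, String)] = s match {
    case Decimal(sign, int, frac) => Some((sign, int, frac))
    case _                        => None
  }

  def whole(s: String): Option[String] = s match {
    case DecimalWhole(d) => Some(d)
    case _               => None
  }
}
```

Using `match` with a fallback case, instead of a `val` pattern, turns a non-matching input into None rather than a MatchError.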

Randall Schulz

normen.mueller
Re: Re: Regex question

On Sep 29, 2009, at 4:19 PM, Randall R Schulz wrote:

> On Tuesday September 29 2009, Normen Müller wrote:
>> I just recognized that
>>
>> val Decimal = """(-)?(\d+)(\.\d*)?""".r
>> val Decimal(d) = "1.0"
>> println(d)
>>
>> out of ``Programming in Scala'' also doesn't work. But I am pretty
>> sure it worked some time before 2.7.6. :\
>
> On this machine (where I don't do development and hence don't
> bother updating Scala) I have 2.7.4.
>
> But your result is to be expected. There are three capturing groups in
> your RE, so you have to bind three values in the combined match /
> declaration syntax:
>
> scala> val Decimal(signPart, intPart, fracPart) = "1.0"
> signPart: String = null
> intPart: String = 1
> fracPart: String = .0

My fault … :(

But what about

val s1 = "\"hello world\""
val s2 = "hello world"

val StrValue = """[^"]((\w*\s*)*)""".r

val StrValue(_, ss1) = s1 // XXX Match Error
println(ss1)

>
>
> Randall Schulz

Cheers,
--
Normen Müller

Gordon Tyler
Re: Re: Regex question

Normen Müller wrote:
> But what about
>
> val s1 = "\"hello world\""
> val s2 = "hello world"
>
> val StrValue = """[^"]((\w*\s*)*)""".r
>
> val StrValue(_, ss1) = s1 // XXX Match Error
> println(ss1)

You're telling it that a double-quote character at the start of your
string must NOT match. Try this:

val StrValue = """"*([\w\s]*)"*""".r
val StrValue(ss1) = s1

Ciao,
Gordon

normen.mueller
Re: Re: Regex question

On Sep 29, 2009, at 4:38 PM, Gordon Tyler wrote:

> val StrValue = """"*([\w\s]*)"*""".r

And once more … my fault! You are absolutely right!

Thank you!!

Cheers,
--
Normen Müller

dcsobral
Re: Regex question

There are many problems. First, when you do pattern matching such as the
match errors you indicated, the string must be _exactly_ equal to what is
matched by findFirstIn. In other words:

(StrValue findFirstIn s1).get == s1
(StrValue findFirstIn s2).get == s2

The second one is true, the first one is not, and that's why the first one gives an error.

The next problem is that, when doing pattern matching, one parameter is returned for each parenthesized group (aside from those you explicitly flag as non-capturing; see the API docs on Java's Pattern class).

Now, StrValue, as you defined it, has two parenthesized groups, but you are only passing one parameter when matching against s2, and that's why that line gives an error.

Finally, the pattern itself is inefficient: ((\w*\s*)*). The problem is that this gives multiple ways of interpreting the same pattern. For instance, "abc" can be interpreted as (\w{1}\s{0}){3} or (\w{3}\s{0}){1}, or various other combinations. You must strive to have your patterns have only one possible match.

What I recommend, to deal with all the problems, is

val StrValue= """^(?:[^"]*")?([^"]*)(?:"[^"]*)?$""".r

scala> val StrValue(ss1) = s1
ss1: String = hello world

scala> val StrValue(ss2) = s2
ss2: String = hello world

This is a rather complex pattern. You may have some fun (or not! :) figuring out what it means, and feeding it sample strings to see how it works.

On Tue, Sep 29, 2009 at 10:39 AM, Normen Müller <normen [dot] mueller [at] googlemail [dot] com> wrote:
He,

I am trying to strip of potential leading/ tailing quotation marks ("), but don't get my regex right:

val s1 = "\"hello world\""
val s2 = "hello world"

val StrValue = """[^"]((\w*\s*)*)""".r

(StrValue findFirstIn s1) foreach (println)
(StrValue findFirstIn s2) foreach (println)

val StrValue(_, ss1) = s1  // XXX Match Error
println(ss1)

val StrValue(ss2) = s2 // XXX Match Error
println(ss2)

Can anyone help me out, please?

Cheers,
--
Normen Müller




--
Daniel C. Sobral

Something I learned in academia: there are three kinds of academic reviews: review by name, review by reference and review by value.
Randall R Schulz
Re: Regex question

On Tuesday September 29 2009, Daniel Sobral wrote:
> There are many problems. First, when you do pattern matching such as
> the match errors you indicated, the string must be _exactly_ equal to
> what is matched by findFirstIn. In other words:
>
> (StrValue findFirstIn s1).get == s1
> (StrValue findFirstIn s2).get == s2

That would make the "find" part of the method name very poorly chosen.
You describe a complete match, not a find.

This seems to contradict what you say, though:

scala> val s1 = "123abcXYZ"
s1: java.lang.String = 123abcXYZ

scala> val re1 = "abc".r
re1: scala.util.matching.Regex = abc

scala> re1.findFirstIn(s1)
res0: Option[String] = Some(abc)

> ...
>
> For instance, "abc" can be interpreted as (\w{1}\s{0}){3} or
> (\w{3}\s{0}){1}, or various multiple combinations. You must strive to
> have your patterns have only one possible match.

That is technically true, but REs disambiguate these through "maximum
bite" semantics.

> ...

Randall Schulz

dcsobral
Re: Regex question


On Tue, Sep 29, 2009 at 1:20 PM, Randall R Schulz <rschulz [at] sonic [dot] net> wrote:
> On Tuesday September 29 2009, Daniel Sobral wrote:
> > There are many problems. First, when you do pattern matching such as
> > the match errors you indicated, the string must be _exactly_ equal to
> > what is matched by findFirstIn. In other words:
> >
> > (StrValue findFirstIn s1).get == s1
> > (StrValue findFirstIn s2).get == s2
>
> That would make the "find" part of the method name very poorly chosen.
> You describe a complete match, not a find.

I'm describing the condition that must hold true for a pattern match to work.

> This seems to contradict what you say, though:
>
> scala> val s1 = "123abcXYZ"
> s1: java.lang.String = 123abcXYZ
>
> scala> val re1 = "abc".r
> re1: scala.util.matching.Regex = abc
>
> scala> re1.findFirstIn(s1)
> res0: Option[String] = Some(abc)

scala> val re1 = "(abc)".r
re1: scala.util.matching.Regex = (abc)

scala> (re1 findFirstIn s1).get == s1
res30: Boolean = false

scala> re1.findFirstIn(s1)
res31: Option[String] = Some(abc)

scala> val re1(ss1) = s1
scala.MatchError: 123abcXYZ
        at .<init>(<console>:9)
        at .<clinit>(<console>)
        at RequestResult$.<init>(<console>:4)
        at RequestResult$.<clinit>(<console>)
        at RequestResult$result(<console>)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invo...

scala> val re1(ss1) = "abc"
ss1: String = abc

> > ...
> >
> > For instance, "abc" can be interpreted as (\w{1}\s{0}){3} or
> > (\w{3}\s{0}){1}, or various multiple combinations. You must strive to
> > have your patterns have only one possible match.
>
> That is technically true, but REs disambiguate these through "maximum
> bite" semantics.

Which degenerates into exponential-time searches if the match fails, as it backtracks over each combination one by one.

> > ...
>
> Randall Schulz
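
The difference between finding a match and extracting with a pattern can be condensed into a short sketch. FullMatchSketch and its method names are hypothetical; `unanchored` is a method on scala.util.matching.Regex in later Scala versions (2.10 and up, to my knowledge) and was not available in the 2.7 series discussed in this thread.

```scala
object FullMatchSketch {
  val re = "(abc)".r

  // An extractor pattern only succeeds when the WHOLE string matches.
  def extract(s: String): Option[String] = s match {
    case re(x) => Some(x)
    case _     => None
  }

  // The unanchored view of the same regex accepts a match anywhere in the input.
  def extractAnywhere(s: String): Option[String] = {
    val anywhere = re.unanchored
    s match {
      case anywhere(x) => Some(x)
      case _           => None
    }
  }
}
```

Here `extract("123abcXYZ")` gives None while `re.findFirstIn("123abcXYZ")` and `extractAnywhere("123abcXYZ")` both find "abc".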





--
Daniel C. Sobral

Something I learned in academia: there are three kinds of academic reviews: review by name, review by reference and review by value.
normen.mueller
Re: Regex question

Hey Daniel,

On Sep 29, 2009, at 5:20 PM, Daniel Sobral wrote:

> StrValue= """^(?:[^"]*")?([^"]*)(?:"[^"]*)?$""".r

that's a very complex regex … I don't understand a word ;)

BUT, it works perfectly for me (@thanks to Randall as well … his one was
fine for me as well!)!!! All I want to do is strip off potential
quotation marks before or after a string. Between the quotation marks
any character is allowed. I just tested your regex and it does
exactly that: THANK YOU!!!

Cheers,
--
Normen Müller

J Robert Ray
Re: Regex question

On Tue, Sep 29, 2009 at 8:20 AM, Daniel Sobral wrote:
>
> val StrValue= """^(?:[^"]*")?([^"]*)(?:"[^"]*)?$""".r

What result is expected with the following inputs?

"r"b"

r"b"

"r"b

"rb""

rb"

"rb

\""rb"

The above regex results with capture group 1:

"r"b" - No match

r"b" - Some(b)

"r"b - Some(r)

"rb"" - No match

rb" - No match

"rb - Some(rb)

\""rb" - Some()

These are all unappealing results to me. I suggest this pattern is
closer to the goal of stripping one leading and/or trailing quote
(plus whitespace):

"""^(?:\s*")?(.*?)(?:"\s*)?$""".r

"r"b" - Some(r"b)

r"b" - Some(r"b)

"r"b - Some(r"b)

"rb"" - Some(rb")

rb" - Some(rb)

"rb - Some(rb)

\""rb" - Some(\""rb)

Requiring the surrounding quotes be balanced (on both sides) gets uglier.

Testing methodology:

scala> """^(?:\s*")?(.*?)(?:"\s*)?$""".r.findFirstMatchIn("\\\"\"rb").map(_.group(1))
res51: Option[String] = Some(\""rb)
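
The testing methodology above can be wrapped in a small helper so both patterns can be fed the same inputs. This is only a sketch; StripQuoteSketch, Sobral, Ray and group1 are names chosen here for illustration.

```scala
import scala.util.matching.Regex

object StripQuoteSketch {
  // The pattern suggested earlier in the thread by Daniel Sobral.
  val Sobral: Regex = """^(?:[^"]*")?([^"]*)(?:"[^"]*)?$""".r

  // The pattern suggested in this post.
  val Ray: Regex = """^(?:\s*")?(.*?)(?:"\s*)?$""".r

  // Capture group 1 of the first match, or None when the pattern does not match.
  def group1(re: Regex, input: String): Option[String] =
    re.findFirstMatchIn(input).map(_.group(1))
}
```

For example, `group1(Ray, "\"r\"b\"")` yields the inner text with its embedded quote, while `group1(Sobral, "\"r\"b\"")` yields None, in line with the table above.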

normen.mueller
Re: Regex question

Hey Robert,

On Sep 30, 2009, at 8:21 AM, J Robert Ray wrote:
> On Tue, Sep 29, 2009 at 8:20 AM, Daniel Sobral
> wrote:
>>
>> val StrValue= """^(?:[^"]*")?([^"]*)(?:"[^"]*)?$""".r
>
> What result is expected with the following inputs?

Actually, these are very good questions. In my scenario a string can
be quoted or not. If the string is quoted, then the quotes have to be
balanced and I just want to extract the string between the quotes no
matter what characters are in between.

I guess the grammar could be something like this, assuming that
``StringLiteral'' accepts any character.

Name ::= '"' StringLiteral '"' | StringLiteral

> "r"b"
>
> r"b"
>
> "r"b
>
> "rb""
>
> rb"
>
> "rb
>
> \""rb"
>
> The above regex results with capture group 1:
>
> "r"b" - No match
>
> r"b" - Some(b)
>
> "r"b - Some(r)
>
> "rb"" - No match
>
> rb" - No match
>
> "rb - Some(rb)
>
> \""rb" - Some()
>
> These are all unappealing results to me. I suggest this pattern is
> closer to the goal of stripping one leading and/or trailing quote
> (plus whitespace):
>
> """^(?:\s*")?(.*?)(?:"\s*)?$""".r
>
> "r"b" - Some(r"b)
>
> r"b" - Some(r"b)
>
> "r"b - Some(r"b)
>
> "rb"" - Some(rb")
>
> rb" - Some(rb)
>
> "rb - Some(rb)
>
> \""rb" - Some(\""rb)
>
> Requiring the surrounding quotes be balanced (on both sides) gets
> uglier.
>
> Testing methodology:
>
> scala> """^(?:\s*")?(.*?)(?:"\s*)?$""".r.findFirstMatchIn("\\\"\"rb").map(_.group(1))
> res51: Option[String] = Some(\""rb)

Cheers,
--
Normen Müller

Seth Tisue
Re: Regex question

>>>>> "Normen" == Normen Müller writes:

Normen> Actually, these are very good questions. In my scenario a
Normen> string can be quoted or not. If the string is quoted, then the
Normen> quotes have to be balanced and I just want to extract the
Normen> string between the quotes no matter what characters are in
Normen> between.

dunno if your goal is to improve your understanding of regexes or just
get the job done. if the latter, I think

def unquoted(x: String): String =
  // the length check keeps head/last safe on "" and leaves a lone quote untouched
  if (x.length >= 2 && x.head == '"' && x.last == '"')
    x.drop(1).dropRight(1)
  else x

is 10x more readable than a regex.

(note: this is 2.8 code, not sure what the most elegant 2.7 version
would be. hooray for the String improvements in 2.8.)
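
For comparison, the grammar Normen gave (a balanced pair of quotes around anything, or a bare string) can also be expressed with a regex extractor. This is a sketch rather than a recommendation; UnquoteSketch and Quoted are illustrative names.

```scala
object UnquoteSketch {
  // (?s) lets '.' span line terminators. The extractor below only succeeds
  // when the whole string is a quote, any characters, and a closing quote.
  val Quoted = "(?s)\"(.*)\"".r

  // Strips one balanced pair of surrounding quotes; anything else is returned as is.
  def unquoted(x: String): String = x match {
    case Quoted(inner) => inner
    case _             => x
  }
}
```

Because the pattern needs two quote characters, the empty string and a lone quote fall through to the second case unchanged.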

manojo
Re: Re: parser combinators vs. regex question

On Thu, May 21, 2009 at 10:44 AM, ArtemGr <artemciy [at] gmail [dot] com> wrote:
> ArtemGr <artemciy@...> writes:
> As a side note, it would be good if the
> implicit def regex(r: Regex): Parser[String]
> method in RegexParsers produced a matcher with access to the matched
> groups, instead of just a string.

I have also needed this and came up with the following makeshift solution: you can define a parser that does it for you:

  def regexMatch(r : Regex) : Parser[Match] = Parser { in => regex(r)(in) match {
    case Success(aString, theRest) => Success(r.findFirstMatchIn(aString).get, theRest)
    case f@Failure(_,_) => f
    case e@Error(_,_) => e
  }}

The other solution is to change the implicit regex method in RegexParsers itself.

I was also wondering if there is a better solution to parsing different groups.

Thanks,
Manohar

Drew
Regex Question

Hi Everyone,

I wasn't able to find an example for what I'm trying to do. How can I convert the following Java code to Scala (and make it more Scala like)?

Pattern mapParser = Pattern.compile("\u0000([^:]*):([^\u0000]*)\u0000");
Map map = new LinkedHashMap();
Matcher matcher = mapParser.matcher(decrypted);
while (matcher.find()) {
map.put(matcher.group(1), matcher.group(2));
}

Thanks!

dcsobral
Re: Regex Question

On Tue, Jan 10, 2012 at 01:26, Drew Kutcharian wrote:
> Hi Everyone,
>
> I wasn't able to find an example for what I'm trying to do. How can I convert the following Java code to Scala (and make it more Scala like)?
>
> Pattern mapParser = Pattern.compile("\u0000([^:]*):([^\u0000]*)\u0000");
> Map map = new LinkedHashMap();
> Matcher matcher = mapParser.matcher(decrypted);
> while (matcher.find()) {
>        map.put(matcher.group(1), matcher.group(2));
> }

This kind of question works better on codereview.stackexchange.com or
stackoverflow.com. Regardless, here's how you'd do it:

val mapParser = "\u0000([^:]*):([^\u0000]*)\u0000".r
val pairs = for (mapParser(key, value) <- mapParser findAllIn
decrypted) yield key -> value
val map = pairs.toMap

There are some ways in which you can improve the performance, but they
decrease readability a bit.
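
The post does not say which optimizations are meant. One possibility, offered here purely as an assumption, is to read the groups straight from each Match via findAllMatchIn, so that the matched substring is not run through the regex a second time by the extractor pattern. MapSketch and parse are hypothetical names.

```scala
import scala.util.matching.Regex

object MapSketch {
  // Same pattern as above: NUL, key, ':', value, NUL.
  val mapParser: Regex = "\u0000([^:]*):([^\u0000]*)\u0000".r

  // Builds the map directly from the groups of each match.
  def parse(decrypted: String): Map[String, String] =
    mapParser.findAllMatchIn(decrypted)
      .map(m => m.group(1) -> m.group(2))
      .toMap
}
```

Note that the Java original used a LinkedHashMap, which keeps insertion order; the immutable Map built by toMap makes no such promise for larger maps, so an ordered collection would be needed if the order matters.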

Drew
Re: Regex Question

Thanks Daniel.

On Jan 10, 2012, at 9:29 AM, Daniel Sobral wrote:

> On Tue, Jan 10, 2012 at 01:26, Drew Kutcharian wrote:
>> Hi Everyone,
>>
>> I wasn't able to find an example for what I'm trying to do. How can I convert the following Java code to Scala (and make it more Scala like)?
>>
>> Pattern mapParser = Pattern.compile("\u0000([^:]*):([^\u0000]*)\u0000");
>> Map map = new LinkedHashMap();
>> Matcher matcher = mapParser.matcher(decrypted);
>> while (matcher.find()) {
>> map.put(matcher.group(1), matcher.group(2));
>> }
>
> This kind of question works better on codereview.stackexchange.com or
> stackoverflow.com. Regardless, here's how you'd do it:
>
> val mapParser = "\u0000([^:]*):([^\u0000]*)\u0000".r
> val pairs = for (mapParser(key, value) <- mapParser findAllIn
> decrypted) yield key -> value
> val map = pairs.toMap
>
> There are some ways in which you can improve the performance, but it
> decreases a readability a bit.
>

Copyright © 2012 École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland