Looking for comments/suggestions on newbie code

3 replies
Kenneth McDonald
Joined: 2009-01-11

As a learning exercise, I'm putting together a little library that
allows one to easily build regular expressions, piece by piece. It
also allows one to name groups as the regular expression is defined,
and to extract from a match based on group name. It's very incomplete,
and as yet has no testing code, but I thought I would put it out
because:

1) It might be useful to somebody.
2) I'm looking for comments on improving my Scala coding.

Please don't hesitate to be critical; I won't be offended.

Thanks,
Ken

Code below. Look in the "Main" object for an example of how things are
used. Sorry for the lack of commenting otherwise, but the mapping
between functions and the regexes they build is fairly straightforward.

----------------------------------------
package rex

import scala.util.matching.Regex

protected object Matcher {
  /** Wrap a matcher's pattern in a non-capturing group so it can be combined safely. */
  def anonGroup(m: Matcher) = "(?:" + m.pattern + ")"
}

/** A class to make it easier to build and use regular expressions */
protected class Matcher(string: String) {
  val regex = new Regex(string)
  //---------------------------
  def nameToGroupNumber = Map[String, Int]()
  def groupCount = 0

  def pattern = regex.pattern.pattern()

  // Concatenation and alternation; a String operand is treated as a literal.
  def +(other: Matcher) = new BinopMatcher(this, "", other)
  def +(other: String): Matcher = this + Lit(other)

  def |(other: Matcher) = new BinopMatcher(this, "|", other)
  def |(other: String): Matcher = this | Lit(other)

  // Repetition: * is greedy, *? is reluctant, *+ is possessive.
  // An Int means "at least count times"; a pair means "between count._1 and count._2 times".
  // Note: the result is a plain Matcher, so named groups inside this pattern are not carried over.
  def *(count: Int) = new Matcher(anonGroup + "{" + count + ",}")
  def *?(count: Int) = new Matcher(anonGroup + "{" + count + ",}?")
  def *+(count: Int) = new Matcher(anonGroup + "{" + count + ",}+")
  def *(count: (Int, Int)) = new Matcher(anonGroup + "{" + count._1 + "," + count._2 + "}")
  def *?(count: (Int, Int)) = new Matcher(anonGroup + "{" + count._1 + "," + count._2 + "}?")
  def *+(count: (Int, Int)) = new Matcher(anonGroup + "{" + count._1 + "," + count._2 + "}+")

  /** Make this pattern into a named group; the contents of the group,
      from a match, will be retrieved by its name, not by a number. */
  def group(name: String) = new GroupMatcher(this, name)

  def replaceAllIn(target: String, replacement: String) =
    regex.replaceAllIn(target, replacement)

  def findFirst(target: String): Option[MatchResult] =
    regex.findFirstMatchIn(target).map(m => new MatchResult(m, this))

  private def anonGroup = Matcher.anonGroup(this)

  def iterMatches(target: String) =
    new MatchResultIterator(regex.findAllIn(target).matchData, this)
}

protected class BinopMatcher(val m1: Matcher, op: String, val m2: Matcher)
    extends Matcher(Matcher.anonGroup(m1) + op + Matcher.anonGroup(m2)) {

  /** Number of non-anonymous (counting) groups in this regular expression. */
  override def groupCount = m1.groupCount + m2.groupCount

  /** Given a group name, return its corresponding group number.
      Groups from m2 are shifted past those of m1.

      @see Matcher.group(name)
  */
  override def nameToGroupNumber = m1.nameToGroupNumber ++
    m2.nameToGroupNumber.map(x => (x._1, x._2 + m1.groupCount))
}

protected class GroupMatcher(pat: Matcher, val name: String)
    extends Matcher("(" + pat.pattern + ")") {
  // The new capturing group is number 1; groups inside pat shift up by one.
  override val nameToGroupNumber: Map[String, Int] = Map(name -> 1) ++
    pat.nameToGroupNumber.map(x => (x._1, x._2 + 1))

  override def groupCount = 1 + pat.groupCount
}

case class Lit(lit: String) extends Matcher(java.util.regex.Pattern.quote(lit))

protected case class RawCharSet(in: String, notIn: String)
    extends Matcher("[" + in + (if (notIn == "") "" else "&&[^" + notIn + "]") + "]")

case class CharSet(set: String) extends Matcher("[" + set + "]") {
  /*def -(notin:String) = {
    this(set + "&&[^" + notin + "]")
    require(notin != "")
  }*/
}

case class CharRange(start: Char, end: Char) extends Matcher("[" + start + "-" + end + "]")

class MatchResult(val m: Regex.Match, val matcher: Matcher) {

  /** Retrieve a group by its name */
  def group(name: String) = m.group(matcher.nameToGroupNumber(name))
}

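class MatchResultIterator(val matches: Iterator[Regex.Match], val matcher: Matcher)
    extends Iterator[MatchResult] {
  def hasNext = matches.hasNext
  // Wrap each raw match so that groups can be looked up by name.
  def next() = new MatchResult(matches.next(), matcher)
}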

object Main {
  def main(args: Array[String]): Unit = {
    // A positive integer is one or more digits
    val posIntMatcher = CharSet("0-9")*1
    // An integer is a positive int with an optional "-" in front of it
    val intMatcher = Lit("-")*(0,1) + posIntMatcher
    // A float is an int followed by a decimal point followed by a positive int.
    val floatMatcher = intMatcher + (Lit(".") + posIntMatcher)*(1,1)
    // A complex is a float followed by a + or - followed by a float,
    // followed by an "i".
    // The two numeric parts and the sign are named for access.
    val complexMatcher = floatMatcher.group("re") +
      (Lit("-") | Lit("+")).group("sign") + floatMatcher.group("im") + "i"

    // Print out the pattern. Look at all those ?:'s!
    println(complexMatcher.pattern)

    /* Match against a floating-point complex number and print the
       result. */
    println(complexMatcher.findFirst("3.2+4.5i") match {
      case None => None
      case Some(m) => m.group("re") + " " + m.group("sign") + " " +
        m.group("im") + "i"
    })
  }
}
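
For comparison, here is a minimal sketch of the same complex-number match written by hand with java.util.regex named groups. It assumes Java 7 or later, where the (?<name>...) syntax is available. The object name NamedGroupSketch, the method describe, and the exact float sub-pattern are made up for illustration and are not part of the library above.

```scala
import java.util.regex.Pattern

object NamedGroupSketch {
  // Hand-written float sub-pattern: optional "-", digits, ".", digits.
  // This is an assumption about what the builder code is meant to express.
  private val float = """-?[0-9]+\.[0-9]+"""

  // Named groups "re", "sign" and "im" play the role of .group("...") above.
  val complex: Pattern =
    Pattern.compile("(?<re>" + float + ")(?<sign>[-+])(?<im>" + float + ")i")

  /** Formats the first match as "re sign imi", or returns None if there is no match. */
  def describe(target: String): Option[String] = {
    val m = complex.matcher(target)
    if (m.find()) Some(m.group("re") + " " + m.group("sign") + " " + m.group("im") + "i")
    else None
  }

  def main(args: Array[String]): Unit =
    println(describe("3.2+4.5i"))
}
```

The hand-written pattern is shorter, but the pieces cannot be reused or named independently, which is what the builder approach is trying to make easy.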

Warren Henning
Joined: 2008-12-31
Re: Looking for comments/suggestions on newbie code

This is pretty neat code. Various Scheme implementations have stuff
like this where they kind of embed regexes into the language rather
than just type long strings of line noise. What license is it under?

On Tue, Jan 20, 2009 at 1:20 PM, Kenneth McDonald
wrote:
> As a learning exercise, I'm putting together a little library that allows
> one to easily build regular expressions, piece by piece. It also allows one
> to name groups as the regular expression is defined, and to extract from a
> match based on group name. It's very incomplete, and as yet has no testing
> code, but I thought I would put it out because:

Kenneth McDonald
Joined: 2009-01-11
Re: Looking for comments/suggestions on newbie code
Thanks for the feedback. It's in the public domain, with the single exception
that if you use it for something, give me credit somewhere. (So it's not really
in the public domain after all :-), but nearly so.)

If anyone wants to make contributions, I'll be happy to include them if it's
reasonable to do so. I plan to continue working on this as time permits.

Now we just need more people looking for Scala programmers :-)


Thanks,
Ken

On Jan 20, 2009, at 3:27 PM, Warren Henning wrote:

> This is pretty neat code. Various Scheme implementations have stuff
> like this where they kind of embed regexes into the language rather
> than just type long strings of line noise. What license is it under?
>
> On Tue, Jan 20, 2009 at 1:20 PM, Kenneth McDonald
> <kenneth [dot] m [dot] mcdonald [at] sbcglobal [dot] net> wrote:
> > As a learning exercise, I'm putting together a little library that allows
> > one to easily build regular expressions, piece by piece. It also allows one
> > to name groups as the regular expression is defined, and to extract from a
> > match based on group name. It's very incomplete, and as yet has no testing
> > code, but I thought I would put it out because:

Viktor Klang
Joined: 2008-12-17
Re: Looking for comments/suggestions on newbie code
While I like the idea of a more "understandable" regex syntax, I'd so looooove a program that takes a regex-string and converts it to English.

'^str$' = A line of text which only includes 'str'

Imagine the endless possibilities of utter lols
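
A toy sketch of what such a translator might look like, assuming it only has to handle plain literal text with an optional leading ^ and trailing $. The object name RegexToEnglish and the wording of the descriptions are invented here; anything beyond that tiny subset is simply reported as unsupported.

```scala
object RegexToEnglish {
  // Characters with special regex meaning that this toy translator does not handle.
  private val unsupported: Set[Char] = "\\.[]{}()*+?|^$".toSet

  /** Describes a regex of the form literal, ^literal, literal$ or ^literal$.
    * Returns None for anything outside that subset. */
  def describe(regex: String): Option[String] = {
    val anchoredStart = regex.startsWith("^")
    val afterStart = if (anchoredStart) regex.drop(1) else regex
    val anchoredEnd = afterStart.endsWith("$")
    val body = if (anchoredEnd) afterStart.dropRight(1) else afterStart

    if (body.isEmpty || body.exists(c => unsupported.contains(c))) None
    else Some((anchoredStart, anchoredEnd) match {
      case (true, true)   => "A line of text which only includes '" + body + "'"
      case (true, false)  => "Text which starts with '" + body + "'"
      case (false, true)  => "Text which ends with '" + body + "'"
      case (false, false) => "Text which contains '" + body + "'"
    })
  }
}
```

With this sketch, RegexToEnglish.describe("^str$") gives the sentence from the example above, and a pattern such as "a.*b" comes back as None.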

Cheers,
Viktor

On Tue, Jan 20, 2009 at 10:27 PM, Warren Henning <warren [dot] henning [at] gmail [dot] com> wrote:
> This is pretty neat code. Various Scheme implementations have stuff
> like this where they kind of embed regexes into the language rather
> than just type long strings of line noise. What license is it under?
>
> On Tue, Jan 20, 2009 at 1:20 PM, Kenneth McDonald
> <kenneth [dot] m [dot] mcdonald [at] sbcglobal [dot] net> wrote:
> > As a learning exercise, I'm putting together a little library that allows
> > one to easily build regular expressions, piece by piece. It also allows one
> > to name groups as the regular expression is defined, and to extract from a
> > match based on group name. It's very incomplete, and as yet has no testing
> > code, but I thought I would put it out because:



--
Viktor Klang
Senior Systems Analyst

Copyright © 2012 École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland