Enhancement Proposal: Regex Replace

dcsobral
For a long while I have been considering ways in which Scala's Regex is non-intuitive, and ways in which it could be improved. While I have many thoughts in this regard, I'd like to discuss one specific point: replacing matches.
Right now, there are two ways to do it from a Regex object: replaceAllIn and replaceFirstIn. Both take two parameters: the string in which to replace stuff, and the string that will replace the match.
Scala can do much better than that, however. If we overload these methods to accept a function or partial function, we greatly enhance their usefulness. I can think of the following possibilities for the type signatures of the new methods:
1. (String, String => String) => String
2. (String, String => Option[String]) => String
3. (String, PartialFunction[String, String]) => String
4. (String, Match => String) => String
5. (String, Match => Option[String]) => String
6. (String, PartialFunction[Match, String]) => String
The advantage of receiving a PartialFunction is that one can then simply do regex.replaceAllIn(string, map), and have it do the right thing. The main disadvantage is that one can't, then, pass a Function to it. And if both function and partial function versions are defined, we lose type inference for function literals.
The options 4 through 6 are just the same as the options 1 through 3, but passing a Match instead of a String, which would enable one to extract more information from the match. One can't mix them because the erasure is the same, but there can be replaceAllMatchesIn/replaceFirstMatchIn versions providing them.
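As a rough illustration of signature 3 and the regex.replaceAllIn(string, map) use case, here is a minimal sketch. It assumes a later Scala version in which replaceAllIn accepts a Match => String function; the helper replaceAllWith and the sample data are hypothetical names invented for this example, not part of the proposal. Matches for which the partial function is not defined are left unchanged.

```scala
import scala.util.matching.Regex

object ReplaceSketch {
  // Hypothetical helper with the shape of signature 3:
  // matches on which `pf` is not defined are kept as they are.
  def replaceAllWith(regex: Regex, target: String)(pf: PartialFunction[String, String]): String =
    regex.replaceAllIn(target, (m: Regex.Match) =>
      // quoteReplacement stops '$' and '\' in the result from being read as group references
      Regex.quoteReplacement(pf.applyOrElse(m.matched, (s: String) => s)))

  val word: Regex = """\w+""".r

  // A Map is a PartialFunction, so it can be passed directly.
  val abbreviations: Map[String, String] =
    Map("btw" -> "by the way", "imo" -> "in my opinion")

  val expanded: String = replaceAllWith(word, "btw this works imo")(abbreviations)
}
```

Putting the partial function in its own parameter list is just one way to keep the call site readable; it also sidesteps the overloading question raised above.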
I'd much like to know what people think of these methods, what would be the better choice for type signature, etc.
There's one more method I'd like to add -- in fact, I even went so far as to open an enhancement ticket for it (https://lampsvn.epfl.ch/trac/scala/ticket/2761), which is adding a "replace" method to Match. Simply put, if I got a Match object, from either a findFirstMatchIn or a findAllIn(...).matchData, and I call a "replace(string)" on it, it returns the original string with that match replaced. If used together with a MatchIterator, each call would return the string with all the replacements made so far, plus the new one.
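The single-match case of that "replace" idea could be sketched as below. replaceMatch is a hypothetical stand-alone function standing in for the proposed method; it relies only on the source, start and end members of Match, and it does not attempt the cumulative MatchIterator behaviour described above.

```scala
import scala.util.matching.Regex
import scala.util.matching.Regex.Match

object MatchReplaceSketch {
  // Hypothetical stand-in for the proposed Match.replace:
  // rebuilds the source string with only this match swapped out.
  def replaceMatch(m: Match, replacement: String): String = {
    val src = m.source.toString
    src.substring(0, m.start) + replacement + src.substring(m.end)
  }

  val digits: Regex = """\d+""".r

  // Replace the first run of digits, if there is one.
  val result: Option[String] =
    digits.findFirstMatchIn("order 42 shipped").map(m => replaceMatch(m, "#"))
}
```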
I think these methods would greatly enhance the usability of Scala's Regex library for substitution. 
--
Daniel C. Sobral

I travel to the future all the time.
Justin du coeur
Re: Enhancement Proposal: Regex Replace
I haven't used Regex much yet, so this is mostly a general opinion, but:
On Fri, Dec 4, 2009 at 6:15 AM, Daniel Sobral <dcsobral [at] gmail [dot] com> wrote:
> Right now, there are two ways to do it from a Regex object: replaceAllIn and replaceFirstIn. Both take two parameters: the string in which to replace stuff, and the string that will replace the match.
> Scala can do much better than that, however. If we overload these methods to accept a function or partial function, we greatly enhance their usefulness.

Hear, hear.  In the languages that support it, I've often found the ability to pass a function into regex-replacement *enormously* useful -- it tends to produce far cleaner code for a bunch of complex scenarios...
Kris Nuttycombe
Re: Enhancement Proposal: Regex Replace

On Fri, Dec 4, 2009 at 4:15 AM, Daniel Sobral wrote:
> For a long while I have been considering ways in which Scala's Regex is
> non-intuitive, and ways in which it could be improved. While I have many
> thoughts in this regard, I'd like to discuss one specific point: replacing
> matches.
> Right now, there are two ways to do it from a Regex object: replaceAllIn and
> replaceFirstIn. Both take two parameters: the string in which to replace
> stuff, and the string that will replace the match.
> Scala can do much better than that, however. If we overload these methods to
> accept a function or partial function, we greatly enhance their usefulness.
> I can think of the following possibilities for the type signatures of the
> new methods:
> 1. (String, String => String) => String
> 2. (String, String => Option[String]) => String
> 3. (String, PartialFunction[String, String]) => String
> 4. (String, Match => String) => String
> 5. (String, Match => Option[String]) => String
> 6. (String, PartialFunction[Match, String]) => String
> The advantage of receiving a PartialFunction is that one can then simply do
> regex.replaceAllIn(string, map), and have it do the right thing. The main
> disadvantage is that one can't, then, pass a Function to it. And if both
> function and partial function versions are defined, we lose type inference
> for function literals.
> The options 4 through 6 are just the same as the options 1 through 3, but
> passing a Match instead of a String, which would enable one to extract more
> information from the match. One can't mix them because the erasure is the
> same, but there can be replaceAllMatchesIn/replaceFirstMatchIn versions
> providing them.
> I'd much like to know what people think of these methods, what would be the
> better choice for type signature, etc.
> There's one more method I'd like to add -- in fact, I even went so far as
> to open an enhancement ticket for it
> (https://lampsvn.epfl.ch/trac/scala/ticket/2761), which is adding a
> "replace" method to Match. Simply put, if I got a Match object, from either
> a findFirstMatchIn or a findAllIn(...).matchData, and I call a
> "replace(string)" on it, it returns the original string with that match
> replaced. If used together with a MatchIterator, each call would return the
> string with all the replacements made so far, plus the new one.
> I think these methods would greatly enhance the usability of Scala's Regex
> library for substitution.

This suggestion makes a lot of sense to me, and the solution would be
obvious if Function <: PartialFunction instead of the other way
around. As it stands currently, maybe use signature 5 and then provide
an implicit conversion in predef (or wherever) PartialFunction[A,B] =>
Function1[A, Option[B]]?

Kris
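The conversion suggested here might look roughly like the sketch below. It builds on PartialFunction.lift, which is a real standard-library method; liftPartial, applyOrKeep and shout are hypothetical names used only for illustration, and nothing here is claimed to be what Predef actually provides.

```scala
import scala.language.implicitConversions

object LiftConversionSketch {
  // Hypothetical implicit conversion: PartialFunction[A, B] => (A => Option[B])
  implicit def liftPartial[A, B](pf: PartialFunction[A, B]): A => Option[B] = pf.lift

  // Hypothetical consumer with the "Option" shape of signatures 2 and 5:
  // None means "keep the input as it is".
  def applyOrKeep(s: String, f: String => Option[String]): String =
    f(s).getOrElse(s)

  val shout: PartialFunction[String, String] = { case "hi" => "HI" }

  val defined: String = applyOrKeep("hi", shout)   // conversion applied, pf is defined here
  val undefined: String = applyOrKeep("yo", shout) // pf not defined, input kept
}
```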

dcsobral
Re: Enhancement Proposal: Regex Replace
Ok, reviving this thread: in light of a recent commit that brought us options 1 and 4, I'd like to consider a name for option 5. Some thoughts, all receiving a function Match => Option[String] (which partial functions can now be lifted to, btw):
  replaceSelectively
  replaceOption
  replaceSome
  replaceWhere
  replaceIf   // yuck! doesn't match at all with Option
  replaceMap  // yuck 2!
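For reference, the standard library later gained a method of this shape under the name replaceSomeIn. The sketch below assumes a Scala version that has it; the regex, the partial function and the sample string are made up for illustration. A None result leaves the match untouched.

```scala
import scala.util.matching.Regex
import scala.util.matching.Regex.Match

object ReplaceSomeSketch {
  val word: Regex = """\w+""".r

  // Partial function over Match: only defined for words longer than three letters.
  val upcaseLong: PartialFunction[Match, String] = {
    case m if m.matched.length > 3 => m.matched.toUpperCase
  }

  // lift turns it into Match => Option[String], the shape of option 5.
  val result: String = word.replaceSomeIn("one three five seven", upcaseLong.lift)
}
```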

On Fri, Dec 4, 2009 at 3:09 PM, Kris Nuttycombe <kris [dot] nuttycombe [at] gmail [dot] com> wrote:
> This suggestion makes a lot of sense to me, and the solution would be
> obvious if Function <: PartialFunction instead of the other way
> around. As it stands currently, maybe use signature 5 and then provide
> an implicit conversion in predef (or wherever) PartialFunction[A,B] =>
> Function1[A, Option[B]]?
>
> Kris



--
Daniel C. Sobral

I travel to the future all the time.
