
[scala-bts] #3286: scala.xml.PrettyPrinter changes attribute values by removing multiple whitespace

3 replies
Scala 2
Joined: 2009-03-05,

#3286: scala.xml.PrettyPrinter changes attribute values by removing multiple whitespace
-------------------------------------------+--------------------------------
Reporter:  nikolaj                         |       Owner:  scala-xml_team
    Type:  defect                          |      Status:  new
Priority:  normal                          |   Component:  XML support
Keywords:  PrettyPrinter, xml, whitespace  |
-------------------------------------------+--------------------------------
{{{scala.xml.PrettyPrinter}}} seems to change the values of attributes in
some instances, by replacing repeated white space. Not always, though.

Notice in the example below how
{{{
<babba orth="B    A"/>
}}}
turns into
{{{
<babba orth="B A"></babba>
}}}
after {{{PrettyPrinting}}}:
{{{
Welcome to Scala version 2.8.0.Beta1-prerelease (Java HotSpot(TM) Client
VM, Java 1.6.0_16).
Type in expressions to have them evaluated.
Type :help for more information.

scala> <abba orth="A    B"><babba orth="B    A"/></abba>
res0: scala.xml.Elem = <abba orth="A    B"><babba orth="B    A"></babba></abba>

scala> new xml.PrettyPrinter(200, 2)
res1: scala.xml.PrettyPrinter = scala.xml.PrettyPrinter@1886a34

scala> res1.format(res0)
res2: String =
<abba orth="A    B">
  <babba orth="B A"></babba>
</abba>
}}}

Crazy width and indentation nukes the multiple whitespaces in the
attributes of both nodes:
{{{
scala> new xml.PrettyPrinter(2, 20)
res8: scala.xml.PrettyPrinter = scala.xml.PrettyPrinter@1f0f0c8

scala> res8.format(res0)
res9: String =
<abba orth="A B"><babba orth="B A"></babba></abba>
}}}
We ran into this problem when checking whether some XML attributes were
identical to the original input.

I guess you should be able to trust {{{Pretty Mr Printer}}} not to change
the values of any attributes?

Since I'm far from an expert in XML, I might be wrong about what is the
correct way of treating whitespace inside attribute values. Sorry if this
already works according to the XML specs.

Kind regards,

/nikolaj lindberg

Anthony B. Coates
Joined: 2009-09-12,
Re: [scala-bts] #3286: scala.xml.PrettyPrinter changes attribut

The XML Information Set spec (http://www.w3.org/TR/xml-infoset/, Appendix
B) says

4. An XML processor must normalize the value of attributes according to
the rules in clause 3.3.3 before passing them to the application.

As such, it is certainly a mistake for an application to rely on
whitespace in attributes being preserved. If you need whitespace to be
preserved, use element content, not attribute content.

The question of whether a PrettyPrinter should perform this normalization
of whitespace could go either way, depending on whether you consider it to
be a "processor" or not. My personal view is that normalizing attribute
content is appropriate, since (in my view) one thing a PrettyPrinter
should allow you to do is format an XML document consistently so that you
can compare it with a document with the same or similar content. Since
applications should never rely on attribute whitespace being preserved, it
is fair then for a PrettyPrinter to normalize the attribute content to
facilitate comparison.
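
For reference, the whitespace handling in clause 3.3.3 of the XML 1.0 spec can be sketched in Scala roughly as below. This is only an illustration under simplifying assumptions: the names `AttrWhitespace`, `normalize` and `isCdata` are made up here, expansion of entity and character references is left out, and it is not a description of how scala.xml or PrettyPrinter is implemented.

```scala
object AttrWhitespace {
  // XML 1.0 white space characters: space, tab, line feed, carriage return.
  private def isXmlSpace(c: Char): Boolean =
    c == ' ' || c == '\t' || c == '\n' || c == '\r'

  /** Whitespace part of attribute-value normalization (XML 1.0, clause 3.3.3).
    * Hypothetical helper: reference expansion is deliberately not modelled.
    */
  def normalize(value: String, isCdata: Boolean): String = {
    // For every attribute type: each white space character becomes one space (#x20).
    val spaced = value.map(c => if (isXmlSpace(c)) ' ' else c)
    if (isCdata) spaced
    else
      // Non-CDATA types only: drop leading/trailing spaces and collapse runs.
      spaced.split(' ').filter(_.nonEmpty).mkString(" ")
  }
}
```

Under this reading, an attribute such as orth="B    A" keeps its four spaces when it is treated as CDATA, which is how XML 1.0 says a non-validating processor should treat attributes that have no declaration. The collapse to "B A" corresponds to the non-CDATA branch.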

Cheers, Tony.


nikolaj
Joined: 2008-11-11,
Re: [scala-bts] #3286: scala.xml.PrettyPrinter changes attribu
Tony,

thanks for the clarification. (I did take a quick look at the spec you refer to, but I was not sure whether it was relevant in this case or not. That is, I didn't quite get it...)

Thanks,
/nikolaj



On Tue, 13 Apr 2010 17:21:34 +0100, Scala <scala-devel [at] epfl [dot] ch> wrote:

#3286: scala.xml.PrettyPrinter changes attribute values by removing multiple
whitespace
-------------------------------------------+--------------------------------
Reporter:  nikolaj                         |       Owner:  scala-xml_team
   Type:  defect                          |      Status:  new
Priority:  normal                          |   Component:  XML support
Keywords:  PrettyPrinter, xml, whitespace  |
-------------------------------------------+--------------------------------
 {{{scala.xml.PrettyPrinter}}} seems to change the values of attributes in
 some instances, by replacing repeated white space. Not always, though.

 Notice in the example below how
 {{{
 <babba orth="B    A"/>
 }}}
 turns into
 {{{
 <babba orth="B A"></babba>
 }}}
 after {{{PrettyPrinting}}}:
 {{{
 Welcome to Scala version 2.8.0.Beta1-prerelease (Java HotSpot(TM) Client
 VM, Java 1.6.0_16).
 Type in expressions to have them evaluated.
 Type :help for more information.

 scala> <abba orth="A    B"><babba orth="B    A"/></abba>
 res0: scala.xml.Elem = <abba orth="A    B"><babba orth="B
 A"></babba></abba>

 scala> new xml.PrettyPrinter(200, 2)
 res1: scala.xml.PrettyPrinter = scala.xml.PrettyPrinter@1886a34

 scala> res1.format(res0)
 res2: String =
 <abba orth="A    B">
  <babba orth="B A"></babba>
 </abba>
 }}}


 Crazy width and indentation nukes the multiple whitespaces in the
 attributes of both nodes:
 {{{
 scala> new xml.PrettyPrinter(2, 20)
 res8: scala.xml.PrettyPrinter = scala.xml.PrettyPrinter@1f0f0c8

 scala> res8.format(res0)
 res9: String =
 <abba orth="A B"><babba orth="B A"></babba></abba>
 }}}
 We ran into this problem when checking whether some XML attributes were
 identical to the original input.

 I guess you should be able to trust {{{Pretty Mr Printer}}} not to change
 the values of any attributes?

 Since I'm far from an expert in XML, I might be wrong about what is the
 correct way of treating whitespace inside attribute values.  Sorry if this
 already works according to the XML specs.

 Kind regards,

 /nikolaj lindberg



nikolaj
Joined: 2008-11-11,
Re: [scala-bts] #3286: scala.xml.PrettyPrinter changes attribu


On Wed, Apr 14, 2010 at 9:39 PM, Anthony B. Coates (Londata) <abcoates [at] londata [dot] com> wrote:
The XML Information Set spec (http://www.w3.org/TR/xml-infoset/, Appendix B) says

4. An XML processor must normalize the value of attributes according to the rules in clause 3.3.3 before passing them to the application.

As such, it is certainly a mistake for an application to rely on whitespace in attributes being preserved.  If you need whitespace to be preserved, use element content, not attribute content.



PS: it seems that PrettyPrinter, at least by default, also normalizes whitespace in element content:

scala> <a b=" I    love     space  "><c d="   Me      to    ">    It's     lonely        here       </c></a>
res0: scala.xml.Elem = <a b=" I    love     space  "><c d="   Me      to    ">    It's     lonely        here       </c></a>

scala> new xml.PrettyPrinter(200, 2)
res1: scala.xml.PrettyPrinter = scala.xml.PrettyPrinter@197d09f

scala> res1.format(res0)
res2: String =
<a b=" I    love     space  ">
  <c d=" Me to "> It's lonely here </c>
</a>

The lesson (for me) is to stay away from PrettyPrinter.
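
The collapsed values in the session above are consistent with each run of whitespace being replaced by a single space, without trimming. A minimal sketch of that transformation follows, together with a comparison that ignores run length. The names `WhitespaceRuns`, `collapse` and `sameModuloRuns` are hypothetical, and this is not PrettyPrinter's actual code; in particular it does not model when PrettyPrinter decides to collapse and when it leaves a value alone.

```scala
object WhitespaceRuns {
  // Replace every run of whitespace with one space; nothing is trimmed.
  def collapse(s: String): String = s.replaceAll("\\s+", " ")

  // Compare two values while ignoring differences in the length of whitespace runs.
  def sameModuloRuns(a: String, b: String): Boolean =
    collapse(a) == collapse(b)
}
```

A check like `WhitespaceRuns.sameModuloRuns(original, printed)` would treat "B    A" and "B A" as equal, which may be enough when the goal is to compare attribute values before and after formatting.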

Kind regards,
/nikolaj




 



Copyright © 2012 École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland