XML encoding problem.

2 replies
David Brown
Joined: 2009-04-26

The following code:

val node = 
val str = scala.xml.Utility.toXML(node)
scala.xml.XML.loadString(str)

Causes an exception:

[Fatal Error] :1:7: An invalid XML character (Unicode: 0x1) was found in the element content of the document.
org.xml.sax.SAXParseException: An invalid XML character (Unicode: 0x1) was found in the element content of the document.

I guess my first question: is there a better way to encode the XML
into a string?

Should the string encoding in Utility be fixed to use a char ref? The
set of valid characters seems fairly complicated from what I can tell,
but perhaps some simple test is available.

Thanks,
David

Alex Cruise
Joined: 2008-12-17
Re: XML encoding problem.

David Brown wrote:
> val node = 
> val str = scala.xml.Utility.toXML(node)
> scala.xml.XML.loadString(str)
>
> Causes an exception:
>
> [Fatal Error] :1:7: An invalid XML character (Unicode: 0x1) was
> found in the element content of the document.
> org.xml.sax.SAXParseException: An invalid XML character (Unicode:
> 0x1) was found in the element content of the document.
>
> Should the string encoding in Utility be fixed to use a char ref? The
> set of valid characters seems fairly complicated from what I can tell,
> but perhaps some simple test is available.
XML 1.0 doesn't permit most control characters at all, either as
literals or entities: http://www.w3.org/TR/xml/#charsets

XML 1.1 is less restrictive in this area but I wouldn't bet on your
document living for long in the wild. :)
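
As a rough illustration of the "simple test" asked about above, the XML 1.0 Char production from the linked spec section can be checked one code point at a time. This is only a sketch: the object and method names (XmlChars, isValidXml10Char, isValidXml10Text) are hypothetical and are not part of scala.xml.

```scala
// Sketch of a validity check against the XML 1.0 Char production:
//   Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
// XmlChars and its methods are illustrative names, not a scala.xml API.
object XmlChars {
  // True if the code point may appear in an XML 1.0 document.
  def isValidXml10Char(cp: Int): Boolean =
    cp == 0x9 || cp == 0xA || cp == 0xD ||
      (cp >= 0x20 && cp <= 0xD7FF) ||
      (cp >= 0xE000 && cp <= 0xFFFD) ||
      (cp >= 0x10000 && cp <= 0x10FFFF)

  // Walks the string by code point so that surrogate pairs count as one
  // character; an unpaired surrogate falls in 0xD800-0xDFFF and is rejected.
  def isValidXml10Text(s: String): Boolean = {
    var i = 0
    while (i < s.length) {
      val cp = s.codePointAt(i)
      if (!isValidXml10Char(cp)) return false
      i += Character.charCount(cp)
    }
    true
  }
}
```

A string that fails this check cannot be written into an XML 1.0 document, even as a character reference, so it has to be transformed in some other way before serializing.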

-0xe1a

David Brown
Joined: 2009-04-26
Re: XML encoding problem.

On Sun, May 24, 2009 at 11:38:01PM -0700, Alex Cruise wrote:

>> Should the string encoding in Utility be fixed to use a char ref? The
>> set of valid characters seems fairly complicated from what I can tell,
>> but perhaps some simple test is available.
> XML 1.0 doesn't permit most control characters at all, either as literals
> or entities: http://www.w3.org/TR/xml/#charsets
>
> XML 1.1 is less restrictive in this area but I wouldn't bet on your
> document living for long in the wild. :)

Thanks,

I've changed my XML to allow these blocks to be encoded in base64.
I scan each string for control characters and, if it contains
any, encode the whole string.
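
The approach described here might look roughly like the following. This is a sketch under assumptions, not the code actually used: the names SafeText, Encoded, encode and decode are made up for illustration, java.util.Base64 (Java 8 and later) stands in for whatever base64 routine was available at the time, and how the "is base64" flag is stored in the XML (for example as an attribute) is left open.

```scala
import java.nio.charset.StandardCharsets.UTF_8
import java.util.Base64

// Hypothetical helper: base64-encode a string only when it contains
// control characters that XML 1.0 cannot carry.
object SafeText {
  // Records whether the payload was base64-encoded, so a reader can undo it.
  final case class Encoded(base64: Boolean, payload: String)

  // Control characters below 0x20 other than tab, LF and CR are not
  // allowed in XML 1.0 content.
  private def needsEncoding(s: String): Boolean =
    s.exists(c => c < 0x20 && c != '\t' && c != '\n' && c != '\r')

  // Encode the whole string if any offending character is present.
  def encode(s: String): Encoded =
    if (needsEncoding(s))
      Encoded(true, Base64.getEncoder.encodeToString(s.getBytes(UTF_8)))
    else
      Encoded(false, s)

  // Reverse of encode.
  def decode(e: Encoded): String =
    if (e.base64) new String(Base64.getDecoder.decode(e.payload), UTF_8)
    else e.payload
}
```

Encoding the whole string rather than individual characters keeps the reader simple: it only has to look at one flag per block to know whether to decode.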

David

Copyright © 2012 École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland