
Re: Re: XML attribute position

13 replies
normen.mueller
Joined: 2008-10-31,

On Jan 12, 2010, at 6:06 PM, brian [at] blumenfeld-maso [dot] com wrote:

> Can't you use XML canonicalization of docs -- I believe canonicalized version of an XML doc does have attribute order guarantees.
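
A minimal sketch of the canonicalization idea, written in Python rather than Scala so that it runs with only a standard library (Python 3.8 or later). The two `target` fragments are hypothetical. It uses `xml.etree.ElementTree.canonicalize`, which produces the C14N 2.0 form, so it illustrates the guarantee in general and says nothing about what scala.xml itself provides.

```python
import xml.etree.ElementTree as ET

# Two hypothetical fragments that differ only in attribute order.
first = '<target name="build" depends="init"/>'
second = '<target depends="init" name="build"/>'

# canonicalize() returns the canonical form as a string when no output
# stream is given; attributes are written in a fixed, sorted order.
canonical_first = ET.canonicalize(first)
canonical_second = ET.canonicalize(second)

print(canonical_first)
print(canonical_first == canonical_second)
```

Note that the guaranteed order is a sorted order, not the order found in the source document.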

I'll investigate that, but up to now I have just changed my code to use attribute names rather than the position.
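
Looking attributes up by name can be sketched as follows. This is an illustration in Python's standard library, not the code referred to above; the element and the helper `target_summary` are hypothetical.

```python
import xml.etree.ElementTree as ET

def target_summary(xml_text):
    """Return (name, depends), looked up by attribute name, never by index."""
    element = ET.fromstring(xml_text)
    return element.get("name"), element.get("depends")

# The same element with its attributes written in two different orders.
summary_a = target_summary('<target name="build" depends="init"/>')
summary_b = target_summary('<target depends="init" name="build"/>')

print(summary_a == summary_b)
```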

Thank you all very much for your help!

Cheers,
--
Normen Müller

Anthony B. Coates
Joined: 2009-09-12,
Re: Re: XML attribute position

I do take your point, Normen, that the ability to get the attributes in
original document order is valuable if you want to read in an XML file,
make a minor change, and write it out such that a text diff of before and
after is as minimal as possible.
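
As a rough sketch of that read, change, write cycle, here is how it looks with a parser that does keep document order. It uses Python's `xml.etree.ElementTree` (Python 3.8 or later) as a stand-in, and the `project` document is hypothetical. Serialization details such as the spacing of empty tags may still differ from the input, so this shows order and comments surviving, not a byte-for-byte copy.

```python
import xml.etree.ElementTree as ET

source = (
    '<project default="build" name="demo">'
    '<!-- keep me -->'
    '<target name="build" depends="init"/>'
    '</project>'
)

# insert_comments=True keeps comments in the tree; the default parser drops them.
parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
root = ET.fromstring(source, parser=parser)

# The minor change: update one attribute value in place.
root.find("target").set("depends", "init,compile")

result = ET.tostring(root, encoding="unicode")
print(result)
```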

Cheers, Tony.

On Tue, 12 Jan 2010 17:24:18 -0000, Normen Müller
wrote:

> On Jan 12, 2010, at 6:06 PM, brian [at] blumenfeld-maso [dot] com wrote:
>
>> Can't you use XML canonicalization of docs -- I believe canonicalized
>> version of an XML doc does have attribute order guarantees.
>
> I'll investigate that, but up to now I have just changed my code to
> use attribute names rather than the position.
>
> Thank you all very much for your help!
>
> Cheers,
> --
> Normen Müller

extempore
Joined: 2008-12-17,
Re: Re: XML attribute position

On Tue, Jan 12, 2010 at 06:04:57PM -0000, Anthony B. Coates (Londata) wrote:
> I do take your point, Normen, that the ability to get the attributes
> in original document order is valuable if you want to read in an XML
> file, make a minor change, and write it out such that a text diff of
> before and after is as minimal as possible.

That right there is exactly what I was talking about. At some point I
resolved to get scala trunk off of ant, and spent a while writing
software to read in build.xml, distill the meaningful logic into another
format, and drop 96% of the weight. Enclosed below is what I had to
resort to in order to see the meaningful differences between the input
xml and my regeneration of it from the model. This was necessary
because of attribute reordering and also because I never found a way to
keep it from dropping the comments.
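
The kind of comparison described here can be approximated by normalizing both sides before diffing. The following is a hedged sketch in Python's standard library, not the script referred to above: both documents are hypothetical, and splitting on `><` is a crude way to get one tag per line that would misfire on attribute values containing that sequence.

```python
import difflib
import xml.etree.ElementTree as ET

original = '<project name="demo" default="build"><target name="build"/></project>'
regenerated = '<project default="build" name="demo"><target name="compile"/></project>'

def canonical_lines(xml_text):
    """Canonicalize, then put one tag per line so a line diff is readable."""
    canonical = ET.canonicalize(xml_text)
    return canonical.replace("><", ">\n<").splitlines()

diff = list(difflib.unified_diff(
    canonical_lines(original),
    canonical_lines(regenerated),
    fromfile="original", tofile="regenerated", lineterm="",
))
print("\n".join(diff))
```

Because both inputs go through the same canonical form, the reordered attributes on `project` do not show up in the diff; only the changed `target` name does.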

Maybe you're not supposed to use XML as a config file -- yes I'm sure
you're not -- but people do and we're stuck with it. So wrt:

On Tue, Jan 12, 2010 at 04:12:42PM -0000, Anthony B. Coates (Londata) wrote:
> Let's put that in perspective. It is a *huge mistake*, as well as a
> violation of the XML specification, to process XML in a way that
> assigns any information content to the order of the attributes. You
> should simply process attributes by name, not position.

...I find this to be a very ivory tower thing to say. I never use XML
voluntarily, and yet I have to interact with it on some level pretty
much daily. A significant percentage of those XML files are organized
in a way beyond that dictated by the XML spec. There is no good reason
to make it practically impossible to work with these files with scala's
built-in XML support, nor to drive people to set up freaking ant tasks
which run xmlstarlet to canonicalize their files. (Or to learn all the
fiddly and unthrilling details I had to learn to get to that point.)

Mark Howe
Joined: 2009-10-22,
Re: Re: XML attribute position

On Tuesday 12 January 2010 at 13:26 -0800, Paul Phillips wrote:

> On Tue, Jan 12, 2010 at 06:04:57PM -0000, Anthony B. Coates (Londata) wrote:

> > I do take your point, Normen, that the ability to get the attributes
> > in original document order is valuable if you want to read in an XML
> > file, make a minor change, and write it out such that a text diff of
> > before and after is as minimal as possible.

> That right there is exactly what I was talking about.

Isn't that what perl regexes are for? From

http://sites.google.com/site/burakemir/scalaxbook.docbk.html?attredirects=0

< XML is regarded as text. We ignore the tree structure completely. Some
< text/regular expression search is used to retrieve or manipulate
< information. This can get you quite far for small tasks. Go away, use
< perl :-)

The project I'm working on today has a make file that at one point uses
perl regexes to extract information from an XML config file. It's a long
way from lovely, but there are jobs for which a one-line solution is
good enough.

I can't see that optimising Scala XML to appeal to people who don't want
to use XML at all is the best way to end up with Scala XML support that
people want to use.

That aside, I suspect that using the Xerces library from within Scala
would do what you want - not only does the SAX implementation preserve
attribute order, they are indexed by their position in the element. And
this makes using Xerces from within Scala a total pain AFAIC.
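
To give a feel for position-indexed attributes in an event-based parser, here is a small sketch. It does not use Xerces (a Java library); it swaps in Python's binding to expat, whose `ordered_attributes` setting reports attributes as a flat list in document order. The `target` element is hypothetical.

```python
import xml.parsers.expat

seen = []

def start_element(name, attrs):
    # With ordered_attributes set, attrs is a flat list:
    # [name0, value0, name1, value1, ...] in document order.
    pairs = list(zip(attrs[0::2], attrs[1::2]))
    seen.append((name, pairs))

parser = xml.parsers.expat.ParserCreate()
parser.ordered_attributes = True
parser.StartElementHandler = start_element
parser.Parse('<target name="build" depends="init" if="ready"/>', True)

# Each attribute is reachable by its position within the element.
for name, pairs in seen:
    for position, (attr_name, value) in enumerate(pairs):
        print(name, position, attr_name, value)
```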

extempore
Joined: 2008-12-17,
Re: Re: XML attribute position

On Tue, Jan 12, 2010 at 10:44:10PM +0100, Mark Howe wrote:
> Isn't that what perl regexes are for?

Um, no. I stubbed my toe on your abstraction bar.

> I can't see that optimising Scala XML to appeal to people who don't
> want to use XML at all is the best way to end up with Scala XML
> support that people want to use.

It's hard not to laugh. What is the alternative, to design for people
who are thrilled to use XML? Decorum prohibits me from articulating how
these interfaces might differ.

Mark Howe
Joined: 2009-10-22,
Re: Re: XML attribute position

On Tuesday 12 January 2010 at 14:42 -0800, Paul Phillips wrote:

> What is the alternative, to design for people who are thrilled to use XML?

Yes, or at least to include people who are thrilled about XML.
Otherwise, while we're about it, why not design Scala for people who
only want to use global variables and goto statements? What's the
alternative - design Scala to do OO and functional programming well? How
mad would that be?!

> Decorum prohibits me from articulating how
> these interfaces might differ.

The second would solve a much wider class of problems. Encouraging
people to, say, rely on the order of attributes isn't going to make for
powerful, portable, elegant code.

The next step would be to provide control over the encoding of line
termination, in case that matters to someone, and maybe whitespace
outside the document element, in case someone uses that to format output
of XML via bash, and the kind of quotation marks used for attributes,
and... all of which would make XML processing messier for everyone,
whether or not they care about any of that stuff.

As I said before, I think Xerces from within Scala will do what you
want, so it's possible to do what you want, with Scala, now.

extempore
Joined: 2008-12-17,
Re: Re: XML attribute position

On Wed, Jan 13, 2010 at 10:59:34AM +0100, Mark Howe wrote:
> As I said before, I think Xerces from within Scala will do what you
> want, so it's possible to do what you want, with Scala, now.

It's nice that you're telling the guy who has fixed dozens of bugs in
the scala XML implementation (and the only person to touch it in years)
that he shouldn't be able to use it because it's so important to you
that it arbitrarily reorder attributes. Maybe you could find some other
transformations which are technically within spec which we could insist
on doing every time, like randomly injecting whitespace where possible
and making all long tags short and short tags long. Nobody should be
able to rely on anything! We can teach them all a valuable lesson.

Seth Tisue
Joined: 2008-12-16,
Re: Re: XML attribute position

>>>>> "Mark" == Mark Howe writes:

Mark> The next step would be to provide control over the encoding of
Mark> line termination, in case that matters to someone, and maybe
Mark> whitespace outside the document element, in case someone uses
Mark> that to format output of XML via bash, and the kind of quotation
Mark> marks used for attributes, and...

Pretty soon, we'll be drowning little baby kittens!

Straw man. Nobody proposed any of this stuff. We just want the tag
order preserved.

Mark Howe
Joined: 2009-10-22,
Re: Re: XML attribute position

On Wednesday 13 January 2010 at 04:46 -0800, Paul Phillips wrote:

> It's nice that you're telling the guy who has fixed dozens of bugs in
> the scala XML implementation (and the only person to touch it in years)

Great! (I'm new here...)

> that he shouldn't be able to use it because it's so important to you
> that it arbitrarily reorder attributes.

It doesn't bother me in the slightest whether a future version of Scala
XML reorders attributes or not. My point was simply that there are good
reasons for not *requiring* XML technology to respect order. I doubt
that there's a randomizing routine messing with your attributes. It's
far more likely that the structures used to store attributes don't
preserve order, and that this implementational decision was taken to
make the code neater, faster, use less memory or something.

> Maybe you could find some other
> transformations which are technically within spec which we could insist
> on doing every time, like randomly injecting whitespace where possible
> and making all long tags short and short tags long. Nobody should be
> able to rely on anything! We can teach them all a valuable lesson.

That does sound like a lot of fun. But the problem you are describing
isn't about actively doing the wrong thing, it's about not actively
doing the right thing by a relatively obscure definition of "right", and
my response is about the possible costs of doing the right thing by that
relatively obscure definition of "right".

I agree that being able to reproduce XML documents on a character by
character basis can sometimes be useful. Personally, I'd quite like on
occasions for XML technology to preserve the non-semantic whitespace
within elements so my manual pretty-printing of multiple attributes
survives the journey through the parser. But doing that moves beyond the
logical structure of XML to treating it like a text document, which is
why my suggestion about Perl was not entirely frivolous.

Most of the recent discussion here has been about how to move to a more
W3C-like way of handling XML. I guess there's no reason why preserving
attribute order shouldn't happen at the same time, as long as it doesn't
have a major effect on performance.

What kind of comparison requires you to preserve attribute order but
isn't amenable to a text-based diff-type solution? If you want to
preserve the character-level information except for specific changes,
doesn't that make character-based matching on the unchanged portions of
the document a relatively simple task? The main reason character-based
parsing of XML documents is a bad idea is that you shouldn't rely on
things like attribute order, whitespace and types of quoting, but if you
are keeping all those things fixed the parsing problem surely becomes a
lot simpler.

Mark Howe
Joined: 2009-10-22,
Re: Re: XML attribute position

On Wednesday 13 January 2010 at 07:15 -0600, Seth Tisue wrote:
> >>>>> "Mark" == Mark Howe writes:
>
> Mark> The next step would be to provide control over the encoding of
> Mark> line termination, in case that matters to someone, and maybe
> Mark> whitespace outside the document element, in case someone uses
> Mark> that to format output of XML via bash, and the kind of quotation
> Mark> marks used for attributes, and...
>
> Pretty soon, we'll be drowning little baby kittens!
>
> Straw man. Nobody proposed any of this stuff. We just want the tag
> order preserved.

Tag order or attribute order? If tag order is lost, that does sound like
a problem.

If it's attributes, and if it's a straw man, it's a straw man straight
out of appendix D of the W3C infoset spec. Attribute order is point #10,
immediately after line termination in point #9. Whitespace is points #4
and #5... In other words, they are the same kind of issue.

And, as I just posted, I have no problem with preserving attribute order
as long as it doesn't involve a serious performance hit.

Anthony B. Coates
Joined: 2009-09-12,
Re: XML attribute position

I think there is an important point that is being missed here, and I
regret that I contributed to it. My initial comments about the XML spec
and attribute order were made in advance of knowing what the use case was
for wanting to have a consistent, predictable attribute order after XML
has been parsed. Like others, I mentioned that the XML spec makes it
clear that attribute order should not be used as a source of information
when interpreting the document.

However, the XML spec is only talking about interpretation of information
in XML documents. By contrast, the XML spec does not talk about the needs
of software like XML editing applications. I think it is fair to say that
all XML editing applications have to go beyond the strict requirement of
the XML spec in order to produce a usable tool. Can you imagine if every
time you opened an XML document in an XML editor, the attributes were in a
random order? It would be maddening. We expect different behaviour from
XML editors than from applications that are simply consumers of XML.

So, let me ask this question - should you be able to write an XML editor
in Scala? My answer is yes. At least, I don't see why there should be
anything in Scala that impedes you from using it to write an XML editor.
It would also be useful to be able to read in an XML document, write it
out again, and be able to have confidence that the output XML is identical to
the input XML (subject to having appropriate settings).

So, I am definitely in favour of Scala doing predictable things with
attribute order, just to support this class of tools, XML editing
applications.

Cheers, Tony.

On Wed, 13 Jan 2010 09:59:34 -0000, Mark Howe wrote:

> The next step would be to provide control over the encoding of line
> termination, in case that matters to someone, and maybe whitespace
> outside the document element, in case someone uses that to format output
> of XML via bash, and the kind of quotation marks used for attributes,
> and... all of which would make XML processing messier for everyone,
> whether or not they care about any of that stuff.
>
> As I said before, I think Xerces from within Scala will do what you
> want, so it's possible to do what you want, with Scala, now.

Mark Howe
Joined: 2009-10-22,
Re: Re: XML attribute position

On Wednesday 13 January 2010 at 21:39 +0000, Anthony B. Coates (Londata)
wrote:

> So, let me ask this question - should you be able to write an XML editor
> in Scala? My answer is yes. At least, I don't see why there should be
> anything in Scala that impedes you from using it to write an XML editor.

For an XML editor, the "straw man" examples I mentioned earlier surely
all become important. If I manually insert quirky whitespace like

\x0A\x0D\x09\x0D\x0A

between two elements, my editor should respect that (although I'm not
sure how it would be expected to render the above in terms of new
lines). But when I'm writing typical application code to modify XML I
don't want to have to wade through that level of superfluous detail.
This IMO is one of the frustrations of DOM - it's often hard to see the
"real" data, and you end up writing application code to skip past the
stuff that you wish wasn't there at all.

Does anyone know what internal representation XML editors such as oXygen
use? My hunch is that it looks a bit different to a generic XML parsing
solution.

> So, I am definitely in favour of Scala doing predictable things with
> attribute order, just to support this class of tools, XML editing
> applications.

Yes, but there's a difference between "predictable" and "the same as the
input". Always sorting attributes by prefix and then by local name would
be predictable.
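
A sort of that kind could look like the sketch below. It is an illustration only: attributes are modelled as hypothetical `(qualified_name, value)` pairs, and unprefixed names are treated as having an empty prefix, so they sort first.

```python
def sort_attributes(attributes):
    """Order (qualified_name, value) pairs by prefix, then by local name."""
    def key(pair):
        qualified_name = pair[0]
        # "xml:lang" -> ("xml", "lang"); "id" -> ("", "id")
        prefix, _, local = qualified_name.rpartition(":")
        return (prefix, local)
    return sorted(attributes, key=key)

attrs = [("xml:lang", "en"), ("id", "x1"), ("class", "note"), ("xlink:href", "#a")]
ordered = sort_attributes(attrs)
print(ordered)
```

The output is the same for any input order, which makes it predictable, but it generally differs from the order in the source document.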

The big plus of the current Scala XML implementation is that it makes
the simple stuff simple. Offhand I can't see why preserving attribute
order for immutable XML representations should make the simple stuff any
more complicated (although I maintain that it might make it slower
and/or bigger).

But making output character-level equivalent to input for XML editor
writers is IMO going to complicate things enormously, both for the
library implementer and the application programmer. For example, if
whitespace is to be preserved on a per-character basis, it suddenly
matters how whitespace is used in the application code, which in some
cases will leave the application programmer fighting the source code
prettyprinter. The much-touted first-class Scala XML representation
starts behaving a lot more like a string.

Also, producing an exact copy of an unmodified file obviously isn't very
useful in practice - you can do that without any XML processing at all.
So, if we are going to grasp this nettle at all, we surely need to think
about what preserving attribute order means when, for example, an
element with 3 attributes is read in and used to produce an output
element that preserves 2 of the 3 initial attributes and adds 2 more. I
suspect that defining the "right" behaviour in all such cases is
non-trivial, and we won't get much help from published standards.

Anthony B. Coates
Joined: 2009-09-12,
Re: Re: XML attribute position

I don't really get your point. You say that these things would make the
API harder to use, but I don't see why. Nobody has suggested any API
changes, only that the implementation could respect document
order more than it does. I can't see that having a negative impact on existing
Scala applications.

Cheers, Tony.

On Thu, 14 Jan 2010 08:25:58 -0000, Mark Howe wrote:

> The big plus of the current Scala XML implementation is that it makes
> the simple stuff simple. Offhand I can't see why preserving attribute
> order for immutable XML representations should make the simple stuff any
> more complicated (although I maintain that it might make it slower
> and/or bigger).
>
> But making output character-level equivalent to input for XML editor
> writers is IMO going to complicate things enormously, both for the
> library implementer and the application programmer. For example, if
> whitespace is to be preserved on a per-character basis, it suddenly
> matters how whitespace is used in the application code, which in some
> cases will leave the application programmer fighting the source code
> prettyprinter. The much-touted first-class Scala XML representation
> starts behaving a lot more like a string.

Mark Howe
Joined: 2009-10-22,
Re: Re: XML attribute position

On Thursday 14 January 2010 at 22:09 +0000, Anthony B. Coates (Londata)
wrote:

> I don't really get your point. You say that these things would make the
> API harder to use, but I don't see why. Nobody has suggested any API
> changes, only that the implementation could respect document
> order more than it does. I can't see that having a negative impact on existing
> Scala applications.

As I think I said, preserving attribute order feels to me like it ought
to be possible. I still think that there are some interesting questions
about what that means if the document is used to make another document
(which, with immutable XML, is the only case that anyone cares about -
performing identity transformations is best done by the cp command). For
example, if the original document is

and I use some sort of processing system to remove the b attribute and
add an e and an f attribute, would you say that the "right" result would
be

or

(which preserves the attribute number of both existing attributes) or
something else? Do we need the API to let us control this sort of thing
for all those cases where attribute order is apparently important? Do we
then need to control the attribute order whether we care or not? How
about if it is important to change the attribute order? And so on.
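
The two candidate behaviours can be made concrete with a small sketch. Everything here is hypothetical: the attribute names `b`, `c`, `d` and their values are invented for illustration, and `append_new` and `keep_positions` are just two of several possible policies, not anything scala.xml implements.

```python
def append_new(original, removed, added):
    """Drop removed attributes, then append the new ones at the end."""
    kept = [(k, v) for k, v in original if k not in removed]
    return kept + list(added)

def keep_positions(original, removed, added):
    """Keep surviving attributes at their old index; new ones fill the gaps."""
    pending = list(added)
    result = []
    for k, v in original:
        if k not in removed:
            result.append((k, v))
        elif pending:
            # Reuse the slot vacated by a removed attribute.
            result.append(pending.pop(0))
    return result + pending

original = [("b", "1"), ("c", "2"), ("d", "3")]
added = [("e", "4"), ("f", "5")]

print(append_new(original, {"b"}, added))
print(keep_positions(original, {"b"}, added))
```

The first policy keeps the relative order of the surviving attributes but shifts their positions; the second keeps their positions but interleaves new attributes with old ones.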

But I can suspend my disbelief for long enough to imagine that
preserving attribute order might be something we can make work
consistently. By contrast, most of the other "not included" items in the
Infoset spec would seem to me to almost inevitably introduce more
clutter to the API.

For example, where does whitespace outside the document element live in
an XDM representation where the document info item is the top level
item? How about the document type name? Whitespace within start tags?
CDATA section boundaries? At least some of that information would surely
require extra info set-type objects within the data structure, which
means that everyone has to do something intelligent with them (even if
it's filtering them out _every time_). And AFAICS you'd need to handle
all that and more in order to have an XML editor that returns any valid
XML document unchanged at the character level.
