
First draft of wiki text for 'scala-xml' project in Scala Incubator

Anthony B. Coates
Joined: 2009-09-12

I have written some 1st-draft text for the wiki for the 'scala-xml'
project in the Scala incubator.

http://wiki.github.com/scala-incubator/scala-xml

Comments, etc. would be very welcome.

Thanks, Cheers, Tony.

Tony Graham
Joined: 2009-12-28
Re: First draft of wiki text for 'scala-xml' project in Scala I

On Sat, Dec 26 2009 15:25:02 +0000, abcoates [at] londata [dot] com wrote:
> I have written some 1st-draft text for the wiki for the 'scala-xml'
> project in the Scala incubator.
>
> http://wiki.github.com/scala-incubator/scala-xml
>
> Comments, etc. would be very welcome.

- The sentence beginning "If you run ‘scala’" doesn't scan because of
the "and" in "and the result is".

- You could add "currently" in "However, there are no Scala
implementations of XSLT or XQuery."

- "you currently would use have to use Scala’s ability" could lose
"use" or "have to use", and possibly "currently" if you add it to the
previous sentence.

- "full or partially re-implement the underlying functionality in
Scala." sometimes sounds to me as if "underlying functionality" means
just re-implementing whatever's in Java. Maybe "underlying
functionality" could be "technology" since you talk about
technologies, including some that aren't standardised by the W3C, in
the preceding list.

If so, the "functionality" in the following paragraph would probably
have to also change to "technology".

- "a separate read-only API that provides some performance advantages"
could be "a separate read-only API, e.g., for higher performance"
since there may be other reasons for wanting a read-only API.

- In "Is your API built on your language’s existing
sequence/list/tree/etc. data structures", isn't "your language"
Scala?

- Does one "implement XPaths", or rather "support XPaths" or "implement
XPath support"?

- Another question to ask is whether or not the XPath support is a
complete implementation of the appropriate version, since some may
consider that you can do enough with location path steps and numeric
predicates or may not want to bother implementing all of the
functions defined in the chosen version of the spec.

- Is it that XPath 2.0 contains much of XQuery 1.0 or that XPath 2.0
has a lot in common with XQuery 1.0? After all, neither XQuery 1.0
nor XPath 2.0 normatively depends on the other.

- You can alternatively implement XSLT 2.0 without implementing XQuery
1.0.
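
To make the point about partial XPath support concrete, here is a minimal sketch in Java (the thread itself contains no code) using the JDK's built-in `javax.xml.xpath` API, which implements XPath 1.0. It evaluates one expression that needs only location path steps and a numeric predicate, and one that also needs a function from the spec's function library. The class name, the helper `evaluate` and the sample document are assumptions made up for illustration; they are not taken from the wiki text.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

final class XPathSubsetSketch {
    // Hypothetical sample document, invented for this example.
    static final String SAMPLE =
        "<library><book><title>A</title></book><book><title>B</title></book></library>";

    /** Evaluates an XPath 1.0 expression against an XML string and returns its string value. */
    static String evaluate(String xml, String expression) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            return xpath.evaluate(expression, new InputSource(new StringReader(xml)));
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("bad expression or document", e);
        }
    }

    public static void main(String[] args) {
        // Location path steps plus a numeric predicate only.
        System.out.println(evaluate(SAMPLE, "/library/book[2]/title")); // prints B
        // Needs the count() function from the XPath function library as well.
        System.out.println(evaluate(SAMPLE, "count(/library/book)"));   // prints 2
    }
}
```

An implementation that stopped at location paths and numeric predicates would handle the first expression but not the second, which is the kind of completeness question raised above.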

Regards,

Tony Graham Tony [dot] Graham [at] MenteithConsulting [dot] com
Director W3C XSL FO SG Invited Expert
Menteith Consulting Ltd XML Guild member
XML, XSL and XSLT consulting, programming and training
Registered Office: 13 Kelly's Bay Beach, Skerries, Co. Dublin, Ireland
Registered in Ireland - No. 428599 http://www.menteithconsulting.com

Anthony B. Coates
Joined: 2009-09-12
Re: First draft of wiki text for 'scala-xml' project in Scala I

Thanks, Tony! I've tried to address all of those points (updated version
at http://wiki.github.com/scala-incubator/scala-xml).

With regards to the question about how XQuery 1.0 relates to XPath 2.0, my
understanding is that XQuery 1.0 is a strict superset of XPath 2.0, i.e.
you can use any XPath 2.0 expression in XQuery 1.0.

Thanks for that, Cheers, Tony.

Mark Howe
Joined: 2009-10-22
Re: First draft of wiki text for 'scala-xml' project in Scala I

My answers to some of your questions:

# Are you providing support for reading/writing in-memory strings in XML?

# Are you providing support for reading/writing files in XML? (this is like reading/writing strings, but you need to deal with character encodings)

Yes

# Are you providing support for reading/writing huge files (i.e. too large to hold in memory at one time)? Is it forwards-only support, or full random-access support? Does it apply to reading huge files, writing huge files, or both?

No, because I don't think there's currently one good way to do this.

# Are you providing support for XML namespaces? If so, how are the prefix bindings specified?

Absolutely and, um, not sure.

# Do you provide fine control of how the XML is formatted when written to files, e.g. wrapping, indenting, ordering of attributes in an element, location of namespace declarations, prefixes used for particular namespaces?

I could live without most of that with the exception of control over prefixes.

# Are you providing support for validating XML, e.g. using DTDs, XML Schemas, RELAX NG, or Schematron? If so, does the validation operate before reading, any time from after reading to before writing, or after writing?

I'd expect the first three at least to happen during reading, and I suspect that we therefore need to at least provide some hooks into the reading process.

# Do you provide an API for accessing XML based on some standard information model, e.g. the W3C XML Infoset, the XML Schema Post-Schema Validation Infoset or the XQuery/XPath Data Model?

Yes. Infoset is fine as long as it's possible to build validation etc on top of it. If not, I'd vote for XDM.

# Is your API built on your language’s existing sequence/list/tree/etc. data structures, or do XML objects have their own structures? Do you allow non-XML data objects (strings, decimals, booleans, etc.) to be direct members of sequences/lists/etc., or is every object XML-specific?

It may be a question to look at later, but my hunch is that custom structures will be useful, and that a potentially lossy conversion to, say, lists and hashes could be handy. Would a potentially limited conversion from, say, lists and hashes to immutable XML be a way to provide DOM-like functionality?

# How does your API deal with unnamed types in XML Schemas (unnamed “local” complex types or simple types). Can it support them unnamed, or does your API require them to be named (either manually or automatically)? APIs with automatically-generated names are not friendly for developers using those APIs.

I think the answer will depend on how hideous the syntax of the various options turns out to be.

# Are you implementing support for binding XML data to user-defined object models?

Personally, I think this is mad and wrong in any language (assuming it is based upon reflection that effectively breaks OO encapsulation), but I guess someone must think it's useful or it wouldn't be so popular in Java.

# Are you implementing standard APIs like SAX, DOM, StAX, JAXP, JAXB, .NET XML API?

Not necessarily as such, but something at about the SAX level seems to me to be a good level for the base technology.

# Are you implementing XPath support? If so, XPath 1.0 or 2.0? Are you implementing all of XPath, or a subset? If you are implementing XPath 2.0, which contains much of XQuery 1.0, do you implement XQuery as well, or not? If you implement XQuery, do you implement XSLT 2.0 as well?

In 2009 I think it's hard to argue for XPath 1.0. If we do XPath 2.0, we may as well at least leave the door open to doing XQuery 1.0 too. I think XSLT would be a great thing to implement in Scala, but as an application rather than as core functionality, and having XDM and XPath 2.0 would be a very big step in that direction.

# Are you trying to hide the fact that the data is XML from developers, or are you trying to expose all of the XML-specifics? Are you trying to make it possible to work either way?

I think that attempts to hide the XML always tend to bite you in the end, but we certainly want to let application programmers who think XML is just elements, attributes and CDATA code as if they are right (while enabling them to use namespaces, validation etc if/when they need to without throwing away all their earlier code).

# Do you need to support specific XML formats specially, e.g. XHTML?

Yes, but maybe as an output filter?

# Are you providing equal support for working with both “data-oriented” and “document-oriented” XML (in the sense of XML with little or no mixed content, versus XML that is mostly mixed content like XHTML)?

What would that mean in practice?

# Are you providing support for people who want to work with XML in an order-insensitive way? (the XML default is that order is important, except for order of attributes in an element)

What would that mean in practice?

If I may add one question of my own:

# Are you providing a binary serialisation mechanism for saving and restoring Scala internal XML representations without the cost of serialising and parsing XML?
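
On the namespace question above (how prefix bindings are specified), the following sketch may help separate namespace support from control over prefixes. It is written in Java against the JDK's JAXP DOM API, since the thread has no code of its own; the class name, the helper `expandedRootName` and the `urn:example:ns` namespace are assumptions invented for illustration. With a namespace-aware parser, an element is identified by its namespace URI and local name, so two documents that bind different prefixes (or no prefix) to the same URI name the same element.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

final class NamespaceBindingSketch {
    /** Parses the XML namespace-aware and returns "{namespace-uri}local-name" for the root element. */
    static String expandedRootName(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // JAXP factories are not namespace-aware by default
            Element root = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
            return "{" + root.getNamespaceURI() + "}" + root.getLocalName();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("could not parse document", e);
        }
    }

    public static void main(String[] args) {
        // Same namespace, once bound to the prefix "a" and once as the default namespace.
        String prefixed = "<a:doc xmlns:a=\"urn:example:ns\"/>";
        String defaulted = "<doc xmlns=\"urn:example:ns\"/>";
        System.out.println(expandedRootName(prefixed));  // prints {urn:example:ns}doc
        System.out.println(expandedRootName(defaulted)); // prints {urn:example:ns}doc
    }
}
```

The prefix only matters again at serialisation time, which is where the requested control over prefixes would come in.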

Anthony B. Coates
Joined: 2009-09-12
Re: First draft of wiki text for 'scala-xml' project in Scala I

Thanks, although in part I wanted to find out what people think Scala
should do, not just what it currently does, so I would be interested
in that side of the discussion too. Thanks,

Cheers, Tony.

Mark Howe
Joined: 2009-10-22
Re: First draft of wiki text for 'scala-xml' project in Scala I

Anthony B. Coates wrote:

> Thanks, although in part I wanted to find out what people think Scala
> should do, not just what it currently does, so I would be interested
> in that side of the discussion too.

That was my list of what it should do.

As for:

* do nothing in particular, let the user use the underlying non-Scala (Java or .NET) API;

* provide a Scala layer that makes it easier or more concise to use the underlying API;

* full or partially re-implement the underlying non-Scala API as an equivalent Scala API.

I think that's a choice for later, after we make some more abstract decisions between options along the lines of

1: Avoid trying to standardise anything in XML Scala because there are so many religious issues around how XML should be handled that we'll never agree on anything

2: Stick with the current XML Scala setup because it does what it does and at least some people like it

3: Do XML the W3C/Java way (ie an API a lot like the Xerces one)

4: Do XML in one or more of the more inventive Java ways (e.g. make XML disappear by turning it into objects transparently)

5: Come up with a new, distinctive but standards-friendly way to do XML that makes sense within the larger Scala system.

It's a bit late for #1

#2 really looks like a dead end to me - although the current setup does some stuff extremely well, the namespace thing alone means that it's always going to be regarded as a toy by heavyweight XML users (and full XML compliance is certainly not going to matter less in the future of web-based XML for which the current setup works best). Whatever the merits of the current setup, I don't see it as a good basis on which to build for the future.

I've done a bit of experimentation with #3, and it seems to me that the Xerces interface has too much of the Java mindset that Scala would like to leave behind to be a natural way forward for Scala XML.

As I said before, I think that the "turn XML into objects so we can pretend it was never XML" approach is mad and wrong, because it does an end run around the basics of OO by either exposing or assuming the internal structure of class objects and methods. It's also unlikely to be the fastest way to handle XML.

So I'm all for #5. The trickiest bit of making it happen seems to me to be agreeing what "Scala way" means when applied to XML. I'd suggest that some of the elements might be

* First-class XML object support, as at present

* Concise and natural literals and interpolation, as at present (although it does seem curious that the rest of Scala makes little use of interpolation, ie no Perl/PHP-style string interpolation, so is this really the Scala way? eg

http://old.nabble.com/Why-Martin-hates-string-interpolation-%28Was:-Re:-...

which IMO is right about Swiss keyboards but wrong about interpolation)

* Optimised, immutable XML data structures that are 100% compatible with one of the W3C data models (XDM if it's up to me, but Infoset with or without PSVI support could work too) and that are amenable to processing by recursion

* Tools to convert between those immutable structures and mutable, partial representations using existing Scala data structures to allow DOM-style piecemeal construction and manipulation of XML information, albeit with a performance overhead.

* A streaming parser that might form the basis of everything else, and which could also form the basis of various huge document implementations or XML database back ends, and which is amenable to non-blocking concurrency on a micro level (eg using multiple cores within one document parse or one XPath query)

* A relatively small and simple API for manipulating the basic data structures, from which higher levels of abstraction can be built naturally in the best traditions of functional programming.

* Support for extending whatever data model is adopted through the use of traits, which requires thought when the basic libraries are defined, ie structuring methods so that traits need only redefine the functionality they need to change rather than, say, replacing the entire parsing process.

* Tools for flexible handling of serialisation, including the quirks of XHTML and maybe even non-XML HTML (maybe plugging into the small and simple core API as handlers with a simple handler provided by default).

In terms of implementation, I still think that Xerces SAX2 might be a good basis on which to build, because it's robust, mature, compliant (certainly compared with the alternatives) and at a low enough level not to preclude many higher-level data models.

If we don't do that, I'd be interested in looking at some of the research out there on space-optimised XML representations - there was a paper and a poster about this at XML Prague last year, for example - because built-in compression could make Scala XML very attractive compared with Java for certain applications, and this may be a rare opportunity to implement it at the core of a programming language. AFAIR the Prague poster described a system implemented in C that offered radically reduced size with a relatively small performance hit.

But I think we need agreement on what we are planning to build at the higher level before we try to pick the enabling technology.
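
As a rough illustration of what building "at about the SAX level" looks like, here is a minimal Java sketch using the JDK's built-in JAXP SAX parser. It counts elements by name from a stream of parse events without ever holding the document as a tree, which is the property that makes this level attractive as a base for larger structures. The class name and the helper `countElements` are assumptions invented for this example, and it is not meant as a proposal for the Scala API itself.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

final class SaxCountSketch {
    /** Streams through the XML and counts start tags by qualified name. */
    static Map<String, Integer> countElements(String xml) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes attributes) {
                // Called once per start tag, in document order; no tree is built.
                counts.merge(qName, 1, Integer::sum);
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("could not parse document", e);
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(countElements("<a><b/><b/><c/></a>")); // prints {a=1, b=2, c=1}
    }
}
```

A higher-level immutable data model could in principle be assembled from the same callbacks, which is why the event level does not preclude the choices listed above.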

Tony Graham
Joined: 2009-12-28
Re: First draft of wiki text for 'scala-xml' project in Scala I

On Thu, Dec 31 2009 10:19:00 +0000, abcoates [at] londata [dot] com wrote:
> Thanks, Tony! I've tried to address all of those points (updated
> version at http://wiki.github.com/scala-incubator/scala-xml).

Thanks.

> With regards to the question about how XQuery 1.0 relates to XPath
> 2.0, my understanding is that XQuery 1.0 is a strict superset of XPath
> 2.0, i.e. you can use any XPath 2.0 expression in XQuery 1.0.

Maybe I was trying to split that hair too finely.

From http://www.w3.org/TR/xpath20/#id-introduction:

XPath is designed to be embedded in a host language such as [XSLT
2.0] or [XQuery]. XPath has a natural subset that can be used for
matching (testing whether or not a node matches a pattern); this use
of XPath is described in [XSLT 2.0].

XQuery Version 1.0 is an extension of XPath Version 2.0. Any
expression that is syntactically valid and executes successfully in
both XPath 2.0 and XQuery 1.0 will return the same result in both
languages.

So, yes, you are correct. Sorry for sowing confusion.

Regards,

Tony Graham Tony [dot] Graham [at] MenteithConsulting [dot] com
Director W3C XSL FO SG Invited Expert
Menteith Consulting Ltd XML Guild member
XML, XSL and XSLT consulting, programming and training
Registered Office: 13 Kelly's Bay Beach, Skerries, Co. Dublin, Ireland
Registered in Ireland - No. 428599 http://www.menteithconsulting.com

Copyright © 2012 École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland