
Putting the cart before the horse?

32 replies
Alex Cruise
I think it's good to keep all these grandiose requirements in mind, and FSM knows my natural inclination is to start there, but I think we really need to sharpen our focus very tightly on transforming or replacing the basic scala.xml library into/with something that substantially meets our axiomatic design goals, namely:

- Comply fully with minimally applicable standards (e.g. XML syntax, namespaces)
- Sane node object model, e.g. simplify the Node/NodeSeq/Seq[Node] relationship
- Node object model should remain at least publicly immutable
- XML literal capability should be at least as good as current
- Node object model must be able to represent, preserve and accurately reproduce arbitrary XML Infosets.
- Good test coverage

... does anything here not belong on this list?  Have I missed anything?
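To make the "sane node object model" goal concrete, here is one hypothetical shape it could take. None of these names exist in scala.xml today; this is purely an illustrative sketch of an immutable model where children are an ordinary Seq[Node], with no special NodeSeq supertype:

```scala
// Hypothetical sketch of a simplified, publicly immutable node model.
sealed trait Node
final case class Elem(
    namespace: Option[String],
    label: String,
    attributes: Map[String, String],
    children: Seq[Node]
) extends Node
final case class Text(value: String) extends Node

// Traversal then becomes ordinary collection code:
def texts(n: Node): Seq[String] = n match {
  case Text(v)             => Seq(v)
  case Elem(_, _, _, kids) => kids.flatMap(texts)
}

val doc = Elem(None, "greeting", Map("lang" -> "en"), Seq(Text("hello")))
```

The point of the sketch is just that once children really are a plain Seq, all of the standard library's collection operations apply without a parallel NodeSeq API.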

And, I think most would agree that v.next should place a high priority on meeting some other important requirements, to the extent that they don't conflict with the axiomatic design goals:

- Retain source compatibility, to the extent feasible (e.g. with implicits)
- Pattern matching should work better, especially w.r.t. namespaces.
- Fix or design around as many known, serious bugs as possible

... any others?
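On the namespace-aware pattern matching point, here is one hypothetical direction, using a custom extractor keyed on (namespace URI, local name). All names here are invented for illustration; this is not how scala.xml works today:

```scala
// Minimal stand-in element type for the sketch (not scala.xml's Elem).
final case class QName(uri: String, local: String)
final case class Elem(name: QName, children: Seq[Elem])

// An extractor that matches only when both the namespace URI and local
// name agree, so same-named elements from different vocabularies differ.
class NsElem(uri: String, local: String) {
  def unapplySeq(e: Elem): Option[Seq[Elem]] =
    if (e.name == QName(uri, local)) Some(e.children) else None
}

val AtomEntry = new NsElem("http://www.w3.org/2005/Atom", "entry")

def describe(e: Elem): String = e match {
  case AtomEntry(kids @ _*) => s"Atom entry with ${kids.size} children"
  case _                    => "something else"
}
```

Whether the extractors are written by hand like this or synthesized is a separate question; the sketch only shows that the language's pattern machinery can carry namespace information.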

Here's a list of aspects that I think are definitely deserving of effort, but that I would suggest be kept as *non-goals* in the short term, although we want to keep them in mind so as not to design ourselves into a corner:

- Validation (DTD, XML Schema, Relax-NG)
- Transformation
- Full XPath support
- XML<->POSO binding
- Schema/Scala type integration (sorry Greg! :)

I submit that once we've achieved sanity and minimal spec compliance in the absolute basics of object model, literals, parsing, serialization, etc., we'll have a good platform from which to attack all the more interesting use cases that we all want to see implemented, and those efforts will stand a much better chance of success.

TIA for your thoughts, flames, and Math.signum return values. :)

-0xe1a
Alex Cruise
Re: Putting the cart before the horse?
On 09-12-11 11:44 AM, Alex Cruise wrote:
> - XML<->POSO binding
> - Schema/Scala type integration (sorry Greg! :)
Emphasis not added intentionally. :)

-0xe1a
Meredith Gregory
Re: Putting the cart before the horse?
Dear Alex,
Not to worry. But do note that CDuce was perfectly usable in 2005 -- and was done by 1 (count it, 1) guy.
Best wishes,
--greg




--
L.G. Meredith
Managing Partner
Biosimilarity LLC
1219 NW 83rd St
Seattle, WA 98117

+1 206.650.3740

http://biosimilarity.blogspot.com
Mark Howe
Re: Putting the cart before the horse?

On Friday 11 December 2009 at 11:44 -0800, Alex Cruise wrote:

> I submit that once we've achieved sanity and minimal spec compliance
> in the absolute basics of object model, literals, parsing,
> serialization, etc., we'll have a good platform from which to attack
> all the more interesting use cases that we all want to see
> implemented, and those efforts will stand a much better chance of
> success.

I think the strategy sounds excellent and the two lists look about right
to me too.

Some of the items in your second list might make more sense outside the
core(ish) Scala functionality, at least initially. The key thing IMO is
to put in place the basics that let non-core XML projects develop using
Scala XML tools rather than being forced to use Java ones.

Anthony B. Coates
Re: Putting the cart before the horse?

I certainly think that good test coverage is a priority. If there
are problems with the current XML implementation, I would like to see a
bank of tests that flushes out those issues. I've had a quick skim through
the current tests, though certainly not in enough detail yet to be able to
say what might need better coverage.

Cheers, Tony.


Danny Ayers
Re: Putting the cart before the horse?

I'm a newbie to Scala, but have spent way too much time around XML to
let this thread go -

2009/12/11 Alex Cruise :

> - Comply fully with minimally applicable standards (e.g. XML syntax,
> namespaces)
> - Sane node object model, e.g. simplify the Node/NodeSeq/Seq[Node]
> relationship
> - Node object model should remain at least publicly immutable
> - XML literal capability should be at least as good as current
> - Node object model must be able to represent, preserve and accurately
> reproduce arbitrary XML Infosets.
> - Good test coverage
>
> ... does anything here not belong on this list?  Have I missed anything?

IMHO, that's spot on.

> And, I think most would agree that v.next should place a high priority on
> meeting some other important requirements, to the extent that they don't
> conflict with the axiomatic design goals:
>
> - Retain source compatibility, to the extent feasible (e.g. with implicits)
> - Pattern matching should work better, especially w.r.t. namespaces.
> - Fix or design around as many known, serious bugs as possible
>
> ... any others?

Dunno, sounds reasonable.

> Here's a list of aspects that I think are definitely deserving of effort,
> but that I would suggest be kept as *non-goals* in the short term, although
> we want to keep them in mind so as not to design ourselves into a corner:
>
> - Validation (DTD, XML Schema, Relax-NG)
> - Transformation
> - Full XPath support

For convenience maybe, but I don't see any great necessity to get them
into the libs when there's Java kit around like Saxon. Forgive my
ignorance, but if it seems like such stuff should be in easy reach of
Scala developers, might it not be possible to simply alias to existing
(open) Java libs?

> - XML<->POSO binding
> - Schema/Scala type integration (sorry Greg! :)

Dunno. Similar stuff has proved useful around Java, but I can't help
thinking it'd be a lot of work for disproportionately low benefit.

> I submit that once we've achieved sanity and minimal spec compliance in the
> absolute basics of object model, literals, parsing, serialization, etc.,
> we'll have a good platform from which to attack all the more interesting use
> cases that we all want to see implemented, and those efforts will stand a
> much better chance of success.

Absolutely.

> TIA for your thoughts, flames, and Math.signum return values. :)

Good XML should be straightforwardly doable.

Can I please have you strongly bear in mind the Web, specifically the
notion of URIs as first-class constant names, and alongside that
bringing the RDF node & arc model into the fold, maybe through
allowing N3/Turtle syntax blocks (but still not essential - I've had
success using the Jena toolkit from Scala).

Cheers,
Danny.

Alex Cruise
Re: Putting the cart before the horse?

On 09-12-11 11:53 AM, Meredith Gregory wrote:
> Not to worry. But do note that CDuce was perfectly usable in 2005 --
> and was done by 1 (count it, 1) guy.

Sorry, my post wasn't specifically a reaction to your comment, and I
actually think building a strong correspondence between Schema and Scala
types is a fantastic idea.

I just think we need to get bailing before everyone starts dreaming
about how many more masts to put up. :)

-0xe1a

Meredith Gregory
Re: Putting the cart before the horse?
Dear SX'ers,
Here's a clarification of my 2 cents.
  • It's fine to support the DOM. There are cases where applications actually need to manipulate the XML syntax. However, that's not what XML is about, nor is it the 80% case.
  • XML is primarily a transfer syntax for in-memory structures. Imho, in the Scala setting people want to be able to serialize/deserialize what amount to case class instances.
  • In this use case XML Schema should match up with the case class structure. Here's an example
<complexType name="Point">
  <complexContent>
    <sequence>
      <element name="x" type="float"/>
      <element name="y" type="float"/>
    </sequence>
  </complexContent>
</complexType>
ought to map to
case class Point( x : Float, y : Float )
And then you ought to have the following verifiable property
{ ( x : Float, y : Float ) =>
    validate( serialize( Point( x, y ) ), "Point.xsd" ) == true }
as well as the verifiable property
{ if ( validate( "point.xml", "Point.xsd" ) ) {
    deserialize( "point.xml" ).isInstanceOf[Point] == true } }
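A self-contained sketch of that serialize/deserialize round trip, with a hand-rolled serializer and the JDK's DOM parser standing in for the real JAXB machinery (the serialize and deserialize here are toy versions for the Point shape only, not a general mapping):

```scala
import javax.xml.parsers.DocumentBuilderFactory
import java.io.ByteArrayInputStream

case class Point(x: Float, y: Float)

// Toy serializer: a straightforward string build for this one shape.
def serialize(p: Point): String =
  s"<Point><x>${p.x}</x><y>${p.y}</y></Point>"

// Toy deserializer via the JDK's DOM parser, kept deliberately naive.
def deserialize(xml: String): Point = {
  val doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
    .parse(new ByteArrayInputStream(xml.getBytes("UTF-8")))
  def text(tag: String) =
    doc.getElementsByTagName(tag).item(0).getTextContent.toFloat
  Point(text("x"), text("y"))
}
```

The real work, as noted, is generating mappings like this from a schema and lining them up with pattern matching, rather than writing them by hand per type.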
Now, the good news is that JAXB already provides a mapping from XML Schema to Java classes. These have representation at the Scala level. So, most of the heavy lifting has already been done. The "tricky" bits are synthesizing the extractor goop so that you line up this generated structure with pattern matching. There are a lot of quick and dirty ways to do this that have a pretty long lifespan. 
But, it's really time to support something other than working at the DOM level. In particular, working at the DOM level for this particular use case (which i believe represents the 80% case) is really subverting the type system. It's like working with row structures instead of a case class representation of something in a storage solution.
Best wishes,
--greg

Danny Ayers
Re: Putting the cart before the horse?

2009/12/11 Meredith Gregory :

> XML is primarily a transfer syntax for in-memory structures.

I disagree. It's no more and no less than what the spec says.

http://www.w3.org/TR/REC-xml/

> Imho, in the
> Scala setting people want to be able to serialize/deserialize what amount to
> case class instances.

I'm sure that's one desirable use, but a major motivation behind XML
was interoperability - not just between in-memory structures but
between any given systems that can understand bytes. Pushing things
towards some kind of specific language-oriented class model thing
would be missing the point. I can create an XML document in Scala, you
can make sense of it in PHP, or vice versa.

XML Schemas only get you so far (they're also a bit clunky in practice
- Relax NG is the choice of anyone who wishes to remain sane). This is
just opinion, but I think that should be out of scope for Scala.

If there is any doubt, why not ask people that have expertise in the
domain - there's a mailing list called xml-dev :
http://www.xml.org/xml-dev/

Cheers,
Danny.

Mark Howe
Re: Putting the cart before the horse?

I suggest aiming somewhere between "reinventing a way to parse XML at
the byte level" and "slapping minimal Scala syntax on top of the whole
of Xerces-J". My preferred middle way would be coming up with a decent
equivalent to Xerces SAX2. (There is currently a Scala equivalent, but
it is described as experimental and doesn't seem to have enough hooks to
do XDM.)

A SAX-type API is a level above the bytes, and we could almost certainly
reuse some or all of Xerces SAX2 functionality to do that work in a
robust and compliant way. We thus avoid most of the picky XML syntax
issues mentioned by Anthony Coates.

Xerces-J SAX2 handles DTDs and W3C Schema as part of the SAX2 parsing
process, which is important for anyone who wants XPath 2.0 or anything
that builds on it. Top of the list of things not to invent is yet
another neat way of doing XML validation!

A SAX-type API is a level below namespaces (you get the attributes from
which to implement "stratification" properly) and is several levels
below DOM. If we had a W3C-friendly* SAX-type API we could implement DOM
quite easily for people who want DOM. We could also implement other data
structures efficiently for people who want other data structures. I
think this goes some way towards addressing the "80% case" concern
raised by Meredith Gregory.

(DOM isn't IMO a good general-purpose starting point because it implies
a complete tree traversal before the application code can start work.
That's fine if a DOM tree is what the application code needs. But if you
want to build a different sort of data structure, you end up building a
complete DOM tree just so you can walk through the entire tree again,
possibly in the same order as the unserialised XML document before DOM
messed with it. Also, it means holding the entire XML document in
memory, which isn't always an option.)

It would make sense to change the SAX API around a bit for Scala use.
For example, all attributes of an element turn up as references to a
larger array, so there's no way I can think of to iterate through
attributes without writing index-incrementing Scala that looks like
Java-going-on-C. A callback per attribute would add a small overhead,
but would make the application code a lot neater, and would potentially
allow the post-processing of each attribute to be farmed out to an actor
while deserialisation continues.
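A rough sketch of that callback-per-attribute idea, layered over the SAX parser bundled with the JDK (AttrHandler is a made-up name for illustration, not an existing API):

```scala
import javax.xml.parsers.SAXParserFactory
import org.xml.sax.Attributes
import org.xml.sax.helpers.DefaultHandler
import java.io.ByteArrayInputStream
import scala.collection.mutable.ListBuffer

// Adapt SAX's index-based Attributes into one callback per attribute,
// so the index-incrementing loop is written exactly once.
abstract class AttrHandler extends DefaultHandler {
  def attribute(element: String, name: String, value: String): Unit
  override def startElement(uri: String, local: String, qName: String,
                            atts: Attributes): Unit = {
    var i = 0
    while (i < atts.getLength) {  // the Java-going-on-C loop, hidden here
      attribute(qName, atts.getQName(i), atts.getValue(i))
      i += 1
    }
  }
}

// Application code then just implements the per-attribute callback.
val seen = ListBuffer.empty[(String, String, String)]
val handler = new AttrHandler {
  def attribute(e: String, n: String, v: String): Unit = seen += ((e, n, v))
}
SAXParserFactory.newInstance().newSAXParser().parse(
  new ByteArrayInputStream("""<p a="1" b="2"/>""".getBytes("UTF-8")), handler)
```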

The extensions system for Xerces could also benefit a lot from
Scalarisation. To get enough information from SAX2 to implement XDM you
need to implement a number of interfaces, and in practice this means the
classic google-copy-paste-hack approach that traits are intended to
address. ISTM that a Scala version should allow the application
programmer to mix in a DTD handler or an entity resolver and get at
least default functionality.

(The default Xerces-J DTD behaviour appears to have DOS'd the W3C
servers to the point where W3C blocks access to the most common DTDs via
a Java user agent. So a Scala EntityResolver trait that, by default,
either downloads once and caches each DTD, demands a map of all
permitted DTDs or disables DTD resolution would be A Good Thing.)
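A minimal sketch of the "map of all permitted DTDs" variant, using only org.xml.sax types from the JDK (MapResolver is a hypothetical name; the choice to resolve unknown system IDs to an empty entity, rather than failing, is one of the design options mentioned above):

```scala
import org.xml.sax.{EntityResolver, InputSource}
import java.io.StringReader

// Serve DTDs only from a supplied map; never touch the network.
// Unknown system IDs resolve to an empty entity, which effectively
// disables DTD resolution for anything not explicitly permitted.
class MapResolver(allowed: Map[String, String]) extends EntityResolver {
  override def resolveEntity(publicId: String, systemId: String): InputSource =
    new InputSource(new StringReader(allowed.getOrElse(systemId, "")))
}

val resolver = new MapResolver(
  Map("http://example.org/test.dtd" -> "<!ELEMENT p (#PCDATA)>"))
```

Mixing something like this in by default would spare the W3C's servers and make the failure mode explicit.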

I'd certainly be willing to put some time into making robust Scala
SAX-like parsing happen, not because I particularly like working at the
SAX level of abstraction, but because it would make a lot of much cooler
things much easier to implement.

* By "W3C-friendly" I mean "Exposing the bits of an XML document needed
to do W3C-compliant things in a natural way." SAX itself isn't a W3C
standard, but it generally hangs out at the same parties. There's a
brief overview of how this works at

http://www.w3.org/DOM/faq.html#SAXandDOM

from which, re the ubiquity or otherwise of DOM:

"If you intend to allow other code, such as utility routines,
middleware, applications, and scripts within the document, to explore
and possibly alter the document's contents, the DOM is almost certainly
the way to go; it provides a W3C-standardized, complete, and editable
view of the document's contents.

"Conversely, if your task processes the document on a straight-line
flow-through basis, without permitting users to write "scripts" against
it and without needing much contextual information at each stage -- for
example, if you're parsing the XML document directly into a database for
storage -- SAX may provide a more direct interface to the parser.

"Between these extremes, it's a judgment call; you have to think about
how much trouble it will be to implement your own document model versus
using the DOM, and about how you expect your application to grow in the
future."

(paragraph breaks mine.)

Anthony B. Coates
Re: Putting the cart before the horse?

I like the general thrust of this. However, instead of the SAX API, it
might be better to try the StAX API instead, and derive something from
that, because it is much closer in operation to the .NET XML API (indeed,
it's fair to say it was inspired by .NET's XML API), and that will make it
easier to align code across Java and .NET.

I'm still planning to have a look at the current built-in parser, once
I've worked out exactly how the tests should be set up for the Scala
source tree. I can see that people would like a cross-platform built-in
parser, at least a basic one (i.e. non-validating, etc.), as long as it
doesn't do anything wrong. Since namespace handling has been brought into
question with the current parser, it would be good to set up some tests to
know for sure.

Cheers, Tony.

Mark Howe
Re: Putting the cart before the horse?

On Saturday 12 December 2009 at 15:50 +0000, Anthony B. Coates (Londata) wrote:

> However, instead of the SAX API, it
> might be better to try the StAX API instead, and derive something from
> that

Looks plausible to me at a quick glance. Is there a StAX reference
manual or spec somewhere? Or a decent book? It's not the greatest of
words to google.

http://java.sun.com/webservices/docs/1.6/tutorial/doc/SJSXP3.html

was the best I found.

> Since namespace handling has been brought into
> question with the current parser, it would be good to set up some tests to
> know for sure.

That would be great! The cryptic aside documenting this behaviour
suggests that the test needs to look specifically at serialisation. It
sounds like the namespace information is read in correctly, but is not
used systematically when the tree is serialised.

Anthony B. Coates
Re: Putting the cart before the horse?

Replies below:

On Sat, 12 Dec 2009 16:50:50 -0000, Mark Howe wrote:

>> However, instead of the SAX API, it
>> might be better to try the StAX API instead, and derive something from
>> that
>
> Looks plausible to me at a quick glance. Is there a StAX reference
> manual or spec somewhere? Or a decent book? It's not the greatest of
> words to google.

Sorry, StAX is the old name. It's part of Java 6 now, the
'javax.xml.stream' package:
http://java.sun.com/javase/6/docs/api/javax/xml/stream/package-summary.html
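For reference, the javax.xml.stream cursor API looks like this from Scala (a minimal self-contained sketch using only the JDK, no scala.xml involved):

```scala
import javax.xml.stream.{XMLInputFactory, XMLStreamConstants}
import java.io.StringReader
import scala.collection.mutable.ListBuffer

// Pull-parse a document and collect element names in document order.
def elementNames(xml: String): List[String] = {
  val reader = XMLInputFactory.newInstance()
    .createXMLStreamReader(new StringReader(xml))
  val names = ListBuffer.empty[String]
  while (reader.hasNext) {
    if (reader.next() == XMLStreamConstants.START_ELEMENT)
      names += reader.getLocalName
  }
  reader.close()
  names.toList
}
```

The pull style (the application asks for the next event, rather than being called back) is what makes it a closer fit than SAX for ordinary Scala control flow.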

>> Since namespace handling has been brought into
>> question with the current parser, it would be good to set up some tests
>> to
>> know for sure.
>
> That would be great! The cryptic aside documenting of this behaviour
> suggests that the test needs to look specifically at serialisation. It
> sounds like the namespace information is read in correctly, but is not
> used systematically when the tree is serialised.

OK, I'm glad you pointed that out specifically. I must say, I'm still
struggling to understand how 'partest' works, and what the right way is
to write tests for the Scala source tree, so if you can give me any
pointers, I would be really grateful. Thanks,

Cheers, Tony.

Meredith Gregory
Re: Putting the cart before the horse?
Dear Danny,
Thanks for your note!
Regarding schema -- i hate them all. RelaxNG sucks a little less than XSD, but i've found that in the wild people use XSD, when they use schema at all. Fortunately, there are many, many tools, including open source ones, that will convert between the varieties of schema -- when there is a nice conversion. So, picking one that works for the 80% case is fine. Since i knew that JAXB works with XSD and couldn't remember whether it worked with RelaxNG, i couched my example in terms of XSD. i have no affiliation with any of them.
The crucial point is the analogy Types : Schema :: Instances : Documents. That should be a basic collection of features that Scala XML supports -- since there is more than enough toolage for the JVM to do it with little fuss and there is an existing design artifact (OCamlDuce/CDuce) that shows a clean and well worked out design. Working at this level aligns with Scala's basic message about the power of types. Then we can get the Scala compiler (and schema/document validation) to do more work for us. Specifically, the analogy above means that type-checking coincides with document validation for the place where types and schema overlap.
Regarding XML as transfer syntax -- my point is that there are very very few cases in which the XML syntax itself is the point. The syntax is designed to denote something. There are very few cases where an application is interested in the document as document. Usually they are interested in document as some external representation of an application artifact (e.g. a bibliography, an RSS feed, ...). That's what makes it a transfer syntax. Supporting the semantic entities the syntax captures is something that Scala XML ought to support -- over and above the DOM's model of XML syntax.
Both of these points are completely independent of cross language stuff. Each language comes with its own notions of types -- however rich or impoverished they are. The fact that there is a mapping from an XML schema language to the types of some language does not preclude or necessitate the existence (or not) of a mapping to the types of another. It just so happens that there is a natural mapping that Scala inherits from the mapping to Java that the JAXB community did. This is echoed in the mapping that the .net people did for C#/.net (although the JAXB mapping is much more thorough and the implementation much more robust).
Best wishes,
--greg

On Fri, Dec 11, 2009 at 4:38 PM, Danny Ayers <danny [dot] ayers [at] gmail [dot] com> wrote:
2009/12/11 Meredith Gregory <lgreg [dot] meredith [at] gmail [dot] com>:

> XML is primarily a transfer syntax for in-memory structures.

I disagree. It's no more and no less than what the spec says.

http://www.w3.org/TR/REC-xml/

> Imho, in the
> Scala setting people want to be able to serialize/deserialize what amount to
> case class instances.

I'm sure that's one desirable use, but a major motivation behind XML
was interoperability - not just between in-memory structures but
between any given systems that can understand bytes. Pushing things
towards some kind of specific language-oriented class model thing
would be missing the point. I can create an XML document in Scala, you
can make sense of it in PHP, or vice versa.

XML Schemas only get you so far (they're also a bit clunky in practice
- Relax NG is the choice of anyone who wishes to remain sane). This is
just opinion, but I think that should be out of scope for Scala.

If there is any doubt, why not ask people that have expertise in the
domain - there's a mailing list called xml-dev :
http://www.xml.org/xml-dev/

Cheers,
Danny.



--
http://danny.ayers.name



--
L.G. Meredith
Managing Partner
Biosimilarity LLC
1219 NW 83rd St
Seattle, WA 98117

+1 206.650.3740

http://biosimilarity.blogspot.com
Anthony B. Coates
Joined: 2009-09-12,
User offline. Last seen 2 years 35 weeks ago.
Re: Putting the cart before the horse?

Greg, I'm not sure I get your point here. Your comment about XML syntax
just denoting something could be applied equally to Scala class models.

Ultimately, a lot of projects involve different physical data models -
application data models, message data models, and database data models.
In my experience, the people who specialise in one of these models tend
to denigrate the importance of the other models. However, they are all
important, and all deserve to be taken equally seriously. For example,
while XML is sometimes used as little more than a serialisation of an
application data model, sometimes the XML is what binds a variety of
applications together, and so part of the application data model is simply
an implementation of the XML data model.

XML has a wide variety of uses. I would hope that Scala will be able to
do justice to all of those uses, as all the better XML tools do now. One
thing that put me off about Groovy's XML implementation, for example, was
that it was made clear in comments that it was originally implemented by
someone who thought that XML namespaces were a waste of time, and so
didn't need to be supported. While Groovy has some namespace support now,
last time I checked it still didn't work properly, and that was a key
reason for my deciding to try Scala instead.

If we are going to do XML, let's try and do it as broadly as we can,
without too much pigeonholing of XML based on what each of us might
currently use it for. There will be other people who use it differently,
but who might still benefit from being able to use Scala.

Cheers, Tony.

On Sat, 12 Dec 2009 23:13:50 -0000, Meredith Gregory
wrote:

> Regarding XML as transfer syntax -- my point is that there are very very
> few
> cases in which the XML syntax *itself* is the point. The syntax is
> designed
> to *denote* something. There are very few cases where an application is
> interested in the document as document. Usually they are interested in
> document as some external representation of an application artifact
> (e.g. a
> bibliography, an RSS feed, ...). That's what makes it a transfer syntax.
> Supporting the semantic entities the syntax captures is something that
> Scala
> XML ought to support -- over and above the DOM's model of XML syntax.

Meredith Gregory
Joined: 2008-12-17,
User offline. Last seen 42 years 45 weeks ago.
Re: Putting the cart before the horse?
Dear Tony,
Thanks very much for your note. i agree with much of your sentiment, but think we might be missing each other on the principal point -- and that point may be too subtle to capture in email. The key question for me is not allegiance to a specific data model or data modeling technique, but how to use the inherent biases in data modeling technology to best effect. In Scala there is this huge tool -- the type system. It's a major piece of machinery that has many, many more uses than it is currently put to. Likewise, in XML, there is this major piece of toolage -- schema validation. It turns out that the analogy
Type : Schema :: Instance : Document [1]
is not just an analogy but an instance of a deeper principle that informs the design of both systems [2]. Believe me: Allen Brown (XML working group committee member from MSFT), who with Phil Wadler produced the formal model for XML Schema, was a friend and colleague with whom i worked closely for a number of years [3]. i know what considerations were informing the design.
If one works with these biases, instead of against them, which are in turn aligned with much deeper principles, then the way is paved for much greater productivity and use cases that we have not yet discussed. Continued work at the merely syntactic level will serve to limit ScalaXML usability -- which i find sad -- especially since 
  • CDuce/OCamlDuce was much further along than ScalaXML as early as 2005, and it was done by just 1 guy (btw, has anyone in this thread, besides myself, actually looked at these language offerings? Scala's design as a language owes a great deal to OCaml.)
  • There's so much existing open source tooling that is well-vetted, well-exercised and robust that make this job a no-brainer.
Anyway, since i'm not committing code to this effort i'll shut up now and trust the committers to do a fine job.
Best wishes,
--greg
[1] Instance is actually the wrong technical word. Witness or Inhabitant is better, but since that language is not well established in this community i haven't used it. There's nothing here that's actually about OO class-instance notions. It's actually about types and type inhabitants.
[2] If you want to know what that principle is, look up the Curry-Howard isomorphism.
[3] Actually, Allen worked for me on a top secret project, after his Schema work, but that's another story.

On Sun, Dec 13, 2009 at 2:13 AM, Anthony B. Coates (Londata) <abcoates [at] londata [dot] com> wrote:
Greg, I'm not sure I get your point here.  Your comment about XML syntax just denoting something could be applied equally to Scala class models.

Ultimately, a lot of projects involve different physical data models - application data models, message data models, and database data models.  In my experience, the people who specialise in one of these models tend to denigrate the importance of the other models.  However, they are all important, and all deserve to be taken equally seriously.  For example, while XML is sometimes used as little more than a serialisation of an application data model, sometimes the XML is what binds a variety of applications together, and so part of the application data model is simply an implementation of the XML data model.

XML has a wide variety of uses.  I would hope that Scala will be able to do justice to all of those uses, as all the better XML tools do now.  One thing that put me off about Groovy's XML implementation, for example, was that it was made clear in comments that it was originally implemented by someone who thought that XML namespaces were a waste of time, and so didn't need to be supported.  While Groovy has some namespace support now, last time I checked it still didn't work properly, and that was a key reason for my deciding to try Scala instead.

If we are going to do XML, let's try and do it as broadly as we can, without too much pigeonholing of XML based on what each of us might currently use it for.  There will be other people who use it differently, but who might still benefit from being able to use Scala.

Cheers, Tony.

On Sat, 12 Dec 2009 23:13:50 -0000, Meredith Gregory <lgreg [dot] meredith [at] gmail [dot] com> wrote:

Regarding XML as transfer syntax -- my point is that there are very very few
cases in which the XML syntax *itself* is the point. The syntax is designed
to *denote* something. There are very few cases where an application is
interested in the document as document. Usually they are interested in
document as some external representation of an application artifact (e.g. a
bibliography, an RSS feed, ...). That's what makes it a transfer syntax.
Supporting the semantic entities the syntax captures is something that Scala
XML ought to support -- over and above the DOM's model of XML syntax.

Alex Cruise
Joined: 2008-12-17,
User offline. Last seen 2 years 26 weeks ago.
Re: Putting the cart before the horse?

Meredith Gregory wrote:
>
> If one works with these biases, instead of against them, which are in
> turn aligned with much deeper principles, then the way is paved for
> much greater productivity and use cases that we have not yet
> discussed. Continued work at the merely syntactic level will serve to
> limit ScalaXML usability -- which i find sad -- especially since
>
> * CDuce/OCamlDuce was much further along than ScalaXML as early as
> 2005, and it was done by just 1 guy (btw, has anyone in this
> thread, besides myself, actually looked at these language
> offerings? Scala's design as a language owes a great deal to OCaml.)
>
Greg,

I sincerely wish I was able to intuitively grasp your point, and I feel
like we would be well served by giving serious consideration to your
advice, but I lack the experience in reading OCaml and/or the
intelligence to fully understand what I'm looking at. I'm loath to add
to your workload, but maybe if you could distill what you feel to be the
central contributions of OCamlDuce, and produce a quick sketch of what
some analogous Scala library/language features might look like, I know
it would help me (and, I suspect, others) a great deal.

Thanks!

-0xe1a

Meredith Gregory
Joined: 2008-12-17,
User offline. Last seen 42 years 45 weeks ago.
Re: Putting the cart before the horse?
Dear Alex,
Since you asked i will do that. i'm out of time on this for today, but will get you a small sample tomorrow. In the meantime, on the OCamlDuce site i found both an atom/rss reader sample and an aaxl (the xml version of aadl). If you can read Scala and, whenever you see 'let ptn = expr' in OCaml, you think 'val ptn = expr', you ought to be able to transliterate in your head (ok, that might be a slight exaggeration...). Beyond that, reconsider my example
Point.xsd:
<complexType name="Point">
  <sequence>
    <element name="x" type="float"/>
    <element name="y" type="float"/>
  </sequence>
</complexType>
ought to map to
case class Point( x : Float, y : Float )
And then you ought to have the following verifiable property, (sprinkle quantifiers as needed)
{ ( x : Float, y : Float ) =>
    validate( serialize( Point( x, y ) ), "Point.xsd" ) == true
}
// The fact that this function must return true for all x and y in Float
// says that validation coincides with typing
as well as the verifiable property
{ if ( validate( "point.xml", "Point.xsd" ) ) {
    deserialize( "point.xml" ).isInstanceOf[Point] == true
} }
// The fact that whenever the test is true the true branch always returns
// true says typing coincides with validation
This will allow someone to write things like
deserialize( "point.xml" ) match {
  case Point( x, y ) => sqrt( x*x + y*y )
}
And the fact that it compiles will mostly do away with the need for runtime validation. Again, if people want a different schema language, you can achieve this in several ways, including:
  • JAXB supports multiple schema languages, including RelaxNG (i think);
  • Trang -- put out by RelaxNG's author -- supports schema conversion, as does Oxygen.

But, take this example and recast it in the existing framework. Someone has to build by hand the case class and then someone has to build by hand the serialization/deserialization mechanism for Point. That's repetitious and error-prone and no-one gets any support from the type system. Their deserialization mechanism could completely subvert typing, for example, which would be a mistake. If you want to convert to a different type -- say from Point to Vector -- then keeping track of the conversion *with types* is important for many reasons.
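[To make the hand-built alternative concrete, here is a rough sketch of the code someone has to write by hand today. The serialize/deserialize helpers and the crude string scraping are purely illustrative, not an existing API.]

```scala
// Hand-written (de)serialization for Point -- the repetitive, error-prone
// code described above. Names are illustrative, not a real library API.
case class Point(x: Float, y: Float)

def serialize(p: Point): String =
  s"<Point><x>${p.x}</x><y>${p.y}</y></Point>"

// No schema awareness here: a typo in an element name, or a non-numeric
// payload, fails only at runtime -- the type system cannot help.
def deserialize(s: String): Point = {
  val Pat = "<Point><x>(.+)</x><y>(.+)</y></Point>".r
  s match {
    case Pat(x, y) => Point(x.toFloat, y.toFloat)
    case _         => sys.error("not a Point document")
  }
}
```

Nothing ties either function to Point.xsd, so nothing guarantees the round-trip properties above hold; that is exactly the missing support from the type system.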
Best wishes,
--greg

On Sun, Dec 13, 2009 at 1:44 PM, Alex Cruise <alex [at] cluonflux [dot] com> wrote:
Meredith Gregory wrote:
<snip/>
If one works with these biases, instead of against them, which are in turn aligned with much deeper principles, then the way is paved for much greater productivity and use cases that we have not yet discussed. Continued work at the merely syntactic level will serve to limit ScalaXML usability -- which i find sad -- especially since
   * CDuce/OCamlDuce was much further along than ScalaXML as early as
     2005, and it was done by just 1 guy (btw, has anyone in this
     thread, besides myself, actually looked at these language
     offerings? Scala's design as a language owes a great deal to OCaml.)

Greg,

I sincerely wish I was able to intuitively grasp your point, and I feel like we would be well served by giving serious consideration to your advice, but I lack the experience in reading OCaml and/or the intelligence to fully understand what I'm looking at.  I'm loath to add to your workload, but maybe if you could distill what you feel to be the central contributions of OCamlDuce, and produce a quick sketch of what some analogous Scala library/language features might look like, I know it would help me (and, I suspect, others) a great deal.

Thanks!

-0xe1a



--
L.G. Meredith
Managing Partner
Biosimilarity LLC
1219 NW 83rd St
Seattle, WA 98117

+1 206.650.3740

http://biosimilarity.blogspot.com
Mark Howe
Joined: 2009-10-22,
User offline. Last seen 42 years 45 weeks ago.
Re: Putting the cart before the horse?

On Sunday 13 December 2009 at 14:41 -0800, Meredith Gregory wrote:

> Beyond that, reconsider my example

Briefly (because of my ignorance in several pertinent areas)...

Making validated XML types map onto plain Scala types, full stop, would
certainly be very neat, and if we can do it we should do it for the
reasons you state. Implementing at least some validation via Scala type
checking would certainly interest me.

But does this imply any changes to the *general* Scala type model? The
XDM type system isn't up for renegotiation, so presumably any gap would
have to be closed by Scala moving towards XDM.

Obviously it's always possible to build a set of types just for XML, but
doing that will reduce the integration of Scala XML into the language as
a whole, and there seems to be a consensus that we don't want to do
that. When I call other libraries, I want a Scala Int, not an
XMLSchemaInt. Implicit conversions might help, but the mapping between
the two type systems needs to be quite clean for this to work.
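[As a sketch of that last point, a wrapper type plus an implicit conversion might look like the following. XsdInt is a hypothetical name for illustration, not an existing class in any library.]

```scala
import scala.language.implicitConversions

// Hypothetical wrapper for a schema-typed integer (not a real library type).
case class XsdInt(value: Int)

object XsdInt {
  // Placing the conversion in the companion object means call sites need
  // no import: schema-typed values flow into APIs expecting a plain Int.
  implicit def xsdIntToInt(x: XsdInt): Int = x.value
}

// An ordinary library function that knows nothing about XML Schema.
def double(n: Int): Int = n * 2
```

With this in scope, `double(XsdInt(21))` compiles and returns 42; whether such conversions stay "clean" across a whole schema vocabulary is the open question.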

milessabin
Joined: 2008-08-11,
User offline. Last seen 33 weeks 3 days ago.
Re: Putting the cart before the horse?

A few comments on this thread ...

* Greg is quite right to point to the example of CDuce and its
various progeny. The existence of first-class support for XML in
Scala, esp. the interaction with pattern matching, was very much
inspired by the example of CDuce. I think it would make sense to
explore how to improve on this rather than revert to a more Java-like
model.

* There have already been efforts towards integrating XML schema types
with Scala's type system, although there hasn't been much activity in
that space recently,

https://lampsvn.epfl.ch/trac/scala/browser/xmltypes/trunk?order=date&desc=1

* Tony and others are also right to point out that the typed,
data-bound approach to XML is not the only legitimate perspective.
There is also a more dynamically (or even un-)typed document-oriented
point of view with an equally long history. Googling for "xml-dev
bohemians" might give a flavour of the debate and an indication that
neither point of view is likely to prevail (and rightly so). From the
document-oriented point of view, XML is *all* about syntax, the
semantics being ambiguous and open to (re-)interpretation.

It will be interesting to see if we can come up with anything which
keeps everybody happy.

Cheers,

Miles

Anthony B. Coates
Joined: 2009-09-12,
User offline. Last seen 2 years 35 weeks ago.
ScalaDuce? was: Putting the cart before the horse?

I've had a browse through the CDuce and OCamlDuce docs (well, the
tutorials at least). The syntax of both is a bit troubling, Scala's is
definitely neater. Ultimately, an XML document is a hierarchical
(tree-like) data structure. You can represent it in XML syntax, or you
could represent it in some alternative sequence notation. I really don't
like CDuce's "1/2 XML, 1/2 sequence" notation, not for myself.

That aside, it would be interesting to investigate a bit further how
something like OCamlDuce could be done in Scala, "ScalaDuce", but keeping
the syntax cleaner and more Scala-like or XML-like (not
yet-another-something-else-like). That is to say, how can the same kind of
functionality be implemented?

One thing I wasn't clear about, in the CDuce docs, was what the underlying
XML model is. It looked a bit like a DTD-based model, but maybe that's
just the superficial look of it. There are different views of what is in
an XML document, data-wise. For example, XML Schema has one data model
for XML, XSLT has another slightly different one. Neither is wrong, but
you have to decide which way to lean. XML Schema has a type-based
system. RELAX NG takes a different 'patterns of elements and attributes'
approach, not unlike XML Schema if you used groups rather than complex
types. At a high-level, XML Schema and RELAX NG are much of a muchness,
much as some people are passionate about one or the other. They both miss
out a lot of features that people want, like richer validation options.
XML Schema 1.1 will be building in Schematron-style validation, which will
be an improvement, but still not enough for some situations. Also, XML
Schema 1.1 will support order-free formats, which will be a big change
from the fixed-sequence formats that are widely used today with XML
Schema, even when the data is unordered.

This has the makings of a Scala incubator project, except it's too early to
be thinking about code. Is there some kind of wiki-space that can be used
for deliberating about this kind of thing? One day Google Wave might
serve that purpose, but for now a wiki or similar would be good. Is there
something like that as part of the incubator infrastructure, Miles?

Thanks, Cheers, Tony.

On Mon, 14 Dec 2009 12:55:05 -0000, Miles Sabin
wrote:

> * Greg is quite right to point to the example of CDuce and its
> various progeny. The existence of first-class support for XML in
> Scala, esp. the interaction with pattern matching, was very much
> inspired by the example of CDuce. I think it would make sense to
> explore how to improve on this rather than revert to a more Java-like
> model.

Jürgen Purtz
Joined: 2009-12-03,
User offline. Last seen 1 year 44 weeks ago.
Re: Putting the cart before the horse?

Alex Cruise writes:

>
>
> I think it's good to keep all these grandiose requirements in mind, and
> FSM knows my natural inclination is to start there, but I think we
> really need to sharpen our focus very tightly on transforming or
> replacing the basic scala.xml library into/with something that
> substantially meets our axiomatic design goals, namely:
> - Comply fully with minimally applicable standards (e.g. XML syntax,
> namespaces)
> - Sane node object model, e.g. simplify the Node/NodeSeq/Seq[Node]
> relationship
> - Node object model should remain at least publicly immutable
> - XML literal capability should be at least as good as current
> - Node object model must be able to represent, preserve and accurately
> reproduce arbitrary XML Infosets.
> - Good test coverage

I agree - so far.

As I have seen in the discussions of last week, many people reject DOM because
it is mutable. I feel it's important to point out that no one has rejected parent
references for an immutable data model.

If we really want to develop our own *general* Scala XML data model (only the data
model - the API is a second task) as an extension to the existing immutable
classes in scala.xml, it should conform to a lot of standards:

XML 1.0 [1]
XML 1.1 [2]
Namespaces in XML 1.0 [3]
Namespaces in XML 1.1 [4]
XML Information Set [5]
XQuery 1.0 and XPath 2.0 Data Model (XDM) [6] (possibly at a later stage)
All of them require parent references.

I assume that we are able to do the job. But I doubt that we can do much more.
Any one of parsing, XPath or XQuery evaluation, validating, data binding or
XSLT needs too much effort from our community to bring it to a successful end.
Even filling the data model with data is manageable only when it is based on an
external library: parsing an XML source, recognising and assigning all data items
to their dedicated classes is a hard job, e.g. external entities, XML 1.0 vs. 1.1,
namespace handling, XInclude, ... . And: why do the job twice?

What have we won with our own data model? The smart XML integration into Scala
will rise to a higher level. OK. A lot of application developers will be pleased
by the smart integration. But the developers' requirements will grow as well. In
the end, complex XML operations will be based on external libraries (as today) -
even when we have done a good job.

Therefore, I think, we have to focus on two activities:
- defining the degree of standard compliance of Scala's XML data model
- defining a DSL for the integration of external libraries and their APIs (and
let them use their own data model)

> ... does anything here not belong on this list? Have I missed anything?
> And, I think most would agree that v.next should place a high priority
> on meeting some other important requirements, to the extent that they
> don't conflict with the axiomatic design goals:
> - Retain source compatibility, to the extent feasible (e.g. with
> implicits)
> - Pattern matching should work better, especially w.r.t. namespaces.
> - Fix or design around as many known, serious bugs as possible
> ... any others?
> Here's a list of aspects that I think are definitely deserving of
> effort, but that I would suggest be kept as *non-goals* in the short
> term, although we want to keep them in mind so as not to design
> ourselves into a corner:
> - Validation (DTD, XML Schema, Relax-NG)
> - Transformation
> - Full XPath support
> - XML<->POSO binding
> - Schema/Scala type integration (sorry Greg! :)

I agree.

> I submit that once we've achieved sanity and minimal spec compliance in
> the absolute basics of object model, literals, parsing, serialization,
> etc., we'll have a good platform from which to attack all the more
> interesting use cases that we all want to see implemented, and those
> efforts will stand a much better chance of success.

> TIA for your thoughts, flames, and Math.signum return values. :)
> -0xe1a
>

[1] http://www.w3.org/TR/xml/
[2] http://www.w3.org/TR/xml11/
[3] http://www.w3.org/TR/xml-names/
[4] http://www.w3.org/TR/xml-names11/
[5] http://www.w3.org/TR/xml-infoset/
[6] http://www.w3.org/TR/xpath-datamodel/

Cheers, Jürgen

milessabin
Joined: 2008-08-11,
User offline. Last seen 33 weeks 3 days ago.
Re: ScalaDuce? was: Putting the cart before the horse?

On Mon, Dec 14, 2009 at 4:54 PM, Anthony B. Coates (Londata)
wrote:
> This has the makings of a Scala incubator project, except its too early to
> be thinking about code.  Is there some kind of wiki-space that can be used
> for deliberating about this kind of thing?  One day Google Wave might serve
> that purpose, but for now a wiki or similar would be good.  Is there
> something like that as part of the incubator infrastructure, Miles?

There are google group "Pages"

http://groups.google.com/group/scala-incubator/web

and the github wiki (we'd need to create a project first, which is fine).

Cheers,

Miles

Mark Howe
Joined: 2009-10-22,
User offline. Last seen 42 years 45 weeks ago.
Re: Re: Putting the cart before the horse?

On Monday 14 December 2009 at 17:31 +0000, Jürgen Purtz wrote:

> As I have seen in the discussions of last week, many people reject DOM because
> it is mutable. I feel it's important to point out that no one has rejected parent
> references for an immutable data model.

That's true, but the definition of "immutable" might need unpacking a
bit.

Functional languages have immutable trees, ie you can't* hack the tree
pointers manually. But you can build new trees using components of old
trees. For any XML data representation to be efficient for the sort of
operation that many tasks require (eg a rapid "parent" operation), ISTM
that we need an "up" pointer. And to build a new tree using bits of an
old tree in a representation that includes an "up" pointer, I think you
have to be able to change the value of an "up" pointer in at least one
place.

So, while the API may not expose mutability, ISTM that the underlying
structures must be mutable unless we are willing to take a serious
performance hit. Without some sort of "make a new node that links to
some existing nodes" functionality, don't we basically end up cloning
the entire document tree just to change one pointer? Even if the tree
contains gigabytes of data? If so, that sounds to me like an
instantiation of the reactionary Java programmer's Scala straw man, ie
that immutability is neat in theory but massively inefficient in
practice.
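For what it's worth, a well-known functional answer to this dilemma is Huet's zipper: instead of storing an "up" pointer in every node, the traversal itself carries the path back to the root, so moving up is cheap and an edit rebuilds only the spine, not the whole tree. A minimal sketch, with purely illustrative names (not a proposed scala.xml API):

```scala
// An immutable tree with zipper navigation: O(1) "up" without mutable
// parent pointers stored in the nodes themselves.
sealed trait Tree
case class Leaf(label: String) extends Tree
case class Branch(label: String, children: List[Tree]) extends Tree

// A crumb records everything needed to rebuild the parent on the way up:
// the parent's label and the siblings to the left and right of the focus.
case class Crumb(label: String, left: List[Tree], right: List[Tree])

case class Loc(focus: Tree, crumbs: List[Crumb]) {
  def down(i: Int): Loc = focus match {
    case Branch(l, cs) =>
      val (left, rest) = cs.splitAt(i)
      Loc(rest.head, Crumb(l, left, rest.tail) :: crumbs)
    case _ => sys.error("cannot descend into a leaf")
  }
  def up: Loc = crumbs match {
    case Crumb(l, left, right) :: rest =>
      Loc(Branch(l, left ::: (focus :: right)), rest)
    case Nil => this
  }
  // Replacing the focus shares every untouched subtree; rebuilding on the
  // way up copies only the path to the root, not the whole document.
  def set(t: Tree): Loc = copy(focus = t)
  def root: Tree = if (crumbs.isEmpty) focus else up.root
}
```

Whether this is fast enough for gigabyte documents is exactly the open question, but it shows that an efficient parent operation does not by itself force mutable internals.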

> If we really want to develop our own *general* Scala XML data model (only data
> model - the API is a second task) as an extention to the existing immutual
> classes in scala.xml, it should conform to a lot of standards:
>
> XML 1.0 [1]
> XML 1.1 [2]
> Namespaces in XML 1.0 [3]
> Namespaces in XML 1.1 [4]
> XML Information Set [5]
> XQuery 1.0 and XPath 2.0 Data Model (XDM) [6] (possibly at a later stage)
> All of them require parent references.
>
> I assume that we are able to do the job. But I doubt that we can do much more.

If we want to keep the core functionality small, I think the first five
items on your list would do fine. They provide the basis for anyone to
experiment with the higher-level technologies, and if those experiments
go somewhere useful they can be grafted into the core technology because
they are compatible with the core technology.

> - defining the degree of standard compliance of Scalas XML data model
> - defining a DSL for the integration of external libraries and their APIs (and
> let them use their own data model)

I'd be inclined to say that we need 100% standards compliance *for the
core functionality*, which is another argument for keeping the core
functionality as small as possible. So, for example, Scala doesn't need
to provide XDM, but I think it does need to provide a data model from
which XDM can be constructed (ie Infosets). Offhand, I can't think what
100% standards compliance up to the Infoset level would rule out.

Multiple core XML data models seem to me to be something that should be
avoided at all cost. That approach would pretty much rule out the
possibility of code reuse between projects (ie where a neat trick in my
XDM parser finds application in your Schematron validator, or vice
versa).

It also raises the spectre of future Scala books following the Java and
Perl model of offering half a dozen incompatible ways of processing XML,
none of which necessarily covers all the functionality needed for a
particular project. I say this with feeling having at one point had 3
different low-level XML data models in one Perl project.

It would be much better if basic XML representation was common, and for
the higher-level APIs to all assume that basic representation as their
starting point. So, for example, my third-party XDM representation might
not be directly portable to every other project, but it should at least
be able to produce output in generic Scala infoset format, so that I
don't have to serialise it and re-parse it in order to use someone
else's third-party technology, and so that the input and output can be
represented using general-purpose first-class XML Scala objects.

* Lisp has rplaca and rplacd operators which give direct access to tree
pointers. It is considered very bad form to use them in application
code. But their presence in the language allows efficient implementation
of immutable APIs in Lisp itself, rather than in the language used to
write the Lisp interpreter. Lisp machines simply could not have worked
without those operators - most of the interesting stuff would have had
to be written in assembler or C instead of in Lisp.

Anthony B. Coates
Joined: 2009-09-12,
User offline. Last seen 2 years 35 weeks ago.
Re: ScalaDuce? was: Putting the cart before the horse?

Thanks, Miles! A github project wiki sounds good to me. Anyone think otherwise? If there are no objections, can you set this up for us, please, or otherwise tell me how to set it up?

Thanks, Cheers, Tony.
--
(sent from my mobile phone)
Anthony B. Coates
Director and CTO
Londata Ltd
abcoates [at] londata [dot] com
UK: +44 (20) 8816 7700, US: +1 (239) 344 7700
Mobile/Cell: +44 (79) 0543 9026
Skype: abcoates
Data standards participant: genericode, ISO 20022 (ISO 15022 XML), 
UN/CEFACT, MDDL, FpML, UBL.
http://www.londata.com/
----- Original message -----
> On Mon, Dec 14, 2009 at 4:54 PM, Anthony B. Coates (Londata)
> <abcoates [at] londata [dot] com> wrote:
> > This has the makings of a Scala incubator project, except its too early to
> > be thinking about code.  Is there some kind of wiki-space that can be used
> > for deliberating about this kind of thing?  One day Google Wave might serve
> > that purpose, but for now a wiki or similar would be good.  Is there
> > something like that as part of the incubator infrastructure, Miles?
>
> There are google group "Pages"
>
>    http://groups.google.com/group/scala-incubator/web
>
> and the github wiki (we'd need to create a project first, which is fine).
>
> Cheers,
>
>
> Miles
>
> --
> Miles Sabin
> tel: +44 (0)7813 944 528
> skype:  milessabin
> http://www.chuusai.com/
> http://twitter.com/milessabin

milessabin
Joined: 2008-08-11,
User offline. Last seen 33 weeks 3 days ago.
Re: ScalaDuce? was: Putting the cart before the horse?

On Tue, Dec 15, 2009 at 8:35 AM, Anthony B. Coates wrote:
> Thanks, Miles! A github project wiki sounds good to me. Anyone think
> otherwise? If there are no objections, can you set this up for us, please,
> or otherwise tell me how to set it up?

I've created a scala-xml repository here,

http://github.com/scala-incubator/scala-xml

If you send me your github id I'll add you as a collaborator.

Cheers,

Miles

Meredith Gregory
Joined: 2008-12-17,
User offline. Last seen 42 years 45 weeks ago.
Re: ScalaDuce? was: Putting the cart before the horse?
Dear SX'ers, Miles,

i'm glad to see this effort happening. The trickiest part of this -- imho -- is crafting a solution that integrates smoothly with and allows for a Scala-LINQ-alike that has XQuery as the target instead of SQL. Of course, it may turn out that with direct XML support XQuery adds no value, but that seems unlikely in the short term. XQuery engines like BDBXML have a pretty long head start and are pretty performant.

Frankly, i would prioritize this over supporting the new XPath stuff. i'm not at all convinced that interweaving XPath the way it's been done is such a good idea. However, i'm always happy to be educated.

Best wishes,

--greg


--
L.G. Meredith
Managing Partner
Biosimilarity LLC
1219 NW 83rd St
Seattle, WA 98117

+1 206.650.3740

http://biosimilarity.blogspot.com
Anthony B. Coates
Joined: 2009-09-12,
User offline. Last seen 2 years 35 weeks ago.
Re: ScalaDuce? was: Putting the cart before the horse?

Greg, I'm not really sure why you are making a distinction between XQuery
and XPath. Most of XQuery 1.0 and XPath 2.0 are the same; they differ
only around the edges.

Cheers, Tony.


Stefan Zeiger
Joined: 2008-12-21,
User offline. Last seen 27 weeks 3 days ago.
Re: Re: Putting the cart before the horse?

Mark Howe wrote:
> that we need an "up" pointer. And to build a new tree using bits of an
> old tree in a representation that includes an "up" pointer, I think you
> have to be able to change the value of an "up" pointer in at least one
> place.
>
If you keep a reference to a parent in a node, you cannot share
sub-trees. Relocating a sub-tree only works for the limited case of
attaching a previously detached node at some other place. But you cannot
detach a node if it is (or appears to be) immutable! In most cases you
still have to copy the sub-tree. You could copy it only when needed
(attaching it if it is currently detached) but that would change its
parent reference so it does not appear immutable anymore. It would also
be a leaky abstraction because the cost of attaching a node depends on
whether it is already attached somewhere else.

I still think the basic XML model should use persistent trees (as it
does now). You get immutability, cheap navigation and full sub-tree
sharing. A zipper over the tree structure could provide (somewhat more
expensive) navigation along all axes of the XPath model and a convenient
way of creating "modified" trees (still with full sub-tree sharing).
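To make the zipper idea concrete, here is a minimal sketch of a persistent XML-like tree with a zipper for navigation and "modification" with full sub-tree sharing. All the names here (XNode, XElem, XZipper, Ctx) are hypothetical illustrations, not the scala.xml API:

```scala
sealed trait XNode
final case class XText(text: String) extends XNode
final case class XElem(label: String, children: List[XNode]) extends XNode

// A zipper context: the parent's label plus the siblings to the left
// (stored reversed) and right of the current focus.
final case class Ctx(label: String, left: List[XNode], right: List[XNode])

final case class XZipper(focus: XNode, path: List[Ctx]) {
  // Move to the first child, remembering how to rebuild the parent.
  def down: Option[XZipper] = focus match {
    case XElem(l, c :: cs) => Some(XZipper(c, Ctx(l, Nil, cs) :: path))
    case _                 => None
  }
  // Rebuild the parent element from the context; untouched siblings
  // are reused by reference, so only the spine is reallocated.
  def up: Option[XZipper] = path match {
    case Ctx(l, ls, rs) :: ps =>
      Some(XZipper(XElem(l, ls.reverse ::: (focus :: rs)), ps))
    case Nil => None
  }
  // Move to the next sibling, if any.
  def right: Option[XZipper] = path match {
    case Ctx(l, ls, r :: rs) :: ps =>
      Some(XZipper(r, Ctx(l, focus :: ls, rs) :: ps))
    case _ => None
  }
  // "Modify" the focused node; the original tree is untouched.
  def set(n: XNode): XZipper = copy(focus = n)
  // Zip all the way back up to get the (new) root.
  def root: XNode = up.map(_.root).getOrElse(focus)
}

val doc     = XElem("a", List(XElem("b", List(XText("old"))), XElem("c", Nil)))
val z       = XZipper(doc, Nil)
val updated = z.down.get.set(XElem("b", List(XText("new")))).root
// `doc` is unchanged, and the <c/> sub-tree is shared between both trees.
```

Note that the zipper gives you parent/sibling navigation without storing an "up" pointer in the nodes themselves, which is exactly what keeps the nodes shareable.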

A mutable XML model (similar to DOM) should be separate from this, with
common functionality abstracted through traits, as in Scala's mutable
and immutable collection libraries.
> Multiple core XML data models seem to me to be something that should be
> avoided at all cost. That approach would pretty much rule out the
> possibility of code reuse between projects (ie where a neat trick in my
> XDM parser finds application in your Schematron validator, or vice
> versa).
>
The same can be said of the collection libraries. You need to design
your code carefully if you want it to work with both mutable and
immutable collections.

Best regards,
Stefan Zeiger

Mark Howe
Joined: 2009-10-22,
User offline. Last seen 42 years 45 weeks ago.
Re: Re: Putting the cart before the horse?

Thanks, you make some important points that I had overlooked.

On Tuesday 15 December 2009 at 22:28 +0100, Stefan Zeiger wrote:

> If you keep a reference to a parent in a node, you cannot share
> sub-trees.

Not directly, but can't you clone the node and give it the same children
etc. as the old node, but a different parent? That way the old node is
unchanged, old structures that rely on it work as before, but the new
structure gets the old branch.

The trickier case seems to be preserving the structure of the tree near
the root that *hasn't* changed without resorting to a deep copy. But if
you have a quick way of moving up and down trees, it should be possible
to link to a small number of large, unchanged branches, and to minimize
the number of new nodes required.

I think the above qualifies as immutable, but it could be presented as
"move/delete/add branch" functionality.

Mark Howe
Joined: 2009-10-22,
User offline. Last seen 42 years 45 weeks ago.
Re: Re: Putting the cart before the horse?

Sorry - after another two minutes of thought the answer to

On Tuesday 15 December 2009 at 22:45 +0100, Mark Howe wrote:
> Not directly, but can't you clone the node and give it the same children
> etc as the old node but a different parent?

is obviously "no" (because a tree structure is recursive, so you just
push the multiple-parent problem down a level).

So I guess immutable it is (and that's not a problem for my own area of
interest, because XSLT doesn't modify the XML source document).
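For reference, the path copying that makes immutable trees practical without parent pointers can be sketched as follows. Replacing one deep node rebuilds only the branches on the path from the root to that node; every other sub-tree is shared by reference. The names (Tree, Branch, Leaf, `updated`) are hypothetical, not an existing API:

```scala
sealed trait Tree
final case class Leaf(text: String) extends Tree
final case class Branch(label: String, kids: Vector[Tree]) extends Tree

// Replace the node reached by `path` (child indexes from the root) with `n`.
// Only the Branch nodes along the path are reallocated; sibling sub-trees
// are carried over unchanged into the new tree.
def updated(t: Tree, path: List[Int], n: Tree): Tree = (t, path) match {
  case (_, Nil) => n
  case (Branch(l, ks), i :: rest) =>
    Branch(l, ks.updated(i, updated(ks(i), rest, n)))
  case (Leaf(_), _) =>
    sys.error("path descends through a leaf")
}

val big  = Branch("big", Vector(Leaf("lots"), Leaf("of"), Leaf("stuff")))
val doc  = Branch("root", Vector(Branch("a", Vector(Leaf("x"), Leaf("y"))), big))
val doc2 = updated(doc, List(0, 1), Leaf("z"))
// `doc` is unchanged, and the untouched "big" branch is shared by reference.
```

The cost of an update is proportional to the depth of the change, not the size of the document, which is what makes the "move/delete/add branch" presentation cheap on an immutable model.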

Meredith Gregory
Joined: 2008-12-17,
User offline. Last seen 42 years 45 weeks ago.
Re: ScalaDuce? was: Putting the cart before the horse?
Dear Anthony,

The interweaving of XPath and XQuery strikes me as possibly problematic. i'm happy to be disabused of this notion -- i've devised much more highly recursive interdependencies with excellent motivation and sound semantic grounding -- but it seems like there might be some semantic issues.

Best wishes,

--greg


Anthony B. Coates
Joined: 2009-09-12,
User offline. Last seen 2 years 35 weeks ago.
Re: ScalaDuce? was: Putting the cart before the horse?

It's not really a matter of interweaving. XPath & XQuery are handled by the same W3C working group, & XPath 2.0 is a (large) subset of XQuery 1.0.

Cheers, Tony.
--
(sent from my mobile phone)
Anthony B. Coates
Director and CTO
Londata Ltd
abcoates [at] londata [dot] com
UK: +44 (20) 8816 7700, US: +1 (239) 344 7700
Mobile/Cell: +44 (79) 0543 9026
Skype: abcoates
Data standards participant: genericode, ISO 20022 (ISO 15022 XML), 
UN/CEFACT, MDDL, FpML, UBL.
http://www.londata.com/

Copyright © 2012 École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland