
XML design, part 2: Parsing, Internal Data Model

Jürgen Purtz

The current (2.7.7) object hierarchy in package scala.xml is:

Seq[Node]
 NodeSeq (abstract)
   Document
   Node (abstract)
     Elem
     Group
     SpecialNode (abstract)
       Atom
         PCData
         Text
         Unparsed
       Comment
       EntityRef
       ProcInstr

There are only references from 'parent' to 'child'.

The DOM interfaces defined by org.w3c.dom are listed below, each with the node types it may have as children:

* All interfaces extend 'Node'. From 'Node' they inherit a reference to their
parent node.
* Document -- Element (maximum of one), ProcessingInstruction, Comment,
DocumentType (maximum of one)
* DocumentFragment -- Element, ProcessingInstruction, Comment, Text,
CDATASection, EntityReference
* DocumentType -- no children
* EntityReference -- Element, ProcessingInstruction, Comment, Text,
CDATASection, EntityReference
* Element -- Element, Text, Comment, ProcessingInstruction, CDATASection,
EntityReference
* Attr -- Text, EntityReference
* ProcessingInstruction -- no children
* Comment -- no children
* Text -- no children
* CDATASection -- no children
* Entity -- Element, ProcessingInstruction, Comment, Text, CDATASection,
EntityReference
* Notation -- no children

You can find more details at: http://www.w3.org/TR/DOM-Level-2-Core/core.html

My personal assessment is that we must have a DOM-compatible interface:
1. DOM is a standard.
2. DOM covers all XML aspects.
3. DOM offers references in both directions, which is necessary for XPath and
all standards based on XPath.
4. DOM can serve as a stable basis for all subsequent activities.
5. DOM is defined in a modular way.

I know that implementing the complete DOM model takes a great deal of effort. It may
overburden our community. Fortunately, help is very close: the DOM interface and
a stable implementation contributed by Xerces are part of the JRE. Maybe we need
a small layer to add Scala-specific things, that's all. We should not change our
current data model. Keep it 'as is' - maybe some day we can deprecate it. Let us
focus our activities on a wrapper layer over DOM.
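
To make the proposal concrete, here is a minimal sketch of what the JRE already provides: parsing with the built-in DOM implementation and walking from a child back to its parent. The class and method names (DomParentDemo, parse, parentNameOf) are made up for this illustration; only the javax.xml.parsers and org.w3c.dom calls are real JRE APIs. It is not a proposal for the wrapper layer itself.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

final class DomParentDemo {
    /** Parses XML text with the DOM parser that ships with the JRE. */
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    /** Returns the name of the parent of the first element called tagName. */
    static String parentNameOf(Document doc, String tagName) {
        Node child = doc.getElementsByTagName(tagName).item(0);
        // Every DOM node carries a reference to its parent.
        return child.getParentNode().getNodeName();
    }

    public static void main(String[] args) {
        Document doc = parse("<library><book><title>Scala</title></book></library>");
        System.out.println(parentNameOf(doc, "title")); // prints "book"
        System.out.println(parentNameOf(doc, "book"));  // prints "library"
    }
}
```

A Scala wrapper would presumably hide the Node/NodeList types behind Scala collections, but the upward navigation shown here is the part scala.xml currently lacks.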

The same holds true for parsing in Scala (tree-based: DOM, push-event-based: SAX,
pull-event-based: StAX). Currently most of the job is done by trait
MarkupParser in package scala.xml.parsing. Why is that? Why not use the facilities in
the JRE? Again, there is an interface and an implementation by Xerces. BTW: routine
scala.xml.XML.loadFile already uses Xerces.
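
For readers unfamiliar with the two event-based styles, the following sketch counts elements once with SAX (the parser pushes events into a handler) and once with StAX (the caller pulls events from a reader). EventParsingDemo and its method names are invented for the example; the parser classes are the standard JRE ones.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

final class EventParsingDemo {
    /** Push style: the SAX parser calls startElement for every element it sees. */
    static int countElementsWithSax(String xml) {
        final int[] count = {0};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                count[0]++;
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
        return count[0];
    }

    /** Pull style: the caller asks the StAX reader for the next event. */
    static int countElementsWithStax(String xml) {
        int count = 0;
        try {
            XMLStreamReader reader =
                    XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    count++;
                }
            }
            reader.close();
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
        return count;
    }

    public static void main(String[] args) {
        String xml = "<a><b/><b/></a>";
        System.out.println(countElementsWithSax(xml));  // prints 3
        System.out.println(countElementsWithStax(xml)); // prints 3
    }
}
```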

The only problem I see is Scala's upcoming .NET version. But there are DOM
implementations and XML parsers on that platform too - hopefully the interfaces are not very far from
each other.

Cheers, Jürgen

David Pollak
Re: XML design, part 2: Parsing, Internal Data Model


Actually, there is a huge difference between Scala's XML representation and the W3C DOM: mutability.

The reason that you don't have references to the parent nodes in Scala's XML representation is that it would be impossible to do and still keep the XML nodes immutable.

Immutability is a huge boon for performance (we use and rely on it extensively in Lift).  It means that you don't have to make defensive copies of the XML.
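
A small sketch of why the missing parent reference matters for sharing, written in Java for comparison with the DOM discussion. Elem here is a stand-in class invented for this example, not scala.xml.Elem. Because a node does not know its parent, the same subtree can appear under several parents without any copying, and "modifying" a node produces a new node while the old one stays valid.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class ImmutableTreeDemo {
    /** An immutable element: a name and child elements, but no parent reference. */
    static final class Elem {
        final String name;
        final List<Elem> children;

        Elem(String name, Elem... children) {
            this.name = name;
            // Copy once at construction; afterwards nothing can change the list.
            this.children =
                    Collections.unmodifiableList(new ArrayList<>(Arrays.asList(children)));
        }

        /** Returns a new element with one extra child; the receiver is left untouched. */
        Elem withChild(Elem child) {
            Elem[] next = children.toArray(new Elem[children.size() + 1]);
            next[children.size()] = child;
            return new Elem(name, next);
        }
    }

    public static void main(String[] args) {
        Elem footer = new Elem("footer", new Elem("copyright"));
        // The same footer instance is used in two pages; no defensive copy is needed.
        Elem pageA = new Elem("page", new Elem("bodyA"), footer);
        Elem pageB = new Elem("page", new Elem("bodyB"), footer);
        System.out.println(pageA.children.get(1) == pageB.children.get(1)); // prints "true"

        Elem bigger = pageA.withChild(new Elem("sidebar"));
        System.out.println(pageA.children.size() + " " + bigger.children.size()); // prints "2 3"
    }
}
```

With a DOM-style parent pointer, the shared footer would have to be cloned for the second page, since a node can have only one parent.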

Compare this with the gymnastics that are necessary to create DOM nodes in the browser and attach them once (and only once) to a particular parent.  This has led to a tremendous (and disproportionate) number of bugs in Lift's Scala XML -> Browser JavaScript support.

So, after using the W3C DOM model for 10+ years and using Scala's immutable XML model for 3 years, I can say pretty safely that the immutable model leads to more performant code and fewer defects.  If you want to use the W3C's model, use the Java stuff, but please do not force this model on Scala's XML model.
 


--
Lift, the simply functional web framework http://liftweb.net
Beginning Scala http://www.apress.com/book/view/1430219890
Follow me: http://twitter.com/dpp
Surf the harmonics
Alex Cruise
Re: XML design, part 2: Parsing, Internal Data Model

On 12/10/2009 9:56 AM, David Pollak wrote:
> So, after using the W3C DOM model for 10+ years and using Scala's
> immutable XML model for 3 years, I can say pretty safely that the
> immutable model leads to more performant code and fewer defects. If
> you want to use the W3C's model, use the Java stuff, but please do not
> force this model on Scala's XML model.
FWIW I agree that scala.xml's public immutability is and should remain
an axiomatic design goal.

It may be worthwhile to consider transforming or exposing
temporarily-mutable versions of scala.xml trees for various reasons, but
if so, the library and/or documentation should be written to prevent the
mutable versions escaping into typical user code, to the extent feasible.

-0xe1a

Jorge Ortiz
Re: XML design, part 2: Parsing, Internal Data Model
It is possible to create an immutable XML representation in which a child can reference its parent by using the Zipper data structure. However, this would most likely have an impact on the API and on the performance (for certain use cases) of the XML libraries.
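
To illustrate the idea, here is a simplified zipper-style cursor, again in Java. It is a sketch under assumptions, not Huet's exact formulation and not an existing scala.xml API: the names Tree, Loc, down, up, replace and root are invented here. The tree itself stays parent-free and immutable; the "way back up" lives in the cursor (Loc), which remembers the location it came from.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class ZipperDemo {
    /** Immutable tree node without any parent reference. */
    static final class Tree {
        final String label;
        final List<Tree> children;

        Tree(String label, List<Tree> children) {
            this.label = label;
            this.children = Collections.unmodifiableList(new ArrayList<>(children));
        }
    }

    static Tree tree(String label, Tree... children) {
        return new Tree(label, Arrays.asList(children));
    }

    /** A location: the focused subtree plus the path that leads back to the root. */
    static final class Loc {
        final Tree focus;
        final Loc above;   // location of the parent, null at the root
        final int index;   // position of focus among the parent's children

        Loc(Tree focus, Loc above, int index) {
            this.focus = focus;
            this.above = above;
            this.index = index;
        }

        static Loc atRoot(Tree root) { return new Loc(root, null, -1); }

        Loc down(int i) { return new Loc(focus.children.get(i), this, i); }

        /** Swaps the focused subtree; the original tree is not touched. */
        Loc replace(Tree t) { return new Loc(t, above, index); }

        /** Moves to the parent, rebuilding it only if the focus was replaced. */
        Loc up() {
            if (above == null) throw new IllegalStateException("already at the root");
            Tree parent = above.focus;
            if (parent.children.get(index) == focus) return above;
            List<Tree> kids = new ArrayList<>(parent.children);
            kids.set(index, focus);
            return above.replace(new Tree(parent.label, kids));
        }

        Tree root() {
            Loc loc = this;
            while (loc.above != null) loc = loc.up();
            return loc.focus;
        }
    }

    public static void main(String[] args) {
        Tree original = tree("a", tree("b", tree("c")), tree("d"));
        Loc atC = Loc.atRoot(original).down(0).down(0);
        System.out.println(atC.up().focus.label);                               // prints "b"
        Tree edited = atC.replace(tree("x")).root();
        System.out.println(edited.children.get(0).children.get(0).label);       // prints "x"
        System.out.println(original.children.get(0).children.get(0).label);     // prints "c"
        System.out.println(edited.children.get(1) == original.children.get(1)); // prints "true"
    }
}
```

The API impact mentioned above is visible here: callers navigate with Loc values rather than with the nodes themselves, and every edit rebuilds the nodes on the path to the root.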

--j


Mark Howe
Re: XML design, part 2: Parsing, Internal Data Model

On Thursday, 10 December 2009 at 09:56 -0800, David Pollak wrote:

> The reason that you don't have references to the parent nodes in
> Scala's XML representation is that it would be impossible to do and
> still keep the XML nodes immutable.

Immutable Good. (I think I learned that from your excellent book :-))

But just because Java requires application-level mutability to implement
a parent relationship, I'm not sure it follows that this is the only way
to implement a parent relationship. I'm sure I was modifying parent
relationships using immutable Lisp structures before I knew what a
parent relationship was.

Section 5 of the W3C XDM standard says, of the list of accessors that
includes "parent":

"These are not functions in the literal sense; they are not available
for users or applications to call directly. Rather they are descriptions
of the information that an implementation of the data model must expose
to the application."

In other words, XDM compliance only requires that the parent
relationship be exposed by some mechanism or other, not that it be a
mutable value. Implementing it efficiently probably does require some
sort of mutable stuff under the hood, eg two-way pointers, but there's
no reason why that mutable stuff has to be available for explicit
manipulation and misuse by the application programmer. My understanding
is that at least some of the immutable functionality of Scala already
turns into mutable code under the hood.

> Compare this with the gymnastics that are necessary to create DOM
> nodes in the browser and attach them once (and only once) to a
> particular parent. This has led to a tremendous (and
> disproportionate) number of bugs in Lift's Scala XML -> Browser
> JavaScript support.

I believe you. But it sounds like you were trying to build W3C-like
structures from scratch using Javascript, so I'm impressed you achieved
anything other than bugs! The suggestion elsewhere of putting a
Scala-like front end on, say, Xerces-J surely wouldn't suffer from the
difficulty you describe because all the messy stuff is either handled by
Xerces-J or would be handled by the Scala-like front end.

> If you want to use the W3C's model, use the Java stuff, but please do
> not force this model on Scala's XML model.

I'd consider living with almost any Scala XML model if someone could
just confirm that it can be trusted to handle namespaces correctly. Does

"will not properly stratify namespace bindings"

mean that Scala doesn't fully implement XML namespaces? If so, this is
surely something that needs fixing. It's hard to make first-class XML
handling a selling point for Scala if the first-class XML handling
doesn't work for some XML applications. It's not great if someone who
starts developing using Scala XML handling gets 95% of the way through a
project and then discovers that the last 5% is impossible, only to be
told "If you want to use the W3C's model, use the Java stuff". That
would sound like Scala required more refactoring than Java, which is
definitely off-message...

FWIW, I'm not a great fan of DOM APIs either, especially as no two
implementations seem to work quite the same way and most implementations
seem to be non-strict supersets of the "standard". I'm more interested
in XDM, because without XDM it's going to be very hard to implement W3C
specs that build on XDM, notably XPath 2.0 and XQuery. For those
applications of XDM I don't think you need *mutable* parents. You just
need a quick way of finding the parent you found when you parsed the
original XML document.
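
One way to picture "a quick way of finding the parent" without mutable nodes is a read-only index built once after parsing. The sketch below is only an illustration under assumptions: Elem and indexParents are invented names, and the index is keyed on object identity, so it is only meaningful when no subtree instance occurs twice in the same document.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

final class ParentIndexDemo {
    /** Immutable element without a parent reference. */
    static final class Elem {
        final String name;
        final List<Elem> children;

        Elem(String name, Elem... children) {
            this.name = name;
            this.children =
                    Collections.unmodifiableList(new ArrayList<>(Arrays.asList(children)));
        }
    }

    /** Walks the tree once and records, for every child, which element contains it. */
    static Map<Elem, Elem> indexParents(Elem root) {
        Map<Elem, Elem> parents = new IdentityHashMap<>();
        Deque<Elem> todo = new ArrayDeque<>();
        todo.push(root);
        while (!todo.isEmpty()) {
            Elem current = todo.pop();
            for (Elem child : current.children) {
                parents.put(child, current);
                todo.push(child);
            }
        }
        // Callers get a read-only view; the mutation stays "under the hood".
        return Collections.unmodifiableMap(parents);
    }

    public static void main(String[] args) {
        Elem c = new Elem("c");
        Elem b = new Elem("b", c);
        Elem root = new Elem("a", b, new Elem("d"));
        Map<Elem, Elem> parents = indexParents(root);
        System.out.println(parents.get(c).name);  // prints "b"
        System.out.println(parents.get(root));    // prints "null"
    }
}
```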

Stefan Zeiger
Re: XML design, part 2: Parsing, Internal Data Model

Jürgen Purtz wrote:
> There are only references from 'parent' to 'child'
>

And that's a big advantage in many cases. I wish there was a good
immutable XML API for Java. That could make some XML manipulation
scenarios a lot simpler (with better performance and lower memory
requirements, too).
> My personal estimation is that we must have a DOM compatible interface:
> 1. DOM is a standard
> 2. DOM covers all XML aspects
> 3. DOM offers references in both directions, which is necessary for XPath and
> all standards based on XPath
> 4. DOM can serve as a stable basis for all subsequent activities.
> 5. DOM is modular defined.
>

I think 3 and 4 contradict each other. A mutable model (or at least one
with references going upwards, so that subtrees cannot be shared) does
not make a good foundation to build upon.

I'd also like to add a #6: Java already provides this. So do several
3rd-party libraries (which exist because nobody wants to use the ugly
Java DOM API).

A library like JDOM or XOM could probably be "pimped" for Scala with
little effort (or even ported to Scala) for those times when you really
want a DOM. What you cannot currently do with a non-standard library is
to use XML literals for constructors and patterns. It would be great if
the Scala compiler could be instructed to use a different object model.
In the simplest case this could be accomplished with an annotation that
tells the compiler to prefix all constructor and extractor calls
generated from XML syntax within the annotation's scope with a package
other than scala.xml. Then a mutable DOM-like library could be a first
class citizen in Scala, whether it's part of the standard library or a
3rd-party project.
> a small layer to add Scala specific things, that's all. We should not change our
> actual data model. Keep it 'as is' - maybe some day we can deprecate it. Let us
> focus our activities to a wrapper layer over DOM.
>

I think the model should be fixed sooner rather than later.

-sz

Jürgen Purtz
Data Model
Hi,

I plan to build an XML data model which differs from Scala's current data model in the following respects:
  • It should comply with the specifications in [XML], [NS] and [INFOSET].
  • It should be extensible to hold information specified in [XDM].
  • It should be ONLY a data model: there will be no parsing class. Parsing will be done by an existing SAX parser.
  • Like Scala's existing data model, it will be immutable. (But nodes will contain an 'up reference' to their parent node.)
  • Additionally there should be a package containing some 'syntactic sugar' for typical XML actions, e.g. conversion to and from other data models, XPath/XSLT/XQuery evaluation, and so on. In essence the package is nothing more than a thin wrapper layer over existing JARs.
  • [XDM] specifies a smaller number of node types than [INFOSET]. Nevertheless the intended data model should contain all node types - with type information only for those node types covered by [XDM].
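
How an immutable node can still carry an up reference is perhaps easiest to see in code. The following is one possible approach, sketched in Java and not taken from the case study described here: SAX events first fill a throwaway mutable Draft structure, and the final nodes are then constructed top-down so that each child receives its already existing parent in its constructor. All class names are invented for the example, and only elements are modelled.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

final class UpReferenceDemo {
    /** Mutable scratch structure, used only while SAX events arrive. */
    private static final class Draft {
        final String name;
        final List<Draft> children = new ArrayList<>();
        Draft(String name) { this.name = name; }
    }

    /** Immutable node: every field is final and assigned once in the constructor. */
    static final class Node {
        final String name;
        final Node parent;          // null for the root element
        final List<Node> children;

        private Node(Node parent, Draft draft) {
            this.name = draft.name;
            this.parent = parent;
            List<Node> kids = new ArrayList<>();
            for (Draft d : draft.children) {
                kids.add(new Node(this, d)); // the child receives 'this' as its parent
            }
            this.children = Collections.unmodifiableList(kids);
        }
    }

    /** Parses with the JRE's SAX parser and returns the immutable root node. */
    static Node parse(String xml) {
        final Deque<Draft> open = new ArrayDeque<>();
        final Draft[] root = new Draft[1];
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String local, String qName, Attributes atts) {
                Draft d = new Draft(qName);
                if (open.isEmpty()) root[0] = d; else open.peek().children.add(d);
                open.push(d);
            }

            @Override
            public void endElement(String uri, String local, String qName) {
                open.pop();
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
        return new Node(null, root[0]);
    }

    public static void main(String[] args) {
        Node root = parse("<a><b><c/></b><d/></a>");
        Node c = root.children.get(0).children.get(0);
        System.out.println(c.parent.name);        // prints "b"
        System.out.println(c.parent.parent.name); // prints "a"
    }
}
```

The price of this design is the one raised earlier in the thread: a node with an up reference belongs to exactly one tree, so subtrees cannot be shared, and any change means rebuilding the whole tree.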

So far I have done a case study to build a data model which is loosely inspired by SAX events. It handles basic information items for most node types of [INFOSET]. This will be the basis for my future work.

W3C defines many information items in its publications but no API for them (in contrast to the DOM specification, which specifies an IDL), so I'm unsure about the API for the planned data model. Does anyone have a proposal or some ideas?


Will my plans be helpful to the community? Please give me some feedback.

Thanks,  Jürgen

[XML] http://www.w3.org/TR/xml11/
[NS] http://www.w3.org/TR/xml-names11/
[INFOSET] http://www.w3.org/TR/xml-infoset/
[XDM] http://www.w3.org/TR/xpath-datamodel/

Kevin Wright
Re: Data Model
You might want to have a chat with Anthony Coates, I know he's actively working on the Scala XML implementation right now.


--
Kevin Wright

mail/google talk: kev [dot] lee [dot] wright [at] googlemail [dot] com
wave: kev [dot] lee [dot] wright [at] googlewave [dot] com
skype: kev.lee.wright
twitter: @thecoda

Mark Howe
Re: Data Model

Jürgen, I think this sounds great. I started experimenting along these
lines myself, but suspect that you have a rather better grasp of The
Scala Way than me at this point.

On the detail:

On Friday, 8 January 2010 at 17:08 +0100, Jürgen Purtz wrote:

> * Additionally there should be a package containing some
> 'syntactical sugar' for typical XML actions, eg.: conversion
> to and from other data models, XPath/XSLT/XQuery
> evaluation, ... . In essence the package is nothing more than
> a thin wrapper layer over existing JARs.

I'm not sure how the thin wrapper layer works on top of your own data
representation. If, for example, you wanted to use Saxon to do your XSLT
work, you'd surely have to use a data representation that Saxon knows
about, or extend the open source version of Saxon to use your data
representation, or convert to and from a Saxon data representation (and
that last option is IMO to be avoided at all costs).
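
The standard JAXP XPath API shows the issue in miniature. In the sketch below (XPathWrapperDemo and evalString are invented names; the javax.xml.xpath calls are real), the wrapper is only "thin" because the engine parses the input into its own representation. A custom data model would have to be converted to something the engine understands first. Note that the JRE's built-in engine implements XPath 1.0, not 2.0.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

final class XPathWrapperDemo {
    /** Evaluates an XPath 1.0 expression against XML text and returns the string value. */
    static String evalString(String xml, String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            // The engine builds its own tree from the InputSource before evaluating.
            return xpath.evaluate(expression, new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not evaluate " + expression, e);
        }
    }

    public static void main(String[] args) {
        String xml = "<a><b>hi</b><b>there</b></a>";
        System.out.println(evalString(xml, "/a/b[1]"));    // prints "hi"
        System.out.println(evalString(xml, "count(//b)")); // prints "2"
    }
}
```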

> As W3C defines a lot of information items in their publications but no
> API to them (which is contrary to the DOM specification where an IDL
> is specified), I'm doubtful about the API to the planed data model.
> Has anyone a proposal or some ideas?

The XDM spec does list some accessor functions. It says they are not
intended to be an API, but my thinking was to build an API that stuck
quite closely to that accessor function list.

I'd be happy to help with this sort of initiative wherever it is useful,
subject to the usual constraints of time and competence.

Jürgen Purtz
UML

Does anyone know a UML tool which is able to create a class diagram out
of Scala source?

Cheers, Jürgen

Kevin Wright
Re: UML
Hmm, shouldn't be too difficult.
Use a compiler plugin to grab the AST just after the typer phase, output it as some form of XML, then transform that into the file type of your choice.


Jürgen Purtz
PSVI InfoSet

Information about element and attribute type, type definition, validity
and other meta information can be accessed via SAX2 interface
PSVIProvider. But this is (as the name Post Schema Validation Infoset
says) only possible after the instance is validated against a schema.
Does anyone know how to obtain this meta information if a SAX parser has
to use a DTD instead of a schema?

Kind regards, Jürgen

Mark Howe
Re: PSVI InfoSet

I thought that's what

import org.xml.sax.DTDHandler;

does, although in practice the amount of metainformation you get from
the DTD is very limited compared with a PSVI. Thus, at various points in
the W3C XDM spec, we read "type-name: All [whatever] nodes
constructed from an infoset have the type xs:untyped". The DTD remains
useful for things like the is-id and is-idrefs settings.
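
A sketch of what a plain SAX parser reports from a DTD. As far as I know, DTDHandler itself only delivers notation and unparsed-entity declarations; the declared attribute type (for example ID, which is what is-id is derived from) arrives through Attributes.getType in startElement. DtdInfoDemo, SAMPLE and collect are names invented for this example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

final class DtdInfoDemo {
    /** A document with an internal DTD subset declaring an ID attribute, a notation and an unparsed entity. */
    static final String SAMPLE =
            "<!DOCTYPE doc [\n"
          + "  <!ELEMENT doc EMPTY>\n"
          + "  <!ATTLIST doc key ID #REQUIRED>\n"
          + "  <!NOTATION gif PUBLIC \"image/gif\">\n"
          + "  <!ENTITY logo SYSTEM \"http://example.invalid/logo.gif\" NDATA gif>\n"
          + "]>\n"
          + "<doc key=\"k1\"/>";

    /** Collects the DTD-derived facts that the SAX callbacks deliver. */
    static List<String> collect(String xml) {
        final List<String> seen = new ArrayList<>();
        // DefaultHandler implements DTDHandler as well as ContentHandler.
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void notationDecl(String name, String publicId, String systemId) {
                seen.add("notation " + name);
            }

            @Override
            public void unparsedEntityDecl(String name, String publicId,
                                           String systemId, String notationName) {
                seen.add("unparsed-entity " + name + " -> " + notationName);
            }

            @Override
            public void startElement(String uri, String local, String qName, Attributes atts) {
                for (int i = 0; i < atts.getLength(); i++) {
                    seen.add("attribute " + atts.getQName(i) + " : " + atts.getType(i));
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
        return seen;
    }

    public static void main(String[] args) {
        for (String fact : collect(SAMPLE)) {
            System.out.println(fact);
        }
    }
}
```

Element content models and attribute defaults are not covered by these callbacks; for those, SAX2 offers the optional org.xml.sax.ext.DeclHandler extension.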

(While I think of it, do write your own entity resolver with its own
user agent name, and make it cache, as mindless thrashing of the W3C DTD
servers by Java apps seems to have resulted in them blocking the
default resolver.)
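
A minimal sketch of such a caching resolver. The names CachingResolver and fetchesNeeded are invented, and the download step is deliberately left as a pluggable function: in real use it would open the URL with a User-Agent header of your own, whereas here a stand-in returns a fixed DTD so that the example needs no network access.

```java
import java.io.StringReader;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

final class CachingResolverDemo {
    /** Resolves external entities through a cache keyed on the system identifier. */
    static final class CachingResolver implements EntityResolver {
        private final Map<String, String> cache = new ConcurrentHashMap<>();
        private final Function<String, String> fetcher; // hypothetical download step

        CachingResolver(Function<String, String> fetcher) {
            this.fetcher = fetcher;
        }

        @Override
        public InputSource resolveEntity(String publicId, String systemId) {
            // The fetcher runs only the first time a system identifier is requested.
            String text = cache.computeIfAbsent(systemId, fetcher);
            InputSource source = new InputSource(new StringReader(text));
            source.setPublicId(publicId);
            source.setSystemId(systemId);
            return source;
        }
    }

    /** Parses the document several times and reports how often the DTD had to be fetched. */
    static int fetchesNeeded(String xml, int parses) {
        AtomicInteger fetches = new AtomicInteger();
        CachingResolver resolver = new CachingResolver(systemId -> {
            fetches.incrementAndGet();
            return "<!ELEMENT doc EMPTY>"; // stand-in for the downloaded DTD text
        });
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.setEntityResolver(resolver);
            for (int i = 0; i < parses; i++) {
                builder.parse(new InputSource(new StringReader(xml)));
            }
        } catch (Exception e) {
            throw new IllegalStateException("parsing failed", e);
        }
        return fetches.get();
    }

    public static void main(String[] args) {
        String xml = "<!DOCTYPE doc SYSTEM \"http://example.invalid/doc.dtd\"><doc/>";
        System.out.println(fetchesNeeded(xml, 3));
    }
}
```

A production version would probably also persist the cache between runs and bundle the well-known W3C DTDs locally.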

Jürgen Purtz
Re: PSVI InfoSet

Hi Mark,

OK, DTDHandler events work as expected. But there is a long way to go to
reconcile this information with PSVI information.

Thanks, Jürgen


Jürgen Purtz
Re: Data Model
Hi,

There is now a first draft of the announced implementation of a data model.

What does it contain?
  • Containers for all node types of [infoset], apart from these special instance variables:
    • Attribute: specified, attributeType, references
    • Document: documentElement, notation, unparsedEntity, allDeclarationProcessed
    • Element: namespace, prefix, namespaceAttribute, inscopeNamespace
    • ProcessingInstruction: baseURI, notation
    • Namespace, UnexpandedEntityRef, UnparsedEntity, Notation: all
  • Construction of the tree and serialization for the implemented node types. The construction process is based on SAX2.
  • Immutability: collections are implemented as scala.List, instance variables either as val or as var with public getters and protected setters
  • Checking for validity against an internal or external DTD or a schema
  • Navigation along the child and parent axes
  • A small number of JUnit4 test cases
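
For the schema half of the validity check, the JRE's javax.xml.validation package can be used directly. The sketch below is not the code from the draft; ValidationDemo and isValid are invented names, and DTD validation (which goes through the parser factory's setValidating switch instead) is not shown.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

final class ValidationDemo {
    /** Returns true if the instance document is valid against the given W3C XML Schema. */
    static boolean isValid(String xsd, String xml) {
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema(new StreamSource(new StringReader(xsd)));
            // With no error handler installed, the validator throws on the first error.
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // invalid instance (or a broken schema)
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String xsd =
                "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
              + "<xs:element name=\"age\" type=\"xs:int\"/>"
              + "</xs:schema>";
        System.out.println(isValid(xsd, "<age>42</age>"));  // prints "true"
        System.out.println(isValid(xsd, "<age>old</age>")); // prints "false"
    }
}
```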

What is missing?
  • All type-related information resulting from a DTD or a schema
  • A well-defined API
  • Conversions, e.g. to SAXSource or DOMSource (currently there is only 'toString' or 'serialize', used to compare results with the original XML instance)
  • Optimizations: an XML file of about 100 MB needs about 1 GB of RAM and 30 seconds to parse and build the tree (on a 3-4 year old AMD processor at 1.8 GHz)
  • Typical Scala features: I'm relatively new to Scala and I did the job mainly to improve my Scala know-how. So I'm sure that I have overlooked a lot of advanced Scala features.

If anyone is interested in the source code, I will send a ZIP file. Please let me know.


Kind regards, Jürgen




Copyright © 2012 École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland