
Re: Data Model

Alex Cruise
Jürgen, that sounds great; I'd like to see it. Why not put it up on GitHub?


----- Reply message -----
From: "Jürgen Purtz" <juergen [at] purtz [dot] de>
Date: Tue, Jan 19, 2010 5:51 AM
Subject: [scala-xml] Data Model
To: <scala-xml [at] listes [dot] epfl [dot] ch>


There is a first draft of the announced implementation of a data model.

What does it contain?
  • Containers for all node types of [infoset], except for the following properties (instance variables):
    • Attribute: specified, attributeType, references
    • Document: documentElement, notation, unparsedEntity, allDeclarationProcessed
    • Element: namespace, prefix, namespaceAttribute, inscopeNamespace
    • ProcessingInstruction: baseURI, notation
    • Namespace, UnexpandedEntityRef, UnparsedEntity, Notation: all properties
  • Tree construction and serialization for the implemented node types. Tree construction is based on SAX2.
  • Immutability: collections are implemented as scala.List; instance variables are either vals or vars with public getters and protected setters.
  • Validation against an internal or external DTD or a schema
  • Navigation along the child and parent axes
  • A small number of JUnit 4 test cases
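The immutability scheme and parent navigation listed above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from the draft: the names `Node`, `Element`, `Text`, `adopt` and `Axes` are hypothetical, and only two node types are modelled. Children are an immutable `scala.List`, while the parent reference is a private `var` with a public getter that is written only through a protected method, which is one way to combine immutable trees with an 'up reference'.

```scala
// Hypothetical sketch: names are illustrative, not taken from the draft implementation.
abstract class Node {
  // Up reference to the parent; private, so only code inside Node can write it.
  private var parentRef: Option[Node] = None

  /** Public getter for the parent axis (None for the root). */
  final def parent: Option[Node] = parentRef

  /** Protected "setter": a node claims its children when they are attached. */
  protected final def adopt(kids: List[Node]): Unit =
    kids.foreach(k => k.parentRef = Some(this))

  /** Child axis; leaf nodes have no children. */
  def children: List[Node] = Nil
}

final class Text(val content: String) extends Node

final class Element(
    val name: String,
    val attributes: List[(String, String)],
    override val children: List[Node] // immutable scala.List
) extends Node {
  adopt(children) // wire the parent axis once, at construction time
}

object Axes {
  /** Ancestor axis: follow parent references up to the root. */
  def ancestors(n: Node): List[Node] = n.parent match {
    case Some(p) => p :: ancestors(p)
    case None    => Nil
  }

  /** Root of the tree containing n (n itself if it has no parent). */
  def root(n: Node): Node = ancestors(n).lastOption.getOrElse(n)
}
```

Because the parent is assigned when the parent element is constructed, reusing a subtree in a second tree would silently re-point its parent reference; keeping the setter protected limits who can do that, but a fuller version would probably copy subtrees on insertion.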

What is missing?
  • All type-related information resulting from a DTD or a schema
  • A well-defined API
  • Conversions, e.g. to SAXSource or DOMSource (currently there are only 'toString' and 'serialize', which serve to compare results with the original XML instance)
  • Optimizations: an XML file of about 100 MB needs about 1 GB of RAM and 30 seconds to parse and build the tree (on a 3-4 year old 1.8 GHz AMD processor)
  • Typical Scala features: I'm relatively new to Scala and did the job mainly to improve my Scala know-how, so I'm sure I have overlooked a lot of Scala's more sophisticated features.
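Until proper conversions exist, one stop-gap (an assumption, not part of the draft) is to reparse the output of 'serialize' with the standard JDK parsers. The `Conversions` object below is hypothetical; it only shows which JAXP calls are involved for both source types, at the cost of an extra parse that a direct tree-to-DOM conversion would avoid.

```scala
import java.io.StringReader
import javax.xml.parsers.DocumentBuilderFactory
import javax.xml.transform.dom.DOMSource
import javax.xml.transform.sax.SAXSource
import org.xml.sax.InputSource

// Hypothetical helpers: `serialized` stands for the string produced by the draft's 'serialize'.
object Conversions {
  /** Reparses the serialized document into a DOM tree and wraps it as a DOMSource. */
  def toDOMSource(serialized: String): DOMSource = {
    val factory = DocumentBuilderFactory.newInstance()
    factory.setNamespaceAware(true) // keep namespace information in the DOM
    val doc = factory.newDocumentBuilder().parse(new InputSource(new StringReader(serialized)))
    new DOMSource(doc)
  }

  /** Wraps the serialized text as a SAXSource; the consumer parses it on demand. */
  def toSAXSource(serialized: String): SAXSource =
    new SAXSource(new InputSource(new StringReader(serialized)))
}
```

Both results are ordinary javax.xml.transform.Source objects, so they can be handed to a Transformer or a schema Validator without further glue.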

If anyone is interested in the source code, I will send them a ZIP file. Please give me some advice.

Kind regards, Jürgen

2010/1/8 Jürgen Purtz <juergen [at] purtz [dot] de>

I plan to build an XML data model that differs from Scala's current data model in the following respects:
  • It should comply with the specifications in [xml], [ns] and [infoset].
  • It should be extensible to hold information specified in [xdm].
  • It should be ONLY a data model: there will be no parser class. Parsing will be done by an existing SAX parser.
  • Like Scala's existing data model, it will be immutable. (But nodes will contain an 'up reference' to their parent node.)
  • Additionally, there should be a package containing some 'syntactic sugar' for typical XML tasks, e.g. conversion to and from other data models, XPath/XSLT/XQuery evaluation, etc. In essence, the package is nothing more than a thin wrapper layer over existing JARs.
  • [xdm] defines fewer node types than [infoset]. Nevertheless, the intended data model should contain all node types, with type information only for those node types introduced by [xdm].
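As a rough illustration of the 'thin wrapper' idea, assuming the JDK's built-in javax.xml.xpath API stands in for the existing JARs, such a helper can be very small. The object `XPathSugar` and its methods are hypothetical; for brevity they operate on serialized XML text, whereas a wrapper for the planned data model would first convert the tree into a source the library understands.

```scala
import java.io.StringReader
import javax.xml.xpath.{XPathConstants, XPathFactory}
import org.xml.sax.InputSource

// Hypothetical sugar layer: each method delegates straight to javax.xml.xpath.
object XPathSugar {
  private def source(xml: String) = new InputSource(new StringReader(xml))

  /** Evaluates `expr` and returns its XPath string value. */
  def evalString(xml: String, expr: String): String =
    XPathFactory.newInstance().newXPath().evaluate(expr, source(xml))

  /** Evaluates `expr` as an XPath number (NUMBER results come back as java.lang.Double). */
  def evalNumber(xml: String, expr: String): Double =
    XPathFactory.newInstance().newXPath()
      .evaluate(expr, source(xml), XPathConstants.NUMBER)
      .asInstanceOf[java.lang.Double]
      .doubleValue
}
```

The design choice here is that the sugar adds no XPath logic of its own; it only hides the factory and input-source plumbing, which keeps the wrapper thin as described.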

So far I have done a case study of a data model that is loosely inspired by SAX events. It handles basic information items for most node types of [infoset]. This will be the basis for my future work.
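One common way to build an immutable tree from SAX events is a stack of open elements: each start tag pushes a frame, and each end tag freezes that frame into an immutable element and attaches it to the enclosing frame. The sketch below assumes the JDK's standard SAX2 classes (`SAXParserFactory`, `DefaultHandler`); the node types `XNode`, `XElem`, `XText` and the names `TreeBuilder` and `SaxTree` are hypothetical, parent references are omitted, and namespace processing is left off for brevity, so names are raw qualified names.

```scala
import java.io.StringReader
import javax.xml.parsers.SAXParserFactory
import org.xml.sax.{Attributes, InputSource}
import org.xml.sax.helpers.DefaultHandler
import scala.collection.mutable.ListBuffer

// Hypothetical minimal node types for this sketch (no parent references).
sealed trait XNode
final case class XText(text: String) extends XNode
final case class XElem(name: String, attrs: List[(String, String)], children: List[XNode]) extends XNode

/** SAX2 handler that builds an immutable tree bottom-up from parser events. */
final class TreeBuilder extends DefaultHandler {
  // One frame per open element: name, attributes, and children collected so far.
  private final class Frame(val name: String, val attrs: List[(String, String)]) {
    val kids = ListBuffer.empty[XNode]
  }
  private var open: List[Frame] = Nil // stack of currently open elements
  private var done: Option[XElem] = None

  def result: Option[XElem] = done

  override def startElement(uri: String, localName: String, qName: String, atts: Attributes): Unit = {
    val as = (0 until atts.getLength).map(i => atts.getQName(i) -> atts.getValue(i)).toList
    open = new Frame(qName, as) :: open
  }

  // A parser may deliver one text run in several calls, so merge adjacent text.
  override def characters(ch: Array[Char], start: Int, length: Int): Unit =
    open.headOption.foreach { f =>
      val s = new String(ch, start, length)
      f.kids.lastOption match {
        case Some(XText(prev)) => f.kids(f.kids.length - 1) = XText(prev + s)
        case _                 => f.kids += XText(s)
      }
    }

  // Closing tag: freeze the frame into an immutable element and attach it to its parent.
  override def endElement(uri: String, localName: String, qName: String): Unit = {
    val f = open.head
    open = open.tail
    val elem = XElem(f.name, f.attrs, f.kids.toList)
    open match {
      case parent :: _ => parent.kids += elem
      case Nil         => done = Some(elem)
    }
  }
}

object SaxTree {
  /** Parses with the JDK's default (non-namespace-aware) SAX parser. */
  def parse(xml: String): XElem = {
    val handler = new TreeBuilder
    SAXParserFactory.newInstance().newSAXParser()
      .parse(new InputSource(new StringReader(xml)), handler)
    handler.result.getOrElse(throw new IllegalStateException("no document element"))
  }
}
```

Only the builder's private stack is mutable; every element handed out is already complete, so the finished tree never needs to be modified after parsing.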

Since the W3C defines a lot of information items in its publications but no API for them (unlike the DOM specification, which specifies an IDL), I'm unsure about the API for the planned data model. Does anyone have a proposal or some ideas?

Will my plans be helpful to the community? Please give me some feedback.

Thanks, Jürgen


Anthony B. Coates
Re: Re: Data Model

Yes, that would be a great idea. Cheers, Tony.

On Tue, 19 Jan 2010 14:45:41 -0000, Alex Cruise wrote:

> Jürgen, that sounds great, I'd like to see it. Why not put it up on
> Github?
> Thanks,
> ----- Reply message -----
> From: "Jürgen Purtz"
> Date: Tue, Jan 19, 2010 5:51 AM
> Subject: [scala-xml] Data Model
> To:

Copyright © 2012 École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland