Monday, June 18, 2007

Graph based serialization examples

My previous post on Web3S hierarchical data vs. graph data didn't provide any solutions.

The explanation in the FAQ about graphs isn't sitting well with me. There is a simple way to represent graph data in serialized formats: name the data the first time it's seen and refer to it subsequent times. Examples include:
This is not a complicated pattern to serialize or write code against.
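To show how little code the pattern needs, here is a minimal Python sketch of a writer that names each author with a Web3S:ID the first time it appears and emits a Web3S:IDREF after that. It reuses the element names from the example that follows. The namespace URI and the Author/Article classes are hypothetical, and Web3S:IDREF is the element proposed in this post, not part of Web3S.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI: the post's example uses a Web3S: prefix
# but never declares it, and XML tooling requires a declaration.
WEB3S = "urn:example:web3s"
ET.register_namespace("Web3S", WEB3S)


class Author:  # hypothetical in-memory model
    def __init__(self, author_id, firstname, lastname):
        self.author_id = author_id
        self.firstname = firstname
        self.lastname = lastname


class Article:  # hypothetical in-memory model
    def __init__(self, article_id, title, authors):
        self.article_id = article_id
        self.title = title
        self.authors = authors


def serialize(articles):
    """Write each author in full the first time it is seen, then by IDREF."""
    seen = set()  # Python object ids of authors already written in full
    root = ET.Element("articles")
    for article in articles:
        art_el = ET.SubElement(root, "article")
        ET.SubElement(art_el, f"{{{WEB3S}}}ID").text = str(article.article_id)
        ET.SubElement(art_el, "title").text = article.title
        authors_el = ET.SubElement(art_el, "authors")
        for author in article.authors:
            au_el = ET.SubElement(authors_el, "author")
            if id(author) in seen:
                # Already named earlier: refer to it instead of copying it.
                ET.SubElement(au_el, f"{{{WEB3S}}}IDREF").text = str(author.author_id)
            else:
                # First sighting: name it with an ID and write it out in full.
                seen.add(id(author))
                ET.SubElement(au_el, f"{{{WEB3S}}}ID").text = str(author.author_id)
                ET.SubElement(au_el, "firstname").text = author.firstname
                ET.SubElement(au_el, "lastname").text = author.lastname
    return ET.tostring(root, encoding="unicode")


# Thomson is one object shared by both articles, so it is written out once.
miles = Author(23455, "Alexander", "Miles")
thomson = Author(88828, "Alexis", "Thomson")
doc = serialize([
    Article(8383, "Manual of Surgery Volume First: General Surgery. Sixth Edition.",
            [miles, thomson]),
    Article(8384, "...", [thomson]),
])
print(doc)
```

The writer keys "seen" on object identity rather than on the ID value, so two distinct objects that happen to share an ID would both be written in full.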

From Dare's posted example, we could add another article with an existing author like this (notice the Web3S:IDREF element):
<articles>
  <article>
    <Web3S:ID>8383</Web3S:ID>
    <title>Manual of Surgery Volume First: General Surgery. Sixth Edition.</title>
    <authors>
      <author>
        <Web3S:ID>23455</Web3S:ID>
        <firstname>Alexander</firstname>
        <lastname>Miles</lastname>
      </author>
      <author>
        <Web3S:ID>88828</Web3S:ID>
        <firstname>Alexis</firstname>
        <lastname>Thomson</lastname>
      </author>
    </authors>
  </article>
  <article>
    <Web3S:ID>8384</Web3S:ID>
    <title>...</title>
    <authors>
      <author>
        <Web3S:IDREF>88828</Web3S:IDREF>
      </author>
    </authors>
  </article>
</articles>
But that won't work in Web3S without many more changes. The problem here is identity, not data formats.
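Reading the graph back is just as short. The Python sketch below parses the article example and turns each Web3S:IDREF back into the very same author object that was built when that ID was first seen, which is the identity a pure tree reader loses. It assumes a hypothetical namespace URI for the Web3S: prefix (the example never declares one), and the dict layout is an illustration, not a Web3S data model.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI so the Web3S: prefix is bound and the XML parses.
WEB3S = "urn:example:web3s"

DOC = f"""<articles xmlns:Web3S="{WEB3S}">
  <article>
    <Web3S:ID>8383</Web3S:ID>
    <title>Manual of Surgery Volume First: General Surgery. Sixth Edition.</title>
    <authors>
      <author><Web3S:ID>23455</Web3S:ID><firstname>Alexander</firstname><lastname>Miles</lastname></author>
      <author><Web3S:ID>88828</Web3S:ID><firstname>Alexis</firstname><lastname>Thomson</lastname></author>
    </authors>
  </article>
  <article>
    <Web3S:ID>8384</Web3S:ID>
    <title>...</title>
    <authors>
      <author><Web3S:IDREF>88828</Web3S:IDREF></author>
    </authors>
  </article>
</articles>"""


def load(text):
    """Parse articles, resolving each IDREF to the author object it names."""
    by_id = {}  # author ID -> dict built when that author was first named
    articles = []
    for art in ET.fromstring(text).findall("article"):
        authors = []
        for au in art.find("authors").findall("author"):
            ref = au.find(f"{{{WEB3S}}}IDREF")
            if ref is not None:
                # Named earlier in the document: reuse that exact object.
                authors.append(by_id[ref.text])
            else:
                author = {
                    "id": au.find(f"{{{WEB3S}}}ID").text,
                    "firstname": au.find("firstname").text,
                    "lastname": au.find("lastname").text,
                }
                by_id[author["id"]] = author
                authors.append(author)
        articles.append({
            "id": art.find(f"{{{WEB3S}}}ID").text,
            "title": art.find("title").text,
            "authors": authors,
        })
    return articles


articles = load(DOC)
# The second article's author is the same object as the first article's Thomson.
print(articles[1]["authors"][0] is articles[0]["authors"][1])  # prints True
```

This sketch assumes every IDREF points backward to an ID already seen; a forward reference would need a second resolution pass.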

1 comment:

Anonymous said...

I need to think of a good way to define the distinction between an infoset and a data model. One infoset can support many data models. For example, RDF is a data model that can be mapped to the XML infoset, which can then map to an XML serialization.

Web3S’s infoset is just the core set of primitive data containers; it doesn’t define the data models that sit on top. So, for example, one could easily define a graph-based data model on top of the tree-based infoset that used links to say things like “These two things are the same”. In fact, sections 10.1 and 10.2 of the Web3S spec define a standard HREF-style element for exactly this reason.

In current Web3S there are two likely approaches:

By Value – If the canonical author entry and the references to that author all exist within the same ‘system’ (I’m being vague intentionally, but think of examples like a single DB), then one would likely just use by value. The author values would show up wherever needed, and changing one author value would change the others. Astoria does this today, but it adds the guarantee that if two instances of a particular element (e.g. author) are in fact the same underlying object, then their IDs will be the same. That is completely legal in Web3S. Web3S just says that the caller can only assume that IDs are locally unique, but the server is free to offer a stronger guarantee if it wants and then advertise that fact. Heck, a server could choose to give every element instance a GUID/UUID and so guarantee global uniqueness.
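The difference between the two guarantees comes down to what a client may use as an identity key. The Python sketch below assumes "locally unique" means unique within the enclosing container, so the key must include where the element was found; only when a server advertises global uniqueness (modeled here by a hypothetical `globally_unique` flag, since no such advertisement mechanism exists yet) can the bare ID serve as the key.

```python
from typing import Hashable, Tuple


def identity_key(parent_path: Tuple[str, ...], element_id: str,
                 globally_unique: bool) -> Hashable:
    """Key under which a client may treat two elements as the same object.

    Without a stronger guarantee, an ID is only assumed unique within its
    parent container, so the parent's path is part of the key. The
    `globally_unique` flag is hypothetical: it stands in for whatever way a
    server might advertise that IDs are unique across the whole service.
    """
    if globally_unique:
        return element_id
    return (parent_path, element_id)


# Two author elements with the same ID under different articles.
first = (("articles", "8383", "authors"), "88828")
second = (("articles", "8384", "authors"), "88828")

# Local uniqueness only: the client cannot conclude they are one author.
print(identity_key(*first, globally_unique=False) ==
      identity_key(*second, globally_unique=False))  # prints False
# Advertised global uniqueness: same ID means same underlying author.
print(identity_key(*first, globally_unique=True) ==
      identity_key(*second, globally_unique=True))  # prints True
```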

Also, for whatever it’s worth, Astoria supports both hard and soft linking. Our current thinking about this in Web3S is that we would allow servers to advertise schemas that define object relationships, explain ID guarantees, specify hard versus soft linking, etc. We will also probably provide mechanisms that let servers annotate data with this information directly rather than requiring a schema lookup, but given the bandwidth expense I’m not sure how often we would use this.

By Reference – Alternatively, there would be a single canonical author entry, and anyone who wanted to refer to that entry would just use a URL à la section 10.1.

In either case you can get there from here.