Thursday, June 28, 2007

Web3S pushes Secondary Resources

Summary: Web3S doesn't naturally support RESTful primary resource identifiers for EIIs. It can, but only with optional elements and server-specific guarantees about identifiers. A generic Web3S client could never "know" what canonical and primary resources were identified.

Yaron responded to my Web3S posts on identity in hierarchies and graph serialization support. (Yaron's comment is on the second link).

My original concern was that Web3S duplicates the values of a single resource when serializing a tree, which obscures the identity of resources.

Now I realize the issue isn't just that identity is obscured: Web3S has no standard way to expose the primary resources in a system.

The properties of a Web3S:ID are
  • unique only within the containing element,
  • used to generate URIs, but also path-relative to the containing elements
Then, from the Web3S spec, section 7: The author elements have a single URI, but that is both path (within article) dependent and based on a locally unique object identifier.
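
To make the path dependence concrete, here is a minimal sketch of how such URIs could be generated. The `child_uri` helper, its pluralized collection segment, and the second article ID (9999) are my own assumptions modelled on the example URI in section 7, not anything the spec defines.

```python
def child_uri(parent_uri: str, element_name: str, local_id: int) -> str:
    """Build a child URI from the parent's URI plus a locally unique ID.

    Hypothetical helper: the "<name>s/<name>(<id>)" layout is copied from
    the shape of the example URI, not taken from the Web3S spec.
    """
    return f"{parent_uri.rstrip('/')}/{element_name}s/{element_name}({local_id})"


base = "http://example.net/stuff/morestuff"
article_a = child_uri(base, "article", 8383)
article_b = child_uri(base, "article", 9999)  # hypothetical second article

# The same author ID, reached through two different articles.
author_in_a = child_uri(article_a, "author", 23455)
author_in_b = child_uri(article_b, "author", 23455)

print(author_in_a)
print(author_in_b)
print(author_in_a == author_in_b)
```

The two author URIs differ even though the local ID is identical, so neither one can serve as the author's own identifier.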

This URI (a non-prefixed, short version from the example in section 7):
http://example.net/stuff/morestuff/articles/article(8383)/authors/author(23455)
doesn't provide a primary resource identifier for the author. Why? This path-dependent URI is hardly different from the same URI with a fragment identifier (for a secondary resource):
http://example.net/stuff/morestuff/articles/article(8383)#author23455
From Architecture of the World Wide Web, section 2.6 Fragment Identifiers:
The fragment identifier component of a URI allows indirect identification of a secondary resource by reference to a primary resource and additional identifying information. The secondary resource may be some portion or subset of the primary resource, some view on representations of the primary resource, or some other resource defined or described by those representations. The terms "primary resource" and "secondary resource" are defined in section 3.5 of [URI].
The author in all of these examples should be (able to be) identified as a primary resource, not just a secondary one. Whether authors are primary or secondary resources is really a question of server implementation. The Web3S spec, though, makes that decision by default for all services.
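
A small sketch of the comparison, using only Python's standard `urllib.parse`. The `depends_on` helper is hypothetical and just encodes my reading of "path dependent": the identifier only makes sense relative to the article's URI.

```python
from urllib.parse import urldefrag

article_uri = "http://example.net/stuff/morestuff/articles/article(8383)"
path_uri = article_uri + "/authors/author(23455)"
frag_uri = article_uri + "#author23455"


def depends_on(uri: str, container_uri: str) -> bool:
    """True when `uri` is only meaningful relative to `container_uri`.

    Hypothetical check: either the URI is the container plus a fragment,
    or the container URI is a path prefix of it.
    """
    without_fragment = urldefrag(uri).url
    return without_fragment == container_uri or uri.startswith(container_uri + "/")


# The fragment form reduces to the article URI once the fragment is removed.
print(urldefrag(frag_uri).url == article_uri)
print(urldefrag(frag_uri).fragment)

# Both forms are tied to the article; neither names the author on its own.
print(depends_on(frag_uri, article_uri))
print(depends_on(path_uri, article_uri))
```

The path form is a full URL that can be dereferenced, which the fragment form is not, but in both cases the author's identifier is derived from the article's.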

Here is the rest of Yaron's comment, with my additional thinking added in.

Yaron writes: Web3S’s infoset is just the core set of primitive data containers, they don’t define the data models that sit on top. So, for example, one could easily define a graph based data model on top of the tree based infoset that used links to say things like “These two things are the same”. In fact section 10.1 and 10.2 of the Web3S spec define a standard HREF style element for exactly this reason.
True enough, it could be done. However, the Web3S infoset does explicitly define a notion of identity, Web3S:ID, that doesn't provide a way to express those things.

The definition you mentioned of HREF elements makes no statement about what resource is identified, so no client can assume anything about it.
Yaron writes: By Value – If the canonical author entry and the references to author all exist within the same ‘system’ (I’m being vague intentionally but think of examples like a single DB) then likely would one just use by value. The author values would show up where needed and changing one author value would change the other. Astoria does this today but they add the additional guarantee that if two instances of a particular element (e.g. author) are in fact the same underlying object then the ID will be the same. That is completely legal in Web3S. Web3S just says that the caller can only assume that IDs are locally unique. But the server is free to offer a higher guarantee if it wants and then advertise that fact. Heck a server could choose to give every element instance a GUID/UUID and so guarantee global uniqueness.
That is all based on optional elements and out-of-band (published schema) communication, which means:
  1. I can't write a general purpose Web3S tool that takes advantage of that, and
  2. Tools that do take advantage of that are more tightly coupled to that one service.
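
The coupling can be sketched as follows. `ElementRef`, `same_resource`, and the `ids_globally_unique` flag are all hypothetical names of mine; the flag stands in for whatever guarantee a particular server advertises out of band, and I am assuming that guarantee means "equal IDs identify the same underlying object".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ElementRef:
    container_uri: str  # URI of the containing element
    name: str           # element name, e.g. "author"
    local_id: int       # Web3S:ID value, unique only within the container


def same_resource(a: ElementRef, b: ElementRef,
                  ids_globally_unique: bool = False) -> Optional[bool]:
    """Return True or False when identity is decidable, None when it is not.

    `ids_globally_unique` models a server-specific promise learned out of
    band; a generic client has no standard way to discover it.
    """
    if a == b:
        return True
    if ids_globally_unique:
        return a.name == b.name and a.local_id == b.local_id
    return None  # only local uniqueness is promised, so the client cannot tell


in_a = ElementRef("http://example.net/stuff/morestuff/articles/article(8383)", "author", 23455)
in_b = ElementRef("http://example.net/stuff/morestuff/articles/article(9999)", "author", 23455)

print(same_resource(in_a, in_b))                            # generic client
print(same_resource(in_a, in_b, ids_globally_unique=True))  # service-aware client
```

The generic client gets no answer; only the client that has been told about one service's guarantee can decide.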
Yaron writes: Also, for whatever it’s worth, Astoria supports both hard and soft linking. Our current thinking about this in Web3S is that we would allow servers to advertise schemas that define object relationships, explain ID guarantees, specify hard versus soft linking, etc. We will also probably provide mechanisms to allow servers to annotate data with this information directly rather than requiring schema look up but given the bandwidth expense I’m not sure how often we would use this.
I hope you would use it all the time, otherwise you will be promoting a code generation solution with published Schemas. See my own blog posts, as well as the recent storm of discourse on WADL and REST.

I'm not exactly sure what hard and soft linking refer to here, can you expand? I don't have experience with Astoria yet, and I'm thinking inode filesystem hard links...
Yaron writes: By Reference – Alternatively there would be a single canonical author entry and anyone who wanted to refer to that entry would just use a URL ala section 10.1.
This should be the default for how Web3S works: the identifier can be a primary or secondary resource identifier in all cases. It should be absolutely server-dependent whether authors can be independently identifiable, but providing canonical and absolute identity for any EII should not require client-server coupling.

I'll think more about how to achieve this, but a first random idea is to allow the HREF element inside (and in place of) the Web3S:ID element.
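
As a rough sketch of that idea, and nothing more: an element carries either a local ID or an HREF, and the client resolves whichever is present. The `identify` function, the fallback URI layout, and the canonical URL `http://example.net/people/author(23455)` are all invented for illustration.

```python
from typing import Optional
from urllib.parse import urljoin


def identify(container_uri: str, name: str,
             local_id: Optional[int] = None,
             href: Optional[str] = None) -> str:
    """Resolve an element's identifier, preferring an HREF over a local ID.

    Hypothetical behaviour: with an HREF the result is independent of the
    container; with only a local ID it is derived from the container URI.
    """
    if href is not None:
        return urljoin(container_uri, href)
    if local_id is None:
        raise ValueError("element needs either an href or a local_id")
    return f"{container_uri.rstrip('/')}/{name}s/{name}({local_id})"


article_a = "http://example.net/stuff/morestuff/articles/article(8383)"
article_b = "http://example.net/stuff/morestuff/articles/article(9999)"
canonical = "http://example.net/people/author(23455)"  # invented canonical URL

via_a = identify(article_a, "author", href=canonical)
via_b = identify(article_b, "author", href=canonical)
print(via_a == via_b)

# Without an HREF the identifier falls back to the path-dependent form.
print(identify(article_a, "author", local_id=23455))
```

With the HREF in place, a generic client sees the same identifier no matter which article it came through, and the server still decides whether to offer one.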
Yaron writes: In either case you can get there from here.
Yes, you can, but the default, standard, and primary means of identification should directly support a canonical identifier system without resorting to optional elements and shared schemas.

2 comments:

Anonymous said...

REFERENCE ISSUES:

In the case that one resource contains a URL that points at another resource we will provide a standard element whose semantics are "I am a URL pointing to a resource with the name X". This will be standard for all Web3S resources.

In the case that another resource is referenced by value the same previously described element can still be used as an annotation to in effect say "You can read/edit the value in place here or if you want a canonical URL here it is."

SECONDARY RESOURCES:

In the most restrictive sense a secondary resource is one that can only be referenced in the context of some other resource via a local fragment identifier. That is clearly not the case with Web3S as each element has a full fledged URL that is available for robust use with arbitrary HTTP methods. Fragment identifiers, on the other hand, are stripped before transmission and only used locally.

One can argue however that RFC 2396 provides a wider definition of secondary resource such that a secondary resource is in effect any resource that is contained within some other resource. If that is the intended definition then, for example, all WebDAV resources are secondary resources since they exist within a container and can be destroyed (modulo linking, which Web3S also supports) if the parent is destroyed. If that is your intended semantics then Web3S is certainly guilty but I don't understand why it matters.

John Heintz, President Gist Labs said...

Yaron,

I'm very glad to hear that you plan to put a standard solution in place for canonical references.

Why do I care about secondary resources in Web3S? Resources should have stable identifiers (at least as stable as the resource itself). A secondary resource identifier is less stable because changes in the primary or secondary resource (both) break the identifier.

With a standard reference solution designed into Web3S then my concerns over secondary resources are mitigated.