Dublin Core identifier naming proposal
Authors
Ben Clifford, in person at Amsterdam PC3 workshop June 2009 and online 16th June 2009
Subject
OPM core or a not-presently-existing naming profile
Background
There was some debate at the Amsterdam PC3 workshop about whether artifacts should have global identifiers, without any clear consensus. No one argued that artifacts should be prohibited from having global identifiers, and some agreed that if global identifiers were present then they should be somehow interoperable between implementations. This proposal suggests a path to do that.
Problem addressed
How to globally identify artifacts, if they can be identified.
Proposed solution
Dublin Core "identifier" (http://dublincore.org/documents/dcmi-terms/#terms-identifier) should be adopted as the naming model. Concrete serialisation specifications should define how such annotations are associated with each artifact, when such artifacts have them.
Point for debate: choose some other sufficiently open naming mechanism, such as URIs
Point for debate: should other entities have global names and if so, should they use the same system?
Rationale for the solution
Other people have thought about naming of things in general; OPM is not a naming expert group. OPM should use naming systems defined by someone else.
Comments
Community is invited to provide comments on proposals.
comment 1
dc:identifier is ambiguous when it comes to concrete serialisation in that it does not define the domain over which an identifier is unique, thereby perhaps impeding interoperability. Adopting something like URIs might provide a more concrete format that promotes interoperability. On the other hand, is interoperability actually impeded in practice by this ambiguity?
Comment 2 by Simon Miles
I agree with the need for this proposal, and with re-using an existing approach.
It seems that we would be using dc:identifier as the type of an annotation, and the annotations could be on any entity in the OPM graph (as described in ChangeProposalAnnotations).
Presumably, if the data values of two annotations of type dc:identifier are the same, then they would be deemed to be identifying the same entity, regardless of whether those data values were URIs or something else. This seems to match the semantics of dc:identifier in practice. I would expect URIs to be commonly used, but it seems beyond the scope of OPM to be concerned with this (though all our examples could use URIs).
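As an illustration of the matching rule described in this comment, here is a minimal sketch, not part of the proposal. It assumes a hypothetical in-memory layout in which each account maps local entity ids to a list of (property, value) annotations; the account names, local ids and identifier values are invented for the example. Entities are treated as the same when their dc:identifier values are equal as plain strings, whether or not those values are URIs.

```python
from collections import defaultdict

# Dublin Core "identifier" term; used here as the annotation type.
DC_IDENTIFIER = "http://purl.org/dc/terms/identifier"


def group_by_identifier(accounts):
    """Map each dc:identifier value to the (account, local id) pairs carrying it."""
    groups = defaultdict(list)
    for account_name, entities in accounts.items():
        for local_id, annotations in entities.items():
            for prop, value in annotations:
                if prop == DC_IDENTIFIER:
                    groups[value].append((account_name, local_id))
    return dict(groups)


# Hypothetical accounts: the layout and all ids are assumptions for illustration.
accounts = {
    "account_A": {
        "a1": [(DC_IDENTIFIER, "urn:example:dataset-42")],
        "a2": [(DC_IDENTIFIER, "local-7")],
    },
    "account_B": {
        "x9": [(DC_IDENTIFIER, "urn:example:dataset-42")],
    },
}

# Identifier values that appear on more than one entity link those entities.
shared = {k: v for k, v in group_by_identifier(accounts).items() if len(v) > 1}
```

In this sketch, a1 in account_A and x9 in account_B would be deemed the same entity, while a2 stays unlinked. Comparing values as opaque strings is the simplest reading of the comment; a real implementation might normalise values first.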
Comment 3 by Simon Miles in reply to comment 1
We could simply state that the identifier value should be interpretable in a globally unique way, i.e. there is always a global scope. I think this is justified by a purpose of provenance graphs: to connect together events happening disparately and in the past. This is a stronger requirement than placed on most representations, in that we have to prepare carefully for unpredictable future usage. Global identification seems not such a heavy burden in achieving this.
I have no strong feeling about mandating URIs. It seems unnecessary, but not objectionable.
Comment 3 by Jim Myers - 17 Sep 2009
In RDF realizations, I'd expect the URI identifying the artifact (or other entity) to be a global identifier and that this proposal would require a triple that said ArtifactGlobalURI dc:identifier ArtifactGlobalURI. I definitely agree that global IDs are important - we want accounts to be linkable based on shared IDs of edge artifacts - but this proposal seems like it applies more to a specific realization of OPM than to OPM per se. Another use I could see for this would be for handling aliases - I want an internally minted global ID but know that a DOI also exists for the artifact - but this use case seems at best like a profile rather than a minimal requirement for OPM. I'm leaning towards a 'no' vote, but that is based on an assumption that the change is phrased as 'all entities have to have a dc:identifier property' versus something like "in realizations that do not use global IDs directly (or in cases where aliases are known) dc:identifier should be used to associate additional global IDs with the entity. It is valid in OPM to assume that two entities (i.e. entities in different accounts) that share a global ID are the same and that normal OPM rules for inference can be applied". This particular phrasing opens the door for issues if two entities are considered the same in one identifier scheme and not another...
Comment 4 by Simon Miles in reply to Comment 3
I agree that "all entities have to have a dc:identifier property" would probably be bad, though I did not interpret the proposal this way. I don't understand your phrase "in realizations that do not use global IDs directly", but I see your point that exactly what is being mandated is unclear. How about "where global IDs are intended for use specifically for OPM manipulation purposes (determining in-graph equivalence, inferring), you should use dc:identifier"? This just makes it clear we are aiming for interoperability only with regard to OPM, and are not placing arbitrary restrictions on identifying data. I see nothing wrong with the same identifier being expressed in multiple ways in a system if it aids the use of multiple tools.
comment from Joe Futrelle
As long as using a dc:identifier annotation to carry the global ID is "strongly recommended" rather than "required", I see no problem with this proposal.
comment from Paolo
I don't feel strongly either way. This seems to focus on how to represent (global) IDs, which I don't think is a big issue, as opposed to how identifiers are created, by whom, with which authority, etc. What was clear in PC3, for me, was that without mappings between local naming schemes defined independently by each group (for the same workflow!), OPM interoperability, for example the ability to rewrite queries so they operate on third-party graphs, will not be achieved.
comment from Luc
My understanding is that a dc:identifier has a range which is http://www.w3.org/2000/01/rdf-schema#Literal (the class of literal values, e.g. textual strings and integers). How should we interpret that literal? I would have preferred to see a URI here.
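One way to picture the interpretation question is the following sketch, which is an assumption for illustration and not something the proposal specifies. Since the value is only a literal, a consumer has to decide for itself whether the string can be read as an absolute URI. The helper name and the example values are hypothetical; the check simply asks whether Python's standard URL parser finds a scheme.

```python
from urllib.parse import urlparse


def looks_like_absolute_uri(literal: str) -> bool:
    """Return True if the literal starts with a URI scheme such as 'http:' or 'urn:'."""
    return bool(urlparse(literal).scheme)


# Hypothetical identifier literals, invented for this example.
examples = [
    "http://example.org/artifact/42",
    "urn:example:dataset-42",
    "local-7",
]
results = {value: looks_like_absolute_uri(value) for value in examples}
```

Under this check the first two literals carry their own global scope, whereas "local-7" gives no hint of the domain over which it is unique. That is the ambiguity raised in comment 1.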
Vote
Jim Myers, NO if required, YES as an optional annotation
Simon Miles, yes (but no if we require it for every entity or every time something is identified, as discussed above; I also suggest saying that the IDs will be globally unique but not restrict them to URIs)
Joe Futrelle, no if required, yes as an optional (ideally "strongly recommended") annotation
Paul Groth, no
Paolo Missier, yes if the use of dc:identifier is not mandated
Luc Moreau, no
Outcome
The outcome is: No: 4/Yes-modulo change: 2.
The proposal is not adopted. However, as annotations get defined, the issue of a global ID will come up, and a proposal will be put forward.