Annotations for OPM (Draft proposal, Work in Progress)
Authors
Luc Moreau, June 19th.
Subject
OPM v1.01
Background
Provenance Challenge 3 (PC3) has shown the need for "extra information", such as types and values, to be added to OPM entities (whether nodes or edges). Such extra information can be seen as annotations to OPM entities.
Problem addressed
The purpose of this proposal is twofold:
- outline general properties of annotations to OPM entities,
- define specific annotations, with "reserved keywords" and "agreed meaning".
This proposal does not specify how serializations should encode annotations. Separate proposals for XML, RDF (or others) should be drafted.
In the first instance, reserved annotations will be specified to address the inter-operability problems encountered in PC3.
Proposed solution
Annotations in OPM: general principles.
Rule 1. Every OPM entity can be annotated. An OPM entity can be an OPM graph, an OPM node, an OPM edge, or an OPM account. Note that an annotation is not itself an OPM entity that can be annotated: allowing this would reinvent systems that were designed specifically for that purpose.
Rule 2. Every instance of an annotation in an OPM graph must be addressable. This allows annotations to be annotated themselves outside OPM (for instance to indicate who their asserters were).
Rule 3. An annotation consists of a subject, a property, and an object. The subject must be an OPM entity. The object is the actual annotation value. The property represents the "kind" of annotation being attached to the subject.
Rule 4. Annotations are optional, and multiple values can be associated with a subject for the same property. (How was this described in DC?)
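As an illustration only, the four rules can be read as the following in-memory sketch. The class names (Entity, Annotation, AnnotationStore), the field names, and the "ann1"-style identifiers are assumptions made for this example; the proposal deliberately does not specify any encoding.

```python
from collections import defaultdict
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class Entity:
    """An OPM entity that can be annotated (Rule 1)."""
    id: str
    kind: str  # "graph", "node", "edge", or "account"


@dataclass(frozen=True)
class Annotation:
    """A subject/property/object triple with its own identifier (Rules 2 and 3)."""
    id: str
    subject: Entity
    prop: str
    obj: object


class AnnotationStore:
    """Hypothetical container; not part of the proposal."""

    def __init__(self):
        self._by_id = {}
        self._index = defaultdict(list)
        self._ids = count(1)

    def annotate(self, subject, prop, obj):
        # Rules 1 and 3: the subject must be an OPM entity, never an annotation.
        if not isinstance(subject, Entity):
            raise TypeError("the subject must be an OPM entity")
        ann = Annotation(f"ann{next(self._ids)}", subject, prop, obj)
        self._by_id[ann.id] = ann  # Rule 2: each annotation is addressable
        self._index[(subject.id, prop)].append(ann)
        return ann

    def values(self, subject, prop):
        # Rule 4: zero, one, or many values per (subject, property) pair.
        return [a.obj for a in self._index.get((subject.id, prop), [])]

    def get(self, ann_id):
        return self._by_id[ann_id]
```

Because each annotation carries its own identifier, a system outside OPM could refer to it, for instance to record who asserted it, without annotations becoming annotatable inside OPM.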
Reserved Annotations.
- property: type, object: uri. Denotes the type of an OPM entity. Types are represented by a URI.
- property: value, object: xsd:any. Denotes an XML serialization of an application value associated with an OPM entity. Such serialization should be given an XSD type.
- property: encoding, object: uri. Denotes how an XML serialization was constructed. For instance, using the Java bean serialiser, or by applying a specified transformation to the application data, e.g. anonymisation, or by passing a reference to the actual value.
- property: name, object: uri. Denotes a persistent name that can be used by OPM graph queriers to compare OPM entities. The scope of this name is intended to be global.
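A minimal sketch of how a tool might check the four reserved properties listed above. The table RESERVED and the helpers looks_like_uri and check_reserved are hypothetical names, and the URI test is deliberately rough (it only looks for a scheme); the proposal itself prescribes no validation.

```python
from urllib.parse import urlparse

# Reserved properties from the list above, mapped to the expected kind of object.
RESERVED = {
    "type": "uri",
    "value": "xsd:any",
    "encoding": "uri",
    "name": "uri",
}


def looks_like_uri(text):
    """Rough check only: a string that carries a scheme such as "http" or "urn"."""
    return isinstance(text, str) and bool(urlparse(text).scheme)


def check_reserved(prop, obj):
    """Return True if obj is acceptable as the object of prop.

    Properties outside the reserved list are left unconstrained, as is
    "value", whose object is declared as xsd:any.
    """
    if RESERVED.get(prop) == "uri":
        return looks_like_uri(obj)
    return True
```

For example, check_reserved("type", "http://example.org/types#Image") accepts the annotation, while check_reserved("name", "plain text") rejects it because the object has no URI scheme. The example.org URI is a placeholder.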
Rationale for the solution
We sought to devise a minimal annotation approach that can exploit existing annotation technologies such as RDF without any problem.
Comments
The community is invited to provide comments on this proposal.
Comment 1 by Jan Van den Bussche and Natalia Kwasnikowska
Do we want to provide for different annotations of a node or an edge for different accounts?
Comment 2 by Simon Miles
I support this change proposal. Annotations seem essential to connecting OPM graphs with all the non-causal information available. Also, they allow for profiles which rely on expressing non-causal information, but still map to the core OPM model.
One point that could perhaps be emphasised is that some annotations may actually connect two entities in the same graph, i.e. the object of an annotation on one entity is another entity in the graph. While this may be implied by not constraining the annotation objects, it might be worth making explicit to guide those wishing to express non-causal relationships in the graph, e.g. a 'has same owner' relation between two artifacts. This may help avoid OPM users pressing for additions to the core model which expand it beyond the causal graph core.
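To make the point concrete, here is a small sketch in which the object of one annotation is another entity of the same graph. The artifact identifiers, the property name hasSameOwner, and the helper entity_links are made up for illustration.

```python
# Hypothetical graph with two artifacts; the identifiers are illustrative.
artifacts = {"a1", "a2"}

# Each annotation is a (subject, property, object) triple.
annotations = [
    ("a1", "hasSameOwner", "a2"),  # object is another entity in the graph
    ("a1", "type", "http://example.org/types#Image"),  # object is a plain value
]


def entity_links(annotations, entity_ids):
    """Return the annotations whose object is itself an entity of the graph."""
    return [(s, p, o) for (s, p, o) in annotations if o in entity_ids]
```

Here entity_links(annotations, artifacts) keeps only the hasSameOwner triple, which is a non-causal relation expressed without extending the core model.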
Comment 3 by Simon Miles in reply to Comment 1
I think it makes sense for an annotation to be specific to one account an entity belongs to, especially if multiple parties provide differing accounts for one entity.
As a concrete example, if two asserters provide accounts involving the same artifact, they could assert different values for that artifact. If they do, someone reading the graph should be able to determine which account each value annotation belongs to, as the discrepancy may be illuminated by seeing how the artifact is described as being generated or used. Or, if they consider one artifact value to be an incorrect representation of what occurred, they may wish to ignore the rest of the account of which it is part.
There is a related question which I could anticipate arising, i.e. in which account is the annotation, if any? As someone remarked in the challenge workshop, annotations may be made after and by different asserters from the entities in the graph, e.g. a data format annotation of an artifact may be added by someone who has read the original graph and determined the artifact's format. Such an annotation should be to the entity in one of the accounts in which the entity was documented, as the entity only has meaning as part of an account. Therefore, the fact that an annotation is on an entity in one account should not be interpreted too strongly, i.e. we should not assume that the annotation and the entity have the same asserter or, in general, provenance. Luc has previously persuaded me that the OPM provenance of an entity being documented in the graph is outside the scope of OPM itself, but best practice in this matter may be the subject of a profile. Therefore, I would answer the question I put as: annotations are not themselves in accounts, and anything we may want to convey with that notion is out of scope of core OPM.
Comment 4 by Simon Miles
This comment is copied, with a little editing for context, from ChangeProposalRemoveProcessValues, as it is also relevant here: please see the latter page for more context.
I have a problem with entities other than artifacts having a "value". A data artifact, as a snapshot, really does seem to have a value: the data content of that artifact at that instant. It seems ambiguous to talk about the value of other entities in the same way.
If the value field of another entity is just an extensibility point, then how is it different from the entity itself? If the entity can be annotated arbitrarily, then why would you have one annotation called "value" whose object is an extensibility point, i.e. something arbitrary?
For example, if we want to express the library name of a process, why not use a "libraryName" annotation? The process is an instance of that library's execution, not the library itself, so suggesting it is the value of the process in the same way as an artifact's value seems misleading. Similarly for WSDL interface, version number, source code language, etc.
Therefore, I would argue that the domain of the "value" annotation should be restricted to artifacts.
-- JimMyers - 17 Sep 2009
It's clear that OPM is only relevant in a world where additional metadata and data can be associated with entities in an OPM graph. Challenge questions can't be answered without this. But I don't know what OPM's role should be in defining valid annotations. There are many type systems: should OPM only be usable with the one we like?
I think I see a reasonable dividing line as to when OPM really needs to specify a vocabulary: cases such as time, where OPM relations place constraints on, or allow inferences about, the values or validity of annotations in that vocabulary. I think it would be useful to explore whether this is a reasonable definition of a profile: you specify a vocabulary and the set of constraints/inferences involving core OPM that this new vocabulary requires/enables.
The corollary here is that if there are no constraints/inferences, it should not be part of the spec. I agree type is useful metadata, but I don't see any constraints/inferences related to type, so I question whether OPM needs to define what type is.
Is agreement on these types of things a part of the Challenge series but not OPM?
Comment from Joe Futrelle
Needs more work. I think this overlaps with some proposals also on the table, like using dc:identifier to annotate OPM nodes with global IDs, and should be harmonized with those proposals. I think the similarity between this proposal as written and RDF is actually a problem, because I think we need a way to associate specific annotations with accounts, and a triple doesn't contain enough information to do that. In RDF the solution, reification, is awkward, and I don't want to reproduce that problem in OPM.
Vote
The proposal is not complete yet. So vote here to indicate whether work should continue or be discontinued on annotations for OPM.
Luc Moreau, Yes, work should continue
Jim Myers, Yes - we need to clarify this edge of OPM and what's in/out
Simon Miles, Yes, some aspects as discussed above need to be clarified
Joe Futrelle, yes
Paul Groth, yes - agree that clarification is needed
NataliaKwasnikowska, yes
PaoloMissier, yes -- work not complete and subject to revisions, but important
Jan Van den Bussche, yes
EricStephan, yes
Outcome
Unanimous support: yes: 9/no: 0. Work should continue on this proposal.
-- LucMoreau - 19 Jun 2009