Change Proposal: Allow Multiple Role Labels on Edges
Authors
SimonMiles
2009 June 19.
Subject
Core OPM specification.
Background
Currently, the OPM specification allows only one role label per used or wasGeneratedBy edge. For the reasons given below, I believe this may be detrimentally restrictive.
Problem addressed
The specification allows only one role label per used or wasGeneratedBy edge.
Proposed solution
Allow each used and wasGeneratedBy edge to be labelled with multiple roles, not just one.
Rationale for the solution
Ambiguity of Multiple Edges
A counter-argument to accepting this proposal may be that, where we want to provide multiple role labels, multiple edges, each with a different label, could be expressed in the graph. However, it is unclear whether multiple used edges, for example, also imply multiple uses of an artifact by a process. If so, this is not what we are trying to express with multiple role labels on one edge.
Multiple Ways to Name a Role
It does not seem unreasonable to expect different asserters to describe the role of the same artifact in the same process in different ways, e.g. because different systems make use of different ontologies. For example, one agent may label the role as a:divisor, while another uses b:divisor. If two accounts are merged, there will be multiple role descriptions per edge.
Multiple Roles in a Single Usage/Generation
There are cases where one asserter would wish to identify multiple independent roles of an artifact in one account. For example, a workflow engine may distinguish between the roles of 'input' and 'parameter' artifacts, but it is also useful to specify what functional relationship these artifacts have to the workflow enacted, e.g. 'dividend' and 'divisor'. These are independent roles, so it is not obviously useful to require one to be a sub-type of another, e.g. dividend as a sub-type of input.
In the case of Java-based provenance, we may describe the role of an argument to a method call as the 'nth' parameter, or by the parameter name. Given that callers may only know the order of arguments in calling (so the index matters) but that APIs may change (so the index is not reliable to interpret the role of the argument in the provenance), it could be helpful to have both.
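The Java example can be illustrated with a small Python sketch. It assumes a hypothetical edge model in which each used edge carries a set of role labels. The UsedEdge class, the role strings such as arg:0 and name:divisor, and the two versions of a divide method are invented for illustration and are not part of the OPM specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UsedEdge:
    """Hypothetical used edge carrying a set of role labels."""
    process: str
    artifact: str
    roles: frozenset


# Call recorded against an assumed version 1: divide(dividend, divisor)
v1 = [
    UsedEdge("divide#1", "x", frozenset({"arg:0", "name:dividend"})),
    UsedEdge("divide#1", "y", frozenset({"arg:1", "name:divisor"})),
]

# Call recorded against an assumed version 2 with swapped parameters:
# divide(divisor, dividend)
v2 = [
    UsedEdge("divide#2", "y", frozenset({"arg:0", "name:divisor"})),
    UsedEdge("divide#2", "x", frozenset({"arg:1", "name:dividend"})),
]


def artifacts_with_role(edges, role):
    """Return the artifacts whose edge carries the given role label."""
    return [e.artifact for e in edges if role in e.roles]


# The positional role changes meaning across versions; the named role does not.
by_index_v1 = artifacts_with_role(v1, "arg:0")
by_index_v2 = artifacts_with_role(v2, "arg:0")
by_name_v1 = artifacts_with_role(v1, "name:divisor")
by_name_v2 = artifacts_with_role(v2, "name:divisor")
```

With both labels present, a caller that only knows the argument order can still query by index, while a later reader can rely on the parameter name.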
Lack of Reason to Reject Proposal
Finally, there does not seem to be a strong counter-argument to allowing multiple roles per arc. It does not hinder interoperability: if you are checking for the existence of a role in a query, having other roles present does not restrict you. It is not compulsory, so it does not discourage adoption.
Comments
The community is invited to provide comments on proposals.
comment 1 by Luc Moreau
I think the problem is not correctly posed, and does not expose all the issues to take into consideration.
The example given above may not be correct: "e.g. different systems make use of different ontologies. For example, one agent may label the role as a:divisor, while another uses b:divisor. If two accounts are merged, there will be multiple role descriptions per edge."
Indeed, the meaning of roles (according to OPMv1.01) is defined by the process they are attached to. So, we should consider the more general problem where two asserters a and b define:
pa -> used(ra) -> aa in account acc_a
pb -> used(rb) -> ab in account acc_b
Let us assume that each process is given a type (in an ontology) and a persistent name.
pa hasType ontoa:ta
pb hasType ontob:tb
pa hasPersistentName na
pb hasPersistentName nb
The meaning of ra is given by the ontology of pa: ontoa:ra (and likewise for rb).
To understand if and when the problem that Simon describes occurs, I think we need to do a case analysis:
1. acc_a <> acc_b
If the two accounts are different, that's fine: the two used edges can coexist because they belong to different accounts.
2. acc_a = acc_b
2.a Process identities are different, so they are not merged.
na <> nb
Again, that's fine, we have two different Used edges.
2.b Process identities are the same, so they can be merged in a union operation.
na == nb (and likewise for artifact names)
2.b.1 Process has same type
ontoa:ta == ontob:tb
2.b.1.a Roles are identical
ra = rb
So fine, the union results in a single edge.
2.b.1.b Roles differ
ra <> rb
The meaning of the roles is defined by the process ontology. I believe this case is the one that Simon refers to, though I am not sure.
I claim that it is fine to have two Used edges with roles ra and rb. The ontology should resolve the "Multiple Roles in a Single Usage/Generation" issue raised by Simon.
For instance, the ontology could declare the roles divisor, dividend, 1st and 2nd, and declare that divisor and 1st are interchangeable (likewise for dividend and 2nd).
2.b.2 Process types differ
ontoa:ta <> ontob:tb
So necessarily,
ra <> rb
since roles should be understood in the context of the processes that defined them, so:
ontoa:ta:ra <> ontob:tb:rb
Hence, we can have two edges, with the respective roles. They offer two separate descriptions in terms of two different ontologies.
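The case analysis above can be sketched as a small classification function. This is only an illustration under stated assumptions: the Used record and its field names are hypothetical, and differing artifact names are treated like case 2.a, which the comment does not discuss explicitly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Used:
    """Hypothetical record for one asserted used edge."""
    account: str        # acc_a, acc_b
    process_name: str   # persistent name: na, nb
    process_type: str   # e.g. "ontoa:ta"
    role: str           # ra, rb
    artifact_name: str


def classify(a, b):
    """Return the label of the case that applies to two used edges."""
    if a.account != b.account:
        return "1"        # different accounts: edges coexist
    if a.process_name != b.process_name or a.artifact_name != b.artifact_name:
        return "2.a"      # different identities: no merge (assumption for artifacts)
    if a.process_type == b.process_type:
        if a.role == b.role:
            return "2.b.1.a"  # union yields a single edge
        return "2.b.1.b"      # two edges; the ontology relates ra and rb
    return "2.b.2"            # two edges described in two ontologies


def edges_after_union(a, b):
    """Only case 2.b.1.a collapses the two edges into one."""
    return 1 if classify(a, b) == "2.b.1.a" else 2


base = Used("acc", "n", "ontoa:ta", "ra", "art")
other_account = Used("acc2", "n", "ontoa:ta", "ra", "art")
other_process = Used("acc", "m", "ontoa:ta", "ra", "art")
other_role = Used("acc", "n", "ontoa:ta", "rb", "art")
other_type = Used("acc", "n", "ontob:tb", "rb", "art")
```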
Conclusion
I don't see the case for multiple roles on edges, given that their meaning is given by the process they relate to.
Comment 2 by Simon Miles in reply to Comment 1
I think I understand the argument, but have a couple of problems with it.
First, it states a particular view of the world, where each process is associated with one ontology. Given that OPM does not prevent multiple independent actors documenting the same process, I don't see why this would necessarily be so. Also, would it have to be a new ontology for every OPM process, i.e. one execution of a procedure, or only one for each procedure of which the process is an execution? In human documentation, people describe the same events in different terms, with those terms having different connotations. Why would this be different for OPM? Doesn't the restriction of one ontology per process require co-ordination between OPM producers beyond what the specification should be prescribing?
Second, with regard to the ontology resolving the "Multiple Roles in a Single Usage/Generation" problem, doesn't requiring this place more of a burden on the ontology developer than is reasonable? Why would it be a better solution than allowing multiple role labels? To return to the example of naming something both an 'input' (as opposed to 'parameter') and a 'dividend' (as opposed to 'divisor'), I would not expect an ontology written along with the application to say that every input is a dividend, or even that every dividend is an input, so we would require a 'dividend-and-input' concept to be created. Would this seem reasonable to users of OPM when the two concepts they wish to use already exist in the domain ontology?
Finally, I still can't think of an argument for not allowing multiple role labels.
Comment 3 by Luc Moreau
I suggest that x -(r1,r2)-> y be regarded as an abbreviation for x-r1-> y and x-r2->y. The latter is already supported by OPM (where both edges can be in a same account or in differing accounts).
If x -(r1,r2)-> y is not an abbreviation, then what would be the meaning of an OPM graph with the following edges:
- x -(r1,r2)-> y
- x-r1-> y
- x-r2-> y
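Under the abbreviation reading, the three-edge graph above reduces to the same two single-role edges. A minimal sketch, assuming edges are modelled as (source, roles, target) tuples and that duplicate single-role edges collapse under set semantics (neither is stated in the comment):

```python
def expand(edges):
    """Expand multi-role edges into a set of single-role edges.

    edges: iterable of (source, roles, target), where roles is a tuple of labels.
    Returns a set of (source, role, target) triples.
    """
    return {(s, r, t) for (s, roles, t) in edges for r in roles}


abbreviated = [("x", ("r1", "r2"), "y")]
explicit = [("x", ("r1",), "y"), ("x", ("r2",), "y")]
mixed = abbreviated + explicit  # the graph listed in the comment

expanded = expand(mixed)
```

If the multi-role edge were not an abbreviation, expand would not be a valid normalisation and the mixed graph would need a separate meaning.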
Comment 4 by Luc Moreau
It seems that this proposal is to allow "synonyms" for roles: e.g. divisor and second argument. Why not express such a capability by means of annotation?
Comment 5 by Luc Moreau
I suggest that x -(r1,r2)-> y be regarded as an abbreviation for x-r1-> y and x-r2->y. The latter is already supported by OPM (where both edges can be in the same account or in differing accounts). In that case it is up to the semantics of the process to identify the meanings of r1 and r2, and it should determine whether an artifact was used/generated multiple times or not (for r1 and r2).
However, I don't see a strong use case for this.
--
JimMyers - 17 Sep 2009
I think we introduced roles as a way to catch when there was disagreement between accounts despite agreement on what was used - i.e. a and b both used but acc1 says a is the divisor and acc2 says that b is. Thus I think we have some form of constraint that if roles are defined and different for two+ inputs, the role assignments must agree for the accounts to be in agreement.
Given the complexities that could be involved in role ontologies, I don't think OPM should delve into resolving anything but exact matches between accounts à la the current rules. If so, I don't think that allowing multiple roles for a given used edge, or multiple used edges within one account, makes sense. There's no prohibition now about differences between accounts, so that would still be a way to carry the info around.
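One possible reading of the exact-match constraint can be sketched as follows. The representation of an account as a mapping from artifact to role for a single process is an assumption made for illustration, as is the function name.

```python
def role_assignments_agree(acc1, acc2):
    """Exact-match check over artifacts for which both accounts define a role.

    acc1, acc2: dicts mapping artifact name to role label (hypothetical model).
    """
    shared = acc1.keys() & acc2.keys()
    return all(acc1[artifact] == acc2[artifact] for artifact in shared)


# Both accounts agree that a and b were used, but disagree on which is the divisor.
acc1 = {"a": "divisor", "b": "dividend"}
acc2 = {"a": "dividend", "b": "divisor"}
# An account that only assigns a role to a, consistently with acc1.
acc3 = {"a": "divisor"}

disagree = role_assignments_agree(acc1, acc2)
agree = role_assignments_agree(acc1, acc3)
```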
Comment 6 by Simon Miles in reply to comment 5 and Jim above
The main purpose of role labels I am considering in this proposal is querying: if I want to know about the provenance of the divisor in an operation, I follow the edge labelled "divisor". For the reasons above, particularly "Multiple Roles in a Single Usage/Generation", allowing only one role in one account seems restrictive and unnecessary.
With regard to putting two labels in a merged data structure, (r1,r2), this is fine (and meets the change proposal), as long as it is possible for an OPM-parsing tool to know how to separate them. If nothing is said about multiple roles in the spec and we allow the merged data structure to be used, I can't see how a query can check whether an edge has one particular label.
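The querying concern can be shown with a short sketch. It assumes edges are (process, roles, artifact) tuples and, for the merged case, that the label is stored as a comma-separated string; both representations are hypothetical.

```python
def artifacts_in_role(edges, process, role):
    """Follow the edges of a process that carry the given role label."""
    return sorted(a for (p, roles, a) in edges if p == process and role in roles)


# Roles kept as a set per edge: a query for one label ignores the others.
edges = [
    ("p", frozenset({"input", "dividend"}), "a1"),
    ("p", frozenset({"parameter", "divisor"}), "a2"),
]
divisors = artifacts_in_role(edges, "p", "divisor")

# Roles merged into one opaque label: an exact-match query finds nothing.
opaque = [("p", "parameter,divisor", "a2")]
exact_match = [a for (p, label, a) in opaque if p == "p" and label == "divisor"]

# The query only works if the tool knows the separator (an assumed convention).
split_match = [
    a for (p, label, a) in opaque if p == "p" and "divisor" in label.split(",")
]
```

The last query succeeds only because the separator is known in advance, which is the kind of convention the specification would have to state.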
Comment 7 by Paolo Missier
If the purpose of the multiple labels is to allow synonyms, as it seems from the early justification and comments and as Luc suggests in comment 4, then I would argue that synonymy issues should be resolved elsewhere, rather than at the point of use (i.e., when role labels are used).
If the terms belong to some ontology, we have standard vocabulary in place to express that they are the same. So although I can see that there may be additional, unanticipated needs for multiple labels, I would vote no because I think these needs should be discussed when they arise.
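Resolving synonymy outside the edges might look like the sketch below. The mapping table stands in for whatever equivalence statements an ontology would provide; the canonical name ex:divisor and the helper functions are invented for illustration.

```python
# Hypothetical equivalence table, maintained apart from the provenance graph.
CANONICAL = {
    "a:divisor": "ex:divisor",
    "b:divisor": "ex:divisor",
}


def canonical(role):
    """Map a role label to its canonical form, or return it unchanged."""
    return CANONICAL.get(role, role)


def artifacts_in_role(edges, process, role):
    """Query single-role edges, treating declared synonyms as the same role."""
    wanted = canonical(role)
    return sorted(
        {a for (p, r, a, _account) in edges if p == process and canonical(r) == wanted}
    )


# Each edge keeps exactly one role label, as in the current specification.
edges = [
    ("p", "a:divisor", "x", "acc_a"),
    ("p", "b:divisor", "x", "acc_b"),
]
result = artifacts_in_role(edges, "p", "b:divisor")
```

This keeps one label per edge and moves the synonym knowledge into a separate table, at the cost of requiring that table to exist.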
Vote
Luc Moreau, No, I don't see a use case for the proposal
Jim Myers, No
Simon Miles, Yes
Paul Groth, No
PaoloMissier, no
--
SimonMiles - 19 Jun 2009
Outcome
The vote is as follows: No: 4/Yes: 1.
There is no majority for this proposal. The issue, however, needs to be monitored. As we discover use cases where multiple roles may be required, we should reopen this proposal and/or find ways of solving the problem.