WasDerivedFrom Cannot be Inferred
Authors
Luc Moreau, June 19, 2009
Subject
OPM v1.01.
Background
OPM v1.00 introduced an inference rule that allows us to infer a WasDerivedFrom edge. This inference rule was observed to be incorrect, and OPM v1.01 made this clear in rule (3) of Figure 11:
<a2,r2,p1,acc2> in WasGeneratedBy and <p1,r1,a1,acc1> in Used
----------------------------------------------------------------
<a2,a1,acc1 union acc2> in MayHaveBeenDerivedFrom
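The following minimal Python sketch applies rule (3) to sets of edges, purely to make the rule's shape concrete. The class names, field names, and the use of frozensets to hold accounts are illustrative assumptions, not part of the OPM specification. The only condition the rule checks is that both edges share the process p1. It says nothing about when a1 and a2 were generated, which is the gap discussed below.

```python
from dataclasses import dataclass

# Hypothetical in-memory representation of OPM edges (names are illustrative,
# not taken from the OPM specification). Field order mirrors the tuples above.
@dataclass(frozen=True)
class WasGeneratedBy:
    artifact: str
    role: str
    process: str
    accounts: frozenset

@dataclass(frozen=True)
class Used:
    process: str
    role: str
    artifact: str
    accounts: frozenset

@dataclass(frozen=True)
class MayHaveBeenDerivedFrom:
    effect: str
    cause: str
    accounts: frozenset

def apply_rule_3(generated, used):
    """For every pair (a2 wasGeneratedBy p1, p1 used a1) sharing the same process,
    conclude a2 mayHaveBeenDerivedFrom a1 in the union of both edges' accounts."""
    return {
        MayHaveBeenDerivedFrom(g.artifact, u.artifact, g.accounts | u.accounts)
        for g in generated
        for u in used
        if g.process == u.process  # the only condition checked: no timing information
    }

generated = {WasGeneratedBy("a2", "r2", "p1", frozenset({"acc2"}))}
used = {Used("p1", "r1", "a1", frozenset({"acc1"}))}
print(apply_rule_3(generated, used))
```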
Problem addressed
On this page, we explain why the inference rule is incorrect. We will put this page to a ballot, stating that this problem needs to be addressed.
We are not proposing a solution here.
Why is the inference rule incorrect?
According to Definition 6, a2 was generated by p1 if p1 was required to initiate its execution for a2 to be generated. Hence, a2 was generated after p1 started.
According to Definition 5, p1 used a1 if the availability of a1 was required for p1 to complete its execution. Hence, a1 must have been generated before p1 completed.
Hence, writing a1 and a2 for the times at which a1 and a2 were generated, we have: a1 < p1End, p1Start < a2, and p1Start < p1End.
These constraints relate a1 and a2 only to the start and end of p1, never to each other. Therefore, there is no guarantee that a1 was generated before a2 was generated.
Hence, we cannot infer that a2 was derived from a1 (see Definition 8).
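A concrete timeline shows that the constraints above can all hold while a2 is generated before a1. The minimal Python sketch below encodes them. The timestamps are hypothetical numbers chosen for illustration, and it assumes, as the argument above does, that Definition 8 requires a1 to have been generated before a2.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Timeline:
    """Hypothetical timestamps (arbitrary units) for one execution of p1."""
    p1_start: float
    p1_end: float
    a1_generated: float  # time at which a1 became available
    a2_generated: float  # time at which a2 was generated

def satisfies_definitions(t: Timeline) -> bool:
    """Check the constraints derived from Definitions 5 and 6."""
    used_ok = t.a1_generated < t.p1_end         # Def. 5: a1 available before p1 completed
    generated_ok = t.p1_start < t.a2_generated  # Def. 6: a2 generated after p1 started
    return used_ok and generated_ok and t.p1_start < t.p1_end

def a1_precedes_a2(t: Timeline) -> bool:
    """Ordering assumed necessary for a2 wasDerivedFrom a1 (Definition 8)."""
    return t.a1_generated < t.a2_generated

# p1 runs from 0 to 10, emits a2 at time 2, and only needs a1 at time 5.
counterexample = Timeline(p1_start=0, p1_end=10, a1_generated=5, a2_generated=2)
print(satisfies_definitions(counterexample), a1_precedes_a2(counterexample))  # True False
```

Both edges of the rule's premise are justified by this timeline, yet the derivation cannot hold, so the conclusion does not follow from the premises.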
The origin of this problem is that wasGeneratedBy and used are essentially temporal relations that set weak constraints between artifacts and the processes that used and generated them. In the above example, it is possible that an artifact a2 was generated before an artifact a1, whilst a2 was generated by p1 and p1 used a1. If so, then clearly a2 cannot be derived from a1.
Proposed solution
This page has demonstrated that the inference rule, given the definitions in the specification, is incorrect. A number of possible solutions to address the problem can be considered; they should be expressed as separate proposals.
Rationale for the solution
It is not reasonable to allow incorrect inference rules, so this problem must be addressed.
One may wonder why the definitions for used and wasGeneratedBy introduce such weak temporal constraints between artifacts and processes.
One reason is that OPM was designed to be compositional. Suppose that, at some level of abstraction, we know of two processes p1 and p2 that use and generate artifacts as follows:
a3 -> wasGeneratedBy -> p1 -> used -> a1
a4 -> wasGeneratedBy -> p2 -> used -> a2
Suppose further that, at some other level of abstraction, we know that process p consists of the two parallel processes p1 and p2. Then we want to be able to say that:
a3,a4 -> wasGeneratedBy -> p -> used -> a1,a2
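The definitions of the wasGeneratedBy and used edges allow us to express such dependencies as we move up levels of abstraction. If we had considered stricter constraints for these definitions, such as requiring a1 and a2 to be available before p1 and p2 started, respectively, it might not have been the case that both a1 and a2 were available before p was able to start. Since p starts as soon as the first of p1 and p2 starts, an artifact that only became available just before the later of the two started would satisfy the stricter constraint for its own subprocess but not for p.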
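The composition argument can be checked numerically with the minimal Python sketch below. It assumes, for illustration only, that a composite of parallel processes spans from the earliest start to the latest end of its parts. The strict variant of used is the hypothetical stricter definition discussed above, not part of OPM, and all timestamps are made up.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Interval:
    start: float
    end: float

def compose_parallel(*parts: Interval) -> Interval:
    """Assumed composition: earliest start to latest end of the subprocesses."""
    return Interval(min(p.start for p in parts), max(p.end for p in parts))

def used_weak(available_at: float, proc: Interval) -> bool:
    # Definition 5 as stated: the artifact is available before the process completes.
    return available_at < proc.end

def used_strict(available_at: float, proc: Interval) -> bool:
    # Hypothetical stricter variant: the artifact is available before the process starts.
    return available_at < proc.start

p1 = Interval(0, 10)
p2 = Interval(5, 12)
a1_available, a2_available = -1, 4  # each available just before its own subprocess starts
p = compose_parallel(p1, p2)        # Interval(start=0, end=12)

print(used_weak(a1_available, p), used_weak(a2_available, p))      # True True
print(used_strict(a1_available, p), used_strict(a2_available, p))  # True False
```

Under the weak definition, a used edge on a subprocess always lifts to the composite, because the composite ends no earlier than any of its parts. Under the strict definition, "p used a2" fails even though "p2 used a2" holds.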
Comments
Community is invited to provide comments on proposals.
Comment 1 by Simon Miles
My preference would be to remove the inference rule entirely. The special cases where the inference of derivation is possible, which I can see could be useful, can be documented in profiles about specific forms of application. This would also apply to inference of derivation with a well defined uncertainty.
And I can't see how we could get around the current weak semantics of used and wasGeneratedBy. Any stronger semantics requires a knowledge of the internals of processes which the recorder of OPM documentation simply may not have.
--
JimMyers - 17 Sep 2009
I agree the inference can be incorrect when p1 is a composite process. My concern would be that if people do not assert this relationship, OPM's ability to infer coarser representations becomes seriously compromised. I have not thought it through, but I would almost flip this around and make it an OPM requirement to remove the incorrect inference case: it is illegal to say a1-p1-a2 unless a1 was used before a2 was completely generated. Allowing someone to make a statement where the derived-from inference is not valid, at least in the sense of control flow, does not seem to add anything to OPM's capability to represent true provenance.
Comment by Paolo Missier
I am still confused both by the rule and, honestly, by the counter-example that shows its incorrectness (incidentally, shouldn't "whilst a2 was generated by p1 and p1 used a2" be "whilst a1 was generated by p1 and p1 used a2"?). I also have trouble following the reasoning here: "If we had consider stricter constraints for these definitions, such as a1 and a2 needed to be available before p1 and p2 started respectively, it may not have been the case that both a1 and a2 were available before p was able to start."
I am in favour of removing the rule.
Vote
Luc Moreau, yes, the inference is incorrect; I am in favor of ChangeProposalDeleteWasDerivedInference.
Jim Myers, yes
Paolo Missier, yes
Simon Miles, yes
Paul Groth, yes
Natalia Kwasnikowska, yes
Jan Van den Bussche, yes
Outcome
Unanimity in favour of the proposal: Yes: 7/No: 0.
We agree that the inference of an edge wasDerivedFrom is not correct.
--
LucMoreau - 15 Jun 2009
--
LucMoreau - 19 Jun 2009