Skip to topic | Skip to bottom

Provenance Challenge

Challenge
Challenge.OPM1-01Review-TimelessFormalModel

Start of topic | Skip to actions
Open Provenance Model Contents
  1. Introduction
  2. Basics
  3. Overlapping and Hierarchichal Descriptions
  4. Provenance Graph Definition
  5. Timeless Formal Model
  6. Inferences
  7. Formal Model and Time Annotations
  8. Time Constraints and Inferences
  9. Support for Collections
  10. Example of Representation
  11. Conclusion
  12. Best Practice on the Use of Agensts
  13. References

5 Timeless Formal Model

Figure 8 provides a set-theoretic definition [11, 6] of the open provenance model, based on the concepts introduced so far. The model of causality we propose is timeless since time precedence does not imply causality: if a process P1 occurs before a process P2, in general, we cannot infer that P1 caused P2 to happen. However, the converse implication holds assuming time is measured according to a single clock.

Even though the provenance model is timeless, we recognize the importance of time, since time is easily observable by computer systems or users. Hence, in Section 7, we examine how the causality graph can be annotated with time. We will also specify constraints that one would expect time annotations to satisfy (in terms of monotonicity with respect to time) in sound causality graphs.

We assume the existence of a few primitive sets: identifiers for processes, artifacts and agents, roles, and accounts. These sets of identifiers provide indentifies to the corresponding entities within the scope of a given provenance graph. A given serialization will standardize on these sets, and provide concrete representations for them.

It is important to stress that the purpose of these identifiers is to define the structure of graphs: they are not meant to define identities that are persistent and reliably resolvable over time.

In the model, processes, artifacts and agents are identified by their IDs, and are associated with a value and zero or more accounts --- noted P(Account), the powerset notation. In the set-theoretic notation, identifiers map to the corresponding value and account membership. In other words, with a database perspective, elements of ProcessId, ArtifactId and AgentId are keys to processes, artifacts and agents, respectively.

The five causality edges can be easily specified by sets used, wasGeneratedBy, triggeredBy, wasDerivedFrom, and wasControlledBy making use of identifiers for artifacts, processes or agents, roles, and the associated accounts.

Finally, an OPM graph needs to identify explicitly which accounts are overlapping or refinements. For this, we use a set Overlaps enumerating lists of overlapping accounts, and a set Refines enumerating lists of refined accounts.

Timeless Causality Graph Data Model
Figure 8: Timeless Causality Graph Data Model

The model of Figure 8 specifies all the necessary building blocks for creating OPM graphs. We now revisit the definition provided by Section 4, re-examining each item, and explaining it in terms of the formal model.

  1. Accounts are elements of the set Account.
  2. All artifacts of a graph must have identifiers belonging to the set ArtifactId. A function A of type Artifact is total on the set ArtifactId. For an artifact id a, account membership is A(a).acc. In OPM, the artifact entity contains a placeholder, A(a).value, for application specific values or references to the actual piece of state.
  3. All processes of a graph must have identifiers belonging to the set ProcessId. A function P of type Process is total on the set of ProcessId. For a process id p, account memberhsip is P(p).acc. A process contains a placeholder P(p).value for application specific valuers or references to the actual process.
  4. All agents of a graph must have identifiers belonging to the set AgentId. For the total function AG, and for an agent id ag, account memberhsip is AG(ag).acc. Placeholder for the actual agent value is AG(ag).value.
  5. Equality on edges is defined as follows:

    For any used edges u1=⟨ p1,r1,a1, acc1⟩∈ Used and u2=⟨ p2,r2,a2, acc2⟩∈ Used, u1=u2 if p1=p2, a1=a2, r1=r2, acc1=acc2.

    For any wasGeneratedBy edges g1=⟨ a1,r1,p1, acc1⟩∈ WasGeneratedBy and g2=⟨ a2,r2, acc2⟩∈ Used, g1=g2 if p1=p2, a1=a2, r1=r2, acc1=acc2.

    For any wasControlledBy edges c1=⟨ p1,r1,ag1, acc1⟩∈ WasControlledBy and ag2=⟨ p2,r2,ag2, acc2⟩∈ WasControlledBy, c1=c2 if p1=p2, ag1=ag2, r1=r2, acc1=acc2.

    For any wasDerivedFrom edges d1=⟨ a1,a1, acc1⟩∈ WasDerivedFrom and d2=⟨ a2,a2, acc2⟩∈ DerivedFrom, d1=d2 if a1=a2, a1=a2, acc1=acc2.

    For any wasTriggeredBy edges t1=⟨ p1,p1, acc1⟩∈ WasTriggeredBy and t2=⟨ p2,p2, acc2⟩∈ WasTriggeredBy, t1=t2 if p1=p2, p1=p2, acc1=acc2.

  6. The model does not place any constraints on roles, beyond their membership to the set Role.
  7. We introduce a convenience function accountOfgr operating on entities of a graph gr. For a given OPM graph gr=⟨ A, P, AG, U, G, T, D, C, Ov, Re⟩, where AArtifact, PProcess, AGAgent, and UUsed, GWasGeneratedBy, TWasTriggeredBy, DWasDerivedFrom,C
    WasControlledBy, OvOverlapping, ReRefinement
    accountOfgr(p)=P(p).acc 
    accountOfgr(a)=A(a).acc 
    accountOfgr(ag)=AG(ag).acc 
    accountOfgr(u)=acc  where  u=⟨ p,r,a,acc⟩∈ U
    accountOfgr(g)=acc  where  g=⟨ a,r,p,acc⟩∈ G
    accountOfgr(t)=acc  where  t=⟨ p1,p2,acc⟩∈ T
    accountOfgr(d)=acc  where  d=⟨ a1,a2,acc⟩∈ D
    accountOfgr(c)=acc  where  c=⟨ p,r,ag,acc⟩∈ C
    We then introduce effectiveAccountOf:
    effectiveAccountOfgr(p
     =accountOfgr(p
      i,j,k accountOfgr (ui,j,k)  where  ui,j,k=⟨ p,ri,aj,acck⟩ ∈ U
      i,j,k accountOfgr (di,j,k)  where  di,j,k⟨ ai,rj,p,acck⟩ ∈ G
      i,j accountOfgr (ti,j)  where  ti,j=⟨ ppi,accj⟩ ∈ T
      i,j accountOfgr (ti,j)  where  ti,j=⟨ pip,accj⟩ ∈ T
      i,j,k accountOfgr (ci,j,k)  where  ci,j,k=⟨ p,riagj,acck⟩ ∈ C
    (It is defined similarly for artifacts and agents.)
  8. No topological restriction is placed on OPM graphs. For instance, ⟨ p,r1,a,∅⟩ ∈ U and ⟨ a,r2,p,∅⟩ ∈ G are two acceptable edges of an OPM graph, which would create a circularity.

    If gr1=⟨ A1,P1,AG1, U1,G1,T1,D1,C1, Ov1, Re1⟩ and gr2=⟨ A2,P2,AG2, U2, G2,T2,D2,C2, Ov2, Re2⟩, then gr1gr2=⟨ A1A2,P1P2,AG1AG2, U1U2,G1G2,T1T2,D1D2,C1C2, Ov1Ov2, Re1Re2⟩, where the ⊔ operator is define as: A1A2(x)=⟨ v,a1a2⟩ with A1(x)=⟨ v,a1⟩ and A2(x)=⟨ v,a2⟩.

    If gr1=⟨ A1,P1,AG1, U1,G1,T1,D1,C1, Ov1, Re1⟩ and gr2=⟨ A2,P2,AG2, U2, G2,T2,D2,C2, Ov2, Re2⟩, then gr1gr2=⟨ A1A2,P1P2,AG1AG2, U1U2,G1G2,T1T2,D1D2,C1C2, Ov1Ov2, Re1Re2⟩, where the ⊓ operator is define as: A1A2(x)=⟨ v,a1a2⟩ with A1(x)=⟨ v,a1⟩ and A2(x)=⟨ v,a2⟩.

    If gr1,gr2OPMGraph, then

    gr1⋃ gr2∈ OPMGraph

    and

    gr1⋂ gr2∈ OPMGraph.
  9. For an OPMGraph gr=⟨ A,P,AG, U,G,T,D, C, Ov, Re⟩, for an account α, view(α,gr) is ⟨ Aα,Pα,AGα, Uα,Gα,Tα,Dα,Cα,Ov, Re⟩, where:
    Aα⊆ AwithAα={ (a,acc)∈ A such that α∈ effectiveAccountOfgr(a)}
    Pα⊆ PwithPα={ (p,acc)∈ P such that α∈ effectiveAccountOfgr(p)}
    AGα⊆ AGwithAGα={ (ag,acc)∈ AG such that α∈ effectiveAccountOfgr(ag)}
    Uα⊆ UwithUα={ ⟨ p,r,a,acc⟩ ∈ U such that α∈ acc}
    Gα⊆ GwithGα={ ⟨ a,r,p,acc⟩ ∈ G such that α∈ acc}
    Tα⊆ TwithTα={ ⟨ p1,p2,acc⟩ ∈ T such that  α∈ acc}
    Dα⊆ DwithDα={ ⟨ a1,a2,acc⟩ ∈ D such that  α∈ acc}
    Cα⊆ CwithCα={ ⟨ p,ag,acc⟩ ∈ C such that  α∈ acc}
  10. A legal account view gr=⟨ A,P,AG, U,G,T,D,C,Ov, Re⟩ is such that there is no cycle in U, G, T, D and if ⟨ a1,r1,p1, acc1⟩∈ G and ⟨ a1,r2,p2, acc1⟩∈ G, then ⟨ a1,r1,p1, acc1⟩=⟨ a1,r2,p2, acc1⟩, where acc1 is a singleton.
  11. Two accounts α12 are declared to be overlapping in an OPMgraph gr=⟨ A,P, AG, U,G,T,D,C, Ov, Re⟩, if ⟨ α12⟩ ∈ Ov or ⟨ α21⟩ ∈ Ov.
  12. Two accounts α12 are declared to be legally overlapping in an OPMgraph if they are overlapping and if their respective account views ⟨ A1,P1,AG1, U1,G1,T1,D1,C1,Ov1,Re1⟩ and ⟨ A2,P2,AG2, U2,G2,T2,D2,C2,Ov2,Re2⟩ are such that
      Domain(A1)⋂ Domain(A2)≠∅
     orDomain(P1)⋂ Domain(P2)≠∅
     orDomain(AG1)⋂ Domain(AG2)≠∅.
    Hence, the overlapping relationship is reflexive, symmetric but not transitive.
  13. An account α1 is declared to refine account α2 in an OPMgraph gr=⟨ A,P, AG, U,G,T,D,C, Ov, Re⟩, if ⟨ α12⟩ ∈ Re.
  14. An account α1 is declared to be legally refining account α2 in an OPMgraph if they are overlapping and if their respective account views gr1=⟨ A1,P1,AG1, U1,G1,T1,D1,C1,Ov1,Re1⟩ and gr2=⟨ A2,P2,AG2, U2,G2, T2,D2,C2,Ov2,Re2⟩ are such that
      source(gr2)⊆ source(gr1)
     andsink(gr2)⊆ sink(gr1)

Concept is currently ill-defined. Definition remaining to be finalised. Can we define refinement just on syntactic properties of the graphs? Hence, the refinement relationship is reflexive, asymmetric and transitive.


Comments


to top


You are here: Challenge > OPM1-01Review-TimelessFormalModel

to top

Copyright © 1999-2012 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.
Ideas, requests, problems regarding TWiki? Send feedback