Provenance Challenge

Open Provenance Model Contents
  1. Introduction
  2. Basics
  3. Overlapping and Hierarchical Descriptions
  4. Provenance Graph Definition
  5. Timeless Formal Model
  6. Inferences
  7. Formal Model and Time Annotations
  8. Time Constraints and Inferences
  9. Support for Collections
  10. Example of Representation
  11. Conclusion
  12. Best Practice on the Use of Agents
  13. References

4 Provenance Graph Definition

The open provenance model is defined according to the following rules, which we formalise in Section 5.

  1. Accounts are entities that we assume can be compared.
  2. Artifacts are identified by unique identifiers. Artifacts contain a placeholder for a domain-specific value or reference to a piece of state. Two artifacts are equal if and only if they have the same identifier (irrespective of their placeholder contents; see Footnote 2). Artifacts can optionally belong to accounts: account membership is declared by listing the accounts an artifact belongs to.
  3. Processes are identified by unique identifiers. Processes contain a placeholder for domain specific values or references. Two processes are equal if and only if they have the same identifier (irrespective of their placeholder contents). Processes can optionally belong to accounts: account membership is declared by listing the accounts a process belongs to.
  4. Agents are identified by unique identifiers. Agents contain a placeholder for domain specific values or references. Two agents are equal if and only if they have the same identifier (irrespective of their placeholder contents). Agents can optionally belong to accounts: account membership is declared by listing the accounts an agent belongs to.
  5. Edges are identified by their source, destination, and role (for those that include a role). The source and destination consist of identifiers for artifacts, processes, or agents, according to Figure 1. Edges can also optionally belong to accounts: account membership is defined by listing the accounts an edge belongs to. Structural equality applies to edges: two used, wasGeneratedBy, or wasControlledBy edges are equal if they have the same source, the same destination, the same role, and the same accounts; two wasTriggeredBy or wasDerivedFrom edges are equal if they have the same source, the same destination, and the same accounts. The meaning of roles is not defined by OPM but by application domains; OPM only uses roles syntactically (as "tags") to distinguish the involvement of artifacts in processes.
  6. Roles are mandatory in used, wasGeneratedBy, and wasControlledBy edges. The meaning of a role is defined by the semantics of the process it relates to. Role semantics are beyond the scope of OPM.
  7. To ensure that edges establish a causal connection between actual causes and effects, the model assumes that if an edge belongs to an account, then its source and destination also belong to this account. In other words, the effective account membership of an artifact, process, or agent is its declared account membership together with the account membership of the edges of which it is the source or destination.
  8. An OPM graph is a set of artifacts, processes, agents, edges, and accounts, as specified above. OPM graphs may be disconnected. The empty set is an OPM graph. A singleton containing an artifact, a process or an agent is an OPM graph. The set of OPM graphs is closed under the intersection and union operations, i.e. the intersection of two OPM graphs is an OPM graph (and likewise for union). We note at this stage that syntactically valid OPM graphs may not necessarily make sense from a provenance viewpoint. Rules below refine the OPM graph concept.
  9. A view of an OPM graph according to one account, referred to as an account view, is the set of elements whose membership contains that account: effective account membership for artifacts, processes, and agents, and account membership for edges.
  10. While cycles can be expressed in the syntax of OPM, a legal account view is defined as an acyclic account view that contains at most one wasGeneratedBy edge per artifact. This ensures that within one account, an OPM graph captures proper causal dependencies, and that a single explanation of the origin of an artifact is given.
  11. Hence, a legal OPM graph is one for which all account views are legal.
  12. Legal account views are OPM graphs. The union of two legal account views is an OPM graph (it is not necessarily a legal view since it may contain cycles). The intersection of two legal account views is a legal account view.
  13. Two account views can be declared to be overlapping to express the fact that they represent different descriptions of an execution.
  14. A declaration that two views are overlapping is legal if the views have some artifact, process or agent in common. Whilst one could infer whether two graphs actually overlap, this would typically require the graphs to be parsed fully in order to make such an inference; instead, we rely on explicit declarations of such overlapping properties to facilitate the processing and traversal of graphs.
  15. An account view v1 can be declared to be a refinement of another account view v2 to express the fact that v1 provides more detail about an execution than v2.
  16. A declaration that a view is a refinement of another is legal if the views have some common "input" artifact and "output" artifact. (Definition needs to be refined!)
  17. A provenance graph is a legal OPM graph where overlapping and refined views are legal.
  18. Edges can optionally be annotated with time information. This aspect is discussed in Section 7.
  19. A provenance graph does not need to contain time annotations.
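
Rules 7 to 11 and rule 14 can be illustrated with a minimal Python sketch. It is not part of OPM: the `Edge` class and all function names are hypothetical, nodes are represented by their identifiers only, and, since Figure 1 is not reproduced here, the sketch assumes that the source of a wasGeneratedBy edge is the generated artifact. Acyclicity is checked with Kahn's topological sort, which is one possible choice among several.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable, List, Optional, Set, Tuple


@dataclass(frozen=True)
class Edge:
    """Hypothetical edge record; nodes are referred to by identifier."""
    kind: str                      # e.g. "used", "wasGeneratedBy"
    source: str
    destination: str
    role: Optional[str] = None
    accounts: FrozenSet[str] = frozenset()


def effective_membership(declared: Dict[str, Set[str]],
                         edges: Iterable[Edge]) -> Dict[str, Set[str]]:
    """Rule 7: declared accounts plus the accounts of every incident edge."""
    result = {node: set(accs) for node, accs in declared.items()}
    for e in edges:
        for node in (e.source, e.destination):
            result.setdefault(node, set()).update(e.accounts)
    return result


def account_view(declared: Dict[str, Set[str]], edges: Iterable[Edge],
                 account: str) -> Tuple[Set[str], List[Edge]]:
    """Rule 9: the nodes and edges whose membership contains the account."""
    edges = list(edges)
    membership = effective_membership(declared, edges)
    nodes = {n for n, accs in membership.items() if account in accs}
    return nodes, [e for e in edges if account in e.accounts]


def is_legal_view(nodes: Set[str], edges: Iterable[Edge]) -> bool:
    """Rule 10: acyclic, with at most one wasGeneratedBy edge per artifact."""
    edges = set(edges)  # structurally equal edges count once
    generated = [e.source for e in edges if e.kind == "wasGeneratedBy"]
    if len(generated) != len(set(generated)):
        return False
    # Kahn's algorithm: a graph is acyclic iff every node can be removed
    # in topological order.
    indegree = {n: 0 for n in nodes}
    successors = defaultdict(list)
    for e in edges:
        indegree.setdefault(e.source, 0)
        indegree[e.destination] = indegree.get(e.destination, 0) + 1
        successors[e.source].append(e.destination)
    ready = [n for n, d in indegree.items() if d == 0]
    removed = 0
    while ready:
        node = ready.pop()
        removed += 1
        for nxt in successors[node]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)
    return removed == len(indegree)


def is_legal_graph(declared: Dict[str, Set[str]], edges: Iterable[Edge],
                   accounts: Iterable[str]) -> bool:
    """Rule 11: every account view must be legal."""
    edges = list(edges)
    return all(is_legal_view(*account_view(declared, edges, acc))
               for acc in accounts)


def is_legal_overlap(nodes1: Set[str], nodes2: Set[str]) -> bool:
    """Rule 14: an overlap declaration needs at least one common node."""
    return bool(nodes1 & nodes2)


# Usage: process p1 used artifact a1 and generated artifact a2.
declared = {"a1": {"acc1"}, "p1": set(), "a2": set()}
edges = [
    Edge("used", "p1", "a1", "in", frozenset({"acc1"})),
    Edge("wasGeneratedBy", "a2", "p1", "out", frozenset({"acc1", "acc2"})),
]
view1 = account_view(declared, edges, "acc1")
view2 = account_view(declared, edges, "acc2")
```

In this usage example, `p1` declares no account itself but acquires both `acc1` and `acc2` through its incident edges, so it appears in both views and the two views could legally be declared overlapping.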

Having defined the concept of a provenance graph, we now study its formal specification.

Footnote 2: In the Open Provenance Model, artifact identifiers are the only way to distinguish artifacts in the graph structure. Two artifacts differ if they have different ids, even though they may refer to the same application data product. Two different artifacts are therefore separate nodes in a provenance graph: they have two different computational histories. Given that an artifact represents an instantaneous state of an object, one expects the actual data for a given artifact to remain constant over time.
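
The identity rule described in this footnote, together with the structural equality of edges from rules 5 and 6, can be sketched in Python as follows. The class and field names are hypothetical and not defined by OPM; in particular, treating the edge kind as part of edge equality is an assumption of this sketch, and processes and agents would be modelled in the same way as `Artifact`.

```python
from dataclasses import dataclass, field
from typing import Any, FrozenSet, Optional

# Edge kinds for which rule 6 makes a role mandatory.
ROLE_REQUIRED = {"used", "wasGeneratedBy", "wasControlledBy"}


@dataclass(frozen=True)
class Artifact:
    """Hypothetical artifact node: equality depends on the identifier only."""
    id: str
    # compare=False excludes these fields from == and hash().
    value: Any = field(default=None, compare=False)
    accounts: FrozenSet[str] = field(default=frozenset(), compare=False)


@dataclass(frozen=True)
class Edge:
    """Hypothetical edge: every field takes part in structural equality."""
    kind: str
    source: str
    destination: str
    role: Optional[str] = None
    accounts: FrozenSet[str] = frozenset()

    def __post_init__(self) -> None:
        if self.kind in ROLE_REQUIRED and self.role is None:
            raise ValueError(f"{self.kind} edges require a role")


# Same identifier, different placeholder contents: one node.
snapshot = Artifact("a1", value=b"contents")
same_node = Artifact("a1", value=b"other contents")
# Different identifier, same contents: a separate node with its own history.
other_node = Artifact("a2", value=b"contents")

assert snapshot == same_node
assert snapshot != other_node
```

Because the placeholder is excluded from comparison, two nodes that hold identical data but carry different identifiers remain distinct, which matches the footnote's statement that they have separate computational histories.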



Copyright © 1999-2012 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.