PrIMe: The Provenance Methodology

This page describes the Provenance Incorporating Methodology (PrIMe).

Introduction

Provenance is already well understood in the study of fine art, where it refers to the trusted, documented history of an art object. Given that documented history, the object attains an authority that allows scholars to understand and appreciate its importance and context relative to other works. Art objects that do not have a trusted, proven history may be treated with some scepticism by those who study and view them. This same concept of provenance may also be applied to data and information generated within computer applications.

In general, computer applications produce data. Making an application provenance-aware allows its users to understand the provenance of their data, that is, the process that led to that data. To determine the provenance of data, it must be possible to document an application's execution and then to perform queries over that documentation. Such documentation is called process documentation. It consists of individual pieces of information, called p-assertions, which are recorded during execution and then stored and maintained in a repository called a provenance store. One difficulty remains, however: ensuring that necessary and sufficient forms of process documentation are captured, so that queries can return a satisfactory account of a given data item's provenance. It is the role of software engineering tools such as PrIMe to ensure this is achieved.
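The record-then-query idea can be sketched in a few lines of Python. This is a minimal illustration only: the names PAssertion, ProvenanceStore, record and query are hypothetical, and the field layout is an assumption rather than the p-assertion schema defined by the provenance architecture.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class PAssertion:
    """One piece of process documentation (hypothetical field layout)."""
    asserter: str      # actor that made the assertion
    interaction: str   # identifier of the interaction it documents
    content: str       # what the actor asserts about that interaction


@dataclass
class ProvenanceStore:
    """In-memory stand-in for a provenance store (assumption, not a real API)."""
    _assertions: list = field(default_factory=list)

    def record(self, assertion: PAssertion) -> None:
        # Called by actors while the application executes.
        self._assertions.append(assertion)

    def query(self, interaction: str) -> list:
        # Called afterwards to retrieve the documentation of one interaction.
        return [a for a in self._assertions if a.interaction == interaction]


store = ProvenanceStore()
store.record(PAssertion("client", "msg-1", "sent request for analysis"))
store.record(PAssertion("analyser", "msg-1", "received request for analysis"))
store.record(PAssertion("analyser", "msg-2", "returned result"))

documentation = store.query("msg-1")
```

Here the query returns the two assertions that document the hypothetical interaction msg-1, one from each actor involved. Whether a query of this kind can answer a real provenance question depends on the right assertions having been recorded in the first place, which is the gap PrIMe addresses.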

PrIMe is a tool for system developers. Developers first ask the application's users what kinds of information they require from their application, and then modify the application by applying the steps of PrIMe. By applying PrIMe, developers can make applications provenance-aware, so that they are able to satisfy provenance use cases. A use case is a description of a scenario in which a user interacts with a system by performing particular functions on that system; a provenance use case is a use case that requires documentation of past processes in order to achieve those functions.

The provenance architecture satisfies provenance use cases by making extra information available: documentation of past processes and extra information that can be derived from such documentation. PrIMe is a guided approach to making application information available to querying actors by modifying an application through a series of well-specified adaptations.

When developing provenance-aware applications, PrIMe aims to fulfill the following criteria.

Usability: PrIMe is easy to apply.

Traceability: All design decisions made using PrIMe can be traced back to the requirements that informed them.

Applicability: PrIMe can be, and has been, successfully applied to several distinct applications, notably an organ transplant management application, an aerospace engineering application and a bioinformatics application.

Structure of PrIMe

The overall structure of PrIMe is shown in Figure 1. Each oval in the diagram corresponds to a distinct step within the methodology, and the lines between the steps indicate how they are related. The dashed ovals delimit the three phases of the methodology: (i) the identification of provenance use cases and the pieces of information that will be used to answer them, (ii) the decomposition of the application into a set of actors and their interactions, and (iii) the application of a set of principled adaptations to ensure that the required information items are available for documentation.

Traversing these steps, PrIMe starts from the application itself. PrIMe assumes that the structure and purpose of the application are known beforehand. This does not mean that the application must already exist, but that the overall functionality of the application has been identified and the general structure has been determined. Given this assumption, the steps through PrIMe are as follows.

Phase 1

Step 1.1: Provenance use case analysis.
Step 1.2: Identify use case information items.

Phase 2

Step 2.1: Identify application actors.
Step 2.2: Map out actor interactions.
Step 2.3: Identify knowledgeable actors.

Phase 3

Step 3.1: Introduce application adaptations.

The steps are taken as follows. First, an analysis is performed to identify the set of provenance use cases that are to be answered by the provenance architecture (Step 1.1), then the information items (pieces of information) that are required in order to answer these use case questions are identified (Step 1.2). The application structure is then examined to identify the application actors (Step 2.1), and from here the interactions between application actors are mapped out (Step 2.2), thus revealing the information flow through the application. Once this is done, it is then possible to determine which application actors have data representing the information items necessary to answer the use cases as the application is run: these actors are called knowledgeable actors and are identified in Step 2.3.

At this point, it may become clear that the decomposition of the application into actors has not been at the right level of granularity, i.e. it is still not possible to identify an actor that has access to an information item. In this case, the process of identifying actors and interactions is repeated until an actor can be located that knows about the information item in question. Finally, adaptations are introduced into the application in order to expose information items and add provenance functionality (Step 3.1). This last step involves giving actors the capability to record process documentation so that it can be produced and stored in provenance stores to allow querying actors to perform queries on the documentation in order to answer provenance use cases.
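The repeat-until-located loop over Steps 2.1 to 2.3 can be sketched as follows. This is an illustrative reading of the text, not part of PrIMe itself: the function locate_items, the access and parts mappings, and all actor and item names are hypothetical, and the mapping of interactions (Step 2.2) is left out for brevity.

```python
def locate_items(items, actors, access, parts):
    """Refine the actor set until every information item has a knowledgeable actor.

    items  -- information items from Step 1.2
    actors -- initial actor names from Step 2.1
    access -- hypothetical map: actor name -> set of items it has access to
    parts  -- hypothetical map: composite actor name -> names of its sub-actors
              (assumed to contain no cycles)
    """
    actors = list(actors)
    while True:
        # Step 2.3: find the knowledgeable actors for each item.
        located = {
            item: [a for a in actors if item in access.get(a, set())]
            for item in items
        }
        missing = [item for item in items if not located[item]]
        if not missing:
            return actors, located

        # Granularity is too coarse: replace each composite actor by its parts.
        refined = []
        for actor in actors:
            refined.extend(parts.get(actor, [actor]))
        if refined == actors:
            # Nothing left to decompose, so the item cannot be located.
            raise LookupError(f"no actor has access to: {missing}")
        actors = refined


items = ["sample id", "alignment settings"]
access = {"user interface": {"sample id"}, "aligner": {"alignment settings"}}
parts = {"analysis service": ["aligner", "result collator"]}

final_actors, located = locate_items(
    items, ["user interface", "analysis service"], access, parts
)
```

In this example no actor in the initial decomposition has access to the alignment settings, so the composite analysis service is broken into its parts and the search is repeated, at which point the aligner is identified as the knowledgeable actor.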

Figure 1. The Structure of PrIMe

Key Concepts in PrIMe

PrIMe employs several key concepts, which we describe briefly below.

Provenance use case questions

In Phase 1 of PrIMe, the kinds of provenance-related questions to be answered about the application must be identified. These provenance use case questions determine how PrIMe will be applied by highlighting which parts of the execution need to be documented and subsequently which parts of the application must be made provenance-aware. Use cases in this sense are similar to those found in UML, i.e. descriptions of scenarios in which users interact with an application. They drive the process of making an application provenance-aware by informing application developers of the granularity of the processes to be considered and the critical information to expose.

Some examples of generalised use case questions are listed below.

What are the details of the process that produced this result?
Two processes, thought to be performing the same steps on the same inputs, have been run and produced different results. Was this because of a change in the inputs, the steps making up the process or the configuration of the process?
Did the process that produced this result use the correct types of data at each stage?
Did the process that produced this result follow the original plan?
Did the process that produced this result meet with regulatory rules?
What actions was this data used in performing, and what actions were performed on this data?
What were the settings/configuration of the services/tools/machines used in the process that produced this result?
Where data is collated from multiple processes, what were the processes that fed into the process that produced this result?
Which of these processes resulted in a satisfactory conclusion (by some criteria)?

Information items

When considering how to answer a use case, it is necessary to identify the information items that would provide answers; there may be many such items, e.g., a given result, or a sequence of decisions.

For each core provenance use case, identify the information items (pieces of information) required in order to satisfy the use case.
For each process in the system, identify the additional items of information that could be exposed and may be useful in future provenance use cases.
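One simple way to keep this association traceable is to record, for each use case, the information items it needs, and to invert that record when required. The sketch below is an assumption about how such bookkeeping might look; the use case labels, item names and the function items_to_use_cases are all hypothetical.

```python
# Hypothetical record produced by Step 1.2: use case -> information items.
use_case_items = {
    "UC1: details of the process that produced this result": [
        "result", "process steps",
    ],
    "UC2: configuration of the tools used to produce this result": [
        "result", "tool configuration",
    ],
}


def items_to_use_cases(mapping):
    """Invert the record so each item can be traced back to its use cases."""
    index = {}
    for use_case, items in mapping.items():
        for item in items:
            index.setdefault(item, []).append(use_case)
    return index


index = items_to_use_cases(use_case_items)
```

With such an index, a later decision to expose a given item can be traced back to the use cases that require it, in keeping with the traceability criterion.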

Actors

Once information items have been identified, it is necessary to associate every information item with a particular component within the application. To achieve this, PrIMe decomposes the application into a set of actors and performs an analysis of their interactions. This approach is similar in nature to object-oriented approaches to modelling systems, which decompose applications into classes and objects.

Decomposing an application into actors follows an iterative approach, comprising the following three steps. Step 2.1: Identify an initial set of application actors. Step 2.2: Map out the interactions between these actors. Step 2.3: Identify those actors that have access to the identified information items. These steps may need to be repeated if, at the current level of granularity, no actor can be identified that has access to an information item required by a use case.

Some key points regarding actors are as follows.

An actor is an entity within the application that performs actions and interacts with other actors, e.g. Web Services, components, machines or people.
One actor may be seen as being composed of other actors.
A primitive actor is one for which the designers do not know the other actors of which it is composed (or do, but the decomposition is deemed to be too detailed to be relevant).
A role is a place-holder for an actor performing a particular function, where we cannot know exactly which actor will perform that function during the application's execution.
Roles can be composite or primitive as with actors.
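The points above suggest a simple recursive data model. The following sketch is one possible representation, assuming hypothetical class names Actor and Role and invented actor names; PrIMe itself does not prescribe any particular data structure.

```python
from dataclasses import dataclass, field


@dataclass
class Actor:
    """An entity that performs actions; composite if it has parts."""
    name: str
    parts: list = field(default_factory=list)  # sub-actors, empty if primitive

    @property
    def is_primitive(self) -> bool:
        # Primitive: no further decomposition is known or deemed relevant.
        return not self.parts

    def primitives(self):
        """Yield every primitive actor at or below this one."""
        if self.is_primitive:
            yield self
        else:
            for part in self.parts:
                yield from part.primitives()


@dataclass
class Role(Actor):
    """Place-holder for whichever actor performs a function at run time."""


aligner = Actor("aligner")
reviewer = Role("reviewer")  # the concrete actor is not known at design time
service = Actor("analysis service", parts=[aligner, reviewer])

names = [a.name for a in service.primitives()]
```

Because Role reuses the Actor structure, a role can be composite or primitive in exactly the same way as an actor.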

Knowledgeable Actors

Any actor that has access to an information item is known as a knowledgeable actor and, as stated above, the aim is to associate every information item necessary to answer a use case question with such an actor.

Some key points about knowledgeable actors are as follows.

A knowledgeable actor is an actor that has access to an information item.
The primary knowledgeable actor for an information item is the primitive actor that first becomes aware of that information, which will be because:
- the actor creates the item, or
- the actor receives or observes the item from outside the application, or
- the item is a subjective assertion about another information item, e.g. a declaration that a message was received from another actor.

For some information items, there will be no primary knowledgeable actor.
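These definitions can be sketched as two small functions over a chronological list of awareness events. The event structure, the cause labels and the actor and item names are all assumptions made for illustration, and every actor in the list is assumed to be primitive. Returning None when the first awareness is not by one of the three listed causes is this sketch's own way of modelling an item with no primary knowledgeable actor.

```python
from typing import NamedTuple, Optional


class Awareness(NamedTuple):
    """Hypothetical event: an actor becomes aware of an information item."""
    actor: str
    item: str
    cause: str  # "creates", "observes-external", "asserts" or "receives-internal"


# The three ways of first becoming aware that are listed in the text.
PRIMARY_CAUSES = {"creates", "observes-external", "asserts"}


def knowledgeable_actors(events, item):
    """All actors with access to the item, in order of first awareness."""
    seen = []
    for event in events:
        if event.item == item and event.actor not in seen:
            seen.append(event.actor)
    return seen


def primary_knowledgeable_actor(events, item) -> Optional[str]:
    """The actor that first became aware of the item, if by a primary cause."""
    for event in events:  # events are assumed to be in chronological order
        if event.item == item:
            return event.actor if event.cause in PRIMARY_CAUSES else None
    return None


events = [
    Awareness("donor registry", "donor record", "observes-external"),
    Awareness("matching service", "donor record", "receives-internal"),
    Awareness("matching service", "match decision", "creates"),
]

known = knowledgeable_actors(events, "donor record")
primary = primary_knowledgeable_actor(events, "donor record")
```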

Adaptations

PrIMe provides several application adaptations that can be used to reveal information items that are currently inaccessible, and to provide modifications to actors to enable them to record process documentation.

Some key points about adaptations are as follows.

Adaptations are changes made to the flow of information in the application to ultimately expose information items to the clients who will use the documentation of process at query time.
Many adaptations cause a new actor to become knowledgeable about an information item.
Each adaptation requires a change in the application design, and many of these make use of the components of the provenance architecture.
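As a rough illustration of giving an actor the capability to record process documentation, the sketch below wraps an actor's message handler so that each call is documented. It is not one of the adaptations defined by PrIMe: the decorator provenance_aware, the list used as a stand-in provenance store, and the aligner actor are all hypothetical.

```python
import functools


def provenance_aware(actor_name, store):
    """Hypothetical adaptation: wrap a handler so each call is documented."""
    def decorate(handler):
        @functools.wraps(handler)
        def wrapper(message):
            result = handler(message)
            # Record what the actor received and produced in this interaction.
            store.append(
                {"asserter": actor_name, "received": message, "produced": result}
            )
            return result
        return wrapper
    return decorate


store = []  # stand-in for a provenance store


@provenance_aware("aligner", store)
def align(sequence):
    # Placeholder for the actor's real work.
    return sequence.upper()


output = align("acgt")
```

The handler's behaviour is unchanged, but the input and output of each call now become available for later queries. A real adaptation would also involve a change to the application design, as noted above.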

Conclusion

PrIMe provides a step-by-step guide to making applications provenance-aware, and is vital to the development of provenance-aware applications. Application developers and users will only consider making their applications provenance-aware if they can see a clear and easy way to modify their applications to provide this functionality. Any development is a trade-off between the effort and resources required to carry it out and the gains to be made by doing so. The availability of PrIMe for developers and users of applications helps to ensure that the effort required to make applications provenance-aware is minimised.

2006

Steve Munroe, Simon Miles, Victor Tan, Paul Groth, Sheng Jiang, Luc Moreau, John Ibbotson, and Javier Vázquez-Salceda. PrIMe: A Methodology for Developing Provenance-Aware Applications. Technical report, University of Southampton, 2006.