This page describes the Provenance Incorporating Methodology (PrIMe).
Provenance is already well understood in the study of fine art, where it refers to the trusted, documented history of an art object. Given that documented history, the object attains an authority that allows scholars to understand and appreciate its importance and context relative to other works. Art objects that do not have a trusted, proven history may be treated with some scepticism by those who study and view them. The same concept of provenance may also be applied to data and information generated within computer applications.
In general, computer applications produce data, and making an application provenance-aware allows its users to understand the provenance of that data, where provenance is understood as the process that led to the data. To determine the provenance of data, it must be possible to document an application's execution and then to perform queries over that documentation. Such documentation is called process documentation. It comprises multiple individual pieces of information, called p-assertions, which are recorded during execution and then stored and maintained in a repository called a provenance store. One difficulty remains, however: ensuring that necessary and sufficient process documentation is captured, so that queries can return a satisfactory account of a given data item's provenance. Software engineering tools such as PrIMe exist to ensure that this is achieved.
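To make the relationship between p-assertions, a provenance store and provenance queries more concrete, the following Python sketch models each p-assertion as a simple record stating that an actor produced one data item from others. The names `PAssertion`, `ProvenanceStore`, `record` and `provenance` are hypothetical illustrations, not part of any published provenance-store API, and the single record type is a deliberate simplification of real process documentation. A query for a data item's provenance walks backwards through the recorded steps.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class PAssertion:
    """Hypothetical, simplified p-assertion: `actor` produced `output` from `inputs`."""
    actor: str
    output: str
    inputs: tuple[str, ...]


class ProvenanceStore:
    """Hypothetical in-memory provenance store holding recorded p-assertions."""

    def __init__(self) -> None:
        self._assertions: list[PAssertion] = []

    def record(self, assertion: PAssertion) -> None:
        # Called during execution, as each step of the process takes place.
        self._assertions.append(assertion)

    def provenance(self, item: str) -> list[PAssertion]:
        """Query: return every recorded step that transitively led to `item`."""
        result: list[PAssertion] = []
        seen: set[str] = set()
        pending = [item]
        while pending:
            current = pending.pop()
            if current in seen:
                continue
            seen.add(current)
            for assertion in self._assertions:
                if assertion.output == current:
                    result.append(assertion)
                    # Follow the inputs backwards to find earlier steps.
                    pending.extend(assertion.inputs)
        return result


store = ProvenanceStore()
store.record(PAssertion("Aligner", "alignment", ("sequence",)))
store.record(PAssertion("Scorer", "score", ("alignment", "threshold")))
store.record(PAssertion("Reporter", "report", ("summary",)))
print([a.actor for a in store.provenance("score")])  # ['Scorer', 'Aligner']
```

The query returns only the steps connected to `score`; the unrelated `Reporter` step is ignored, which is the behaviour a provenance query over process documentation is meant to have.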
PrIMe is a tool for system developers. After asking an application's users what kinds of information they require from the application, developers apply the steps of PrIMe to modify it. In this way developers make applications provenance-aware, so that they can satisfy provenance use cases. A use case is a description of a scenario in which a user interacts with a system by performing particular functions on it; a provenance use case is a use case that requires documentation of past processes in order to achieve those functions.
The provenance architecture satisfies provenance use cases by making extra information available: documentation of past processes, and further information that can be derived from that documentation. PrIMe is a guided approach to making application information available to querying actors, by modifying an application through a series of well-specified adaptations.
When developing provenance-aware applications, PrIMe aims to fulfill the following criteria.
Usability: PrIMe is easy to apply.
Traceability: All design decisions made using PrIMe can be traced back to the requirements that informed them.
Applicability: PrIMe can be, and has been, successfully applied to several distinct applications, notably an organ transplant management application, an aerospace engineering application and a bioinformatics application.
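Traceability can be pictured as a design log in which every decision names the requirements that justify it. The sketch below assumes a hypothetical `Decision` record and an `untraceable` check; neither is part of PrIMe itself. They only illustrate what it means for a design decision to be traceable back to the use cases or information items that informed it.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Decision:
    """Hypothetical design-log entry produced while applying a PrIMe step."""
    step: str                      # PrIMe step that produced it, e.g. "2.3" or "3.1"
    description: str
    justified_by: tuple[str, ...]  # ids of the use cases / information items it serves


def untraceable(decisions: list[Decision], requirements: set[str]) -> list[Decision]:
    """Return the decisions that cannot be traced back to any known requirement."""
    return [d for d in decisions if not any(r in requirements for r in d.justified_by)]


requirements = {"UC1", "item:threshold"}
decisions = [
    Decision("2.3", "Scorer is knowledgeable about the threshold", ("item:threshold",)),
    Decision("3.1", "Record the threshold used by Scorer", ("item:threshold", "UC1")),
    Decision("3.1", "Record every log line", ()),
]
print([d.description for d in untraceable(decisions, requirements)])  # ['Record every log line']
```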
The overall structure of PrIMe is shown in Figure 1. Each oval in the diagram corresponds to a distinct step within the methodology, and the lines between steps indicate how they are related. The dashed ovals delimit the three phases of the methodology: (i) the identification of provenance use cases and the pieces of information that will be used to answer them, (ii) the decomposition of the application into a set of actors and their interactions, and (iii) the application of a set of principled adaptations to the application to ensure that the required information items are available for documentation.
Traversing these steps, PrIMe starts from the application itself. PrIMe assumes that the structure and purpose of the application are known beforehand. This does not mean that the application must already exist, but that its overall functionality has been identified and its general structure has been determined. Given this assumption, the steps through PrIMe are as follows.
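Phase 1
Step 1.1: Identify the provenance use cases to be answered.
Step 1.2: Identify the information items required to answer those use cases.
Phase 2
Step 2.1: Identify an initial set of application actors.
Step 2.2: Map out the interactions between these actors.
Step 2.3: Identify the knowledgeable actors, i.e. those that have access to the identified information items.
Phase 3
Step 3.1: Adapt the application to expose information items and add provenance functionality.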
The steps are taken as follows. First, an analysis is performed to identify the set of provenance use cases that are to be answered by the provenance architecture (Step 1.1), then the information items (pieces of information) that are required in order to answer these use case questions are identified (Step 1.2). The application structure is then examined to identify the application actors (Step 2.1), and from here the interactions between application actors are mapped out (Step 2.2), thus revealing the information flow through the application. Once this is done, it is then possible to determine which application actors have data representing the information items necessary to answer the use cases as the application is run: these actors are called knowledgeable actors and are identified in Step 2.3.
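Steps 1.2 to 2.3 can be sketched as a small computation over an interaction map. This Python sketch assumes, for illustration only, that an actor has access to an information item if the item is in its local data or is carried by a message it sends or receives. The names `Interaction` and `knowledgeable_actors`, and the example actors and use case, are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Interaction:
    """One message exchanged between two application actors (Step 2.2)."""
    sender: str
    receiver: str
    data: frozenset[str]  # information items carried by the message


def knowledgeable_actors(
    use_cases: dict[str, set[str]],
    local_data: dict[str, set[str]],
    interactions: list[Interaction],
) -> dict[str, list[str]]:
    """Map each required information item to the actors that have data representing it."""
    # Step 1.2: the information items needed to answer all the use cases.
    items = set().union(*use_cases.values())
    # Step 2.2 (assumed model): an actor sees its own data plus everything it sends or receives.
    access = {actor: set(data) for actor, data in local_data.items()}
    for message in interactions:
        access.setdefault(message.sender, set()).update(message.data)
        access.setdefault(message.receiver, set()).update(message.data)
    # Step 2.3: the knowledgeable actors for each item.
    return {item: sorted(a for a, seen in access.items() if item in seen) for item in items}


use_cases = {"Which alignment and threshold produced this score?": {"alignment", "threshold", "score"}}
local_data = {"Client": {"sequence"}, "Aligner": set(), "Scorer": {"threshold"}}
interactions = [
    Interaction("Client", "Aligner", frozenset({"sequence"})),
    Interaction("Aligner", "Scorer", frozenset({"alignment"})),
    Interaction("Scorer", "Client", frozenset({"score"})),
]
result = knowledgeable_actors(use_cases, local_data, interactions)
print(result["threshold"])  # ['Scorer']
```

An empty list for an item would indicate that no actor at the current level of granularity has access to it.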
At this point, it may become clear that the decomposition of the application into actors has not been at the right level of granularity, that is, it is still not possible to identify an actor that has access to an information item. In this case, the process of identifying actors and interactions is repeated until an actor can be located that knows about the information item in question. Finally, adaptations are introduced into the application in order to expose information items and add provenance functionality (Step 3.1). This last step gives actors the capability to record process documentation and store it in provenance stores, where querying actors can query it to answer provenance use cases.
Figure 1. The Structure of PrIMe
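The adaptation in Step 3.1 can take many forms. As one hypothetical example, the sketch below wraps an actor, written as a Python function, in a decorator that records each call's inputs and output as process documentation in a store, here just a plain list. The decorator name `provenance_aware` and the record layout are assumptions; for simplicity the wrapped actor must be called with keyword arguments so that each input is recorded under a name.

```python
from __future__ import annotations

import functools
from typing import Any, Callable


def provenance_aware(
    actor: str, store: list[dict[str, Any]]
) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Hypothetical adaptation: make an actor record what it received and produced."""
    def decorate(fn: Callable[..., Any]) -> Callable[..., Any]:
        @functools.wraps(fn)
        def wrapper(**inputs: Any) -> Any:
            output = fn(**inputs)
            # Expose the actor's information items as a piece of process documentation.
            store.append({"actor": actor, "function": fn.__name__,
                          "inputs": dict(inputs), "output": output})
            return output
        return wrapper
    return decorate


provenance_store: list[dict[str, Any]] = []  # stand-in for a real provenance store


@provenance_aware("Scorer", provenance_store)
def score(alignment: float, threshold: float) -> bool:
    # The original application logic is left unchanged.
    return alignment >= threshold


score(alignment=0.9, threshold=0.75)
print(provenance_store[0]["inputs"])  # {'alignment': 0.9, 'threshold': 0.75}
```

Wrapping leaves the actor's own logic untouched, which is one way of keeping the provenance adaptation separate from the application's original behaviour.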
Some examples of generalised use case questions are listed below.
Once information items have been identified, it is necessary to associate every information item with a particular component within the application. To achieve this, PrIMe decomposes the application into a set of actors and performs an analysis of their interactions. This approach is similar in nature to object-oriented approaches to modelling systems, which decompose applications into classes and objects.
Decomposing an application into actors follows an iterative approach, comprising the following three steps. Step 2.1: Identify an initial set of application actors. Step 2.2: Map out the interactions between these actors. Step 2.3: Identify those actors that have access to the identified information items. These steps may need to be repeated if it is discovered that, at the current level of granularity, no actor can be identified that has access to a use-case-related information item.
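The iteration over Steps 2.1 to 2.3 can be sketched as repeated refinement of a tree of actors. This minimal Python sketch assumes, hypothetically, that each `Actor` lists the information items visible in its interactions at its own level of granularity, together with the finer-grained sub-actors it can be decomposed into. The decomposition is assumed to be finite and acyclic, so the loop terminates; `Actor` and `find_knowledgeable` are illustrative names only.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Actor:
    """Hypothetical actor model used to illustrate refining the level of granularity."""
    name: str
    exposed: set[str] = field(default_factory=set)    # items visible in its interactions at this level
    parts: list[Actor] = field(default_factory=list)  # finer-grained sub-actors, if any


def find_knowledgeable(top_level: list[Actor], item: str) -> list[str]:
    """Repeat Steps 2.1-2.3, decomposing actors until one has access to `item`."""
    level = top_level
    while True:
        # Step 2.3 at the current level of granularity.
        found = [actor.name for actor in level if item in actor.exposed]
        if found:
            return found
        if not any(actor.parts for actor in level):
            return []  # cannot decompose further: no actor knows about the item
        # Steps 2.1-2.2 again: replace each actor by its sub-actors where possible.
        level = [part for actor in level for part in (actor.parts or [actor])]


workflow = Actor("Workflow", {"sequence", "score"}, [
    Actor("Aligner", {"sequence", "alignment"}),
    Actor("Scorer", {"alignment", "score", "threshold"}),
])
print(find_knowledgeable([workflow], "threshold"))  # ['Scorer']
```

Here `threshold` is internal to the coarse `Workflow` actor, so it only becomes visible once the workflow is decomposed into `Aligner` and `Scorer`.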
Some key points regarding actors are as follows.
Some key points about knowledgeable actors are as follows.
The adaptations involve the following.
PrIMe provides a step-by-step guide to making applications provenance-aware, and is vital to the development of provenance-aware applications. Application developers and users will only consider making their applications provenance-aware if they can see a clear and easy way to modify their applications to provide this functionality. Any development is a trade-off between the effort and resources required to carry it out and the gains to be made by doing so. The availability of PrIMe to developers and users of applications helps to ensure that the effort required to make applications provenance-aware is minimised.
2006