The Pre-Prototype: a proof-of-concept design for provenance
To define the concept of provenance clearly within the context of the EU PROVENANCE project, and to provide the driving impetus for the subsequent design and implementation of a provenance architecture, we have designed and developed a simple system, which we term the pre-prototype, that articulates the project’s conception of provenance. We believe that this approach clarifies the concept of provenance in the project more effectively than a purely theoretical, universal definition, which would inevitably be abstract or vague. In addition, the pre-prototype serves to explore some of the design and implementation issues of the provenance architecture that the project aims to develop within a service-oriented grid computing environment.
The pre-prototype scenario is based on a conceptualized process for Baking Victoria Sponge Cakes (BVSC). Cake baking is a widely understood and intuitive process and is thus well suited to the aims of the pre-prototype. The pre-prototype, when situated in the context of a service-oriented computing paradigm, provides a simple but comprehensive scenario for identifying the key characteristics and requirements of a provenance architecture. Figure 1 shows the service-oriented view of the BVSC process as a sequence diagram.
By analysing the BVSC process we have identified two main types of provenance in the context of a Service Oriented Architecture (SOA): interaction provenance and actor provenance. Interaction provenance focuses on the capture of an execution trace, while actor provenance concentrates on the information pertaining to participating entities. We have placed special emphasis on interaction provenance, since services are usually dynamically discovered, aggregated, executed and discontinued in a virtual organization on the Grid. In this context, information on how services are invoked, what messages are passed among them, and when they are invoked is usually required in order for a workflow result to be analysed or for a workflow to be repeated.
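The distinction between the two types of provenance can be illustrated with a minimal data-model sketch. It is not the pre-prototype's actual schema (which is defined in XML Schema): the class names, field names, service names and message contents below are all assumptions chosen for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class InteractionRecord:
    """One message passed between two services (interaction provenance)."""
    sender: str
    receiver: str
    operation: str
    message: str
    timestamp: datetime


@dataclass
class ActorRecord:
    """Information about a participating entity (actor provenance)."""
    actor: str
    state: dict


@dataclass
class ProvenanceTrace:
    """Hypothetical container holding both kinds of provenance for one run."""
    interactions: list = field(default_factory=list)
    actor_states: list = field(default_factory=list)

    def record_interaction(self, sender, receiver, operation, message):
        # Capture who invoked whom, with what message, and when.
        rec = InteractionRecord(sender, receiver, operation, message,
                                datetime.now(timezone.utc))
        self.interactions.append(rec)
        return rec

    def record_actor(self, actor, **state):
        # Capture a snapshot of information about a participating entity.
        rec = ActorRecord(actor, dict(state))
        self.actor_states.append(rec)
        return rec


# Illustrative service names; the real BVSC services may be named differently.
trace = ProvenanceTrace()
trace.record_interaction("Baker", "MixerService", "mix",
                         "<mix><ingredient>flour</ingredient></mix>")
trace.record_interaction("Baker", "OvenService", "bake",
                         "<bake><minutes>25</minutes></bake>")
trace.record_actor("OvenService", temperature_celsius=180)
```

In this sketch the ordered list of interaction records plays the role of the execution trace, while the actor records hold whatever each entity chooses to report about itself.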
In line with common practice in Web Service design, we have used XML Schema to model all data types and messages for all interactions between BVSC services, and WSDL to describe all BVSC services. UML has been used to present these data models and service interfaces graphically.
A provenance service has been designed to carry out provenance recording and storage for the BVSC application. The decision to employ a service-oriented implementation was based on several considerations. Firstly, provenance can provide more added value for complex distributed applications, which are increasingly adopting a service-oriented view for modelling and software engineering, as demonstrated in grid computing. Secondly, a service-oriented implementation of the provenance infrastructure simplifies its integration into a SOA, thus facilitating the adoption of the infrastructure in SOA-based applications. Finally, a service-oriented provenance infrastructure deploys easily into heterogeneous distributed environments, thus facilitating the access, sharing and reuse of provenance data.
We have also developed a simple query API and used it to implement some sample queries. Although these queries are relatively limited in complexity, the query implementation at this stage demonstrates two important points. The first is that provenance data can be accessed through the query algorithm we designed. The second, and more important, is that provenance data can be used to answer questions. While many different questions may arise depending on application characteristics, and there are many different ways of accessing and retrieving provenance data, the query algorithm and examples showcase the viability of provenance usage.
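As a rough illustration of how recorded provenance can answer a question, the sketch below runs two simple queries over an in-memory list of interaction records. It does not reproduce the pre-prototype's query API or algorithm; the record layout, function names, service names and sample data are assumptions.

```python
from collections import namedtuple

# Hypothetical flattened interaction record; "sequence" gives the call order.
Interaction = namedtuple("Interaction", "sequence sender receiver operation")

# Made-up sample data standing in for a recorded BVSC run.
STORE = [
    Interaction(2, "Baker", "MixerService", "mix"),
    Interaction(1, "Baker", "IngredientService", "fetchIngredients"),
    Interaction(3, "Baker", "OvenService", "bake"),
]


def services_invoked(store):
    """Answer: which services were invoked, and in what order?"""
    ordered = sorted(store, key=lambda record: record.sequence)
    return [record.receiver for record in ordered]


def invocations_of(store, service):
    """Answer: which recorded interactions targeted a given service?"""
    return [record for record in store if record.receiver == service]


print(services_invoked(STORE))
print(invocations_of(STORE, "OvenService"))
```

Both queries are plain filters and sorts over the trace. A question such as "how was this cake produced?" reduces, in this simplified view, to reading the interactions back in order.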
BVSC services have been implemented using Apache Axis for both client-side stubs and server-side skeletons. The client-side stubs act as proxies for services. The server-side skeletons are then fully implemented to form a concrete Web service. Interaction and actor provenance recording, the provenance store and the query APIs have been implemented in the PReServ software package.
PReServ provides four components: a provenance service that allows services to record provenance as well as query stored provenance; a client library to record and query provenance; an Axis handler to automatically record SOAP messages exchanged between Axis web services and Axis clients; and a sample web service demonstrating the use of the provenance service and Axis handler. We have also implemented a simple query mechanism and performed the queries discussed in the previous section against the BVSC provenance store.
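The idea behind recording messages automatically, rather than asking each service to record them explicitly, can be sketched with a small wrapper. This is only an analogy to the handler approach and uses neither the PReServ nor the Axis API: `InMemoryStore`, `recorded` and the `bake` operation are hypothetical names, and plain dictionaries stand in for SOAP messages.

```python
import functools


class InMemoryStore:
    """Hypothetical stand-in for a provenance store."""

    def __init__(self):
        self.records = []

    def record(self, **entry):
        self.records.append(entry)


def recorded(store, service_name):
    """Wrap a service operation so its request and response are recorded."""
    def decorate(operation):
        @functools.wraps(operation)
        def wrapper(request):
            # Record the incoming message before the service sees it.
            store.record(service=service_name, operation=operation.__name__,
                         direction="request", message=request)
            response = operation(request)
            # Record the outgoing message before returning it to the caller.
            store.record(service=service_name, operation=operation.__name__,
                         direction="response", message=response)
            return response
        return wrapper
    return decorate


store = InMemoryStore()


@recorded(store, "OvenService")
def bake(request):
    # The service logic itself contains no provenance code.
    return {"status": "baked", "minutes": request["minutes"]}


result = bake({"minutes": 25})
```

The point of the design is separation of concerns: the service body stays unchanged, and recording is attached at the messaging boundary.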
In the pre-prototype, we have discussed the BVSC process, service-oriented analysis, design and implementation. We have developed an implementation of a provenance architecture for recording provenance for the BVSC application, and used this recorded provenance data for answering some sample queries. In developing the pre-prototype, we have achieved three objectives. Firstly, the pre-prototype provides a proof of concept for provenance and provenance infrastructure in the specific context of the BVSC application. Secondly, it provides guidelines towards the construction of a basic provenance architecture. Finally, it demonstrates a possible design and implementation pattern for provenance-enabled applications.
The BVSC application provides a simple but typical scenario from which key concepts, system requirements and software components have been identified, as discussed above. However, this simplicity imposes some limitations on the investigation of other issues. For example, the BVSC process involves a linear sequence of services, and hence we have not considered iterative loops or parallel processing. The BVSC services are also implemented in a centralized fashion, and there are clearly additional issues of distribution and scalability to consider if a provenance infrastructure were implemented for a distributed Grid environment. The actor provenance in the BVSC application is currently vague and does not have explicit semantics associated with it. Actor provenance has to be modelled properly in order for it to be manipulated in a meaningful manner by end users. This is left for future work.
For detailed information about the pre-prototype, please read the pre-prototype report.
To get a glimpse of the implementation of the pre-prototype, please go to the EU PROVENANCE demo page.