Once the ontological concepts have been identified and captured in the ontology, the abstract workflow is constructed with the WDO-It! tool by creating "instances" of the ontology concepts and connecting Data and Method concepts to specify the intended workflow behavior. In addition, abstract workflows created with WDO-It! ground Data concepts to Sources (and Sinks), which are concepts reused from the provenance component of the Proof Markup Language (PML-P). With respect to PML, Sources and Sinks are equivalent, so we refer to both as Sources. Sources represent the entities where data comes from (or where the data eventually goes). For example, a Source can be a Database, a Document, or a Human user. Sources are represented as ovals in the workflow graph.
Finally, different levels of abstraction are also supported. The first workflow represents the most abstract workflow representation of the PC3 workflow. The second workflow, on the other hand, represents a lower level of abstraction of the "PopulateDB" method shown in the first workflow.
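The structure described above can be sketched as a small data model. This is only an illustrative Python sketch, not the representation WDO-It! actually uses: the class names, the `refinement` field, and every instance name except "PopulateDB" are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Source:
    """Where data comes from or goes to (e.g. a Database, Document, or Human)."""
    name: str
    kind: str


@dataclass(frozen=True)
class Data:
    """An instance of a Data concept, optionally grounded to a Source."""
    name: str
    source: Optional[Source] = None


@dataclass
class Method:
    """An instance of a Method concept that consumes and produces Data."""
    name: str
    inputs: List[Data]
    outputs: List[Data]
    # A lower-level workflow that refines this method, if one has been authored.
    refinement: Optional["Workflow"] = None


@dataclass
class Workflow:
    name: str
    methods: List[Method] = field(default_factory=list)

    def edges(self) -> List[Tuple[str, str]]:
        """Return (from, to) name pairs linking Sources, Data, and Methods."""
        result = []
        for m in self.methods:
            for d in m.inputs:
                if d.source is not None:
                    result.append((d.source.name, d.name))
                result.append((d.name, m.name))
            for d in m.outputs:
                result.append((m.name, d.name))
                if d.source is not None:
                    result.append((d.name, d.source.name))
        return result


# Hypothetical instances; only "PopulateDB" is a name taken from the text.
csv_doc = Source("CSVDocument", "Document")
db = Source("TargetDB", "Database")
csv_data = Data("CSVData", source=csv_doc)
db_rows = Data("LoadedRows", source=db)
populate = Method("PopulateDB", inputs=[csv_data], outputs=[db_rows])
top_level = Workflow("TopLevel", methods=[populate])
```

Calling `top_level.edges()` lists the connections that would be drawn in the workflow graph. A lower level of abstraction could be attached by assigning another `Workflow` to `populate.refinement`.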
One benefit of authoring abstract workflows using WDO-It! is the ability to generate "wrappers" and "data annotators," which are modules designed to capture and encode provenance associated with an abstract workflow, during runtime and post-runtime respectively. The main distinction between the two logging methods is when the provenance is logged, which ultimately has implications for how it is logged. Certain properties of the workflow dictate when one method should be used over the other. For example, when intermediate artifacts are not persisted during execution of the workflow, a wrapper approach must be used to capture these artifacts before they are lost, as is the case when running the Java version of the PC3 workflow. In that version, the intermediate results exist only as Java objects that are removed from memory at the end of execution, so a wrapper is necessary to capture these objects during runtime before they are destroyed. This implies, however, that the workflow must be instrumented to invoke the wrapper modules, which requires alterations to an otherwise tried and tested workflow.
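To make the wrapper idea concrete, here is a minimal Python sketch of runtime capture. It is not the code WDO-It! generates: the decorator, the in-memory `provenance_log`, and the two example steps are all hypothetical, and a real wrapper would encode its records in PML rather than in dictionaries.

```python
import functools

# Hypothetical in-memory store; a real wrapper would emit PML instead.
provenance_log = []


def wrapped(method_name):
    """Decorator that records a step's inputs and output while the workflow runs."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args):
            result = func(*args)
            # Capture the intermediate result now, before it goes out of scope.
            provenance_log.append({
                "method": method_name,
                "inputs": [repr(a) for a in args],
                "output": repr(result),
            })
            return result
        return wrapper
    return decorator


# Two made-up workflow steps, instrumented to invoke the wrapper.
@wrapped("ParseRows")
def parse_rows(text):
    return [line.split(",") for line in text.splitlines()]


@wrapped("CountRows")
def count_rows(rows):
    return len(rows)


total = count_rows(parse_rows("a,b\nc,d"))
```

The sketch also shows the cost noted above: each step has to be modified (here, decorated) so that the capture code runs inside the workflow itself.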
If a workflow does not delete intermediate results, then the non-invasive "data annotation" method can be used. This module can "piece together" provenance by chaining the intermediate results based on their "wasDerivedFrom" relationship. When running the batch version of the PC3 workflow, provenance could be captured with a data annotator because the batch files do not clean up the intermediate XML files that are dumped during execution.
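The chaining step can be illustrated with a short Python sketch. It assumes, for simplicity, that each artifact is derived from at most one other artifact and that the "wasDerivedFrom" links are already known as a dictionary; the file names are hypothetical and are not the actual PC3 intermediate files.

```python
def derivation_chain(artifact, was_derived_from):
    """Follow wasDerivedFrom links from an artifact back to the original input."""
    chain = [artifact]
    seen = {artifact}
    while chain[-1] in was_derived_from:
        parent = was_derived_from[chain[-1]]
        if parent in seen:
            raise ValueError("cycle in wasDerivedFrom links at " + parent)
        chain.append(parent)
        seen.add(parent)
    return chain


# Hypothetical persisted files left behind after a run, keyed child -> parent.
was_derived_from = {
    "step3_output.xml": "step2_output.xml",
    "step2_output.xml": "step1_output.xml",
    "step1_output.xml": "input.xml",
}

chain = derivation_chain("step3_output.xml", was_derived_from)
```

Because this runs after execution and only reads files that were left on disk, the workflow itself does not need to be changed.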
It is important to note that most of the information needed to generate a fully functional wrapper or data annotator is contained in the abstract workflow. All the relationships between data, methods, and PML sources in a particular workflow are captured in WDO-It!, and this knowledge is leveraged to generate a wrapper or data annotator that needs only minor tweaks to work.
For this challenge we opted to use the batch version of the PC3 workflow and employed a wrapper approach for logging provenance, even though we could have used a data annotator. Provenance for this workflow was encoded in the Proof Markup Language (PML), the default encoding language of both the wrappers and the data annotators. Our PML-based provenance dump for the PC3 workflow can be found here. The start nodeset of the PML provenance graph can be found here.
Probe-It! consists of three primary views to accommodate the different kinds of provenance information: the result view, the global justification view, and the local information view, which show final and intermediate data, a description of the generation process as a whole, and information about a specific step in the process, respectively. Below is a partial PML trace of the PC3 workflow as visualized in Probe-It! The orange boxes on the top left and top right correspond to the workflow inputs: the XML file encoding the CSVRootPath and the JobID, respectively. The arcs correspond to the "usedBy" relationship in OPM; however, this is not an OPM graph, and in PML terms the arcs actually represent the "hasAntecedents" relation.
* Probe-It! screen shot visualizing PC3 PML:
Attachment | Size | Date | Who | Comment |
---|---|---|---|---|
PopulateDBWorkflow.JPG | 38.7 K | 04 Jun 2009 - 15:30 | PauloPinheirodaSilva | Subworkflow about the process of populating the database |
ThirdPCWorkflow.JPG | 19.7 K | 04 Jun 2009 - 15:31 | PauloPinheirodaSilva | Abstract workflow for the process of the third provenance challenge |
probeit.png | 181.6 K | 04 Jun 2009 - 18:58 | PauloPinheirodaSilva | Probe-It! screen shot visualizing PC3 PML |