In progress
Provide here a description of how you have encoded the Challenge workflow.
Upload a representation of the information you captured when executing the workflow. Explain the structure (provide pointers to documents describing your schemas etc.)
A sample log of the provenance activities generated by the workflow and its services is available in notifications.xml.
The Karma service API supports two kinds of provenance retrieval: Data Provenance and Process Provenance. It also supports variations of these that retrieve RecursiveDataProvenance, DataUsage, and WorkflowTrace. Results of these provenance queries on the given workflow are attached as data_provenance.xml, process_provenance.xml, recursive_data_provenance.xml and workflow_trace.xml.
These query APIs form the building blocks for constructing the different "canonical" provenance queries in the challenge. Karma does not provide extensive support for annotations at the level of data products. We take the approach that the provenance system is not a generic metadata management system and should focus mainly on storing and retrieving provenance. In the LEAD project, where Karma is used, queries that span generic data product metadata and provenance are answered by pushing the provenance into the metadata for the data product and allowing the MyLEAD metadata management system to answer the "join" queries.
Limited support for queries over annotations is present and has been used to answer the challenge queries that involve annotations (except for #9). Some of them required us to query the provenance service's backend relational database, since queries over annotations are not yet supported through the service API.
For each query, if your system can support your query, provide a description of how you implement the query, what result is returned; otherwise, explain whether the query is in the remit of your system.
Also, make sure you complete the ProvenanceQueriesMatrix.
Teams | Q1 | Q2 | Q3 | Q4 | Q5 | Q6 | Q7 | Q8 | Q9 |
Karma team | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() |
* Complete support not available through Karma's Web-Service API. SQL query on backend database required.
The getRecursiveDataProvenance API provided by the Karma provenance service allows the retrieval of the entire data provenance history of a data product. Invoking that method with the data product ID of Atlas X Graphic (in this case, 'lead:uuid:1157946992-atlas-x.gif') returns the complete process that led to its creation. The result of the provenance query is shown in recursive_data_provenance.xml.
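The recursion that getRecursiveDataProvenance performs on the service side can be pictured as a walk up a graph of immediate data provenance records. The Python sketch below is not Karma's implementation: it assumes a hypothetical in-memory store keyed by shortened data product IDs, and apart from ConvertService and SoftmeanService the service names are illustrative.

```python
from dataclasses import dataclass, field


@dataclass
class DataProvenance:
    """Immediate provenance of one data product (hypothetical shape, not karma.xsd)."""
    data_id: str
    produced_by: str                                     # service that created the data
    using_data: list[str] = field(default_factory=list)  # inputs of that invocation


# Hypothetical stand-in for the provenance store; IDs are shortened for readability.
STORE = {
    "atlas-x.gif": DataProvenance("atlas-x.gif", "ConvertService", ["atlas-x.pgm"]),
    "atlas-x.pgm": DataProvenance("atlas-x.pgm", "SlicerService", ["atlas.hdr", "atlas.img"]),
    "atlas.hdr": DataProvenance("atlas.hdr", "SoftmeanService", ["resliced1.img"]),
    "atlas.img": DataProvenance("atlas.img", "SoftmeanService", ["resliced1.img"]),
    "resliced1.img": DataProvenance("resliced1.img", "ResliceService", ["warp1.warp"]),
}


def recursive_data_provenance(data_id: str, store: dict) -> list[DataProvenance]:
    """Return the provenance of data_id and of every ancestor data product."""
    result, seen, stack = [], set(), [data_id]
    while stack:
        current = stack.pop()
        if current in seen or current not in store:
            continue  # already visited, or a workflow input with no recorded producer
        seen.add(current)
        record = store[current]
        result.append(record)
        stack.extend(record.using_data)  # walk further up the provenance graph
    return result
```

The visited set matters because both Softmean outputs share the same ancestors; without it the shared inputs would be reported twice.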
This query is performed by the client, which first invokes the getDataProvenance method on the Karma provenance service to retrieve the immediate data provenance for Atlas X Graphic. The client then calls getDataProvenance repeatedly to move up the provenance tree until the SoftmeanService is encountered in the data provenance results. The pseudo-code for the client looks like this:
PrintRecursiveDataProvenanceUntil('lead:uuid:1157946992-atlas-x.gif', 'urn:qname:...:SoftmeanService');

void PrintRecursiveDataProvenanceUntil(DataProductID dataProduct, URI processID)
  1. let $dataList := [dataProduct]
  2. while ($dataList != empty) do
     a. $dataProvenance = karma.getDataProvenance($dataList[0])   // get data provenance for this level
     b. Print $dataProvenance; $dataList.delete(0)                 // print process information & remove data from list
     c. if ($dataProvenance.getProducedBy() == processID) break;   // found Softmean. Stop.
     d. foreach ($inputData in $dataProvenance.getUsingData()) do  // get input data used by this data product; recurse up the tree using iteration
        i. $dataList.add($inputData)
  3. End
The results of this operation are shown in query2.txt.
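A runnable Python rendering of the client loop above, written against a hypothetical in-memory stand-in for karma.getDataProvenance (the IDs and most service names are illustrative). One deliberate difference from the pseudo-code: instead of a global `break`, it stops only the branch that reached Softmean, so that both Softmean outputs (header and image) are reported.

```python
from collections import deque
from typing import NamedTuple


class DataProvenance(NamedTuple):
    produced_by: str   # service that created the data product
    using_data: tuple  # data products consumed by that invocation


# Hypothetical stand-in for karma.getDataProvenance(); IDs are shortened for readability.
STORE = {
    "atlas-x.gif": DataProvenance("ConvertService", ("atlas-x.pgm",)),
    "atlas-x.pgm": DataProvenance("SlicerService", ("atlas.hdr", "atlas.img")),
    "atlas.hdr": DataProvenance("SoftmeanService", ("resliced1.img",)),
    "atlas.img": DataProvenance("SoftmeanService", ("resliced1.img",)),
    "resliced1.img": DataProvenance("ResliceService", ("warp1.warp",)),
}


def data_provenance_until(data_id, stop_process, get_data_provenance):
    """Breadth-first walk up the provenance tree, not recursing past stop_process."""
    collected, queue, seen = [], deque([data_id]), {data_id}
    while queue:
        current = queue.popleft()
        prov = get_data_provenance(current)
        if prov is None:
            continue  # workflow input: no producer recorded
        collected.append((current, prov))
        if prov.produced_by == stop_process:
            continue  # reached Softmean on this branch; do not look further up it
        for parent in prov.using_data:
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return collected
```

Passing `STORE.get` as `get_data_provenance` keeps the traversal independent of how provenance is fetched; a real client would pass a function that calls the service.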
This query differs from #2 in that the provenance levels are specified relative to the file, instead of explicitly by naming 'Softmean'. The getRecursiveDataProvenance API in the Karma provenance service has an optional parameter that specifies the depth of recursion. By passing a recursion level of 3 along with the data product ID of Atlas X Graphic (in this case, 'lead:uuid:1157946992-atlas-x.gif'), it is possible to retrieve the data provenance for stages 3, 4, and 5. The result of the provenance query is shown in query3.xml.
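The effect of the recursion-depth parameter can be illustrated with a level-by-level walk. This is a sketch under assumptions, not the service's code: the store is a hypothetical mapping from data product ID to (producing service, inputs), and level 1 is taken to be the producer of the requested data product itself.

```python
# Hypothetical stand-in for karma.getDataProvenance(): data ID -> (producing service, inputs).
STORE = {
    "atlas-x.gif": ("ConvertService", ["atlas-x.pgm"]),
    "atlas-x.pgm": ("SlicerService", ["atlas.hdr", "atlas.img"]),
    "atlas.hdr": ("SoftmeanService", ["resliced1.img"]),
    "atlas.img": ("SoftmeanService", ["resliced1.img"]),
    "resliced1.img": ("ResliceService", ["warp1.warp"]),
}


def data_provenance_to_depth(data_id, max_depth, get_data_provenance):
    """Return one list of (data ID, producing service) pairs per provenance level."""
    levels, frontier, seen = [], [data_id], {data_id}
    for _ in range(max_depth):
        level, next_frontier = [], []
        for current in frontier:
            prov = get_data_provenance(current)
            if prov is None:
                continue  # workflow input: no producer recorded
            level.append((current, prov[0]))
            for parent in prov[1]:
                if parent not in seen:
                    seen.add(parent)
                    next_frontier.append(parent)
        if not level:
            break  # ran out of recorded ancestors before reaching max_depth
        levels.append(level)
        frontier = next_frontier
    return levels
```

With a depth of 3 on the Atlas X graphic, the three levels correspond to convert (stage 5), slicer (stage 4) and softmean (stage 3).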
The Karma provenance service is primarily intended as a provenance recording and querying system, and has only limited capability for recording generic metadata and annotations. Provenance activities can carry annotations, and the relevant activities also contain the messages exchanged between service and client to perform an operation. These activities are recorded in a relational database, and free-text queries over the annotations are possible using SQL. Direct SQL queries are currently not exposed to the client, but the provenance service can answer these queries as follows:
SELECT invokee.workflow_id, invokee.service_id, invokee.workflow_node_id, invokee.workflow_timestep,
       invoker.workflow_id, invoker.service_id, invoker.workflow_node_id, invoker.workflow_timestep
FROM invocation_state_table invocation, entity_table invokee, entity_table invoker, notification_table notifications
WHERE invokee.entity_id = invocation.invokee_id
  AND invoker.entity_id = invocation.invoker_id
  AND notifications.source_id = invocation.invokee_id
  AND notifications.notification_type = 'ServiceInvoked'
  AND invokee.service_id = 'urn:qname:http://www.extreme.indiana.edu/karma/challenge06:AlignWarpService'
  AND notifications.notification_xml LIKE '%<ModelMenuNumber>12</ModelMenuNumber>%'
  AND DAYOFWEEK(invocation.request_receive_time) = 2;  -- 1=Sunday, 2=Monday, ...

In our example (assuming the workflow had been run on a Monday; it was actually run on a Sunday), this query returns:
Entity | workflow_id | service_id | workflow_node_id | workflow_timestep |
Invokee 1 | 'tag:gpel.leadproject.org,2006:69B/ProvenanceChallengeBrainWorkflow17/instance1' | 'urn:qname:http://www.extreme.indiana.edu/karma/challenge06:AlignWarpService' | 'AlignWarpService' | 6 |
Invokee 2 | 'tag:gpel.leadproject.org,2006:69B/ProvenanceChallengeBrainWorkflow17/instance1' | 'urn:qname:http://www.extreme.indiana.edu/karma/challenge06:AlignWarpService' | 'AlignWarpService_2' | 8 |
Invokee 3 | 'tag:gpel.leadproject.org,2006:69B/ProvenanceChallengeBrainWorkflow17/instance1' | 'urn:qname:http://www.extreme.indiana.edu/karma/challenge06:AlignWarpService' | 'AlignWarpService_3' | 10 |
Invokee 4 | 'tag:gpel.leadproject.org,2006:69B/ProvenanceChallengeBrainWorkflow17/instance1' | 'urn:qname:http://www.extreme.indiana.edu/karma/challenge06:AlignWarpService' | 'AlignWarpService_4' | 12 |
Invoker | - | 'tag:gpel.leadproject.org,2006:69B/ProvenanceChallengeBrainWorkflow17/instance1' | - | - |
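To make the joins above concrete, here is a small sketch that runs an equivalent query with Python's sqlite3 module. It assumes a simplified, hypothetical subset of the backend schema with made-up rows, and swaps MySQL's DAYOFWEEK for SQLite's strftime('%w', ...), which numbers days 0=Sunday, 1=Monday.

```python
import sqlite3

SERVICE = "urn:qname:http://www.extreme.indiana.edu/karma/challenge06:AlignWarpService"

# Simplified, hypothetical subset of the backend schema (only the columns the query touches).
SCHEMA = """
CREATE TABLE entity_table (entity_id INTEGER PRIMARY KEY, workflow_id TEXT,
    service_id TEXT, workflow_node_id TEXT, workflow_timestep INTEGER);
CREATE TABLE invocation_state_table (invokee_id INTEGER, invoker_id INTEGER,
    request_receive_time TEXT);
CREATE TABLE notification_table (source_id INTEGER, notification_type TEXT,
    notification_xml TEXT);
"""

# Same joins as the query above; strftime('%w') replaces MySQL's DAYOFWEEK.
QUERY = """
SELECT invokee.workflow_node_id, invokee.workflow_timestep, invoker.service_id
FROM invocation_state_table invocation
JOIN entity_table invokee ON invokee.entity_id = invocation.invokee_id
JOIN entity_table invoker ON invoker.entity_id = invocation.invoker_id
JOIN notification_table notifications ON notifications.source_id = invocation.invokee_id
WHERE notifications.notification_type = 'ServiceInvoked'
  AND invokee.service_id = ?
  AND notifications.notification_xml LIKE '%<ModelMenuNumber>12</ModelMenuNumber>%'
  AND strftime('%w', invocation.request_receive_time) = '1'  -- 0=Sunday, 1=Monday
"""


def build_demo_db() -> sqlite3.Connection:
    """Populate an in-memory database with made-up rows for illustration."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO entity_table VALUES (?, ?, ?, ?, ?)", [
        (1, None, "wf-instance1", None, None),  # the workflow itself (invoker)
        (2, "wf-instance1", SERVICE, "AlignWarpService", 6),
        (3, "wf-instance1", SERVICE, "AlignWarpService_2", 8),
        (4, "wf-instance1", SERVICE, "AlignWarpService_3", 10),
    ])
    conn.executemany("INSERT INTO invocation_state_table VALUES (?, ?, ?)", [
        (2, 1, "2006-09-11 08:00:00"),  # Monday, -m 12
        (3, 1, "2006-09-10 08:00:00"),  # Sunday, -m 12
        (4, 1, "2006-09-11 09:00:00"),  # Monday, another model
    ])
    conn.executemany("INSERT INTO notification_table VALUES (?, ?, ?)", [
        (2, "ServiceInvoked", "<ModelMenuNumber>12</ModelMenuNumber>"),
        (3, "ServiceInvoked", "<ModelMenuNumber>12</ModelMenuNumber>"),
        (4, "ServiceInvoked", "<ModelMenuNumber>9</ModelMenuNumber>"),
    ])
    return conn


def align_warp_m12_on_monday(conn: sqlite3.Connection) -> list:
    return conn.execute(QUERY, (SERVICE,)).fetchall()
```

Only the first align_warp row satisfies all three predicates (service, model menu number, weekday); the LIKE match on the stored notification XML is what stands in for a structured annotation query.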
In the workflow we execute, the command-line applications are wrapped by shell scripts that can perform pre- and post-processing. We incorporate a call to the scanheader utility within the wrapper for align_warp and have it include the output of scanheader in the annotation of the ServiceInvoked activity. The query then becomes similar to the previous one:
SELECT invokee.workflow_id, invokee.service_id, invokee.workflow_node_id, invokee.workflow_timestep,
       invoker.workflow_id, invoker.service_id, invoker.workflow_node_id, invoker.workflow_timestep
FROM entity_table invokee, entity_table invoker, notification_table notifications, invocation_state_table invocation
WHERE invokee.entity_id = invocation.invokee_id
  AND invoker.entity_id = invocation.invoker_id
  AND notifications.source_id = invocation.invokee_id
  AND notifications.notification_type = 'ServiceInvoked'
  AND invokee.service_id = 'urn:qname:http://www.extreme.indiana.edu/karma/challenge06:AlignWarpService'
  AND notifications.notification_xml LIKE '%global_maximum=4095%'

In our example, this query returns:
Entity | workflow_id | service_id | workflow_node_id | workflow_timestep |
Invokee_1 | 'tag:gpel.leadproject.org,2006:69B/ProvenanceChallengeBrainWorkflow17/instance1' | 'urn:qname:http://www.extreme.indiana.edu/karma/challenge06:AlignWarpService' | 'AlignWarpService' | 6 |
Invokee_2 | 'tag:gpel.leadproject.org,2006:69B/ProvenanceChallengeBrainWorkflow17/instance1' | 'urn:qname:http://www.extreme.indiana.edu/karma/challenge06:AlignWarpService' | 'AlignWarpService_2' | 8 |
Invokee_3 | 'tag:gpel.leadproject.org,2006:69B/ProvenanceChallengeBrainWorkflow17/instance1' | 'urn:qname:http://www.extreme.indiana.edu/karma/challenge06:AlignWarpService' | 'AlignWarpService_3' | 10 |
Invokee_4 | 'tag:gpel.leadproject.org,2006:69B/ProvenanceChallengeBrainWorkflow17/instance1' | 'urn:qname:http://www.extreme.indiana.edu/karma/challenge06:AlignWarpService' | 'AlignWarpService_4' | 12 |
Invoker_0 | - | 'tag:gpel.leadproject.org,2006:69B/ProvenanceChallengeBrainWorkflow17/instance1' | - | - |
PrintRecursiveDataUsageFor(Invoker_0, Invokee_1, 'urn:qname:...:ConvertService');

void PrintRecursiveDataUsageFor(EntityID invoker, EntityID invokee, URI processID)
  // get initial process's provenance
  1. let $processProv := karma.getProcessProvenance(invoker, invokee)
  2. let $processList := [$processProv], $visitedDataList := [], $outputDataList := []
  // start recursing down the data usage tree iteratively
  3. while ($processList != empty) do
     a. foreach ($processProv in $processList) do
        // test if any of the processes in the current list was 'ConvertService'. If so, print its output image files.
        i. if ($processProv.getInvokee().getServiceID() == processID) Print $processProv.getProducingData()
        // add data products that were produced to the list of outputs to recurse into
        ii. Add all $processProv.getProducingData() to $outputDataList
     // we're done with these processes
     b. $processList := []
     c. foreach ($outputData in $outputDataList) do
        // skip data products whose usage has already been explored
        i. if ($outputData in $visitedDataList) continue; Add $outputData to $visitedDataList
        // get the data usage list for the output data produced
        ii. let $dataUsage := karma.getDataUsage($outputData)
        // get the process provenance for each process that used the output data and add them to process list
        iii. foreach ($usedByProcess in $dataUsage.getUsageList()) do
             - let $processProv := karma.getProcessProvenance($usedByProcess.invoker, $usedByProcess.invokee)
             - Add $processProv to $processList
     // we're done with these data
     d. $outputDataList := []
  4. End
The results of this operation are shown in query5.txt.
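A runnable sketch of the downstream walk above, assuming hypothetical in-memory stand-ins for karma.getProcessProvenance (invocation to service and produced data) and karma.getDataUsage (data product to consuming invocations); the invocation and data names are illustrative. The visited sets play the role of $visitedDataList and keep shared intermediate products from being expanded twice.

```python
from collections import deque

# Hypothetical stand-ins for karma.getProcessProvenance() and karma.getDataUsage().
PROCESSES = {  # invocation -> (service, data products it produced)
    "align_warp_1": ("AlignWarpService", ["warp1.warp"]),
    "reslice_1": ("ResliceService", ["resliced1.img"]),
    "softmean": ("SoftmeanService", ["atlas.img"]),
    "slicer_x": ("SlicerService", ["atlas-x.pgm"]),
    "convert_x": ("ConvertService", ["atlas-x.gif"]),
}
USAGE = {  # data product -> invocations that consumed it
    "warp1.warp": ["reslice_1"],
    "resliced1.img": ["softmean"],
    "atlas.img": ["slicer_x"],
    "atlas-x.pgm": ["convert_x"],
}


def downstream_outputs_of(start_process, target_service, processes, usage):
    """Walk data usage forward from start_process; return outputs of target_service."""
    found = []
    seen_processes, seen_data = {start_process}, set()
    queue = deque([start_process])
    while queue:
        service, produced = processes[queue.popleft()]
        if service == target_service:
            found.extend(produced)  # e.g. the converted atlas graphics
        for data_id in produced:
            if data_id in seen_data:
                continue  # already expanded via another branch
            seen_data.add(data_id)
            for consumer in usage.get(data_id, []):
                if consumer not in seen_processes:
                    seen_processes.add(consumer)
                    queue.append(consumer)
    return found
```

Passing "SoftmeanService" instead of "ConvertService" as `target_service` gives the variant described for query #6.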
This is a variation of queries #4 and #5. The SQL query used to retrieve the align_warp invocations that used model menu number 12 (i.e. "-m 12") is the same as the query in #4, except for the DAYOFWEEK predicate. Similarly, the client's recursive procedure to locate the output of all SoftmeanService invocations that were preceded by these align_warps is the same as the procedure outlined in query #5, with ConvertService replaced by SoftmeanService. Both are reproduced below.
SELECT invokee.workflow_id, invokee.service_id, invokee.workflow_node_id, invokee.workflow_timestep,
       invoker.workflow_id, invoker.service_id, invoker.workflow_node_id, invoker.workflow_timestep
FROM invocation_state_table invocation, entity_table invokee, entity_table invoker, notification_table notifications
WHERE invokee.entity_id = invocation.invokee_id
  AND invoker.entity_id = invocation.invoker_id
  AND notifications.source_id = invocation.invokee_id
  AND notifications.notification_type = 'ServiceInvoked'
  AND invokee.service_id = 'urn:qname:http://www.extreme.indiana.edu/karma/challenge06:AlignWarpService'
  AND notifications.notification_xml LIKE '%<ModelMenuNumber>12</ModelMenuNumber>%';
PrintRecursiveDataUsageFor(Invoker_0, Invokee_1, 'urn:qname:...:SoftmeanService');  (See Query #5 for the definition.)
The results of this operation are shown in query6.txt.
The getWorkflowTrace API of the Karma service returns the complete workflow trace for a workflow as an XML document. Given the workflow traces for two different workflows, it is possible to do a semantic "diff" of the two documents to find the differences in the processes that were invoked and the data products used and produced. The pseudo-code for printing the differences between two workflow traces is given below:
void PrintWorkflowTraceDiff(WorkflowTrace trace1, WorkflowTrace trace2)
  // Workflow trace is an extension of the process provenance document
  1. let $processProv1 := trace1 as ProcessProvenance
  2. let $processProv2 := trace2 as ProcessProvenance
  3. PrintProcessProvenanceDiff($processProv1, $processProv2)
  // Each step in the workflow trace is a process provenance document
  4. foreach ($processProv1, $processProv2 in trace1.getTraceSteps(), trace2.getTraceSteps()) do
     a. PrintProcessProvenanceDiff($processProv1, $processProv2)
  5. End

void PrintProcessProvenanceDiff(ProcessProvenance processProv1, ProcessProvenance processProv2)
  1. Print "Diff of Processes: ", processProv1.getInvokee(), processProv2.getInvokee()
  2. if (processProv1.getInvokee() != processProv2.getInvokee())
     a. Print "Invokees Differ: ", processProv1.getInvokee(), processProv2.getInvokee()
  3. if (processProv1.getInvoker() != processProv2.getInvoker())
     a. Print "Invokers Differ: ", processProv1.getInvoker(), processProv2.getInvoker()
  4. if (processProv1.getStatus() != processProv2.getStatus())
     a. Print "Process Completion Status Differ: ", processProv1.getStatus(), processProv2.getStatus()
  5. if (processProv1.getRequestReceiveTime() != processProv2.getRequestReceiveTime())
     a. Print "Invocation Times Differ: ", processProv1.getRequestReceiveTime(), processProv2.getRequestReceiveTime()
  6. foreach ($dataProd1, $dataProd2 in processProv1.getUsingData(), processProv2.getUsingData()) do
     a. PrintDataProductDiff($dataProd1, $dataProd2)
  7. foreach ($dataProd1, $dataProd2 in processProv1.getProducingData(), processProv2.getProducingData()) do
     a. PrintDataProductDiff($dataProd1, $dataProd2)
  8. End

void PrintDataProductDiff(DataProduct dataProd1, DataProduct dataProd2)
  1. if (dataProd1.getDataProductID() != dataProd2.getDataProductID())  // trivial. IDs always differ.
     a. Print "Data Product IDs Differ: ", dataProd1.getDataProductID(), dataProd2.getDataProductID()
  2. if (dataProd1.getLocation() != dataProd2.getLocation())
     a. Print "Data Product Locations Differ: ", dataProd1.getLocation(), dataProd2.getLocation()
  3. if (dataProd1.getTimestamp() != dataProd2.getTimestamp())
     a. Print "Data Product Timestamps Differ: ", dataProd1.getTimestamp(), dataProd2.getTimestamp()
  4. End
The second workflow was not run, so results for this query are not available.
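The trace diff above can be sketched in Python as follows. The record shapes are hypothetical, loosely following the accessors used in the pseudo-code, and like the pseudo-code the sketch pairs trace steps by position; unlike it, it reports steps or data products that appear in only one trace and skips the always-different data product IDs.

```python
from dataclasses import dataclass, field
from itertools import zip_longest


@dataclass
class DataProduct:
    data_id: str
    location: str
    timestamp: str


@dataclass
class ProcessProvenance:
    """One trace step (hypothetical shape, not the karma.xsd schema)."""
    invokee: str
    invoker: str
    status: str
    request_receive_time: str
    using_data: list = field(default_factory=list)
    producing_data: list = field(default_factory=list)


def diff_process(p1: ProcessProvenance, p2: ProcessProvenance) -> list[str]:
    diffs = []
    for name in ("invokee", "invoker", "status", "request_receive_time"):
        a, b = getattr(p1, name), getattr(p2, name)
        if a != b:
            diffs.append(f"{name} differs: {a!r} vs {b!r}")
    for kind in ("using_data", "producing_data"):
        for d1, d2 in zip_longest(getattr(p1, kind), getattr(p2, kind)):
            if d1 is None or d2 is None:
                diffs.append(f"{kind}: {(d1 or d2).data_id!r} appears in only one trace")
                continue
            # Data product IDs always differ between runs, so they are not compared.
            for name in ("location", "timestamp"):
                a, b = getattr(d1, name), getattr(d2, name)
                if a != b:
                    diffs.append(f"{kind} {name} differs: {a!r} vs {b!r}")
    return diffs


def diff_traces(trace1: list, trace2: list) -> list[str]:
    """Pair trace steps by position, as the pseudo-code does, and diff each pair."""
    diffs = []
    for s1, s2 in zip_longest(trace1, trace2):
        if s1 is None or s2 is None:
            diffs.append(f"step {(s1 or s2).invokee!r} appears in only one trace")
        else:
            diffs.extend(diff_process(s1, s2))
    return diffs
```

Positional pairing is simple but assumes both runs execute the same steps in the same order; matching steps by workflow node ID would be more robust when the two workflows differ structurally.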
As noted earlier, the Karma service does not support detailed annotations at the file level, deferring to an external metadata management system such as MyLEAD. However, it does allow generic annotations to be submitted as part of the provenance activities, and these can be queried. We use this facility to add metadata about the input anatomy images to the provenance activity and to query it. As with queries #4, #5 and #6, an SQL query retrieves the invocations, and we then use Karma's getProcessProvenance API to retrieve the output data products.
SELECT invokee.workflow_id, invokee.service_id, invokee.workflow_node_id, invokee.workflow_timestep,
       invoker.workflow_id, invoker.service_id, invoker.workflow_node_id, invoker.workflow_timestep
FROM invocation_state_table invocation, entity_table invokee, entity_table invoker, notification_table notifications
WHERE invokee.entity_id = invocation.invokee_id
  AND invoker.entity_id = invocation.invoker_id
  AND notifications.source_id = invocation.invokee_id
  AND notifications.notification_type = 'ServiceInvoked'
  AND invokee.service_id = 'urn:qname:http://www.extreme.indiana.edu/karma/challenge06:AlignWarpService'
  AND notifications.notification_xml LIKE '%<Center>UChicago</Center>%';
The Karma service does not support complex queries such as these on the data product annotations. One way to perform this query would have been to retrieve the annotations for atlas graphics with key studyModality having value visual or audio using a query similar to query #8 and then to filter out the keys at the client end. However, we do not expect to answer such queries through the provenance system and these will not be part of the provenance service API.
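The client-side filtering described above might look like the following sketch. It assumes the annotations have already been retrieved into a hypothetical mapping from data product ID to key-value pairs; the sample entries are made up, and this is not something the Karma service API provides.

```python
def atlas_graphics_with_modality(annotations: dict, wanted=("visual", "audio")) -> dict:
    """Return the other annotations of each item whose studyModality is in wanted."""
    result = {}
    for data_id, pairs in annotations.items():
        if pairs.get("studyModality") in wanted:
            # Drop the key used for filtering and keep every remaining annotation.
            result[data_id] = {k: v for k, v in pairs.items() if k != "studyModality"}
    return result


# Made-up sample annotations for illustration only.
SAMPLE = {
    "atlas-x.gif": {"studyModality": "visual", "center": "UChicago"},
    "atlas-y.gif": {"studyModality": "speech"},
    "atlas-z.gif": {"studyModality": "audio", "reviewer": "alice"},
}
```

Because the filter runs on the client, every candidate annotation must first be transferred from the service, which is one reason such queries are better served by a metadata catalog.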
Suggest variants of the workflow that can exhibit capabilities that your system support.
Suggest significant queries that your system can support and are not in the proposed list of queries, and how you have implemented/would implement them. These queries may be with regards to a variant of the workflow suggested above.
According to your provenance approach, you may be able to provide a categorisation of queries. Can you elaborate on the categorisation and its rationale.
If your system can be accessed live (through portal, web page, web service, or other), provide relevant information here.
Provide here further comments.
Provide here your conclusions on the challenge, and issues that you like to see discussed at a face to face meeting.
-- YogeshSimmhan - 13 Sep 2006
Attachment | Size | Date | Who | Comment |
---|---|---|---|---|
KarmaBrainAtlasWF.gif | 254.0 K | 11 Sep 2006 - 08:02 | YogeshSimmhan | Karma's Brain Atlas Workflow Composition in BPEL using XBaya |
KarmaBrainAtlasWF-bpel.xml | 31.8 K | 11 Sep 2006 - 08:06 | YogeshSimmhan | BPEL Script for Workflow |
KarmaBrainAtlasWF.xwf | 192.1 K | 11 Sep 2006 - 08:16 | YogeshSimmhan | Workflow representation that can be viewed/edited/launched from XBaya |
recursive_data_provenance.xml | 28.5 K | 12 Sep 2006 - 02:45 | YogeshSimmhan | Data Provenance retrieved recursively for a data product and its ancestral data products (Results of Query 1) |
data_provenance.xml | 1.2 K | 12 Sep 2006 - 02:45 | YogeshSimmhan | Data Provenance retrieved for a data product |
process_provenance.xml | 1.8 K | 12 Sep 2006 - 02:46 | YogeshSimmhan | Process Provenance for a single service invocation |
workflow_trace.xml | 23.1 K | 12 Sep 2006 - 02:46 | YogeshSimmhan | Workflow Trace for all invocations in a workflow |
karma.xsd | 13.1 K | 12 Sep 2006 - 02:57 | YogeshSimmhan | Karma v2.x schema describing provenance documents |
query2.txt | 5.3 K | 12 Sep 2006 - 03:45 | YogeshSimmhan | Results of Query 2 |
query3.xml | 17.3 K | 12 Sep 2006 - 03:46 | YogeshSimmhan | Results of Query 3 |
query4.txt | 7.0 K | 13 Sep 2006 - 13:44 | YogeshSimmhan | Results of Query 4 |
query5.txt | 0.7 K | 13 Sep 2006 - 13:44 | YogeshSimmhan | Results of Query 5 |
query8.txt | 0.9 K | 13 Sep 2006 - 13:46 | YogeshSimmhan | Results of Query 8 |
notifications.xml | 123.2 K | 13 Sep 2006 - 13:46 | YogeshSimmhan | Sample Provenance Activity log generated by Workflow |
karma.ppt | 904.0 K | 13 Sep 2006 - 13:47 | YogeshSimmhan | Presentation Draft |
query6.txt | 0.4 K | 13 Sep 2006 - 13:55 | YogeshSimmhan | Results of Query 6 |