Our OPM output is available for a successful execution and for a failed execution (in which IsExistsCSVFile fails); see the attachments at the end of this page. The output is XML using the OPM v1.01.a schema by Paul Groth and Luc Moreau. The opm2dot graph for the successful execution is also attached.
We implemented our queries in XQuery 1.0. For each query, we load the provenance XML document into
For a given detection, which CSV files contributed to it?
LoadCSVFileIntoTable tells the database to import the detections directly from a file. Since we did not instrument the database, we added an output to LoadCSVFileIntoTable, called Detections, which outputs the detection values. We can then query for a specific detection value, e.g., 261887437030025141.
    (: get the artifact id containing the detection :)
    let $artifactId := opmLib:getArtifactIdsThatContainsValue($graph, "261887437030025141")
    (: get the process that generated it :)
    let $inputs := opmLib:getImmediateAncestorUseds($graph, $artifactId)
    (: get the artifact with role FileEntry :)
    for $used in $inputs, $artifact in $graph/artifacts/artifact
    where $used/role/@value = "FileEntry" and $used/cause/@id = $artifact/@id
    return $artifact/value
The output is the FileEntry used by LoadCSVFileIntoTable. This is a composite artifact; support for accessing sub-artifacts would allow the file name to be extracted directly.
Output:
    <value>
      {Checksum = "f8f9d70711cb3a1cb8b359d99d98fa63",
       ColumnNames = {"objID", "detectID", "ippObjID", "ippDetectID", "filterID", "imageID", "obsTime",
                      "xPos", "yPos", "xPosErr", "yPosErr", "instFlux", "instFluxErr", "psfWidMajor",
                      "psfWidMinor", "psfTheta", "psfLikelihood", "psfCf", "infoFlag", "htmID", "zoneID",
                      "assocDate", "modNum", "ra", "dec", "raErr", "decErr", "cx", "cy", "cz", "peakFlux",
                      "calMag", "calMagErr", "calFlux", "calFluxErr", "calColor", "calColorErr", "sky",
                      "skyErr", "sgSep", "dataRelease"},
       FilePath = "pc3/workflows/data/J062941/P2_J062941_B001_P2fits0_20081115_P2Detection.csv",
       HeaderPath = "pc3/workflows/data/J062941/P2_J062941_B001_P2fits0_20081115_P2Detection.csv.hdr",
       RowCount = 20,
       TargetTable = "P2Detection"}
    </value>
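Until sub-artifacts can be accessed, the file name can be pulled out of the composite value by plain string matching. The Python sketch below only illustrates that idea and is not part of our XQuery implementation: the helper name `record_field` is hypothetical, and it assumes the record is serialized as `Name = "value"` pairs as in the FileEntry output above.

```python
import re

def record_field(record_text, field):
    """Return the quoted string value of `field` in a serialized record, or None."""
    # Matches e.g.  FilePath = "some/path.csv"  (only quoted string fields).
    pattern = r'\b' + re.escape(field) + r'\s*=\s*"([^"]*)"'
    match = re.search(pattern, record_text)
    return match.group(1) if match else None

# Abbreviated copy of the FileEntry value (ColumnNames and HeaderPath left out).
file_entry = (
    '{Checksum = "f8f9d70711cb3a1cb8b359d99d98fa63", '
    'FilePath = "pc3/workflows/data/J062941/'
    'P2_J062941_B001_P2fits0_20081115_P2Detection.csv", '
    'RowCount = 20, TargetTable = "P2Detection"}'
)

file_path = record_field(file_entry, "FilePath")
print(file_path)
```

Unquoted fields such as RowCount are deliberately not matched; a real sub-artifact accessor would need to understand the record's types rather than its text.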
The user considers a table to contain values they do not expect. Was the range check (IsMatchTableColumnRanges) performed for this table?
    (: find artifact containing table name :)
    let $artifactIds := opmLib:getArtifactIdsThatContainsValue($graph, 'TargetTable = "P2Detection"')
    (: find the one used by LoadCSVFileIntoTable :)
    let $artifactId := opmLib:getArtifactIdsUsedByProcessValue($graph, $artifactIds, "LoadCSVFileIntoTable")
    (: see if any descendant processes were IsMatchTableColumnRanges :)
    let $found := (for $process in opmLib:getAllDescendantProcesses($graph, $artifactId)
                   where contains($process/value, "IsMatchTableColumnRanges")
                   return $process)
    return if (count($found) = 0) then "no" else "yes"
Output:
yes
Which operation executions were strictly necessary for the Image table to contain a particular (non-computed) value?
    (: find artifacts containing the image table name :)
    let $artifactIds := opmLib:getArtifactIdsThatContainsValue($graph, 'TargetTable = "P2ImageMeta"')
    (: get the artifact id that was used by LoadCSVFileIntoTable :)
    let $id := (for $id in $artifactIds,
                    $used in $graph/causalDependencies/used,
                    $process in $graph/processes/process
                where $id = $used/cause/@id
                  and $used/effect/@id = $process/@id
                  and contains($process/value, "LoadCSVFileIntoTable")
                return $id)
    (: return all processes that led to that artifact :)
    return opmLib:getAllAncestorProcesses($graph, $id)
Output:
    <process id="_p0"> <value>.load.IsCSVReadyFileExists fire 0</value> </process>
    <process id="_p1"> <value>.load.StopOnFalse fire 0</value> </process>
    <process id="_p2"> <value>.load.ReadCSVReadyFile fire 0</value> </process>
    <process id="_p3"> <value>.load.IsMatchCSVFileTables fire 0</value> </process>
    <process id="_p4"> <value>.load.StopOnFalse2 fire 0</value> </process>
    <process id="_p5"> <value>.load.CreateEmptyLoadDB fire 0</value> </process>
    <process id="_p6"> <value>.load.Array Permute fire 0</value> </process>
    <process id="_p8"> <value>.load.ForEach.in fire 0</value> </process>
    <process id="_p27"> <value>.load.ForEach.CompositeActor.in fire 3</value> </process>
    <process id="_p28"> <value>.load.ForEach.CompositeActor.Record Disassembler fire 1</value> </process>
    <process id="_p29"> <value>.load.ForEach.CompositeActor.IsExistsCSVFile fire 1</value> </process>
    <process id="_p30"> <value>.load.ForEach.CompositeActor.StopOnFalse fire 1</value> </process>
    <process id="_p31"> <value>.load.ForEach.CompositeActor.ReadCSVFileColumnNames fire 1</value> </process>
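The source of opmLib:getAllAncestorProcesses is not listed on this page. As a rough illustration of the traversal its name implies, the Python sketch below walks backwards from an artifact through wasGeneratedBy and used relations. It is an assumption about the behaviour, not the library code: the root element name `opmGraph`, the three-step toy graph, and the function name `ancestor_process_ids` are all hypothetical, and only the element paths (`causalDependencies/used`, `cause/@id`, `effect/@id`) come from the queries above.

```python
import xml.etree.ElementTree as ET

# Hypothetical toy graph: _p0 uses a0 and generates a1; _p1 uses a1 and generates a2;
# _p2 also uses a0 but generates nothing.
OPM_XML = """
<opmGraph>
  <processes>
    <process id="_p0"><value>.load.StepA fire 0</value></process>
    <process id="_p1"><value>.load.StepB fire 0</value></process>
    <process id="_p2"><value>.load.StepC fire 0</value></process>
  </processes>
  <causalDependencies>
    <used><effect id="_p0"/><role value="in"/><cause id="a0"/></used>
    <wasGeneratedBy><effect id="a1"/><role value="out"/><cause id="_p0"/></wasGeneratedBy>
    <used><effect id="_p1"/><role value="in"/><cause id="a1"/></used>
    <wasGeneratedBy><effect id="a2"/><role value="out"/><cause id="_p1"/></wasGeneratedBy>
    <used><effect id="_p2"/><role value="in"/><cause id="a0"/></used>
  </causalDependencies>
</opmGraph>
"""

def ancestor_process_ids(graph, artifact_id):
    """Return the ids of all processes that (transitively) led to the artifact."""
    deps = graph.find("causalDependencies")
    generated_by = {}  # artifact id -> ids of processes that generated it
    used_by = {}       # process id -> ids of artifacts it used
    for wgb in deps.findall("wasGeneratedBy"):
        artifact = wgb.find("effect").get("id")
        generated_by.setdefault(artifact, []).append(wgb.find("cause").get("id"))
    for used in deps.findall("used"):
        process = used.find("effect").get("id")
        used_by.setdefault(process, []).append(used.find("cause").get("id"))

    seen = set()
    pending = [artifact_id]
    while pending:
        artifact = pending.pop()
        for process in generated_by.get(artifact, []):
            if process not in seen:  # expand each process once, so cycles terminate
                seen.add(process)
                pending.extend(used_by.get(process, []))
    return seen

graph = ET.fromstring(OPM_XML)
ids = ancestor_process_ids(graph, "a2")
# Report the processes in document order, as the XQuery output does.
ancestors = [p.get("id") for p in graph.find("processes") if p.get("id") in ids]
print(ancestors)
```

In the toy graph `_p2` is not reported, because nothing it produced contributed to `a2`.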
The workflow halts due to failing an IsMatchTableColumnRanges check. How many tables successfully loaded before the workflow halted due to a failed check?
    (: count how many times IsMatchTableColumnRangesOutput was executed. :)
    let $num := count(for $wgbs in $graph/causalDependencies/wasGeneratedBy
                      where $wgbs/role/@value = "IsMatchTableColumnRangesOutput"
                      return $wgbs)
    (: since it halted, n - 1 tables were loaded. :)
    return $num - 1
Output:
2
A CSV or header file is deleted during the workflow's execution. How much time expired between a successful IsMatchCSVFileTables test (when the file existed) and an unsuccessful IsExistsCSVFile test (when the file had been deleted)?
    (: find the wasGeneratedBy of the false output from IsExistsCSVFile :)
    let $fail := opmLib:getWasGeneratedBy($graph, "IsExistsCSVFile", "false")/time
    (: find the wasGeneratedBy of the true output from IsMatchCSVFileTables :)
    let $ok := opmLib:getWasGeneratedBy($graph, "IsMatchCSVFileTables", "true")/time
    (: return elapsed seconds :)
    let $diff := xs:time($fail/noLaterThan) - xs:time($ok/noEarlierThan)
    return $diff div xs:dayTimeDuration('PT1S')
Output:
1.562
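The arithmetic in this query takes the widest possible interval: from the earliest time the successful test could have produced its output (noEarlierThan) to the latest time the failed test could have (noLaterThan). The Python sketch below reproduces that subtraction. The two timestamps are invented for illustration and are not taken from our provenance; the function name and the assumption that times are plain `HH:MM:SS.fff` values without a time zone are also ours.

```python
from datetime import datetime

TIME_FORMAT = "%H:%M:%S.%f"  # assumed time-of-day format with fractional seconds

def elapsed_seconds(ok_no_earlier_than, fail_no_later_than):
    """Seconds between the earliest 'ok' time and the latest 'fail' time."""
    start = datetime.strptime(ok_no_earlier_than, TIME_FORMAT)
    end = datetime.strptime(fail_no_later_than, TIME_FORMAT)
    # Like subtracting two xs:time values, this is negative if `end` precedes `start`
    # (for example when the run crosses midnight).
    return (end - start).total_seconds()

# Hypothetical timestamps, 1.5 seconds apart.
elapsed = elapsed_seconds("10:15:30.250", "10:15:31.750")
print(elapsed)
```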
Determine the step where the halt occurred.
    (: get the last used or wasGeneratedBy relation :)
    let $last := $graph/causalDependencies/(used|wasGeneratedBy)[last()]
    let $processId := if (name($last) = "used") then $last/effect/@id else $last/cause/@id
    return $graph/processes/process[@id = $processId]
Output:
    <process id="_p13"> <value>.load-for-opt-query3.ForEach.CompositeActor.StopOnFalse fire 0</value> </process>
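This query treats the last used or wasGeneratedBy relation in the document as the last thing that happened, which appears to rely on the relations being written in execution order. The Python sketch below shows the same selection on a toy graph. The root element name `opmGraph`, the toy data, and the function name `last_recorded_process` are hypothetical; the rule that the process is the effect of a used relation and the cause of a wasGeneratedBy relation comes from the query above.

```python
import xml.etree.ElementTree as ET

# Hypothetical two-step graph whose last recorded relation is a `used` by _p1.
OPM_XML = """
<opmGraph>
  <processes>
    <process id="_p0"><value>.load.StepA fire 0</value></process>
    <process id="_p1"><value>.load.StepB fire 0</value></process>
  </processes>
  <causalDependencies>
    <used><effect id="_p0"/><role value="in"/><cause id="a0"/></used>
    <wasGeneratedBy><effect id="a1"/><role value="out"/><cause id="_p0"/></wasGeneratedBy>
    <used><effect id="_p1"/><role value="in"/><cause id="a1"/></used>
  </causalDependencies>
</opmGraph>
"""

def last_recorded_process(graph):
    """Return the process element named by the last used/wasGeneratedBy relation."""
    relations = [rel for rel in graph.find("causalDependencies")
                 if rel.tag in ("used", "wasGeneratedBy")]
    if not relations:
        return None
    last = relations[-1]
    # In a used relation the process is the effect; in wasGeneratedBy it is the cause.
    end = last.find("effect") if last.tag == "used" else last.find("cause")
    process_id = end.get("id")
    for process in graph.find("processes"):
        if process.get("id") == process_id:
            return process
    return None

graph = ET.fromstring(OPM_XML)
halted_at = last_recorded_process(graph)
print(halted_at.get("id"), halted_at.findtext("value"))
```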
Which steps were completed successfully before the halt occurred?
    (: get the last used or wasGeneratedBy relation :)
    let $last := $graph/causalDependencies/(used|wasGeneratedBy)[last()]
    let $artifactId := if (name($last) = "used") then $last/cause/@id else $last/effect/@id
    return opmLib:getAllAncestorProcesses($graph, $artifactId)
Output:
    <process id="_p0"> <value>.load-for-opt-query3.IsCSVReadyFileExists fire 0</value> </process>
    <process id="_p1"> <value>.load-for-opt-query3.StopOnFalse fire 0</value> </process>
    <process id="_p2"> <value>.load-for-opt-query3.ReadCSVReadyFile fire 0</value> </process>
    <process id="_p3"> <value>.load-for-opt-query3.IsMatchCSVFileTables fire 0</value> </process>
    <process id="_p4"> <value>.load-for-opt-query3.StopOnFalse2 fire 0</value> </process>
    <process id="_p5"> <value>.load-for-opt-query3.CreateEmptyLoadDB fire 0</value> </process>
    <process id="_p6"> <value>.load-for-opt-query3.Array Permute fire 0</value> </process>
    <process id="_p8"> <value>.load-for-opt-query3.ForEach.in fire 0</value> </process>
    <process id="_p10"> <value>.load-for-opt-query3.ForEach.CompositeActor.in fire 1</value> </process>
    <process id="_p11"> <value>.load-for-opt-query3.ForEach.CompositeActor.Record Disassembler fire 0</value> </process>
    <process id="_p12"> <value>.load-for-opt-query3.ForEach.CompositeActor.IsExistsCSVFileFail fire 0</value> </process>
For a workflow execution, determine the user inputs.
    (: find all artifacts in a used relation, but not in a wasGeneratedBy relation. :)
    let $used := $graph/causalDependencies/used/cause/@id
    let $wasGeneratedBy := $graph/causalDependencies/wasGeneratedBy/effect/@id
    (: find the difference :)
    let $diff := distinct-values($used[not(. = $wasGeneratedBy)])
    (: return the artifacts :)
    return $graph/artifacts/artifact[@id = $diff]
Output:
    <artifact id="0"> <value>"pc3/workflows/data/J062941"</value> </artifact>
    <artifact id="6"> <value>"J062941"</value> </artifact>
    <artifact id="8"> <value>"Record"</value> </artifact>
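The query identifies user inputs as artifacts that some process used but that no process generated. A minimal Python sketch of that set difference follows, for readers who want to check the logic without an XQuery processor. The root element name `opmGraph`, the toy graph, and the function name `user_input_ids` are hypothetical; the element paths follow the query above.

```python
import xml.etree.ElementTree as ET

# Hypothetical graph: a0 and a2 are supplied from outside, a1 is produced by _p0.
OPM_XML = """
<opmGraph>
  <artifacts>
    <artifact id="a0"><value>"input directory"</value></artifact>
    <artifact id="a1"><value>"derived value"</value></artifact>
    <artifact id="a2"><value>"parameter"</value></artifact>
  </artifacts>
  <causalDependencies>
    <used><effect id="_p0"/><role value="in"/><cause id="a0"/></used>
    <wasGeneratedBy><effect id="a1"/><role value="out"/><cause id="_p0"/></wasGeneratedBy>
    <used><effect id="_p1"/><role value="in"/><cause id="a1"/></used>
    <used><effect id="_p1"/><role value="param"/><cause id="a2"/></used>
    <used><effect id="_p1"/><role value="again"/><cause id="a0"/></used>
  </causalDependencies>
</opmGraph>
"""

def user_input_ids(graph):
    """Ids of artifacts that appear in a used relation but in no wasGeneratedBy relation."""
    deps = graph.find("causalDependencies")
    used = [u.find("cause").get("id") for u in deps.findall("used")]
    generated = {w.find("effect").get("id") for w in deps.findall("wasGeneratedBy")}
    inputs = []
    for artifact_id in used:
        # Keep first-seen order and drop duplicates, like distinct-values().
        if artifact_id not in generated and artifact_id not in inputs:
            inputs.append(artifact_id)
    return inputs

graph = ET.fromstring(OPM_XML)
inputs = user_input_ids(graph)
print(inputs)
```

Note that this rule also reports any constant or parameter that the recorder captured as an artifact without a generating process, which is why values such as "Record" can appear among the inputs.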
For a workflow execution, determine the steps that required user inputs.
    (: get artifacts ids of user inputs (from optional query 10) :)
    let $used := $graph/causalDependencies/used/cause/@id
    let $wasGeneratedBy := $graph/causalDependencies/wasGeneratedBy/effect/@id
    let $diff := distinct-values($used[not(. = $wasGeneratedBy)])
    (: find processes that directly used these artifacts :)
    return opmLib:getImmediateUsedProcessesForArtifactId($graph, $diff)
Output:
    <process id="_p0"> <value>.load.IsCSVReadyFileExists fire 0</value> </process>
    <process id="_p2"> <value>.load.ReadCSVReadyFile fire 0</value> </process>
    <process id="_p5"> <value>.load.CreateEmptyLoadDB fire 0</value> </process>
    <process id="_p6"> <value>.load.Array Permute fire 0</value> </process>
-- DanielCrawl - 31 Mar 2009
Attachment | Size | Date | Who | Comment |
---|---|---|---|---|
pc3-load.png | 43.5 K | 14 Apr 2009 - 16:46 | DanielCrawl | screenshot of load workflow |
pc3-J062941.out.xml | 96.7 K | 18 May 2009 - 22:43 | DanielCrawl | provenance for J062941 INVALID |
pc3-J062941.opt-query3.xml | 22.1 K | 18 May 2009 - 22:36 | DanielCrawl | provenance for J062941 that fails |
j41.png | 941.2 K | 01 May 2009 - 01:07 | DanielCrawl | opm2dot of pc3-J062941.out.xml |
pc3-J062941.good.xml | 96.7 K | 18 May 2009 - 22:45 | DanielCrawl | provenance for J062941 |