Our aim is to provide a framework to represent minimal information necessary such that we can reason about provenance in scientific workflows. With this in mind, we distinguish between the specification of the workflow and the execution of the workflow. On the one hand, the workflow specification is composed of step-classes (alternatively called procedures, tasks, actors, processes, boxes) such as reslice in the challenge workflow. On the other hand, an execution of a workflow generates a partial order of steps, each of which has a set of input and output data objects. Each step is an instance of a step-class, and the input-output flow of data and class associated with each step must conform to the workflow specification. An example of step in the challenge workflow is �8.reslice� which is an instance of the step-class �reslice�. The proposed model of provenance for workflows can represent the relationship between a step-class (specification) and a step (execution). This is necessary for keeping track of the data and parameters used by each step of a workflow execution as well as to trace the data produced. The model is represented within a relational framework extended with transitive closure. It is implemented under Oracle 10.g and Java. An object layer together with a user interface (JDBC component) is provided. More details on our model and implementation of the Challenge workflow are provided in the following section.
Our model is composed of base tables which provide minimal information about the workflow execution, the workflow specification, and the relationships between the specification and execution. The model is represented though the E/R diagram below. The tables IMMCONTAINS, IMMSTEPCONTAINS and USERVIEW will be defined in the section Worflow Variants. The other tables and their schemas are listed below. We list the type of each column (VARCHAR are strings) and the primary keys (in bold). For each table, we give an example based on the Challenge workflow.
Figure 1 � E/R schema for the model of Provenance
Note: The content of the tables is provided at the end of this page (see images tablesData-1.jpg to tablesData-4.jpg).
As an example, the data whose ID is 1 has the name �Anatomy Image1� and is of type Anatomy Image.
DATAID NAME TYPE ----------------------------------------------------- 1 Anatomy Image1 Anatomy Image
DATAID ATTRIBUTE VALUE ----------------------------------------------------- 1 center UChicago
As an example, the table below expresses that the step whose id is 1 takes a nonlinear (linear=false), 12th order (order=12) parameter, and whose model is 1365 (model=1365). This parameter can be invoked by using �-m 12 �q� (description=-m 12 �q).
STEP ATTRIBUTE VALUE ----------------------------------------------------- 1 linear false 1 order 12 1 model 1365 1 description -m 12 -q
As an example, Step 2 (2.align_warp) takes as inputs data whose ID is 3 (Anatomy Image2), 4 (Anatomy Header2), 9 (Reference Image), and 10 (Reference Header). Each input in the example was taken on the same date, 8/8/2006.
STEP DATAID TS ----------------------------------------------------- 2 3 8/8/2006 2 4 8/8/2006 2 9 8/8/2006 2 10 8/8/2006
As an example, Step 10 (10. slicer) produced one output with a DATAID of 25 (Atlas X Slice) on 08/17/2006.
STEP DATAID TS ----------------------------------------------------- 10 25 8/17/2006
As an example, steps 1, 2, 3, and 4 are executions of the step-class �align_warp.�
STEP STEPCLASS TS ----------------------------------------------------- 1 align_warp 8/7/2006 2 align_warp 8/8/2006 3 align_warp 8/10/2006 4 align_warp 8/11/2006
stageInstance(step string, stage string) stores the relationship between a stage and a step. As an example, steps 1 through 4 are in stage 1.
STEP STAGE ----------------------------------------------------- 1 1 2 1 3 1 4 1
CREATE VIEW Process AS SELECT DISTINCT instanceOf.step step, instanceOf.stepClass stepClass, input.dataId input, dataInput.name inputName, output.dataId output, dataOutput.name outputName, input.ts time FROM instanceOf, input, data dataInput, output, data dataOutput WHERE instanceOf.step = input.step AND input.dataId = dataInput.dataId AND input.step = output.step AND output.dataId = dataOutput.dataId AND input.ts <= output.tsAs an example, step 13 is an execution of the step-class convert. It takes an input of DATAID 25 (Atlas X Slice) and was executed on 08/20, 2006. It does not take any parameters. It produces the output of DATAID 28 (Atlas X Graphic).
STEP STEPCLASS INPUT INPUTNAME OUTPUT OUTPUTNAME TIME ---------------------------------------------------------------- 13 convert 25 Atlas X Slice 28 Atlas X Graphic 8/20/2006
Teams | Queries | ||||||||||
Q1 | Q2 | Q3 | Q4 | Q5 | Q6 | Q7 | Q8 | Q9 | |||
UPenn team | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() |
For each query, we provide the corresponding SQL query and the results obtained. We also provide these results in a user-friendly format (as screenshots of our Java user interface).
SELECT DISTINCT step, stepClass, input, inputName, output, outputName FROM process START WITH output = ( SELECT dataId FROM data WHERE name = 'Atlas X Graphic' ) CONNECT BY PRIOR input = output ORDER BY step
STEP STEPCLASS INPUT INPUTNAME OUTPUT OUTPUTNAME ---------------------------------------------------------------- 1 align_warp 1 Anatomy Image1 11 Warp Parameters1 1 align_warp 10 Reference Header 11 Warp Parameters1 1 align_warp 2 Anatomy Header1 11 Warp Parameters1 1 align_warp 9 Reference Image 11 Warp Parameters1 10 slicer 23 Atlas Image 25 Atlas X Slice 10 slicer 24 Atlas Header 25 Atlas X Slice 13 convert 25 Atlas X Slice 28 Atlas X Graphic 2 align_warp 10 Reference Header 12 Warp Parameters2 2 align_warp 3 Anatomy Image2 12 Warp Parameters2 2 align_warp 4 Anatomy Header2 12 Warp Parameters2 2 align_warp 9 Reference Image 12 Warp Parameters2 3 align_warp 10 Reference Header 13 Warp Parameters3 3 align_warp 5 Anatomy Image3 13 Warp Parameters3 3 align_warp 6 Anatomy Header3 13 Warp Parameters3 3 align_warp 9 Reference Image 13 Warp Parameters3 4 align_warp 10 Reference Header 14 Warp Parameters4 4 align_warp 7 Anatomy Image4 14 Warp Parameters4 4 align_warp 8 Anatomy Header4 14 Warp Parameters4 4 align_warp 9 Reference Image 14 Warp Parameters4 5 reslice 11 Warp Parameters1 15 Resliced Image1 5 reslice 11 Warp Parameters1 16 Resliced Header1 6 reslice 12 Warp Parameters2 17 Resliced Image2 6 reslice 12 Warp Parameters2 18 Resliced Header2 7 reslice 13 Warp Parameters3 19 Resliced Image3 7 reslice 13 Warp Parameters3 20 Resliced Header3 8 reslice 14 Warp Parameters4 21 Resliced Image4 8 reslice 14 Warp Parameters4 22 Resliced Header4 9 softmean 15 Resliced Image1 23 Atlas Image 9 softmean 15 Resliced Image1 24 Atlas Header 9 softmean 16 Resliced Header1 23 Atlas Image 9 softmean 16 Resliced Header1 24 Atlas Header 9 softmean 17 Resliced Image2 23 Atlas Image 9 softmean 17 Resliced Image2 24 Atlas Header 9 softmean 18 Resliced Header2 23 Atlas Image 9 softmean 18 Resliced Header2 24 Atlas Header 9 softmean 19 Resliced Image3 23 Atlas Image 9 softmean 19 Resliced Image3 24 Atlas Header 9 softmean 20 Resliced Header3 23 Atlas Image 9 softmean 20 Resliced Header3 24 Atlas Header 9 softmean 21 Resliced Image4 23 Atlas Image 9 softmean 21 Resliced Image4 24 Atlas Header 9 softmean 22 Resliced Header4 23 Atlas Image 9 softmean 22 Resliced Header4 24 Atlas HeaderThe results can also be visualized within the user-friendly interface. On the top, the query is given by the user, and on the bottom the result (provenance) is represented through a graph. The user can access to the characteristics of a given step or data item by clicking on it (visualized on the right hand side).
* Query2
SELECT DISTINCT step, stepClass, input, inputName, output, outputName FROM Process START WITH output = ( SELECT dataId FROM data WHERE name = 'Atlas X Graphic' ) CONNECT BY PRIOR input = output MINUS ( SELECT DISTINCT step, stepClass, input, inputName, output, outputName FROM Process START WITH output IN ( SELECT input FROM Process WHERE stepClass = 'softmean' ) CONNECT BY PRIOR input = output )
STEP STEPCLASS INPUT INPUTNAME OUTPUT OUTPUTNAME ---------------------------------------------------------------- 10 slicer 23 Atlas Image 25 Atlas X Slice 10 slicer 24 Atlas Header 25 Atlas X Slice 13 convert 25 Atlas X Slice 28 Atlas X Graphic 9 softmean 15 Resliced Image1 23 Atlas Image 9 softmean 15 Resliced Image1 24 Atlas Header 9 softmean 16 Resliced Header1 23 Atlas Image 9 softmean 16 Resliced Header1 24 Atlas Header 9 softmean 17 Resliced Image2 23 Atlas Image 9 softmean 17 Resliced Image2 24 Atlas Header 9 softmean 18 Resliced Header2 23 Atlas Image 9 softmean 18 Resliced Header2 24 Atlas Header 9 softmean 19 Resliced Image3 23 Atlas Image 9 softmean 19 Resliced Image3 24 Atlas Header 9 softmean 20 Resliced Header3 23 Atlas Image 9 softmean 20 Resliced Header3 24 Atlas Header 9 softmean 21 Resliced Image4 23 Atlas Image 9 softmean 21 Resliced Image4 24 Atlas Header 9 softmean 22 Resliced Header4 23 Atlas Image 9 softmean 22 Resliced Header4 24 Atlas Header
SELECT DISTINCT step, stepClass, input, inputName, output, outputName FROM process WHERE step IN ( SELECT step FROM StageInstance WHERE ( stage in (3,4,5) ) ) START WITH output = ( SELECT dataId FROM data WHERE name = 'Atlas X Graphic' ) CONNECT BY PRIOR input = output
STEP STEPCLASS INPUT INPUTNAME OUTPUT OUTPUTNAME ---------------------------------------------------------------- 9 softmean 19 Resliced Image3 23 Atlas Image 9 softmean 21 Resliced Image4 23 Atlas Image 9 softmean 17 Resliced Image2 24 Atlas Header 9 softmean 20 Resliced Header3 24 Atlas Header 9 softmean 15 Resliced Image1 24 Atlas Header 9 softmean 19 Resliced Image3 24 Atlas Header 13 convert 25 Atlas X Slice 28 Atlas X Graphic 9 softmean 15 Resliced Image1 23 Atlas Image 9 softmean 18 Resliced Header2 23 Atlas Image 9 softmean 16 Resliced Header1 23 Atlas Image 9 softmean 18 Resliced Header2 24 Atlas Header 9 softmean 17 Resliced Image2 23 Atlas Image 10 slicer 23 Atlas Image 25 Atlas X Slice 9 softmean 22 Resliced Header4 23 Atlas Image 9 softmean 16 Resliced Header1 24 Atlas Header 9 softmean 22 Resliced Header4 24 Atlas Header 9 softmean 20 Resliced Header3 23 Atlas Image 10 slicer 24 Atlas Header 25 Atlas X Slice 9 softmean 21 Resliced Image4 24 Atlas Header
SELECT step, stepClass, input, inputName, output, outputName, time FROM process WHERE stepClass = 'align_warp' AND EXISTS ( SELECT value FROM stepParams WHERE stepParams.step = process.step AND attribute='order' AND value='12' ) AND EXISTS ( SELECT value FROM stepParams WHERE stepParams.step = process.step AND attribute='model' AND value='1365' ) AND rtrim(to_char(time, 'DAY')) = 'MONDAY'
STEP STEPCLASS INPUT INPUTNAME OUTPUT OUTPUTNAME TIME ---------------------------------------------------------------- 1 align_warp 1 Anatomy Image1 11 Warp Parameters1 8/7/2006 1 align_warp 2 Anatomy Header1 11 Warp Parameters1 8/7/2006 1 align_warp 10 Reference Header 11 Warp Parameters1 8/7/2006 1 align_warp 9 Reference Image 11 Warp Parameters1 8/7/2006
SELECT name FROM data WHERE dataId IN ( SELECT output FROM process START WITH input IN ( SELECT data.dataId FROM data, dataAttributes WHERE data.dataId = dataAttributes.dataId AND data.type = 'Anatomy Header' AND dataAttributes.attribute = 'global maximum' AND dataAttributes.value = '4095' ) CONNECT BY PRIOR output=input ) AND type = 'Atlas Graphic'
NAME --------------------- Atlas X Graphic Atlas Y Graphic Atlas Z Graphic
SELECT distinct pred.orig, data.name FROM ( SELECT distinct CONNECT_BY_ROOT step orig, step FROM process START WITH stepClass='softmean' CONNECT BY PRIOR input=output ) pred, output, data WHERE EXISTS ( SELECT * FROM stepParams WHERE attribute = 'order' AND value = 12 AND stepParams.step = pred.step ) AND EXISTS ( SELECT * FROM stepParams WHERE attribute = 'model' AND value = 1365 AND stepParams.step = pred.step ) AND pred.orig = output.step AND output.dataId = data.dataIdNote that we understand "-m12" as a search for parameter whose order is equal to 12. We could also have chosen to search for a parameter whose description was like "-m 12".
ORIG NAME --------------------------------- 9 Atlas Header 9 Atlas Image
SELECT distinct data.name FROM process, dataAttributes, data WHERE process.stepClass='align_warp' AND process.input = dataAttributes.dataId AND dataAttributes.attribute = 'center' AND dataAttributes.value = 'UChicago' AND process.output = data.dataId
NAME ------------------ Warp Parameters2 Warp Parameters1
SELECT distinct data.dataId, data.name, attribute, value FROM data, dataAttributes, attribute, value WHERE data.type='Atlas Graphic' AND data.dataId = dataAttributes.dataId AND dataAttributes.attribute = 'studyModality' AND dataAttributes.value IN ('speech', 'visual', 'audio')
DATAID NAME ATTRIBUTE VALUE --------------------------------------------------------------------- 29 Atlas Y Graphic studyModality audio 29 Atlas Y Graphic studyModality visual 30 Atlas Z Graphic studyModality speech
The model we propose takes into account the capabilities of many of the current scientific workflow systems; one such capability is the idea of composite step-classes, that is, step-classes which are composed of step-classes. There are several reasons why composite step classes are useful in workflows. First, they help to manage the complexity of a workflow; users may wish to focus on a certain level of abstraction and ignore the lower levels of detail. Second, composite step-classes can also represent levels of ``authorization''; users without the appropriate clearance level would not be allowed to see the lower level executions of a step-class.
Typically, an input to a composite step-class is also an input to one or more of its substep classes while an output of a substep class is either an input to another substep class or becomes the output of a composite step class. Also, composite step-classes define a partition of the step-classes of the workflow (there is no step-class which belongs to multiple composite step-classes).
We have therefore defined the following notion of user views:
Given a workflow specification, the user view of a user (or class of users) U, userView(U), is the set of lowest-level step classes that U is entitled to see.
Note that a user view cannot contain two step classes such that one is contained in the other. We also assume that the user view is valid, i.e. that each of the highest-level step classes in the workflow specification is either in the view, or that at some lower-level all of its contained substeps are in the user view.
To answer questions of provenance, we must take the user view into account and reason about the input and output to steps which are instances of step-classes that are in the user view. We must know the containment relationship between step-classes and the relationship between each user and the step-classes he is entitled to see.
In the following, we introduce the part of our model which deals with user views and we will give examples based on a variation of the provenance workflow in the challenge. In the following discussion, we refer to composite steps as �boxes�.
Figure 2 � Workflow for the challenge with user views (boxes 1 to 3)
UserView Application
In this section, we introduce some additional tables which are needed for reasoning about provenance in the context of user views:
As an example, box 1 covers all the step-class align_warp and reslice, and box3 contains box1.
COMPOSTEPCLASS STEPCLASS --------------------------------- box1 align_warp box1 reslice box3 box1
CREATE VIEW AS SELECT DISTINCT CONNECT_BY_ROOT compoStepClass as compoStepClass, stepClass FROM immcontains CONNECT BY PRIOR stepClass=compoStepClass ORDER BY compoStepClass
COMPOSTEPCLASS STEPCLASS --------------------------------- box1 align_warp box1 reslice box3 box1 box3 align_warp box3 reslice
CREATE VIEW AS SELECT DISTINCT CONNECT_BY_ROOT compoStep compoStep, step FROM immStepContains CONNECT BY PRIOR step=compoStep ORDER BY compoStep
COMPOSTEP STEP --------------------------------- stepBox1-1 1 stepBox1-1 5 stepBox1-2 2 stepBox1-2 6 stepBox1-3 3 stepBox1-3 7 stepBox1-4 4 stepBox1-4 8 stepBox2-1 10 stepBox2-1 13 stepBox2-2 11 stepBox2-2 14 stepBox2-3 12 stepBox2-3 15 stepBox3-1 1 stepBox3-1 10 stepBox3-1 13 stepBox3-1 5 stepBox3-1 9 stepBox3-1 stepBox1-1 stepBox3-1 stepBox2-1
As an example, the user �uBlackBox� sees the whole workflow as one step. The user �uBio� cannot see the detail of step-class align_warp and reslice separately (he sees box1 instead). He cannot see the details of step-class slicer or convert (he sees box2 instead). However, he can see the step-class softmean. UAdmin can see all step-classes.
USR STEPCLASS -------------------------------- uBlackBox box3 uBio box1 uBio softmean uBio box2 uAdmin align_warp uAdmin reslice uAdmin softmean uAdmin slicer uAdmin convert
The following two tables, Cinput and Coutput are input and output extended to composite steps.
CREATE VIEW cInput AS -- First union operand: "For each containment, the input is an input of -- the higher level step class if and only if this input was not an output -- from another of its sub steps SELECT containsStep.compoStep step, input.dataId, min(input.ts) ts FROM input, containsStep WHERE input.step = containsStep.step AND NOT EXISTS ( SELECT o.step FROM output o, containsStep c2 WHERE o.dataId = input.dataId AND o.step = c2.step AND c2.compoStep = containsStep.compoStep ) GROUP BY containsStep.compoStep, input.dataId -- Second union operand -- We want to keep the direct input relations UNION SELECT step, dataId, ts FROM input;As an example, the step stepBox2-1 takes DATAID 23 (Atlas Image) and DATAID 24 (Atlas Header) as inputs.
STEP DATAID TS ---------------------------- stepBox2-1 23 8/17/2006 stepBox2-1 24 8/17/2006
CREATE VIEW coutput AS -- First union operand: "For each containment, the output is an output of -- the higher level step class if and only if this output is used as an -- input for another step which is not contained in the same step -- or if this is a final output of the workflow SELECT containsStep.compoStep step, output.dataId, output.ts FROM output, containsStep WHERE output.step = containsStep.step AND EXISTS ( /* Step class is used as an entry of at least one step * which is not a substep of this step class */ SELECT i.step FROM input i WHERE i.dataId = output.dataId AND NOT EXISTS ( SELECT cont2.compoStep FROM containsStep cont2 WHERE cont2.step = i.step AND cont2.compoStep = containsStep.compoStep ) UNION -- Step class is a final output? ( SELECT output.dataId FROM dual MINUS SELECT i.dataId FROM input i ) ) UNION -- Second union operand -- We want to keep the direct output relations SELECT step, dataId, ts FROM outputAs an example, step stepBox2-1 produced data ID 28 (Atlas Graphic X) on 08/20/2006.
STEP DATAID TS ---------------------------- stepBox2-1 28 8/20/2006
CREATE VIEW UProcess AS SELECT DISTINCT userView.usr usr, instanceOf.step step, instanceOf.stepClass stepClass, input.dataId input, dataInput.name inputName, output.dataId output, dataOutput.name outputName, input.ts time FROM userView, instanceOf, cinput input, data dataInput, coutput output, data dataOutput WHERE userView.stepClass = instanceOf.stepClass AND instanceOf.step = input.step AND input.dataId = dataInput.dataId AND input.step = output.step AND output.dataId = dataOutput.dataId AND input.ts <= output.tsAs an example, let us consider three user views and their perspectives on stage 1 of the workflow (align_warp).
The user �uAdmin� can see all the steps of the workflow; and thus can see all data.
The user �uBio� sees align_warp and reslice calls as single boxes (step class box1), thus he cannot see data whose ids are 11 to 14 (Warp Params 1 to 4). He also sees slicer and convert as single boxes (step class box2), thus he cannot see data with ids 25 to 27 (Atlas X Slice, Atlas Y Slice and Atlas Z Slice).
The user �uBlackBox� cannot see any detail of the workflow. He sees everything as a big box which entries are data id 1 to 10 and outputs are data id 28 to 30.
The implications of user views when it comes to query answering, are discussed in the following section.
In this section, we answer query 1 considering user views and we introduce another query to emphasize the benefit of our approach.
SELECT DISTINCT step, stepClass, input, inputName, output, outputName FROM uProcess WHERE usr='uBlackBox' START WITH output = ( SELECT dataId FROM data WHERE name = 'Atlas X Graphic' ) CONNECT BY PRIOR input = output ORDER BY step;
STEP STEPCLASS INPUT INPUTNAME OUTPUT OUTPUTNAME stepBox3-1 box3 1 Anatomy Image1 23 Atlas Image stepBox3-1 box3 1 Anatomy Image1 24 Atlas Header stepBox3-1 box3 1 Anatomy Image1 28 Atlas X Graphic stepBox3-1 box3 10 Reference Header 23 Atlas Image stepBox3-1 box3 10 Reference Header 24 Atlas Header stepBox3-1 box3 10 Reference Header 28 Atlas X Graphic stepBox3-1 box3 17 Resliced Image2 23 Atlas Image stepBox3-1 box3 17 Resliced Image2 24 Atlas Header stepBox3-1 box3 17 Resliced Image2 28 Atlas X Graphic stepBox3-1 box3 18 Resliced Header2 23 Atlas Image stepBox3-1 box3 18 Resliced Header2 24 Atlas Header stepBox3-1 box3 18 Resliced Header2 28 Atlas X Graphic stepBox3-1 box3 19 Resliced Image3 23 Atlas Image stepBox3-1 box3 19 Resliced Image3 24 Atlas Header stepBox3-1 box3 19 Resliced Image3 28 Atlas X Graphic stepBox3-1 box3 2 Anatomy Header1 23 Atlas Image stepBox3-1 box3 2 Anatomy Header1 24 Atlas Header stepBox3-1 box3 2 Anatomy Header1 28 Atlas X Graphic stepBox3-1 box3 20 Resliced Header3 23 Atlas Image stepBox3-1 box3 20 Resliced Header3 24 Atlas Header stepBox3-1 box3 20 Resliced Header3 28 Atlas X Graphic stepBox3-1 box3 21 Resliced Image4 23 Atlas Image stepBox3-1 box3 21 Resliced Image4 24 Atlas Header stepBox3-1 box3 21 Resliced Image4 28 Atlas X Graphic stepBox3-1 box3 22 Resliced Header4 23 Atlas Image stepBox3-1 box3 22 Resliced Header4 24 Atlas Header stepBox3-1 box3 22 Resliced Header4 28 Atlas X Graphic stepBox3-1 box3 9 Reference Image 23 Atlas Image stepBox3-1 box3 9 Reference Image 24 Atlas Header stepBox3-1 box3 9 Reference Image 28 Atlas X Graphic
STEP STEPCLASS INPUT INPUTNAME OUTPUT OUTPUTNAME 9 softmean 15 Resliced Image1 23 Atlas Image 9 softmean 15 Resliced Image1 24 Atlas Header 9 softmean 16 Resliced Header1 23 Atlas Image 9 softmean 16 Resliced Header1 24 Atlas Header 9 softmean 17 Resliced Image2 23 Atlas Image 9 softmean 17 Resliced Image2 24 Atlas Header 9 softmean 18 Resliced Header2 23 Atlas Image 9 softmean 18 Resliced Header2 24 Atlas Header 9 softmean 19 Resliced Image3 23 Atlas Image 9 softmean 19 Resliced Image3 24 Atlas Header 9 softmean 20 Resliced Header3 23 Atlas Image 9 softmean 20 Resliced Header3 24 Atlas Header 9 softmean 21 Resliced Image4 23 Atlas Image 9 softmean 21 Resliced Image4 24 Atlas Header 9 softmean 22 Resliced Header4 23 Atlas Image 9 softmean 22 Resliced Header4 24 Atlas Header stepBox1-1 box1 1 Anatomy Image1 15 Resliced Image1 stepBox1-1 box1 1 Anatomy Image1 16 Resliced Header1 stepBox1-1 box1 10 Reference Header 15 Resliced Image1 stepBox1-1 box1 10 Reference Header 16 Resliced Header1 stepBox1-1 box1 2 Anatomy Header1 15 Resliced Image1 stepBox1-1 box1 2 Anatomy Header1 16 Resliced Header1 stepBox1-1 box1 9 Reference Image 15 Resliced Image1 stepBox1-1 box1 9 Reference Image 16 Resliced Header1 stepBox1-2 box1 10 Reference Header 17 Resliced Image2 stepBox1-2 box1 10 Reference Header 18 Resliced Header2 stepBox1-2 box1 3 Anatomy Image2 17 Resliced Image2 stepBox1-2 box1 3 Anatomy Image2 18 Resliced Header2 stepBox1-2 box1 4 Anatomy Header2 17 Resliced Image2 stepBox1-2 box1 4 Anatomy Header2 18 Resliced Header2 stepBox1-2 box1 9 Reference Image 17 Resliced Image2 stepBox1-2 box1 9 Reference Image 18 Resliced Header2 stepBox1-3 box1 10 Reference Header 19 Resliced Image3 stepBox1-3 box1 10 Reference Header 20 Resliced Header3 stepBox1-3 box1 5 Anatomy Image3 19 Resliced Image3 stepBox1-3 box1 5 Anatomy Image3 20 Resliced Header3 stepBox1-3 box1 6 Anatomy Header3 19 Resliced Image3 stepBox1-3 box1 6 Anatomy Header3 20 Resliced Header3 stepBox1-3 box1 9 Reference Image 19 Resliced Image3 stepBox1-3 box1 9 Reference Image 20 Resliced Header3 stepBox1-4 box1 10 Reference Header 21 Resliced Image4 stepBox1-4 box1 10 Reference Header 22 Resliced Header4 stepBox1-4 box1 7 Anatomy Image4 21 Resliced Image4 stepBox1-4 box1 7 Anatomy Image4 22 Resliced Header4 stepBox1-4 box1 8 Anatomy Header4 21 Resliced Image4 stepBox1-4 box1 8 Anatomy Header4 22 Resliced Header4 stepBox1-4 box1 9 Reference Image 21 Resliced Image4 stepBox1-4 box1 9 Reference Image 22 Resliced Header4 stepBox2-1 box2 23 Atlas Image 28 Atlas X Graphic stepBox2-1 box2 24 Atlas Header 28 Atlas X Graphic
STEP STEPCLASS INPUT INPUTNAME OUTPUT OUTPUTNAME -------------------------------------------------------------------------- 1 align_warp 1 Anatomy Image1 11 Warp Parameters1 1 align_warp 10 Reference Header 11 Warp Parameters1 1 align_warp 2 Anatomy Header1 11 Warp Parameters1 1 align_warp 9 Reference Image 11 Warp Parameters1 10 slicer 23 Atlas Image 25 Atlas X Slice 10 slicer 24 Atlas Header 25 Atlas X Slice 13 convert 25 Atlas X Slice 28 Atlas X Graphic 2 align_warp 10 Reference Header 12 Warp Parameters2 2 align_warp 3 Anatomy Image2 12 Warp Parameters2 2 align_warp 4 Anatomy Header2 12 Warp Parameters2 2 align_warp 9 Reference Image 12 Warp Parameters2 3 align_warp 10 Reference Header 13 Warp Parameters3 3 align_warp 5 Anatomy Image3 13 Warp Parameters3 3 align_warp 6 Anatomy Header3 13 Warp Parameters3 3 align_warp 9 Reference Image 13 Warp Parameters3 4 align_warp 10 Reference Header 14 Warp Parameters4 4 align_warp 7 Anatomy Image4 14 Warp Parameters4 4 align_warp 8 Anatomy Header4 14 Warp Parameters4 4 align_warp 9 Reference Image 14 Warp Parameters4 5 reslice 11 Warp Parameters1 15 Resliced Image1 5 reslice 11 Warp Parameters1 16 Resliced Header1 6 reslice 12 Warp Parameters2 17 Resliced Image2 6 reslice 12 Warp Parameters2 18 Resliced Header2 7 reslice 13 Warp Parameters3 19 Resliced Image3 7 reslice 13 Warp Parameters3 20 Resliced Header3 8 reslice 14 Warp Parameters4 21 Resliced Image4 8 reslice 14 Warp Parameters4 22 Resliced Header4 9 softmean 15 Resliced Image1 23 Atlas Image 9 softmean 15 Resliced Image1 24 Atlas Header 9 softmean 16 Resliced Header1 23 Atlas Image 9 softmean 16 Resliced Header1 24 Atlas Header 9 softmean 17 Resliced Image2 23 Atlas Image 9 softmean 17 Resliced Image2 24 Atlas Header 9 softmean 18 Resliced Header2 23 Atlas Image 9 softmean 18 Resliced Header2 24 Atlas Header 9 softmean 19 Resliced Image3 23 Atlas Image 9 softmean 19 Resliced Image3 24 Atlas Header 9 softmean 20 Resliced Header3 23 Atlas Image 9 softmean 20 Resliced Header3 24 Atlas Header 9 softmean 21 Resliced Image4 23 Atlas Image 9 softmean 21 Resliced Image4 24 Atlas Header 9 softmean 22 Resliced Header4 23 Atlas Image 9 softmean 22 Resliced Header4 24 Atlas Header
SELECT * FROM uProcess upc WHERE usr = 'uBlackBox' START WITH outputName = 'Resliced Image1' CONNECT BY PRIOR upc.input = upc.output;
USR STEP STEPCLASS INPUT INPUTNAME OUTPUT OUTPUTNAME TIME uBio stepBox1-1 box1 1 Anatomy Image1 15 Resliced Image1 8/7/2006 uBio stepBox1-1 box1 10 Reference Header 15 Resliced Image1 8/7/2006 uBio stepBox1-1 box1 2 Anatomy Header1 15 Resliced Image1 8/7/2006 uBio stepBox1-1 box1 9 Reference Image 15 Resliced Image1 8/7/2006
USR STEP STEPCLASS INPUT INPUTNAME OUTPUT OUTPUTNAME TIME uAdmin 5 reslice 11 Warp Parameters1 15 Resliced Image1 8/12/2006 uAdmin 1 align_warp 1 Anatomy Image1 11 Warp Parameters1 8/7/2006 uAdmin 1 align_warp 10 Reference Header 11 Warp Parameters1 8/7/2006 uAdmin 1 align_warp 2 Anatomy Header1 11 Warp Parameters1 8/7/2006 uAdmin 1 align_warp 9 Reference Image 11 Warp Parameters1 8/7/2006
Note that uAdmin can see that data id 11 (Warp Params 1) is part of the provenance while uBio cannot see it.
Generally speaking, our provenance queries can be categorized as such:
According to our classification, all the queries proposed in this challenge are deep provenance queries, they are both step and data provenance queries.
Our implementation is not presently available to the public at large. However, we will be happy to share the code with those who are interested in seeing it. Please send us an email.
We have shown how our model is able to represent the challenge workflow and to answer the proposed queries. We have also shown how the notion of a user view can allow the user to manage the complexity of a workflow through higher levels of abstraction and layering.
We would be interested in discussing with the workshop attendees about three points: (i) the meaning of a stage in a workflow (and in particular, how to interpret this notion with respect to user views), (ii) the biological significance of the procedures that can compose a workflow (in the workflow provided, what are the �significant� steps for a scientist?) and (iii) the (general) problem of computing a difference between two workflows (cf. query 7).
(* Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.)
-- SarahCohenBoulakia - 11 Sep 2006