Submission in progress
Our workflow is represented in Ptolemy II's MoML language and was created in Kepler.
Figure 1. The Kepler workflow for the challenge
The implementation is a mock workflow: each actor is implemented in Java, taking the input files, determining the index (from 1 to 4) and generating its output file names. There is no real execution behind the workflow.
There is another mock implementation, in which each actor is actually a nested sub-workflow built from Kepler's basic actors. That workflow looks the same at the top level. Although the provenance record is much larger, the answers to the queries are the same.
According to the RWS model, r(ead), w(rite) and s(tate-reset) events are recorded. Besides these, we need to record the actors, their ports, the "tokens" flowing among the actors in the workflow, and the created objects and their values.
The RWS prototype inference engine is implemented in Prolog ;-), and the provenance data is currently printed out simply as a set of Prolog facts, but it will be stored in a relational database in the future.
portTable Port - Actor relationship.

    portTable('.pc.align_warp2.GetIndexFromName.StringIndexOf.output', '.pc.align_warp2.GetIndexFromName.StringIndexOf', a).
    portTable('.pc.convert_x.atlas_gfx', '.pc.convert_x', c).

tokenTable Token - Object relationship. The object carries the data and has a unique ID.

    tokenTable('.pc.convert_x.atlas_gfx.0.0', o596169037_35902573).

objectTable Object - Value relationship. Currently no type is recorded.

    objectTable(o596169037_35902573, '"/usr/home/pnorbert/Provenance/ProvCh/data/output/atlas-x.gif"', notype).

traceTable The RWS event trace: a port reading or writing a token, or an actor having a state reset.

    traceTable('.pc.convert_x', s, 'nil', 1).                                    % state reset of actor
    traceTable('.pc.convert_x.input', r, '.pc.slicer_x.atlas_pgm.0.0', 1).       % read a token on port 'input'
    traceTable('.pc.convert_x.atlas_gfx', w, '.pc.convert_x.atlas_gfx.0.0', 1).  % write a token on port 'atlas_gfx'
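From these tables the direct data dependency between tokens can be derived. A minimal sketch in Prolog (directDep/2 is our illustrative name, not necessarily the prototype's; we also assume here that the fourth traceTable argument counts an actor's invocations, which the examples above suggest but the page does not state):

    % Sketch only: a token written by an actor depends on the tokens the
    % actor read in the same invocation (for a stateful actor, the reads
    % would reach back to its last state reset).
    % ASSUMPTION: the 4th traceTable argument is an invocation counter.
    directDep(Written, Read) :-
        traceTable(WPort, w, Written, N),   % token written in invocation N
        portTable(WPort, Actor, _),
        traceTable(RPort, r, Read, N),      % token read in the same invocation
        portTable(RPort, Actor, _).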
| Table | Lines |
|---|---|
| portTable | 81 |
| tokenTable | 30 |
| objectTable | 30 |
| traceTable | 86 |
trace-pc.txt: The provenance trace of the original workflow
Results matrix: RWS team, Queries Q1-Q9 (status shown as images on the original wiki page).
Please note that this work focuses on single runs only. Data provenance of multiple runs and workflow provenance are addressed by others in the Kepler community, and eventually all of this work is expected to converge toward a unified provenance framework. Questions 2 and 3 are answerable; we simply ran out of time to construct the appropriate Prolog queries for them.
The inference engine prototype is implemented in Prolog. The basic information we need is the token lineage of a given token, i.e. all tokens in the workflow on which the given token depended. Then values, objects and actors can be looked up from the provenance tables.
The predicate tokenLineageOfValue( Value, List ) provides the list of all tokens on which the given value (more accurately, the token that first contained this value) depended. This predicate basically generates the transitive closure of the direct dependency relation among tokens, back to the very first tokens, i.e. to the inputs. valueLineageOfValue/2 and actorLineageOfValue/2 use the above predicate and then look up each token's value or the actor that generated it, respectively.

Since the dependency graph may contain several paths back to a certain token, and several tokens can be created by the same actor, we may get an actor or value in the list several times. Therefore, we use the list_to_set/2 built-in predicate to make each resulting element unique.
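A minimal sketch of how these predicates could be written, building on the directDep/2 rule sketched above (illustrative only; the prototype's actual clauses are not shown on this page):

    % Transitive closure of the direct dependency relation.
    depends(T, D) :- directDep(T, D).
    depends(T, D) :- directDep(T, X), depends(X, D).

    % tokenLineageOfValue(+Value, -Tokens): all tokens on which the first
    % token carrying Value depended.
    tokenLineageOfValue(Value, Tokens) :-
        objectTable(Obj, Value, _),        % look up the object by its value
        tokenTable(Token, Obj), !,         % first token carrying that object
        findall(D, depends(Token, D), Ds),
        list_to_set(Ds, Tokens).           % remove duplicates

    % valueLineageOfValue(+Value, -Values): the values those tokens carried.
    valueLineageOfValue(Value, Values) :-
        tokenLineageOfValue(Value, Tokens),
        findall(V, ( member(T, Tokens), tokenTable(T, Obj),
                     objectTable(Obj, V, _) ), Vs),
        list_to_set(Vs, Values).

    % actorLineageOfValue(+Value, -Actors): the actors that wrote those tokens.
    actorLineageOfValue(Value, Actors) :-
        tokenLineageOfValue(Value, Tokens),
        findall(A, ( member(T, Tokens), traceTable(Port, w, T, _),
                     portTable(Port, A, _) ), As),
        list_to_set(As, Actors).

Duplicates arise exactly as described above: several dependency paths may reach the same token, and one actor may write several tokens, hence the list_to_set/2 calls.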
Output: the set of actors that contributed to this file, and the data values (file names) that led to it.
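The q1_actors/2 and q1_values/2 predicates used below are presumably thin wrappers over the lineage predicates sketched earlier; judging from the output, the queried value itself heads the result list. A possible reconstruction (hypothetical, not the prototype's code):

    % HYPOTHETICAL wrapper: prepend the queried value to its value lineage.
    q1_values(Value, [Value|Rest]) :-
        valueLineageOfValue(Value, Rest).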
Answer 1.a.
List of the actors that contributed to the result (21 actors).
They appear in reverse order of execution.
    ?- q1_actors('"/usr/home/pnorbert/Provenance/ProvCh/data/output/atlas-x.gif"', ActorList),
       print(ActorList).
    [ .pc.Convert_x, .pc.Slicer_x, .pc.SoftMean, .pc.Reslice3, .pc.Reslice2, .pc.Reslice4,
      .pc.Reslice1, .pc.AlignWarp3, .pc.RefImg, .pc.RefHdr, .pc.InputHdr3, .pc.InputImg3,
      .pc.AlignWarp2, .pc.InputHdr2, .pc.InputImg2, .pc.AlignWarp4, .pc.InputHdr4,
      .pc.InputImg4, .pc.AlignWarp1, .pc.InputImg1, .pc.InputHdr1 ]
Note: line breaks were inserted manually for easier reading.
Answer 1.b.
List of the input values and the intermediate values created by the workflow (26 values).
    ?- q1_values('"/usr/home/pnorbert/Provenance/ProvCh/data/output/atlas-x.gif"', ValueList),
       print(ValueList).
    [ "/usr/home/pnorbert/Provenance/ProvCh/data/output/atlas-x.gif",
      "/usr/home/pnorbert/Provenance/ProvCh/data/out-stage4/atlas-x.pgm",
      "/usr/home/pnorbert/Provenance/ProvCh/data/out-stage3/atlas.img",
      "/usr/home/pnorbert/Provenance/ProvCh/data/out-stage3/atlas.hdr",
      "/usr/home/pnorbert/Provenance/ProvCh/data/out-stage2/resliced3.hdr",
      "/usr/home/pnorbert/Provenance/ProvCh/data/out-stage2/resliced2.img",
      "/usr/home/pnorbert/Provenance/ProvCh/data/out-stage2/resliced4.hdr",
      "/usr/home/pnorbert/Provenance/ProvCh/data/out-stage2/resliced1.img",
      "/usr/home/pnorbert/Provenance/ProvCh/data/out-stage2/resliced2.hdr",
      "/usr/home/pnorbert/Provenance/ProvCh/data/out-stage2/resliced3.img",
      "/usr/home/pnorbert/Provenance/ProvCh/data/out-stage2/resliced4.img",
      "/usr/home/pnorbert/Provenance/ProvCh/data/out-stage2/resliced1.hdr",
      "/usr/home/pnorbert/Provenance/ProvCh/data/out-stage1/warp3.warp",
      "/usr/home/pnorbert/Provenance/ProvCh/data/input/reference.img",
      "/usr/home/pnorbert/Provenance/ProvCh/data/input/reference.hdr",
      "/usr/home/pnorbert/Provenance/ProvCh/data/input/anatomy3.hdr",
      "/usr/home/pnorbert/Provenance/ProvCh/data/input/anatomy3.img",
      "/usr/home/pnorbert/Provenance/ProvCh/data/out-stage1/warp2.warp",
      "/usr/home/pnorbert/Provenance/ProvCh/data/input/anatomy2.hdr",
      "/usr/home/pnorbert/Provenance/ProvCh/data/input/anatomy2.img",
      "/usr/home/pnorbert/Provenance/ProvCh/data/out-stage1/warp4.warp",
      "/usr/home/pnorbert/Provenance/ProvCh/data/input/anatomy4.hdr",
      "/usr/home/pnorbert/Provenance/ProvCh/data/input/anatomy4.img",
      "/usr/home/pnorbert/Provenance/ProvCh/data/out-stage1/warp1.warp",
      "/usr/home/pnorbert/Provenance/ProvCh/data/input/anatomy1.img",
      "/usr/home/pnorbert/Provenance/ProvCh/data/input/anatomy1.hdr" ]
Like the myGrid/Taverna team, we also created a more generic version of the challenge workflow, which works for any number of input images, provided that Softmean can take any number of input images at once.
Figure 2. The generalized Kepler workflow for the challenge
The difference between this and the Taverna workflow is that we create the input file names within the workflow one by one; a list of input files would arrive as a single token at the first AlignWarp actor, which presumably wants to receive them one by one. The four outputs of Reslice are collected into an array, and Softmean is executed only once. The output of Softmean is then repeated three times with different slice parameters (generated by the seqXYZ actor), thus executing the final two operations three times.
The answers to the first query are basically the same, with some differences. The actor list is shorter, reporting e.g. Reslice once instead of Reslice1,...,Reslice4; however, the additional array and repeat operations appear in the list. The value list becomes larger because of the additional array and repetition tokens.
-- NorbertPodhorszki - 07 Sep 2006
| Attachment | Size | Date | Who | Comment |
|---|---|---|---|---|
| pc.png | 70.0 K | 07 Sep 2006 - 16:41 | NorbertPodhorszki | Kepler workflow, flat version |
| pca.png | 47.5 K | 07 Sep 2006 - 17:12 | NorbertPodhorszki | Generalized Kepler workflow |
| trace-pc.txt | 15.4 K | 07 Sep 2006 - 17:42 | NorbertPodhorszki | The provenance trace of the original workflow |