Virtual Data Language Preprocessing

To aid the definition and configuration of our derivation workflow, we wrote a pre-processor program, vdlgen.sh. Its inputs were a pre-processed version of the VDL (analysisbase.vdl), the transformation catalog (tc.data) and the pool.config file (pool.config.kickstart or pool.config.nokickstart). The pre-processor transformed these into versions ready to be used by the abstract DAG generator. An example post-processed VDL script is attached as analysis.vdl.

New constructs

We added four simple constructs to VDL to help define our derivation workflow. The pre-processor extracts these constructs and uses them to rewrite the VDL accordingly.

$${var}

expands to the value of variable var, which is supplied when the pre-processor is run for a particular instance
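The substitution amounts to plain text replacement. A minimal Python sketch (not the shell code of vdlgen.sh), assuming variable names are identifiers and that the token is written exactly as $${var} above; expand_vars is a hypothetical name:

```python
import re

# Matches the $${name} token exactly as written on this page; treating
# name as an identifier (letters, digits, underscore) is an assumption.
VAR_TOKEN = re.compile(r"\$\$\{(\w+)\}")


def expand_vars(text, values):
    """Replace every $${name} in text with values[name].

    An unknown name raises KeyError rather than being left in place,
    so a missing per-instance value is caught at pre-processing time.
    """
    return VAR_TOKEN.sub(lambda m: str(values[m.group(1)]), text)


print(expand_vars("run$${run}.dat", {"run": 7}))  # → run7.dat
```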

MAP lfn pfn

causes the Replica Location Service (RLS) to be checked for a mapping from the logical filename lfn to the physical filename pfn, and adds the mapping if it is not already present
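The check-then-add behaviour can be sketched as below. This is an illustration only: a Python dict of sets stands in for the RLS (no real RLS client calls are shown), and it assumes that MAP lines are removed from the VDL once processed. process_map_lines is a hypothetical name.

```python
import re

# One MAP construct per line: MAP <lfn> <pfn>
MAP_LINE = re.compile(r"^\s*MAP\s+(\S+)\s+(\S+)\s*$")


def process_map_lines(lines, catalog):
    """Strip MAP lines, registering lfn -> pfn in catalog if missing.

    catalog is a dict mapping each lfn to a set of pfns; it is only a
    stand-in for the RLS. Returns the remaining VDL lines and the list
    of lfns for which a new mapping was added.
    """
    kept, added = [], []
    for line in lines:
        match = MAP_LINE.match(line)
        if match is None:
            kept.append(line)  # ordinary VDL passes through unchanged
            continue
        lfn, pfn = match.groups()
        mappings = catalog.setdefault(lfn, set())
        if pfn not in mappings:  # only add if not already mapped
            mappings.add(pfn)
            added.append(lfn)
    return kept, added
```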

FOR var FROM start TO end STEP step
block
END

counts from start to end, incrementing by step, and outputs a copy of block for each value of the counter
if the token %var% appears in block, it is replaced with the current value of the counter
start, end and step may be arithmetic expressions rather than literal numbers
these loops can be nested
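One way to implement this expansion is sketched below in Python. It assumes the END value is inclusive, STEP is positive, and the expressions use integer arithmetic only; none of these details are confirmed by vdlgen.sh. Substituting the outer counter before expanding nested loops lets an inner FOR header refer to an outer counter. calc and expand_for are hypothetical names.

```python
import ast
import operator
import re

# Integer operators accepted in FROM/TO/STEP expressions (an assumption).
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.FloorDiv: operator.floordiv,
    ast.Mod: operator.mod,
}

FOR_HEADER = re.compile(r"^\s*FOR\s+(\w+)\s+FROM\s+(.+?)\s+TO\s+(.+?)\s+STEP\s+(.+?)\s*$")
END_LINE = re.compile(r"^\s*END\s*$")


def calc(expr):
    """Safely evaluate an integer arithmetic expression such as '2*3+1'."""
    def ev(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, int):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](ev(node.left), ev(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -ev(node.operand)
        raise ValueError(f"unsupported expression: {expr!r}")
    return ev(ast.parse(expr, mode="eval").body)


def expand_for(lines):
    """Expand FOR ... END blocks (nested or not) into repeated copies."""
    out, i = [], 0
    while i < len(lines):
        header = FOR_HEADER.match(lines[i])
        if header is None:
            out.append(lines[i])
            i += 1
            continue
        var = header.group(1)
        start, end, step = (calc(header.group(k)) for k in (2, 3, 4))
        if step <= 0:
            raise ValueError("this sketch only supports a positive STEP")
        # Find the END that closes this FOR, skipping nested FOR/END pairs.
        depth, j = 1, i + 1
        while depth > 0:
            if j >= len(lines):
                raise ValueError("FOR without matching END")
            if FOR_HEADER.match(lines[j]):
                depth += 1
            elif END_LINE.match(lines[j]):
                depth -= 1
            j += 1
        body = lines[i + 1:j - 1]  # lines between FOR and its END
        token = f"%{var}%"
        counter = start
        while counter <= end:  # END value treated as inclusive
            # Substitute the counter first, then expand any nested loops.
            out.extend(expand_for([ln.replace(token, str(counter)) for ln in body]))
            counter += step
        i = j
    return out
```

For example, a FOR i loop from 1 to 2 containing a FOR j loop whose TO bound is %i% produces the (i, j) pairs (1, 1), (2, 1) and (2, 2).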

LIST block start end step

a single-line shorthand for FOR in which the loop variable is always named item, so block refers to the counter as %item%
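A sketch of the LIST shorthand, under several assumptions that are not taken from vdlgen.sh: the construct occupies a whole line, the last three whitespace-separated fields are start, end and step (plain integers, so calculations are not handled here), everything before them is the block, END is inclusive, and the copies are joined on one line with a configurable separator. expand_list and sep are hypothetical names.

```python
import re

# LIST <block> <start> <end> <step>; block is everything before the last
# three fields, so it may itself contain spaces.
LIST_LINE = re.compile(r"^\s*LIST\s+(.+)\s+(\S+)\s+(\S+)\s+(\S+)\s*$")


def expand_list(line, sep=" "):
    """Expand one LIST line; any other line is returned unchanged."""
    match = LIST_LINE.match(line)
    if match is None:
        return line
    block = match.group(1)
    start, end, step = (int(match.group(k)) for k in (2, 3, 4))
    if step <= 0:
        raise ValueError("this sketch only supports a positive step")
    # Equivalent to FOR item FROM start TO end STEP step, kept on one line.
    copies = [block.replace("%item%", str(n)) for n in range(start, end + 1, step)]
    return sep.join(copies)
```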

Wrapping

Our pre-processor also takes configuration options that determine whether and how to 'wrap' the transformations. We have a wrapper program, recordProvenance.sh, which, like kickstart, wraps the execution of a script; it records provenance in our provenance store before, and optionally after, the script is executed. We could have used it by replacing kickstart in the pool.config file, but we generally wanted kickstart to run as well.

If configured to do so, the pre-processor calls a wrapping script, passing it some configuration parameters. The wrapping script changes the VDL transformations to take extra inputs, including the path of the 'unwrapped' program to be executed at that step, and changes the VDL derivations to pass that extra information. Finally, the wrapping script generates a new transformation catalog in which the physical location of each transformation is replaced with that of recordProvenance.sh.
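The catalog-rewriting step can be sketched as follows. This assumes a whitespace-separated tc.data whose first three columns are site, logical transformation name and physical path, with # starting a comment; the exact column layout of the catalog we used is not reproduced here. The returned originals mapping holds the unwrapped paths that the rewritten derivations would need to pass to the wrapper. wrap_catalog and the example paths are hypothetical.

```python
def wrap_catalog(tc_lines, wrapper_path):
    """Point every catalog entry at the provenance-recording wrapper.

    Returns (new_lines, originals): new_lines is the rewritten catalog,
    and originals maps (site, lfn) to the original physical path, i.e.
    the unwrapped program the wrapper must eventually execute.
    """
    new_lines, originals = [], {}
    for line in tc_lines:
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            new_lines.append(line)  # keep blank lines and comments as-is
            continue
        if len(fields) < 3:
            raise ValueError(f"malformed catalog entry: {line!r}")
        site, lfn, pfn = fields[:3]
        originals[(site, lfn)] = pfn
        # Replace only the physical location; keep any remaining columns.
        new_lines.append("\t".join([site, lfn, wrapper_path] + fields[3:]))
    return new_lines, originals
```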

An extra field, delegates-provenance, in the pre-processed transformation catalog can mark an entry as one that should not be wrapped but should still be given the information required to record provenance. This is useful for workflow scripts that run several smaller activities locally, each of which should record its own provenance (inputs, outputs etc.), rather than having the workflow record only its overall inputs and outputs. It allows us to control two granularities independently: that of the distributed workflow, where each task should last about 15 minutes, and that of the workflow of provenance-recording activities, which may be much finer.
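The per-entry decision might look like the sketch below, which extends the catalog rewrite with the delegates-provenance flag. It assumes the flag appears as a standalone field after the physical path and is stripped from the generated catalog, and it uses the same assumed column layout as before (site, logical name, physical path first). plan_wrapping and the "wrapped"/"delegates" labels are hypothetical; in both cases the original path is recorded so that the derivations can still be given provenance information.

```python
DELEGATES = "delegates-provenance"


def plan_wrapping(tc_lines, wrapper_path):
    """Rewrite the catalog, honouring the delegates-provenance flag.

    Returns (new_lines, plan), where plan maps (site, lfn) to
    ("wrapped", pfn) or ("delegates", pfn).
    """
    new_lines, plan = [], {}
    for line in tc_lines:
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            new_lines.append(line)
            continue
        site, lfn, pfn = fields[:3]
        rest = fields[3:]
        delegates = DELEGATES in rest
        rest = [f for f in rest if f != DELEGATES]  # drop the pre-processing flag
        if delegates:
            # Not wrapped: the real program runs directly and records the
            # provenance of its own finer-grained activities.
            new_lines.append("\t".join([site, lfn, pfn] + rest))
            plan[(site, lfn)] = ("delegates", pfn)
        else:
            new_lines.append("\t".join([site, lfn, wrapper_path] + rest))
            plan[(site, lfn)] = ("wrapped", pfn)
    return new_lines, plan
```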

-- SimonMiles - 23 Feb 2005


Attachment	Size	Date	Who	Comment
klausCompressWorkflow.pdf	7.9 K	25 Feb 2005 - 14:08	PaulGroth	Compression Workflow
klausMeasureWorkflow.pdf	5.0 K	25 Feb 2005 - 14:09	PaulGroth	Measurement Workflow
recordProvenance.sh	1.1 K	02 Mar 2005 - 15:52	SimonMiles	PReP provenance recording wrapper script
analysisbase.vdl	2.8 K	02 Mar 2005 - 15:56	SimonMiles	Pre-processed VDL script
tc.data	1.8 K	02 Mar 2005 - 15:59	SimonMiles	Pre-processed transformation catalogue
vdlgen.sh	3.7 K	02 Mar 2005 - 16:02	SimonMiles	VDL pre-processor script
pool.config.kickstart	1.0 K	02 Mar 2005 - 16:03	SimonMiles	Pool config file to use with kickstart turned on
pool.config.nokickstart	1.1 K	02 Mar 2005 - 16:03	SimonMiles	Pool config file to use with kickstart turned off
analysis.vdl	5.2 K	06 Mar 2005 - 19:58	SimonMiles	Post-processed VDL


Soca.VirtualDataLanguagePreprocessing moved from Ourpasoa.VirtualDataToolkitExperiment on 23 May 2005 - 12:37 by SimonMiles - put it back

