Work in progress
See also references and glossary at the bottom of this page.
Job Provenance was developed as a part of the gLite middleware. Although its design is more general, capable of handling virtually any Grid jobs, the current implementation supports only gLite jobs, and we use gLite to implement the Provenance Challenge workflow. Therefore we provide a brief overview of the relevant parts of job processing in gLite before the actual description of the workflow implementation.
Upon creation the job is assigned a unique immutable Job Identifier (JobId). The JobId is used to refer to the job during its whole life and afterwards.
The user describes the job (i.e. executable, parameters, input files etc.) using the Job Description Language (JDL), based on the extensible Classified Advertisement (classad) syntax. The description may grow fairly complex, including requirements on the execution environment, proximity of input and output storage etc.
Processing of the job can be summarised as follows:
Besides simple jobs, gLite also supports complex ones: job workflows in the form of Directed Acyclic Graphs (DAGs). A DAG is completely described, using a nested JDL syntax, as a set of its nodes (simple jobs) and execution dependencies among them. DAG processing is implemented by interfacing the WM planning machinery with the Condor DAGMan.
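For illustration, a DAG description has roughly the following shape. This is a minimal sketch following gLite JDL conventions, with node names and arguments made up for the example; it is not our actual template, which is attached below:

```
[
  Type = "dag";
  nodes = [
    align_warp1 = [
      description = [
        Executable = "align_warp";
        Arguments  = "-m 12 -q";
      ];
    ];
    reslice1 = [
      description = [
        Executable = "reslice";
      ];
    ];
    // reslice1 may start only after align_warp1 has completed
    dependencies = { { align_warp1, reslice1 } };
  ];
]
```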
TODO: references JDL, WMS, LB
The align_warp invocations can run in parallel, but softmean must be preceded by successful completion of all four reslice instances.
In our experimental runs we put the files on a dedicated GridFTP server and access them (both download and upload) with the gsiftp:// protocol (which also solves access control -- a running gLite job possesses delegated user credentials). Consequently, the data items are identified with their full URLs in our implementation.
We might have used the gLite data services, identifying files with GUIDs or logical file names. However, this approach would make the implementation more obscure while not exhibiting any important provenance features.
We provide a template for the workflow JDL. It contains placeholders for the data files; details on instantiating and submitting it with gLite command-line tools can be found at this page.
Upload a representation of the information you captured when executing the workflow. Explain the structure (provide pointers to documents describing your schemas etc.)
As noted above, when the execution of the workflow is finished, the JP service can collect traces of the workflow's life from various Grid subsystems. Currently only LB is instrumented to provide the trace; however, the encompassed data are rich and completely sufficient for the challenge.
The LB trace is uploaded as a raw LB dump file; three sample snapshots are available here (files dump[123]).
JP provides the user with an interface to retrieve such raw files, and their format is in principle public (NetLogger ULM according to draft-abela-05; LB specific fields are documented in the LB User's Guide).
However, access to the raw files is not supposed to be a typical JP usage.
On the contrary, the end user of JP sees all the available data transformed into the form of logical JP attributes, "namespace:name = value" pairs. Attribute values are digested from the raw traces by JP plug-in modules, hiding internal structure, syntax, format version, and other implementation details.
At this level the provenance trace of an executed workflow is represented by a set of JP attributes and their values, assigned to both the workflow and all its subjobs (nodes).
There are the following classes of attributes:

- system attributes (namespace http://egee.cesnet.cz/en/Schema/JP/System):
  - jobId
  - owner: identity of the job submitter
  - regtime: when the job was registered with the middleware
- workflow attributes (namespace http://egee.cesnet.cz/en/Schema/JP/Workflow): ancestor and successor, representing the oriented edges of the workflow DAG
- user annotations, stored as LB user tags (namespace http://egee.cesnet.cz/en/WSDL/jp-lbtag)

The attributes are multi-valued. For example, because softmean must have been preceded by 4 reslice's in the challenge workflow, there are 4 occurrences of the ancestor attribute of the softmean nodes.
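For illustration, the workflow attributes of a softmean node might be listed as follows. The node identifiers in angle brackets are placeholders, not values taken from our runs:

```
http://egee.cesnet.cz/en/Schema/JP/Workflow:ancestor  = https://skurut1.cesnet.cz:9000/<reslice-node-1>
http://egee.cesnet.cz/en/Schema/JP/Workflow:ancestor  = https://skurut1.cesnet.cz:9000/<reslice-node-2>
http://egee.cesnet.cz/en/Schema/JP/Workflow:ancestor  = https://skurut1.cesnet.cz:9000/<reslice-node-3>
http://egee.cesnet.cz/en/Schema/JP/Workflow:ancestor  = https://skurut1.cesnet.cz:9000/<reslice-node-4>
http://egee.cesnet.cz/en/Schema/JP/Workflow:successor = https://skurut1.cesnet.cz:9000/<slicer-node>
```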
For the specific implementation of the challenge workflow we use LB user tags to store additional information about the workflow nodes. JP turns these values into attributes of the last kind on the list above. The following table summarizes their meaning:
Attribute name | Attribute meaning |
---|---|
IPAW_OUTPUT | Names of files generated by this node |
IPAW_INPUT | Names of input files for this node |
IPAW_STAGE | Name (number) of workflow stage of this node |
IPAW_PROGRAM | Name of the process this node represents |
IPAW_PARAM | Specific parameters of this node's processing |
IPAW_HEADER | Anatomy header property (global maximum in our case) |
Details on the JP architecture, its components, dataflows among them, and the reasons that motivated the design are given in the cited references. For understanding our implementation of the challenge queries one only has to be aware that there are two distinct querying endpoints:

- the JP Primary Storage (JPPS), queried for values of given attributes of a single job, identified by its JobId;
- the JP Index Server (JPIS), populated with a configured subset of attributes and queried to find jobs matching conditions on those attributes.

Both querying endpoints are exposed as web-service interfaces.
The challenge queries are implemented as Perl scripts which call elementary clients of both services.
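The scripts talk to the endpoints over SOAP. As a rough illustration only: the service URL and the operation name QueryJobs below are placeholders, not the real JP WSDL, such an elementary client might look like:

```perl
#!/usr/bin/perl
# Hypothetical elementary JPIS client sketch; the WSDL location and
# the operation name QueryJobs are placeholders for illustration only.
use strict;
use warnings;
use SOAP::Lite;

# Generate client stubs from the (placeholder) WSDL of the index server.
my $jpis = SOAP::Lite->service('https://jpis.example.org/jpis?wsdl');

# Ask for jobs whose IPAW_OUTPUT attribute equals the given file URL.
my $jobs = $jpis->QueryJobs('IPAW_OUTPUT', $ARGV[0]);
print "$_\n" for @$jobs;
```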
Results of the CESNET team for queries Q1 to Q9 are presented below.
Find the process that led to Atlas X Graphic / everything that caused Atlas X Graphic to be as it is. This should tell us the new brain images from which the averaged atlas was generated, the warping performed etc.
The query is implemented as a graph search where the vertices are nodes of the DAG and oriented edges are given by the ancestor attribute. The search is seeded with a JPIS query, retrieving the JobId of the last node of the workflow, which produced the queried file directly, i.e. typically the convert utility.
Pseudocode:

1. initialise job_list with the retrieved JobId
2. for each job in job_list:
   - retrieve the attributes of job from JPPS
   - add each ancestor of job to job_list unless it is already there
3. sort job_list according to IPAW_STAGE
4. print job_list, including all the retrieved attributes
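A condensed Perl sketch of this search follows. The two helper subroutines stand for the elementary JPIS/JPPS clients mentioned above; their names and return formats are our assumptions, not the real client interfaces:

```perl
#!/usr/bin/perl
# Sketch of the Query #1 graph search along the 'ancestor' attribute.
use strict;
use warnings;

# file URL -> JobId of the job that produced it directly (JPIS query);
# names and outputs of both helpers are assumptions, not the real clients.
sub jpis_find_job_by_output { ... }
# JobId -> hashref { attribute name => [ values ] } (JPPS query)
sub jpps_get_attributes     { ... }

my $file = shift or die "usage: $0 <file-URL>\n";

my @job_list = ( jpis_find_job_by_output($file) );   # seed of the search
my %attrs;                                           # JobId => attributes
my %seen = map { $_ => 1 } @job_list;

# Breadth-first traversal along the multi-valued 'ancestor' attribute.
while ( my $job = shift @job_list ) {
    $attrs{$job} = jpps_get_attributes($job);
    for my $anc ( @{ $attrs{$job}{ancestor} || [] } ) {
        push @job_list, $anc unless $seen{$anc}++;   # skip visited jobs
    }
}

# Sort by IPAW_STAGE (last stage first, as in the output below) and print.
for my $job ( sort { $attrs{$b}{IPAW_STAGE}[0] <=> $attrs{$a}{IPAW_STAGE}[0] }
              keys %attrs ) {
    print "jobid $job:\n";
    print "\tattr $_: @{ $attrs{$job}{$_} }\n" for sort keys %{ $attrs{$job} };
}
```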
The output below is cut and reformatted; here is the original output.
```
$ ./query1.pl gsiftp://umbar.ics.muni.cz:1414/home/mulac/pch06/blabla-x.gif 2>/dev/null
Results
=======
jobid https://skurut1.cesnet.cz:9000/hvkpZCsRsiqrxs5K_bo7Ew:
    attr IPAW_STAGE: 5
    attr IPAW_PROGRAM: convert
    attr IPAW_INPUT: gsiftp://umbar.ics.muni.cz:1414/home/mulac/pch06/blabla-x.pgm
    attr IPAW_OUTPUT: gsiftp://umbar.ics.muni.cz:1414/home/mulac/pch06/blabla-x.gif
    attr CE: skurut17.cesnet.cz:2119/jobmanager-lcgpbs-voce
jobid https://skurut1.cesnet.cz:9000/02ZaAADKyebzggYPp4M9tA:
    attr IPAW_STAGE: 4
    attr IPAW_PROGRAM: slicer
    attr IPAW_INPUT: gsiftp://umbar.ics.muni.cz:1414/home/mulac/pch06/blabla.hdr gsiftp://umbar.ics.muni.cz:1414/home/mulac/pch06/blabla.img
    attr IPAW_OUTPUT: gsiftp://umbar.ics.muni.cz:1414/home/mulac/pch06/blabla-x.pgm
    attr CE: skurut17.cesnet.cz:2119/jobmanager-lcgpbs-voce
jobid https://skurut1.cesnet.cz:9000/wGMnTvCILtiSTi7ZOQwfTQ:
    attr IPAW_STAGE: 3
    attr IPAW_PROGRAM: softmean
    attr IPAW_INPUT: gsiftp://umbar.ics.muni.cz:1414/home/mulac/pch06/anatomy1-resliced.img ...
    attr IPAW_OUTPUT: gsiftp://umbar.ics.muni.cz:1414/home/mulac/pch06/blabla.img gsiftp://umbar.ics.muni.cz:1414/home/mulac/pch06/blabla.hdr
    attr CE: skurut17.cesnet.cz:2119/jobmanager-lcgpbs-voce
jobid https://skurut1.cesnet.cz:9000/9d0XMwfPuefR9woAFkDplQ:
    attr IPAW_STAGE: 2
    attr IPAW_PROGRAM: reslice
    attr IPAW_INPUT: gsiftp://umbar.ics.muni.cz:1414/home/mulac/pch06/anatomy3.warp ...
    attr IPAW_OUTPUT: gsiftp://umbar.ics.muni.cz:1414/home/mulac/pch06/anatomy3-resliced.img ...
    attr CE: skurut17.cesnet.cz:2119/jobmanager-lcgpbs-voce
jobid https://skurut1.cesnet.cz:9000/RglBtUz0IzwSeM32KLnHPg:
    attr IPAW_STAGE: 2
    attr IPAW_PROGRAM: reslice
    attr IPAW_INPUT: gsiftp://umbar.ics.muni.cz:1414/home/mulac/pch06/anatomy4.warp ...
    ...
jobid https://skurut1.cesnet.cz:9000/wdWQHL0-RXkd3VeNcSrTaw:
    attr IPAW_STAGE: 2
    attr IPAW_PROGRAM: reslice
    attr IPAW_PARAM:
    attr IPAW_INPUT: gsiftp://umbar.ics.muni.cz:1414/home/mulac/pch06/anatomy1.warp ...
    ...
jobid https://skurut1.cesnet.cz:9000/xwIsN2JgGfsRuvYwh0QXsw:
    attr IPAW_STAGE: 2
    attr IPAW_PROGRAM: reslice
    attr IPAW_INPUT: gsiftp://umbar.ics.muni.cz:1414/home/mulac/pch06/anatomy2.warp ...
    ...
jobid https://skurut1.cesnet.cz:9000/yM3sz8v6WCIPgi5-0m8L4w:
    attr IPAW_STAGE: 1
    attr IPAW_PROGRAM: align_warp
    attr IPAW_PARAM: -m 12, -q
    attr IPAW_INPUT: gsiftp://umbar.ics.muni.cz:1414/home/mulac/pch06/anatomy4.img gsiftp://umbar.ics.muni.cz:1414/home/mulac/pch06/reference.img
    attr IPAW_OUTPUT: gsiftp://umbar.ics.muni.cz:1414/home/mulac/pch06/anatomy4.warp
    attr CE: skurut17.cesnet.cz:2119/jobmanager-lcgpbs-voce
jobid https://skurut1.cesnet.cz:9000/s47ihjBHQXqPkkNwA2iazg:
    attr IPAW_STAGE: 1
    attr IPAW_PROGRAM: align_warp
    attr IPAW_PARAM: -m 12, -q
    attr IPAW_INPUT: gsiftp://umbar.ics.muni.cz:1414/home/mulac/pch06/anatomy2.img ...
    ...
...
```
The implementation is the same as for Query #1, except that the graph search is cut at softmean. Available here.
The filter "ran on Monday" is quite challenging. Currently, we implement it at client side which is not a scalable solution. However, the JP concept foresees a solution of the issue via an already defined interface to type plugin. A plugin, for a concrete type, defines the following methods:
Then, upon arrival at JPIS, the weekday number would be extracted from the timestamp and stored in an extra database column. The plugin would also define an operator isWeekDay(x) that would be transformed at query time to an expression referring to the new column. Therefore the condition would be evaluated at the SQL level, i.e. in the most efficient way.
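A sketch of such a plugin for timestamp attributes follows; the package name, method names, and calling convention are our assumptions, illustrating the concept rather than the real plugin interface:

```perl
# Hypothetical JPIS type plugin for timestamp attributes; method names
# and the calling convention are assumptions illustrating the concept.
package JP::TypePlugin::Timestamp;
use strict;
use warnings;

# On arrival of a value at JPIS: derive extra column values to store.
sub extra_columns {
    my ($class, $value) = @_;           # $value: seconds since the epoch
    my $wday = (localtime $value)[6];   # 0 = Sunday ... 6 = Saturday
    return { weekday => $wday };        # goes into an extra DB column
}

# At query time: rewrite a plugin-defined operator to SQL over the
# extra column, so the condition is evaluated inside the database.
sub rewrite_operator {
    my ($class, $op, $arg) = @_;
    return "weekday = " . int($arg) if $op eq 'isWeekDay';
    die "unsupported operator $op";
}

1;
```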
The implementation is again a graph search, this time following the successor attribute of the workflow's nodes rather than ancestor. The output files of nodes having IPAW_STAGE = 5 are gathered and sorted to exclude multiple occurrences (see the snippet below). The code can also be easily modified to record the graph traversal (details on workflow nodes) leading to a particular file, and to display it with the file in a similar way as in the previous queries.
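For instance, reusing the %attrs map from the sketch under Query #1 (itself built on assumed helper clients), the gathering and de-duplication step can be written as:

```perl
# Collect IPAW_OUTPUT values of stage-5 nodes, dropping duplicates.
my %uniq;
my @outputs = sort
              grep { !$uniq{$_}++ }                       # unique values only
              map  { @{ $_->{IPAW_OUTPUT} || [] } }       # flatten output lists
              grep { ( $_->{IPAW_STAGE}[0] || 0 ) == 5 }  # stage-5 nodes
              values %attrs;
print "$_\n" for @outputs;
```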
Again a graph search following the successor attribute. The search is cut at IPAW_PROGRAM = 'softmean', and its outputs are printed.
Here the graph search follows the ancestor attribute, retrieving all required data directly from JPIS. In this way, JPPS queries are completely avoided and the number of JPIS queries is minimised.
We use the Query #1 implementation to show details of the workflows. Then the differences are apparent -- there is one more stage of the workflow, and the IPAW_PROGRAM attribute values of the two final stages are pgmtoppm and pnmtojpeg, respectively.
The query client is the same as for #1.
Job Provenance gathers and organises information with the grid job being the primary entity of interest. While annotations of a job are an intrinsic part of it, direct annotations of data are not. Therefore this kind of query is not supported.
Similarly to Query #9, we might introduce dummy "producer jobs" (i.e. jobs having the particular data file assigned as their output) that would carry the annotation. However, we consider this approach too artificial.
As mentioned with Query #8, JP does not provide means of adding annotations to data directly. However, annotations can be added to jobs (via the JPPS interface), and it makes good sense to consider job outputs to be annotated with the job annotations too. See http://twiki.ipaw.info/Challenge/CESNET/Annotations.
Pseudocode:
Currently neither JPPS nor JPIS supports a query for "all attributes of this job". If the annotation names are not known a priori, the following approaches are possible:
Suggest variants of the workflow that can exhibit capabilities that your system support.
Suggest significant queries that your system can support and are not in the proposed list of queries, and how you have implemented/would implement them. These queries may be with regards to a variant of the workflow suggested above.
According to your provenance approach, you may be able to provide a categorisation of queries. Can you elaborate on the categorisation and its rationale.
If your system can be accessed live (through portal, web page, web service, or other), provide relevant information here.
Provide here further comments.
Provide here your conclusions on the challenge, and issues that you like to see discussed at a face to face meeting.
Important terms used, their meaning in the scope of Job Provenance, and references to further information.
Term | Meaning | References |
---|---|---|
DAG | DAG means Directed Acyclic Graph; in our case it is a description of a set of jobs with structure (a workflow) represented as a DAG | Condor project pages |
gLite | A Grid implementation currently developed in the context of the EGEE project | EGEE project, gLite middleware home, EGEE JRA1 home |
Filename | In our case a filename is represented by a URL referencing the file on a GridFTP server. | |
JobId | By JobId we mean here a "Grid JobId", the logical name of a job at the gLite top level (it is not an id in a local batch system like LSF or PBS). | |
Grid | Large-scale high performance distributed computing environments that provide access to high-end computational resources. | Grid computing dictionary Grid Scheduling Dictionary of Terms and Keywords |
-- CESNET JRA1 team
-- JiriSitera - 22 Aug 2006
Attachment | Action | Size | Date | Who | Comment |
---|---|---|---|---|---|
pch06.jdl-template | manage | 2.5 K | 24 Aug 2006 - 10:09 | JiriSitera | DAG of workflow (template) |
query1.log | manage | 5.6 K | 12 Sep 2006 - 18:27 | AlesKrenek | Query #1 results |
query2.log | manage | 1.8 K | 12 Sep 2006 - 18:28 | AlesKrenek | Query #2 results |
query3.log | manage | 2.1 K | 13 Sep 2006 - 16:00 | AlesKrenek | Query #3 results |
query5.log | manage | 0.3 K | 13 Sep 2006 - 16:01 | AlesKrenek | Query #5 results |
query4-ctvrtek.log | manage | 1.7 K | 13 Sep 2006 - 16:03 | AlesKrenek | Query #4 - Thursday |
cesnet-slides.pdf | manage | 274.9 K | 11 Jul 2007 - 08:50 | AlesKrenek | |