Modeling objects at multiple granularities
Name: Modeling objects at multiple granularities
Scenario Authors: Harvard University
Brief Summary: The scenario simulates collaboration among multiple research groups that use different provenance collection systems, highlighting the need for interoperability and for the ability to handle collections, such as compressed archives of files. Passing this challenge requires importing a provenance graph into a different system and extending it, as well as pushing the expressiveness of the Open Provenance Model (OPM) to represent relationships between objects and collections of those objects.
Research group A produces several data files and processes them using workflow 1. They create an archive (such as a .zip or .tar.gz file) that contains both the original data files and their results. The group can modify the archive by adding or removing files, for example by adding a README file. The researchers then send the archive, together with the appropriate provenance, to their collaborators in group B. Group B copies the archive to an appropriate location, uncompresses it, and processes the data further using their workflow 2. Finally, they compress their results and post the archive on their webpage.
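Adding or removing files is where collections stress OPM: if artifacts are immutable, an archive that gains a README is a new artifact, not the same one. The following minimal Python sketch illustrates that reading under stated assumptions. The core OPM dependencies (used, wasGeneratedBy, wasControlledBy, wasTriggeredBy, wasDerivedFrom) include no membership edge, so the `contains` relation here is an assumption, while the link between archive versions is modeled on wasDerivedFrom. All names (`Graph`, `add_file`, `archiveA.v1`, and so on) are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Graph:
    # Assumed membership relation: (archive_id, member_id) pairs.
    # This is NOT one of the core OPM dependencies.
    contains: set = field(default_factory=set)
    # (new_id, old_id) pairs, modeled on OPM's wasDerivedFrom edge.
    derived_from: set = field(default_factory=set)

    def members(self, archive_id):
        """Return the ids of all files recorded as members of an archive."""
        return {m for a, m in self.contains if a == archive_id}

    def add_file(self, archive_id, file_id, new_archive_id):
        """Archives are treated as immutable: adding a file produces a new
        archive artifact holding the old members plus the new file."""
        for m in self.members(archive_id):
            self.contains.add((new_archive_id, m))
        self.contains.add((new_archive_id, file_id))
        self.derived_from.add((new_archive_id, archive_id))

    def remove_file(self, archive_id, file_id, new_archive_id):
        """Removing a file likewise produces a new archive artifact."""
        for m in self.members(archive_id) - {file_id}:
            self.contains.add((new_archive_id, m))
        self.derived_from.add((new_archive_id, archive_id))


# Example: group A archives data and results, then adds a README.
g = Graph()
for f in ("data1", "data2", "result1"):
    g.contains.add(("archiveA.v1", f))
g.add_file("archiveA.v1", "README", "archiveA.v2")
print(sorted(g.members("archiveA.v2")))  # ['README', 'data1', 'data2', 'result1']
print(sorted(g.members("archiveA.v1")))  # ['data1', 'data2', 'result1']
```

Keeping the earlier version intact is the design choice that matters here: a query can still tell which files were in the archive group A originally built, even after later edits.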
For example, we can adapt the fMRI workflow from the First and Second Provenance Challenges for this purpose. Group A1 produces the four sets of input files and processes them using align_warp and reslice. They compress and post their results. Group A2 obtains the archive, reprocesses the input files using a slightly different workflow, updates the output files, and posts them on their webpage. Group B obtains the archives from both A1 and A2 and processes the data further using softmean, slicer, and convert. Finally, scientists in group B issue multiple provenance queries to try to understand why the two sets of results differ and what A2 did differently.
Users: Scientists who collaborate with other research groups
Requirement for provenance: The second research group and the final consumers of the produced data need to understand which steps were involved in producing the results.
Provenance Questions: The scientists need to be able to issue queries to learn which steps were used to produce the results, spanning both research groups (similar to the queries in the First Provenance Challenge). They also need to ask questions that involve the compressed archives, such as: Which archive did this file come from? Which research group produced the archive? What did we do with this archive? For example, we uncompressed it to get files X, Y, and Z, which we then processed using our workflow to produce files V and W, which we then compressed and posted on our website.
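The archive questions can be answered by walking provenance edges backward and forward. The following minimal Python sketch assumes each process is recorded with a hypothetical kind label (such as "uncompress"), the artifacts it used, the artifacts it generated, and the controlling agent, loosely following OPM's used, wasGeneratedBy, and wasControlledBy dependencies. The helper names and all process and file names are illustrative; a real system would query its own provenance store rather than in-memory dictionaries.

```python
from collections import defaultdict

# In-memory stand-ins for OPM-style dependencies (illustrative only).
used = defaultdict(set)   # process -> artifacts it used
generated_by = {}         # artifact -> process that generated it
controlled_by = {}        # process -> agent (research group)
process_kind = {}         # process -> assumed label, e.g. "uncompress"


def record(process, kind, inputs, outputs, agent):
    """Record one process execution with its inputs, outputs, and agent."""
    process_kind[process] = kind
    used[process].update(inputs)
    for a in outputs:
        generated_by[a] = process
    controlled_by[process] = agent


def lineage(artifact):
    """Return the set of processes in the transitive lineage of `artifact`."""
    procs, seen, stack = set(), set(), [artifact]
    while stack:
        p = generated_by.get(stack.pop())
        if p is None or p in procs:
            continue
        procs.add(p)
        for i in used[p]:
            if i not in seen:
                seen.add(i)
                stack.append(i)
    return procs


def archives_of(artifact):
    """Which archive did this file come from? The inputs of every
    uncompress step in the file's lineage."""
    return {a for p in lineage(artifact)
            if process_kind[p] == "uncompress" for a in used[p]}


def producer(archive):
    """Which research group produced the archive?"""
    return controlled_by.get(generated_by.get(archive))


def uses_of(archive):
    """What did we do with this archive? Each process that used it,
    mapped to the artifacts that process generated."""
    return {p: sorted(a for a, q in generated_by.items() if q == p)
            for p, inputs in used.items() if archive in inputs}


# Example mirroring the scenario: A compresses, B uncompresses,
# runs workflow 2, and compresses the results.
record("zipA", "compress", {"X", "Y", "Z"}, {"a.tar.gz"}, "group A")
record("unzipB", "uncompress", {"a.tar.gz"}, {"x_b", "y_b", "z_b"}, "group B")
record("wf2", "workflow", {"x_b", "y_b", "z_b"}, {"v", "w"}, "group B")
record("zipB", "compress", {"v", "w"}, {"b.tar.gz"}, "group B")

print(archives_of("v"))      # {'a.tar.gz'}
print(producer("a.tar.gz"))  # group A
print(uses_of("a.tar.gz"))   # {'unzipB': ['x_b', 'y_b', 'z_b']}
```

Because `lineage` crosses the uncompress step, the same traversal also answers the cross-group question: `lineage("b.tar.gz")` returns the processes from both group A and group B.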
Technologies Used: Scientific workflow, file compression
--
PeterMacko - 14 May 2010