Name: Reconstructing from Foreign Provenance

Scenario Authors: David Holland (PASS group; Harvard)

Brief Summary:

Regenerate an intermediate data object from provenance that came from a different provenance system.

One group runs a two-step workflow and provides the input object, the output object, and the provenance of the output object, but withholds the intermediate result. Another group takes this information and regenerates the intermediate object. Afterwards, we compare the regenerated intermediate object with the original to see whether they are the same.
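
To make the setup concrete, here is a minimal sketch in Python of what the first group might ship. It assumes the provenance arrives as an OPM-style graph reduced to two edge kinds: `used` (a process read an artifact) and `wasGeneratedBy` (an artifact was written by a process). The file names, process names, and the `ProvGraph`/`regeneration_plan` helpers are hypothetical. The point is only that the withheld intermediate still appears in the provenance, along with the process that generated it and that process's inputs.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple


@dataclass
class ProvGraph:
    """A pared-down OPM-style provenance graph (hypothetical representation)."""
    # (process, artifact): the process read the artifact ("used" edge)
    used: List[Tuple[str, str]] = field(default_factory=list)
    # (artifact, process): the artifact was written by the process ("wasGeneratedBy" edge)
    was_generated_by: List[Tuple[str, str]] = field(default_factory=list)

    def inputs_of(self, process: str) -> List[str]:
        """Artifacts the given process read."""
        return [art for proc, art in self.used if proc == process]


def regeneration_plan(graph: ProvGraph, shipped: Set[str]) -> List[Tuple[str, str, List[str]]]:
    """For every generated artifact that was not shipped, list the process that
    produced it and that process's inputs, in the order the edges were recorded."""
    return [
        (art, proc, graph.inputs_of(proc))
        for art, proc in graph.was_generated_by
        if art not in shipped
    ]


# The first group's two-step run: input.ppm -> step1 -> mid.ppm -> step2 -> output.pgm
graph = ProvGraph(
    used=[("step1", "input.ppm"), ("step2", "mid.ppm")],
    was_generated_by=[("mid.ppm", "step1"), ("output.pgm", "step2")],
)
shipped = {"input.ppm", "output.pgm"}  # the intermediate is withheld

plan = regeneration_plan(graph, shipped)
print(plan)  # [('mid.ppm', 'step1', ['input.ppm'])]
```

Every input named in the plan (`input.ppm`) was shipped, so in principle the second group has what it needs. What the graph alone does not say is how to actually run `step1`.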

Scenario Diagram:

Doesn't seem necessary.

Users:

Just about anyone.

Requirement for provenance:

Regenerating an object without its provenance is hopeless: the provenance is the only record of how the object was produced.

Provenance Questions:

The primary question is whether the regenerated object is really identical to the original.
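
One way to pin down "identical" is sketched below in Python, assuming the images are exchanged as plain (ASCII, P2) PGM files. A byte-for-byte check is the strictest test, but two runs can write the same image with different header comments, so the sketch also compares decoded pixels. The helper names are hypothetical, and only the P2 variant of PGM is handled.

```python
import hashlib
from typing import List, Tuple


def same_bytes(original: bytes, regenerated: bytes) -> bool:
    """Strict test: identical SHA-256 digests, i.e. byte-for-byte identical files."""
    return hashlib.sha256(original).digest() == hashlib.sha256(regenerated).digest()


def plain_pgm_pixels(text: str) -> Tuple[int, int, int, List[int]]:
    """Parse a plain (P2) PGM file into (width, height, maxval, pixel values),
    discarding '#' comments so that only the image content is compared."""
    tokens: List[str] = []
    for line in text.splitlines():
        tokens.extend(line.split("#", 1)[0].split())
    if not tokens or tokens[0] != "P2":
        raise ValueError("not a plain PGM (P2) file")
    width, height, maxval = (int(t) for t in tokens[1:4])
    values = [int(t) for t in tokens[4:4 + width * height]]
    if len(values) != width * height:
        raise ValueError("truncated raster")
    return width, height, maxval, values


def same_pixels(original: str, regenerated: str) -> bool:
    """Looser test: same dimensions, maxval, and pixel values."""
    return plain_pgm_pixels(original) == plain_pgm_pixels(regenerated)


original = "P2\n# written by run 1\n2 1\n255\n80 127\n"
regenerated = "P2\n# written by run 2\n2 1\n255\n80 127\n"
print(same_bytes(original.encode(), regenerated.encode()))  # False
print(same_pixels(original, regenerated))                   # True
```

Which of the two tests counts as "really identical" is itself part of the question; the looser one would need extending for binary (P5) files or color images.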

However, it might also be interesting to compare the provenance of the regenerated object to the provenance of the original.
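
Comparing the two provenance records could start as simply as the Python sketch below, assuming each record is a flat dictionary describing one process run. The field names (`program`, `version`, `args`, `inputs`, `run_id`, `timestamp`, `host`), the program name `rebalance`, and the choice of which fields count as volatile and are ignored are all hypothetical.

```python
from typing import Any, Dict, Tuple

# Hypothetical names of fields expected to differ between any two runs.
VOLATILE_FIELDS = {"run_id", "timestamp", "host"}


def normalize(record: Dict[str, Any]) -> Dict[str, Any]:
    """Drop fields that are expected to differ between runs."""
    return {k: v for k, v in record.items() if k not in VOLATILE_FIELDS}


def provenance_diff(original: Dict[str, Any], regenerated: Dict[str, Any]) -> Dict[str, Tuple[Any, Any]]:
    """Map each remaining field whose value differs to its (original, regenerated) pair;
    a field missing on one side shows up as None."""
    a, b = normalize(original), normalize(regenerated)
    return {k: (a.get(k), b.get(k)) for k in sorted(a.keys() | b.keys()) if a.get(k) != b.get(k)}


original = {"program": "rebalance", "version": "1.0", "args": ["1.0", "0.5", "2.0"],
            "inputs": ["input.ppm"], "run_id": "r1", "timestamp": "2010-05-14T10:00", "host": "a"}
regenerated = {"program": "rebalance", "version": "1.1", "args": ["1.0", "0.5", "2.0"],
               "inputs": ["input.ppm"], "run_id": "r2", "timestamp": "2010-05-15T09:00", "host": "b"}

print(provenance_diff(original, regenerated))  # {'version': ('1.0', '1.1')}
```

Here the only surviving difference is the tool version, which is the kind of discrepancy that would matter if the scenario required running exactly the same software.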

Technologies Used:

A workflow engine, plus some common image-processing tool (perhaps netpbm or ImageMagick, since we've used those before).

Background and description:

What I have in mind is to take some picture and run two image transforms on it, chosen so that you can't really tell what the first one was just by looking at the output of the second (e.g. rebalance the colors, then convert to grayscale), and then see whether provenance imported from another system is adequate to reproduce the intermediate image correctly.
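
The property the two transforms need, that the final output does not determine the intermediate, is easy to demonstrate. The Python sketch below uses hypothetical stand-ins rather than actual netpbm or ImageMagick operations: images are flat lists of RGB tuples, the first step is either a channel rotation or a no-op, and the second step converts to grayscale by averaging the channels. Because the average ignores channel order, both first steps yield the same grayscale output from different intermediates.

```python
from typing import Callable, List, Tuple

Pixel = Tuple[int, int, int]
Image = List[Pixel]  # hypothetical representation: a flat list of RGB pixels


def rotate_channels(img: Image) -> Image:
    """Stand-in 'rebalance' step: move each channel one place, (r, g, b) -> (g, b, r)."""
    return [(g, b, r) for (r, g, b) in img]


def leave_alone(img: Image) -> Image:
    """Alternative first step that changes nothing."""
    return list(img)


def to_gray(img: Image) -> List[int]:
    """Second step: grayscale by the integer average of the three channels."""
    return [(r + g + b) // 3 for (r, g, b) in img]


def run_workflow(img: Image, step1: Callable[[Image], Image]) -> Tuple[Image, List[int]]:
    """Run both steps and return (intermediate, final output)."""
    intermediate = step1(img)
    return intermediate, to_gray(intermediate)


source: Image = [(200, 30, 10), (0, 128, 255)]
mid_a, out_a = run_workflow(source, rotate_channels)
mid_b, out_b = run_workflow(source, leave_alone)
print(out_a, out_b)    # [80, 127] [80, 127]
print(mid_a == mid_b)  # False
```

Given only `source` and the grayscale output, nothing distinguishes the two runs, so reproducing the intermediate has to rely on the provenance.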

It is not clear whether this can be formulated in a way that makes it a reasonable challenge. The scenario is simple enough that all you really need is to look up which program was run and with what arguments, and run it again; done manually, that is trivial, provided the information is actually present somewhere in the imported provenance. To be interesting, it has to be done "automatically", but it's not entirely clear what that really means.
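
One possible reading of "automatically" is sketched below in Python: take the process record from the imported provenance, look up a local implementation of the named program, and apply it to the named input with the recorded arguments, failing loudly when no implementation is available. The record format, the `REGISTRY` table, and the `rebalance` program with its three per-channel gain arguments are all hypothetical; a real version would have to map foreign program names onto local tools.

```python
from typing import Any, Callable, Dict, List, Sequence, Tuple

Pixel = Tuple[int, int, int]
Image = List[Pixel]  # hypothetical representation: a flat list of RGB pixels
Tool = Callable[[Image, Sequence[str]], Image]


def rebalance(img: Image, args: Sequence[str]) -> Image:
    """Hypothetical color-rebalance tool: args are red, green, and blue gains."""
    gains = [float(a) for a in args]

    def clamp(v: float) -> int:
        # Round to the nearest integer and keep it in the 8-bit range.
        return max(0, min(255, round(v)))

    return [tuple(clamp(c * g) for c, g in zip(px, gains)) for px in img]


# Hypothetical table mapping program names found in provenance to local implementations.
REGISTRY: Dict[str, Tool] = {"rebalance": rebalance}


def reexecute(record: Dict[str, Any], artifacts: Dict[str, Image]) -> Image:
    """Re-run the single-input process described by a provenance record."""
    program = record["program"]
    if program not in REGISTRY:
        raise LookupError(f"no local implementation of {program!r}")
    (source_name,) = record["inputs"]  # this sketch handles exactly one input
    return REGISTRY[program](artifacts[source_name], record["args"])


record = {"program": "rebalance", "args": ["1.0", "0.5", "2.0"], "inputs": ["input.ppm"]}
artifacts = {"input.ppm": [(200, 30, 10), (0, 128, 255)]}
print(reexecute(record, artifacts))  # [(200, 15, 20), (0, 64, 255)]
```

The registry lookup is where the foreign provenance meets local tooling: if the other system names its programs differently, or records arguments in another form, this step is what breaks.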

On the other hand, if we expand the scenario to include criteria such as running the same version of netpbm, or accounting for system-level phenomena, it becomes immensely difficult.

Still, I think whether the OPM (Open Provenance Model) supports this operation is a very good question, so although I'm not sure the scenario is workable, I think it's worth trying if it can be arranged.

I'm hoping that someone from a workflow-engine group can sort these issues out. Reconstructing objects in PASS is extremely hard (our system is too general to do it well), so we haven't thought about it much, and not at all recently. Also, I only just thought of this scenario, and the scenario deadline is today.

-- PassProject - 15 May 2010
