
Provenance Challenge

PASS terminology

ancestry information - provenance records, mostly cross-references between pnodes, that represent the flow of data during execution. Ancestry information may also take the form of flat text records; for example, "the URL I downloaded this from".

cross-reference record - a provenance record that points to another pnode. So far, cross-references have pointed to specific versions rather than to whole objects, although we do not claim that a cross-reference to a whole object is inherently ill-formed. Cross-reference records may be either ancestry information or identity information.

cycle - if an object's ancestry includes itself, directly or indirectly, its provenance contains a cycle.
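Checking for a cycle amounts to asking whether a depth-first walk of the ancestry graph ever returns to a node that is still on the current path. The sketch below assumes, purely for illustration, that each node is an (object name, version number) pair and that ancestry is held in a plain dictionary mapping each node to the nodes it was derived from; neither layout is taken from PASS itself.

```python
from typing import Dict, List, Tuple

# Hypothetical node key for this sketch: (object name, version number).
Node = Tuple[str, int]


def has_cycle(ancestry: Dict[Node, List[Node]]) -> bool:
    """Return True if some node's ancestry leads back to that node.

    `ancestry` maps each node to the nodes it was derived from. This is a
    standard three-colour depth-first search: GRAY marks nodes on the
    current path, so meeting a GRAY node again means we found a cycle.
    """
    WHITE, GRAY, BLACK = 0, 1, 2
    color: Dict[Node, int] = {}

    def visit(node: Node) -> bool:
        color[node] = GRAY  # node is on the current DFS path
        for parent in ancestry.get(node, []):
            state = color.get(parent, WHITE)
            if state == GRAY:
                return True  # back edge: the ancestry loops to an ancestor on this path
            if state == WHITE and visit(parent):
                return True
        color[node] = BLACK  # fully explored; no cycle passes through it
        return False

    return any(color.get(n, WHITE) == WHITE and visit(n) for n in list(ancestry))


acyclic = {("out", 1): [("in", 1)], ("in", 1): []}
cyclic = {("a", 1): [("b", 1)], ("b", 1): [("a", 1)]}
print(has_cycle(acyclic))  # False
print(has_cycle(cyclic))   # True
```

A direct cycle (an object listed among its own ancestors) is caught by the same check, since the node is GRAY when it meets itself.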

cycle-breaking - algorithms or steps taken during provenance collection to ensure that the output provenance does not contain cycles.

flat record - a provenance record whose value is a text string. Contrast with cross-reference record. Most flat records are identity information, but some may be ancestry information.

freeze - a version of an object is considered complete when it is frozen. Exactly when objects should be frozen is open to considerable debate. Most of the proposed cycle-breaking algorithms work by forcing additional freezes.

identity information - provenance records, mostly flat text records, that describe the identity of an object rather than its creation or history. Identity information may also be cross-reference records; one example we've encountered is "this file was standard input when the process was started."

object or provenanced object - any "thing" for which provenance is collected or stored.

phony - in PASSv2, we have a model for allowing provenance systems to stack on top of one another. Objects tracked by higher levels of a system are often aggregates or subsections of objects that lower levels understand; or sometimes they may be entirely conceptual. In order to manage provenance properly, these objects must be instantiated at the lower level; we call the instantiations phony objects or phonies. Note that whether an object is "phony" depends on your perspective: to a file system, anything that isn't a file is a phony; however, to an application, users and projects and book chapters and other such things are very real, even though they will appear as phonies at lower levels. This term has proven somewhat confusing but has also become entrenched.

pnode (etymology: derived from "inode") - a container for the complete provenance of a single object, including both identity information and ancestry information. By extension, the term is sometimes used as shorthand for the provenance itself.

provenance record - the basic unit of provenance; an attribute/value pair. May be either a flat record (where the value is a text string) or a cross-reference record (where the value is a pointer to another object and version).
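To make the relationships between these terms concrete, here is a minimal data model: a record is an attribute/value pair whose value is either a string (flat record) or a pointer to a specific object version (cross-reference record), and a pnode groups records into identity and ancestry information. All class and attribute names here (`Ref`, `Record`, `PNode`, `NAME`, `INPUT`, `URL`) are hypothetical and chosen for illustration only; they do not describe the PASS on-disk format. Note that the record kind (flat or cross-reference) is kept independent of the category (identity or ancestry), matching the definitions above.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass(frozen=True)
class Ref:
    """Hypothetical pointer to a specific version of another object."""
    obj: str
    version: int


@dataclass(frozen=True)
class Record:
    """A provenance record: one attribute/value pair."""
    attribute: str
    value: Union[str, Ref]  # str -> flat record, Ref -> cross-reference record

    def is_cross_reference(self) -> bool:
        return isinstance(self.value, Ref)


@dataclass
class PNode:
    """Hypothetical container for all provenance of a single object."""
    obj: str
    identity: List[Record] = field(default_factory=list)
    ancestry: List[Record] = field(default_factory=list)


p = PNode("report.txt")
p.identity.append(Record("NAME", "/home/user/report.txt"))   # flat identity record
p.ancestry.append(Record("INPUT", Ref("data.csv", 3)))        # cross-reference ancestry record
p.ancestry.append(Record("URL", "http://example.org/data"))   # flat ancestry record
```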

recycle - in PASSv2, when an object is completely emptied of data (such as when a file is truncated to zero length, or when a process successfully calls execve) we say that it is recycled.

thaw - start a new version of an object. The version number is incremented. The new version is "under construction" until a freeze occurs.
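The thaw/freeze lifecycle can be sketched as a small state machine: a thaw increments the version number and opens it for construction, and a freeze closes it. The class below is hypothetical, and it assumes one particular policy for illustration: writes to a frozen object thaw a new version first, while writes to a thawed object fold into the version already under construction.

```python
class VersionedObject:
    """Hypothetical tracker for the thaw/freeze lifecycle of one object."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.version = 0    # no version exists yet
        self.frozen = True  # so the first write must thaw

    def thaw(self) -> int:
        """Start a new version; it is under construction until frozen."""
        if not self.frozen:
            raise RuntimeError(f"{self.name} v{self.version} is already thawed")
        self.version += 1
        self.frozen = False
        return self.version

    def freeze(self) -> None:
        """Mark the current version as complete."""
        self.frozen = True

    def write(self) -> int:
        """Assumed policy: a write to a frozen object thaws a new version."""
        if self.frozen:
            self.thaw()
        return self.version


f = VersionedObject("log.txt")
f.write()
f.write()         # both writes fold into version 1
f.freeze()
f.write()         # a write after the freeze thaws version 2
print(f.version)  # 2
```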

version - when files (and other objects) are updated in place, different versions are created over time. A provenance system must keep track of these versions, because the distinctions between them may be vital. It would be prohibitively expensive to create a new version for every kernel-level write operation; however, folding writes together into fewer versions introduces the possibility of cycles and necessitates cycle-breaking. Versions are created at thaw time and are considered complete at freeze time.
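The interaction between folded versions, cycles, and forced freezes can be shown with a process that reads a file and then writes back to it. If both operations land in the same file version, the file appears to derive from a process that derived from the file: a cycle. The sketch below uses one simple rule, chosen here for illustration and not necessarily any of the algorithms proposed for PASS: before recording that `child` derives from `parent`, check whether `child`'s current version is already in `parent`'s ancestry; if so, force a freeze of `child` and attach the dependency to a fresh version instead. The class and method names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple

Node = Tuple[str, int]  # hypothetical key: (object name, version number)


class ProvenanceGraph:
    """Version-level ancestry graph with a freeze-on-cycle rule (illustrative)."""

    def __init__(self) -> None:
        self.current: Dict[str, int] = defaultdict(lambda: 1)  # objects start at v1
        self.parents: Dict[Node, Set[Node]] = defaultdict(set)

    def node(self, obj: str) -> Node:
        return (obj, self.current[obj])

    def _reaches(self, start: Node, target: Node) -> bool:
        """True if `target` is `start` or appears in the ancestry of `start`."""
        stack, seen = [start], set()
        while stack:
            n = stack.pop()
            if n == target:
                return True
            if n not in seen:
                seen.add(n)
                stack.extend(self.parents.get(n, ()))
        return False

    def add_dependency(self, child: str, parent: str) -> None:
        """Record that the current version of `child` derives from `parent`."""
        c, p = self.node(child), self.node(parent)
        if self._reaches(p, c):
            # The new edge would close a cycle: force a freeze of `child`
            # and record the dependency against a fresh version instead.
            self.current[child] += 1
            new_c = self.node(child)
            self.parents[new_c].add(c)  # the new version carries the old contents
            c = new_c
        self.parents[c].add(p)


g = ProvenanceGraph()
g.add_dependency("proc", "file")  # the process reads file v1
g.add_dependency("file", "proc")  # the process writes the file: would close a cycle
print(g.current["file"])          # 2
```

After the forced freeze, file v2 derives from both file v1 and the process, and the process derives only from file v1, so the graph stays acyclic.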

version pumping - a form of provenance explosion caused by the interaction of circular data flow with naive cycle-breaking algorithms.
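One way to see how version pumping can arise is to simulate two processes, A and B, passing data back and forth, combined with a naive cycle-breaking rule: whenever recording "receiver derives from sender" would close a cycle, freeze the receiver and start a new version. This scenario and rule are assumptions chosen for illustration, not a description of PASS behaviour. Under them, every exchange after the first closes a cycle, so the total number of versions grows by one per message, no matter how little data actually flows.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple

Node = Tuple[str, int]  # hypothetical key: (process name, version number)


def simulate_ping_pong(rounds: int) -> Dict[str, int]:
    """Send data A -> B -> A -> ... `rounds` times; return final versions.

    Each send records that the receiver's current version derives from the
    sender's current version. A cycle is broken naively by freezing the
    receiver and starting a new version (assumed rule, for illustration).
    """
    current = {"A": 1, "B": 1}
    parents: Dict[Node, Set[Node]] = defaultdict(set)

    def reaches(start: Node, target: Node) -> bool:
        # Depth-first walk of `start`'s ancestry looking for `target`.
        stack, seen = [start], set()
        while stack:
            n = stack.pop()
            if n == target:
                return True
            if n not in seen:
                seen.add(n)
                stack.extend(parents.get(n, ()))
        return False

    sender, receiver = "A", "B"
    for _ in range(rounds):
        src = (sender, current[sender])
        dst = (receiver, current[receiver])
        if reaches(src, dst):  # edge dst -> src would close a cycle
            current[receiver] += 1
            new_dst = (receiver, current[receiver])
            parents[new_dst].add(dst)
            dst = new_dst
        parents[dst].add(src)
        sender, receiver = receiver, sender
    return current


for n in (1, 10, 100):
    print(n, simulate_ping_pong(n))
```

In this model the versions of A and B always sum to `rounds + 1`, so the provenance grows linearly with the number of exchanges.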

-- PassProject - 24 Feb 2007