Scenario Authors: Simon Miles, Mark Hedges, Stella Fabiane
Characteristics
This proposed provenance challenge scenario is based on a simple (largely linear) process. However, it exhibits features not found in previous challenges, namely: it is not purely automated, as some steps involve a user making a decision; the results of each execution of the process are accessible via the web; and there is an emphasis on maintaining the ability to determine provenance in the long term, not just immediately following process execution.
Brief Summary
Crystallography is the experimental science of determining the arrangement of atoms in solids. Crystallographic methods depend on the analysis of the diffraction patterns that emerge from a crystal sample targeted by X-ray beams. In the scenario described below, scientists perform a series of steps to produce a set of atom coordinates from a crystal, and then publish these in a public database. Others need the raw data, and a record of how the experiment that produced a crystal image was conducted, in order to interpret the quality of that image.
Scenario Diagram
The figure below shows the process around which this scenario is based. Artefacts (data or physical) are depicted as ovals, while boxes represent processes. Where a process is marked with a U, this means it is conducted by the user rather than being automated.
Reading the process from top-left downwards, the artefacts and processes denote the following.
Users
This experiment is one performed by crystallographers working at King's College London. The process abstracts from the details but has been confirmed to be realistic, and the provenance questions are ones which have been confirmed as valuable to answer.
Requirement for Provenance
The quality of the data produced is critical not only for understanding the crystallised molecule, but also because the data from one experiment is used in creating images in future experiments. The public database can store only the coordinate and reflection files, not the diffraction files (as they are too large).
Provenance Questions
We wish to ask the following questions about the provenance of a crystal image.
Question 1: It is 10 years after the process was conducted, and the process has become obsolete. For a given published crystal image (named by web reference), what were the raw diffraction images from which the crystal image was produced? Assume that the public database can contain only the coordinate and reflection files, that the data kept on the desktop PC which ran the process has gone, and that knowledge in people's heads has been forgotten.
Question 2: For a given crystal, how often did a crystallographer reject and reproduce coordinates (the later stages of the experiment)? This is important because difficulty in obtaining an adequate crystal image can indicate that the original diffraction data was of poor quality.
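As a rough illustration of what answering these two questions could look like, the following Python sketch assumes that provenance has been recorded as simple "derived from" links between artefacts, together with a log of user accept/reject decisions. The record layout, the artefact identifiers and the function names are all hypothetical and are not taken from the scenario's tools or sample data; in particular, the assumption that coordinates are derived from the reflection file, which is in turn derived from the diffraction images, is made only for the sake of the example.

```python
from collections import deque

# Hypothetical provenance records: each artefact maps to the artefacts it was
# derived from. All identifiers are illustrative, not real sample data.
DERIVED_FROM = {
    "published_entry": ["coordinates_v3", "reflections"],
    "coordinates_v3": ["reflections"],
    "coordinates_v2": ["reflections"],
    "coordinates_v1": ["reflections"],
    "reflections": ["diffraction_001", "diffraction_002"],
    "diffraction_001": ["crystal_A"],
    "diffraction_002": ["crystal_A"],
}

# Hypothetical log of the user decision step: one entry per set of
# coordinates that the crystallographer inspected.
DECISIONS = [
    {"crystal": "crystal_A", "artefact": "coordinates_v1", "accepted": False},
    {"crystal": "crystal_A", "artefact": "coordinates_v2", "accepted": False},
    {"crystal": "crystal_A", "artefact": "coordinates_v3", "accepted": True},
]


def raw_sources(artefact, derived_from, is_raw):
    """Question 1: walk the derivation links backwards (breadth-first) from
    a published artefact and collect every raw artefact that is reached."""
    seen = {artefact}
    queue = deque([artefact])
    found = set()
    while queue:
        current = queue.popleft()
        if is_raw(current):
            found.add(current)
            continue  # do not look further back than the raw data
        for parent in derived_from.get(current, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return sorted(found)


def rejection_count(crystal, decisions):
    """Question 2: count how often coordinates for a crystal were rejected."""
    return sum(
        1 for d in decisions if d["crystal"] == crystal and not d["accepted"]
    )


def is_diffraction_image(name):
    # Assumed naming convention for raw diffraction images in this sketch.
    return name.startswith("diffraction_")


print(raw_sources("published_entry", DERIVED_FROM, is_diffraction_image))
print(rejection_count("crystal_A", DECISIONS))
```

For Question 1 to be answerable after 10 years, links of this kind would have to be kept somewhere that outlives the desktop PC, since the public database itself cannot hold the diffraction files.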
Technologies Used
A crystal image may be identified by a URL when browsing the web interface of the database, or may be seen as a row in a database table, as the challenge participant prefers. Tools and sample data are available for the software stages of the process.
Background
This scenario was developed in the context of the Biophysical Repositories in the Lab (BRIL) project.
-- SimonMiles - 20 May 2010