After the presentation of each team's results, several discussions were held on how to move forward as a community. A broad consensus developed that a second challenge was needed. The current challenge had shown that, for the most part, each team was able to extract the right kinds of information about the challenge workflow to answer the provenance queries that had been set. However, it remained unclear whether the data sets obtained by each team were equivalent.
It was clear that each team was using different terminology to describe its provenance system and model, and that this obscured whether or not the teams' models were compatible. An idea was put forward for each team to develop a glossary of terms for its system and to try to link its terms to analogous terms from other teams. These glossaries are to be published on the challenge Twiki.
The discussion then turned to a more detailed understanding of how the teams' systems might be compatible. Even if the teams' systems differed, there might be a route to interoperability if a level of abstraction could be found that all systems adhered to.
It was suggested that for the next provenance challenge, teams should pair up to see if they can make their respective systems interoperable. This would involve passing information from one system to another in such a manner that the latter system can understand the information it receives. A simple text-based or XML output from systems was considered the best approach so that data conversion issues could be avoided.
On the second day of the challenge, another discussion took place. This time the discussion centered around two of the points brought out in the initial discussion:
The question was raised: "What role does time play in provenance, and how should it be recorded?" Views differed, with some participants believing that time was just another form of state information and others believing it to be a form of annotation. The point was made that provenance information does not necessarily need time information if provenance is considered to be a causal graph. Furthermore, linking different systems and their respective views of time is problematic, since distributed clocks pose many synchronization problems, and understanding what a time stamp means in a distributed system is difficult. Participants discussed how time could be understood through an ontology that all systems could interpret. A need was identified for queries in which time information makes an explicit difference to the answer.
Can the community deal with provenance queries that require information not defined in workflows? This can occur if we consider annotations of data as not being part of a workflow; however, this does not deal with the important notion of causal events outside of a workflow and how they can be interpreted or captured by any of the teams' systems.
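The causal-graph view raised in this discussion can be illustrated with a minimal sketch. It assumes provenance is held as a mapping from each data item to the items it was directly derived from; the item names and the `lineage` helper are hypothetical and are not taken from any team's system. The point is only that a lineage query can be answered by traversal, without any time stamps.

```python
from collections import deque

# Hypothetical causal graph: each item maps to the items it was derived from.
# No time stamps are stored; only "derived from" edges.
DERIVED_FROM = {
    "atlas_image": ["resliced_1", "resliced_2"],
    "resliced_1": ["warp_params_1", "anatomy_1"],
    "resliced_2": ["warp_params_2", "anatomy_2"],
    "warp_params_1": ["anatomy_1", "reference"],
    "warp_params_2": ["anatomy_2", "reference"],
}


def lineage(item, graph):
    """Return every ancestor of `item`, found by breadth-first traversal."""
    seen = set()
    queue = deque([item])
    while queue:
        current = queue.popleft()
        for parent in graph.get(current, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


if __name__ == "__main__":
    # Everything that causally contributed to one intermediate result.
    print(sorted(lineage("resliced_1", DERIVED_FROM)))
```

A query that depends on when something happened, rather than on what it was derived from, cannot be answered from this structure alone, which is the kind of query the discussion identified a need for.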
A participant noted that workflows are central to our approach because they enable many other forms of information to be associated with them that can be used by the systems, i.e. annotations. What is needed is for the community to develop a generalised example that does not rely on a workflow representation of activity.
The outcome of the discussion was inconclusive, with some teams holding to the centrality of workflows and others believing that a definition of provenance should not rely on workflows.
The final discussion of the workshop sought to pull together the issues of the preceding discussions in order to clarify a way forward. Luc Moreau proposed that an article should be written that would expose the differences and similarities between the teams' approaches. This would act as a "reader's guide" to provenance systems. The article should attempt to address five points:
A Twiki will be set up on which each team can describe its system (this is to be completed by mid November 2006), and a classification of provenance systems will be developed. Feedback to each team will be given before Christmas 2006, and final versions are to be submitted by the end of January 2007.
During this period, each team will also supply their glossaries as discussed in the first discussion.
A final discussion relating to the forthcoming Interoperability Challenge also took place. The challenge would use the same workflow as the previous challenge, except that each team is to work on only a portion of the workflow (instead of the whole workflow as before), while its partner works on another portion. Each team is to take the provenance information that its partner team derived from the partner's part of the workflow and incorporate it into the provenance information it derives from its own section. This will test the interoperability of each team's provenance information and the teams' ability to answer queries using provenance information derived from different provenance systems.
It was decided that each team was to export data from its system to its partner team in a text-based format to avoid data conversion problems. The partner team should then try to import this data into its system and combine it with its own information. One suggestion is to use the University of Maryland's provenance ontology developed for the first challenge. The release of this data is to be completed by January 2007. In June 2007, in Monterey, a discussion will be held regarding progress. The next Provenance Challenge is scheduled for the Autumn/Winter of 2007.
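The export, import and combine steps described above might look roughly like the following sketch. It is an illustration only: no exchange format was fixed at the workshop, so the XML element and attribute names (`provenance`, `record`, `input`) and the three helper functions are assumptions made here, not the format of any team's system or of the University of Maryland ontology.

```python
import xml.etree.ElementTree as ET


def export_records(records):
    """Serialise (output, process, inputs) records to an XML string.

    The element and attribute names are hypothetical.
    """
    root = ET.Element("provenance")
    for output, process, inputs in records:
        rec = ET.SubElement(root, "record", output=output, process=process)
        for name in inputs:
            ET.SubElement(rec, "input", name=name)
    return ET.tostring(root, encoding="unicode")


def import_records(xml_text):
    """Parse the XML produced by export_records back into records."""
    root = ET.fromstring(xml_text)
    return [
        (
            rec.get("output"),
            rec.get("process"),
            [inp.get("name") for inp in rec.findall("input")],
        )
        for rec in root.findall("record")
    ]


def merge(own, partner):
    """Combine two record lists, keeping order and dropping exact duplicates."""
    merged = []
    for record in own + partner:
        if record not in merged:
            merged.append(record)
    return merged


if __name__ == "__main__":
    # The partner team covers the first part of the workflow.
    partner_records = [("intermediate", "stage_1", ["raw_input"])]
    # The importing team covers the part that consumes the partner's output.
    own_records = [("final", "stage_2", ["intermediate"])]

    received = import_records(export_records(partner_records))
    for record in merge(own_records, received):
        print(record)
```

In this sketch the two portions join on the shared item name `intermediate`. In practice, agreeing on how items are identified across systems is likely to be the harder part of the exercise.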
Finally, the management of this challenge is to be jointly held by Luc Moreau, Jim Myers and Mike Wilde, who will work on a new set of queries for the challenge.
-- SteveMunroe - 06 Oct 2006