Provenance Challenge

Producing Fishery Country Profiles in a D4Science Virtual Research Environment

Scenario Authors: Leonardo Candela, Donatella Castelli, Alice Tani and Pasquale Pagano, Istituto di Scienza e Tecnologie dell’Informazione “A. Faedo” – Consiglio Nazionale delle Ricerche (CNR) – Pisa, Italy

Brief Summary:

Context

D4Science (www.d4science.eu) is an e-Infrastructure that supports transparent sharing of distributed computing, service and content resources. It also supports on-demand creation and management of Virtual Research Environments (VREs), i.e. applications serving specific application scenarios, built by dynamically aggregating resources of the infrastructure. D4Science currently supports a number of VREs serving the needs of various e-Science communities, including the Fisheries and Aquaculture Resource Management community. The scenario described below is a slightly revised interpretation of an application scenario supported by one of the D4Science-enabled VREs.

FAO (the Food and Agriculture Organization of the United Nations) periodically produces and publishes Fisheries Country Profiles (FCPs), i.e. documents reporting, for a certain country at a certain date, the results of analyses of environmental, economic, species distribution and other fisheries-related aspects of that country. Once a complete version of this document has been produced, many variants of it are produced by reusing its content. Each variant addresses the needs of a different target audience: stakeholders, decision makers, industry managers, scientists, etc. Published FCPs are stored and made accessible in the FAO document repository, which is one of the data sources registered in the D4Science infrastructure.

A specific D4Science-enabled VRE called FCPPS (“Fisheries Country Profile Production System”) has been set up to facilitate the production of FCPs and their versions. This VRE provides transparent access to the many different heterogeneous sources of information and to the tools that are required to produce the information reported in FCPs. Moreover, it offers an environment supporting the editing and the publication of these complex documents.

FCPs are compound documents composed of the following parts:

  • a text describing the specific aspects related to the addressed country;
  • species distribution maps related to the country and to the species living in the country;
  • a graph visualizing species catch statistics trends in the country;
  • a table comparing catch statistics data represented in the graph with the species distribution represented in the map.

The text is edited by a team of scientists.

The species distribution map is produced through a two-step process. A first version of the map, called AquaMaps, is generated by applying a specific model-based predictive algorithm, originally developed by Kristin Kaschner and colleagues. This algorithm matches models capturing the environmental tolerance of a species with respect to depth, salinity, temperature, primary productivity, and its association with sea ice and coastal areas against local environmental data. In a second step, the map is revised by applying modifications suggested by a number of domain experts.

The graph is generated by analyzing species catch statistics data maintained in repositories of the different FAO areas pertinent to the specific country [ ]. Depending on the modalities used to collect these data, their “quality” (e.g. precision) can be very different.

The table is obtained by taking into account the original data used to generate the map and the catch statistics, as well as the corrections made to the map by the experts.

Diverse policies are associated with the different parts of the FCP. In particular, each part can be re-used only by its initial creator, by the creator's collaborators and by the experts that have been involved in the process of producing the full original FCP version.

The FCP and its parts have been generated under the control of some remote service. The process described below starts assuming that the FCP is now available in the FAO repository service.

Process

An FAO team of the Knowledge and Capacities Department has been commissioned to produce a good-quality, 10-page summary version of the Canada FCP focused on the herring species. This version will be disseminated to policy makers. The team decides to perform this task in the framework of the FCPPS VRE. To do so, the team has to carry out the following steps:

  1. Access the Canada FCP;
  2. Open the available document editor and select the FCP summary template;
  3. Fill the textual part by reusing text from the original FCP;
  4. Add the map, the graph and the table for the herring species.

The map, graph and table are expected to give a picture of the situation in the country that is current to within the last six months, and to be of good quality. The quality of the graph is a function of the quality of the data that were originally used to produce it. If more up-to-date species observation data are available or the quality of the graph is below a certain threshold and an alternative better-quality data source for that country is available in the e-infrastructure, then the affected component is regenerated.

If the graph or the map is regenerated, then the table is regenerated as well, so that it stays consistent with the applied changes.
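
The regeneration rule can be sketched in a few lines of Python. This is only an illustration, not part of the FCPPS VRE: the `PartStatus` class, the 0..1 quality scale and the `QUALITY_THRESHOLD` value are assumptions, and the rule is read as "newer data are available, or quality is low and a better source exists", which is one possible interpretation of the condition.

```python
from dataclasses import dataclass

# Assumed cut-off on a hypothetical 0..1 quality scale; the scenario only
# speaks of "a certain threshold".
QUALITY_THRESHOLD = 0.7


@dataclass
class PartStatus:
    name: str                              # "map" or "graph"
    quality: float                         # quality score of the source data
    newer_data_available: bool = False     # more up-to-date observations exist
    better_source_available: bool = False  # alternative, better-quality source


def needs_regeneration(part: PartStatus) -> bool:
    """Assumed reading: newer data OR (low quality AND a better source)."""
    low_quality = part.quality < QUALITY_THRESHOLD
    return part.newer_data_available or (low_quality and part.better_source_available)


def regeneration_plan(map_part: PartStatus, graph_part: PartStatus) -> list[str]:
    """List the parts to rebuild; the table follows any map or graph change."""
    plan = [p.name for p in (map_part, graph_part) if needs_regeneration(p)]
    if plan:
        plan.append("table")
    return plan


herring_map = PartStatus("map", quality=0.9)
herring_graph = PartStatus("graph", quality=0.4, better_source_available=True)
print(regeneration_plan(herring_map, herring_graph))  # ['graph', 'table']
```

Note that the inputs to this decision (data quality, availability of newer data) are themselves provenance information about each part.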

All the steps above must agree with the re-use policies associated with the different parts. Complying with these policies must be verified before starting the FCP summary version production process.
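
As a minimal sketch of such a pre-flight check, assuming the re-use policy is "creator, creator's collaborators, or experts involved in the original FCP", the following Python code lists the parts a given user may not re-use. The `PartAuthorship` class and all person names are hypothetical and serve only as illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PartAuthorship:
    part: str                               # e.g. "text", "map", "graph", "table"
    creator: str                            # initial creator of the part
    collaborators: frozenset = frozenset()  # the creator's collaborators
    experts: frozenset = frozenset()        # experts involved in the original FCP


def may_reuse(user: str, record: PartAuthorship) -> bool:
    """A part is re-usable by its creator, collaborators or involved experts."""
    return (
        user == record.creator
        or user in record.collaborators
        or user in record.experts
    )


def blocked_parts(user: str, records: list) -> list:
    """Return the names of the parts the user is not allowed to re-use."""
    return [r.part for r in records if not may_reuse(user, r)]


# Hypothetical people and assignments, for illustration only.
canada_fcp = [
    PartAuthorship("text", creator="alice", collaborators=frozenset({"bob"})),
    PartAuthorship("map", creator="carol", experts=frozenset({"bob", "dave"})),
    PartAuthorship("graph", creator="erin"),
]

print(blocked_parts("bob", canada_fcp))  # ['graph']
```

The production process would start only when `blocked_parts` returns an empty list for every member of the team.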

Scenario Diagram:

Users:

The target users for the scenario are scientists, knowledge managers and, more generally, anyone who wants to create compound information objects composed of multimedia and multi-content parts.

Requirement for provenance:

Provenance technology is essential to solve this scenario. In particular, it is required for performing the following operations:

  1. Checking that the re-use and modification policies on the document and its parts are satisfied;
  2. Checking that the graph is up to date and has the required quality;
  3. Generating an up-to-date version of the map, graph and table.

For achieving these purposes, the consumer must have the following provenance information about each constituent part of the FCP:

  • who created it and which of her/his collaborators are allowed to modify it;
  • how the map, the graph and the table were generated, i.e. which primary data, parameters and workflow were used to produce their original versions;
  • why the experts revised the map;
  • where the part generation algorithms are located;
  • when the map was created, to understand whether it has to be updated;
  • what the quality of the primary data used was.
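
One way to picture this information is as a single record per FCP part. The Python sketch below is an assumption made for illustration, not a D4Science or gCube data model: the `ProvenanceRecord` class, its field names and the `REQUIRED` list are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


@dataclass
class ProvenanceRecord:
    part: str
    creator: Optional[str] = None                         # who
    collaborators: list = field(default_factory=list)     # who may modify it
    primary_data: list = field(default_factory=list)      # how: input datasets
    parameters: dict = field(default_factory=dict)        # how: settings used
    workflow: Optional[str] = None                        # how: process applied
    revision_reasons: list = field(default_factory=list)  # why (expert revisions)
    algorithm_location: Optional[str] = None              # where
    created_at: Optional[datetime] = None                 # when
    data_quality: Optional[float] = None                  # quality of primary data


# Fields assumed to be mandatory before a part can be checked or regenerated.
REQUIRED = ("creator", "primary_data", "workflow",
            "algorithm_location", "created_at", "data_quality")


def missing_information(record: ProvenanceRecord) -> list:
    """Return the required fields that are still unset or empty."""
    missing = []
    for name in REQUIRED:
        value = getattr(record, name)
        if value is None or (isinstance(value, (list, dict)) and not value):
            missing.append(name)
    return missing


draft = ProvenanceRecord(part="graph", creator="erin", data_quality=0.4)
print(missing_information(draft))
# ['primary_data', 'workflow', 'algorithm_location', 'created_at']
```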

Note that in this scenario the data providers might also be willing to use provenance information in order to understand who has used the published data (for example, for measuring their impact in the scientific domain).

Provenance Questions:

The provenance questions that a user would pose are the following:

  1. Who is/are the creator/s of an object part?
  2. Where was an object created?
  3. When was the map created?
  4. How was an object part created? That is, which data, parameters and processes/workflow were used to create the object?
  5. What are the differences between distinct versions of an object?
  6. Why was an object created/modified?
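
Question 4 amounts to a traversal of a derivation graph. The following Python sketch assumes such a graph is available as a plain dictionary; the node names are hypothetical labels for the data and intermediate products described in the Context section, and the functions are not part of any existing provenance API.

```python
# Hypothetical derivation graph: each key was produced from the listed inputs.
DERIVED_FROM = {
    "aquamaps_draft": ["species_tolerance_model", "environmental_data"],
    "map": ["aquamaps_draft", "expert_corrections"],
    "graph": ["catch_statistics"],
    "table": ["environmental_data", "catch_statistics", "expert_corrections"],
}


def lineage(obj: str) -> list:
    """Return everything obj was directly or indirectly derived from."""
    seen = set()
    stack = list(DERIVED_FROM.get(obj, []))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(DERIVED_FROM.get(node, []))
    return sorted(seen)


def affected_by(source: str) -> list:
    """Return the derived objects that depend on the given source."""
    return sorted(obj for obj in DERIVED_FROM if source in lineage(obj))


print(lineage("map"))
# ['aquamaps_draft', 'environmental_data', 'expert_corrections', 'species_tolerance_model']
print(affected_by("catch_statistics"))
# ['graph', 'table']
```

The reverse lookup `affected_by` shows how the same graph could also tell a data provider which published parts used a given source.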

Technologies Used:

The scenario is based on the following technologies:

  • Users interact with the system through a web browser;
  • All functionalities are offered through graphical user interfaces;
  • Single sign-on technology is supported by the infrastructure;
  • Data are stored in a multitude of storage back-ends: relational databases, XML databases, column-stores, document repositories, and graph-tailored storage repositories;
  • Functionalities are realized as web-services, standalone applications, legacy code, plain executables;
  • Functionalities can be combined in workflows. Workflows can be persisted and used in conjunction with other workflows.

Background and description:

D4Science Website: www.d4science.eu

gCube (D4Science enabling system): http://www.gcube-system.org/

-- PaulGroth - 27 May 2010
Challenge.ProducingFisheryCountryProfiles moved from Main.ProducingFisheryCountryProfiles on 08 Jun 2010 - 12:19 by LucMoreau - put it back