Minutes of the SoCA AccessGrid meeting at 16.00 on 25th January 2005
Attending
- Chicago:
- Ian Foster, Kate Keahey, Mike Wilde, Yung Zhao
- Southampton:
- Liming Chen, Dave de Roure, Weijian Fang, Nick Jennings, Simon Miles, Luc Moreau, Victor Tan
Agenda
Luc circulated the following agenda prior to the meeting (available at http://twiki.grimoires.org/bin/view/Soca/MinutesMeeting25Jan2005):
- Contract News
- Public Twiki
- VOs
- Negotiation
- Workflow
- Registry
- Provenance
Contract
- Luc reported that there is no news on the contract: it has not been received yet.
TWiki
Virtual Organisations
- Nick outlined the background work of Southampton and the agents community on Virtual Organisations.
- In the agents field, a Virtual Organisation is a group of independent agents, each offering different services, that combine to provide collective value.
- Most work in agent-based VOs centres around game theory and mechanism design: attempting to maximise utility, stability etc. among a group of agents.
Virtual Organisations for e-Science
- Virtual Organisations for e-Science is an EPSRC-funded project involving Southampton and Liverpool, looking into the basic computer science involved in applying such agent-based mechanisms to the Grid.
- The project is keen to acquire VO use cases, feedback on existing VO work and examples of VOs.
- Luc will put a link to the project on the SoCA TWiki.
SoCA Workshop on Virtual Organisations
- Luc pointed out that the SoCA proposal includes exchange of use cases, which can include examples of Virtual Organisations.
- Ian noted that Carl Kesselman had suggested running a workshop on VOs involving organisation scientists and Grid researchers.
- Carl had suggested holding it at Red Cliff College.
- Nick suggested that the workshop could also involve agent researchers.
- Nick noted that Victor Lesser, who uses organisations defined in terms of collaboration and cooperation, may be interested in participating.
- There is no current timescale for the workshop, but Ian will work with Carl and provide feedback about progress.
VO Use Cases
- Nick would like use cases of dynamic VOs: VOs that are temporary and whose participants may enter and leave during their existence.
- Ian said that, in Grid 3, groups were coming together to run short (e.g. order of weeks) experiments, particularly biologists.
- Mike noted that most of the work they were involved with concerned large-scale, permanent organisations, so was not very dynamic.
- It was noted that the term 'Virtual Organisation' may not be used in the same way by the Grid and agents communities.
- For example, a VO may be assumed to include software agents only, or software agents and humans, or the term may be used to refer to teams of humans only.
- Mike suggested that dynamic VOs may appear when VOs are nested.
- For example, within the ATLAS High-Energy Physics experiment, a sub-group may form to work on a particular topic for a few months.
- That VO will gather members, have its own resources, share files etc. to perform its work, then later disband.
- Luc and Nick felt this was a good example of a dynamic VO.
- Mike also suggested that dynamic VO examples may become apparent as the Open Science Grid (OSG) Policy Group starts trying to define application policies.
- Luc will put a link to the OSG and to the Policy Group (link to be supplied by Mike) on the TWiki.
Negotiation
- Nick defined negotiation as the fundamental way agents (both human and software) interact.
- Negotiation takes many shapes and forms, but fits into two classes.
- First, bi-lateral negotiation is between two parties and an example would be service provision following some Service Level Agreement.
- Second, auctions are used to allocate resources and to connect buyers and sellers.
- Ian suggested that SoCA could work on experimental negotiation work, but Luc pointed out that developing a new negotiation protocol is very time consuming.
WS-Agreement
- Kate has worked with the group drawing up the WS-Agreement specification.
- The group removed the part regarding negotiation from the specification, with the assumption that this would be specified separately later.
- This is because the group's focus was on advanced scheduling and primitive negotiation was adequate to start with.
- The group is very interested in learning about strategies for negotiation, for the WS-Negotiation specification.
- One of Nick's researchers has submitted a public comment on WS-Agreement to GGF, but this mainly confirms that the specification does not properly specify negotiation.
- The WS-Agreement group has done little new since mid-December.
- The implementation of WS-Agreement is in GT4 and so the forthcoming WS-Negotiation specification is likely to also relate to Grid Services.
- Nick will make available some of the Southampton work on negotiation, which also employs WS-Coordination, to be put on the TWiki.
- Kate will make available WS-Agreement documents, to be put on the TWiki.
Chained Negotiation
- Luc has done some work on chained one-to-one bi-lateral negotiation, which is a use case perhaps not considered in WS-Agreement.
- A paper is planned to be submitted to HPDC, and Luc will make it available to SoCA when it is complete.
- Kate felt that new use cases should primarily be submitted to the Grid Resource Allocation and Agreement Protocol (GRAAP) Working Group, and she will investigate how to submit a use case before we write it up.
GGF
- Dave is now on the Global Grid Forum (GGF) Steering Group, has the role of engaging computer scientists in GGF, and is chair of the Semantic Grid working group.
- There is to be a radical re-organisation of GGF, which will be announced at GGF 13.
- This will involve the standards and community processes being made more distinct, with research and working groups more clearly separated.
- The steering group will be split, so that sub-groups will steer each of the two different processes.
- There will be a Semantic Grid session at the next GGF, and participants are keen to consider agents and semantic web services in this context.
- GGF are looking for presentations about agents and the Grid focusing on three areas: negotiation, VOs and autonomy.
- One reason Grid research has not yet embraced agents may be that there is little use of autonomy, or of autonomic computing.
- GGF may provide a forum for future SoCA discussions.
Workflow
XTDM
- Yung has been working on the XTDM to map physical data access to logical XML schemas, particularly the layer that maps the XML schema to Java-processed tags for access operations.
- Currently there are tags for directory operations, database (SQL) operations, text files and binary files.
- A scripting environment is being developed to help users map the physical format to the schema.
- However, many problems have been identified.
- For example, how should we deal with data types containing sub-types: should the sub-types be independently referenceable or anonymous?
- Or, how can we provide transformations to collect together and integrate a data set from multiple, distributed sites?
- Or, how should file names be resolved, when they could be either logical (with the physical location found via a Replica Location Service) or physical?
- A test case maps to a structured file system containing image files; there is not currently a concrete test case for database access.
- Ian suggested Yung could write up the current state of work and distribute this to SoCA members.
XML-OWL Mapping
- At Southampton, Martin Szomszor has been working on mapping between XML and OWL.
- This would allow the exchange of data between services with the same semantics but different syntax.
- The software converts XML of one schema to OWL then back to XML of a different schema.
- The mapping language is based on path-like constructs to refer to data in the XML, rather than being XSLT-like.
- A preliminary paper has been written, and once the idea is more concrete, it may be useful to discuss this within SoCA, as there are parallels with XTDM.
- Yung suggested similarities with the work of Bertram Ludäscher at San Diego Supercomputer Centre, where an ontology includes the schemas of instances of concepts.
BPEL
- Ian has sent Luc the latest thinking about workflows, including how to evolve the capabilities of the DAX language to include iteration.
- BPEL was discussed: Luc is not using BPEL at the moment, only VDT.
- Dave stated that the Open Middleware Infrastructure Institute (OMII) funds a project to develop a BPEL engine at University College London (UCL).
- There has been criticism of BPEL, but UCL claim this is due to poor previous engine implementations, rather than the language itself being bad.
- Ian suggested BPEL was relevant for certain sorts of application.
VDS
- Mike said they would like to address workflows very shortly, though not immediately.
- They would like feedback on VDS and VDT.
- A functional MRI (fMRI) group will be using VDS to run a large number of workflows on data of varied types.
- The application has a public archive of studies being accessed by many users for analysis in different ways.
- To aid this application, they have developed a pre-processor (Perl script) for VDL that parses a for-each construct, to provide processing of a large number of files.
- This avoids the need to write a domain-specific script that spits out the VDL for each file.
- By using this construct, you can now neatly express derivations that process data sets.
- The fMRI application is currently moving from local tests to Grid deployment.
- The way to best distinguish generation tasks (which refine workflows) and transformation tasks (which control execution in workflows) is still being worked on.
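As a rough illustration of the for-each pre-processing idea described above, the following Python sketch expands one templated block into one derivation line per input file. The real pre-processor is a Perl script operating on VDL; the `foreach ... end` syntax, the `${f}` placeholder, the `expand_foreach` function and the `derive align(...)` text are all hypothetical and are not real VDL.

```python
import re

# Hypothetical template syntax (NOT real VDL):
#   foreach NAME in file1 file2 ...
#   ...body lines that mention ${NAME}...
#   end
FOREACH = re.compile(
    r"^foreach (\w+) in (.+?)\n(.*?)^end\n?", re.MULTILINE | re.DOTALL
)

def expand_foreach(text):
    """Replace each foreach block with one copy of its body per listed file."""
    def expand(match):
        var = match.group(1)
        files = match.group(2).split()
        body = match.group(3)
        # Substitute the placeholder once per file and concatenate the copies.
        return "".join(body.replace("${" + var + "}", f) for f in files)
    return FOREACH.sub(expand, text)

template = (
    "foreach f in scan1.img scan2.img\n"
    "derive align(input=${f}, output=${f}.aligned)\n"
    "end\n"
)
expanded = expand_foreach(template)
print(expanded)
```

The point of the construct is that the user writes the body once, rather than a domain-specific script that emits the text for each file.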
PASOA VDS Experiment
- Paul and Simon have been using VDS to run a bioinformatics workflow using Southampton's own provenance recording (as part of the PASOA project), but have been having problems with performance.
- Mike suggested that the ratio of time taken in processing to time taken in scheduling should be appropriate: a guideline is that a processing task should take 15 minutes.
- They are also looking at 'nested refiners' which would tune a workflow by grouping together many short tasks so that they are run together locally, so less scheduling overhead is incurred.
- Yung also suggested tuning the Condor configuration script as, for example, it may use batch-and-submit mode, which may not be appropriate.
- Mike suggested that Southampton could send the figures to Chicago so they could judge whether they were as expected.
- After the HPDC deadline, they will send a description of their use case to Mike.
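The grouping of short tasks discussed above can be sketched as follows. This is only a minimal illustration, assuming the tasks are independent and have known run-time estimates; the `cluster_tasks` function and its greedy strategy are hypothetical and do not describe how 'nested refiners' are implemented in VDS. The 15-minute target is taken from Mike's guideline.

```python
from typing import List, Tuple

Task = Tuple[str, float]  # (task name, estimated run time in minutes)

def cluster_tasks(tasks: List[Task], target_minutes: float = 15.0) -> List[List[Task]]:
    """Greedily pack consecutive tasks into groups of roughly target_minutes."""
    groups: List[List[Task]] = []
    current: List[Task] = []
    total = 0.0
    for task in tasks:
        current.append(task)
        total += task[1]
        if total >= target_minutes:
            # The group is now long enough to justify one scheduling step.
            groups.append(current)
            current, total = [], 0.0
    if current:
        groups.append(current)  # leftover tasks form a final, shorter group
    return groups

tasks = [("t%d" % i, 4.0) for i in range(10)]  # ten 4-minute tasks
groups = cluster_tasks(tasks)
print([len(g) for g in groups])  # [4, 4, 2]
```

Here ten scheduling operations are reduced to three, at the cost of running the tasks in each group sequentially.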
Provenance
- In Southampton, there are two provenance projects: PASOA and EU Provenance, to which Luc will put links on the TWiki.
- Mike is keen to look for commonalities and differences between VDS's current provenance recording and PASOA's, to reach common standards.
- Luc suggested having an AccessGrid session on APIs for provenance recording after the HPDC submission deadline.
PASOA
- For PASOA, Simon has written a use case paper which can be made available if there is interest.
- For PASOA, Paul has developed a protocol (PReP) for recording the provenance of exchanges between clients and services.
- In PReP, there are two types of provenance.
- Interaction provenance records the messages exchanged between two identified actors (client and service), and actor provenance records information about the state of the actors during the interaction.
- A Web Service and client implementing PReP are available from the PASOA website.
- Paul is writing up version 2 of PReP and WSDL will be available soon: documents will be made available to SoCA on the TWiki after the HPDC deadline.
- In the PASOA VDS experiment discussed earlier, they are submitting provenance to the PReP Web Service as part of the VDS workflow.
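The distinction between the two types of provenance can be pictured with the data-structure sketch below. It is not the PReP message format or its WSDL interface: the class names, field names and the in-memory `ProvenanceStore` are assumptions made purely for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class InteractionRecord:
    """Interaction provenance: a message exchanged between two identified actors."""
    interaction_id: str
    sender: str
    receiver: str
    message: str

@dataclass
class ActorStateRecord:
    """Actor provenance: state reported by one actor during an interaction."""
    interaction_id: str
    actor: str
    state: Dict[str, str]

@dataclass
class ProvenanceStore:
    """Hypothetical in-memory stand-in for a provenance store."""
    interactions: List[InteractionRecord] = field(default_factory=list)
    actor_states: List[ActorStateRecord] = field(default_factory=list)

    def records_for(self, interaction_id: str):
        """Return both kinds of record for one interaction."""
        msgs = [r for r in self.interactions if r.interaction_id == interaction_id]
        states = [r for r in self.actor_states if r.interaction_id == interaction_id]
        return msgs, states

store = ProvenanceStore()
store.interactions.append(InteractionRecord("i1", "client", "service", "<request/>"))
store.actor_states.append(ActorStateRecord("i1", "service", {"script": "align.sh"}))
msgs, states = store.records_for("i1")
```

Keying both record types on a shared interaction identifier is one simple way to relate an actor's state to the message exchange it concerns.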
VDS Provenance
- Currently, VDS uses 'kickstart' to record provenance, but a standalone HTTP-accessed provenance store has been developed called 'execode'.
- kickstart output can be directed to execode.
- Yung intends to develop the schema used by kickstart to include more information: the workflow being run, the site, retries attempted, exceptions etc.
- They are also looking at recording WS calls, but are unsure what to record: the service endpoints, logical filenames sent in and out, or the data itself?
- They are considering having totally separate provenance for Web Service calls and for the local running of applications and, before this is settled, there will be no work on developing Web Service provenance.
Registry
- In the myGrid project, Luc has worked on semantic description and discovery of Web Services: papers are available for those interested.
- The registry developed to achieve this is now part of the Grimoires project (funded by OMII) and currently being refactored by Weijian.
GT4
- The refactoring work includes deployment in different environments: Axis, OMII's own container and GT3.9.3.
- They would like to test it as a substitute for the registry in GT4, but need to know the API used to register and discover services.
- Ian said that GT4 has index servers, in which every container registers its services with very basic information (name, creation time).
- Registration occurs on start-up of a container and services may also be published in multiple, remote index servers, one of which could include a Grimoires registry.
- Ian will find and distribute the documentation on how to register services in this way.
Metadata
- The Grimoires registry API allows metadata to be attached to a service advert and is agnostic to what that metadata is, so Luc thought it would be interesting to see if the metadata used in VDS could be put into a Grimoires registry.
- Mike also suggested that we could then try pushing the metadata contained in a Grimoires registry into the Virtual Data Catalog (VDC) so that it can be used in a workflow.
Semantic Checking
- In the PASOA VDS experiment, provenance will be used to determine whether a workflow run was semantically correct, i.e. the data output by one service was of the correct semantic type for the service it was passed as input to.
- They use the provenance along with adverts in a Grimoires registry, which will contain semantic type information, to determine if the workflow run was semantically valid.
- The services in the experiment are not Web Services but scripts.
- As part of the experiment, Southampton will describe the scripts as services, provide semantic types for them, and publish these adverts in the registry.
- After the HPDC deadline, they will make these adverts available to Chicago to compare with the information kept in a VDC.
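The semantic check described above can be sketched as below, assuming that the provenance yields a list of (producer, consumer) links and that each registry advert gives one input and one output semantic type. The service names, type names, `adverts` table and `check_run` function are hypothetical; they are not the Grimoires API or the actual experiment data.

```python
from typing import Dict, List, Tuple

# Hypothetical registry adverts: service name -> semantic types of input and output.
adverts: Dict[str, Dict[str, str]] = {
    "fetch_sequence": {"input": "GeneId", "output": "DnaSequence"},
    "translate": {"input": "DnaSequence", "output": "ProteinSequence"},
    "plot": {"input": "Image", "output": "Report"},
}

Link = Tuple[str, str]  # (producer, consumer): producer's output was passed to consumer

def check_run(links: List[Link], adverts: Dict[str, Dict[str, str]]) -> List[Link]:
    """Return the links whose producer output type differs from the consumer input type."""
    return [
        (producer, consumer)
        for producer, consumer in links
        if adverts[producer]["output"] != adverts[consumer]["input"]
    ]

valid_run = [("fetch_sequence", "translate")]
invalid_run = [("fetch_sequence", "translate"), ("translate", "plot")]
print(check_run(valid_run, adverts))    # []
print(check_run(invalid_run, adverts))  # [('translate', 'plot')]
```

This sketch compares type names for exact equality; a fuller check would presumably also accept an output whose type is a sub-type of the expected input type in the ontology.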
VDS
- Yung is developing some preliminary Web Service interfaces for registering and searching for elements in the VDC.
- Also, Web Service interfaces are available through which you can invoke operations on the VDS.
- This work needs to be restructured before release.
- VDS 1.4 is being cleaned up and new elements documented, but Mike can send a snapshot out.
Security
- Grimoires considers security including access control in the context of the registry.
- OMII mandates its security container, but they would also like to deploy the registry in GT4 and use its security.
- Ian said that the PERMIS-based security had been integrated into GT4.
Querying
- Grimoires is based on UDDI at its core, which has a primitive query language.
- We have extended this to query over metadata, either by direct string matching or, because metadata can be RDF graphs, using the RDQL query language.
- They are interested to hear whether supporting another query language would be useful.
- Yung has implemented querying of the VDC using XML based languages such as XQuery and XPath.
- They are considering extensions of this by embedding queries of other languages into the XML-based queries.
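The simpler of the two metadata query styles mentioned above, direct string matching, can be pictured with the sketch below. The in-memory `registry` dictionary, the metadata keys and the `find_services` function are hypothetical stand-ins and not the Grimoires or UDDI API; RDQL queries over RDF metadata are not shown.

```python
from typing import Dict, List

# Hypothetical stand-in for a registry: service name -> metadata attached to its advert.
registry: Dict[str, Dict[str, str]] = {
    "align": {"input_type": "Image", "site": "Southampton"},
    "smooth": {"input_type": "Image", "site": "Chicago"},
    "translate": {"input_type": "DnaSequence", "site": "Southampton"},
}

def find_services(registry: Dict[str, Dict[str, str]], key: str, value: str) -> List[str]:
    """Return names of services whose metadata has exactly this value for this key."""
    return sorted(name for name, meta in registry.items() if meta.get(key) == value)

print(find_services(registry, "input_type", "Image"))  # ['align', 'smooth']
```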
Conclusion
- Members of SoCA will exchange documents about the topics above using the TWiki.
- AccessGrid meetings will be set up for further technical discussions of the above.
- Luc cautioned that there were only 2 years in which to spend the funding, so we should not delay in setting up workshops and exchanging researchers.
- We will set up a meeting around GGF 14 in Chicago, or at HPDC.
- There may usefully be face-to-face meetings for discussing and developing APIs or similar.
- A workshop may include face-to-face meetings but is also for opening up to the wider community, so others will be invited.
- Further planning will take place at or after the first AccessGrid technical discussion.