The Geodise Toolboxes are a collection of functions that bring Grid client functionality to the Jython scripting environment. The Geodise Compute, Database and XML toolboxes contain routines that facilitate many aspects of Grid computing and data management including:
Grid computing provides the infrastructure for the collaborative use of computers, networks, data, storage and applications across distributed organisations. A computational job can be run on the Grid to make use of resources unavailable on the user’s desktop, for example to exploit software licenses or greater computational power. The Geodise Compute Toolbox provides Python functions for submitting and monitoring jobs on the Grid, transferring files to and from remote compute resources, and managing the certificates used to identify users and authorise use of the resources.
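The submit-then-monitor pattern described above can be sketched in plain Python. This is only an illustration under stated assumptions: SimulatedGridClient, its submit and status methods, the state names and wait_for_job are all hypothetical stand-ins invented for this sketch and are not functions of the Geodise Compute Toolbox. Consult the function reference for the actual syntax.

```python
import time


class SimulatedGridClient(object):
    """Hypothetical stand-in for a Grid job service (not the Geodise API)."""

    _STATES = ["PENDING", "ACTIVE", "DONE"]

    def __init__(self):
        self._jobs = {}

    def submit(self, executable, arguments):
        """Register a job and return a handle used for later status checks."""
        handle = "job-%d" % (len(self._jobs) + 1)
        self._jobs[handle] = 0  # index into _STATES
        return handle

    def status(self, handle):
        """Return the current state, then advance it to mimic job progress."""
        index = self._jobs[handle]
        if index < len(self._STATES) - 1:
            self._jobs[handle] = index + 1
        return self._STATES[index]


def wait_for_job(client, handle, poll_interval=0.01, max_polls=100):
    """Poll the job until it reaches a terminal state; return the states seen."""
    history = []
    for _ in range(max_polls):
        state = client.status(handle)
        history.append(state)
        if state in ("DONE", "FAILED"):
            return history
        time.sleep(poll_interval)
    raise RuntimeError("job %s did not finish after %d polls" % (handle, max_polls))


client = SimulatedGridClient()
job = client.submit("/bin/solver", ["input.dat"])  # hypothetical executable and argument
states = wait_for_job(client, job)
print(states)
```

Bounding the number of polls keeps a script from waiting indefinitely on a job that never completes, which matters when the job is one step in a longer workflow.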
Compute intensive applications often use and produce many data files and data structures. It can become difficult to find, reuse and share data from various applications that have been run repeatedly with different parameters. The Geodise Database Toolbox can be used to store additional user-defined information (called metadata) describing files and Jython variables, so that they can be located and retrieved more easily with metadata queries. Files and variables can also be grouped together, and data can be shared with other users by granting access permissions.
XML is a flexible standard data format that is widely used to structure and store information, and to exchange data between various computer applications. The XML Toolbox functions convert and store Jython variables and structures from the internal format into XML and vice versa. This allows parameter structures, variables and results from computational applications to be stored in a non-proprietary file format, or in XML-capable databases, and can be used to transfer Jython variables across the Grid. The XML toolbox also enables the transparent exchange of data between the Jython scripting environment and the Matlab technical computing environment.
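The idea of converting variables to XML and back can be illustrated with the standard xml.etree.ElementTree module. This is a minimal sketch, not the XML Toolbox itself: the function names to_xml and from_xml and the "type" attribute are assumptions made for this example, and the XML layout shown here is not the format produced by the toolbox. The sketch handles only dictionaries, lists, integers, floats and strings, and it assumes that dictionary keys are valid XML element names.

```python
import xml.etree.ElementTree as ET


def to_xml(name, value):
    """Convert a nested dict/list/scalar into an XML element (illustrative format)."""
    elem = ET.Element(name)
    if isinstance(value, dict):
        elem.set("type", "dict")
        for key in sorted(value):
            elem.append(to_xml(key, value[key]))
    elif isinstance(value, (list, tuple)):
        elem.set("type", "list")
        for item in value:
            elem.append(to_xml("item", item))
    else:
        # Record the scalar type so that it can be restored on loading.
        elem.set("type", type(value).__name__)
        elem.text = repr(value) if isinstance(value, float) else str(value)
    return elem


def from_xml(elem):
    """Rebuild the Python value described by an element written by to_xml."""
    kind = elem.get("type")
    if kind == "dict":
        return dict((child.tag, from_xml(child)) for child in elem)
    if kind == "list":
        return [from_xml(child) for child in elem]
    if kind == "int":
        return int(elem.text)
    if kind == "float":
        return float(elem.text)
    return elem.text or ""


params = {
    "mesh": {"cells": 12000, "refine": 1.5},
    "solver": "euler",
    "angles": [0, 2, 4],
}
xml_text = ET.tostring(to_xml("params", params))
restored = from_xml(ET.fromstring(xml_text))
print(restored == params)
```

Storing the type alongside each value is what allows the round trip to return an integer as an integer rather than as a string.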
This user guide introduces the reader to the Compute, Database and XML toolboxes, with tutorials that give an overview of the functionality provided by each of the toolboxes. The function reference for each toolbox contains detailed information about the syntax of its functions.
Jython is an implementation, written in Java, of the powerful object-oriented Python scripting language. Python is a high-level programming language with a clean syntax, which allows scientists and engineers to rapidly develop scripts and workflows. The Jython environment also allows the developer to exploit the capabilities of extensive built-in and third-party Java libraries. The Jython interpreter is freely available for both commercial and non-commercial use from http://www.jython.org/.
Throughout this manual the term Python refers to the Python scripting language, and the term Jython refers to the Jython scripting environment.
The GeodiseLab toolboxes have applications in a wide range of scenarios. Here we will outline three use cases that describe the potential benefits of Grid computing to the daily practice of the scientist or engineer.
The use cases that we will discuss are:
Engineering Design Search and Optimisation (EDSO) is a compute and data intensive task which is well matched to Grid computing. Optimisation algorithms are used to search the parameter space of an engineering problem to discover an optimal design subject to certain criteria. During EDSO the optimisation algorithm must repeatedly evaluate some measure of the quality of a design; this may involve one or more lengthy numerical calculations. For example, an engineer wishing to improve the aerodynamic performance of a wing design may configure an optimiser to vary key design parameters, whilst invoking simulations of Computational Fluid Dynamics (CFD) to determine the quality of alternative geometries.
Depending upon the complexity of the numerical calculations and the number of evaluations required to determine the optimum design, EDSO may be a lengthy and computationally intensive task. When the evaluation of the objective function involves complex simulations (e.g. CFD), numerous large data files may be required, or produced, by the calculations. The Grid client functionality makes it straightforward for the engineer to leverage computational resources available on the Grid to perform EDSO.
When undertaking EDSO in the Jython scripting environment the engineer may use the Geodise Compute toolbox to automate the transfer of files, and the submission and management of computational jobs required during the evaluation of a design. By exploiting Grid resources the engineer not only gains access to greater computational power, but can also drive any required applications on a multitude of platforms from the comfort of a desktop Problem Solving Environment (PSE).
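The structure of such a design search, in which an optimiser repeatedly evaluates an objective function, can be sketched as follows. Everything here is an assumption made for illustration: evaluate_design is a cheap analytic stand-in for what would in practice be a lengthy CFD job run on the Grid, the parameter names sweep and twist are hypothetical, and the simple pattern search shown is just one possible optimisation algorithm, not the one used by Geodise.

```python
def evaluate_design(sweep, twist):
    """Stand-in for an expensive simulation; returns a drag-like cost to minimise."""
    return (sweep - 27.0) ** 2 + 2.0 * (twist - 3.0) ** 2 + 5.0


def pattern_search(objective, start, step, min_step=1e-3):
    """Minimise objective by probing each parameter in turn, halving the step
    whenever no probe improves on the best design found so far."""
    best = list(start)
    best_cost = objective(*best)
    evaluations = 1
    while step > min_step:
        improved = False
        for i in range(len(best)):
            for delta in (step, -step):
                trial = list(best)
                trial[i] += delta
                cost = objective(*trial)  # in practice: one remote job per evaluation
                evaluations += 1
                if cost < best_cost:
                    best, best_cost = trial, cost
                    improved = True
        if not improved:
            step /= 2.0
    return best, best_cost, evaluations


best, best_cost, evaluations = pattern_search(evaluate_design, [20.0, 0.0], step=4.0)
print(best, best_cost, evaluations)
```

Even this small two-parameter search calls the objective function dozens of times, which is why each evaluation is a natural candidate for submission to a remote compute resource.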
Data management is an issue in a number of scientific and engineering application domains, including that of computational electromagnetics. For example, when performing simulations of electromagnetic phenomena a large volume of data may be generated, typically in the form of input and output files. It is a non-trivial problem for the researcher to store, manage and reuse this data. The investment associated with the computationally expensive Finite Difference Time Domain modelling technique used to explore the properties of electromagnetic devices requires that simulation results are suitably managed for reuse at a later date.
At present the most common solution for this problem is to store these flat files within a hierarchical directory structure on a local file system. As the volume of data grows over time this solution is frequently inadequate for long term storage since it may become increasingly difficult to locate and reuse data within the collection. The Geodise Database toolbox provides a solution as a client to a managed data archive on the Grid.
The Geodise Database Toolbox allows the researcher to archive data files to a managed repository from Jython and annotate these files with metadata. In addition to standard metadata the user may define custom metadata specific to the problem. The researcher can then query the metadata to find these files using a straightforward syntax within the Jython scripting environment. In addition the Geodise Database Toolbox supports the archiving of variables from Jython. Items stored in the repository can be associated together into datagroups, allowing the creation of annotated hierarchies within which the user's results can be organised.
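The archive, annotate and query workflow can be illustrated with a small in-memory model. This is a sketch under stated assumptions, not the Geodise Database Toolbox: the MetadataArchive class, its archive and query methods, and the metadata fields model and grid_cells are all hypothetical names chosen for this example, and a real repository would be a managed database on the Grid rather than a Python dictionary.

```python
class MetadataArchive(object):
    """Hypothetical in-memory stand-in for a managed data repository."""

    def __init__(self):
        self._records = {}
        self._next_id = 1

    def archive(self, payload, metadata):
        """Store a payload with user-defined metadata and return its identifier."""
        item_id = self._next_id
        self._next_id += 1
        self._records[item_id] = (payload, dict(metadata))
        return item_id

    def query(self, **conditions):
        """Return identifiers of items whose metadata meets every condition.

        A condition is either a value to match exactly or a predicate function.
        """
        matches = []
        for item_id in sorted(self._records):
            metadata = self._records[item_id][1]
            if all(self._satisfies(metadata, field, cond)
                   for field, cond in conditions.items()):
                matches.append(item_id)
        return matches

    @staticmethod
    def _satisfies(metadata, field, condition):
        if field not in metadata:
            return False
        value = metadata[field]
        if callable(condition):
            return bool(condition(value))
        return value == condition


repository = MetadataArchive()
id_a = repository.archive("run_a.dat", {"model": "waveguide", "grid_cells": 64})
id_b = repository.archive("run_b.dat", {"model": "waveguide", "grid_cells": 256})
id_c = repository.archive("run_c.dat", {"model": "antenna", "grid_cells": 256})

# Find waveguide simulations that used at least 128 grid cells.
hits = repository.query(model="waveguide", grid_cells=lambda n: n >= 128)
print(hits)
```

The point of the sketch is that results are located by what they describe rather than by where they sit in a directory tree.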
The Geodise XML toolbox provides a collection of straightforward functions which convert variables in the Jython scripting environment to and from the external XML format. Variables in the Jython workspace can be saved to and loaded from an XML file with minimal effort on the part of the researcher. XML is a structured format that can be interpreted by third-party applications. Encoding Jython variables in the XML format has a number of benefits.
The provision of the Geodise XML toolbox for Matlab allows the transparent exchange of variables between the Matlab technical computing environment and the Jython scripting environment. Variables are mapped to the appropriate built-in datatypes in the two languages. This allows researchers working with these two Problem Solving Environments (PSEs) to collaborate on shared datasets.
The Geodise XML toolbox is also leveraged by the Geodise Database Toolbox to store variables and metadata in a database. The contents of variables and metadata in the database can then be queried and searched. The Geodise Database toolbox may be used to share variables stored in the managed repository between members of a virtual organisation, because researchers can authorise other users to access their data. When variables are retrieved from the repository they are transparently converted into the built-in datatypes of the PSE into which they are loaded.
Copyright © 2005, The Geodise Project, University of Southampton