gd_datagroup

Creates a new datagroup, used to group together archived files, variables and subdatagroups.

Syntax

datagroupID = gd_datagroup(datagroupname,metadata=None,datagrouptype='')

Description

datagroupID = gd_datagroup(datagroupname) creates a new, empty datagroup named datagroupname. The datagroupname argument can act as a user defined identifier for the datagroup, although it does not have to be unique. Some standard information about the datagroup (metadata) is also generated, which can later be queried with gd_query. A unique identifier (datagroupID) is returned, which can then be used to add files and variables to the datagroup while they are being archived with gd_archive. Files, variables and other datagroups already in the archive can be added to a datagroup with gd_datagroupadd.

datagroupID = gd_datagroup(datagroupname,metadata) creates a new, empty datagroup with a datagroup name and some user defined metadata which can later be queried with gd_query. Standard metadata about the datagroup is also generated.

datagroupID = gd_datagroup(datagroupname,metadata,'monitor') is useful for monitoring a group of data produced by a computational job. It is similar to an ordinary datagroup but stores extra index information that allows a user of gd_query to easily find the datagroup associated with their most recent job, or the most recent job meeting certain metadata criteria. This functionality is provided for convenience so that the user does not have to remember any particular metadata field names or values, or what time the datagroup was created.

Input Arguments

metadata	A dictionary describing the datagroup. The keys (referred to as fields) must be strings, but the values can be any combination of types (string, integer, float, complex, dictionary, list, or tuple) necessary to describe the datagroup. There are two special subdictionaries, standard and access, which may only contain certain values.

Some metadata is automatically generated (even when no metadata is passed to the function) and stored in the standard subdictionary of the metadata dictionary. For datagroups this consists of ID, userID and archiveDate. Optional comment, version and tree fields can also be added to standard. The tree field is a string which can be used to represent a user defined hierarchy for the data, similar to a directory path, e.g. 'myuserID/designs/testmodel'. See gd_query for further information on these standard fields. Any other fields set in the standard subdictionary will be overwritten or removed.
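
The shape of the metadata dictionary described above can be sketched in plain Python. The helper name make_metadata is hypothetical and is not part of the toolbox. It only assembles a dictionary with user defined fields and the three optional standard fields (comment, version, tree). The generated fields (ID, userID, archiveDate) are left for gd_datagroup to fill in.

```python
# Hypothetical helper, NOT part of the Geodise toolbox: it only builds a
# metadata dictionary of the shape described in this section.
OPTIONAL_STANDARD_FIELDS = ('comment', 'version', 'tree')

def make_metadata(user_fields, comment=None, version=None, tree=None):
    """Return a metadata dictionary with an optional 'standard' subdictionary."""
    # Field names (keys) must be strings; values may be of any supported type.
    for key in user_fields:
        if not isinstance(key, str):
            raise TypeError('metadata field names must be strings: %r' % (key,))
    metadata = dict(user_fields)
    # Only the optional fields are set here; ID, userID and archiveDate
    # are generated when the datagroup is created.
    standard = {}
    for name, value in zip(OPTIONAL_STANDARD_FIELDS, (comment, version, tree)):
        if value is not None:
            standard[name] = value
    if standard:
        metadata['standard'] = standard
    return metadata

m = make_metadata({'expnum': 123},
                  comment='Data for experiment 123',
                  tree='myuserID/designs/testmodel')
```

The resulting dictionary m could then be passed as the metadata argument of gd_datagroup.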

The access subdictionary of metadata controls who may query the datagroup. The person who created the datagroup automatically has access to it and does not need to be added. access can contain two fields, each of which can be a single string or a list of strings:

users	User ID strings specifying which users may access the datagroup.
groups	Group ID strings specifying which groups of users may access the datagroup (currently a user group must be created in the database by an administrator).
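
Because each access field may be a single string or a list of strings, it can be convenient to normalise both forms before building the metadata. The following is a minimal sketch in plain Python. The helper name make_access and the group ID 'designteam' are hypothetical, used only for illustration.

```python
# Hypothetical helper, NOT part of the Geodise toolbox.
def make_access(users=None, groups=None):
    """Build an 'access' subdictionary from strings or lists of strings."""
    access = {}
    for field, value in (('users', users), ('groups', groups)):
        if value is None:
            continue                  # omit fields that were not supplied
        if isinstance(value, str):
            value = [value]           # wrap a single ID so both forms look alike
        access[field] = list(value)
    return access

m = {'expnum': 123}
# 'designteam' is an assumed group ID; a real group must already exist
# in the database (created by an administrator).
m['access'] = make_access(users=['user1', 'user2'], groups='designteam')
```

The creator of the datagroup does not need to appear in users, since they have access automatically.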

Examples

Create a datagroup with some metadata, m (user defined metadata and a standard comment), and give access permission to user1 and user2.

from gddatabase import *

m = {'expnum': 123}

m['standard'] = {'comment': 'Data for experiment 123'}

m['access'] = {'users': ['user1','user2']}

datagroupID = gd_datagroup('design opt 2004-09-03',m)

print(datagroupID)

dg_ce868f40-8ds0-455e-9ae5-36c05epc25a9

Add a file to the datagroup when it is archived.

gd_archive('C:/file.dat', datagroupID=datagroupID)

Add a variable to the datagroup after it has been archived.

v = {'width': 12}

varID = gd_archive(v)

gd_datagroupadd(datagroupID,varID)

Create a monitored datagroup and find it with a query.

monID = gd_datagroup('design job 2004-09-03', datagrouptype='monitor')

gd_datagroupadd(monID,varID)

gd_query('standard.jobIndex = max','monitor')

Further examples are given in gd_datagroupadd and gd_query.

Notes

A valid proxy certificate is required (see gd_createproxy from the Geodise Compute Toolbox).