gd_query

Performs queries over metadata or Python dictionaries stored in the archive.

Syntax

qresults = gd_query(query=None, datasource='file', resultfields='*')

Description

gd_query() with no input arguments starts the query GUI, a Graphical User Interface for querying metadata and structures which also allows hyperlink browsing between related data. See the Geodise Database Toolbox Tutorial for more details.

qresults = gd_query(query) sends a query string to the database requesting all file metadata that meets the criteria specified in the string. A query takes the form 'field.subfield = value', where = can be replaced by other comparison operators. A field is equivalent to a key in a dictionary, and a subfield is equivalent to a key in a subdictionary. More than one query condition can be included in the string using & to join them together. The function returns a list of metadata dictionaries, one for each matching result.
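As a minimal sketch of the query format described above, the following assembles a query string from separate conditions. The helper build_query is hypothetical and not part of the toolbox; it only produces text in the 'field operator value' form, joined with &. The gd_query call is left commented out because it needs the toolbox and a database connection.

```python
def build_query(conditions):
    """Join (field, operator, value) triples into one query string.

    Hypothetical helper, not part of the toolbox: it only formats each
    condition as 'field operator value' and joins them with ' & '.
    """
    parts = []
    for field, operator, value in conditions:
        parts.append('%s %s %s' % (field, operator, value))
    return ' & '.join(parts)


q = build_query([
    ('standard.archiveDate', '>=', '2004-09-01'),
    ('iterations', '=', 9000),
])
print(q)
# qresults = gd_query(q)   # requires the toolbox and a valid proxy certificate
```

Keeping the conditions as a list makes it easy to add or drop criteria before the string is sent to the database.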

qresults = gd_query(query,datasource) sends a query string to the database requesting matching archived variables or metadata of a certain type, depending on the value of the datasource string. To query metadata, set datasource to ‘file’ (default), ‘varmeta’ (metadata about variables), ‘datagroup’ or ‘monitor’. A list of matching dictionaries is returned, one for each result. To query variables stored in the database, set datasource to ‘var’. In this case the function will return a list of matching variables. The only variables that can be queried in this way are dictionaries, because they contain named fields that can be searched for.

qresults = gd_query(query,datasource,resultfields) sends a query string to the database as above but only returns selected fields for each matching result. The resultfields string is a comma separated list indicating which fields should be returned for each result, for example just the standard.ID fields (i.e. {'standard':{'ID':id_value}}). The default, *, returns all fields. To view the query results, use function gd_display.
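To illustrate the shape of a result when resultfields is restricted, here is a local sketch that reduces a metadata dictionary to a comma separated list of dotted field paths. The function project_fields is hypothetical and runs entirely on a sample dictionary; the toolbox performs the selection itself, and its handling of missing fields may differ from the assumption made here (they are skipped).

```python
def project_fields(record, resultfields):
    """Reduce record to the dotted field paths listed in resultfields.

    Hypothetical local illustration only. '*' returns the record unchanged;
    paths that do not exist in the record are skipped (an assumption).
    """
    if resultfields.strip() == '*':
        return record
    selected = {}
    for path in resultfields.split(','):
        keys = path.strip().split('.')
        # Walk down the nested dictionaries to find the requested value.
        source = record
        found = True
        for key in keys:
            if isinstance(source, dict) and key in source:
                source = source[key]
            else:
                found = False
                break
        if not found:
            continue
        # Rebuild the same nesting in the output dictionary.
        target = selected
        for key in keys[:-1]:
            target = target.setdefault(key, {})
        target[keys[-1]] = source
    return selected


sample = {
    'standard': {'ID': 'file_dat_66830074-e749-4de0-b976-61f4d32',
                 'format': 'dat'},
    'iterations': 9000,
    'params': [1, 4.7, 5.3],
}
print(project_fields(sample, 'standard.ID, params'))
```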

Input Arguments

query A query takes the form 'field.subfield = value' where field is a key in the archived metadata/variable dictionary and subfield is a key in a subdictionary, for example iterations or standard.ID. The value is an alphanumeric value the field should contain, and can also be thought of as the value part of a dictionary entry. The operator & (meaning ‘and’) can be used to specify more than one search condition.

The following operators can be used to compare fields with values:

=	Equal to
!=	Not equal to
>	Greater than
<	Less than
>=	Greater than or equal to
<=	Less than or equal to
like	Similar to
not like	Not similar to

Similarity matches with like and not like use the following wildcards:

_	Matches any single character.
%	Matches any string of any length (including 0).

For example, 'standard.localName like %dat%' will match strings containing the phrase ‘dat’, and 'model.name like _est%' will match strings starting with any character followed by ‘est’ and then any string. To search for the characters _ and %, precede them with the \ escape character.

The operators perform case-sensitive comparisons when used with string values. To make an operator case-insensitive, surround it with two # characters, for example #=#, #!=#, #like#, #not like#.
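The wildcard rules for like, and the case insensitive #like# form, can be tried out locally with a small translation to a regular expression. This is only a sketch of the matching rules as described above, not the toolbox's implementation; like_to_regex and its ignore_case parameter are hypothetical names.

```python
import re


def like_to_regex(pattern, ignore_case=False):
    """Translate a like pattern into a compiled regular expression.

    Hypothetical helper: '_' matches one character, '%' matches any string
    (including the empty string), and a backslash makes the next character
    literal. ignore_case=True mimics the #like# form.
    """
    parts = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == '\\' and i + 1 < len(pattern):
            # Escaped character: take the next character literally.
            parts.append(re.escape(pattern[i + 1]))
            i += 2
            continue
        if ch == '_':
            parts.append('.')
        elif ch == '%':
            parts.append('.*')
        else:
            parts.append(re.escape(ch))
        i += 1
    flags = re.DOTALL
    if ignore_case:
        flags |= re.IGNORECASE
    # \Z anchors the end so the whole value must match the pattern.
    return re.compile(''.join(parts) + r'\Z', flags)


print(bool(like_to_regex('%dat%').match('file.dat')))
print(bool(like_to_regex('_est%').match('test_design')))
```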

Another wildcard, *, provides flexibility in describing the field path. For example, model.name can be replaced by *.name for a less specific search.
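To see which dotted field paths a metadata dictionary offers, and which of them a pattern such as *.name could refer to, the dictionary can be flattened locally. This is an illustrative sketch: flatten_paths is a hypothetical helper, and the matching uses Python's fnmatch module, whose * also spans dots. Whether the toolbox's * covers one level or several is not stated here, so treat that behaviour as an assumption.

```python
from fnmatch import fnmatchcase


def flatten_paths(record, prefix=''):
    """Map each dotted field path in a nested dictionary to its value.

    Hypothetical helper for inspecting metadata locally.
    """
    paths = {}
    for key, value in record.items():
        path = prefix + key
        if isinstance(value, dict):
            # Descend into subdictionaries, extending the dotted path.
            paths.update(flatten_paths(value, path + '.'))
        else:
            paths[path] = value
    return paths


metadata = {
    'standard': {'localName': 'file.dat', 'format': 'dat'},
    'model': {'name': 'test_design'},
    'iterations': 9000,
}
paths = flatten_paths(metadata)
# Case-sensitive shell-style match against each dotted path.
matches = sorted(p for p in paths if fnmatchcase(p, '*.name'))
print(matches)
```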

In addition to user defined metadata fields, the following standard metadata fields can be queried:

standard.ID	ID that uniquely identifies a file, variable or datagroup.
standard.datagroupname	Name of datagroup. Only used when querying datagroups.
standard.localName	Name of a local file before it was archived.
standard.byteSize	Size in bytes of a file.
standard.format	Format of file (default is its extension).
standard.createDate	Date the file was created/modified.
standard.archiveDate	Date the file or variable was archived, or the datagroup was created.
standard.userID	ID of the user who archived the data or created the datagroup.
standard.comment	Comment about the file, variable or datagroup.
standard.version	User defined version number for the file, variable or datagroup.
standard.tree	String representing a user defined data hierarchy, similar to a directory path.
standard.files.fileID	Each file in a datagroup.
standard.vars.varID	Each variable in a datagroup.
standard.subdatagroups.datagroupID	Each subdatagroup in a datagroup.
standard.datagroups.datagroupID	Each datagroup a file, variable or subdatagroup belongs to.

Datagroups are collections that can contain files, variables or other datagroups, see gd_datagroup and gd_datagroupadd.

The fields in an archived dictionary variable can also be queried in conjunction with the standard metadata fields for that variable.

datasource The data source indicates which type of data to query, and can be specified by one of the following strings (the default datasource value is 'file'):

‘file’	Metadata about files.
‘datagroup’	Metadata about datagroups.
‘monitor’	Metadata about monitorable datagroups.
‘varmeta’	Metadata about Jython variables.
‘var’	Jython variables.

A datagroup that was created with the ‘monitor’ flag can be queried as an ordinary datagroup, or as a collection of data about a computational job, by setting datasource to ‘monitor’. This provides a quick and easy query mechanism for finding a user’s most recent job, or the latest job meeting certain other metadata criteria. It is provided for convenience so that the user does not have to remember any particular field names, values, or what time the datagroup was created. In addition to standard.ID, standard.userID and user defined metadata, the following standard metadata can be used together with ‘monitor’ to query a job monitoring datagroup.

standard.jobIndex	Job index. Special query syntax jobIndex = max gets the highest index (most recent job).
standard.jobName	Name of job (same as datagroupname).
standard.startDate	Start date of job (when the datagroup was created).

Examples

Query file metadata to find files archived on or after 1st September 2004 where iterations = 9000. A datasource argument is not required because ‘file’ is the default.

from gddatabase import *

q = 'standard.archiveDate>=2004-09-01 & iterations=9000'

qresults = gd_query(q)

print len(qresults)

print qresults[0].keys()

['standard', 'model', 'iterations', 'params']

print qresults[0]

{'standard':
    {'byteSize': 24,
     'localName': 'file.dat',
     'comment': 'Comment about file',
     'archiveDate': '2004-09-03 15:25:45',
     'ID': 'file_dat_66830074-e749-4de0-b976-61f4d32',
     'format': 'dat',
     'createDate': '2004-08-23 10:40:33',
     'datagroups': '',
     'userID': 'jlw'},
 'model': {'name': 'test_design'},
 'iterations': 9000,
 'params': [1, 4.7, 5.3]}

The above output has been formatted for this document. See gd_display for an example of displaying the full contents of query results in an easy to read format.

print qresults[0]['standard']['archiveDate']

2004-09-03 15:25:45

Query to find files which have a name field equal to ‘test_design’ in their metadata and only return the fields standard.ID and params.

q = '*.name = test_design'

qresults = gd_query(q,'file','standard.ID, params')

print qresults[0]

{'standard':
    {'ID': 'file_dat_66830074-e749-4de0-b976-61f4d32'},
 'params': [1, 4.7, 5.3]}

Query to find datagroups with comments containing the text ‘experiment’.

q = 'standard.comment like %experiment%'

r = gd_query(q,'datagroup')

Query variable metadata to find the metadata for all variables that are in a particular datagroup.

q = 'standard.datagroups.datagroupID = dg_ce868f40-8ds0-455...'

r = gd_query(q,'varmeta')

Query variables to find dictionaries where the value of field width is between 9 and 14 inclusive.

r = gd_query('width >= 9 & width <= 14', 'var')

Find files that have a comment in their metadata, using "" (two double quotes) to indicate an empty value.

r = gd_query('standard.comment != ""')

Find the latest job monitoring datagroup, then find the latest job monitoring datagroup that matches some other criteria.

m = {'modelver': 0.6}; m2 = {'modelver': 0.71}

gd_datagroup('design model job xyz',m,'monitor')

gd_datagroup('design model job abc',m,'monitor')

gd_datagroup('design model job 999',m2,'monitor')

r1 = gd_query('standard.jobIndex = max','monitor')

print r1[0]['standard']['jobName']

design model job 999

r2 = gd_query('standard.jobIndex = max & modelver <= 0.6', 'monitor')

print r2[0]['standard']['jobName']

design model job abc
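The second result above suggests that jobIndex = max picks the highest index among the datagroups that satisfy the other conditions, rather than requiring the overall latest job to satisfy them. The following local sketch mimics that reading on sample records. The jobIndex values 1 to 3 are invented for illustration, and latest_job is a hypothetical helper, not a toolbox function.

```python
# Sample records shaped like 'monitor' query results.
# The jobIndex values are made up for this illustration.
jobs = [
    {'standard': {'jobName': 'design model job xyz', 'jobIndex': 1},
     'modelver': 0.6},
    {'standard': {'jobName': 'design model job abc', 'jobIndex': 2},
     'modelver': 0.6},
    {'standard': {'jobName': 'design model job 999', 'jobIndex': 3},
     'modelver': 0.71},
]


def latest_job(records, predicate=None):
    """Return the record with the highest jobIndex among those that pass.

    Hypothetical helper: filter first, then take the maximum index.
    Returns None when no record passes the predicate.
    """
    candidates = [r for r in records if predicate is None or predicate(r)]
    if not candidates:
        return None
    return max(candidates, key=lambda r: r['standard']['jobIndex'])


print(latest_job(jobs)['standard']['jobName'])
print(latest_job(jobs, lambda r: r['modelver'] <= 0.6)['standard']['jobName'])
```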

Notes

When querying standard date information (archiveDate or createDate), specify the date/time using the International Standard Date and Time Notation (ISO 8601) which is: "YYYY-MM-DD hh:mm:ss" (hh:mm:ss is optional).
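A date condition in this notation can be produced from a datetime object with strftime. The sketch below is an assumption about convenient usage rather than part of the toolbox; date_condition and its with_time parameter are hypothetical names.

```python
from datetime import datetime


def date_condition(field, operator, when, with_time=True):
    """Format one date condition as 'field operator YYYY-MM-DD hh:mm:ss'.

    Hypothetical helper. Set with_time=False to omit the optional
    hh:mm:ss part.
    """
    fmt = '%Y-%m-%d %H:%M:%S' if with_time else '%Y-%m-%d'
    return '%s %s %s' % (field, operator, when.strftime(fmt))


q = date_condition('standard.archiveDate', '>=', datetime(2004, 9, 1),
                   with_time=False)
print(q)
# qresults = gd_query(q)   # requires the toolbox and a valid proxy certificate
```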

Only results for data you are authorised to access will be returned. Function gd_addusers can be used to grant access to others.

A valid proxy certificate is required to query the database (see gd_createproxy from the Geodise Compute Toolbox).