Performs queries over metadata or Python dictionaries stored in the archive.
qresults = gd_query(query=None, datasource='file', resultfields='*')
gd_query() with no input arguments starts the query GUI, a Graphical User Interface for querying metadata and structures which also allows hyperlink browsing between related data. See the Geodise Database Toolbox Tutorial for more details.
qresults = gd_query(query) sends a query string to the database requesting all file metadata that meets the criteria specified in the string. A query takes the form 'field.subfield = value', where = can be replaced by other comparison operators. A field is equivalent to a key in a dictionary, and a subfield is equivalent to a key in a subdictionary. More than one query condition can be included in the string using & to join them together. The function returns a list of metadata dictionaries, one for each matching result.
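As an illustration of the query string format described above, the following is a minimal sketch of assembling several conditions into one string joined with &. The helper name build_query is hypothetical and is not part of the Geodise toolbox; it only produces the string that would then be passed to gd_query.

```python
def build_query(conditions):
    """Join (field, operator, value) triples into one query string.

    Hypothetical helper, not part of the toolbox: it only builds the
    'field.subfield = value' text, with conditions joined by ' & '.
    """
    parts = []
    for field, op, value in conditions:
        parts.append('%s %s %s' % (field, op, value))
    return ' & '.join(parts)


q = build_query([
    ('standard.archiveDate', '>=', '2004-09-01'),
    ('iterations', '=', 9000),
])
# q can now be passed on, e.g. qresults = gd_query(q)
```

Keeping the conditions as a list makes it easy to add or drop a criterion without editing the string by hand.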
qresults = gd_query(query,datasource) sends a query string to the database requesting matching archived variables or metadata of a certain type, depending on the value of the datasource string. To query metadata, set datasource to ‘file’ (the default), ‘varmeta’ (metadata about variables), ‘datagroup’ or ‘monitor’. A list of matching dictionaries is returned, one for each result. To query variables stored in the database, set datasource to ‘var’. In this case the function returns a list of matching variables. The only variables that can be queried in this way are dictionaries, because they contain named fields that can be searched for.
qresults = gd_query(query,datasource,resultfields) sends a query string to the database as above but only returns selected fields for each matching result. The resultfields string is a comma separated list indicating which fields should be returned for each result, for example 'standard.ID' to return just the standard.ID field (i.e. {'standard':{'ID':id_value}}). The default, '*', returns all fields. To view the query results, use the gd_display function.
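To make the shape of a result restricted by resultfields more concrete, here is a small local sketch that projects a nested metadata dictionary onto a comma separated field list. The function select_fields and the sample record are assumptions made for illustration; in the toolbox the selection is performed by gd_query itself, and its behaviour for missing fields is not described here.

```python
def select_fields(record, resultfields='*'):
    """Return a copy of record holding only the requested dotted fields.

    Hypothetical local emulation of the resultfields idea; the real
    selection is done by the database, not by this function.
    """
    if resultfields.strip() == '*':
        return record
    selected = {}
    for path in resultfields.split(','):
        keys = path.strip().split('.')
        value = record
        try:
            for key in keys:
                value = value[key]
        except (KeyError, TypeError):
            continue  # assumption: silently skip fields that are absent
        target = selected
        for key in keys[:-1]:
            target = target.setdefault(key, {})
        target[keys[-1]] = value
    return selected


# Sample record with made-up values, shaped like a file metadata result.
record = {
    'standard': {'ID': 'file_dat_example', 'format': 'dat'},
    'iterations': 9000,
    'params': [1, 4.7, 5.3],
}
subset = select_fields(record, 'standard.ID, params')
```

The nesting is preserved, so subset['standard']['ID'] is read the same way as in a full result.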
query A query takes the form 'field.subfield = value' where field is a key in the archived metadata/variable dictionary and subfield is a key in a subdictionary, for example iterations or standard.ID. The value is an alphanumeric value the field should contain, and can also be thought of as the value part of a dictionary entry. The operator & (meaning ‘and’) can be used to specify more than one search condition.
The following operators can be used to compare fields with values:
= | Equal to
!= | Not equal to
> | Greater than
< | Less than
>= | Greater than or equal to
<= | Less than or equal to
like | Similar to
not like | Not similar to
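The meaning of the six ordering and equality operators can be sketched locally by applying them to a field looked up by its dotted path. This is only an illustration under stated assumptions: condition_holds, get_field and the sample record are hypothetical, and treating a missing field as "no match" is a guess rather than documented behaviour.

```python
import operator

# Query operators from the table, mapped to Python comparison functions.
OPERATORS = {
    '=': operator.eq,
    '!=': operator.ne,
    '>': operator.gt,
    '<': operator.lt,
    '>=': operator.ge,
    '<=': operator.le,
}


def get_field(record, path):
    """Follow a dotted path such as 'standard.byteSize' into nested dicts."""
    value = record
    for key in path.split('.'):
        value = value[key]
    return value


def condition_holds(record, field, op, value):
    """Evaluate one 'field op value' condition against a dictionary."""
    try:
        actual = get_field(record, field)
    except (KeyError, TypeError):
        return False  # assumption: a missing field never matches
    return OPERATORS[op](actual, value)


record = {'standard': {'byteSize': 24}, 'iterations': 9000, 'width': 12}
# Equivalent in spirit to the query 'width >= 9 & width <= 14'.
in_range = (condition_holds(record, 'width', '>=', 9) and
            condition_holds(record, 'width', '<=', 14))
```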
Similarity matches with like and not like use the following wildcards:
_ | Matches any single character.
% | Matches any string of any length (including 0).
For example, 'standard.localName like %dat%' will match strings containing the phrase ‘dat’, and 'model.name like _est%' will match strings starting with any character followed by ‘est’ and then any string. To search for the characters _ and %, precede them with the \ escape character.
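The wildcard rules above can be checked locally by translating a like pattern into a regular expression. The sketch below is one possible translation, not the database's own implementation; the names like_to_regex and like are hypothetical.

```python
import re


def like_to_regex(pattern):
    """Translate a like pattern into a compiled, anchored regular expression.

    '_' becomes '.', '%' becomes '.*', and a backslash makes the next
    character literal. Hypothetical helper for illustration only.
    """
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == '\\' and i + 1 < len(pattern):
            out.append(re.escape(pattern[i + 1]))  # escaped _ or %
            i += 2
            continue
        if ch == '%':
            out.append('.*')
        elif ch == '_':
            out.append('.')
        else:
            out.append(re.escape(ch))
        i += 1
    return re.compile('^' + ''.join(out) + r'\Z', re.DOTALL)


def like(value, pattern):
    """Return True if value matches the like pattern (case sensitive)."""
    return like_to_regex(pattern).match(value) is not None


matches_dat = like('file.dat', '%dat%')      # contains 'dat'
matches_est = like('test_design', '_est%')   # one character, then 'est'
```

A case-insensitive variant, in the spirit of #like#, could pass re.IGNORECASE as an additional flag.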
The operators perform case-sensitive comparisons when used with string
values. To make an operator case insensitive, enclose it in # characters
(one on each side), for example #=#, #!=#, #like#, #not like#.
Another wildcard, *, provides flexibility in describing the field
path. For example, model.name can be replaced
by *.name for a less specific
search.
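The effect of a * in the field path can be pictured with a local search over a nested dictionary. This sketch assumes that * stands for exactly one key at that level of the path, which the text above does not confirm; find_paths and the sample record are hypothetical.

```python
def find_paths(record, pattern):
    """List (path, value) pairs whose dotted path matches pattern.

    Assumption: '*' matches any single key at its position in the path.
    """
    def walk(node, parts, prefix):
        if not parts:
            yield '.'.join(prefix), node
            return
        if not isinstance(node, dict):
            return
        head, rest = parts[0], parts[1:]
        for key in node:
            if head == '*' or head == key:
                for hit in walk(node[key], rest, prefix + [key]):
                    yield hit

    return list(walk(record, pattern.split('.'), []))


# Made-up metadata with two subdictionaries that both hold a 'name' key.
record = {
    'model': {'name': 'test_design'},
    'mesh': {'name': 'coarse'},
    'iterations': 9000,
}
specific = find_paths(record, 'model.name')
loose = sorted(find_paths(record, '*.name'))
```

Under that assumption, '*.name' reaches both model.name and mesh.name, while 'model.name' reaches only the first.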
In addition to user defined metadata fields, the following standard
metadata fields can be queried:
standard.ID | ID that uniquely identifies a file, variable or datagroup.
standard.datagroupname | Name of datagroup. Only used when querying datagroups.
standard.localName | Name of a local file before it was archived.
standard.byteSize | Size in bytes of a file.
standard.format | Format of file (default is its extension).
standard.createDate | Date the file was created/modified.
standard.archiveDate | Date the file or variable was archived, or the datagroup was created.
standard.userID | ID of the user who archived the data or created the datagroup.
standard.comment | Comment about the file, variable or datagroup.
standard.version | User defined version number for the file, variable or datagroup.
standard.tree | String representing a user defined data hierarchy, similar to a directory path.
standard.files.fileID | Each file in a datagroup.
standard.vars.varID | Each variable in a datagroup.
standard.subdatagroups.datagroupID | Each subdatagroup in a datagroup.
standard.datagroups.datagroupID | Each datagroup a file, variable or subdatagroup belongs to.
Datagroups are collections that can contain files, variables or
other datagroups; see gd_datagroup and gd_datagroupadd.
The fields in an archived dictionary variable can also be queried
in conjunction with the standard metadata fields for that
variable.
datasource The data source indicates which type of data to query, and can be specified by one of the following strings (the default datasource value is 'file'):
‘file’ | Metadata about files.
‘datagroup’ | Metadata about datagroups.
‘monitor’ | Metadata about monitorable datagroups.
‘varmeta’ | Metadata about Jython variables.
‘var’ | Jython variables.
A datagroup that was created with the ‘monitor’ flag
can be queried as an ordinary datagroup, or as a collection of data
about a computational job, by setting datasource to
‘monitor’. This provides a quick and easy query
mechanism for finding a user’s most recent job, or the latest
job meeting certain other metadata criteria. It is provided for
convenience so that the user does not have to remember any
particular field names, values, or what time the datagroup was
created. In addition to standard.ID, standard.userID and user defined
metadata, the following standard metadata can be used together with
‘monitor’ to query a job monitoring datagroup.
standard.jobIndex | Job index. Special query syntax jobIndex = max gets the highest index (most recent job).
standard.jobName | Name of job (same as datagroupname).
standard.startDate | Start date of job (when the datagroup was created).
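The idea behind jobIndex = max, optionally combined with other criteria, can be sketched locally as "filter first, then take the record with the highest index". The job records, their jobIndex values and the function latest_job below are illustrative assumptions, not output from the database.

```python
# Made-up job monitoring records; the jobIndex values are assumed.
jobs = [
    {'standard': {'jobIndex': 1, 'jobName': 'design model job xyz'},
     'modelver': 0.6},
    {'standard': {'jobIndex': 2, 'jobName': 'design model job abc'},
     'modelver': 0.6},
    {'standard': {'jobIndex': 3, 'jobName': 'design model job 999'},
     'modelver': 0.71},
]


def latest_job(records, extra_condition=None):
    """Return the record with the highest jobIndex among those that pass
    extra_condition, or None if nothing passes (hypothetical helper)."""
    candidates = [r for r in records
                  if extra_condition is None or extra_condition(r)]
    if not candidates:
        return None
    return max(candidates, key=lambda r: r['standard']['jobIndex'])


latest = latest_job(jobs)
latest_old_model = latest_job(jobs, lambda r: r['modelver'] <= 0.6)
```

Here the unrestricted lookup picks the job with index 3, and the restricted one picks the most recent job whose modelver is at most 0.6.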
Query file metadata to find files archived on or after 1st September 2004 where iterations = 9000. A datasource argument is not required because ‘file’ is the default.
from gddatabase import *
q = 'standard.archiveDate>=2004-09-01 & iterations=9000'
qresults = gd_query(q)
print len(qresults)
2
print qresults[0].keys()
['standard', 'model', 'iterations', 'params']
print qresults[0]
{'standard':
{'byteSize': 24,
'localName': 'file.dat',
'comment': 'Comment about file',
'archiveDate': '2004-09-03 15:25:45',
'ID': 'file_dat_66830074-e749-4de0-b976-61f4d32',
'format': 'dat',
'createDate': '2004-08-23 10:40:33',
'datagroups': '',
'userID': 'jlw'},
'model': {'name': 'test_design'},
'iterations': 9000,
'params': [1, 4.7, 5.3]}
The above output has been formatted for this document. See gd_display for an example of displaying the full contents of query results in an easy to read format.
print qresults[0]['standard']['archiveDate']
2004-09-03 15:25:45
Query to find files which have a name field equal to ‘test_design’ in their metadata and only return the fields standard.ID and params.
q = '*.name = test_design'
qresults = gd_query(q,'file','standard.ID, params')
print qresults[0]
{'standard':
{'ID': 'file_dat_66830074-e749-4de0-b976-61f4d32'},
'params':[1, 4.7, 5.3]}
Query to find datagroups with comments containing the text ‘experiment’.
q = 'standard.comment like %experiment%'
r = gd_query(q,'datagroup')
Query variable metadata to find the metadata for all variables that are in a particular datagroup.
q = 'standard.datagroups.datagroupID = dg_ce868f40-8ds0-455...'
r = gd_query(q,'varmeta')
Query variables to find dictionaries where the value of field width is between 9 and 14 inclusive.
r = gd_query('width >= 9 & width <= 14', 'var')
Find files that have a comment in their metadata, using "" (two
double quotes) to indicate an empty value.
r = gd_query('standard.comment != ""')
Find the latest job monitoring datagroup, then find the latest job monitoring datagroup that also matches additional criteria.
m = {'modelver': 0.6}; m2 = {'modelver': 0.71}
gd_datagroup('design model job xyz',m,'monitor')
gd_datagroup('design model job abc',m,'monitor')
gd_datagroup('design model job 999',m2,'monitor')
r1 = gd_query('standard.jobIndex = max','monitor')
print r1[0]['standard']['jobName']
design model job 999
r2 = gd_query('standard.jobIndex = max & modelver <= 0.6', 'monitor')
print r2[0]['standard']['jobName']
design model job abc
When querying standard date information (archiveDate or createDate), specify the date/time using the International Standard Date and Time Notation (ISO 8601) which is: "YYYY-MM-DD hh:mm:ss" (hh:mm:ss is optional).
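A date condition in this notation can be produced from a Python datetime with strftime. The sketch below is a minimal example; the helper date_condition is hypothetical, and only the format strings come from the notation given above.

```python
from datetime import datetime


def date_condition(field, op, when, with_time=True):
    """Format a date condition as 'field op YYYY-MM-DD hh:mm:ss'.

    Hypothetical helper; set with_time=False to omit the optional
    hh:mm:ss part.
    """
    fmt = '%Y-%m-%d %H:%M:%S' if with_time else '%Y-%m-%d'
    return '%s %s %s' % (field, op, when.strftime(fmt))


q_day = date_condition('standard.archiveDate', '>=',
                       datetime(2004, 9, 1), with_time=False)
q_time = date_condition('standard.createDate', '<',
                        datetime(2004, 8, 23, 10, 40, 33))
```

Formatting through strftime keeps the zero padding consistent (for example 09 rather than 9 for September).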
Only results for data you are authorised to access will be returned. Function gd_addusers can be used to grant access to others.
A valid proxy certificate is required to query the database (see gd_createproxy from the Geodise Compute Toolbox).
Your certificate subject must have been added to the authorisation database.
See also: gd_display, gd_createproxy, gd_archive, gd_retrieve, gd_datagroup, gd_datagroupadd, gd_addusers
Copyright © 2005, The Geodise Project, University of Southampton