Performs queries over metadata or Matlab structures stored in the archive.
gd_query
qresults = gd_query(query)
qresults = gd_query(query,datasource)
qresults = gd_query(query,datasource,resultfields)
qresults = gd_query(query,datasource,resultfields,
orderby)
qresults = gd_query(query,datasource,resultfields,
orderby,resultlimit)
qresults = gd_query(query,datasource,resultfields,
resultlimit)
gd_query with no input arguments starts the query GUI, a Graphical User Interface for querying metadata and structures which also allows hyperlink browsing between related data. See the Geodise Database Toolbox Tutorial for more details.
qresults = gd_query(query) sends a query string to the database requesting all file metadata that meets the criteria specified in the string. A query takes the form 'field = value', where = can be replaced by other comparison operators. More than one query condition can be included in the string using & to join them together. The function returns a cell array of metadata structures, one for each matching result. To view the query results, use function gd_display.
qresults = gd_query(query,datasource) sends a query string to the database requesting matching archived structures or metadata of a certain type, depending on the value of the datasource string. To query metadata setdatasource to ‘file’ (default), ‘varmeta’ (metadata about variables), ‘datagroup’ or ‘monitor’. A cell array of matching structures is returned, one for each result. To query variables stored in the database set datasource to ‘var’. In this case the function will return a cell array of matching variables. The only variables that can be queried in this way are structures, because they contain named fields that can be searched for.
qresults = gd_query(query,datasource,resultfields) sends a query string to the database as above but only returns selected fields for each matching result. The resultfields string can be one of the following:
1. A comma separated list indicating which fields should be returned for each result, for example just the standard.ID fields. The default, *, returns all fields. Prefix with the keyword 'distinct' to remove duplicates from the results, e.g. 'distinct field1, field2'.
2. An aggregate function (count, max, min, sum, avg) applied to a field, followed by an optional 'groupby' clause. For example, gd_query(query,'file','count(field1)') executes the query and returns the number of matching results which contain field1. The number of matching results for each different value of field2 can be returned with 'count(*) groupby field2'. Further details are given in the Input Arguments section.
qresults = gd_query(query,datasource,resultfields, orderby) sends a query to the database and sorts the results by one or more fields, specified as a comma separated list prefixed with the keyword 'orderby'. Use num() or str() on each field in the list to indicate whether to sort it numerically or alphabetically, e.g. 'orderby num(field1), str(field2)'.The default sort order is ascending; append the keyword 'desc' after a field to sort it in descending order, e.g. ‘orderby num(field1) desc’.
qresults = gd_query(query,datasource,resultfields, orderby, resultlimit) limits the number of results returned from a query. An integer value means return the top resultlimit values from the sorted results. An ordinal string of the form '1st' or '2nd' etc means return a specific result based on its position.
qresults=gd_query(query,datasource,resultfields, resultlimit) limits the number of results returned using resultlimit as above but without the overhead of sorting, which makes the operation quicker. This can be used to get a sample set of results when the order does not matter.
query A query takes the form 'field = value' where field is the name of a field in the archived metadata/variable structure, for example iterations or standard.ID (dot notation is used to access the subfields of a structure). The value is an alphanumeric value the field should contain. The operator & (meaning ‘and’) can be used to specify more than one search condition.
The following operators can be used to compare fields with
values:
= |
Equal to |
!= |
Not equal to |
> |
Greater than |
< |
Less than |
>= |
Greater than or equal to |
<= |
Less than or equal to |
like |
Similar to |
not like |
Not similar to |
Similarity matches with like and not like
use the following wildcards:
_ |
Matches any single character. |
% |
Matches any string of any length (including 0). |
For example, 'standard.localName like %dat%'
will match strings containing the phrase ‘dat’, and
'model.name
like _est%'
will match strings starting with any character followed by
‘est’ and then any string. To search for the characters
_ and %, precede them with the \ escape character.
The operators do case sensitive comparison when used with string
values. To make an operator case insensitive surround it with two #
characters. For example, #=#, #!=#, #like#, #not like#.
Another wildcard, *, provides flexibility in describing the field
path. For example, model.name can be replaced
by *.name for a less specific
search.
In addition to user defined metadata fields, the following standard
metadata fields can be queried:
standard.ID |
ID that uniquely identifies a file, variable or datagroup. |
standard.datagroupname |
Name of datagroup. Only used when querying datagroups. |
standard.localName |
Name of a local file before it was archived. |
standard.byteSize |
Size in bytes of a file. |
standard.format |
Format of file (default is its extension). |
standard.createDate |
Date the file was created/modified. |
standard.archiveDate |
Date the file or variable was archived, or the datagroup was created. |
standard.userID |
ID of the user who archived the data or created the datagroup. |
standard.comment |
Comment about the file, variable or datagroup. |
standard.version |
User defined version number for the file, variable or datagroup. |
standard.tree |
String representing a user defined data hierarchy, similar to a directory path. |
standard.files.fileID |
Each file in a datagroup. |
standard.vars.varID |
Each variable in a datagroup. |
standard. subdatagroups. datagroupID |
Each subdatagroup in a datagroup. |
standard.datagroups. datagroupID |
Each datagroup a file, variable or subdatagroup belongs to. |
Datagroups are collections that can contain files, variables or
other datagroups, see gd_datagroup
and gd_datagroupadd.
The fields in an archived structure variable can also be queried in
conjunction with the standard metadata fields for that variable.
However, this can be an expensive operation because two data
sources, ‘var’and ‘varmeta’ (see below),
are queried.
datasource The data source indicates which type of data to query, and can be specified by one of the following strings (the default datasource value is 'file'):
‘file’ |
Metadata about files. |
‘datagroup’ |
Metadata about datagroups. |
‘monitor’ |
Metadata about monitorable datagroups. |
‘varmeta’ |
Metadata about Matlab variables. |
‘var’ |
Matlab variables. |
A datagroup that was created with the ‘monitor’ flag
can be queried as an ordinary datagroup, or as a collection of data
about a computational job, by setting datasource to
‘monitor’. This provides a quick and easy query
mechanism for finding a user’s most recent job, or the latest
job meeting certain other metadata criteria. It is provided for
convenience so that the user does not have to remember any
particular field names, values, or what time the datagroup was
created. In addition to standard.ID, standard.userID and user defined
metadata, the following standard metadata can be used together with
‘monitor’ to query a job monitoring datagroup.
standard.jobIndex |
Job index. Special query syntax jobIndex = max gets the highest index (most recent job). |
standard.jobName |
Name of job (same as datagroupname). |
standard.startDate |
Start date of job (when the datagroup was created). |
resultfields
The resultfields string indicates a
subset or summary of the query result fields that should be
returned. The default, *, returns all fields. A comma
separated list indicates particular fields that should be returned.
This list can be prefixed with the 'distinct' keyword to remove
duplicates from the results.
Alternatively the resultfields string contains an aggregate
function (count, max,
min, sum,
avg) applied to one field, e.g.
'count(b)'. These aggregate
functions return summary data from all the results matching a query
as follows:
count |
Total number of results. 'count(*)' counts all results 'count(b)' counts only results containing the field b. 'count(distinct b)' counts only results containing unique values of b. |
max |
Maximum value for a field. 'max(b)' or 'max(num(b))' treats all values as numbers. 'max(str(b))' treats all values as strings. |
min |
Minimum value for a field. 'min(b)' or 'min(num(b))' treats all values as numbers. 'min(str(b))' treats all values as strings. |
sum |
All values of a particular field added together. |
avg |
Average of all the values of a particular field. |
By default an aggregate function is applied to a particular field
over all the results. For example, 'avg(b)'
returns a single result containing a number, the average of all
b field values across the result
set. It is also possible to apply an aggregate function to groups
of values within the result set using the groupby keyword. For
example, 'avg(b) groupby standard.format'
returns the average value of b for each different file
format.
Query file metadata to find files archived on or after 1st September 2004 where iterations = 9000. A datasource argument is not required because ‘file’ is the default.
q = 'standard.archiveDate>=2004-09-01 & iterations=9000';
qresults = gd_query(q)
qresults =
[1x1 struct] [1x1 struct]
disp(qresults{1});
standard: [1x1 struct]
model: [1x1 struct]
params: [1 4.7000 5.3000]
iterations: 9000
disp(qresults{1}.standard.archiveDate);
2004-09-03 15:25:45
See gd_display for an example of displaying the full contents of query results.
Query to find files which have a name field equal to ‘test_design’ in their metadata and only return the fields standard.ID and params, removing any duplicates.
q = '*.name = test_design';
qresults = gd_query(q,'file','distinct standard.ID, params');
disp(qresults{1})
standard: [1x1 struct]
params: [1 4.7000 5.3000]
Query to find datagroups with comments containing the text ‘experiment’.
q = 'standard.comment like %experiment%';
gd_query(q,'datagroup');
Query variable metadata to find the metadata for all variables that are in a particular datagroup.
q = 'standard.datagroups.datagroupID = dg_ce868f40-8ds0-45...';
gd_query(q,'varmeta');
Query variables to find structures where field width is between 9 and 14 inclusive.
gd_query('width >= 9 & width <= 14','var');
Find files that have a comment in their metadata, using "" (two
double quotes) to indicate an empty value.
gd_query('standard.comment != ""');
Find the 10 most recent files archived by user ‘bob’. The default sort order is ascending so keyword ‘desc’ is used to list the dates in descending order.
q = 'standard.userID = bob';
gd_query(q,'file','*','orderby standard.archiveDate desc',10);
Find all the files in a particular datagroup and order the results first by iterations then by model.name. For custom metadata you must specify whether to sort numerically (num) or alphabetically (str).
q = 'standard.datagroups.datagroupID = dg_ce868f40-8ds0-45...';
gd_query(q,'file','*',...
'orderby num(iterations), str(model.name)');
Find the total number of variables archived by user ‘bob’.
qresults = gd_query('standard.userID = bob','var','count(*)');
gd_display(qresults{1})
*** Content of the structure ***
standard.count: 150
Find the maximum value for iterations in file metadata archived by user ‘bob’.
q = 'standard.userID = bob';
qresults = gd_query(q,'file','max(iterations)');
gd_display(qresults{1})
*** Content of the structure ***
standard.max: 12000
Find the maximum value for model.name in file metadata archived by user ‘bob’. Find the maximum alphabetically (str) rather than numerically (default).
gd_query(q,'file','max(str(model.name))');
Count the number of files of each different format that user bob has archived.
r = gd_query(q,'file','count(*) groupby standard.format');
gd_display(r)
*** Content of structure r{1} (Total structures: 2) ***
standard.format: dat
standard.count: 73
Press ENTER to continue ..., q to quit:
*** Content of structure r{2} (Total structures: 2) ***
standard.format: txt
standard.count: 22
*** No more results. ***
Find the latest job monitoring datagroup then find the latest job monitoring datagroup which matches some other criteria.
m.modelver = 0.6; m2.modelver = 0.71;
gd_datagroup('design model job xyz',m,'monitor');
gd_datagroup('design model job abc',m,'monitor');
gd_datagroup('design model job 999',m2,'monitor');
r1 = gd_query('standard.jobIndex = max','monitor');
r1{1}.standard.jobName
ans =
design model job 999
r2 = gd_query('standard.jobIndex = max & modelver <= 0.6', 'monitor');
r2{1}.standard.jobName
ans =
design model job abc
When querying standard date information (archiveDate or createDate), specify the date/time using the International Standard Date and Time Notation (ISO 8601) which is: "YYYY-MM-DD hh:mm:ss" (hh:mm:ss is optional).
The sum and avg aggregate functions will only work on fields containing numerical data. Non-numerical custom metadata is ignored in the calculation and non-numerical standard metadata (e.g. standard.comment) throws an error.
Treating numerical data as strings when using orderby, max or min can lead to unexpected results. This can be illustrated in Matlab where sort({‘1’,’5’,’10’}) returns '1' '10' '5'. Always use the numerical syntax, e.g. orderby(num(a)) or max(a), when working with numbers in custom metadata.
In the current release orderby, groupby, distinct and the aggregate functions cannot be used on standard metadata about datagroup relationships (standard.datagroups, standard.subdatagroups, standard.files and standard.vars).
Only results for data you are authorised to access will be returned. Function gd_addusers can be used to grant access to others.
A valid proxy certificate is required to query the database (see gd_createproxy from the Geodise Compute Toolbox).
Your certificate subject must have been added to the authorisation database.
gd_display, gd_createproxy, gd_archive, gd_retrieve, gd_archivefiles, gd_retrievefiles, gd_datagroup, gd_datagroupadd, gd_addusers
Copyright © 2007, The Geodise Project, University of Southampton