General information

One of the most powerful features of iRODS available in iCE is metadata management. With a few simple iCommands you can attach as many metadata tags as you need to your data objects (files) and collections (directories). You can then use custom queries to locate and list files by one or more metadata attributes and values, even when the files are stored in different collections or on different resources.

How to add metadata

To learn how to add metadata to your data objects, we will walk through a simple use case with three data objects (files) located in different collections (directories). We want to add a metadata tag to each file that designates its type and a timestamp.

First, check the existing metadata tags on the three files by executing the following:

$ icd /snow/home/icepocsurf/

$ imeta ls -d extract-dir/testFile1.txt
AVUs defined for dataObj /snow/home/icepocsurf/extract-dir/testFile1.txt:
attribute: irods::access_time
value: 1681302055
units: irods::storage_tiering::migration_scheduled

$ imeta ls -d extract-dir/testFile2.txt
AVUs defined for dataObj /snow/home/icepocsurf/extract-dir/testFile2.txt:
attribute: irods::access_time
value: 1681301112
units: irods::storage_tiering::migration_scheduled

$ imeta ls -d transform-dir/testFile3.txt
AVUs defined for dataObj /snow/home/icepocsurf/transform-dir/testFile3.txt:
attribute: irods::access_time
value: 1681302284
units: irods::storage_tiering::migration_scheduled

Here, the object type option -d specifies that we are working with data objects (files). The other options are -C for collections, -R for resources and -u for users.

As you can see, all of the files already contain system metadata added automatically by iCE.

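Each metadata tag consists of three parts: the attribute, the value, and the units. Together they form an AVU (attribute, value, units), which is why the listings above are headed "AVUs defined for dataObj".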

To add a new metadata tag to each of the three files, execute the following iCommands:

$ imeta add -d extract-dir/testFile1.txt raw-data 01-05-2023 time
$ imeta add -d extract-dir/testFile2.txt raw-data 01-05-2023 time
$ imeta add -d transform-dir/testFile3.txt raw-data 01-05-2023 time

All three commands follow the same syntax: after "imeta add -d" comes the path of the data object (file), followed by the attribute name (raw-data), the value (01-05-2023), and the units (time). To tag a collection (directory) instead of a file, use -C in place of -d.
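Because the three commands differ only in the object path, they can be wrapped in a small loop. The following is a minimal sketch and not part of iCE: the function name tag_objects and the IMETA variable are assumptions introduced for illustration. IMETA defaults to the real imeta command; setting it to "echo imeta" gives a dry run that prints the commands instead of executing them.

```shell
#!/bin/sh
# Hypothetical helper: apply one attribute/value/units triple to several
# data objects. IMETA is an assumption of this sketch: it defaults to the
# real imeta command and can be set to "echo imeta" for a dry run.
IMETA="${IMETA:-imeta}"

tag_objects() {
    # First three arguments: attribute name, value, units.
    attr="$1"
    value="$2"
    units="$3"
    shift 3
    # Remaining arguments: paths of the data objects to tag.
    for obj in "$@"; do
        # $IMETA is deliberately unquoted so "echo imeta" splits into two words.
        $IMETA add -d "$obj" "$attr" "$value" "$units" || return 1
    done
}
```

Calling tag_objects raw-data 01-05-2023 time extract-dir/testFile1.txt extract-dir/testFile2.txt transform-dir/testFile3.txt would issue the same three imeta add commands as in the walkthrough. The loop stops at the first failing command so that a typo in a path does not go unnoticed.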

For more detailed information about the imeta command, see the official iRODS documentation.

Now that we have added our metadata tags, we can verify the update on one of the files by executing the following:

$ imeta ls -d extract-dir/testFile1.txt
AVUs defined for dataObj /snow/home/icepocsurf/extract-dir/testFile1.txt:
attribute: irods::access_time
value: 1681302055
units: irods::storage_tiering::migration_scheduled
----
attribute: raw-data
value: 01-05-2023
units: time
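
When verifying tags from a script, it can help to extract the value of a single attribute from the listing. The following is a minimal sketch, assuming the imeta ls output format shown above (lines starting with "attribute:" and "value:"); the function name avu_value is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read `imeta ls` output on stdin and print the
# value of every AVU whose attribute name matches the first argument.
avu_value() {
    awk -v want="$1" '
        # Remember the most recent attribute name seen.
        $1 == "attribute:" { current = $2 }
        # Print the value line only when it belongs to the wanted attribute.
        $1 == "value:" && current == want {
            sub(/^value: /, "")
            print
        }
    '
}
```

For the listing above, piping the output of imeta ls -d extract-dir/testFile1.txt into avu_value raw-data would print 01-05-2023. An empty result means the attribute is not set on that data object.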

How to query metadata

Data objects can be queried by their metadata with the iCommand iquest. The following simple example requests the collection name and data object name of every data object that has a 'raw-data' metadata attribute:

$ iquest "SELECT COLL_NAME, DATA_NAME WHERE META_DATA_ATTR_NAME = 'raw-data' "
COLL_NAME = /snow/home/icepocsurf/extract-dir
DATA_NAME = testFile1.txt
------------------------------------------------------------
COLL_NAME = /snow/home/icepocsurf/extract-dir
DATA_NAME = testFile2.txt
------------------------------------------------------------
COLL_NAME = /snow/home/icepocsurf/extract-dir
DATA_NAME = testFile4.txt
------------------------------------------------------------
COLL_NAME = /snow/home/icepocsurf/transform-dir
DATA_NAME = testFile3.txt 
------------------------------------------------------------
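
The default iquest output prints the collection and the file name on separate lines. To pass the results to other commands, it is often handier to have one full path per line. The following is a minimal sketch, assuming the "COLL_NAME = ..." and "DATA_NAME = ..." output format shown above; the function name paths_from_iquest is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read default iquest output on stdin and print one
# full logical path (collection + "/" + data object name) per result.
paths_from_iquest() {
    awk '
        # Keep the collection name of the current result.
        /^COLL_NAME = / { sub(/^COLL_NAME = /, ""); coll = $0 }
        # A DATA_NAME line completes the result: emit the joined path.
        /^DATA_NAME = / { sub(/^DATA_NAME = /, ""); print coll "/" $0 }
    '
}
```

Piping the query above into paths_from_iquest would list the matching files as full paths, starting with /snow/home/icepocsurf/extract-dir/testFile1.txt. The separator lines are simply ignored.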

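The result above also includes testFile4.txt, a file that was not tagged in the earlier steps but carries a 'raw-data' attribute as well. To refine the query further, we can also specify the value of the 'raw-data' attribute; in this example only testFile4.txt has the value 02-05-2023: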

$ iquest "SELECT COLL_NAME, DATA_NAME WHERE META_DATA_ATTR_NAME = 'raw-data' AND META_DATA_ATTR_VALUE = '02-05-2023'"
COLL_NAME = /snow/home/icepocsurf/extract-dir
DATA_NAME = testFile4.txt
------------------------------------------------------------
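
The two queries above differ only in the optional value condition, so the query string can be assembled by a small helper. The following is a minimal sketch: the function names build_query and find_tagged and the IQUEST variable are assumptions introduced for illustration, and the helper does not escape single quotes inside attribute names or values.

```shell
#!/bin/sh
# IQUEST is an assumption of this sketch: it defaults to the real iquest
# command and can be set to "echo iquest" to print the command instead.
IQUEST="${IQUEST:-iquest}"

# Hypothetical helper: print a query that selects data objects by
# attribute name ($1) and, if given, by attribute value ($2).
build_query() {
    query="SELECT COLL_NAME, DATA_NAME WHERE META_DATA_ATTR_NAME = '$1'"
    if [ -n "${2:-}" ]; then
        query="$query AND META_DATA_ATTR_VALUE = '$2'"
    fi
    printf '%s\n' "$query"
}

# Hypothetical helper: run the assembled query through iquest.
find_tagged() {
    $IQUEST "$(build_query "$@")"
}
```

With these definitions, find_tagged raw-data corresponds to the first query in this section, and find_tagged raw-data 02-05-2023 corresponds to the refined one.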