Yoda with iBridges hands-on

This tutorial allows you to exercise the typical Research Data Management flows that iBridges supports.

If you need help configuring iBridges for your local Yoda instance, please refer to these instructions.

iBridges is a graphical user interface for an iRODS instance (i.e. it is an iRODS client). Typically, the IT department of your institution offers an iRODS or Yoda instance, which you can access through a choice of interfaces, including iBridges and iCommands.

For this tutorial, we will assume that your institute is offering Yoda. Yoda governs an underlying iRODS server and offers its own web interface to steer its predefined RDM flow. With iBridges, you work on that same iRODS server, talking to it in iRODS terms (i.e. through the iRODS protocol). We will therefore use iRODS terminology.

As a brief reminder, a data object is the combination of a file (dataset) and its associated metadata. In terms of RDM best practices, that metadata belongs to the dataset; it is an intrinsic part of it. Data objects are grouped into collections, which may be thought of as folders and can also have associated metadata. Metadata consists of pieces of information that accompany the dataset or collection, both to describe it and to make it findable through searches on that information. The main idea behind this way of cooperating around data is that scientists can gather data and share it with others, each working at their own pace. Thus, years after a dataset has been put together, somebody else can find it and reuse it in new research projects.

All this interaction is possible with iBridges. In this tutorial, we will be showcasing and exercising these flows as though you are a scientist or a data steward.

1. Finding and reusing existing data

This exercise will teach you how to search for and download datasets using the iBridges interface.

In this scenario, we are going to pretend that you are a researcher who knows that a dataset exists somewhere in the iRODS server of your institute. You want to use this dataset, but you only have limited information about it: that it involves some flight performance statistics. That is precisely what you need for your research! Let us find it now.

1.1 Searching for a dataset

In iBridges, you can search for datasets that have been uploaded to your iRODS instance by you or by others within your groups. You can search by one or more of:

object name
collection name
metadata (key-value pairs)
checksum

As you follow these steps, please note that the search fields are case sensitive.

a) Search by object name

From the program bar at the top of your screen, select 'Options' and 'Search'
Enter the full name of the item you are searching for under 'Object name', or use the wildcard '%' to stand in for the start, the middle or the end of the string (or any combination of these) as required, e.g.:
- flights.png
- %flights.png
- %flights%
Hit "Enter" (or click "Search") to run the search
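
To build some intuition for how the '%' wildcard and the case sensitivity of the search fields behave, here is a minimal Python sketch that mimics this kind of matching on your own laptop. It is only an illustration, not how iBridges or iRODS implement the search: the helper name matches and the sample names are made up, and only the '%' wildcard is modelled.

```python
import re


def matches(pattern: str, name: str) -> bool:
    """Return True if name matches pattern, where '%' stands for any run of characters.

    Hypothetical helper for illustration only; it models just the '%' wildcard.
    """
    # Escape each literal piece, then glue the pieces together with '.*' in place of '%'.
    regex = ".*".join(re.escape(piece) for piece in pattern.split("%"))
    # The match is case sensitive, like the search fields described above.
    return re.fullmatch(regex, name, flags=re.DOTALL) is not None


print(matches("flights.png", "flights.png"))        # True: exact name
print(matches("%flights.png", "2015_flights.png"))  # True: wildcard at the start
print(matches("%flights%", "Flights_2015.csv"))     # False: capital 'F' does not match
```

The last line shows why a search can come back empty even though the name looks right: a difference in capitalisation is enough.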

The result is probably going to disappoint you: you will not find anything, because the desired item has a different name. Let us try a different search method.

b) Search by collection

Let's see if the data you are after is in a collection containing the word 'flights'.

In the Search box, enter '%flights%' (without the quotes) in the 'Collection name' field.
Hit "Enter" (or click "Search") to run the search

The result is probably going to disappoint you this time as well: you will not find anything. Let us try yet a different search method.

c) Search by metadata

In the Search box, enter 'Tag' (without the quotes) in the first 'Key' field.
In the corresponding 'Value' field, enter a word that you think is reasonable for the limited information you have about the dataset, such as: "flights"
Hit "Enter" (or click "Search") to run the search

Voilà! You should now have at least one result. But do you know whether it is the right one? Let's see how we can inspect the result.
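
For readers who also use iCommands, a metadata search of this kind can be expressed as a GenQuery string for the iquest command. The sketch below only builds the command and prints it; it does not contact any server. The function names are hypothetical, and whether iBridges matches the value exactly or with wildcards is an assumption you should check against your own results.

```python
import shlex


def metadata_query(key: str, value_pattern: str) -> str:
    """Build a GenQuery string that finds data objects by metadata key and value pattern."""
    return (
        "select COLL_NAME, DATA_NAME "
        f"where META_DATA_ATTR_NAME = '{key}' "
        f"and META_DATA_ATTR_VALUE like '{value_pattern}'"
    )


def iquest_command(query: str) -> list[str]:
    """Wrap the query in an argument list for the iquest iCommand (not executed here)."""
    return ["iquest", query]


# 'Tag' and 'flights' are the key and value used in the steps above.
command = iquest_command(metadata_query("Tag", "%flights%"))
print(shlex.join(command))
```

Pasting the printed line into a terminal where iCommands is configured should list the collection and name of each matching data object.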

1.2 Viewing your search results

From the list of results of your search, click on the one that you want to work with
At the bottom of the search box, choose 'Select and Close' (Other options are available to download the item(s) or to quit without action). This will bring the "Browser" screen in iBridges to the object or collection you selected.
Go to the "Browser" tab in iBridges. Select the item and choose the 'Metadata' tab to view the associated metadata.
Looking through the information shown about this file in the browser, can you now answer some of the questions to the right of this hand-out? For example: can you now explain why you were not able to find the dataset when searching by name or by folder, but you were when you searched by metadata?

We will assume from here on that you have found a dataset that contains a .csv file about flights. If you are unsure, please check with the facilitators now.

1.3 Bringing your search results to the research area

Looking at the result you found, you can see that its path contains the segment /.../vault-.../... This means that it is safely frozen for colleagues to cooperate around this dataset. Remember that, in Yoda, when data is in the Vault, you should always bring it to the Research area before you can work with it.

Unfortunately, iBridges does not support renaming or moving collections or objects within the same server yet. So, please, use the Yoda web portal now to bring the collection from the Vault to Research area, as you learned before. After that, come back to iBridges.

1.4 Downloading data

Now that you have found data in the Vault and brought it to the Research area, you can work normally with it. Following the scenario we are in, we will now assume that you want to process the data from the dataset on your laptop. This will require that you first download the data set. Let us do that with iBridges now.

In the "Data Transfer" screen of iBridges (click on this tab at the top of iBridges), in the "iRODS" column (i.e. the one on the right of the screen), navigate to the Research area that you are going to be working on, and then to the folder that contains your dataset.
In the "Local" column (i.e. the left column), navigate to the folder on your laptop where you want to download the data to.
In the space between the two columns, find the button containing the arrow that points from right to left, and click it. This will bring a pop-up dialog, summarising what will be downloaded.
On the pop-up dialog, click on "Download" to start the transfer to your laptop.
Wait for the download to finish, and then click the "Close" button.

1.5 Processing data

You have downloaded a data set to your laptop. In order to process it you will have to run a program on it.

The data set we have prepared includes a "read-me" file. Scan quickly through it to understand what data is there. Since this dataset consists of a .csv file, you can use your favourite spreadsheet program to derive a subset and then filter and calculate something.

Here is a sample exercise you can carry out now: create a new file containing only the rows of the original data for flights departing from the state of New York. Save the file on your laptop.
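
If you would rather script this step than use a spreadsheet, the following is a minimal Python sketch of the same filtering. The column name ORIGIN_STATE and the sample rows are assumptions made up for the example; check the read-me file for the real column names in your dataset.

```python
import csv
import io


def filter_rows(source, target, column: str, wanted: str) -> int:
    """Copy the header and every row whose column equals wanted; return the number of rows kept."""
    reader = csv.DictReader(source)
    writer = csv.DictWriter(target, fieldnames=reader.fieldnames)
    writer.writeheader()
    kept = 0
    for row in reader:
        if row[column] == wanted:
            writer.writerow(row)
            kept += 1
    return kept


# Made-up sample data standing in for the downloaded .csv file.
sample = io.StringIO("FLIGHT,ORIGIN_STATE,DELAY\nAA1,NY,5\nUA2,CA,0\nDL3,NY,12\n")
subset = io.StringIO()
print(filter_rows(sample, subset, "ORIGIN_STATE", "NY"))  # number of rows kept
print(subset.getvalue())
```

With real files, you would pass file objects opened with open(path, newline="") instead of the in-memory sample.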

⟡ ⟡ ⟡

You have now completed this section and should understand the flow to find and reuse data. Feel free to move on to the next exercise at your own pace, but make sure you have answered the questions on the right of this page.

Flow

Questions to answer throughout this section:

What is the folder name containing data?
What is the file name containing the actual data?
When was the file created?
Who prepared the data package? What is their affiliation?
Which location tags have been given to the data set?

Food for thought:

What is the name of the root folder of the dataset?
How come you were able to find this result when searching by metadata, and not by object name or collection name?
Having answered the questions from this section, how would you now make a simple search that would show the same result you want, but using the file's name? And how about using the collection's name?
How would you now replicate all these searches using iCommands?

2. Importing and managing data

In this scenario we are going to pretend that you are a researcher who has been collecting data, and wants to store it in iRODS through the iBridges interface. You will be assuming the role of a seasoned data practitioner who will complete the process without intervention from a data steward.

By the end of the exercise you will know how to import data, creating and editing the metadata as required.

You can work with any dataset you may have already on your laptop. For example, you can continue working with the subset of flights you created in the previous section. Alternatively, you can download something from the Internet to your laptop. For example, you can use the data portal from the Gemeente Amsterdam to search for data that may appeal to you: https://data.amsterdam.nl/datasets/ (Please verify the dataset license before you use it!)

Since Yoda manages its metadata in a certain way (i.e.: in a .json file and in iRODS), you would NOT want to disrupt what Yoda does. For a smooth flow during the following exercises, we would like you to work with a dataset that consists only of a single file.

2.1 Preparing a working place in your project area

The default page in iBridges is the 'Browser' tab showing the contents of your home directory, which is listed under 'iRODS path'.

We will now pretend that you are working in a project of your own. You will therefore need to create a collection for that project. Here are the steps you need to follow:

From your home directory, navigate to the training-area collection by double-clicking on the name. If you make a mistake and enter the wrong space, use the blue return arrow next to the iRODS path to return to your prior location (or the orange home button to return to your home directory).
Click on the Create Collection button.
Give this new collection a name with the format "Project Z", where Z should be something that you will be happy to work with, such as: "Project Peter" or "Project Flamingos". Please remember what you choose, because the other course attendees will be creating their own collections here too.

2.2 Adding metadata to your new collection

The dataset's metadata is crucial when you are working within RDM best practices. It will ensure that your dataset is reusable in the future. It is therefore best to start with the metadata, even before the data exists in iRODS. Let us tackle that right now by adding metadata at the collection level.

When using Yoda, you will want to keep Yoda's functionality and flows working. Because of the way Yoda handles metadata, you should modify metadata on folders managed by Yoda (i.e. anything within a Research area or a Vault area) only through Yoda's web portal.

Therefore, please, go to the web portal, and add metadata to the folder as you learned before. Then, come back to iBridges.

2.3 Uploading data

Now that you have some metadata defined for your project collection, you are going to import the dataset you just found into the project folder that you created under the collection training-area a few steps ago. Remember? You called it Project <something>. For this exercise and for simplicity's sake, it will be enough to upload one or two files no larger than a few megabytes as though they are a full dataset; adding more would be overkill today.

Return to the view of your project folder in the training-area collection
Click on 'File Upload'
Locate your file, click 'Open' and confirm that you would like to upload the item to your collection

2.4 Adding metadata to data objects

In iRODS (and therefore, also, in iBridges), you can add or modify metadata at the level of the collection and for individual data objects.

Each metadata tag is composed of:

key (mandatory)
value (mandatory)
unit (optional)

(In iRODS terminology, these are referred to as AVU triplets. This stands for Attribute-Value-Unit.)
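
To make the structure of an AVU triplet concrete, here is a minimal Python sketch. The class AVU and its method are hypothetical names for illustration and not part of iBridges; the sketch only enforces the mandatory fields and builds (without running) the arguments for the imeta iCommand that would attach the triplet to a data object. The key "StartState" and the object path are example values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AVU:
    """One metadata item: key and value are mandatory, unit is optional."""

    key: str
    value: str
    unit: str = ""

    def __post_init__(self) -> None:
        if not self.key:
            raise ValueError("key is mandatory")
        if not self.value:
            raise ValueError("value is mandatory")

    def imeta_add_args(self, object_path: str) -> list[str]:
        """Arguments for 'imeta add -d' on a data object (built only, not executed)."""
        args = ["imeta", "add", "-d", object_path, self.key, self.value]
        if self.unit:
            args.append(self.unit)
        return args


# Hypothetical path; replace it with the path of your own data object.
avu = AVU(key="StartState", value="NY")
print(avu.imeta_add_args("/zone/home/training-area/Project Z/flights_ny.csv"))
```

Leaving the unit empty simply omits it from the command, which mirrors the optional 'Unit' field.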

You have now imported one or more files that you can use for your research into your project folder. Think about what sort of metadata should be stored at the level of the individual data files.

In iBridges, on the Browser screen, select one of the files that you have uploaded and click on the 'Metadata' tab
Add the metadata items you find necessary by entering the 'Key', 'Value' and (optionally) the 'Unit' in the respective fields and using the 'Add' button. As an example, if you were using the subset of the flights dataset that you created in the previous section, you could add a triplet like: Key="StartState", Value="NY", Unit="".
If you are working through these exercises with colleagues, now would be a good moment to ask them to verify your metadata at both the collection and the file levels and engage in a little discussion to see if you agree on what you have written

2.5 Changing permissions

Since Yoda manages its permissions in a certain way (i.e.: in terms of groups, projects, etc.), to facilitate its flows, you would NOT want to disrupt what Yoda does. For a smooth flow for all colleagues, do not change any permission for the research or vault areas. You can change permissions in folders or files that you create.

In iRODS, data access permissions are managed through groups. Are you curious about which groups you belong to? At the top of the iBridges window, switch to the 'Info' tab to view your username, groups and server information for the iRODS instance you are connected to.

Now your project folder is complete with a dataset and high quality metadata. Let us now ensure that other intended users will have the right level of access to these files.

In the training-area collection, click to select your project folder (i.e. something like "Project Z", "Project Peter" or "Project Flamingos", as you created it before)
Navigate to the 'Permissions' tab
In the table you can see which users and groups have access rights to the data objects in that collection
Choose the user name of one of your colleagues, and change their 'Access' level to 'read'. Make sure you click 'Add/Update' to effect the change!
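
For iCommands users, the permission change above corresponds to the ichmod command. The Python sketch below only assembles the arguments and does not run anything; the function name, user name and path are hypothetical, and the four access levels listed are the classic ichmod levels (newer iRODS versions may accept additional ones). As with the GUI, apply this only to folders or files that you created yourself, never to the Research or Vault areas.

```python
def ichmod_args(level: str, user: str, path: str, recursive: bool = False) -> list[str]:
    """Build the argument list for ichmod (built only, not executed)."""
    allowed = {"null", "read", "write", "own"}
    if level not in allowed:
        raise ValueError(f"access level must be one of {sorted(allowed)}")
    args = ["ichmod"]
    if recursive:
        # -r applies the change to everything inside a collection.
        args.append("-r")
    return args + [level, user, path]


# Hypothetical user and path, standing in for your colleague and your project folder.
print(ichmod_args("read", "colleague", "/zone/home/training-area/Project Z", recursive=True))
```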

⟡ ⟡ ⟡

Well done! You have now completed an RDM workflow, taking good care over the findability and accessibility of your dataset.

Flow

Food for thought:

For the iCommands users, consider how these workflows would be tackled in the command line. E.g., uploading and downloading files can be done with the 'iput' and 'iget' commands.
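
As a starting point, the sketch below assembles the arguments for iput and iget in Python without executing them. The function names and all paths are hypothetical; only the command names and the -r flag for recursive transfers of folders and collections come from iCommands itself.

```python
def iput_args(local_path: str, collection: str, recursive: bool = False) -> list[str]:
    """Arguments to upload a local file (or folder, with recursive=True) into a collection."""
    args = ["iput"]
    if recursive:
        args.append("-r")
    return args + [local_path, collection]


def iget_args(irods_path: str, local_dir: str, recursive: bool = False) -> list[str]:
    """Arguments to download a data object (or collection, with recursive=True) to a local folder."""
    args = ["iget"]
    if recursive:
        args.append("-r")
    return args + [irods_path, local_dir]


# Hypothetical paths for illustration.
print(iput_args("flights_ny.csv", "/zone/home/training-area/Project Z"))
print(iget_args("/zone/home/research-example/flights", "downloads", recursive=True))
```

Each list can be passed to subprocess.run on a machine where iCommands is installed and configured.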

❦ Epilogue

Well done for having completed this walk-through!

If you need help configuring iBridges for your local Yoda instance, please refer to these instructions.

For next steps with regards to RDM, iRODS, or other research services, you can:

visit our documentation pages:
- Yoda: Yoda Hosting
- iRODS: iRODS
visit our research services web page: https://www.surf.nl/en/research-it
or contact our service desk through the Service Desk Portal

Thank you for your attention, and we hope to have been of help for you today.

Complementary information:

iBridges is open source, and you can view the code and advanced documentation on its GitHub page: https://github.com/chStaiger/iBridges-Gui
