When using Snellius, it is possible to use the SURF Data Archive for preserving raw data and research output. The Data Archive is mounted on Snellius through an NFS mount and available at /archive. Users who have access to both Snellius and the Data Archive can use this connection to efficiently move data between the two systems.

On this page you will find information on how to use the Data Archive from Snellius. It covers basic movement of files to and from tape, as well as more advanced use cases such as packing and unpacking files and using the job queue to schedule data movement.

Access data on the Data Archive 

The Data Archive is mounted on Snellius through an NFS mount. To access your data, simply go to your directory on the Data Archive:

cd /archive/<login>

You can use standard Unix commands to work with your data. However, there are a couple of commands specific to working with archive data. To learn more about these commands, please refer to our documentation about the DMF commands.
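As an illustration, these are the most commonly used DMF commands (a sketch; see the DMF documentation for the full set of options):

dmls -l /archive/<login>               # ls-style listing that also shows the migration state of each file
dmget /archive/<login>/<your-file>     # stage (recall) a file from tape to disk
dmput /archive/<login>/<your-file>     # migrate a file from disk to tape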

The Data Archive is only mounted on the interactive nodes and the staging nodes. It is not possible to reach /archive/<login> from the compute nodes. To use archive data from Snellius, the data needs to be copied to a local file system first.

Move data from Snellius to the Data Archive

To copy a file from one of the Snellius file systems to the Data Archive, simply copy the file from any of the local file systems to the Data Archive:

cp <working-directory>/<your-file> /archive/<login>/

To copy a group of files from a folder to the Data Archive:

cp <working-directory>/output* /archive/<login>/<working-directory>
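If the target directory does not yet exist on the Data Archive, create it first (a minimal sketch; <working-directory> is a placeholder for your own directory name):

mkdir -p /archive/<login>/<working-directory>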

Move data from the Data Archive to Snellius

To copy a file from the Data Archive to Snellius, you first need to stage the file before copying it:

dmget /archive/<login>/<your-file>
cp /archive/<login>/<your-file> <working-directory>/

All data on the Data Archive needs to be staged before you can copy it. To stage your data, use the dmget command. You can use a wildcard, *, to efficiently stage all data in a folder. You can use dmls to check whether a file needs to be staged. For more information, see the DMF commands.
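For example, a sketch of checking the state of a file with dmls before copying (the exact output format may differ):

dmls -l /archive/<login>/<your-file>
# (OFL) means the file is offline on tape and must be staged with dmget first;
# (DUL) or (REG) means a disk copy exists and the file can be copied immediately.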

To copy a group of files from the Data Archive to Snellius, stage all files before copying the data:

dmget /archive/<login>/<working-directory>/output*
cp /archive/<login>/<working-directory>/output* <working-directory>/

Remove data from the Data Archive

To remove data from the Data Archive, you can use the regular remove command:

rm /archive/<login>/<your-file>

Transferring large amounts of data

On Snellius, it is recommended to use the staging partition to copy large data sets to or from the Data Archive. You can directly schedule transfers through srun. For example, to copy a file from the Data Archive to the scratch file system on Snellius:

srun -t 1:00:00 -p staging cp -r /archive/<login>/<your-file> /scratch-shared/<login>/

Make sure that the data is staged before running this job. You can also submit a Slurm job to stage and copy the data:

jobScript.sh
#!/bin/bash
#SBATCH --ntasks=1
#SBATCH --partition=staging
#SBATCH --time=01:00:00         # estimated runtime

dmget /archive/<login>/<your-file>
cp /archive/<login>/<your-file> /scratch-shared/<login>/
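Submit the script with sbatch; Slurm prints the job ID, which you can use to chain jobs as shown below:

sbatch jobScript.sh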

It is also possible to chain your data staging jobs and compute jobs using job dependencies. When submitting the compute job, add a dependency on the staging job:

sbatch --dependency afterok:<staging-job-id> jobScript.sh
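For example, a minimal sketch of chaining the two jobs, assuming a staging script stageScript.sh and a compute script computeScript.sh (both hypothetical names):

STAGE_ID=$(sbatch --parsable stageScript.sh)              # --parsable makes sbatch print only the job ID
sbatch --dependency=afterok:${STAGE_ID} computeScript.sh  # starts only if the staging job succeeds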

For more information, please check out the sbatch documentation.

Packing and unpacking data on Snellius

For efficient archiving of your research output, you want to combine files that belong together into data packages that are 1 GB or larger. For further information on how to do this, see our documentation on Effective archive file management. To help you pack and unpack your data on Snellius, you can use dmftar; for more information, see the dmftar documentation.
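As an illustration, dmftar uses tar-like flags (a sketch of common usage; check the dmftar documentation for the exact options):

dmftar -c -f <archive-name>.dmftar <working-directory>/   # pack a directory into a dmftar archive
dmftar -t -f <archive-name>.dmftar                        # list the contents of the archive
dmftar -x -f <archive-name>.dmftar                        # unpack the archive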
