Synopsis

This page gives an overview of the file systems of the Snellius supercomputer and details the various storage services available to end-users.

Overview of the filesystems

There are several filesystems available on Snellius:

| File system | Quota (space) | Quota (files) | Speed | Shared between nodes | Mount point | Expiration | Backup | Notes |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Home | 200 GiB | 1,000,000 | Moderate | Yes | /home/<username> | 15 weeks after project expiration | Nightly incremental | |
| Scratch-local | 8 TiB (counted over all scratch-* space used) | 3,000,000 (soft limit) | Fast | No | /scratch-local/<username> | Files older than 6 days are removed automatically | No backup | |
| Scratch-shared | 8 TiB (counted over all scratch-* space used) | 3,000,000 (soft limit) | Fast | Yes | /scratch-shared/<username> | Files older than 14 days are removed automatically | No backup | |
| Scratch-node | none | none | Very fast | No | /scratch-node/<user-specific> | The file system is deleted when the job ends; you cannot retrieve any data from it afterwards | No backup | Available on all fcn nodes and on a subset of tcn and gcn nodes; these nodes can be requested with #SBATCH --constraint=scratch-node. The environment variable TMPDIR will then point to a user-specific temporary directory in /scratch-node. See below for more details. |
| Project | Based on request | Dependent on size of project space (see below) | Fast | Yes | /project/<project_name> | Project duration | No backup | |
| Archive Service | Based on request | Based on request | Very slow | Yes | /archive/<username> | Project duration | Nightly | |

The home file system

Every user has their own home directory, which is accessible at /home/<login_name>.

Your home directory has a default capacity quota of 200 GiB. The default i-node quota is 1,000,000 files.

For most users, the 200 GiB home directory is ample space for a work environment on the system. If it is not sufficient to accommodate your work environment on Snellius, or if you need any of the features described below, you can request extra storage space (project space) through your institution using this approval form (users with an EINF or NWO grant can contact the servicedesk directly). Logins are per person and per project, and each login has its own home directory. Think of your home directory as the basis for arranging the work environment for your current computational project on Snellius. Note, however, that home directories are not intended for long-term storage of large data sets; SURF provides the archive facility for that. Home directories are also not suitable for fast, large-scale, or parallel I/O. Use scratch and/or project space (see below) for fast and voluminous job I/O.

Home directory overnight backup service

SURF provides a versioned incremental backup service for your home directory that runs overnight. Files that have been backed up are retained in the backup repository for three weeks after they have been deleted from the file system. We can restore files and/or directories when you accidentally remove them, overwrite them with the wrong contents, or when they are corrupted because of a storage hardware failure, provided, of course, that a version already existed and was successfully backed up. Note that no consistent backup can be created of files that are being removed, changed, truncated, or appended to while the backup process is running. The backup process therefore simply skips files that are open and in use by other processes.

To have a file successfully backed up:

  1. the file must reside on the file system when the backup runs;
  2. the file must be closed;
  3. the total length of the file's canonical pathname (i.e. the pathname starting from the file system mount point: '/gpfs/home<#>/<username>/...') must not exceed 4095 bytes. The file system itself will create files and directories with longer pathnames without any problem, but the backup client software and its target storage cannot handle them, so such files will never be backed up (a way to check for this is shown below).
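A minimal check for overly long pathnames (a sketch, assuming GNU find and awk; LC_ALL=C makes awk count bytes rather than characters):

$ find "$(readlink -f "$HOME")" | LC_ALL=C awk 'length($0) > 4095'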

Since the backup is an asynchronous system service process, end-users will not get any direct warnings when any of the above criteria is not met. As to the first criterion, a backup routine can hardly give sensible warnings about files that no longer exist. As to the second, the backup logs messages about files that are skipped because they are not at rest, but with 24/7 production in the batch system, and with people in different time zones working interactively while the backup happens to run, forwarding these messages to the respective file owners would generate a high volume of messages of very limited usefulness. Fortunately, the last criterion is rarely violated, because almost 4 KiB is very long even for a pathname that contains many multi-byte characters like 'ä', 'ς', or 'ﻕ'. However, if we repeatedly and persistently notice canonical pathnames that are too long for the backup to handle, we will notify the user responsible for creating them.

Be very restrictive in allowing other logins to access your home directory

Technically, your login "owns" your home directory. This implies that you can change its access permissions. On a semi-public system like Snellius, where many people from completely unrelated affiliations have logins to access the system interactively, you should be very restrictive in using this capability. We understand that there are use cases in which you might want to share some data, and perhaps some executable programs, with specific other logins. Maybe you want to arrange this for logins that you yourself have personal control of, but that belong to a different project/account that also happens to be currently active on Snellius. Use access control lists (ACLs) for this purpose. More specifically:

  • You should never - not even by means of an ACL - give other logins write permission to your home directory (or to any subdirectory thereof).
  • You should never give any permission to the unqualified 'other' class on the root of your home directory. Use ACLs to enable specific groups and/or users to read, and/or to search and execute, as illustrated below.
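As an illustration (the login name and directory are hypothetical, not an official recipe), ACLs of this kind could be set as follows:

# Allow a specific login to traverse (but not list or modify) your home directory
$ setfacl -m u:colleague_login:x "$HOME"

# Allow that login to read and search one dedicated subdirectory
$ setfacl -m u:colleague_login:rx "$HOME"/shared_data

# Inspect the resulting access control list
$ getfacl "$HOME"/shared_data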

Since you are the directory owner, Snellius system administrators have no technical means to proactively enforce these rules; they cannot prevent you from enabling permissions that are highly deficient from a data integrity point of view. Remember that, according to the usage agreement you signed for your login, you are accountable for proper usage of the resources that are handed to you. This is one of those aspects that is your own responsibility. System administrators can, however, revert or disable unwanted permissions, and correct the undesirable consequences they may have had, after the fact, when they detect them in file system meta-data analyses (analyses of quota, resource usage, etc.). It is policy to take such corrective action, changing or undoing deficient permissions when they are detected, without prior notification of the registered home directory owner.

If write permissions have been enabled on your home directory, an additional consequence, one that is not simply repaired by disabling those permissions, could be that your home directory, or a subdirectory thereof, now contains files and/or directories that are not owned by your login. In the worst case, you cannot even inspect or remove them. Corrective action (changing ownership) will then ensue without prior notice to the registered home directory owner.

The scratch file systems

The scratch file systems are intended as fast temporary storage that can be used while running a job, and can be accessed by all users with a valid account on the system. 

There are several different types of scratch available on Snellius, as listed in the table above. Below, we describe them in detail, including any active quota and backup policies.

The /scratch-shared and /scratch-local spaces

Expiration policy

Scratch automatic cleanup and lack of backup

For the scratch-local and scratch-shared spaces there is an automated expiration policy of 6 and 14 days, respectively. Files and directories that are older, i.e. whose contents have not changed for this duration, are removed automatically.

There is, however, no guarantee that files are actually retained for at least 6/14 days. A serious hardware failure, for example, could cause loss of files that have not yet reached that age.

SURF provides no backup service on scratch space. Job end results, or any other precious job output that you want to keep, must be copied in time to your home directory, to the SURF archive facility, or to an off-site storage facility of your choice.

Quota

A user's default scratch space capacity quota is 8 TiB, counted over the user's combined usage of scratch-local and scratch-shared.

The i-node quota (number of files and directories per user) is set at a soft limit of 3 million files per user, and a hard limit that is set substantially higher. Most users will never hit the soft limit, as there is a programmed cleanup of files that are older than 6 days (on scratch-local) or 14 days (on scratch-shared). Users that produce enormous amounts of files per job may have to clean up files and directories themselves after the job, as they could reach their quota before the automatic cleanup is invoked.

If the soft limit is reached, a grace period of 7 days starts counting down. If you clean up within the grace period, and do not grow to reach the hard limit, you will not notice the limit at all. If you reach the hard limit, or if you fail to bring your usage below the soft limit in time, the file system refuses to create new files and directories for you.
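If you do need to clean up manually, a command of the following form can help (a sketch; the path and age are illustrative, adjust them to your situation and check the listing before deleting):

# List your files on scratch-shared older than 7 days; append -delete only when the list looks right
$ find /scratch-shared/"$USER" -type f -mtime +7
$ find /scratch-shared/"$USER" -type f -mtime +7 -delete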

Access

Shared scratch space can be accessed on all nodes from two locations:

  • /scratch-local/  
  • /scratch-shared/

/scratch-local/ specifies a unique location on each node (and so acts like it is local), whereas /scratch-shared/ denotes the same location on every node:

# Different content for /scratch-local/paulm, depending on the node

snellius paulm@int1 14:26 ~$ ls -l /scratch-local/paulm
total 0
-rw-rw---- 1 paulm paulm 0 Mar  3 14:26 hello.txt 

snellius paulm@int3 14:26 ~$ ls -l /scratch-local/paulm/
total 0
 
# Same content for /scratch-shared/paulm

snellius paulm@int1 14:26 ~$ ls -l /scratch-shared/paulm/
total 4
drwxr-sr-x 2 paulm paulm 4096 Feb 17 22:16 Blender

snellius paulm@int3 14:26 ~$ ls -l /scratch-shared/paulm
total 4
drwxr-sr-x 2 paulm paulm 4096 Feb 17 22:16 Blender

So you can use /scratch-local to give each node in a job a guaranteed unique location for storing and retrieving data that does not interfere with processes running on other nodes of the same job. In fact, the $TMPDIR environment variable is set to a default value of /scratch-local/<loginname>, and the corresponding directory is created automatically when you log in or when a batch job is started.

The /scratch-shared/ directory behaves like scratch space that is shared by all nodes. Please create your own subdirectory under /scratch-shared, e.g. with the command:

$ mktemp -d -p /scratch-shared
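In a job script you would typically capture the name of the created directory, for example (the variable name is just illustrative):

# Create a unique shared scratch directory and remember its name
SHARED_DIR=$(mktemp -d -p /scratch-shared)
echo "Using shared scratch directory: $SHARED_DIR"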

The /scratch-local file system is not truly local

Note that the /scratch-local/ directories are not truly (physically) local to a node. All /scratch-local/ directories are in fact visible from all nodes (and by all users), if you know the canonicalized fully qualified directory names. This can be seen with:

$ readlink -f $TMPDIR 
/gpfs/scratch1/nodespecific/int1/<loginname>

In fact, all scratch-local and scratch-shared symbolic links are actually pointing to directories that store data on the same underlying GPFS file system, and they share the same single per-user quota regime, as mentioned above.

The /scratch-node space: truly node-local scratch

On a subset of nodes fast NVMe-based scratch file systems are available (all fcn nodes, some tcn and gcn nodes). Such node-local scratch spaces are faster than the shared scratch spaces, but as the name suggests, each node will have its own scratch file system that does not share data with other nodes. For certain use cases this restriction is no problem, though.

To use nodes with node-local scratch, the SLURM constraint scratch-node needs to be used (e.g. #SBATCH --constraint=scratch-node). A user-specific partition will be created on each assigned node and mounted under /scratch-node. The environment variable $TMPDIR will point to the user-specific directory within /scratch-node that you should use in your job. Note that when requesting part of a shared node, you will also get only part of the local NVMe disk (either 25%, 50% or 75%, depending on the requested job resources). The $TMPDIR variable is only set to the location of node-local scratch if the constraint has been set. Otherwise, $TMPDIR is set to the node-specific shared directory in /scratch-local as described above, regardless of whether the node is equipped with a local disk.

No quota are active on /scratch-node. No backup policy is active either.

/scratch-node/<userdir> deleted at end of job!

When a user's job is finished, their file system mounted under /scratch-node will be deleted. This means that data in /scratch-node can no longer be accessed after a job has finished. You should therefore copy your data from $TMPDIR to a more permanent location at the end of your job script.
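A minimal job script sketch (the partition, input file, and program names are illustrative) that works in $TMPDIR and saves the results before the job ends could look like this:

#!/bin/bash
#SBATCH --partition=thin
#SBATCH --constraint=scratch-node
#SBATCH --time=01:00:00
#SBATCH --ntasks=1

# Work in the node-local scratch directory provided via $TMPDIR
cd "$TMPDIR"

# Stage the input and run the actual work (names are illustrative)
cp "$HOME"/input.dat .
my_program input.dat > output.dat

# Copy the results back to permanent storage before the job ends;
# the file system under /scratch-node is deleted when the job finishes
cp output.dat "$HOME"/results/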

Example of requesting and testing node-local scratch space
snellius paulm@int2 18:57 ~$ srun -p thin -t 0:10:00 --constraint=scratch-node --exclusive --pty /bin/bash 
srun: job 2036702 queued and waiting for resources
srun: job 2036702 has been allocated resources

# Our directory on the node-local scratch disk to use in the job
snellius paulm@tcn516 18:58 ~$ echo $TMPDIR
/scratch-node/paulm.2036702

snellius paulm@tcn516 18:58 ~$ ls -l /scratch-node/
total 0
drwxr-x--- 2 paulm paulm 6 Jan  9 18:58 paulm.2036702

snellius paulm@tcn516 18:58 ~$ ls -l /scratch-node/paulm.2036702/
total 0
 
snellius paulm@tcn516 18:58 ~$ df -kh /scratch-node/paulm.2036702/
Filesystem                               Size  Used Avail Use% Mounted on
/dev/mapper/vg_scratch-lv_paulm.2036702  5.9T   42G  5.8T   1% /scratch-node/paulm.2036702

# The user-specific directory is a separate file system, mounted under /scratch-node
snellius paulm@tcn516 18:58 ~$ mount | grep paulm
/dev/mapper/vg_scratch-lv_paulm.2036702 on /scratch-node/paulm.2036702 type xfs (rw,relatime,attr2,inode64,logbufs=8,logbsize=32k,noquota)

# Simple write-speed test
snellius paulm@tcn516 19:01 ~$ dd if=/dev/zero of=/scratch-node/paulm.2036702/test bs=1M count=10240
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB, 10 GiB) copied, 4.20136 s, 2.6 GB/s 

Node-local system directories such as /tmp, /var/tmp

Use of /tmp, /var/tmp, ...

Truly local directories, such as /tmp and /var/tmp, should be regarded as "off limits" for users. They are too small and too slow to be used for job outputs. Furthermore, they are needed by the operating system itself, and they can be emptied without further notice at node reboot, at node re-install, and in fact on several other occasions.

If you (accidentally) fill up /tmp  or /var/tmp  on a node, the operating system will experience problems. Ultimately your job (and on an interactive node you and other users as well) will experience problems, and our system administrators and/or your fellow users won't like you.

  • Use the scratch file systems instead.
  • In your job command files you can use $TMPDIR. This is a directory in /scratch-local that is unique per job step (and therefore also unique per node); see the example below.
  • On the login nodes you can also use $TMPDIR.
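For example, many tools can be pointed at $TMPDIR explicitly; GNU sort is used here purely as an illustration:

# Let sort write its temporary files to the job's scratch directory instead of /tmp
$ sort --temporary-directory="$TMPDIR" -o sorted_output.txt huge_input.txt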


The project file system

The purpose of project space is to enable fast and bulky reading and/or writing of files by large and/or many jobs. A project space is not meant for long-term storage of data, and no automatic backup of data on project spaces is provided. In some sense, project spaces can be seen as "user-managed scratch": project users themselves must take care not to run into their quota limits, and must arrange backup and retrieval of their data before the project space expires.

Practically speaking a project file system can be used when:

  1. you need additional storage space, but do not require a backup.
  2. you need to share files within a collaboration.

By default, accounts on our systems are not provisioned with a project space. It can be requested when you apply for an account, or by contacting our service desk (depending on the type of account, different conditions may apply; contact us to find out whether your account is eligible for a project space).

No backup for project spaces

Note that SURF provides no backup service on project spaces. If you have not arranged for a backup, and an associated restore possibility, your data will be irrevocably lost if serious damage is caused to your files or to the file system at large (e.g. by failing hardware or human error). SURF provides the archive facility for long-term data storage, but you may of course also use off-site storage of your choice. Either way, it is your own responsibility to archive your data and to keep track of what you archived, when, and where.

End date and expiration

The project space itself has an agreed-upon end date, but there is no expiration policy for the age of individual files and directories in your project space. Project users themselves must take care not to run into their quota limits, deleting and/or compacting and archiving data that is no longer needed.

When the agreed upon period of time of your Snellius project space expires, the project space will be made inaccessible. If no further notice from the project space users is received, the files and directories in your project space will eventually be deleted after a grace period of an additional four weeks.

All members of the group used for quota administration will receive a notification on their e-mail address registered in the SURF user administration, 30 days in advance of the expiration date. A second notification mail will be sent out the day after expiration.

In principle the lifetime of a project directory is not extended beyond the lifetime of the associated compute project, as project spaces for projects that are no longer active waste high-performance storage resources. In some cases, however, a follow-up project could make efficient use of the same data without first having to stage them from an archive into a new project space. This may be a valid reason for retaining a Snellius project space "in between projects". In that case it is mandatory to demonstrate, before the grace period has ended, that the project proposal for the follow-up project, the destined "heir" of the project space, has actually been submitted. New limits and expiration dates will have to be established and motivated by the needs of the follow-up project.

Quota on size and number of files

The exact capacity is project-dependent. The maximum number of files is derived from the capacity quota: each project is allocated a basic quota of 1 million files, plus a surplus that is a non-linear function of the capacity quota:

Y = 1,000,000 + 100,000 * sqrt(X) * log(X)

where X is the capacity of the project space in TiB, Y is the maximum number of files allowed on that project space, and log denotes the natural logarithm.
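As a quick check, the quota for, say, a 50 TiB project space can be evaluated on the command line with bc, where l(x) is the natural logarithm:

$ echo '1000000 + 100000 * sqrt(50) * l(50)' | bc -l
# prints approximately 3766218, matching the 50 TiB row in the table below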

For convenience, the table below contains some reference values for the resulting number-of-files quota and the corresponding average file size. Note that for large project spaces the average file size must be larger than for smaller project spaces.

| Capacity (TiB) | Number of files | Avg. file size (MiB) |
| --- | --- | --- |
| 1 | 1,000,000 | 1.05 |
| 5 | 1,359,881 | 3.86 |
| 10 | 1,728,141 | 6.07 |
| 50 | 3,766,218 | 13.92 |
| 100 | 5,605,170 | 18.71 |
| 200 | 8,492,952 | 24.50 |
| 300 | 10,879,241 | 28.91 |

Project space quota are per group

Quota on project file systems are per group, rather than per user. Users of the project space must be members of the group used for quota administration of the project, and they must write files and directories with this group ownership. In most cases this works correctly by default, but some commands that try to set group ownership (e.g. "rsync -a" or "cp -p") will fail without extra options. See the tutorial on using project space for sharing files for more information.
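One possible way to deal with this, sketched below with illustrative names (the tutorial referenced above is the authoritative source), is to tell rsync not to preserve the source group, or to repair the group ownership afterwards:

# Copy data into the project space without preserving the source group,
# so that new files inherit the project's quota group
$ rsync -a --no-g "$HOME"/dataset/ /project/my_project/dataset/

# Alternatively, repair the group ownership after the copy (group name is illustrative)
$ chgrp -R my_project_group /project/my_project/dataset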

For users involved in more than one data project, it is technically possible to store data in any project directory under any quota group they are a member of. This is unwanted behaviour: files and directories with a group ownership used for the quota administration of a particular data project must all be placed under that project's root directory. Conversely, only files and subdirectories belonging to the project should be placed under that directory. SURF will enforce these rules, if needed, with periodic corrective actions that change group ownership without further notice.

The archive file system

The Data Archive is not a traditional file system, nor is it specific to Snellius. It is an independent facility for long-term storage of data, which uses a tape storage backend, and it is accessible from other systems as well. For more information, see this separate page about the archive.

The archive service is intended for long-term storage of large amounts of data. Most of this data is (eventually) stored on tape, and therefore accessing it may take a while. For users of the data archive, it is accessible from the login and staging nodes at the path /archive/<username>.

The archive system is designed to handle only large files efficiently. If you want to archive many smaller files, please pack them into a single tar file first, before copying it to the archive. Never store a large number of small files on the archive: they may be scattered across different tapes, and retrieving them all at a later stage puts a large load on the archive. See this section of the documentation for more information on using the archive appropriately.
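For example, a set of result files could be packed and copied to the archive roughly as follows (paths and file names are illustrative):

# Pack many small files into a single compressed tar file
$ tar -czf results_2024.tar.gz -C "$HOME" results/

# Copy the single tar file to the archive
$ cp results_2024.tar.gz /archive/"$USER"/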

Other file systems at SURF - grid storage / dCache

SURF grid storage / dCache is an independent storage element at SURF, meant for data processing and data-processing services, and accessible from multiple computing services. Depending on the use case, SURF can provide advice on which service is most suitable for a given application.

SURF grid storage / dCache is a large, scalable remote storage system for processing huge volumes of data. It uses the dCache system for storing and retrieving huge amounts of data, distributed among a large number of server nodes. It consists of magnetic tape storage and disk storage, both addressed through a common file system. There are several protocols and storage clients to interact with grid storage / dCache, including the Advanced dCache API (ADA).

You may use the grid storage if your data does not fit within the storage allocation on project space or if your application is I/O intensive.

For questions and requests about grid storage / dCache, please contact the SURF servicedesk and consult the SURF advisors.


Disk quota

You can check home directory quota, scratch quota, and project space quota using the myquota end-user tooling available on the system. These commands are installed in the directory "/gpfs/admin/hpc/usertools". 
For more information on how to enable these commands and how to use them, please read our Tutorial Checking disk usage.
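One common way to make tools installed in a non-default directory available in your shell is to add that directory to your PATH; this is only a sketch, and the tutorial linked above is the authoritative reference for enabling and using the quota commands:

# Make the end-user quota tools available in the current shell (see the tutorial for details and usage)
$ export PATH=/gpfs/admin/hpc/usertools:$PATH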