
Synopsis

This page gives an overview of the Snellius supercomputer and details the various types of file systems, nodes, and system services available to end-users.

System overview

Snellius is the Dutch national supercomputer. It is a general-purpose capability system, designed to be well balanced. Snellius is the machine of choice if you need one or more of:

  • many cores,
  • large symmetric multi-processing nodes,
  • high memory,
  • a fast interconnect,
  • a lot of work space on disk,
  • or a fast I/O subsystem.

Node types 

Snellius is built in three consecutive expansion phases. All phases are planned to remain in operation until the end of life of the machine. Because Snellius grows in phases, it becomes increasingly heterogeneous once phase 2 and phase 3 are operational. To keep a clear reference to node flavours (int, tcn, gcn, ...), we introduce a node type acronym that combines the node flavour with the phase in which the node was installed (PH1, PH2, PH3). For example, a thin CPU-only node installed in phase 1 has the node type acronym PH1.tcn.

The set of Snellius nodes available to end-users comprises three interactive nodes and a large number of batch nodes, or "worker nodes". We distinguish the following node flavours:

  • CPU-only interactive nodes (int),
  • CPU-only "thin" compute nodes (tcn), some of which have truly node-local NVMe based scratch space,
  • GPU-enhanced compute nodes with NVIDIA GPUs (gcn), some of which have truly node-local NVMe based scratch space,
  • CPU-only "fat" compute nodes (fcn) which have more memory than the default worker nodes as well as truly node-local NVMe based scratch space,
  • CPU-only high-memory compute nodes (hcn) with even more memory than fat nodes,
  • CPU-only "service" nodes (srv), not intended for computing, that primarily facilitate user-submitted jobs that automate data transfers into or out of the Snellius system.

Phase 1 (Q3 2021)

The table below lists the Snellius node types available in Phase 1.

3x int (PH1.int) - Lenovo ThinkSystem SR665

  • CPU SKU: AMD EPYC 7F32 (2x), 8 cores/socket, 3.7 GHz, 180 W
  • CPU cores per node: 16
  • Accelerator(s): not applicable
  • DIMMs: 16 x 16 GiB, 3200 MHz, DDR4
  • Total memory per node (per core): 256 GiB (16 GiB per core)
  • Local storage: not user accessible
  • Network connectivity:
      • 1x HDR100, 100GbE ConnectX-6 VPI dual port
      • 2x 25GbE SFP28 Mellanox OCP

504x tcn (PH1.tcn) - Lenovo ThinkSystem SR645

  • CPU SKU: AMD Rome 7H12 (2x), 64 cores/socket, 2.6 GHz, 280 W
  • CPU cores per node: 128
  • Accelerator(s): not applicable
  • DIMMs: 16 x 16 GiB, 3200 MHz, DDR4
  • Total memory per node (per core): 256 GiB (2 GiB per core)
  • Network connectivity:
      • 1x HDR100 ConnectX-6 single port
      • 2x 25GbE SFP28 OCP

72x fcn (PH1.fcn) - Lenovo ThinkSystem SR645

  • CPU SKU: AMD Rome 7H12 (2x), 64 cores/socket, 2.6 GHz, 280 W
  • CPU cores per node: 128
  • Accelerator(s): not applicable
  • DIMMs: 16 x 64 GiB, 3200 MHz, DDR4
  • Total memory per node (per core): 1 TiB (8 GiB per core)
  • Local scratch: 6.4 TB NVMe SSD Intel P5600 (see /scratch-node space below)
  • Network connectivity:
      • 1x HDR100 ConnectX-6 single port
      • 2x 25GbE SFP28 OCP

2x hcn (PH1.hcn4T) - Lenovo ThinkSystem SR665

  • CPU SKU: AMD Rome 7H12 (2x), 64 cores/socket, 2.6 GHz, 280 W
  • CPU cores per node: 128
  • Accelerator(s): not applicable
  • DIMMs: 32 x 128 GiB, 2666 MHz, DDR4
  • Total memory per node (per core): 4 TiB (32 GiB per core)
  • Network connectivity:
      • 1x HDR100 ConnectX-6 single port
      • 2x 25GbE SFP28 OCP

2x hcn (PH1.hcn8T) - Lenovo ThinkSystem SR665

  • CPU SKU: AMD Rome 7H12 (2x), 64 cores/socket, 2.6 GHz, 280 W
  • CPU cores per node: 128
  • Accelerator(s): not applicable
  • DIMMs: 32 x 256 GiB, 2666 MHz, DDR4
  • Total memory per node (per core): 8 TiB (64 GiB per core)
  • Network connectivity:
      • 1x HDR100 ConnectX-6 single port
      • 2x 25GbE SFP28 OCP

36x gcn (PH1.gcn) - Lenovo ThinkSystem SD650-N v2

  • CPU SKU: Intel Xeon Platinum 8360Y (2x), 36 cores/socket, 2.4 GHz (Speed Select SKU), 250 W
  • CPU cores per node: 72
  • Accelerator(s): NVIDIA A100 (4x), 40 GiB HBM2 memory with 5 active memory stacks per GPU
  • DIMMs: 16 x 32 GiB, 3200 MHz, DDR4
  • Total memory per node (per core): 512 GiB DDR4 + 160 GiB HBM2 (7.111 GiB DDR4 per core)
  • Network connectivity:
      • 2x HDR100 ConnectX-6 single port
      • 2x 25GbE SFP28 LOM
      • 1x 1GbE RJ45 LOM

7x srv (PH1.srv) - Lenovo ThinkSystem SR665

  • CPU SKU: AMD EPYC 7F32 (2x), 8 cores/socket, 3.7 GHz, 180 W
  • CPU cores per node: 16
  • Accelerator(s): not applicable
  • DIMMs: 16 x 16 GiB, 3200 MHz, DDR4
  • Total memory per node (per core): 256 GiB (16 GiB per core)
  • Local SSD scratch
  • Network connectivity:
      • 1x HDR100, 100GbE ConnectX-6 VPI dual port
      • 2x 25GbE SFP28 Mellanox OCP

Phase 1A + 1B + 1C (Q4 2022)

21x tcn - Lenovo ThinkSystem SR645

  • CPU SKU: AMD Rome 7H12 (2x), 64 cores/socket, 2.6 GHz, 280 W
  • CPU cores per node: 128
  • Accelerator(s): not applicable
  • DIMMs: 16 x 16 GiB, 3200 MHz, DDR4
  • Total memory per node (per core): 256 GiB (2 GiB per core)
  • Local NVMe scratch: 6.4 TB NVMe SSD Intel P5600 (see /scratch-node space below)
  • Network connectivity:
      • 1x HDR100 ConnectX-6 single port
      • 2x 25GbE SFP28 OCP

36x gcn - Lenovo ThinkSystem SD650-N v2

  • CPU SKU: Intel Xeon Platinum 8360Y (2x), 36 cores/socket, 2.4 GHz (Speed Select SKU), 250 W
  • CPU cores per node: 72
  • Accelerator(s): NVIDIA A100 (4x), 40 GiB HBM2 memory with 5 active memory stacks per GPU
  • DIMMs: 16 x 32 GiB, 3200 MHz, DDR4
  • Total memory per node (per core): 512 GiB DDR4 + 160 GiB HBM2 (7.111 GiB DDR4 per core)
  • Local NVMe scratch: ThinkSystem PM983 2.5" 7mm 7.68 TB, read-intensive entry NVMe PCIe 3.0 x4, trayless SSD (see /scratch-node space below)
  • Network connectivity:
      • 2x HDR100 ConnectX-6 single port
      • 2x 25GbE SFP28 LOM
      • 1x 1GbE RJ45 LOM

Phase 2 (Q3 2023)

714x tcn - Lenovo ThinkSystem SD665v3

  • CPU SKU: AMD Genoa 9654 (2x), 96 cores/socket, 2.4 GHz, 360 W
  • CPU cores per node: 192
  • Accelerator(s): not applicable
  • DIMMs: 24 x 16 GiB, 4800 MHz, DDR5
  • Total memory per node (per core): 384 GiB (2 GiB per core)
  • Network connectivity:
      • 1x NDR ConnectX-7 single port (200 Gbps within a rack, 100 Gbps outside the rack)
      • 2x 25GbE SFP28 OCP

Phase 2A (LISA replacement, Q3 2023)

72x tcn - Lenovo ThinkSystem SD665v3

  • CPU SKU: AMD Genoa 9654 (2x), 96 cores/socket, 2.4 GHz, 360 W
  • CPU cores per node: 192
  • Accelerator(s): not applicable
  • DIMMs: 24 x 16 GiB, 4800 MHz, DDR5
  • Total memory per node (per core): 384 GiB (2 GiB per core)
  • Local NVMe scratch: 6.4 TB NVMe SSD (see /scratch-node space below)
  • Network connectivity:
      • 1x NDR ConnectX-7 single port (200 Gbps within a rack, 100 Gbps outside the rack)
      • 2x 25GbE SFP28 OCP

Phase 3 (estimated Q1 2024)

There are three options for this extension:

  1. CPU thin nodes (same future generation AMD EPYC processors, aggregate: 2.4 PFLOP/s), or
  2. GPU nodes (future generation NVIDIA GPUs, aggregate: 10.3 PFLOP/s), or
  3. Storage (the amount still needs to be determined)

The choice will be made 1.5 years after the start of production of Phase 1 and will be based on actual usage and demand of the system.

When Phase 3 is complete Snellius will have a total performance (CPU+GPU) in the range 13.6 - 21.5 PFLOP/s. 

File systems

There are several filesystems available on Snellius:

Home

  • Quota (space): 200 GiB
  • Quota (files): 1,000,000
  • Speed: moderate / normal
  • Shared between nodes: yes
  • Mount point: /home/<username>
  • Expiration: 15 weeks after project expiration
  • Backup: nightly incremental

Scratch-local

  • Quota (space): 8 TiB (counted over all scratch-* space used)
  • Quota (files): 3,000,000 (soft limit)
  • Speed: fast
  • Shared between nodes: no
  • Mount point: /scratch-local/<username>
  • Expiration: files older than 6 days are removed automatically
  • Backup: no backup

Scratch-shared

  • Quota (space): 8 TiB (counted over all scratch-* space used)
  • Quota (files): 3,000,000 (soft limit)
  • Speed: fast
  • Shared between nodes: yes
  • Mount point: /scratch-shared/<username>
  • Expiration: files older than 14 days are removed automatically
  • Backup: no backup

Scratch-node

  • Quota (space): none
  • Quota (files): none
  • Speed: very fast
  • Shared between nodes: no
  • Mount point: /scratch-node/<user-specific>
  • Expiration: when the job ends the file system is deleted; you cannot retrieve any data from this file system after your job ends
  • Backup: no backup
  • Notes: all fcn nodes, a subset of tcn nodes, and a subset of gcn nodes have this file system. These nodes can be requested with #SBATCH --constraint=scratch-node. The environment variable TMPDIR will then point to a user-specific temporary directory in /scratch-node. See below for more details.

Project

  • Quota (space): based on request
  • Quota (files): dependent on the size of the project space (see below)
  • Speed: fast
  • Shared between nodes: yes
  • Mount point: /project/<project_name>
  • Expiration: project duration
  • Backup: no backup

Archive Service

  • Quota (space): based on request
  • Quota (files): based on request
  • Speed: very slow
  • Shared between nodes: yes
  • Mount point: /archive/<username>
  • Expiration: project duration
  • Backup: nightly

The home file system

Every user has their own home directory, which is accessible at /home/<login_name>.

Your home directory has a default capacity quota of 200 GiB and a default i-node quota of 1,000,000 files.

For most users, the 200 GiB home directory offers ample space for a work environment on the system, but you can contact our helpdesk if you think it is not sufficient to accommodate your work environment on Snellius. Logins are per person and per project, and each login has its own home directory. Think of your home directory as the basis for arranging the work environment for your current computational project on Snellius. Note, however, that home directories are not intended for long-term storage of large data sets; SURF provides the archive facility for that. Home directories are also not suitable for fast, large-scale, or parallel I/O. Use scratch and/or project space (see below) for fast and voluminous job I/O.

Home directory overnight backup service

SURF provides a versioned incremental backup service for your home directory, which runs overnight. Files that have been backed up are retained in the backup repository for three weeks after they have been deleted from the file system. We can restore files and/or directories when you accidentally remove or overwrite them with the wrong contents, or when they are corrupted because of a storage hardware failure (provided, of course, that a version already existed and was successfully backed up). Note that no consistent backup can be created of files that are removed, changed, truncated, or appended to while the backup process is running. The backup process therefore simply skips files that are open and in use by other processes.

To have a file successfully backed up:

  • the file must reside on the file system when the backup runs
  • the file must be closed
  • The total length of the file's canonical pathname (i.e. the pathname starting from the file system mount point: '/gpfs/home<#>/<username>/...') must not exceed 4095 bytes. The file system itself has no problem with longer pathnames, and such files and directories will be created without any issue, but the backup client software and its target storage cannot handle them, so they will never be backed up. A quick way to check for over-long pathnames is shown below.
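
As a minimal sketch (assuming GNU find and awk, as available on the login nodes), you could list the pathnames under your home directory that exceed this limit like so:

# List files and directories under your home directory whose canonical
# pathname is longer than 4095 bytes and would therefore be skipped by the backup.
find "$(readlink -f "$HOME")" -print 2>/dev/null | LC_ALL=C awk 'length($0) > 4095'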

Since the backup is an asynchronous system service process, end-users do not get any direct warnings when any of the above criteria is not met. As for the first criterion, a backup routine cannot give sensible warnings about files that no longer exist. As for the second, the backup logs messages about files that are skipped because they are not at rest; but with 24/7 batch production, and people in different time zones working interactively while the backup happens to run, forwarding these messages to the respective file owners would generate a high volume of messaging with very limited usefulness. Fortunately, the last criterion is rarely violated, because almost 4 KiB is actually very long, even for a pathname that contains many multi-byte characters like 'ä', 'ς', or 'ﻕ'. However, if we repeatedly and persistently notice canonical pathnames under the home directory of a particular login that are too long for the backup to handle, we will notify the user responsible for creating such pathnames.

Be very restrictive in allowing other logins to access your home directory

Technically, your login "owns" your home directory. This implies that you can change its access permissions. On a semi-public system like Snellius, where many people of completely unrelated affiliations have logins to access the system interactively, you should be very restrictive in using this capability. We understand that there are use cases in which you might want to share some data, and perhaps some executable programs, with specific other logins. Maybe you want to arrange this for logins that you yourself have personal control of, but that belong to a different project / account that also happens to be currently active on Snellius. Use access control lists (ACLs) for this purpose. More specifically:

  • You should never - not even by means of an ACL - give other logins write permission to your home directory (or to any subdirectory thereof).
  • You should never give any permission to the unqualified 'other' class on the root of your home directory. Use ACLs to enable specific groups and/or users to read, and/or search and execute; a minimal sketch follows below.
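
As a minimal sketch (the login name "colleague" and the directory name "shared-data" are hypothetical), granting one specific login read-only access to a single subdirectory could look like this:

# Allow the specific login "colleague" to traverse your home directory
# (search only; no listing or reading of other content in your home directory).
setfacl -m u:colleague:x "$HOME"

# Give that login read and search access to one shared subdirectory and its contents.
setfacl -R -m u:colleague:rX "$HOME/shared-data"

# Inspect the resulting ACL.
getfacl "$HOME/shared-data"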

Since you are the directory owner, Snellius system administrators have no technical means to proactively enforce these rules: they cannot prevent you from enabling permissions that are highly deficient from a data integrity point of view. Remember that, according to the usage agreement you signed for your login, you are accountable for proper usage of the resources that are handed to you. This is one of those aspects that are your own responsibility. System administrators can, however, revert or disable unwanted permissions, and correct the undesirable consequences they may have had, after the fact, when they detect them in file system metadata analyses (analyses of quota, resource usage, etc.). It is policy to take corrective action and undo deficient permissions when they are detected, without prior notification of the registered home directory owner. If write permissions have been enabled on your home directory, an additional consequence, which is not simply repaired by disabling permissions, could be that your home directory, or a subdirectory thereof, now contains files and/or directories that are not owned by your login. In the worst case you cannot even inspect or remove them. Corrective action, changing the ownership, will then ensue without prior notice to the registered home directory owner.


The scratch file systems

The scratch file systems are intended as fast temporary storage that can be used while running a job, and can be accessed by all users with a valid account on the system. 

There are several different types of scratch available on Snellius, as listed in the table above. Below, we describe them in detail, including any active quota and backup policies.

The /scratch-shared and /scratch-local spaces

Expiration policy

Scratch automatic cleanup and lack of backup

For the scratch-local and scratch-shared spaces there is an automated expiration policy of 6 and 14 days, respectively. Files and directories that are older, i.e. whose contents have not changed for that duration, are removed automatically.

There is, however, no guarantee that files are actually retained for at least 6 or 14 days. A serious hardware failure, for example, could cause loss of files that have not reached that age.

SURF provides no backup service on scratch space. Job end results, or any other precious job output that you want to keep, must be copied in time to your home directory, to the SURF archive facility, or to an off-site storage facility of your choice.

A user's default scratch capacity quota is 8 TiB, counted over all data the user stores in scratch-local and scratch-shared combined.

Quota

The i-node quota (number of files and directories per user) is set at a soft limit of 3 million files per user, with a hard limit that is set substantially higher. Most users will never hit the soft limit, as there is a programmed cleanup of files that are older than 6 days (on scratch-local) or 14 days (on scratch-shared). Users that produce enormous numbers of files per job may have to clean up files and directories themselves after the job, as they could reach their quota before the automatic cleanup is invoked.

If the soft limit is reached, a grace period of 7 days starts counting down. If you clean up within the grace period, and do not grow to reach the hard limit, you will not notice the limit at all. If the hard limit is reached, or if you fail to bring your usage below the soft limit in time, the file system refuses to create new files and directories for you. The example below shows a quick way to spot files that are about to expire or are candidates for cleanup.
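
For example (assuming you keep your data under /scratch-shared/<username> and /scratch-local/<username>, as in the listings further below):

# List your files on shared scratch that have not been modified for more than
# 12 days and are therefore close to the 14-day automatic cleanup.
find /scratch-shared/$USER -type f -mtime +12 -ls

# Count how many files and directories you currently have on scratch.
find /scratch-shared/$USER /scratch-local/$USER 2>/dev/null | wc -l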

Access

Shared scratch space can be accessed on all nodes from two locations:

  • /scratch-local/  
  • /scratch-shared/

/scratch-local/ specifies a unique location on each node (and so acts like it is local), whereas /scratch-shared/ denotes the same location on every node:

# Different content for /scratch-local/paulm, depending on the node

snellius paulm@int1 14:26 ~$ ls -l /scratch-local/paulm
total 0
-rw-rw---- 1 paulm paulm 0 Mar  3 14:26 hello.txt 

snellius paulm@int3 14:26 ~$ ls -l /scratch-local/paulm/
total 0
 
# Same content for /scratch-shared/paulm

snellius paulm@int1 14:26 ~$ ls -l /scratch-shared/paulm/
total 4
drwxr-sr-x 2 paulm paulm 4096 Feb 17 22:16 Blender

snellius paulm@int3 14:26 ~$ ls -l /scratch-shared/paulm
total 4
drwxr-sr-x 2 paulm paulm 4096 Feb 17 22:16 Blender

So you can use /scratch-local for each process in a job to get a guaranteed unique location for storing/retrieving data that does not interfere with other processes in the same job. In fact, the $TMPDIR environment variable is set to a default value of /scratch-local/<loginname>, and the corresponding directory is created automatically when you log in or when a batch job is started.

The /scratch-shared/ directory behaves like scratch space that is shared by all nodes. Please create your own subdirectory under /scratch-shared, e.g. with the command:

$ mktemp -d -p /scratch-shared
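
As a minimal job-script sketch (the program name, input file, and partition are placeholders), staging data through the scratch file systems could look like this:

#!/bin/bash
#SBATCH --job-name=scratch-demo
#SBATCH --partition=thin
#SBATCH --time=00:30:00

# One shared scratch directory, visible on all nodes of the job.
SHARED=$(mktemp -d -p /scratch-shared)

# Stage input from home to the fast scratch file system.
cp "$HOME/input.dat" "$SHARED/"

# Run the (hypothetical) program; per-node intermediate files go to $TMPDIR.
srun ./my_program --input "$SHARED/input.dat" --tmp "$TMPDIR"

# Copy the results you want to keep back to home before they expire.
cp -r "$SHARED/results" "$HOME/"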

The /scratch-local file system is not truly local

Note that the /scratch-local/ directories are not truly (physically) local to a node. All /scratch-local/ directories are in fact visible from all nodes (and by all users), if you know the canonicalized fully qualified directory names. This can be seen with:

$ readlink -f $TMPDIR 
/gpfs/scratch1/nodespecific/int1/<loginname>

In fact, all scratch-local and scratch-shared symbolic links are actually pointing to directories that store data on the same underlying GPFS file system, and they share the same single per-user quota regime, as mentioned above.

The /scratch-node space: truly node-local scratch

On a subset of nodes fast NVMe-based scratch file systems are available (all fcn nodes, and some tcn and gcn nodes). Such node-local scratch spaces are faster than the shared scratch spaces, but as the name suggests, each node has its own scratch file system that does not share data with other nodes. For certain use cases this restriction is no problem, though.

To use nodes with node-local scratch, the SLURM constraint scratch-node needs to be used (e.g. #SBATCH --constraint=scratch-node). A user-specific partition is created on each assigned node and mounted under /scratch-node. The environment variable $TMPDIR will point to the user-specific directory within /scratch-node that you should use in your job. Note that when requesting part of a shared node you will also only get part of the local NVMe disk (either 25%, 50% or 75%, depending on the requested job resources).

No quota are active on /scratch-node, and no backup policy is active either.

/scratch-node/<userdir> deleted at end of job!

When a user's job is finished, the respective file system mounted under /scratch-node is deleted. This means that data in /scratch-node cannot be accessed anymore after a job has finished. You should therefore copy your data off of $TMPDIR to a more permanent location at the end of your job script (see the batch-job sketch after the interactive example below).

Example of requesting and testing node-local scratch space
snellius paulm@int2 18:57 ~$ srun -p thin -t 0:10:00 --constraint=scratch-node --exclusive --pty /bin/bash 
srun: job 2036702 queued and waiting for resources
srun: job 2036702 has been allocated resources

# Our directory on the node-local scratch disk to use in the job
snellius paulm@tcn516 18:58 ~$ echo $TMPDIR
/scratch-node/paulm.2036702

snellius paulm@tcn516 18:58 ~$ ls -l /scratch-node/
total 0
drwxr-x--- 2 paulm paulm 6 Jan  9 18:58 paulm.2036702

snellius paulm@tcn516 18:58 ~$ ls -l /scratch-node/paulm.2036702/
total 0
 
snellius paulm@tcn516 18:58 ~$ df -kh /scratch-node/paulm.2036702/
Filesystem                               Size  Used Avail Use% Mounted on
/dev/mapper/vg_scratch-lv_paulm.2036702  5.9T   42G  5.8T   1% /scratch-node/paulm.2036702

# The user-specific directory is a separate file system, mounted under /scratch-node
snellius paulm@tcn516 18:58 ~$ mount | grep paulm
/dev/mapper/vg_scratch-lv_paulm.2036702 on /scratch-node/paulm.2036702 type xfs (rw,relatime,attr2,inode64,logbufs=8,logbsize=32k,noquota)

# Simple write-speed test
snellius paulm@tcn516 19:01 ~$ dd if=/dev/zero of=/scratch-node/paulm.2036702/test bs=1M count=10240
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB, 10 GiB) copied, 4.20136 s, 2.6 GB/s 
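
Putting this together in a batch job, a minimal sketch (the program name, input file, and result location are placeholders) that uses the node-local scratch and saves its results before the job ends could look like:

#!/bin/bash
#SBATCH --partition=thin
#SBATCH --constraint=scratch-node
#SBATCH --time=01:00:00

# $TMPDIR points to the job-specific directory on the node-local NVMe disk.
cp "$HOME/input.dat" "$TMPDIR/"

# Hypothetical program doing heavy I/O on the fast local disk.
./my_program "$TMPDIR/input.dat" "$TMPDIR/output"

# Copy the output to a permanent location; /scratch-node is deleted when the job ends.
cp -r "$TMPDIR/output" "$HOME/results/"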

Node-local system directories such as /tmp, /var/tmp

Use of /tmp, /var/tmp, ...

Truly local directories, such as /tmp and /var/tmp, should be regarded as "off limits" for users. They are too small and too slow to be used for job output. Furthermore, they are needed by the operating system itself, and they can be emptied without further notice at node reboot, at node re-install, and in fact on several other occasions.

If you (accidentally) fill up /tmp or /var/tmp on a node, the operating system will experience problems. Ultimately your job (and on an interactive node, you and other users as well) will experience problems, and our system administrators and/or your fellow users won't like you.

  • Use the scratch file systems instead.
  • In your job command files you can use $TMPDIR. This is a unique directory per job step in /scratch-local (and therefore also unique per node).
  • On the login nodes you can also use $TMPDIR; a small example is shown below.
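
For example, tools that write temporary data to /tmp by default can usually be redirected to the scratch-backed $TMPDIR (GNU sort is just an illustration; the file names are placeholders):

# Let sort keep its temporary files on scratch instead of /tmp.
sort --temporary-directory="$TMPDIR" -o sorted.dat huge_unsorted.dat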


The project file system

The purpose of project space is to enable fast and bulky reading and/or writing of files by large and/or many jobs. A project space is not meant for long-term storage of data, and no automatic backup of data on project spaces is provided. In some sense, project spaces can be seen as "user-managed scratch". This implies that project users themselves must take care not to run into their quota limit, and must back up and recover data when the project expires.

Practically speaking a project file system can be used when:

  1. you need additional storage space, but do not require a backup.
  2. you need to share files within a collaboration.

By default, accounts on our systems are not provisioned with a project space. It can be requested when you apply for an account, or by contacting our service desk (depending on the type of account, different conditions may apply; contact us to find out whether your account is eligible for a project space).

No backup for project spaces

Note that SURF provides no backup service on project spaces. If you have not arranged for a backup, and an associated restore possibility, your data will be irrevocably lost in case serious damage is caused to your files or to the file system at large (e.g. by failing hardware or human error). SURF provides the archive facility for long-term data storage, but you may of course also use off-site storage of your choice. It is your own responsibility to archive data and to keep track of what you archived, when, and where.

End date and expiration

The project space itself has an agreed upon end date. But there is no expiration policy for the age of individual files and directories in your project space. Project users themselves must take care not to run into their quota limits, deleting and/or compacting and archiving data no longer needed.

When the agreed upon period of time of your Snellius project space expires, the project space will be made inaccessible. If no further notice from the project space users is received, the files and directories in your project space will eventually be deleted after a grace period of an additional four weeks.

All members of the group used for quota administration will receive a notification on their e-mail address registered in the SURF user administration, 30 days in advance of the expiration date. A second notification mail will be sent out the day after expiration.

In principle, the lifetime of a project directory is not extended beyond the lifetime of the associated compute project, as project spaces for projects that are no longer active waste high-performance storage resources. In some cases, however, a follow-up project could make efficient use of the same data without first having to stage it from an archive into a new project space. This may be a valid reason for retaining a Snellius project space "in between projects". In that case you must demonstrate, before the grace period has ended, that the project proposal for the follow-up project, the destined "heir" of the project space, has actually been submitted. New limits and expiration dates will have to be established and motivated by the needs of the follow-up project.

Quota on size and number of files

The exact capacity is project dependent. The quota on the maximum number of files is derived from the capacity quota: each project is allocated a basic quota of 1 million files, plus a surplus that is a non-linear function of the capacity quota. The table below contains some reference values for the resulting number-of-files quota and the resulting average file size. Note that for large project spaces the average file size must be larger than for smaller projects.

Capacity (TiB)    Number of files    Avg. file size (MiB)
1                 1,000,000          1.05
5                 1,359,881          3.86
10                1,728,141          6.07
50                3,766,218          13.92
100               5,605,170          18.71
200               8,492,952          24.50
300               10,879,241         28.91

Project space quota are per group

Quota on project file systems are per group, rather than per user. Users of the project space must be members of the group used for quota administration of the project, and they must write files and directories with this group ownership. In most cases this works correctly by default, but some commands that try to set group ownership (e.g. "rsync -a" or "cp -p") will fail without extra options; see the examples below. See the tutorial on using project space for sharing files for more information.
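
For example (the source and project paths are hypothetical), copying data into a project space without trying to preserve the source group ownership could look like this:

# rsync: -a implies preserving group ownership; disable that with --no-group
# so new files inherit the project group instead.
rsync -a --no-group ~/dataset/ /project/my_project/dataset/

# cp: -p preserves ownership; restrict what is preserved instead.
cp -rp --no-preserve=ownership results.tar /project/my_project/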

For users involved in more than one data project it is technically possible to store data in multiple project directories quasi-randomly, using any quota group that they are a member of. This is unwanted behaviour: files and directories with a group ownership used for the quota administration of a particular data project must all be placed under that project's root directory. Conversely, only subdirectories and files belonging to the project should be placed under that directory. SURF will enforce these rules, if needed, with periodic corrective actions that change group ownership without further notice.

The archive file system

The Data Archive is not a traditional file system, nor is it specific to Snellius. It is an independent facility for long-term storage of data, which uses a tape storage backend. It is accessible from other systems as well. For more information, see this separate page about the archive.

The archive service is intended for long-term storage of large amounts of data. Most of this data is (eventually) stored on tape, and accessing it may therefore take a while. For users of the data archive it is accessible from the login and staging nodes at the path /archive/<username>.

The archive system is designed to handle only large files efficiently. If you want to archive many smaller files, please pack them into a single (compressed) tar file first, before copying it to the archive; see the example below. Never store a large number of small files on the archive: they may be scattered across different tapes, and retrieving them all at a later stage puts a large load on the archive. See this section of the documentation for more information on using the archive appropriately.
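
For example (the directory and file names are hypothetical):

# Bundle a directory containing many small files into one compressed tar file...
tar -czf myrun_2024.tar.gz -C /project/my_project myrun/

# ...and copy that single file to the archive.
cp myrun_2024.tar.gz /archive/$USER/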

Disk quota

You can check home directory quota, scratch quota, and project space quota using the myquota end-user tooling available on the system. These commands are installed in the directory "/gpfs/admin/hpc/usertools".
For more information on how to enable these commands and how to use them, please read our Tutorial myquota end-user tooling, Snellius/GPFS implementation; a minimal example follows below.
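
As a minimal sketch (the exact command name is an assumption; the tutorial lists the available commands):

# Make the end-user quota tools available in the current shell session.
export PATH=/gpfs/admin/hpc/usertools:$PATH

# Report your current usage and quota (command name assumed; see the tutorial).
myquota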

Interconnect

All compute nodes on Snellius use the same interconnect, which is based on InfiniBand HDR100 (100 Gbps) in a fat tree topology.

Hostkey fingerprints

When you log in to a new system for the first time with the SSH protocol, the system returns a hostkey fingerprint to you:

The authenticity of host 'snellius.surf.nl (145.136.63.187)' can't be established.
ED25519 key fingerprint is SHA256:2Vy9858ldWu3Xjt1a58MbhD5CjLIh1LCb8n/up0izGw.
This key is not known by any other names
Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
Warning: Permanently added 'snellius.surf.nl' (ED25519) to the list of known hosts.

Before you type "yes" in response to this question, you can verify the fingerprint against the list of correct fingerprints for Snellius below, to check that you are indeed logging in to the correct system (an example check is shown after the list).

ED25519
===
SHA256:2Vy9858ldWu3Xjt1a58MbhD5CjLIh1LCb8n/up0izGw
MD5:22:2d:8c:fa:ca:24:a8:de:6d:08:c2:ad:a2:34:19:61
ECDSA
===
SHA256:BWIyocmUn0wm9gkNhc9CG5MPEQcHFCHxtyPtmkVMbak
MD5:ee:f3:26:54:11:ec:dd:d5:9f:8e:c1:94:fa:99:55:ea
RSA
===
SHA256:saJqHp4Ls1P+23/N/9Jt5kMWGvX8OpqUgZxYUZdV9+s
MD5:21:ac:01:67:67:e4:e8:7b:70:e8:c3:90:d2:02:9f:88
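
If you have already accepted the key, you can re-check the fingerprint stored in your known_hosts file against the list above with:

# Print the fingerprint(s) of the Snellius host key(s) recorded in ~/.ssh/known_hosts.
ssh-keygen -l -F snellius.surf.nl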


It is also possible to configure your SSH client to retrieve the correct SSH host key fingerprints from the SURF DNS automatically, without you having to check these fingerprints manually. To enable this, add the following to your ~/.ssh/config: VerifyHostKeyDNS yes. Or you can use the following SSH command-line switch to enable it temporarily: -o VerifyHostKeyDNS=yes. An example configuration snippet is shown below.
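
A minimal ~/.ssh/config entry that enables this for Snellius only could look like:

# ~/.ssh/config
Host snellius.surf.nl
    VerifyHostKeyDNS yes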

For more information about such a setup, check out this blog post.
