Synopsis

This page gives an overview of the Snellius supercomputer and details the various types of file systems, nodes, and system services available to end-users.


System overview

Snellius is the Dutch national supercomputer. It is a general-purpose capability system, designed to be well balanced. If you need one or more of the following: many cores, large symmetric multi-processing nodes, high memory, a fast interconnect, a large amount of work space on disk, or a fast I/O subsystem, then Snellius is the machine of choice.

Node types 

Snellius is planned to be built in three consecutive expansion phases, all of which are planned to remain in operation until the end of life of the machine. Since Snellius grows in phases, it will become increasingly heterogeneous once phase 2 and phase 3 are operational. To maintain a clear reference to the node flavours (e.g. int, tcn, gcn), we introduce a node type acronym that combines the node flavour with the phase in which the node was introduced (PH1, PH2, PH3). For example, a thin CPU-only node introduced in phase 1 has the node type acronym PH1.tcn.

The set of Snellius nodes available to end-users comprises three interactive nodes and a large number of batch nodes, or "worker nodes". We distinguish the following node flavours:

  • CPU-only interactive nodes (int),
  • CPU-only "thin" compute nodes (tcn),
  • GPU-enhanced compute nodes with NVIDIA GPUs (gcn),
  • CPU-only "fat" compute nodes (fcn) which have more memory than the default worker nodes as well as truly node-local NVMe based scratch space,
  • CPU-only high-memory compute nodes (hcn) with even more memory than fat nodes,
  • CPU-only, not-for-computing "service" nodes (srv), primarily intended to facilitate the running of user-submitted jobs that automate data transfers into or out of the Snellius system.

Phase 1 (Q3 2021)

The table below lists the Snellius node types available in Phase 1.

| # Nodes | Node Flavour | Node Type Acronym | Lenovo Node Type | CPU SKU | CPU Cores per Node | Accelerator(s) | DIMMs | Total memory per node (per core) | Other characteristics |
|---|---|---|---|---|---|---|---|---|---|
| 3 | int | PH1.int | ThinkSystem SR665 | AMD EPYC 7F32 (2x), 8 cores/socket, 3.7 GHz, 180 W | 16 | N.A. | 16x 16 GiB, 3200 MHz, DDR4 | 256 GiB (16 GiB) | 6.4 TiB NVMe SSD (Intel P5600); 1x HDR100/100GbE ConnectX-6 VPI dual port; 2x 25GbE SFP28 Mellanox OCP |
| 504 | tcn | PH1.tcn | ThinkSystem SR645 | AMD Rome 7H12 (2x), 64 cores/socket, 2.6 GHz, 280 W | 128 | N.A. | 16x 16 GiB, 3200 MHz, DDR4 | 256 GiB (2 GiB) | 1x HDR100 ConnectX-6 single port; 2x 25GbE SFP28 OCP |
| 72 | fcn | PH1.fcn | ThinkSystem SR645 | AMD Rome 7H12 (2x), 64 cores/socket, 2.6 GHz, 280 W | 128 | N.A. | 16x 64 GiB, 3200 MHz, DDR4 | 1 TiB (8 GiB) | 6.4 TiB NVMe SSD (Intel P5600); 1x HDR100 ConnectX-6 single port; 2x 25GbE SFP28 OCP |
| 2 | hcn | PH1.hcn4T | ThinkSystem SR665 | AMD Rome 7H12 (2x), 64 cores/socket, 2.6 GHz, 280 W | 128 | N.A. | 32x 128 GiB, 2666 MHz, DDR4 | 4 TiB (32 GiB) | 1x HDR100 ConnectX-6 single port; 2x 25GbE SFP28 OCP |
| 2 | hcn | PH1.hcn8T | ThinkSystem SR665 | AMD Rome 7H12 (2x), 64 cores/socket, 2.6 GHz, 280 W | 128 | N.A. | 32x 256 GiB, 2666 MHz, DDR4 | 8 TiB (64 GiB) | 1x HDR100 ConnectX-6 single port; 2x 25GbE SFP28 OCP |
| 36 | gcn | PH1.gcn | ThinkSystem SD650-N v2 | Intel Xeon Platinum 8360Y (2x), 36 cores/socket, 2.4 GHz (Speed Select SKU), 250 W | 72 | NVIDIA A100 (4x), 40 GiB HBM2 memory with 5 active memory stacks per GPU | 16x 32 GiB, 3200 MHz, DDR4 | 512 GiB + 160 GiB HBM2 (7.111 GiB) | 2x HDR100 ConnectX-6 single port; 2x 25GbE SFP28 LOM; 1x 1GbE RJ45 LOM |
| 7 | srv | PH1.srv | ThinkSystem SR665 | AMD EPYC 7F32 (2x), 8 cores/socket, 3.7 GHz, 180 W | 16 | N.A. | 16x 16 GiB, 3200 MHz, DDR4 | 256 GiB (16 GiB) | 6.4 TiB NVMe SSD (Intel P5600); 1x HDR100/100GbE ConnectX-6 VPI dual port; 2x 25GbE SFP28 Mellanox OCP |

Phase 2 (Q3 2022)

An extension will be added with more CPU-only thin nodes (future generation AMD EPYC processors, 2 GB per core), with a peak performance of 5.1 PFLOP/s.

Phase 3 (Q3 2023)

There are three options for this extension:

  1. CPU thin nodes (same future generation AMD EPYC processors, aggregate: 2.4 PFLOP/s), or
  2. GPU nodes (future generation NVIDIA GPUs, aggregate: 10.3 PFLOP/s), or
  3. Storage (the amount still needs to be determined)

The choice will be made 1.5 years after the start of production of Phase 1 and will be based on actual usage and demand of the system.

When Phase 3 is complete Snellius will have a total performance (CPU+GPU) in the range 13.6 - 21.5 PFLOP/s. 

File system

There are several filesystems available on Snellius:

| File system | Quota | Speed | Shared between nodes | Mount point | Expiration | Backup |
|---|---|---|---|---|---|---|
| Home | 200 GiB | Normal | Yes | /home/<username> | 15 weeks after project expiration | Nightly incremental |
| Scratch-local | N.A. | Fast | No | /scratch-local | End of job | No backup |
| Scratch-shared | N.A. | Fast | Yes | /scratch-shared | Files older than 14 days are removed automatically | No backup |
| Project | Based on request | Normal | Yes | /project/<project_name> | Project duration | No backup |
| Archive Service | Based on request | Very slow | No | /archive/<username> | Project duration | Nightly |

The home file system

Every user has their own home directory, which is accessible at /home/<login_name>. Your home directory has a default capacity quota of 200 GiB (i.e. 200 x 2^30 bytes). No quota on the number of files and directories is enforced.

For most users, the 200 GiB home directory is ample space for a work environment on the system, but you can contact our helpdesk if you think it is not sufficient to accommodate your work environment on Snellius. Note, however, that home directories are not intended for long-term storage of large data sets; SURF provides the archive facility for that. Home directories are also not suitable for fast, large-scale, or parallel I/O. Use scratch and/or project space (see below) for fast and voluminous job I/O.
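
To check how much of this quota you are using, standard tools such as du are sufficient. This is a minimal sketch; Snellius may also offer a dedicated quota-reporting command, which this example does not assume:

Code Block
$ du -sh $HOME                 # total size of your home directory
$ du -sh $HOME/* | sort -h     # sizes of top-level items, largest last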

SURF provides a versioned, incremental backup service for your home directory that runs overnight. Files that have been backed up are retained in the backup repository for three weeks after they have been deleted from the file system. We can restore files and/or directories when you accidentally remove them, overwrite them with the wrong contents, or when they are corrupted because of a storage hardware failure – provided, of course, that a version already existed and was successfully backed up. Note that no consistent backup can be made of files that are removed, changed, truncated, or appended to while the backup process is running. The backup process therefore simply skips files that are open and in use by other processes.

To have a file successfully backed up:

  • the file must reside on the file system when the backup runs
  • the file must be closed
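
If you want to verify that no running process still holds a file open (and would thus cause it to be skipped by the backup), a tool such as lsof can be used. This is only an illustration, assuming lsof is available on the node; the directory name is a placeholder:

Code Block
$ lsof +D $HOME/results        # list processes that have files open under this directory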

The scratch file system

The scratch file system is intended as fast, temporary storage that can be used while running a job, and can be accessed by all users with a valid account on the system. 

Your default scratch space capacity quota is 8 TiB (i.e. 8 x 2^40 bytes). The i-node quota (number of files and directories per user) has a soft limit of 3 million files per user and a substantially higher hard limit. Most users will never hit the soft limit, as files older than 6 days (on scratch-local) or 14 days (on scratch-shared) are cleaned up automatically. Users who produce enormous numbers of files per job may have to clean up files and directories themselves after the job, as they could reach their quota before the automatic cleanup is invoked.

If the soft limit is reached, a grace period of 7 days starts counting down. If you clean up within the grace period and do not grow to reach the hard limit, you will not notice the limit at all. If the hard limit is reached, or if you fail to clean up to below the soft limit in time, the file system refuses to create new files and directories for you.
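
To see which of your scratch files are approaching the expiration age, or to clean them up yourself before you hit a quota limit, standard find commands suffice. The directory name below is a placeholder for your own subdirectory:

Code Block
$ find /scratch-shared/<your_dir> -type f -mtime +13           # list files older than roughly 14 days
$ find /scratch-shared/<your_dir> -type f -mtime +13 -delete   # remove them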

Scratch file systems can be accessed from two locations:

  • /scratch-local/
  • /scratch-shared/

/scratch-local/ behaves as if it were local to each node, whereas /scratch-shared/ denotes the same location on every node. In fact, however, not even the /scratch-local/ directories are truly (physically) local. All scratch directories fall under the same single per-user quota regime.

All /scratch-local/ directories are in fact visible from all nodes (and by all users), if you know the canonical, fully qualified file names. This can be seen with:

Code Block
$ readlink -f $TMPDIR
/scratch/nodespecific/int2/<loginname>

Note that the TMPDIR environment variable is set to a default value of /scratch-local/<loginname>; the corresponding directory either already exists or is created when you log in or when a batch job starts.

The /scratch-shared/  directory behaves like scratch space that is shared by all nodes. Please create your own subdirectory under scratch-shared, e.g. with the command: 

Code Block
$ mktemp -d -p /scratch-shared
Note: Scratch cleanup and lack of backup

For scratch space there is an automated expiration policy of 6/14 days (for scratch-local / scratch-shared). Files and directories that are older, i.e. that have not changed their contents for this duration, are removed automatically.

There is no guarantee however that files are actually retained for at least 6/14 days. Serious hardware failure, for example, could cause loss of files that have not reached that age.

SURF provides no backup service on scratch space. Job end results, or any other precious job output that you want to keep, must be copied in time to your home directory, to the SURF archive facility, or to an off-site storage facility of your choice.

Truly local directories, such as /tmp and /var/tmp, should be regarded as "off limits" for users. They are too small and too slow to be used for job outputs. Furthermore, they are needed by the operating system itself. They can be emptied without further notice at node reboot or node re-install, and in fact on several other occasions.

If you (accidentally) fill up /tmp  or /var/tmp  on a node, the operating system will experience problems. Ultimately your job (and on an interactive node you and other users as well) will experience problems.

  • Our system administrators won't like you.
  • Your fellow users won't like you.
  • Use the scratch file systems instead.
  • In your job command files you can use $TMPDIR: a unique directory per job step in /scratch-local (i.e. also unique per node); see the example job script below.
  • On the login node you can also use $TMPDIR.
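
As an illustration of the last two points, a job command file can stage data through $TMPDIR and copy only the end results back to a persistent file system. This is a minimal sketch, assuming a Slurm-style batch script; the program name, input and output files, and resource settings are placeholders:

Code Block
#!/bin/bash
#SBATCH --job-name=tmpdir-example
#SBATCH --nodes=1
#SBATCH --time=01:00:00

# Copy the input data to the fast, job-specific scratch directory
cp $HOME/input.dat "$TMPDIR"/

# Run the computation from within $TMPDIR
cd "$TMPDIR"
$HOME/my_program input.dat > output.dat

# Copy only the results worth keeping back to a persistent location
cp output.dat $HOME/results/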

The project file system

A project file system can be used

  1. If you need additional storage space, but do not require a backup.
  2. If you need to share files within a collaboration.

By default, accounts on our systems are not provisioned with a project space. It can be requested when you apply for an account, or by contacting our service desk (depending on the type of account, different conditions may apply; contact us to find out whether your account is eligible for a project space).

The purpose of project space is to enable fast and voluminous reading and/or writing of files by large and/or many jobs. A project space is not meant for long-term storage of data. In some sense, project spaces can be seen as "user-managed scratch". This implies that project users themselves must take care not to run into their quota limit, and to back up and recover data when the project expires.

The project space itself has an agreed upon end date. But there is no expiration policy for the age of individual files and directories in your project space. Project users themselves must take care not to run into their quota limits, by deleting and/or compacting and archiving data no longer needed.

Note that SURF provides no backup service on project space. If you have not arranged for a backup and a restore possibility, your data will be irrevocably lost in case of serious damage to your files or to the file system at large (e.g. by failing hardware or human error). SURF provides the archive facility for long-term data storage, but you may of course also use off-site storage of your choice. It remains your own responsibility, however, to archive your data and to keep track of what you archived, when, and where.

When the agreed upon period of time of your Snellius project space expires, the project space will be made inaccessible. If no further notice from the project space users is received, the files and directories in your project space will eventually be deleted after a grace period of an additional four weeks.

All members of the group used for quota administration will receive a notification on their e-mail address registered in the SURF user administration, 30 days in advance of the expiration date. A second notification mail will be sent out the day after expiration.

Quota on project file systems are per group, rather than per user. Users of the project space must be members of the group used for quota administration of the project, and they must write files and directories with this group ownership. In most cases this works correctly by default, but some commands that try to preserve the group ownership of the source (e.g. "rsync -a" or "cp -p") will fail without extra options. See the tutorial on using project space for sharing files for more information.
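
As an illustration (not the tutorial's official recipe; the group and directory names are placeholders), you can avoid carrying over the source group when copying, or repair the group ownership afterwards:

Code Block
$ rsync -a --no-group mydata/ /project/<project_name>/mydata/   # do not preserve the source group
$ chgrp -R <project_group> /project/<project_name>/mydata       # or fix the group ownership afterwards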

For users involved in more than one data project, it is theoretically possible to store data in multiple project directories using, quasi-randomly, any quota group that they are a member of. This is unwanted behaviour: files and directories with a group ownership used for the quota administration of a particular data project must all be placed under that project's root directory. Conversely, only subdirectories and files belonging to the project should be placed under that directory. SURF will enforce these rules, if needed, with periodic corrective actions that change group ownership without further notice.

In principle, the lifetime of a project directory is not extended beyond the lifetime of the associated compute project, as project spaces for projects that are no longer active waste high-performance storage resources. In some cases, however, a follow-up project could make efficient use of the same data without first having to stage them from an archive into a new project space. This may be a valid reason for retaining a Snellius project space "in between projects". In that case you must demonstrate, before the grace period has ended, that the project proposal for the follow-up project – the destined "heir" of the project space – has actually been submitted. New limits and expiration dates will have to be established and motivated by the needs of the follow-up project.

The archive file system

The archive service is intended for long-term storage of large amounts of data. Most of this data is (eventually) stored on tape, and therefore accessing it may take a while. The archive is accessible from the login and staging nodes at the path /archive/<username>.

The archive system is designed to handle only large files efficiently. If you want to archive many smaller files, combine them into a single (compressed) tar file before copying it to the archive. Never store a large number of small files on the archive: they may be scattered across different tapes, and retrieving them at a later stage would put a large load on the archive. See this section of the documentation for more information on using the archive appropriately.
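
For example, to bundle a directory of many small files into a single compressed tar file before moving it to the archive (the paths and names are illustrative):

Code Block
$ tar czf mydataset.tar.gz mydataset/        # pack and compress the directory into one file
$ cp mydataset.tar.gz /archive/<username>/   # copy the single large file to the archive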


Interconnect

All compute nodes on Snellius will use the same interconnect, which is based on InfiniBand HDR100 (100 Gbps) in a fat tree topology.