Synopsis

This page gives an overview of the Dutch national supercomputer Snellius and details the various types of file systems, nodes, and system services available to end-users.

Snellius is a general-purpose capability system designed to be well balanced, meaning it can handle tasks that require:

  • many cores
  • large symmetric multi-processing nodes
  • high memory
  • a fast interconnect
  • a lot of work space on disk
  • a fast I/O subsystem

Nodes Overview


Node types

The set of Snellius nodes available to end-users comprises three interactive nodes and a large number of batch nodes, or "worker nodes". We distinguish the following node flavours:

  • (int): CPU-only interactive nodes,
  • (tcn): CPU-only "thin" compute nodes, some of which have truly node-local NVMe-based scratch space,
  • (fcn): CPU-only "fat" compute nodes, which have more memory than the default worker nodes as well as truly node-local NVMe-based scratch space,
  • (hcn): CPU-only "high-memory" compute nodes with even more memory than fat nodes,
  • (gcn): GPU-enhanced "gpu" compute nodes with NVIDIA GPUs, some of which have truly node-local NVMe-based scratch space,
  • (srv): CPU-only not-for-computing "service" nodes, primarily intended to facilitate running user-submitted jobs that automate data transfers into or out of the Snellius system.


The overview below lists the node types currently available on Snellius.

3x int, ThinkSystem SR665

  • CPU SKU: AMD EPYC 7F32 (2x), 8 cores/socket, 3.7 GHz, 180 W
  • CPU cores per node: 16
  • Accelerator(s): N/A
  • DIMMs: 16 x 16 GiB, 3200 MHz, DDR4
  • Total memory per node: 256 GiB (16 GiB per core)
  • Network connectivity: 1x HDR100, 100GbE ConnectX-6 VPI dual port; 2x 25GbE SFP28 Mellanox OCP

525x tcn (Rome), ThinkSystem SR645

  • CPU SKU: AMD Rome 7H12 (2x), 64 cores/socket, 2.6 GHz, 280 W
  • CPU cores per node: 128
  • Accelerator(s): N/A
  • DIMMs: 16 x 16 GiB, 3200 MHz, DDR4
  • Total memory per node: 256 GiB (2 GiB per core)
  • Local storage: a subset of 21 nodes contains /scratch-node (6.4 TB NVMe SSD)
  • Network connectivity: 1x HDR100 ConnectX-6 single port; 2x 25GbE SFP28 OCP

738x tcn (Genoa), ThinkSystem SD665v3

  • CPU SKU: AMD Genoa 9654 (2x), 96 cores/socket, 2.4 GHz, 360 W
  • CPU cores per node: 192
  • Accelerator(s): N/A
  • DIMMs: 24 x 16 GiB, 4800 MHz, DDR5
  • Total memory per node: 384 GiB (2 GiB per core)
  • Local storage: a subset of 72 nodes contains /scratch-node (6.4 TB NVMe SSD)
  • Network connectivity: 1x NDR ConnectX-7 single port (200 Gbps within a rack, 100 Gbps outside the rack); 2x 25GbE SFP28 OCP

72x fcn (Rome), ThinkSystem SR645

  • CPU SKU: AMD Rome 7H12 (2x), 64 cores/socket, 2.6 GHz, 280 W
  • CPU cores per node: 128
  • Accelerator(s): N/A
  • DIMMs: 16 x 64 GiB, 3200 MHz, DDR4
  • Total memory per node: 1 TiB (8 GiB per core)
  • Local storage: /scratch-node (6.4 TB NVMe SSD)
  • Network connectivity: 1x HDR100 ConnectX-6 single port; 2x 25GbE SFP28 OCP

48x fcn (Genoa, per 1/12/23), ThinkSystem SD665v3

  • CPU SKU: AMD Genoa 9654 (2x), 96 cores/socket, 2.4 GHz, 360 W
  • CPU cores per node: 192
  • Accelerator(s): N/A
  • Total memory per node: 1.5 TiB (8 GiB per core)
  • Local storage: /scratch-node (6.4 TB NVMe SSD)
  • Network connectivity: 1x NDR ConnectX-7 single port (200 Gbps within a rack, 100 Gbps outside the rack); 2x 25GbE SFP28 OCP

2x hcn (4 TiB), ThinkSystem SR665

  • CPU SKU: AMD Rome 7H12 (2x), 64 cores/socket, 2.6 GHz, 280 W
  • CPU cores per node: 128
  • Accelerator(s): N/A
  • DIMMs: 32 x 128 GiB, 2666 MHz, DDR4
  • Total memory per node: 4 TiB (32 GiB per core)
  • Local storage: N/A
  • Network connectivity: 1x HDR100 ConnectX-6 single port; 2x 25GbE SFP28 OCP

2x hcn (8 TiB), ThinkSystem SR665

  • CPU SKU: AMD Rome 7H12 (2x), 64 cores/socket, 2.6 GHz, 280 W
  • CPU cores per node: 128
  • Accelerator(s): N/A
  • DIMMs: 32 x 256 GiB, 2666 MHz, DDR4
  • Total memory per node: 8 TiB (64 GiB per core)
  • Local storage: N/A
  • Network connectivity: 1x HDR100 ConnectX-6 single port; 2x 25GbE SFP28 OCP

72x gcn, ThinkSystem SD650-N v2

  • CPU SKU: Intel Xeon Platinum 8360Y (2x), 36 cores/socket, 2.4 GHz (Speed Select SKU), 250 W
  • CPU cores per node: 72
  • Accelerator(s): NVIDIA A100 (4x), 40 GiB HBM2 memory with 5 active memory stacks per GPU
  • DIMMs: 16 x 32 GiB, 3200 MHz, DDR4
  • Total memory per node: 512 GiB (7.111 GiB per core), plus 160 GiB HBM2 on the GPUs
  • Local storage: a subset of 36 nodes contains /scratch-node (7.68 TB NVMe SSD, ThinkSystem PM983)
  • Network connectivity: 2x HDR200 ConnectX-6 single port; 2x 25GbE SFP28 LOM; 1x 1GbE RJ45 LOM

7x srv, ThinkSystem SR665

  • CPU SKU: AMD EPYC 7F32 (2x), 8 cores/socket, 3.7 GHz, 180 W
  • CPU cores per node: 16
  • Accelerator(s): N/A
  • DIMMs: 16 x 16 GiB, 3200 MHz, DDR4
  • Total memory per node: 256 GiB (16 GiB per core)
  • Network connectivity: 1x HDR100, 100GbE ConnectX-6 VPI dual port; 2x 25GbE SFP28 Mellanox OCP
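
The per-node and per-core memory figures above follow directly from the DIMM configuration and the core count. The short Python sketch below reproduces this arithmetic for a few representative flavours; the values are copied from the overview above, and the helper function is purely illustrative and not part of any Snellius tooling.

```python
# Illustrative sketch: reproduce the "total memory per node" and
# "memory per core" figures from the DIMM configuration and core count.
# Values are copied from the node overview above.

def node_memory(dimm_count, dimm_size_gib, cores_per_node):
    """Return (total GiB per node, GiB per core)."""
    total_gib = dimm_count * dimm_size_gib
    return total_gib, total_gib / cores_per_node

examples = {
    "tcn (Rome)":  (16, 16, 128),   # 16 x 16 GiB DIMMs, 128 cores -> 256 GiB, 2 GiB/core
    "tcn (Genoa)": (24, 16, 192),   # 24 x 16 GiB DIMMs, 192 cores -> 384 GiB, 2 GiB/core
    "fcn (Rome)":  (16, 64, 128),   # 16 x 64 GiB DIMMs, 128 cores -> 1 TiB, 8 GiB/core
    "hcn (8 TiB)": (32, 256, 128),  # 32 x 256 GiB DIMMs, 128 cores -> 8 TiB, 64 GiB/core
    "gcn":         (16, 32, 72),    # 16 x 32 GiB DIMMs, 72 cores -> 512 GiB, ~7.1 GiB/core
}

for flavour, (dimms, size, cores) in examples.items():
    total, per_core = node_memory(dimms, size, cores)
    print(f"{flavour:12s}: {total:5d} GiB per node, {per_core:6.3f} GiB per core")

# Note: the gcn nodes additionally have 4 x 40 GiB = 160 GiB of HBM2 on the GPUs.
```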

Nodes per expansion phase

Snellius is being built in three consecutive expansion phases. All phases are planned to remain in operation until the end of life of the machine. Because Snellius grows in phases, it becomes increasingly heterogeneous as phase 2 and phase 3 come into operation. To keep a clear reference to the node flavours (int, tcn, gcn, etc.), we introduce a node type acronym that combines the node flavour with the phase in which the node was introduced (PH1, PH2, PH3). For example, a thin CPU-only node introduced in phase 1 has the node type acronym PH1.tcn.
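
As a minimal illustration of this naming scheme (purely for clarity, not an official tool), the snippet below composes an acronym from an expansion phase and a node flavour:

```python
# Illustrative only: compose a node type acronym from the expansion phase
# and the node flavour, as described above.

def node_type_acronym(phase: int, flavour: str) -> str:
    """E.g. phase 1 and flavour "tcn" give "PH1.tcn"."""
    return f"PH{phase}.{flavour}"

assert node_type_acronym(1, "tcn") == "PH1.tcn"
assert node_type_acronym(1, "gcn") == "PH1.gcn"
```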

Phase 1 (Q3 2021)

The overview below lists the Snellius node types available in Phase 1.

3x int (PH1.int), ThinkSystem SR665

  • CPU SKU: AMD EPYC 7F32 (2x), 8 cores/socket, 3.7 GHz, 180 W
  • CPU cores per node: 16
  • Accelerator(s): N/A
  • DIMMs: 16 x 16 GiB, 3200 MHz, DDR4
  • Total memory per node: 256 GiB (16 GiB per core)
  • Local storage: present, but not user accessible
  • Network connectivity: 1x HDR100, 100GbE ConnectX-6 VPI dual port; 2x 25GbE SFP28 Mellanox OCP

504x tcn (PH1.tcn), ThinkSystem SR645

  • CPU SKU: AMD Rome 7H12 (2x), 64 cores/socket, 2.6 GHz, 280 W
  • CPU cores per node: 128
  • Accelerator(s): N/A
  • DIMMs: 16 x 16 GiB, 3200 MHz, DDR4
  • Total memory per node: 256 GiB (2 GiB per core)
  • Network connectivity: 1x HDR100 ConnectX-6 single port; 2x 25GbE SFP28 OCP

72x fcn (PH1.fcn), ThinkSystem SR645

  • CPU SKU: AMD Rome 7H12 (2x), 64 cores/socket, 2.6 GHz, 280 W
  • CPU cores per node: 128
  • Accelerator(s): N/A
  • DIMMs: 16 x 64 GiB, 3200 MHz, DDR4
  • Total memory per node: 1 TiB (8 GiB per core)
  • Local scratch: 6.4 TB NVMe SSD (Intel P5600); see /scratch-node space below
  • Network connectivity: 1x HDR100 ConnectX-6 single port; 2x 25GbE SFP28 OCP

2x hcn (PH1.hcn4T), ThinkSystem SR665

  • CPU SKU: AMD Rome 7H12 (2x), 64 cores/socket, 2.6 GHz, 280 W
  • CPU cores per node: 128
  • Accelerator(s): N/A
  • DIMMs: 32 x 128 GiB, 2666 MHz, DDR4
  • Total memory per node: 4 TiB (32 GiB per core)
  • Network connectivity: 1x HDR100 ConnectX-6 single port; 2x 25GbE SFP28 OCP

2x hcn (PH1.hcn8T), ThinkSystem SR665

  • CPU SKU: AMD Rome 7H12 (2x), 64 cores/socket, 2.6 GHz, 280 W
  • CPU cores per node: 128
  • Accelerator(s): N/A
  • DIMMs: 32 x 256 GiB, 2666 MHz, DDR4
  • Total memory per node: 8 TiB (64 GiB per core)
  • Network connectivity: 1x HDR100 ConnectX-6 single port; 2x 25GbE SFP28 OCP

36x gcn (PH1.gcn), ThinkSystem SD650-N v2

  • CPU SKU: Intel Xeon Platinum 8360Y (2x), 36 cores/socket, 2.4 GHz (Speed Select SKU), 250 W
  • CPU cores per node: 72
  • Accelerator(s): NVIDIA A100 (4x), 40 GiB HBM2 memory with 5 active memory stacks per GPU
  • DIMMs: 16 x 32 GiB, 3200 MHz, DDR4
  • Total memory per node: 512 GiB (7.111 GiB per core), plus 160 GiB HBM2 on the GPUs
  • Network connectivity: 2x HDR100 ConnectX-6 single port; 2x 25GbE SFP28 LOM; 1x 1GbE RJ45 LOM

7x srv (PH1.srv), ThinkSystem SR665

  • CPU SKU: AMD EPYC 7F32 (2x), 8 cores/socket, 3.7 GHz, 180 W
  • CPU cores per node: 16
  • Accelerator(s): N/A
  • DIMMs: 16 x 16 GiB, 3200 MHz, DDR4
  • Total memory per node: 256 GiB (16 GiB per core)
  • Local SSD scratch
  • Network connectivity: 1x HDR100, 100GbE ConnectX-6 VPI dual port; 2x 25GbE SFP28 Mellanox OCP

Phase 1A + 1B + 1C (Q4 2022)

21x tcn, ThinkSystem SR645

  • CPU SKU: AMD Rome 7H12 (2x), 64 cores/socket, 2.6 GHz, 280 W
  • CPU cores per node: 128
  • Accelerator(s): N/A
  • DIMMs: 16 x 16 GiB, 3200 MHz, DDR4
  • Total memory per node: 256 GiB (2 GiB per core)
  • Local NVMe scratch: 6.4 TB NVMe SSD (Intel P5600); see /scratch-node space below
  • Network connectivity: 1x HDR100 ConnectX-6 single port; 2x 25GbE SFP28 OCP

36x gcn, ThinkSystem SD650-N v2

  • CPU SKU: Intel Xeon Platinum 8360Y (2x), 36 cores/socket, 2.4 GHz (Speed Select SKU), 250 W
  • CPU cores per node: 72
  • Accelerator(s): NVIDIA A100 (4x), 40 GiB HBM2 memory with 5 active memory stacks per GPU
  • DIMMs: 16 x 32 GiB, 3200 MHz, DDR4
  • Total memory per node: 512 GiB (7.111 GiB per core), plus 160 GiB HBM2 on the GPUs
  • Local NVMe scratch: 7.68 TB ThinkSystem PM983 2.5" 7mm trayless SSD (read-intensive entry NVMe, PCIe 3.0 x4); see /scratch-node space below
  • Network connectivity: 2x HDR100 ConnectX-6 single port; 2x 25GbE SFP28 LOM; 1x 1GbE RJ45 LOM

Phase 2 (Q3 2023)

714x tcn, ThinkSystem SD665v3

  • CPU SKU: AMD Genoa 9654 (2x), 96 cores/socket, 2.4 GHz, 360 W
  • CPU cores per node: 192
  • Accelerator(s): N/A
  • DIMMs: 24 x 16 GiB, 4800 MHz, DDR5
  • Total memory per node: 384 GiB (2 GiB per core)
  • Network connectivity: 1x NDR ConnectX-7 single port (200 Gbps within a rack, 100 Gbps outside the rack); 2x 25GbE SFP28 OCP

Phase 2A (LISA replacement, Q3 2023)

72x tcn, ThinkSystem SD665v3

  • CPU SKU: AMD Genoa 9654 (2x), 96 cores/socket, 2.4 GHz, 360 W
  • CPU cores per node: 192
  • Accelerator(s): N/A
  • DIMMs: 24 x 16 GiB, 4800 MHz, DDR5
  • Total memory per node: 384 GiB (2 GiB per core)
  • Local NVMe scratch: 6.4 TB NVMe SSD; see /scratch-node space below
  • Network connectivity: 1x NDR ConnectX-7 single port (200 Gbps within a rack, 100 Gbps outside the rack); 2x 25GbE SFP28 OCP

Phase 3 (estimated Q1-2 2024)

The Phase 3 update will add 88 nodes, each with four NVIDIA Hopper GPUs.

As part of the Phase 3 update, a subset of the existing thin Genoa nodes has already been upgraded with more memory, turning them into extra fat nodes (with 1.5 TiB of memory and a local 6.4 TB SSD).

When Phase 3 is complete, Snellius will have a total (CPU + GPU) performance in the range of 13.6 to 21.5 PFLOP/s.
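
As a rough indication of where such PFLOP/s figures come from, the sketch below estimates the nominal double-precision peak of a single CPU node from its core count and clock speed. The assumed 16 double-precision FLOPs per core per cycle (two 256-bit fused multiply-add operations per cycle) is a common figure for the AMD Zen cores used here, but it is an assumption on our part and not stated on this page; real application performance is typically well below this nominal peak.

```python
# Back-of-the-envelope nominal FP64 peak per CPU node.
# Assumption (not from this page): 16 double-precision FLOPs per core per cycle.
FLOPS_PER_CORE_PER_CYCLE = 16

def node_peak_tflops(cores: int, clock_ghz: float) -> float:
    """Nominal FP64 peak of one node in TFLOP/s."""
    return cores * clock_ghz * FLOPS_PER_CORE_PER_CYCLE / 1000.0

# Core counts and clock speeds taken from the node overview above.
print(f"tcn (Rome),  128 cores @ 2.6 GHz: {node_peak_tflops(128, 2.6):.1f} TFLOP/s")
print(f"tcn (Genoa), 192 cores @ 2.4 GHz: {node_peak_tflops(192, 2.4):.1f} TFLOP/s")
```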

Interconnect

All compute nodes on Snellius use the same interconnect, which is based on InfiniBand HDR100 (100 Gbps) in a fat-tree topology.

With the phase 2 and phase 3 extensions added, there is still a single InfiniBand fabric, but part of it is based on InfiniBand NDR in order to connect the older tree and the new tree with sufficient bandwidth.
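
For a rough sense of what these link rates mean in practice, the small sketch below converts the nominal per-port rates mentioned on this page from Gbit/s to GByte/s, simply dividing by 8 and ignoring protocol and encoding overhead; the conversion itself is the only point being illustrated.

```python
# Convert the nominal InfiniBand per-port rates mentioned on this page from
# Gbit/s to GByte/s (divide by 8; protocol/encoding overhead is ignored).

port_rates_gbps = {
    "HDR100 port": 100,
    "NDR port, within a rack": 200,
    "NDR port, outside the rack": 100,
}

for port, gbps in port_rates_gbps.items():
    print(f"{port:28s}: {gbps:3d} Gbit/s = {gbps / 8:5.1f} GByte/s")
```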