Nodes Overview
Node types
The set of Snellius nodes available to end users comprises three interactive nodes and a large number of batch nodes, or "worker nodes". We distinguish the following node flavours:
(int): CPU-only interactive nodes
(tcn): CPU-only "thin" compute nodes, some of which have truly node-local NVMe-based scratch space
(fcn): CPU-only "fat" compute nodes, which have more memory than the default worker nodes as well as truly node-local NVMe-based scratch space
(hcn): CPU-only "high-memory" compute nodes with even more memory than the fat nodes
(gcn): GPU-enhanced "gpu" compute nodes with NVIDIA GPUs, some of which have truly node-local NVMe-based scratch space
(srv): CPU-only not-for-computing "service" nodes, primarily intended to facilitate running user-submitted jobs that automate data transfers into or out of the Snellius system
The table below lists the currently available Snellius node types.
# Nodes | Node flavour | Lenovo Node Type | CPU SKU | CPU Cores per Node | Accelerator(s) | DIMMs | Total memory per node | Local storage | Network connectivity
---|---|---|---|---|---|---|---|---|---
3 | int | ThinkSystem SR665 | AMD EPYC 7F32 (2x), 8 cores/socket | 16 | N/A | 16 x 16 GiB | 256 GiB (16 GiB per core) | |
525 | tcn | ThinkSystem SR645 | AMD Rome 7H12 (2x), 64 cores/socket | 128 | N/A | 16 x 16 GiB | 256 GiB (2 GiB per core) | A subset of 21 nodes have local NVMe scratch |
738 | tcn | ThinkSystem SD665v3 | AMD Genoa 9654 (2x), 96 cores/socket | 192 | N/A | 24 x 16 GiB | 384 GiB (2 GiB per core) | A subset of 72 nodes have local NVMe scratch |
72 | fcn (Rome) | ThinkSystem SR645 | AMD Rome 7H12 (2x), 64 cores/socket | 128 | N/A | 16 x 64 GiB | 1 TiB (8 GiB per core) | |
48 | fcn (Genoa), since 1/12/23 | ThinkSystem SD665v3 | AMD Genoa 9654 (2x), 96 cores/socket | 192 | N/A | | 1.5 TiB (8 GiB per core) | |
2 | hcn | ThinkSystem SR665 | AMD Rome 7H12 (2x), 64 cores/socket | 128 | N/A | 32 x 128 GiB, 2666 MHz DDR4 | 4 TiB (32 GiB per core) | N/A |
2 | hcn (8 TiB) | ThinkSystem SR665 | AMD Rome 7H12 (2x), 64 cores/socket | 128 | N/A | 32 x 256 GiB | 8 TiB (64 GiB per core) | N/A |
72 | gcn | ThinkSystem SD650-N v2 | Intel Xeon Platinum 8360Y (2x), 36 cores/socket | 72 | NVIDIA A100 (4x), 40 GiB HBM2 memory with 5 active memory stacks per GPU | 16 x 32 GiB | 512 GiB (7.111 GiB per core) | A subset of 36 nodes have local NVMe scratch |
7 | srv | ThinkSystem SR665 | AMD EPYC 7F32 (2x), 8 cores/socket | 16 | N/A | 16 x 16 GiB, 3200 MHz DDR4 | 256 GiB (16 GiB per core) | |
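As a quick sanity check, the per-core memory figures quoted in the table are simply the total node memory divided by the number of CPU cores. A minimal Python sketch reproducing a few of them:

```python
# Reproduce the per-core memory figures quoted in the node table.
def gib_per_core(total_gib: float, cores: int) -> float:
    """Total node memory in GiB divided by the number of CPU cores."""
    return total_gib / cores

print(gib_per_core(256, 16))            # int node: 16.0 GiB per core
print(gib_per_core(256, 128))           # Rome tcn: 2.0 GiB per core
print(gib_per_core(1024, 128))          # Rome fcn: 8.0 GiB per core
print(round(gib_per_core(512, 72), 3))  # gcn: 7.111 GiB per core
```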
Nodes per expansion phase
Snellius is planned to be built in three consecutive expansion phases, all of which will remain in operation until the machine's end of life. Since Snellius grows in phases, it will become increasingly heterogeneous once phases 2 and 3 are operational. To maintain a clear reference to node flavours (e.g. int, tcn, gcn), we introduce a node type acronym that combines the node flavour with the phase in which the node was installed (PH1, PH2, PH3). For example, a thin CPU-only node installed in phase 1 has the node type acronym PH1.tcn.
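The acronym scheme is easy to handle programmatically; as a small illustration (the helper function below is ours, not part of any Snellius tooling):

```python
def parse_node_type(acronym: str) -> tuple[str, str]:
    """Split a node type acronym such as 'PH1.tcn' into its
    expansion phase ('PH1') and node flavour ('tcn')."""
    phase, flavour = acronym.split(".", 1)
    return phase, flavour

print(parse_node_type("PH1.tcn"))  # ('PH1', 'tcn')
print(parse_node_type("PH1.gcn"))  # ('PH1', 'gcn')
```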
Phase 1 (Q3 2021)
The table below lists the Snellius node types available in Phase 1.
# Nodes | Node Flavour | Node Type Acronym | Lenovo Node Type | CPU SKU | CPU Cores per Node | Accelerator(s) | DIMMs | Total memory per node (per core) | Other characteristics
---|---|---|---|---|---|---|---|---|---
3 | int | PH1.int | ThinkSystem SR665 | AMD EPYC 7F32 (2x), 8 cores/socket | 16 | N/A | 16 x 16 GiB | 256 GiB (16 GiB) | Local storage (not user accessible)
504 | tcn | PH1.tcn | ThinkSystem SR645 | AMD Rome 7H12 (2x), 64 cores/socket | 128 | N/A | 16 x 16 GiB | 256 GiB (2 GiB) |
72 | fcn | PH1.fcn | ThinkSystem SR645 | AMD Rome 7H12 (2x), 64 cores/socket | 128 | N/A | 16 x 64 GiB | 1 TiB (8 GiB) | Local scratch
2 | hcn | PH1.hcn4T | ThinkSystem SR665 | AMD Rome 7H12 (2x), 64 cores/socket | 128 | N/A | 32 x 128 GiB, 2666 MHz DDR4 | 4 TiB (32 GiB) |
2 | hcn | PH1.hcn8T | ThinkSystem SR665 | AMD Rome 7H12 (2x), 64 cores/socket | 128 | N/A | 32 x 256 GiB | 8 TiB (64 GiB) |
36 | gcn | PH1.gcn | ThinkSystem SD650-N v2 | Intel Xeon Platinum 8360Y (2x), 36 cores/socket | 72 | NVIDIA A100 (4x), 40 GiB HBM2 memory with 5 active memory stacks per GPU | 16 x 32 GiB | 512 GiB (7.111 GiB) |
7 | srv | PH1.srv | ThinkSystem SR665 | AMD EPYC 7F32 (2x), 8 cores/socket | 16 | N/A | 16 x 16 GiB, 3200 MHz DDR4 | 256 GiB (16 GiB) | Local SSD scratch
Phase 1A + 1B + 1C (Q4 2022)
# Nodes | Node Flavour | Node Type Acronym | Lenovo Node Type | CPU SKU | CPU Cores per Node | Accelerator(s) | DIMMs | Total memory per node (per core) | Other characteristics
---|---|---|---|---|---|---|---|---|---
21 | tcn | | ThinkSystem SR645 | AMD Rome 7H12 (2x), 64 cores/socket | 128 | N/A | 16 x 16 GiB | 256 GiB (2 GiB) | Local NVMe scratch
36 | gcn | | ThinkSystem SD650-N v2 | Intel Xeon Platinum 8360Y (2x), 36 cores/socket, 2.4 GHz (Speed Select SKU), 250 W | 72 | NVIDIA A100 (4x), 40 GiB HBM2 memory with 5 active memory stacks per GPU | 16 x 32 GiB | 512 GiB (7.111 GiB) | Local NVMe scratch
Phase 2 (Q3 2023)
# Nodes | Node Flavour | Node Type Acronym | Lenovo Node Type | CPU SKU | CPU Cores per Node | Accelerator(s) | DIMMs | Total memory per node (per core) | Other characteristics
---|---|---|---|---|---|---|---|---|---
714 | tcn | | ThinkSystem SD665v3 | AMD Genoa 9654 (2x), 96 cores/socket | 192 | N/A | 24 x 16 GiB | 384 GiB (2 GiB) |
Phase 2A (LISA replacement, Q3 2023)
# Nodes | Node Flavour | Node Type Acronym | Lenovo Node Type | CPU SKU | CPU Cores per Node | Accelerator(s) | DIMMs | Total memory per node (per core) | Other characteristics
---|---|---|---|---|---|---|---|---|---
72 | tcn | | ThinkSystem SD665v3 | AMD Genoa 9654 (2x), 96 cores/socket | 192 | N/A | 24 x 16 GiB | 384 GiB (2 GiB) | Local NVMe scratch
Phase 3 (estimated Q1-2 2024)
The phase 3 update will add 88 nodes with 4 NVIDIA Hopper GPUs each.
As part of the Phase 3 update, a subset of the existing thin Genoa nodes has already been upgraded with additional memory, turning them into extra fat nodes (with 1.5 TiB of memory and a local 6.4 TB SSD).
When Phase 3 is complete, Snellius will have a total (CPU+GPU) performance in the range of 13.6 to 21.5 PFLOP/s.
Interconnect
All compute nodes on Snellius share the same interconnect, based on InfiniBand HDR100 (100 Gbps) in a fat tree topology.
With the phase 2 and phase 3 extensions added, there is still a single InfiniBand fabric, but part of it is based on InfiniBand NDR, which connects the older tree and the new tree with sufficient bandwidth.