Info
titleSynopsis
Here we summarize important changes for existing Cartesius users who are starting to work on Snellius. It contains information on the migration of user files, the software environment, the batch system, and more. This page lists the main differences between Snellius and Cartesius, and refers to the full documentation where applicable.

Table of Contents

Known issues

Snellius is a new system, and you may find issues in the first few weeks of usage. Some of the issues are known to us and are in the process of being fixed. Please see the separate page Snellius known issues for things we are aware of and you do not need to report to the Service Desk.


Data migration: check if all your files and directories are present

As described in the Cartesius to Snellius migration page, SURF has migrated all relevant user data from Cartesius to Snellius for you.

Please check that all the files in your home directory and in any project spaces assigned to you have been migrated correctly. If you find that files are missing, please contact our Service Desk right away.

Home directories

The home directory of a Cartesius login was migrated if the following two conditions both applied:

  • The login is associated with an SBU account that was active on, or after, the Cutoff Date of 1 June 2021 (i.e. did not expire before the Cutoff Date).
  • A valid Usage Agreement for the login exists and has been accepted/duly renewed by the person to whom the login was issued. This agreement can be reviewed and accepted here.

Warning
titleCartesius home backup available only until 31 December 2021

Note that a daily backup service is maintained for home directories, both on Cartesius and Snellius. Offline backups of Cartesius home directories, including those of directories that were not migrated, will be kept until 31 December 2021. Consequently, non-migrated home directories will become unavailable and non-restorable after 31 December 2021.

    Warning

    Because of the different block size on Snellius compared to Cartesius, the disk occupancy of the files in your home directory on Snellius may exceed the 200 GB quota. This results in an error of the following type when trying to create new files:

    Code Block
    error: file write error: Disk quota exceeded

    Please try to free up space by removing unused files and/or creating compressed archives (you can use the scratch filesystem to temporarily store the archived files before cleaning up your home directory).
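As a minimal sketch (the directory and file names below are placeholders, not actual Snellius paths), packing a directory into a compressed archive and removing the original can look like this:

```shell
# Sketch: pack a directory into a .tar.gz archive and remove the original
# to reclaim quota. A throwaway directory created with mktemp is used here
# so the example is self-contained; on Snellius you would archive a
# directory in $HOME and place the archive on the scratch filesystem first.
workdir=$(mktemp -d)
mkdir -p "$workdir/results"
echo "sample data" > "$workdir/results/run1.txt"

# Create the compressed archive, then remove the original directory
tar czf "$workdir/results.tar.gz" -C "$workdir" results
rm -rf "$workdir/results"

ls "$workdir"   # only results.tar.gz remains
```

Afterwards you can check the reclaimed space with, for example, `du -sh $HOME`.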


    Project spaces

    A project space was migrated to Snellius if the following two conditions both applied to at least one member login of the group co-owning the project space:   

    • The login is associated with an SBU account that was active on, or after, the Cutoff Date (i.e. did not expire before the Cutoff Date).
    • A valid Usage Agreement for the login exists and has been accepted/duly renewed by the person to whom the login was issued. This agreement can be reviewed and accepted here.

    The group co-owning the project space is the group of logins that share the allocated disk quota and have read and write access to the project space root.

    Warning

    Note that for project spaces no backup service is in place, as project space is a user-managed scratch resource, not a data-preservation resource. All project space data not covered by the conditions above was not migrated to Snellius and will become unavailable as soon as Cartesius is taken offline.



    Scratch spaces

    Warning
    titleScratch - Possible Data Loss

    Files that resided on scratch filesystems of Cartesius were not migrated to Snellius.


    Non-native file systems

    Warning
    titleArchive

    The migration only pertained to data on native Cartesius filesystems. In particular, data associated with the same login, but residing on the SURF Data Archive facility, are not affected in any way. 

    Porting your environment to Snellius: hidden files and directories

    In the home directories of Cartesius users that have been migrated to Snellius, there are a number of files and subdirectories that play a crucial role in setting up the login-specific environment and configuring particular applications. Obvious examples are files like ".bashrc", ".cshrc", and ".vimrc", and directories like ".ssh" or ".matlab". More generally, the files and directories that play such a constituting and customising role for the personal user environment have names and locations with the following characteristics:

    • They are located directly in the root of your home directory
    • Their name starts with a dot character ("."), e.g. ".bashrc", which makes them "hidden" directory entries

    Since Snellius is different from Cartesius, a substantial amount of the content of those files will not work correctly on Snellius at all, or will lead to unexpected, erratic, or unwanted behaviour. As these files need to be adapted to the new Snellius environment, SURF applies the following modifications to your home directory contents after the final data migration synchronisation run:

    1. A subdirectory, named "CARTESIUS.hidden-directory-entries", is created in the root of your home directory.
    2. All files and directories directly in the root of your home directory that have a name starting with a "." are subsequently moved into the newly added subdirectory denoted in step 1.

      Note

      There is one exception to this rule: Since logins associated with a PRACE account can only log in by means of ssh key-based (passwordless) authentication, the ".ssh" subdirectory of their home directory will NOT be moved. For non-PRACE users the ".ssh" directory is moved into the "CARTESIUS.hidden-directory-entries" directory.

    3. For a small number of environment bootstrapping files (".bash_profile", ".bashrc", ".cshrc"), a standardized version with minimal contents, suited for Snellius, is created in your home directory for you.

    Further customisation of hidden files and directories is up to the user, especially for the files mentioned in step 3. You can, of course, port (elements of) scripts in CARTESIUS.hidden-directory-entries back into scripts in the root of your Snellius home directory.
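For example, restoring a single configuration file from the migration directory is a plain copy. As a self-contained sketch (a temporary mock home directory is used here; on Snellius you would use $HOME itself, and ".vimrc" is just an illustration):

```shell
# Sketch: port one hidden file back from CARTESIUS.hidden-directory-entries
# into the root of the home directory. A mock home created with mktemp
# stands in for $HOME so the example can run anywhere.
home=$(mktemp -d)
mkdir "$home/CARTESIUS.hidden-directory-entries"
echo 'set number' > "$home/CARTESIUS.hidden-directory-entries/.vimrc"

# Copy the old .vimrc back next to the standardized Snellius dotfiles
cp "$home/CARTESIUS.hidden-directory-entries/.vimrc" "$home/"

ls -la "$home"
```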

    Note also that if you have locally built modules on Cartesius with eblocalinstall, these will be located in the directory .local/easybuild within the CARTESIUS.hidden-directory-entries directory. See also the section on locally built modules below.

    Note

    The CARTESIUS.hidden-directory-entries subdirectory may at first glance appear to be empty. The entries that have been moved in there are by definition "hidden". To list them, you must use a command that shows hidden files, such as "ls -la".
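A small self-contained illustration of why the directory looks empty (using a throwaway directory rather than the actual home directory):

```shell
# Hidden entries (names starting with ".") are skipped by a plain "ls"
dir=$(mktemp -d)
touch "$dir/.bashrc_old"

ls "$dir"       # prints nothing: the entry is hidden
ls -la "$dir"   # shows ".", "..", and ".bashrc_old"
```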

    In the highly unlikely case that the name "CARTESIUS.hidden-directory-entries" conflicts with a migrated, and therefore already existing, entry in your home directory, the name of the new subdirectory is slightly modified, extended with a short suffix, to resolve the name conflict.

    Updated module environment

    On Snellius we updated the environment module system to LMOD. This changes the way modules are loaded and used on Snellius. Full information can be found in our Environment Modules tutorial, or in the Loading modules section of our HPC User Guide on writing job scripts.

    No default module versions

    One main difference with Cartesius is that no default module versions are set on Snellius, so users need to specify the full module name (including the version) in order to load a module. For example:

    Code Block
    languagebash
    # Two different module versions are available for HDF5
    [paulm@int3 ~]$ module avail HDF5
    
    ------------------------------------------------------------------------------------------- /sw/arch/Centos8/EB_production/2021/modulefiles/data --------------------------------------------------------------------------------------------
       HDF5/1.10.7-gompi-2021a    HDF5/1.10.7-iimpi-2021a
    
    Use "module spider" to find all possible modules and extensions.
    Use "module keyword key1 key2 ..." to search for all possible modules matching any of the "keys".
    
    # Can't load a module without specifying the exact version you want to load
    [paulm@int3 ~]$ module load HDF5
    Lmod has detected the following error:  These module(s) or extension(s) exist but cannot be loaded as requested: "HDF5"
       Try: "module spider HDF5" to see how to load the module(s).
    
    # Load module based on full version number
    [paulm@int3 ~]$ module load HDF5/1.10.7-gompi-2021a
    [paulm@int3 ~]$ 
    

    Searching/listing modules

    To search for a specific module by name, the module avail (abbreviated module av) and module spider commands are available. If you don't provide any arguments, the module avail command lists all available modules. In LMOD the command searches not just the module name (e.g. GCC), but the full module version string. This allows more flexible searching, but might also return a lot of results:

    Code Block
    [paulm@int3 ~]$ module av Python
    
    ------------------------------------------------------------------------------------------- /sw/arch/Centos8/EB_production/2021/modulefiles/tools -------------------------------------------------------------------------------------------
       IPython/7.25.0-GCCcore-10.3.0
    
    ------------------------------------------------------------------------------------------- /sw/arch/Centos8/EB_production/2021/modulefiles/lang --------------------------------------------------------------------------------------------
       Python/2.7.18-GCCcore-10.3.0-bare    Python/3.9.5-GCCcore-10.3.0-bare    Python/3.9.5-GCCcore-10.3.0
    
    ------------------------------------------------------------------------------------------- /sw/arch/Centos8/EB_production/2021/modulefiles/devel -------------------------------------------------------------------------------------------
       pkgconfig/1.5.4-GCCcore-10.3.0-python
    
    
    [paulm@int3 ~]$ module av 6.2.1
    
    ------------------------------------------------------------------------------------------- /sw/arch/Centos8/EB_production/2021/modulefiles/math --------------------------------------------------------------------------------------------
       GMP/6.2.1-GCCcore-10.3.0
    
    
    # Many results, due to matching of "GCCcore"
    [paulm@int3 ~]$ module av GCC
    
    ----------------------------------------------------------------------------------------- /home/paulm/.local/easybuild/Centos8/2021/modulefiles/all -----------------------------------------------------------------------------------------
       freeglut/3.2.1-GCCcore-10.3.0    glew/2.1.0-GCCcore-10.3.0    Mesa-demos/8.4.0-GCCcore-10.3.0
    
    ------------------------------------------------------------------------------------------- /sw/arch/Centos8/EB_production/2021/modulefiles/phys --------------------------------------------------------------------------------------------
       UDUNITS/2.2.28-GCCcore-10.3.0
    
    ...

    You can use a slash ("/") in the search string to match only the package name part. For example, to list only the available versions of the GCC module:

    Code Block
    [paulm@int3 ~]$ module av GCC/
    
    ----------------------------------------------------------------------------------------- /sw/arch/Centos8/EB_production/2021/modulefiles/compiler ------------------------------------------------------------------------------------------
       GCC/10.3.0 (L)
    
      Where:
       L:  Module is loaded
    Note

    The module avail command does a case-insensitive search, while the module load and module unload commands are case-sensitive with respect to the module name provided.

    Module spider

    The module spider command is an alternative that can be used to get more detailed information on a specific module. For example, for the Python module we can show its description and list all available versions:

    Code Block
    [paulm@int3 ~]$ module spider Python
    
    -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
      Python:
    -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
        Description:
          Python is a programming language that lets you work more quickly and integrate your systems more effectively.
    
         Versions:
            Python/2.7.18-GCCcore-10.3.0-bare
            Python/3.9.5-GCCcore-10.3.0-bare
            Python/3.9.5-GCCcore-10.3.0
         Other possible modules matches:
            IPython
    
    -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
      To find other possible module matches execute:
    
          $ module -r spider '.*Python.*'
    
    -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
      For detailed information about a specific "Python" package (including how to load the modules) use the module's full name.
      Note that names that have a trailing (E) are extensions provided by other modules.
      For example:
    
         $ module spider Python/3.9.5-GCCcore-10.3.0
    -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    
    

    Module dependencies

    Another difference is that dependencies between modules are tracked more accurately: when you unload a module, all its dependencies (where possible) are unloaded as well. This means you no longer need to use module purge in certain cases.
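As an illustrative sketch (module names taken from the examples above; the exact list of automatically loaded dependencies may differ), unloading a module now also unloads what it pulled in:

```
# Loading HDF5 also pulls in its dependencies (compiler toolchain, MPI, ...)
[paulm@int3 ~]$ module load HDF5/1.10.7-gompi-2021a
# Unloading it unloads those dependencies again, where possible
[paulm@int3 ~]$ module unload HDF5/1.10.7-gompi-2021a
[paulm@int3 ~]$ module list
No modules loaded
```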


    The new software stack (2021)

    The modules environment on Snellius (and Lisa) provides the new 2021 modules collection. This contains recent versions of many of the software packages that were available in the 2020 environment on Cartesius.

    Note that the 2020 and 2019 collections that were available on Cartesius are not available on Snellius. This is due to differences between Snellius and Cartesius that would make it time-consuming to install and test all the software packages from the previous environments:

    • Snellius contains (mostly) nodes with AMD CPUs, versus Intel CPUs on Cartesius
    • Snellius uses a different operating system compared to Cartesius

    Locally built modules

    It is possible, through the EasyBuild system available on Snellius, to build your own modules, for example if you want a different version of a module than is available in the global modules environment, or if you want to add a package that we do not provide globally. Installing local modules was already possible on Cartesius, but modules you may have built on Cartesius will not work on Snellius, due to CPU and OS differences. Those locally installed modules will therefore need to be rebuilt on Snellius using EasyBuild. See the EasyBuild tutorial for general instructions.

    Note

    Existing locally installed modules you built on Cartesius will be located in your migrated home directory on Snellius, in CARTESIUS.hidden-directory-entries/.local/easybuild/RedHatEnterpriseServer7 (as per the porting environment section above). It is recommended to start fresh and locally build the modules you need on Snellius with eblocalinstall (which is part of the eb module), instead of trying to reuse files from this directory.

    After locally installing the modules you need, you can completely remove the RedHatEnterpriseServer7 directory.

    SURF(sara) compiler wrappers

    These are no longer used on Snellius. When you load, say, GCC/10.3.0, the gcc command will simply refer to /sw/arch/Centos8/EB_production/2021/software/GCCcore/10.3.0/bin/gcc, instead of a wrapper script.

    The batch system

    On Snellius the same SLURM scheduling system as on Cartesius is used to submit and control user jobs. There are some differences in its setup, though.

    Updated partitions

    The partitions available on Snellius are described on the Snellius usage and accounting page. Some partition names have changed to reflect the different types of compute nodes; please check the table there for more information on each partition's configuration.

    On Cartesius each partition had a "..._short" version that was limited to jobs of up to 1 hour walltime, for example "gpu" and "gpu_short". On Snellius, "short" jobs with a walltime of at most 1 hour no longer have to be submitted to a separate partition; in fact, "..._short" partitions are no longer available. For the "thin", "fat" and "gpu" partitions the SLURM scheduler always keeps a number of nodes available that can only run short jobs. This provides the benefit of the Cartesius "short" partitions without having to explicitly submit a job to a different partition.
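For example, whereas on Cartesius a job of at most 1 hour walltime would go to "gpu_short", on Snellius it is simply submitted to "gpu" (sketch; "myjob.sh" is a placeholder job script):

```
# Cartesius (old): short jobs needed the dedicated short partition
# sbatch -p gpu_short -t 01:00:00 myjob.sh

# Snellius: submit to the regular partition; the scheduler keeps
# nodes reserved for short jobs automatically
[paulm@int3 ~]$ sbatch -p gpu -t 01:00:00 myjob.sh
```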

    Since the partitions are now homogeneous in terms of the hardware they contain, there is no longer any reason to use constraints to select specific CPU types, as was needed on Cartesius.

    Shared jobs and single-node jobs

    On Snellius, single-node jobs from different users can execute simultaneously on the same node; these are known as "shared jobs". This is partly needed to achieve efficient node usage, given the larger number of CPU cores in Snellius nodes compared to Cartesius. From an accounting point of view, you will be charged only for the resources allocated to your job. Check the "Shared usage accounting" section on the Snellius usage and accounting page for more information.

    Jobs that only use a single node (either by specifying "-N 1", or by not specifying the number of nodes) are now started as shared jobs by default. SLURM will also warn you about this:

    Code Block
    languagebash
    # Single node (-N 1) is implicit, shared job
    [paulm@int3 ~]$ sbatch -t 1:00 hostname.job
    sbatch: Single node jobs run on a shared node by default. Add --exclusive if you want to use a node exclusively.
    Submitted batch job 2070
    
    # Single node exclusive job
    [paulm@int3 ~]$ sbatch -t 1:00 --exclusive hostname.job
    Submitted batch job 2071

    If you want to make sure your job is the only one running on a node (known as "exclusive use"), you can use the --exclusive option with sbatch.

    Resources on a shared node are "partitioned" by the system using cgroups, so that you can access and see only the resources allocated to your job (e.g. if you request 1/4 of a thin node, you will have access to 32 cores and 64 GiB of memory). This does not apply to memory bandwidth, which is not partitioned and is shared between jobs on the same node. If your code makes intensive use of memory, you may want to consider exclusive node usage (even if you are using only part of the cores on the node), since memory access could be the limiting factor for the performance of your application.

    Note

    Note that the allocated resources (the full node or part of it) will always be accounted and subtracted from your budget, regardless of the resources actually used by the job. For example, if your job only uses 32 out of the 128 CPU cores in a node, but you submit it as an exclusive job, it will be accounted for the full 128 cores. One of the benefits of shared jobs is therefore that you can reduce resource usage by using only part of a node.

    Minimum shared job size

    Even though a shared job only uses part of a node, there are limits on the minimum size of a shared job. For example, you cannot allocate a job that uses only a single CPU core: a job needs to allocate at least 1/4th of the CPU cores in a node. See Snellius usage and accounting for the limits per partition and details on the accounting of shared jobs.