

Trouble with connecting to Snellius

There are a few common issues that can cause a connection to Snellius to fail.

Usage Agreement

To use the Snellius or Lisa services, you need to read and accept the Usage Agreement.

To do this, visit https://portal.cua.surf.nl/home/ and log in with your username and password.

When your project is finished

Expired home directories and project spaces will be deleted

The SURFsara Usage Agreement states that data will be removed within 6 months after the expiration date of an agreement (Contract, Project Agreement, NWO (EInfra) grant, etc.). If a login (and its home directory) has not been associated for longer than 15 weeks with an active account/budget that grants access to our systems, we will delete the login and its home directory.

Data access granted to others

In some cases, owners of home directories have granted others access to their data via group memberships or access control lists (ACLs). If these other users still need the data, they must take action to preserve it themselves.

Project spaces

For project spaces, by and large, the same applies as for home directories. The difference is that project spaces, unlike home directories, are collectively owned by the logins that are members of a disk quota group. Project space allocation is an integral part of NWO grants. When the NWO grant (or other contractual basis) expires and there is no new or prolonged grant or contract within 15 weeks, the project space expires as well and will be cleaned up. If there is a new grant or prolongation arrangement, the project space will remain. However, logins that are no longer associated with the new account will be removed from the quota group. Files in the project space owned by the UID of an expired login should be reassigned to another group member who is still active.

Principal investigators of an account are warned 90, 60 and 30 days before their account expires, so there is enough time to take the appropriate measures before the expiration date.

I want to acknowledge SURF for the usage of Snellius and/or Lisa and the support I got

We would appreciate it if you included a text like this in your publications about projects in which Snellius or Lisa played a role:

We thank SURF (www.surf.nl) for the support in using the <Lisa Compute Cluster|National Supercomputer Snellius>.

What does Snellius mean?

Image: refraction of light (Lichtbreking)

Mathematician Willebrord Snel van Royen

Willebrord Snel van Royen (Leiden, 1580-1626), also known by his Latin name Snellius, was a Dutch mathematician, physicist, humanist, linguist and astronomer. He was professor of mathematics at Leiden University from 1613 until his death in 1626. He is best known for Snell's law, named after him, which describes how light rays are refracted when light passes from one material into another (e.g. from air to glass, as in the image above).

Image: portrait of Willebrord Snel van Royen

The system is down:

Please check the system status page.

You are temporarily banned:

You will receive a 24-hour ban on a login node after 5 failed login attempts.
The interactive nodes are protected by fail2ban. Since we have multiple login nodes, you may be banned on one login host but not on another.

Info

We encourage you to set up SSH public key authentication to access Snellius. See this information on how to upload your public key to Snellius.

Attempting to connect from a non-whitelisted IP

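The Snellius interactive nodes only accept connections from white-listed IP address ranges. See "Help, I can't login! Is my account blocked?" for how to have your address white-listed or how to connect through the doornode.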

What does Lisa mean?

We think the name 'Lisa' is appropriate for the system, because:

  • We like the name Lisa
  • The name is short and easy to type
  • 'Lisa' is easily understandable

If you like, 'Lisa' can stand for:

  • Lisa Supercomputer Amsterdam
  • Linux Supercomputer Amsterdam

The first, a recursive acronym like GNU ("GNU's Not Unix"), honors the fact that large, essential portions of the software that make systems like Lisa possible come from the open source community: GNU. The second honors the fact that the operating system on Lisa is Debian Linux. Without free, open source operating systems, a cluster like Lisa would be nearly impossible.

My job doesn't start and has status 'ReqNodeNotAvail'

This usually happens when a maintenance session is planned. You can see planned maintenance on the system status page or in the message of the day when logging in on Snellius or Lisa. Jobs with a maximum wall clock time longer than the time remaining until the start of the maintenance will not start until after the maintenance, and are shown with status 'ReqNodeNotAvail' in the squeue output. A workaround is to request a shorter maximum wall clock time.
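The rule above can be made concrete with a small C sketch. It assumes, as a simplification, that a job can only start before the maintenance if its start time plus its maximum wall clock time does not pass the start of the maintenance; the real scheduler may apply additional margins. The function names are hypothetical and all times are in seconds.

```c
#include <stdbool.h>
#include <time.h>

/* Hypothetical helper (simplified model): true if a job starting at
 * `now` with maximum wall clock time `walltime_s` ends no later than
 * the start of the maintenance. */
bool fits_before_maintenance(time_t now, time_t maint_start, long walltime_s)
{
    return now + walltime_s <= maint_start;
}

/* Hypothetical helper: the longest wall clock time (in seconds) that
 * still fits before the maintenance, or 0 if the maintenance has
 * already started. */
long max_walltime_before(time_t now, time_t maint_start)
{
    return maint_start > now ? (long)(maint_start - now) : 0;
}
```

For example, if the maintenance starts in 10 hours, a job requesting --time=12:00:00 will wait until after the maintenance, while a job requesting --time=09:00:00 can still be scheduled before it.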

Help, I can't login! Is my account blocked?

There can be different reasons why this happens:

  • One of the most common reasons is that the system is in maintenance. You can see planned and ongoing maintenance on the system status page.
  • Snellius interactive nodes only accept (GSI-)ssh connections from known, white-listed IP ranges. You may be trying to connect from an IP address that is not in a white-listed range.

    As a result, you may not be able to access the system while traveling. In that case, please use the doornode.

    If you need access from a location that you will use regularly and long-term, please contact us through the service desk and provide your external IP address. Make sure you report the CORRECT public IP address: many sites use private IP address space together with network address translation (NAT), so your public IP address is NOT necessarily configured directly on your local system, and your system may not know it. The following ranges are by definition private IP address ranges that cannot be white-listed:

    • 192.168.0.0 - 192.168.255.255
    • 172.16.0.0 - 172.31.255.255
    • 10.0.0.0 - 10.255.255.255

    You can easily find out which public IP address you use by visiting sites such as http://www.whatismyip.com, https://whatismyipaddress.com, https://www.whatsmyip.org or echoip.cua.surf.nl with your web browser.
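If you want to check an address before reporting it, the following C sketch tests whether an IPv4 address falls in one of the three private ranges listed above (the RFC 1918 private ranges). The function name is_private_ipv4() is hypothetical; an address that is not private still has to be white-listed through the service desk.

```c
#include <stdbool.h>
#include <stdint.h>
#include <sys/socket.h>
#include <arpa/inet.h>

/* Hypothetical helper: true if the dotted-quad IPv4 string lies in
 * 10.0.0.0/8, 172.16.0.0/12 or 192.168.0.0/16. Such addresses cannot
 * be white-listed. Returns false for input that is not valid IPv4. */
bool is_private_ipv4(const char *addr)
{
    struct in_addr in;

    if (inet_pton(AF_INET, addr, &in) != 1)
        return false;                      /* not a valid IPv4 address */
    uint32_t ip = ntohl(in.s_addr);        /* host byte order for masking */

    return (ip & 0xFF000000u) == 0x0A000000u    /* 10.0.0.0    - 10.255.255.255  */
        || (ip & 0xFFF00000u) == 0xAC100000u    /* 172.16.0.0  - 172.31.255.255  */
        || (ip & 0xFFFF0000u) == 0xC0A80000u;   /* 192.168.0.0 - 192.168.255.255 */
}
```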

    How to connect to the Snellius system from abroad

    Using a doornode

    At times you may be on the road with a good idea that you would like to test with a simulation on Snellius. When you try to log in, you may find that access is not possible, which can be quite frustrating. The reason is that Snellius uses a white-list of IP addresses and can only be accessed from those locations. To help you in these situations, we have set up a separate login server that can be accessed from anywhere in the world: doornode.surfsara.nl (i.e. using `ssh user@doornode.surfsara.nl`). You can log in to this server with your usual login and password, after which you get a menu of systems that you can log in to. Select 'Snellius' and type your password a second time. You are now logged in to Snellius. Please note that you cannot copy files or use X11 when using the doornode.

    If you are sure that you will access Snellius regularly from the same location, please send your IP address to our service desk and ask for it to be white-listed. You can find the blocked IP address by logging in to Snellius with ssh -v [login]@snellius.surf.nl

    How to disconnect

    Simply issue the command

    Code Block
    languagetext
    logout

    or

    Code Block
    languagetext
    exit

    in the terminal window and press Enter.

    More information

    More information about using Linux systems in general can be found on the web, for example:

    • The UNIX Tutorial for Beginners contains a useful introduction to Unix. NOTE: some examples (especially those about variables) are for another shell (csh) than the default shell on Snellius (bash).
    • The Advanced Bash-Scripting Guide gives an in-depth but readable overview of the standard login shell 'bash', with examples.





    Questions about running jobs

    I expect output from my program, but no output is generated

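    The output of most programs is buffered before it is written to, e.g., stdout. When output goes to a file, as in a batch job, it may only appear once the buffer is full or the program ends, and buffered output can be lost if the program crashes.
    You can disable buffering of output in most languages.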

    Note

    Please re-enable buffering after you are done debugging, as unbuffered output will negatively influence performance of your program.

    Python

    You can enable unbuffered output with Python's built-in -u flag:

    Code Block
    python -u <main.py>

    or by setting the environment variable PYTHONUNBUFFERED:

    Code Block
    PYTHONUNBUFFERED=1 python <main.py>

    Fortran 

    I need to see the output of my batch job immediately while executing or my program crashed but the output seems cut off!

    If your program is compiled with the GNU gfortran compiler, set the following environment variable:

    Code Block
    languagetext
    export GFORTRAN_UNBUFFERED_ALL=y

    C

    For C programs, the buffering can be changed with the function setvbuf(). For example, standard output can be made unbuffered using:

    Code Block
    languagetext
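    #include <stdio.h>
    int main(void) {
        setvbuf(stdout, NULL, _IONBF, 0);  /* call before any output to stdout */
        printf("written out immediately\n");
        return 0;
    }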
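Because fully unbuffered output can slow down your program (see the note above), a middle ground is line buffering, or flushing stdout explicitly only where you want to see progress. The sketch below shows both with standard C library calls; the helper names use_line_buffering() and report_progress() are hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: switch stdout to line buffering, so every
 * completed line is written out at once, even when stdout is redirected
 * to a batch job's output file. Call it before writing anything to
 * stdout. Returns 0 on success, like setvbuf() itself. */
int use_line_buffering(void)
{
    return setvbuf(stdout, NULL, _IOLBF, BUFSIZ);
}

/* Hypothetical helper: print a progress message and flush it
 * immediately, without changing the buffering mode of stdout. */
void report_progress(int step, int total)
{
    printf("step %d of %d done\n", step, total);
    fflush(stdout);   /* write the buffered output to the file now */
}
```

Line buffering costs one write per line, which is usually much cheaper than unbuffered output when a program prints many small pieces of text.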

    My job doesn't start and has status 'ReqNodeNotAvail'

    This usually happens when a maintenance session is planned. You can see planned maintenance on the system status page or in the message of the day when logging in on Snellius. Jobs with a maximum wall clock time longer than the time remaining until the start of the maintenance will not start until after the maintenance, and are shown with status 'ReqNodeNotAvail' in the squeue output. A workaround is to request a shorter maximum wall clock time.

    I can't use CVS.

    Snellius does not support the default remote shell 'rsh' for security reasons. Please use:

    Code Block
    languagetext
    export CVS_RSH=ssh


    How can I determine the memory usage of my application?

    The SLURM batch scheduler logs the memory usage of your application, which can be retrieved after your job has ended with the command

    Code Block
    languagetext
    job-statistics -j <JOB_ID>

    This shows the average and maximum memory use per MPI task, which MPI task used the maximum memory, and on which node.

    Example output might look like this:

    Code Block
    languagetext
    $ job-statistics -j 1155623
    ...
                  AveRSS :  11077K
                  MaxRSS :  11576K
              MaxRSSTask :  46
              MaxRSSNode :  tcn828
    ...

    If you want to print the memory usage from within your application, note that the Linux system call getrusage() does not fill in all of its fields, see 'man 2 getrusage'. Instead, you can compile and use the C routine printmem() listed below, which prints the memory usage.

    Code Block
    languagetext
    titleC routine printmem()
    #include <stdio.h>
    #include <unistd.h>   /* sysconf() */

    /* Print the memory usage of the calling process and return the
     * total program size in KB, or -1 if it cannot be determined. */
    int printmem(void)
    {
        /* /proc/self/statm holds: size resident shared text lib data dt.
         * All values are counted in pages, not in bytes or KB. */
        unsigned long size, resident;
        long page_kb = sysconf(_SC_PAGESIZE) / 1024;
        FILE *pf = fopen("/proc/self/statm", "r");

        if (pf == NULL)
            return -1;
        if (fscanf(pf, "%lu %lu", &size, &resident) != 2) {
            fclose(pf);
            return -1;
        }
        fclose(pf);

        printf("KB used: %lu (resident: %lu KB)\n",
               size * page_kb, resident * page_kb);
        return (int)(size * page_kb);
    }
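printmem() reports the current program size. If you are more interested in the peak resident memory of your process, which is roughly comparable to the per-task MaxRSS value shown by job-statistics, a minimal sketch could read the VmHWM ("high water mark") line from /proc/self/status. This assumes a Linux kernel that provides that field; the function name peak_rss_kb() is hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: return the peak resident set size of the
 * calling process in KB, taken from the "VmHWM:" line of
 * /proc/self/status, or -1 if it cannot be read. */
long peak_rss_kb(void)
{
    char line[256];
    long kb = -1;
    FILE *fp = fopen("/proc/self/status", "r");

    if (fp == NULL)
        return -1;
    while (fgets(line, sizeof line, fp) != NULL) {
        /* The line looks like "VmHWM:     1234 kB" */
        if (strncmp(line, "VmHWM:", 6) == 0) {
            if (sscanf(line + 6, "%ld", &kb) != 1)
                kb = -1;
            break;
        }
    }
    fclose(fp);
    return kb;
}
```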


    Which nodes are allocated to my job?

    The environment variable $SLURM_NODELIST contains the names of the nodes in a compressed format such as tcn[9006-9008]. With the scontrol command you can obtain the node names, one name per line:

    Code Block
    languagetext
    $ scontrol show hostnames
    tcn9006
    tcn9007
    tcn9008

    Using the command nodeset, you get all node names from the $SLURM_NODELIST variable on one line:

    Code Block
    languagetext
    $ nodeset -e $SLURM_NODELIST
    tcn9006 tcn9007 tcn9008
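To illustrate the compressed format, here is a simplified C sketch that expands a list of the form prefix[first-last], such as tcn[9006-9008], into one node name per line. It is a hypothetical helper, not part of Slurm: real node lists can contain commas and several ranges, so for actual jobs use scontrol show hostnames or nodeset as shown above.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified expander for node lists of the form
 * "prefix[first-last]" (e.g. "tcn[9006-9008]") or a single node name.
 * Writes one node name per line to out and returns the number of names,
 * or -1 for formats it does not handle (commas, several ranges, ...). */
int expand_simple_nodelist(const char *list, FILE *out)
{
    const char *open = strchr(list, '[');

    if (open == NULL) {
        if (strchr(list, ',') != NULL)
            return -1;                  /* several names: not handled */
        fprintf(out, "%s\n", list);     /* a single node name */
        return 1;
    }

    char *end;
    long first = strtol(open + 1, &end, 10);
    if (end == open + 1 || *end != '-')
        return -1;
    int width = (int)(end - (open + 1));    /* keep any zero padding */

    const char *last_start = end + 1;
    long last = strtol(last_start, &end, 10);
    if (end == last_start || *end != ']' || end[1] != '\0' || last < first)
        return -1;

    int prefix_len = (int)(open - list);
    for (long n = first; n <= last; n++)
        fprintf(out, "%.*s%0*ld\n", prefix_len, list, width, n);
    return (int)(last - first + 1);
}
```

For tcn[9006-9008] this writes tcn9006, tcn9007 and tcn9008, the same names as scontrol show hostnames in the example above.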

    Connection refused

    A "connection refused" error on Lisa always means that you have been locked out by the Intrusion Detection System, either because you entered a wrong password 5 times in a row, or because you or your program tried to reconnect to a session that was not closed yet (MobaXterm, WinSCP and similar). The lock-out lasts 24 hours and cannot be lifted by us.

    To avoid this sort of issue, we recommend that you use SSH public key authentication as explained here: SSH

    What does maintenance mean?

    A few times per year, the 'message of the day' (the message you get when you log in) will announce that maintenance is planned. During this period the system will be upgraded or adapted.

    Consequences for you:

    • During maintenance, you cannot log in
    • Jobs that would still be running at the start of the maintenance will not be started


    Miscellaneous

    Can I receive mail on my login?

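    No, you can't receive messages from outside the system. The batch nodes can send mail to your login, but to read it you have to forward mail sent to your login to another address, using the $HOME/.forward file. Alternatively, you can have Slurm send job notification mail (selected with the --mail-type option) directly to an external address with the sbatch option: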

    Code Block
    --mail-user=me@home.nl

    To send a message yourself from within a job, put the following line in your job script:

    Code Block
    echo "Job $SLURM_JOBID started at $(date)" | mail -s "Job $SLURM_JOBID" "$USER"

    and put your external address in the file $HOME/.forward, for example:

    Code Block
    me@home.nl

    What information should be present in NWO Small Requests (Pilot grants) for Snellius?

    We expect certain information to be present in NWO small grant applications. Including this information up front helps us evaluate applications quickly and efficiently, which results in faster processing for the applicant and fewer follow-up questions. Please refer to this page for tips on which details we require in the application form, and for the examples given there.

    Acknowledging SURF for the use of Snellius and the support provided

    We would appreciate it if you included a text like this in your publications about projects in which Snellius played a role:

    We thank SURF (www.surf.nl) for the support in using the National Supercomputer Snellius.