Introduction

In order to use or create certain software packages, the environment (i.e. the environment variables) often has to be adapted. Examples are the PATH environment variable, which is used to locate programs, and the LD_RUN_PATH environment variable, which the linker uses to record where shared libraries are to be found at run time.

Environment modules (Lmod on Snellius and Lisa) are an attempt to simplify the user's experience on each cluster. In this tutorial we give an example that describes the compilation of a program (called myprog.c) that uses NETCDF (a library to store and retrieve scientific data). The NETCDF package consists of several libraries and a few programs.

An example without the use of modules

In this example we will only use one library (libnetcdf.a) and one program (ncdump).

To compile the program, one has to know where the netcdf library and its include files are located. Let us assume that these locations are, respectively:

  • /usr/local/netcdf/lib
  • /usr/local/netcdf/include

Compilation of the program myprog.c:

cc -o myprog myprog.c -I/usr/local/netcdf/include -L/usr/local/netcdf/lib -lnetcdf

This is not too complicated, but one can imagine that compiling a program that uses many libraries becomes cumbersome, and that problems arise when system management decides to move the libraries to other locations.

To use the program ncdump that comes with the NETCDF package, one has two options. The first is to call the program using its full path:

/usr/local/netcdf/bin/ncdump

Or extending the PATH and calling the program by its name:

PATH=/usr/local/netcdf/bin:$PATH
ncdump

The same example, but now using modules

By loading the proper module, everything becomes much simpler.

First make the 2021 software environment available and check which versions of the NETCDF package are available on the system by issuing the following commands:

module load 2021
module avail netcdf

The module avail netcdf command displays the available modules for all of the installed netcdf software on the system. In this case we choose netcdf version 4.8.0, which was compiled with the 2021a compiler toolchain gompi (GCC and OpenMPI).

module load netCDF/4.8.0-gompi-2021a
cc -o myprog myprog.c -lnetcdf
ncdump

On Snellius and Lisa it is no longer necessary to know where the NETCDF files are located; the only thing to remember is to issue the module load command. The module takes care of everything needed to locate the programs and shared libraries.

How modules work

The module load netCDF/4.8.0-gompi-2021a command modifies some environment variables:

  • The PATH variable is extended, so that the program ncdump can be found
  • Some other environment variables are extended with the locations of the include and library directories

This explains how the program ncdump can be found. For the simple compile command to work, the cc command has to use the environment variables that define the locations of the include and library directories. To make that possible, a wrapper has been written that translates this information into the appropriate -I and -L flags and then calls the 'real' compiler. So cc is not the compiler itself, but a script that calls the 'real' compiler with the correct flags.
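The wrapper idea can be sketched in a few lines of bash. This is a minimal illustration, not the wrapper installed on Snellius or Lisa: it assumes that loaded modules record their include and library directories in the colon-separated variables CPATH and LIBRARY_PATH, and the names module_cc, dirs_to_flags and REAL_CC are hypothetical.

```shell
#!/usr/bin/env bash
# Minimal sketch of a compiler wrapper (hypothetical, not the installed one).
# ASSUMPTION: modules add include directories to CPATH and library
# directories to LIBRARY_PATH, both colon-separated.

# Print one flag per directory in a colon-separated list, one per line, e.g.
#   dirs_to_flags -I "/a/include:/b/include"  prints  -I/a/include  and  -I/b/include
dirs_to_flags() {
    local prefix=$1 dir
    local -a dirs
    # Split on ':' without glob expansion; empty entries are skipped below.
    IFS=: read -r -a dirs <<< "$2"
    for dir in "${dirs[@]}"; do
        if [ -n "$dir" ]; then
            printf '%s%s\n' "$prefix" "$dir"
        fi
    done
}

# Call the "real" compiler (hypothetical REAL_CC, default gcc) with -I and -L
# flags derived from the environment, followed by the user's own arguments.
module_cc() {
    local -a inc_flags lib_flags
    mapfile -t inc_flags < <(dirs_to_flags -I "${CPATH-}")
    mapfile -t lib_flags < <(dirs_to_flags -L "${LIBRARY_PATH-}")
    "${REAL_CC:-gcc}" "${inc_flags[@]}" "${lib_flags[@]}" "$@"
}
```

After sourcing this file, module_cc -o myprog myprog.c -lnetcdf would call gcc with the -I and -L flags derived from CPATH and LIBRARY_PATH, followed by the given arguments. Setting REAL_CC=echo prints the resulting command instead of running it, which is handy for checking the translation.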

Most used module commands

  • module avail
    lists the modules that are available
  • module load modulename
    sets the environment up to use modulename
  • module add modulename
    same as previous
  • module display modulename
    shows what is done when you issue the command module add modulename
  • module unload modulename
    removes the environment necessary for modulename
  • module rm modulename
    same as previous
  • module list
    shows the loaded modules
  • module help
    tells how to use the module command
  • module spider
    The module spider command reports all the modules that can be loaded on a system

Features

  • Users need to specify the full module names in order to properly load the modules. Default modules are disabled.
    • Example: to load Python as a module you need to specify the version, e.g. module load Python/3.9.5-GCCcore-10.3.0-bare
  • Users are able to specify a partial match of a version.
    • Example: abc/17 will try to match the “best” abc/17.*.*
  • module avail and module spider use case-insensitive sorting.
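The partial-match rule can be illustrated with a small bash function. This is a simplified model, not Lmod's actual algorithm: it assumes that “best” means the highest version whose name equals the given prefix or starts with the prefix followed by a dot. The function name best_match and the abc versions are hypothetical, and version ordering relies on GNU sort -V.

```shell
#!/usr/bin/env bash
# Simplified model of partial version matching (NOT Lmod's real algorithm).
# ASSUMPTION: "best" = highest version among names equal to PREFIX or
# starting with PREFIX followed by a dot.
# Usage: best_match PREFIX MODULE...
best_match() {
    local prefix=$1 mod
    shift
    for mod in "$@"; do
        case "$mod" in
            # Quoting "$prefix" makes it match literally, not as a glob pattern.
            "$prefix" | "$prefix".*) printf '%s\n' "$mod" ;;
        esac
    done | sort -V | tail -n 1
}
```

For example, best_match abc/17 abc/16.4.0 abc/17.0.1 abc/17.2.3 abc/17.10.0 abc/170.1 prints abc/17.10.0: abc/170.1 is not a candidate because 17 must be followed by a dot, and version sorting places 17.10.0 after 17.2.3.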

Caveats using modules

  • Do not use the module command in login scripts (.bash_profile, .bashrc) unless really necessary. Load modules where they are needed, for example in job scripts. For interactive use, place these commands in a file that you source at the beginning of a session. Module commands in login scripts can cause a job that ran fine a month ago to fail, because of some minor change in the login scripts.
  • Use only the modules you need, and understand why they are needed.
  • Often, the order in which modules are loaded is important, for example to create a correct PATH variable. Note that loading a module that is already loaded has no effect: the module system 'remembers' which modules are already loaded. So, to reliably put the path of package 'one' before that of package 'two', irrespective of the packages already loaded, specify:


   module unload one two
   module load two one
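The effect of load order can be illustrated with a toy model in bash. It is not how Lmod is implemented; it assumes only the two behaviours described above, namely that loading a package prepends its directory to a path and that loading an already loaded package has no effect. The variables TOY_PATH and TOY_LOADED, the functions toy_load and toy_unload, and the /opt/PACKAGE/bin directories are all hypothetical.

```shell
#!/usr/bin/env bash
# Toy model of module load order (hypothetical; not Lmod's implementation).
# ASSUMPTIONS: loading prepends /opt/PKG/bin to a path variable, and loading
# an already loaded package does nothing.
TOY_PATH=/usr/bin
TOY_LOADED=" "          # space-separated names of loaded packages

toy_load() {
    local pkg
    for pkg in "$@"; do
        case "$TOY_LOADED" in
            *" $pkg "*) continue ;;   # already loaded: no effect
        esac
        TOY_PATH="/opt/$pkg/bin:$TOY_PATH"
        TOY_LOADED="$TOY_LOADED$pkg "
    done
}

toy_unload() {
    local pkg dir new
    local -a dirs
    for pkg in "$@"; do
        # Rebuild TOY_PATH without this package's directory.
        IFS=: read -r -a dirs <<< "$TOY_PATH"
        new=""
        for dir in "${dirs[@]}"; do
            if [ "$dir" != "/opt/$pkg/bin" ]; then
                new="${new:+$new:}$dir"
            fi
        done
        TOY_PATH=$new
        # Forget that the package was loaded.
        TOY_LOADED=${TOY_LOADED/ $pkg / }
    done
}
```

Loading one and then two leaves /opt/two/bin first, and loading one again changes nothing. Only after toy_unload one two followed by toy_load two one does /opt/one/bin come first, mirroring the module unload and module load sequence above.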

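Following the advice to keep module commands for interactive use out of the login scripts, they can be placed in a separate file. A minimal sketch, assuming a hypothetical file ~/netcdf-env.sh containing a hypothetical function load_netcdf_env; the module names are the ones used earlier in this tutorial. Wrapping the commands in a function means that sourcing the file changes nothing until the function is called.

```shell
# ~/netcdf-env.sh (hypothetical file name)
# Source this file at the start of an interactive session, then run
# load_netcdf_env. Do not source it from .bashrc or .bash_profile.
load_netcdf_env() {
    # Make the 2021 software environment available first ...
    module load 2021
    # ... then load NETCDF built with the gompi-2021a toolchain.
    module load netCDF/4.8.0-gompi-2021a
}
```

At the start of a session, run source ~/netcdf-env.sh followed by load_netcdf_env.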
Modules and shells

The module command is shell-independent, but the implementation is shell-dependent:

  • bash: module is an exported shell function
  • csh/tcsh: module is an alias
  • ksh: module is a shell function
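In bash you can check which form the module command takes in the current session with the builtin type -t, which prints alias, keyword, function, builtin or file, or nothing if the name is undefined. A small sketch for bash only; the function name module_kind is hypothetical.

```shell
#!/usr/bin/env bash
# Report how "module" is defined in the current bash session.
# type -t prints alias, keyword, function, builtin, file, or nothing.
module_kind() {
    local kind
    kind=$(type -t module)
    echo "${kind:-undefined}"
}
```

On a system where Lmod is initialised for bash, module_kind is expected to print function. Running type module without -t prints the full definition.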