Modules and EasyBuild
The Davidson College Research Computing Department manages software installations on the cluster nodes using the EasyBuild framework. This framework lets users work with environment modules, which in turn provide access to the software that the Research Computing Department manages on the cluster nodes. Using software through environment modules ensures consistency and reliability by providing an isolated software installation with the necessary compilers and libraries.
Module Command
EasyBuild exposes the environment modules as Lmod modules. You can access these modules using the module command.
Retrieving Available Modules
To list available modules, use the module avail command. The listing also shows tags next to some module names: the (L) tag marks loaded modules, and the (D) tag marks the default version of a module.
Example
On JupyterLab or on compute nodes, you can open a terminal window and run the module avail command to list available modules.
When a new JupyterLab session is started, it loads certain modules by default. Here is an example output:
----------------------------- /opt/pub/modules/generic/Core -------------------------------
Bison/3.8.2              Go/1.22.1                  foss/2020b
CERTMgr/3.0.3            Java/11.0.2                foss/2021a
CFSSL/1.6.0              Java/11.0.20               foss/2022a (D)
EasyBuild/4.8.1          Mathematica/13.2.1         iimpi/2022a
FastQC/0.11.9-Java-11    MinIO-Client               imkl/2021.4.0
GCCcore/10.3.0           Miniconda3/24.3.0-0 (L)    intel/2021a
GCCcore/11.3.0 (L,D)     NVHPC/21.2                 intel/2022a
(...)
In this output, the tags mean the following:
- The Miniconda3 module with version 24.3.0-0 (shortened as Miniconda3/24.3.0-0) is loaded,
- foss/2022a is the default version of the foss module, and
- GCCcore/11.3.0 is the default version of the GCCcore module, and it is currently loaded.
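The tags can also be read programmatically. The following is a minimal sketch, assuming the tag layout shown in the example output, where "(L)" or "(L,D)" directly follows the module name. The function name loaded_from_avail is hypothetical and not part of Lmod. It is meant only to illustrate how the tags work; the module list command reports loaded modules directly.

```shell
# loaded_from_avail is a hypothetical helper, not part of Lmod.
# It reads "module avail"-style text on standard input and prints the
# names of modules tagged "(L)" or "(L,D)", one per line.
loaded_from_avail() {
    # Keep each "name (L" match, then strip the tag so only the name remains.
    grep -oE '[^[:space:]]+ +\(L[,)]' | sed -E 's/ +\(L[,)]$//'
}

# Usage on a cluster node. By default Lmod writes its listings to
# standard error, hence the 2>&1 redirection:
#   module avail 2>&1 | loaded_from_avail
```

Applied to the example output above, this would print Miniconda3/24.3.0-0 and GCCcore/11.3.0.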
Searching for Modules
Loading a module can require other modules to be loaded first. To check whether a module is available and which other modules need to be loaded first, run the module spider [module-name] command.
Example
The PyTorch-bundle module requires the GCC and OpenMPI modules to be loaded beforehand. You can check this with the module spider PyTorch-bundle command. A snippet of the output is shown below:
(...)
-----------------------------------------------------------------------------------------
PyTorch-bundle: PyTorch-bundle/1.12.1-CUDA-11.7.0
-----------------------------------------------------------------------------------------
Description:
PyTorch with compatible versions of official Torch extensions.
You will need to load all module(s) on any one of the lines below before the
"PyTorch-bundle/1.12.1-CUDA-11.7.0" module is available to load.
GCC/11.3.0 OpenMPI/4.1.4
(...)
Based on the output of the command, the GCC/11.3.0 and OpenMPI/4.1.4 modules need to be loaded first. Run the following command to load the PyTorch-bundle module: module load GCC/11.3.0 OpenMPI/4.1.4 PyTorch-bundle/1.12.1-CUDA-11.7.0. This command loads the modules in the listed order.
You can verify that the PyTorch module works by checking for available GPU devices:
anscott:gpu0:~$ module purge ## Unload all of the modules
anscott:gpu0:~$ module load GCC OpenMPI PyTorch-bundle ## Load the modules in the hierarchical order
anscott:gpu0:~$ module load IPython ## Load the IPython module to test PyTorch
anscott:gpu0:~$ salloc --gpus 1 ## Allocate a GPU to be used
salloc: Granted job allocation 247278
salloc: Waiting for resource configuration
salloc: Nodes gpu0 are ready for job
bash-4.4$ ipython ## Start an IPython session
Python 3.10.4 (main, May 13 2024, 00:59:34) [GCC 11.3.0]
Type 'copyright', 'credits' or 'license' for more information
IPython 8.14.0 -- An enhanced Interactive Python. Type '?' for help.
In [1]: import torch ## Import PyTorch library
In [2]: torch.cuda.is_available() ## Check if there are any GPUs available
Out[2]: True
In [3]: torch.cuda.device_count() ## Check the number of GPUs available
Out[3]: 1
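The same check can be run without an interactive IPython session, which is convenient inside a job script. The following is a sketch under stated assumptions: check_torch_gpu is a hypothetical helper name, the versions are the ones reported by module spider in this example, and python is assumed to be provided by the loaded modules. It also assumes that module load returns a non-zero status when a module cannot be loaded.

```shell
# check_torch_gpu is a hypothetical helper, not a cluster-provided command.
check_torch_gpu() {
    # Start from a clean environment.
    module purge || return 1
    # Load the prerequisites first, then the bundle, in hierarchical order.
    module load GCC/11.3.0 OpenMPI/4.1.4 PyTorch-bundle/1.12.1-CUDA-11.7.0 || return 1
    # Print whether CUDA is available, followed by the number of visible GPUs.
    python -c 'import torch; print(torch.cuda.is_available(), torch.cuda.device_count())'
}

# Usage inside a GPU allocation (for example after: salloc --gpus 1):
#   check_torch_gpu
```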
Warning
Different versions of the same module may have varying dependencies. Therefore, some modules require specifying the version number when using the spider command.
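To look up the prerequisites of one specific version, pass the full name and version to module spider. The sketch below wraps that call and extracts the prerequisite line. The name spider_prereqs is hypothetical, and the parsing assumes the output layout shown in the example above, where the prerequisites follow the sentence ending in "module is available to load." If the layout differs on your system, read the module spider output directly.

```shell
# spider_prereqs is a hypothetical helper, not an Lmod command.
# Given a full "name/version", it prints the prerequisite line(s) that
# module spider reports for that exact version.
spider_prereqs() {
    # By default Lmod writes to standard error, so merge it into the pipe.
    module spider "$1" 2>&1 | awk '
        /module is available to load/ { grab = 1; next }  # the list follows this sentence
        grab && NF == 0 && seen { exit }                  # a blank line after the list ends it
        grab && NF > 0 { $1 = $1; print; seen = 1 }       # trim whitespace and print the line
    '
}

# Usage:
#   spider_prereqs PyTorch-bundle/1.12.1-CUDA-11.7.0
```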
Loading/Unloading Modules
Building software may require a series of compilers and libraries, called a dependency chain. To load a piece of software, you must first load the dependencies it requires; only then does it become available to load. Use module load [module-name] to load modules, and module unload [module-name] to unload modules.
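When a dependency chain is loaded from a script, it can help to stop at the first module that fails instead of continuing with an incomplete environment. The following is a minimal sketch; load_in_order is a hypothetical helper name, and it assumes that module load returns a non-zero status on failure.

```shell
# load_in_order is a hypothetical helper, not part of Lmod.
# It loads each module in the order given and stops at the first failure.
load_in_order() {
    for m in "$@"; do
        if ! module load "$m"; then
            echo "Could not load $m; check its prerequisites with: module spider $m" >&2
            return 1
        fi
    done
}

# Usage, with the dependency chain from the PyTorch-bundle example:
#   load_in_order GCC/11.3.0 OpenMPI/4.1.4 PyTorch-bundle/1.12.1-CUDA-11.7.0
#   module unload PyTorch-bundle    # unload a single module again
```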
Tip
It can be easier to use the Tab key to take advantage of the autocomplete functionality of the command-line interface. For instance, you can use it to list the available versions of a package. Press Tab twice after you type module load Julia into your terminal, and it will display all of the versions available for the Julia module.
anscott:gpu0:~$ module load Julia ## Press on Tab twice for autocomplete
Julia Julia/1.7.0 Julia/1.7.2 Julia/1.8.5 Julia/1.9.3
Note
Use module purge to unload all of the loaded modules.
Getting Information about Modules
If you would like more information about a module, use the module help [module-name] command.
Example
If you would like to learn more about the PyTorch module, you can use the module help PyTorch command.
----------------------- Module Specific Help for "PyTorch/1.12.1" --------------------------
Description
===========
Tensors and Dynamic neural networks in Python with strong GPU acceleration.
PyTorch is a deep learning framework that puts Python first.
More information
================
- Homepage: https://pytorch.org/
Collections
When you are working on a project, loading each module repeatedly can be cumbersome. Instead, you can save the modules you currently have loaded as a collection. Collections are private to each user, and they persist across sessions.
Creating Collections
To save the currently loaded modules, you can use the module save [collection-name] command.
Note
It is good practice to run module purge when you start a new session, so that the collection contains only the modules you load afterwards.
Example
Loading TensorFlow/2.11.0-CUDA-11.7.0 requires the GCC and OpenMPI modules to be loaded first. Once you load the modules you want, you can save them as a collection named 'tensor-modules'.
anscott:gpu0:~$ module purge
anscott:gpu0:~$ module load GCC OpenMPI TensorFlow/2.11.0-CUDA-11.7.0
anscott:gpu0:~$ module save tensor-modules
Saved current collection of modules to: "tensor-modules"
The newly created tensor-modules collection can now be restored in a future session; restoring it loads only the GCC, OpenMPI, and TensorFlow modules and their dependencies.
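The purge, load, and save steps can be bundled so that a collection is always rebuilt from a clean environment. This is a sketch only; rebuild_collection is a hypothetical helper name, and it assumes that the module commands return a non-zero status on failure.

```shell
# rebuild_collection is a hypothetical helper, not part of Lmod.
# Usage: rebuild_collection <collection-name> <module> [<module> ...]
rebuild_collection() {
    name=$1
    shift
    module purge || return 1        # start from an empty environment
    module load "$@" || return 1    # load the requested modules in the given order
    module save "$name"             # save what is now loaded under this name
}

# Example, matching the collection created above:
#   rebuild_collection tensor-modules GCC OpenMPI TensorFlow/2.11.0-CUDA-11.7.0
```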
Listing Collections
To list the collections you have created, you can use the module savelist command.
Example
anscott:gpu0:~$ module savelist
Named collection list :
1) tensor-modules
Listing Contents of a Collection
To list the modules included in a collection, you can use the module describe [collection-name] command.
Example
To list the modules in the collection created in the previous example, use the module describe tensor-modules command.
anscott:gpu0:~$ module describe tensor-modules
Collection "tensor-modules" contains:
 1) GCCcore/11.3.0        15) ncurses/6.3             29) double-conversion/3.2.0
 2) binutils/2.38         16) libreadline/8.1.2       30) flatbuffers/2.0.7
 3) GCC                   17) SQLite/3.38.3           31) giflib/5.2.1
 4) slurm/22.05.11        18) GMP/6.2.1               32) ICU/71.1
 5) PMIx/4.1.2            19) libffi/3.4.2            33) JsonCpp/1.9.5
 6) UCX/1.12.1            20) OpenSSL/1.1             34) NASM/2.15.05
 7) UCC/1.0.0             21) Python/3.10.4           35) libjpeg-turbo/2.1.3
 8) libfabric/1.15.1      22) pybind11/2.9.2          36) LMDB/0.9.29
 9) OpenMPI               23) SciPy-bundle/2022.05    37) nsync/1.25.0
10) OpenBLAS/0.3.20       24) Szip/2.1.1              38) protobuf/3.19.4
11) FlexiBLAS/3.2.0       25) HDF5/1.12.2             39) protobuf-python/3.19.4
12) FFTW/3.3.10           26) h5py/3.7.0              40) snappy/1.1.9
13) FFTW.MPI/3.3.10       27) cURL/7.83.0             41) networkx/2.8.4
14) ScaLAPACK/2.2.0-fb    28) dill/0.3.6              42) TensorFlow/2.11.0
Restoring Collections
To load a collection, you can use the module restore [collection-name] command.
Example
To load the collection created in the previous example, use the module restore tensor-modules command.
anscott:gpu0:~$ module restore tensor-modules
Restoring modules from user's tensor-modules
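In a script, restoring a collection and running a program can be combined so that the program only starts when the restore succeeded. The following is a sketch under assumptions: run_with_collection is a hypothetical helper name, train.py is a placeholder script name, and module restore is assumed to return a non-zero status when the collection cannot be restored.

```shell
# run_with_collection is a hypothetical helper, not part of Lmod.
# Usage: run_with_collection <collection-name> <command> [<args> ...]
run_with_collection() {
    collection=$1
    shift
    if ! module restore "$collection"; then
        echo "Could not restore collection '$collection'" >&2
        return 1
    fi
    "$@"    # run the command with the restored modules in place
}

# Example (train.py is a placeholder script name):
#   run_with_collection tensor-modules python train.py
```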
Deleting a Collection
To delete a collection, you can use the module disable [collection-name] command.
Example
To delete the collection created in the previous example, use the module disable tensor-modules command. As the output below shows, the collection is disabled by renaming it with a "~".
anscott:gpu0:~$ module disable tensor-modules
Disabling tensor-modules collection by renaming with a "~"
Further Reading
- You can read more about the module command by running the man module command.
- Using both module and conda to manage packages can result in both tools overwriting the same environment variables. If you activate an environment with conda first and then load a module with module, the module may take precedence because loading it can overwrite those environment variables.
- If you need a new module to be installed, contact ti@davidson.edu.
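The interaction between conda and module noted above can be made visible before any modules are loaded. The sketch below relies on the CONDA_DEFAULT_ENV variable, which conda sets while an environment is active. Both function names are hypothetical, and the warning text is only a suggestion.

```shell
# conda_env_active is a hypothetical helper. It succeeds when
# CONDA_DEFAULT_ENV is set, which conda does while an environment is active.
conda_env_active() {
    [ -n "${CONDA_DEFAULT_ENV:-}" ]
}

# load_after_conda_check is also hypothetical. It prints a warning when a
# conda environment is active, then loads the requested modules anyway.
load_after_conda_check() {
    if conda_env_active; then
        echo "Warning: conda environment '$CONDA_DEFAULT_ENV' is active;" \
             "modules loaded now may overwrite its environment variables." >&2
    fi
    module load "$@"
}

# Usage:
#   load_after_conda_check GCC OpenMPI
```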