Introduction and general information
We have our own computational cluster (10 nodes, 96 CPU cores and ~1 TB RAM each) located at:
hclm.ifp.tuwien.ac.at
To monitor the usage of resources see this page.
Issues
Currently it is not possible to use openmpi for multi-node computations. However, this can be done with intel-mpi. If possible, consider using it instead as a workaround to run multi-node computations (see the sketch below).
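As a rough sketch of the workaround (the module version below is the one referenced in the Wien2k section; my_program is a placeholder for your executable):

module purge
# load the Intel MPI provider instead of openmpi
module load --auto intel-oneapi-mpi/2021.12.0-gcc-11.4.0-ywfnwb7
# inside a job script that asked for several nodes (e.g. #SBATCH -N 2)
mpirun -n $SLURM_NTASKS ./my_program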
Data storage
There are two available locations to store your data: the /home directory and /mnt/scratch.
The capacity of the /home directory is ~837 GiB. There are currently no user quotas, and it is meant to contain only the important data needed for computations or recent results, i.e. it is not suitable for long-term storage. It is provided as an NFS share and the data resides on the login node.
The capacity of /mnt/scratch is ~8.2 TiB, but it is meant as short-term storage for calculations that need to store large amounts of data during the computation. It is provided as a parallel file system (BeeGFS) and the data is distributed among the compute nodes (~850 GiB on each node).
For easier access, use the $DATA environment variable to access your location at /mnt/scratch.
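For example, a per-project working directory on the scratch file system can be created like this (the directory names are only placeholders):

# create and enter a working directory on the parallel scratch file system
mkdir -p $DATA/my_project/run_01
cd $DATA/my_project/run_01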
For long-term storage use the hclmbck data server, which is also available as an NFS share at /mnt/hclmbck. For convenience, the following environment variables are defined:
BACKUP  -> /mnt/hclmbck/BACKUP/$USER
ARCHIVE -> /mnt/hclmbck/ARCHIVE/$USER
OLDHCLM -> /mnt/hclmbck/OLDHCLM/home/$USER
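For instance, finished results can be copied from scratch to the backup server with rsync (the paths are placeholders):

# copy finished results from scratch to long-term storage
rsync -av $DATA/my_project/run_01/ $BACKUP/my_project/run_01/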
Software environment
To be able to handle multiple versions of various libraries and programs, the module environment system is employed (modules are generated via spack).
To view the list of available modules type:
module avail
To view the list of currently loaded modules use:
module list
To load a module use:
module load module_name
To unload all used modules use:
module purge
For further possibilities, issue module help.
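For example, to filter the (rather long) list of available modules for a specific package, hdf5 being just an illustrative name:

# module avail prints to stderr, hence the redirection before grep
module avail 2>&1 | grep -i hdf5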
Note that some libraries loaded via module might interfere with system libraries, preventing some of the system-wide tools from operating normally (for example, a segfault can occur; some of the affected utilities are nano, ncdu, and htop). If you encounter such an issue, try to mitigate it by unloading modules.
SSH setup
For easier access, one might want to set up an SSH config file, ~/.ssh/config:
Host hclm
    User username
    Hostname hclm.ifp.tuwien.ac.at
And use the following command to log in:
ssh hclm
For more info about ssh config file see this page.
To keep the SSH session open for a prolonged time, add the following keyword to the config:
ServerAliveInterval 60
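Putting both pieces together, a complete ~/.ssh/config entry might look like this (replace username with your actual user name):

Host hclm
    User username
    Hostname hclm.ifp.tuwien.ac.at
    # send a keep-alive message every 60 seconds
    ServerAliveInterval 60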
SLURM
The slurm system is used to manage the queue of computational jobs. Use sinfo to see information on the current setup and the available computational resources.
Submission script example
Here is a template for your run.sh
file:
#!/bin/bash
#SBATCH -J JOB_NAME
#SBATCH -N 1
#SBATCH --ntasks-per-node=96
#SBATCH --partition=compute
# 3 days walltime, the format is MM:SS, or HH:MM:SS, or D-HH:MM:SS
#SBATCH -t 3-0:00:00

# environment variables to set
export OMP_NUM_THREADS=1

# modules to load
module purge
module load --auto w2dynamics/1.1.5-gcc-11.4.0-33hh33i

# commands to run
mpirun -n $SLURM_NTASKS DMFT.py input_file.in
It will ask for one node, request 96 tasks/threads, and ensure that the job does not run for more than three days (the walltime limit).
Multiple jobs per node
It is possible to submit smaller jobs that require only a few tasks and have them automatically assigned to the same node. For that, one needs to specify an estimate of the memory the job will consume via the --mem flag:
#!/bin/bash
#SBATCH -J JOB_NAME
#SBATCH -N 1
#SBATCH --ntasks=4
#SBATCH --cpus-per-task=2
#SBATCH --partition=compute
# Different units can be specified using the suffix [K|M|G|T].
#SBATCH --mem=100G

mpirun -n $SLURM_NTASKS myjob
In this example the job promises not to use more than 100 GB of RAM and uses only 4 physical CPU cores. Note: it is assumed here that there is no benefit from hyperthreading. Thus, to avoid overcommitting too many jobs on a single node, i.e. going beyond 96 tasks per node, the --cpus-per-task flag is set to two, since the scheduler allows the use of up to 192 logical CPU cores on a single node.
For a description of the submission script parameters, see the following page.
SLURM commands
To submit your job use:
sbatch run.sh
To check the status of your jobs:
squeue -u $USER
To cancel a job:
scancel job_id
To run an interactive bash session on a compute node, use the following command:
srun -N 1 --pty bash -i
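If the default resources of an interactive session are not enough, the usual SLURM flags can be added; for example (the task count is only illustrative):

# interactive session with 8 tasks on one node of the compute partition
srun -N 1 --ntasks=8 --partition=compute --pty bash -i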
House rules
Currently, the slurm system is set up with the FIFO convention, and to keep the cluster available to everyone, the suggestion is to follow these house rules (see the examples after this list):
- If you submit a lot of jobs that do not run too long (say, up to 2-3 hours), reduce their priority using the --nice flag set to a positive value not higher than 4294898063; that way other people can wedge their jobs in between.
- If you submit a lot of long-running jobs, do not occupy all of the nodes, i.e. exclude some of the nodes from execution via the --exclude flag.
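For example (the nice value and node names below are purely illustrative; check sinfo for the actual node names):

# lower the priority of a short job so that other jobs can jump ahead
sbatch --nice=1000 run.sh

# keep two (hypothetical) nodes free for other users
sbatch --exclude=node01,node02 run.sh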
If you need to communicate something to all users of the cluster, please use the following mailing list: hclm_users@list.tuwien.ac.at
Library specifics
- The LAPACK and BLAS libraries are provided by either openblas or intel-oneapi-mkl.
- For the MPI interface there are two providers as well: openmpi or intel-oneapi-mpi.
- For hdf5 with c++ and fortran support, use the hdf5/1.14.3-gcc-11.4.0-vkldg6y module.
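As an illustration, picking one provider of each kind might look like the following; the bare module names are assumptions here, check module avail for the exact version strings installed:

# one BLAS/LAPACK provider and one MPI provider, e.g. OpenBLAS + Open MPI
module load --auto openblas
module load --auto openmpi
# HDF5 with C++ and Fortran support (version string from this page)
module load --auto hdf5/1.14.3-gcc-11.4.0-vkldg6y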
Julia
Please install julia and set up the proper environment yourself following the instructions from the official page.
If you're new to julia and want to get introduced to the current workflow, check this link.
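As a minimal sketch, assuming the official juliaup installer is used and the login node has internet access:

# install juliaup (and the latest stable julia) into your home directory
curl -fsSL https://install.julialang.org | sh
# verify the installation in a new shell
julia --version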
w2dynamics
There are two versions of w2dynamics installed on the cluster, 1.1.4 and 1.1.5. Use the following commands to load either of them:
module load --auto w2dynamics/1.1.4-gcc-11.4.0-4e7xlay
module load --auto w2dynamics/1.1.5-gcc-11.4.0-33hh33i
To get a more accurate resolution of the self-energy, you might want to try a worm-sampling method (add the following to your input file):
[General]
FTType = none_worm
SelfEnergy = symmetric_improved_worm

[QMC]
WormMeasQQ = 1
PercentageWormInsert = 0.3
PercentageWormReplace = 0.1
WormSearchEta = 1
Wien2k
Wien2k version 23.2 is installed in the /opt/WIEN2k_23.2 directory. For the initial setup of your environment, run the /opt/WIEN2k_23.2/userconfig tool and answer the relevant questions. One needs to load the following modules:
module load --auto intel-oneapi-compilers/2021.3.0-gcc-11.4.0-akvxchv
module load --auto intel-oneapi-mkl/2021.4.0-gcc-11.4.0-p7fre5c
module load --auto intel-oneapi-mpi/2021.12.0-gcc-11.4.0-ywfnwb7
module load --auto fftw/3.3.10-gcc-11.4.0-2wmq6zs
Here is an example of a job script that will use 4 OpenMP threads and parallelize the calculation over 24 k-points (mind that for small systems the overhead of k-parallelization might be too big and make the parallelization of no use at all):
#!/bin/bash
#SBATCH -J JOB_NAME
#SBATCH -N 1
#SBATCH --ntasks-per-node=96
#SBATCH --partition=compute
# 3 days walltime, the format is MM:SS, or HH:MM:SS, or D-HH:MM:SS
#SBATCH -t 3-0:00:00

# environment variables to set
# use 4 OpenMP threads
export OMP_NUM_THREADS=4

# modules to load
module purge
module load --auto intel-oneapi-compilers/2021.3.0-gcc-11.4.0-akvxchv
module load --auto intel-oneapi-mkl/2021.4.0-gcc-11.4.0-p7fre5c
module load --auto intel-oneapi-mpi/2021.12.0-gcc-11.4.0-ywfnwb7
module load --auto fftw/3.3.10-gcc-11.4.0-2wmq6zs

# commands to run
# initialize wien2k calculation with high precision setting
init_lapw -prec 3

# create the .machines file for k-parallelization with 24 k-points running in parallel
> .machines
for (( i=1; i<=24; i++ )); do
    echo 1:localhost >> .machines
done

# run wien2k calculation in parallel mode
run_lapw -p

# save the run
save_lapw scf
If one is interested in wannierization, load wannier90 and python too:
module load --auto wannier90/3.1.0-gcc-11.4.0-rxoj6qv
module load python/3.11.7-gcc-11.4.0-cdz73gs
module load py-numpy/1.26.4-gcc-11.4.0-szszv5k
For the tutorials on wien2k, see this and this document. And for the lectures covering the theory behind wien2k, check this workshop.
USPEX
USPEX is installed in /opt/USPEX, but since the license is provided on an individual basis, you'd need to register here and after that ask the HCLM admins to add you to the corresponding group.
Modules to load:
module load py-numpy/1.26.4-gcc-11.4.0-szszv5k
module load py-scipy/1.11.4-gcc-11.4.0-vm2oiki
module load py-spglib/2.0.2-gcc-11.4.0-diunhli
module load sqlite/3.43.2-gcc-11.4.0-bwyqrlw
module load py-ase/3.21.1-gcc-11.4.0-ltiwsxu
module load py-matplotlib/3.8.3-gcc-11.4.0-o3qu62w
One needs to set up the following environment variables in .bashrc:
#### ------------- USPEX v.10.5.0 ------------- ####
export MCRROOT=/opt/USPEX/USPEX_v10.5
export PATH=/opt/USPEX/USPEX_v10.5/application/archive/:$PATH
export USPEXPATH=/opt/USPEX/USPEX_v10.5/application/archive/src
###----------------------------------------------
There is a problem with loading the randomTopology module; as a workaround use:
0.00 : fracTopRand
in the INPUT.txt file, i.e. disable the corresponding feature.
In order to be able to use the local submission feature, copy the /opt/USPEX/USPEX_templates/Submission directory into the directory where the calculation takes place, as shown below.
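That is, from inside the calculation directory:

# copy the submission templates into the current calculation directory
cp -r /opt/USPEX/USPEX_templates/Submission .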
To utilize the job parameters set up through slurm, use the following command in the commandExecutable block of INPUT.txt:
mpirun -n \$SLURM_NTASKS --bind-to none vasp_std > log
One of the possible workflows is to have a running script
#!/bin/sh
TIME=300
while [ ! -f ./USPEX_IS_DONE ] ; do
    date >> log
    USPEX -r >> log
    sleep $TIME
done
that is started on the login node using screen, tmux or nohup, depending on your personal preference (this is necessary for the script to keep running after you log out from the cluster). The TIME variable should be adjusted depending on how long you expect a single calculation to run, but the exact value is not crucial: it can only cause inefficiency if the value is too big.
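For example, with nohup the loop above (saved here under the hypothetical name uspex_loop.sh) can be kept running after logout like this:

# make the loop script executable and detach it from the terminal
chmod +x uspex_loop.sh
nohup ./uspex_loop.sh > uspex_loop.out 2>&1 &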