
Introduction and general information

We have our own computational cluster (10 nodes, each with 96 CPU cores and ~1 TB RAM) located at:

hclm.ifp.tuwien.ac.at

To monitor resource usage, see this page.

Data storage

There are two locations available for storing your data: the /home directory and /mnt/scratch.

The capacity of the /home directory is ~837 GiB. There are currently no user quotas, and it is meant to hold only the important data needed for computations or recent results, i.e. it is not suitable for long-term storage. It is provided as an NFS share, and the data resides on the login node.

The capacity of /mnt/scratch is ~8.2 TiB, but it is meant as short-term storage for calculations that need to keep large amounts of data on disk during the computation. It is provided as a parallel file system (BeeGFS), and the data is distributed across the compute nodes (~850 GiB on each node).

For easier access, use the $DATA environment variable, which points to your location under /mnt/scratch.

For long-term storage, use the hclmbck data server.
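A typical pattern is to stage the working data on scratch via $DATA, run the calculation there, and copy only the results back to /home. A minimal sketch (my_project, input_file.in and results.dat are placeholder names):

# stage the input data in your scratch area (paths are placeholders)
mkdir -p $DATA/my_project
cp ~/my_project/input_file.in $DATA/my_project/

# run the calculation from the scratch directory
cd $DATA/my_project
# ... submit or run your job here ...

# copy the results back to /home and clean up the scratch area
cp results.dat ~/my_project/
rm -rf $DATA/my_project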

Software environment

To handle multiple versions of various libraries and programs, the module environment system is employed (the modules are generated via spack).

To view the list of available modules type:

module avail

To view the list of currently loaded modules use:

module list

To load a specific module use:

module load module_name

To unload all loaded modules use:

module purge

For further options, run module help.

Note that some libraries loaded via modules might interfere with system libraries and prevent some system-wide tools from operating normally (for example, a segfault can occur; among the affected utilities are nano and ncdu). If you encounter such an issue, try unloading the modules.
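For example, a typical session that starts from a clean environment and loads one of the installed w2dynamics modules (see the w2dynamics section below) could look like this:

# start from a clean environment
module purge

# load a specific version together with its dependencies
module load --auto w2dynamics/1.1.5-gcc-11.4.0-33hh33i

# verify what is currently loaded
module list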

SSH setup

For easier access, you may want to set up an SSH config file, ~/.ssh/config:

Host hclm                                                                       
  User username                                                                   
  Hostname hclm.ifp.tuwien.ac.at

Then use the following command to log in:

ssh hclm

For more information about the SSH config file, see this page.

To keep the SSH session open for a prolonged time, add the following keyword to the host entry:

ServerAliveInterval 60
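
Putting both parts together, a complete ~/.ssh/config entry looks like this (replace username with your actual user name):

Host hclm
  User username
  Hostname hclm.ifp.tuwien.ac.at
  ServerAliveInterval 60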

SLURM

SLURM is used to manage the queue of computational jobs. Use sinfo to see information on the current setup and the available computational resources.

Here is a template for your run.sh file:

#!/bin/bash

#SBATCH -J JOB_NAME
#SBATCH -N 1
#SBATCH --tasks-per-node=96
#SBATCH --partition=compute
# 3 days walltime, the format is MM:SS, or HH:MM:SS, or D-HH:MM:SS
#SBATCH -t 3-0:00:00

# environment variables to set
export OMP_NUM_THREADS=1

# modules to load
module purge
module load --auto w2dynamics/1.1.5-gcc-11.4.0-33hh33i

# commands to run
mpirun -n $SLURM_TASKS_PER_NODE DMFT.py input_file.in

This will request one node with 96 tasks/threads and make sure the job does not run for more than three days (the walltime limit).

To submit your job use:

sbatch run.sh

To check the status of your jobs:

squeue -u $USER

To cancel the job:

scancel job_id
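
A typical submit-and-monitor cycle therefore looks as follows (12345 is a placeholder; sbatch prints the actual job ID on submission):

# submit the job script; sbatch replies with "Submitted batch job <job_id>"
sbatch run.sh

# check the state of all your queued and running jobs
squeue -u $USER

# cancel a job if needed (replace 12345 with the real job ID)
scancel 12345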

To run a bash session interactively on a compute node, use the following command:

srun -N 1 --pty bash
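
The usual resource options can be passed to srun as well; a minimal sketch requesting 8 tasks for two hours on the compute partition (adjust the numbers to your needs):

# 1 node, 8 tasks, 2 hours of walltime, interactive shell on the compute partition
srun -N 1 -n 8 -t 2:00:00 --partition=compute --pty bash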

Julia

Please install Julia and set up the proper environment yourself by following the instructions on the official page.
If you are new to Julia and want an introduction to the current workflow, check this link.

w2dynamics

There are two versions of w2dynamics installed on the cluster, 1.1.4 and 1.1.5. Use one of the following commands to load the corresponding version:

module load --auto w2dynamics/1.1.4-gcc-11.4.0-4e7xlay
module load --auto w2dynamics/1.1.5-gcc-11.4.0-33hh33i
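
To check that the module is active, verify that the main w2dynamics script is on your PATH (a quick sanity check; DMFT.py is the entry point used in the SLURM template above):

module load --auto w2dynamics/1.1.5-gcc-11.4.0-33hh33i

# should print the path of the DMFT.py script provided by the module
which DMFT.py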

Wien2k

Wien2k version 23.2 is installed in the /opt/WIEN2k_23.2 directory. For the initial setup of your environment, run the /opt/WIEN2k_23.2/userconfig tool and answer the relevant questions. You will also need to load the following modules:

module load --auto intel-oneapi-compilers/2021.3.0-gcc-11.4.0-akvxchv
module load --auto intel-oneapi-mkl/2021.4.0-gcc-11.4.0-p7fre5c
module load --auto intel-oneapi-mpi/2021.12.0-gcc-11.4.0-ywfnwb7
module load --auto fftw/3.3.10-gcc-11.4.0-2wmq6zs

Here is an example job script that uses 4 OpenMP threads and parallelizes the calculation over 24 k-points (note that for small systems the overhead of k-parallelization can be large enough to make it useless):

#!/bin/bash

#SBATCH -J JOB_NAME
#SBATCH -N 1
#SBATCH --tasks-per-node=96
#SBATCH --partition=compute
# 3 days walltime, the format is MM:SS, or HH:MM:SS, or D-HH:MM:SS
#SBATCH -t 3-0:00:00 

# environment variables to set
# use 4 OpenMP threads
export OMP_NUM_THREADS=4

# modules to load
module purge
module load --auto intel-oneapi-compilers/2021.3.0-gcc-11.4.0-akvxchv
module load --auto intel-oneapi-mkl/2021.4.0-gcc-11.4.0-p7fre5c
module load --auto intel-oneapi-mpi/2021.12.0-gcc-11.4.0-ywfnwb7
module load --auto fftw/3.3.10-gcc-11.4.0-2wmq6zs

# commands to run
# initialize wien2k calculation with high precision setting
init_lapw -prec 3

# create .machines files for k-parallelization with 24 k-points running
# in parallel

> .machines
for (( i=1; i<=24; i++ )); do
        echo 1:localhost >> .machines
done

# run wien2k calculation in parallel mode
run_lapw -p

# save the run
save_lapw scf

If you are interested in wannierization, load wannier90 as well:

module load --auto wannier90/3.1.0-gcc-11.4.0-rxoj6qv

