Running GROMACS on an HPC cluster

GROMACS version: 2022.5
GROMACS modification: No
Hi everyone!

I am trying to optimize the runtime of my simulations with GROMACS on an HPC cluster, using the Lysozyme in Water tutorial as a test case. I am submitting my jobs with a Slurm script, which currently looks something like this:

#!/bin/bash
#SBATCH --job-name=Lysozyme_in_water_Job
#SBATCH --ntasks=1
#SBATCH --nodes=1
#SBATCH --cpus-per-task=70
#SBATCH --time=01:00:00
#SBATCH --partition=standard
#SBATCH --account=lab

module load gromacs/2022.5
srun gmx mdrun -deffnm md_1ns -v

The simulation speed that I am getting here is:
               Core t (s)   Wall t (s)        (%)
       Time:     2538.208       36.267     6998.7
                 (ns/day)    (hour/ns)
Performance:      238.239        0.101

Now, when I increase the number of nodes from one to four using the following Slurm script:

#SBATCH --job-name=Lysozyme_in_water_Job
#SBATCH --ntasks=4
#SBATCH --nodes=4
#SBATCH --cpus-per-task=70
#SBATCH --time=01:00:00
#SBATCH --partition=standard
#SBATCH --account=lab

module load gromacs/2022.5
srun gmx mdrun -deffnm md_1ns -v

The performance is:
Core t (s) Wall t (s) (%)
Time: 29466.398 420.952 6999.9
(ns/day) (hour/ns)
Performance: 20.525 1.169

Now I am a bit confused, since increasing the number of nodes should speed up the simulation; in my case, however, the performance goes down. Can someone help me with this? I am sure I am missing something.

Thanks in advance

Hi,
here you can find some material on how to tune GROMACS performance on CPU, which I guess is your case. You can have a look at the following hands-on tutorial.

Depending on the goal of the tuning, you can either use the inputs suggested in the hands-on or the specific system that you want to run in the future.
Alessandra

In general you should get better performance when you go from 1 to 4 nodes (70 to 280 CPU cores). You should check the log file to make sure all four nodes are actually being used.

You should also check that you are using MPI. In most cases you need gmx_mpi to run with MPI, but in some installations it might be renamed gmx. I would expect a command line more like:
mpirun -np 4 gmx_mpi mdrun -deffnm md_1ns
But it may vary from one HPC system to another.
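
As a rough sketch of how that could look in a Slurm script (the rank/thread split below is only an assumption; on many clusters several MPI ranks per node with fewer OpenMP threads each will be faster, so check what your center recommends):

#!/bin/bash
#SBATCH --job-name=Lysozyme_in_water_Job
#SBATCH --nodes=4                # 4 compute nodes
#SBATCH --ntasks-per-node=1      # 1 MPI rank per node (assumption; adjust to your cluster)
#SBATCH --cpus-per-task=70       # OpenMP threads per MPI rank
#SBATCH --time=01:00:00
#SBATCH --partition=standard
#SBATCH --account=lab

module load gromacs/2022.5

# give each MPI rank the cores Slurm allocated to it as OpenMP threads
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK

# srun starts one gmx_mpi process per MPI rank; -ntomp makes mdrun match the Slurm allocation
srun gmx_mpi mdrun -deffnm md_1ns -ntomp $SLURM_CPUS_PER_TASK -v

The top of md_1ns.log then reports how many MPI ranks and OpenMP threads per rank mdrun is actually using, which is a quick way to confirm that all four nodes are doing work.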

Many thanks @alevilla for your help. The HackMD tutorial looks great; I will try to go through it.

Thanks

Hi Alessandra,

Many thanks for your help. I was following the tutorial on HackMD and tried the following Slurm script for my job:

#!/bin/bash

#SBATCH --job-name=Lysozyme_in_water_Job
#SBATCH --time=01:00:00 # maximum execution time of 1 hour
#SBATCH --nodes=3 # requesting 3 compute nodes
#SBATCH --ntasks=3 # use 3 MPI ranks (tasks)
#SBATCH --cpus-per-task=70 # number of CPU cores per MPI task
#SBATCH --partition=standard
#SBATCH --account=lab

# load the necessary modules

module load gromacs/2022.5

#name of the executable

exe="gmx_mpi"

export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK

srun gmx_mpi mdrun -v -deffnm md_1ns

When I executed this I got the following error:
No OpenFabrics connection schemes reported that they were able to be
used on a specific port. As such, the openib BTL (OpenFabrics
support) will be disabled for this port.

Local host: r1u35n1
Local device: bnxt_re1
Local port: 1
CPCs attempted: rdmacm


No OpenFabrics connection schemes reported that they were able to be
used on a specific port. As such, the openib BTL (OpenFabrics
support) will be disabled for this port.

Local host: r1u14n2
Local device: bnxt_re1
Local port: 1
CPCs attempted: rdmacm


No OpenFabrics connection schemes reported that they were able to be
used on a specific port. As such, the openib BTL (OpenFabrics
support) will be disabled for this port.

Local host: r1u33n2
Local device: bnxt_re1
Local port: 1
CPCs attempted: rdmacm


At least one pair of MPI processes are unable to reach each other for
MPI communications. This means that no Open MPI device has indicated
that it can be used to communicate between these processes. This is
an error; Open MPI requires that all MPI processes be able to reach
each other. This error can sometimes be the result of forgetting to
specify the "self" BTL.

Could you please give me your suggestions?

Many thanks in advance.

Hi,
Sorry, I forgot to mention this when I linked the material. As you have probably noticed, each tutorial I linked differs according to the HPC system for which it was designed. In general, Slurm can be configured differently on different HPC clusters, and modules can be named or labeled differently at different HPC centers. You cannot take one example and apply it directly on another system. Please check the documentation of your HPC center or ask their support team for help directly.
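If you are not sure what is available locally, something along these lines can help you check (the exact commands and module names depend on the cluster and its module system, so treat this only as an illustration):

module avail gromacs          # list the GROMACS modules installed on your cluster
module show gromacs/2022.5    # see what the module provides (e.g. gmx vs gmx_mpi) and what it sets up
sinfo -p standard -o "%n %c"  # list node names and CPU counts in the partition, to pick a sensible rank/thread split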
I hope this helps,
Alessandra