Keywords in the command line for getting good performance in GROMACS

I am trying to run GROMACS 2023.2 with GPU support on an HPC cluster, but I am getting very slow performance. As a test, I ran the calculation on 8 cores for 10 minutes only; the projected progress was only about 15,000 steps in those 10 minutes, which is not up to the mark. The SLURM script and the last lines of the output file are shown below.
SLURM SCRIPT FILE
#!/bin/bash
#SBATCH -N 1
#SBATCH --ntasks-per-node=8
#SBATCH --gres=gpu:A100-SXM4:1
#SBATCH --partition=testp
#SBATCH --time=00:10:00
#SBATCH --error=error_test.%J.err
#SBATCH --output=output_test.%J.out

echo "Starting at $(date)"
echo "Running on hosts: $SLURM_NODELIST"
echo "Running on $SLURM_NNODES nodes."
echo "Running $SLURM_NTASKS tasks."
echo "Job id is $SLURM_JOBID"
echo "Job submission directory is : $SLURM_SUBMIT_DIR"
cd $SLURM_SUBMIT_DIR

source /opt/hpcx-v2.9.0-gcc-MLNX_OFED_LINUX-5.4-1.0.3.0-ubuntu20.04-x86_64/env.sh

source /nlsasfs/home/groupiiiv/sarthakt/software_cdac/gromacs-2023.2/install/bin/GMXRC

mpirun -mca pml ucx -x UCX_NET_DEVICES -np 8 /nlsasfs/home/groupiiiv/sarthakt/software_cdac/gromacs-2023.2/build/bin/gmx_mpi mdrun -ntomp 4 -deffnm md_0_10 -cpi md_0_10.cpt -noappend

OUTPUT FILE
Started mdrun on rank 0 Thu Oct 26 11:14:28 2023

           Step           Time
              0        0.00000

   Energies (kJ/mol)
           Bond            U-B    Proper Dih.  Improper Dih.      CMAP Dih.
    5.17908e+03    1.42265e+04    1.66364e+04    8.73544e+02   -8.25547e+02
          LJ-14     Coulomb-14        LJ (SR)   Coulomb (SR)   Coul. recip.
    5.15878e+03    7.27696e+04    8.95562e+04   -1.15064e+06    3.48482e+03
      Potential    Kinetic En.   Total Energy  Conserved En.    Temperature
   -9.43579e+05    1.87862e+05   -7.55718e+05   -7.55674e+05    3.01866e+02
 Pressure (bar)   Constr. rmsd
    2.79474e+02    2.98607e-06

DD step 99 load imb.: force 5.3%
step 600: timed with pme grid 56 56 56, coulomb cutoff 1.200: 65983.2 M-cycles
step 800: timed with pme grid 48 48 48, coulomb cutoff 1.400: 58358.5 M-cycles
step 1000: timed with pme grid 44 44 44, coulomb cutoff 1.527: 60890.5 M-cycles
step 1200: timed with pme grid 40 40 40, coulomb cutoff 1.680: 60729.1 M-cycles
step 1400: timed with pme grid 36 36 36, coulomb cutoff 1.866: 58230.7 M-cycles
step 1400: the maximum allowed grid scaling limits the PME load balancing to a coulomb cut-off of 1.866
step 1600: timed with pme grid 36 36 36, coulomb cutoff 1.866: 60691.8 M-cycles
step 1800: timed with pme grid 40 40 40, coulomb cutoff 1.680: 56804.9 M-cycles
step 2000: timed with pme grid 42 42 42, coulomb cutoff 1.600: 61399.7 M-cycles

Received the TERM signal, stopping within 100 steps

Thank you all in advance. Feel free to ask for more details so that I can start my calculations happily. :D
Can anyone suggest solutions to enhance the performance?

If you are running on 8 cores of a single node, there is no reason to use MPI. You are also specifying 8 MPI tasks with 4 OpenMP threads per task, so you’re probably just overloading the hardware. gmx mdrun -nt 8 is sufficient if using 8 cores.

Also note that mdrun is still doing some tuning of cutoffs so the performance report will not be accurate. You need to run for a larger number of steps and should use -resethway to reset the timing calculation to omit the tuning part.
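For reference, the suggested single-node run might look something like this (only a sketch, assuming a thread-MPI build of GROMACS is available and reusing the file names from the original script; the resource request and the -maxh value are simply sized to the 8-core, 10-minute test job, not tuned values):

#SBATCH -N 1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=8
#SBATCH --gres=gpu:A100-SXM4:1

# 8 threads total, no mpirun; -resethway resets the timers halfway so the
# PME tuning phase is excluded, -maxh stops cleanly before the time limit.
gmx mdrun -nt 8 -deffnm md_0_10 -cpi md_0_10.cpt -noappend -resethway -maxh 0.15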

Thank you, sir, for your immediate response. I will try your advice and update you soon. Thank you so much again.

Regards,
Sarthak

Hi all, related to the issue addressed above, I am regularly getting the TERM signal.

For now, I have tried my production runs with:
nohup gmx mdrun -s step7.tpr -v -deffnm step7
nohup mpirun -np 4 gmx_mpi mdrun -s step7.tpr -deffnm step7_1 -ntomp 16 -gpu_id 0123
nohup mpirun -np 3 gmx_mpi mdrun -s step7.tpr -deffnm step7_1 -ntomp 12 -gpu_id 012
Although we are in the process of setting up a SLURM workload manager, I expect a trial run to complete smoothly. Can you please guide me on the right flags for a simulation system with 214,297 atoms? Also, previously I never felt the need to specify GPUs via the -nb, -pme, etc. flags; please suggest whether I should ideally use them. I am using GROMACS 2025; our cluster details, and a tentative command I am considering, are given below.

$ lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 112
On-line CPU(s) list: 0-111
Thread(s) per core: 2
Core(s) per socket: 28
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 106
Model name: Intel(R) Xeon(R) Gold 6330 CPU @ 2.00GHz
Stepping: 6
CPU MHz: 2000.000
CPU max MHz: 3100.0000
CPU min MHz: 800.0000
BogoMIPS: 4000.00
Virtualization: VT-x
L1d cache: 48K
L1i cache: 32K
L2 cache: 1280K
L3 cache: 43008K
NUMA node0 CPU(s): 0-27,56-83
NUMA node1 CPU(s): 28-55,84-111

$ lspci | grep -i --color 'vga\|3d\|2d'
04:00.0 VGA compatible controller: ASPEED Technology, Inc. ASPEED Graphics Family (rev 41)
0a:00.0 3D controller: NVIDIA Corporation GA102GL [A10] (rev a1)
0b:00.0 3D controller: NVIDIA Corporation GA102GL [A10] (rev a1)
0e:00.0 3D controller: NVIDIA Corporation GA102GL [A10] (rev a1)
12:00.0 3D controller: NVIDIA Corporation GA102GL [A10] (rev a1)
21:00.0 3D controller: NVIDIA Corporation GA102GL [A10] (rev a1)
22:00.0 3D controller: NVIDIA Corporation GA102GL [A10] (rev a1)
25:00.0 3D controller: NVIDIA Corporation GA102GL [A10] (rev a1)
29:00.0 3D controller: NVIDIA Corporation GA102GL [A10] (rev a1)
6a:00.0 Non-Volatile memory controller: Intel Corporation NVMe DC SSD [3DNAND, Sentinel Rock Controller]
6b:00.0 Non-Volatile memory controller: Intel Corporation NVMe DC SSD [3DNAND, Sentinel Rock Controller]
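For reference, this is the kind of single-GPU command I am considering if those offload flags are recommended (only a sketch based on my reading of the mdrun options; the thread counts and offload choices are my own assumptions, not benchmarked values):

# Assumed starting point: one GPU, one rank, one 28-core CPU socket.
# -update gpu requires a supported integrator/constraint setup and can be
# dropped if mdrun refuses it; with the MPI build, the same offload flags
# would go under "mpirun -np 1 gmx_mpi mdrun" instead of -ntmpi.
gmx mdrun -s step7.tpr -deffnm step7 \
    -ntmpi 1 -ntomp 28 -pin on \
    -nb gpu -pme gpu -bonded gpu -update gpu -gpu_id 0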