GROMACS version: 2021.6
GROMACS modification: No
Dear Gromacs Users/developers,
I am trying to install GROMACS with MPI support on two workstations (one with 2 GPUs, one with 4 GPUs) to run replica exchange simulations.
On the first one (2 GPUs), I have no problems and replica exchange simulations run fine with the command
gmx_mpi mdrun -multidir rep0 rep1 rep2 rep3 rep4 rep5 rep6 rep7 rep8 rep9 rep10 rep11 rep12 rep13 rep14 -s rex-after-equil1 -deffnm rex-after-equil1 -cpi rex.cpt -hrex -replex 500 -plumed plumed.dat -nb gpu -bonded gpu -v -pin on -pme gpu -noappend
The PME tasks also seem to be offloaded to the GPUs, as the log file says:
2 GPUs selected for this run.
Mapping of GPU IDs to the 30 GPU tasks in the 15 ranks on this node:
PP:0,PME:0,PP:0,PME:0,PP:0,PME:0,PP:0,PME:0,PP:0,PME:0,PP:0,PME:0,PP:0,PME:0,PP:0,PME:1,PP:1,PME:1,PP:1,PME:1,PP:1,PME:1,PP:1,PME:1,PP:1,PME:1,PP:1,PME:1,PP:1,PME:1
If I try to install the same version on the other workstation with 4 GPUs, I first get one failed test:
Mdrun was not able to distribute the requested non-bonded tasks to the available GPUs.
Will try again with following task assignment !
Abnormal return value for '/home/amin/softwares/openmpi/bin/mpiexec -np 6 -wdir /home/amin/softwares/gromacs-2021.6/build/tests/complex/nbnxn_vsite gmx_mpi mdrun -notunepme >mdrun.out 2>&1' was -1
FAILED. Check mdrun.out, md.log file(s) in nbnxn_vsite for nbnxn_vsite
If I install it anyway and try to run replica exchange simulations, I see different behavior than on the first workstation.
Using the same command, I get:
Inconsistency in user input:
There were 30 GPU tasks found on node lvx0987, but 4 GPUs were available. If
the GPUs are equivalent, then it is usually best to have a number of tasks
that is a multiple of the number of GPUs. You should reconsider your GPU task
assignment, number of ranks, or your use of the -nb, -pme, and -npme options,
perhaps after measuring the performance you can get.
If I instead use
/home/amin/softwares/openmpi/bin/mpirun -np 15 gmx_mpi mdrun -multidir rep0 rep1 rep2 rep3 rep4 rep5 rep6 rep7 rep8 rep9 rep10 rep11 rep12 rep13 rep14 -s rex-after-equil1 -deffnm rex-after-equil1 -replex 500 -noappend -nb gpu -pme cpu -gputasks 000011112222333
the simulations run fine.
Also, if I reduce the number of replicas to 12, i.e. a multiple of the number of GPUs, the simulations run without specifying -gputasks.
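For reference, this is the small helper I wrote (my own script, not part of GROMACS) to build the -gputasks string by spreading ranks as evenly as possible over the GPUs; with -pme cpu each rank carries a single PP GPU task, so 15 ranks over 4 GPUs gives the 15-digit string above:

```python
def gputasks(n_ranks: int, n_gpus: int, tasks_per_rank: int = 1) -> str:
    """Build a -gputasks string spreading n_ranks ranks evenly over n_gpus GPUs.

    With -pme cpu each rank has one GPU task (PP only), so tasks_per_rank = 1;
    with -pme gpu it would be 2 (PP + PME). Assumes fewer than 10 GPUs, since
    each GPU ID must be a single digit in the -gputasks string.
    """
    base, extra = divmod(n_ranks, n_gpus)
    digits = []
    for gpu in range(n_gpus):
        # The first `extra` GPUs take one rank more than the rest.
        ranks_on_gpu = base + 1 if gpu < extra else base
        digits.extend(str(gpu) * (ranks_on_gpu * tasks_per_rank))
    return "".join(digits)

print(gputasks(15, 4))  # -> 000011112222333
```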
I am trying to understand why I am seeing this behavior. The configurations of the two systems are quite similar.
First one (2 GPUs)
GROMACS version: 2021.6-plumed-2.7.5
Precision: mixed
Memory model: 64 bit
MPI library: MPI
OpenMP support: enabled (GMX_OPENMP_MAX_THREADS = 64)
GPU support: CUDA
SIMD instructions: AVX2_256
FFT library: fftw-3.3.8-sse2-avx-avx2-avx2_128
RDTSCP usage: enabled
TNG support: enabled
Hwloc support: disabled
Tracing support: disabled
C compiler: /usr/bin/cc GNU 11.3.0
C compiler flags: -mavx2 -mfma -Wno-missing-field-initializers -fexcess-precision=fast -funroll-all-loops -O3 -DNDEBUG
C++ compiler: /usr/bin/c++ GNU 11.3.0
C++ compiler flags: -mavx2 -mfma -Wno-missing-field-initializers -fexcess-precision=fast -funroll-all-loops -fopenmp -O3 -DNDEBUG
CUDA compiler: /usr/local/cuda/bin/nvcc nvcc: NVIDIA (R) Cuda compiler driver;Copyright (c) 2005-2022 NVIDIA Corporation;Built on Wed_Jun__8_16:49:14_PDT_2022;Cuda compilation tools, release 11.7, V11.7.99;Build cuda_11.7.r11.7/compiler.31442593_0
CUDA compiler flags:-std=c++17;-gencode;arch=compute_35,code=sm_35;-gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_52,code=sm_52;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-Wno-deprecated-gpu-targets;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_35,code=compute_35;-gencode;arch=compute_53,code=compute_53;-gencode;arch=compute_80,code=compute_80;-use_fast_math;-D_FORCE_INLINES;-mavx2 -mfma -Wno-missing-field-initializers -fexcess-precision=fast -funroll-all-loops -fopenmp -O3 -DNDEBUG
CUDA driver: 11.70
CUDA runtime: 11.70
Second one (4 GPUs)
GROMACS version: 2021.6
Precision: mixed
Memory model: 64 bit
MPI library: MPI
OpenMP support: enabled (GMX_OPENMP_MAX_THREADS = 64)
GPU support: CUDA
SIMD instructions: AVX2_256
FFT library: fftw-3.3.8-sse2-avx-avx2-avx2_128
RDTSCP usage: enabled
TNG support: enabled
Hwloc support: disabled
Tracing support: disabled
C compiler: /usr/bin/cc GNU 11.1.0
C compiler flags: -mavx2 -mfma -pthread -Wno-missing-field-initializers -fexcess-precision=fast -funroll-all-loops -O3 -DNDEBUG
C++ compiler: /usr/bin/c++ GNU 11.1.0
C++ compiler flags: -mavx2 -mfma -pthread -Wno-missing-field-initializers -fexcess-precision=fast -funroll-all-loops -fopenmp -O3 -DNDEBUG
CUDA compiler: /usr/local/cuda/bin/nvcc nvcc: NVIDIA (R) Cuda compiler driver;Copyright (c) 2005-2022 NVIDIA Corporation;Built on Wed_Sep_21_10:33:58_PDT_2022;Cuda compilation tools, release 11.8, V11.8.89;Build cuda_11.8.r11.8/compiler.31833905_0
CUDA compiler flags:-std=c++17;-gencode;arch=compute_35,code=sm_35;-gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_52,code=sm_52;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-Wno-deprecated-gpu-targets;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_35,code=compute_35;-gencode;arch=compute_53,code=compute_53;-gencode;arch=compute_80,code=compute_80;-use_fast_math;-D_FORCE_INLINES;-mavx2 -mfma -pthread -Wno-missing-field-initializers -fexcess-precision=fast -funroll-all-loops -fopenmp -O3 -DNDEBUG
CUDA driver: 11.70
CUDA runtime: 11.80
The first one is patched with PLUMED, but the behavior is the same with or without the patch.
Both were compiled with OpenMPI (4.1.2 for the first, 4.1.4 for the second).
I would be really grateful for any suggestions to make the second installation behave like the first one, i.e. automatically assign PME tasks to GPUs as well.
Best,
Amin.