GROMACS version: 2021.6
GROMACS modification: No
Dear Gromacs Users/developers,
I am trying to install GROMACS with MPI support on two workstations (one with 2 GPUs, one with 4 GPUs) to run replica exchange simulations.
On the first one (2 GPUs), I have no problems and replica exchange simulations run fine with the command
gmx_mpi mdrun -multidir rep0 rep1 rep2 rep3 rep4 rep5 rep6 rep7 rep8 rep9 rep10 rep11 rep12 rep13 rep14 -s rex-after-equil1 -deffnm rex-after-equil1 -cpi rex.cpt -hrex -replex 500 -plumed plumed.dat -nb gpu -bonded gpu -v -pin on -pme gpu -noappend
The PME tasks also seem to be offloaded to the GPUs, as the log file says:
2 GPUs selected for this run.
Mapping of GPU IDs to the 30 GPU tasks in the 15 ranks on this node:
PP:0,PME:0,PP:0,PME:0,PP:0,PME:0,PP:0,PME:0,PP:0,PME:0,PP:0,PME:0,PP:0,PME:0,PP:0,PME:1,PP:1,PME:1,PP:1,PME:1,PP:1,PME:1,PP:1,PME:1,PP:1,PME:1,PP:1,PME:1,PP:1,PME:1
If I try to install the same version on the other workstation with 4 GPUs, I first get one failed test:
Mdrun was not able to distribute the requested non-bonded tasks to the available GPUs.
Will try again with following task assignment !
Abnormal return value for '/home/amin/softwares/openmpi/bin/mpiexec -np 6 -wdir /home/amin/softwares/gromacs-2021.6/build/tests/complex/nbnxn_vsite gmx_mpi mdrun -notunepme >mdrun.out 2>&1' was -1
FAILED. Check mdrun.out, md.log file(s) in nbnxn_vsite for nbnxn_vsite
If I install it anyway and try to run replica exchange simulations, I see different behavior than on the first workstation.
Using the same command, I get:
Inconsistency in user input:
There were 30 GPU tasks found on node lvx0987, but 4 GPUs were available. If
the GPUs are equivalent, then it is usually best to have a number of tasks
that is a multiple of the number of GPUs. You should reconsider your GPU task
assignment, number of ranks, or your use of the -nb, -pme, and -npme options,
perhaps after measuring the performance you can get.
If I instead use
/home/amin/softwares/openmpi/bin/mpirun -np 15 gmx_mpi mdrun -multidir rep0 rep1 rep2 rep3 rep4 rep5 rep6 rep7 rep8 rep9 rep10 rep11 rep12 rep13 rep14 -s rex-after-equil1 -deffnm rex-after-equil1 -replex 500 -noappend -nb gpu -pme cpu -gputasks 000011112222333
the simulations run fine.
Also, if I reduce the number of replicas to 12, i.e. a multiple of the number of GPUs, the simulations run without specifying -gputasks.
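For reference, this is the small helper I wrote (my own script, not part of GROMACS) to build the -gputasks string by spreading ranks as evenly as possible over the GPUs; with -pme cpu each rank carries a single PP GPU task, so 15 ranks over 4 GPUs gives the 15-digit string above:

```python
def gputasks(n_ranks: int, n_gpus: int, tasks_per_rank: int = 1) -> str:
    """Build a -gputasks string spreading n_ranks ranks evenly over n_gpus GPUs.

    With -pme cpu each rank has one GPU task (PP only), so tasks_per_rank = 1;
    with -pme gpu it would be 2 (PP + PME). Assumes fewer than 10 GPUs, since
    each GPU ID must be a single digit in the -gputasks string.
    """
    base, extra = divmod(n_ranks, n_gpus)
    digits = []
    for gpu in range(n_gpus):
        # The first `extra` GPUs take one rank more than the rest.
        ranks_on_gpu = base + 1 if gpu < extra else base
        digits.extend(str(gpu) * (ranks_on_gpu * tasks_per_rank))
    return "".join(digits)

print(gputasks(15, 4))  # -> 000011112222333
```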
I am trying to understand why I am seeing this behavior. The configurations of the two systems are quite similar.
First one (2 GPUs)
GROMACS version: 2021.6-plumed-2.7.5
Precision: mixed
Memory model: 64 bit
MPI library: MPI
OpenMP support: enabled (GMX_OPENMP_MAX_THREADS = 64)
GPU support: CUDA
SIMD instructions: AVX2_256
FFT library: fftw-3.3.8-sse2-avx-avx2-avx2_128
RDTSCP usage: enabled
TNG support: enabled
Hwloc support: disabled
Tracing support: disabled
C compiler: /usr/bin/cc GNU 11.3.0
C compiler flags: -mavx2 -mfma -Wno-missing-field-initializers -fexcess-precision=fast -funroll-all-loops -O3 -DNDEBUG
C++ compiler: /usr/bin/c++ GNU 11.3.0
C++ compiler flags: -mavx2 -mfma -Wno-missing-field-initializers -fexcess-precision=fast -funroll-all-loops -fopenmp -O3 -DNDEBUG
CUDA compiler: /usr/local/cuda/bin/nvcc nvcc: NVIDIA (R) Cuda compiler driver;Copyright (c) 2005-2022 NVIDIA Corporation;Built on Wed_Jun__8_16:49:14_PDT_2022;Cuda compilation tools, release 11.7, V11.7.99;Build cuda_11.7.r11.7/compiler.31442593_0
CUDA compiler flags:-std=c++17;-gencode;arch=compute_35,code=sm_35;-gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_52,code=sm_52;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-Wno-deprecated-gpu-targets;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_35,code=compute_35;-gencode;arch=compute_53,code=compute_53;-gencode;arch=compute_80,code=compute_80;-use_fast_math;-D_FORCE_INLINES;-mavx2 -mfma -Wno-missing-field-initializers -fexcess-precision=fast -funroll-all-loops -fopenmp -O3 -DNDEBUG
CUDA driver: 11.70
CUDA runtime: 11.70
Second one (4 GPUs)
GROMACS version: 2021.6
Precision: mixed
Memory model: 64 bit
MPI library: MPI
OpenMP support: enabled (GMX_OPENMP_MAX_THREADS = 64)
GPU support: CUDA
SIMD instructions: AVX2_256
FFT library: fftw-3.3.8-sse2-avx-avx2-avx2_128
RDTSCP usage: enabled
TNG support: enabled
Hwloc support: disabled
Tracing support: disabled
C compiler: /usr/bin/cc GNU 11.1.0
C compiler flags: -mavx2 -mfma -pthread -Wno-missing-field-initializers -fexcess-precision=fast -funroll-all-loops -O3 -DNDEBUG
C++ compiler: /usr/bin/c++ GNU 11.1.0
C++ compiler flags: -mavx2 -mfma -pthread -Wno-missing-field-initializers -fexcess-precision=fast -funroll-all-loops -fopenmp -O3 -DNDEBUG
CUDA compiler: /usr/local/cuda/bin/nvcc nvcc: NVIDIA (R) Cuda compiler driver;Copyright (c) 2005-2022 NVIDIA Corporation;Built on Wed_Sep_21_10:33:58_PDT_2022;Cuda compilation tools, release 11.8, V11.8.89;Build cuda_11.8.r11.8/compiler.31833905_0
CUDA compiler flags:-std=c++17;-gencode;arch=compute_35,code=sm_35;-gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_52,code=sm_52;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-Wno-deprecated-gpu-targets;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_35,code=compute_35;-gencode;arch=compute_53,code=compute_53;-gencode;arch=compute_80,code=compute_80;-use_fast_math;-D_FORCE_INLINES;-mavx2 -mfma -pthread -Wno-missing-field-initializers -fexcess-precision=fast -funroll-all-loops -fopenmp -O3 -DNDEBUG
CUDA driver: 11.70
CUDA runtime: 11.80
The first one is patched with PLUMED, but the behavior is the same with or without the patch.
Both were compiled with OpenMPI (4.1.2 for the first, 4.1.4 for the second).
I would be really grateful for any suggestions to make the second installation behave like the first one, i.e. automatically assign PME tasks to GPUs as well.
Best,
Amin.