mdrun: An error occurred in MPI_Allreduce

GROMACS version: 2022.6
GROMACS modification: No

I am running GROMACS-CP2K and I get the following error when running mdrun on a QM/MM system.

                      :-) GROMACS - gmx mdrun, 2022.6 (-:

Executable:   /usr/local/gromacs-cp2k-gpu/bin/gmx_mpi
Data prefix:  /usr/local/gromacs-cp2k-gpu
Working dir:  /home/vivek/Desktop/cp2k_test/tutorial/egfp
Command line:
  gmx_mpi mdrun -s egfp-qmmm-nvt.tpr -deffnm egfp-qmmm-nvt -v

Reading file egfp-qmmm-nvt.tpr, VERSION 2022.6 (single precision)
Changing nstlist from 10 to 100, rlist from 1.2 to 1.302

1 GPU selected for this run.
Mapping of GPU IDs to the 2 GPU tasks in the 1 rank on this node:
  PP:0,PME:0
PP tasks will do (non-perturbed) short-ranged interactions on the GPU
PP task will update and constrain coordinates on the CPU
PME tasks will do all aspects on the GPU
Using 1 MPI process
Using 32 OpenMP threads 

starting mdrun 'GREEN FLUORESCENT PROTEIN in water'
100 steps,      0.1 ps.
[admin-PC:1308999] *** An error occurred in MPI_Allreduce
[admin-PC:1308999] *** reported by process [1728380929,0]
[admin-PC:1308999] *** on communicator MPI_COMM_WORLD
[admin-PC:1308999] *** MPI_ERR_COMM: invalid communicator
[admin-PC:1308999] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[admin-PC:1308999] ***    and potentially your MPI job)

Is this a QMMM simulation? If so, I would guess the MPI error is triggered in the QMMM package.

Did you try with 2 MPI ranks? I suspect that this might work.

Well, you guessed right!
It worked with mpirun -n 2. I see that the GPU is detected, but there is no GPU usage despite building with CUDA, and mdrun is now extremely slow.
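For reference, the full command that worked (same .tpr as in the log above; -n 2 requests two MPI ranks) was:

  mpirun -n 2 gmx_mpi mdrun -s egfp-qmmm-nvt.tpr -deffnm egfp-qmmm-nvt -v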

But are you running QM/MM? If so, the QM part will take nearly all the time and the MM part nearly nothing, so the GPU will be nearly idle.

Yes, I am running QM/MM. But I built CP2K with CUDA too. Won't it utilise the GPU?
What would you suggest in order to achieve maximum performance on a QM/MM simulation? I have a Core i9 K-series CPU and an RTX 3090 Ti GPU. Currently there is no GPU usage and only one CPU thread is utilized.

I don’t know if CP2K will use the GPU when linked with GROMACS. Maybe @dmorozov knows?

QM tends to take the vast majority of the time. So if cp2k doesn’t use the GPU it will be nearly idle.

Hi, could you try running a standalone compiled CP2K version to confirm that it actually uses the GPU? The reason is that, according to the CP2K INSTALL.md file, CP2K only supports Tesla GPUs: K20X, K40, K80, P100, V100, A100.
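For example, something like the following (just a sketch: the binary name depends on your build, cp2k.ssmp for a non-MPI build, and H2O-64.inp is one of the small benchmark inputs shipped in the CP2K source tree):

  # run a small standalone CP2K benchmark in the background
  mpirun -n 1 cp2k.psmp -i H2O-64.inp -o H2O-64.out &
  # in a second terminal, check whether the card shows any load, refreshing every second
  nvidia-smi -l 1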

Generally, I would say that GeForce cards are a really bad choice for doing quantum chemistry (and QM/MM), as you need full double-precision floating-point support, which NVIDIA has almost entirely cut from the GeForce line (FP64 throughput is typically 1/64 of FP32 on recent GeForce cards, versus 1/2 on a V100). They want to sell Tesla accelerators as well!

Hello. Thank you very much for your support, guys.

I tried running standalone CP2K for the QM part and found that it does utilize the GPU.
Also, I chose V100 when compiling CP2K, as its compute capability is close to that of the RTX 3090.
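For the record, selecting the GPU version with the bundled toolchain looks roughly like this (a sketch, assuming CP2K is built via its toolchain script):

  cd cp2k/tools/toolchain
  ./install_cp2k_toolchain.sh --enable-cuda --gpu-ver=V100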

I don’t know what CP2K does when using two ranks and one GPU. Is it possible to compile CP2K (and GROMACS) without MPI?

Hey,
I compiled the non-MPI version of CP2K with CUDA, and GROMACS without MPI (roughly the configuration sketched below).
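The GROMACS side of such a build looks roughly like this (a sketch, not the exact commands I used: CP2K has to be compiled as a library, libcp2k, first, and /path/to/cp2k/lib/local/ssmp is a placeholder for wherever your libcp2k ended up):

  cmake .. -DGMX_MPI=OFF -DGMX_GPU=CUDA -DGMX_CP2K=ON \
        -DCP2K_DIR=/path/to/cp2k/lib/local/ssmp
  make -j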
Now I see that mdrun uses the GPU, but inefficiently:

[GPU usage screenshot]

Also, this is the CPU usage:

[CPU usage screenshot]
This doesn’t look too bad to me. I don’t know much about CP2K, but I can imagine that this could be the best you can get.