mdrun crashes when requesting 2 GPUs for parallel simulations with 15 replicas

GROMACS version: 2023
GROMACS modification: No

I have a GROMACS 2023 build with PLUMED 2.9.0 for running bias-exchange metadynamics (BE-META) simulations on a system with 15 replicas. When I requested one L40 GPU and specified 15 tasks for the 15 replicas, the simulation completed successfully. However, when I requested two L40 GPUs and specified 15 tasks, the simulation failed with the error message shown below:


Program: gmx mdrun, version 2023-plumed_2.9.0
Source file: src/gromacs/taskassignment/taskassignment.cpp (line 129)
Function: std::vector<std::vector<gmx::GpuTaskMapping> > gmx::{anonymous}::buildTaskAssignment(const gmx::GpuTasksOnRanks&, gmx::ArrayRef)
MPI rank: 0 (out of 15)

Error in user input:
The GPU task assignment requested mdrun to use more than one GPU device on a
rank, which is not supported. Request only one GPU device per rank.

The simulation was executed with SLURM as shown below:
srun --mpi=pmi2 gmx_mpi mdrun -ntomp 1 -nb gpu -resethway -v -plumed bemeta -multidir md_ -replex 2500 -s start -deffnm prod*

When I tried two L40 GPUs and specified 30 tasks for 15 replicas (i.e., 2 tasks per replica), the simulation ran successfully, but with lower performance than the combination of one GPU and 15 tasks.

On the other hand, when I used an older combination (GROMACS 2018.8 with PLUMED 2.6.6) to run a BE-META simulation of the same system, the run completed successfully in all four variations I tried: two T4 GPUs with 15 tasks, one T4 GPU with 15 tasks, and doubling the tasks (30) with either one or two T4 GPUs.

The simulation with GROMACS 2018.8 and PLUMED 2.6.6 was executed with SLURM as shown below:
srun --mpi=pmi2 gmx_mpi mdrun -ntomp 1 -nb gpu -resethway -v -plumed bemeta -multidir md_ -replex 2500 -s start -deffnm prod*

My question is: does GROMACS 2023 permit the use of more than one GPU only with an even number of tasks, or must the number of tasks be divisible by the number of requested GPUs? And why does an older version of GROMACS not have these constraints?

Is there a way for me to utilize GROMACS 2023 with more than one GPU while maintaining the predefined number of replicas?
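For context, one workaround I considered (purely my own guess based on the `-gputasks` option in the mdrun documentation, not something the error message suggests) is to pass an explicit round-robin rank-to-GPU mapping string, which can be generated like this:

```shell
# Build a round-robin rank->GPU mapping string for mdrun's -gputasks flag.
# Assumes 15 PP ranks (one nonbonded GPU task each) and 2 visible GPUs;
# these numbers match my setup, adjust as needed.
NRANKS=15
NGPUS=2
GPUTASKS=""
for ((i = 0; i < NRANKS; i++)); do
    GPUTASKS+=$(( i % NGPUS ))   # rank i uses GPU (i mod NGPUS)
done
echo "$GPUTASKS"                 # prints 010101010101010
```

The resulting string would then be passed as `gmx_mpi mdrun ... -gputasks 010101010101010`, though I have not verified whether this avoids the task-assignment error in my multi-replica setup.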