mdrun crashes when requesting 2 GPUs for parallel simulations with 15 replicas

GROMACS version: 2023
GROMACS modification: No

I have a GROMACS 2023 build with PLUMED 2.9.0 for running bias-exchange metadynamics (BE-META) simulations on a system with 15 replicas. When I requested one L40 GPU and specified 15 tasks for the 15 replicas, the simulation completed successfully. However, when I requested two L40 GPUs and specified 15 tasks, the simulation failed with the error message shown below:


Program: gmx mdrun, version 2023-plumed_2.9.0
Source file: src/gromacs/taskassignment/taskassignment.cpp (line 129)
Function: std::vector<std::vector<gmx::GpuTaskMapping> > gmx::{anonymous}::buildTaskAssignment(const gmx::GpuTasksOnRanks&, gmx::ArrayRef)
MPI rank: 0 (out of 15)

Error in user input:
The GPU task assignment requested mdrun to use more than one GPU device on a
rank, which is not supported. Request only one GPU device per rank.

The simulation was executed with SLURM as shown below:
srun --mpi=pmi2 gmx_mpi mdrun -ntomp 1 -nb gpu -resethway -v -plumed bemeta -multidir md_ -replex 2500 -s start -deffnm prod*

When I tried two L40 GPUs and specified 30 tasks for 15 replicas (i.e., 2 tasks per replica), the simulation ran successfully, but with lower performance than the combination of one GPU and 15 tasks.

On the other hand, when I used an older combination (GROMACS 2018.8 with PLUMED 2.6.6) to run a BE-META simulation of the same system, the run completed successfully in all four variations I tried: two T4 GPUs with 15 tasks, one T4 GPU with 15 tasks, and doubling the tasks (30) with either one or two T4 GPUs.

The simulation with GROMACS 2018.8 and PLUMED 2.6.6 was executed with SLURM as shown below:
srun --mpi=pmi2 gmx_mpi mdrun -ntomp 1 -nb gpu -resethway -v -plumed bemeta -multidir md_ -replex 2500 -s start -deffnm prod*

My question is: does GROMACS 2023 permit the use of more than one GPU only with an even number of tasks, or must the number of tasks be divisible by the number of requested GPUs? And why does an older version of GROMACS not have these constraints?

Is there a way for me to utilize GROMACS 2023 with more than one GPU while maintaining the predefined number of replicas?
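For context, one workaround I considered (purely my own guess based on the `-gputasks` option in the mdrun documentation, not something the error message suggests) is to pass an explicit round-robin rank-to-GPU mapping string, which can be generated like this:

```shell
# Build a round-robin rank->GPU mapping string for mdrun's -gputasks flag.
# Assumes 15 PP ranks (one nonbonded GPU task each) and 2 visible GPUs;
# these numbers match my setup, adjust as needed.
NRANKS=15
NGPUS=2
GPUTASKS=""
for ((i = 0; i < NRANKS; i++)); do
    GPUTASKS+=$(( i % NGPUS ))   # rank i uses GPU (i mod NGPUS)
done
echo "$GPUTASKS"                 # prints 010101010101010
```

The resulting string would then be passed as `gmx_mpi mdrun ... -gputasks 010101010101010`, though I have not verified whether this avoids the task-assignment error in my multi-replica setup.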