Offloading NB and PME to GPUs in a multi-sim run

GROMACS version: 2024-dev-20231116-b967ab0c74-dirty-unknown
GROMACS modification: Yes
I’m trying to offload NB and PME in a two-directory multi-sim setup with 4 GPUs, running mdrun with a command like this:

export CUDA_VISIBLE_DEVICES=0,1,2,3
mpirun -n 2 gmx_mpi mdrun -deffnm bench -ntomp 24 -nb gpu -pme gpu -update cpu -multidir ms0 ms1

and I get the following (excerpt from .err):

Error in user input:
The GPU task assignment requested mdrun to use more than one GPU device on a
rank, which is not supported. Request only one GPU device per rank.

How should I call mdrun so that each simulation gets 2 GPUs, one for the NB (PP) work and one for PME? I cannot use thread-MPI because a multi-dir simulation requires MPI.

Thanks in advance.

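You need to use mpirun -n 4 (you have 2 simulations, each using 2 ranks) and add the -npme 1 option. As the error message says, a rank can use only one GPU device, so each simulation needs a separate PME rank next to its PP rank, and each of those ranks then gets its own GPU.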
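
As a rough sketch of that advice, the snippet below derives the rank count and prints the adjusted launch line as a dry run. The variable names (N_SIMS, RANKS_PER_SIM, NTOMP, CMD) are only illustrative, and -ntomp 24 is copied unchanged from the question. With 4 ranks instead of 2 it may oversubscribe the node, so treat that value as an assumption to check against your core count.

```shell
#!/bin/sh
# Dry-run sketch: build the mdrun launch line for 2 simulations x 2 ranks each.
# Variable names are illustrative, not part of GROMACS.

export CUDA_VISIBLE_DEVICES=0,1,2,3   # same 4 GPUs as in the question

N_SIMS=2          # directories passed to -multidir (ms0, ms1)
RANKS_PER_SIM=2   # 1 PP rank (NB on GPU) + 1 PME rank (PME on GPU)
TOTAL_RANKS=$((N_SIMS * RANKS_PER_SIM))

# Assumption: thread count taken as-is from the original command;
# lower it if 4 ranks x 24 threads exceeds the cores on the node.
NTOMP=24

# -npme 1 asks for one dedicated PME rank per simulation.
CMD="mpirun -n $TOTAL_RANKS gmx_mpi mdrun -deffnm bench -ntomp $NTOMP -nb gpu -pme gpu -npme 1 -update cpu -multidir ms0 ms1"

# Print the command instead of running it, so it can be inspected first.
echo "$CMD"
```

Once the printed line looks right, it can be run directly in place of the original command.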