Using GROMACS 2022 to run N replicas on ONE GPU

GROMACS version: 2022.2 (CUDA, MPI)
GROMACS modification: No
CUDA driver: 11.70
CUDA runtime: 11.60

Hello Gromacs Forum,

I am new to GROMACS and to replica exchange, and I would like to run temperature replica exchange. Based on the background research I have done, I selected GROMACS as my engine of choice.

I am reaching out because I found a very detailed response to a similar problem in this thread: [gmx-users] multi-replica runs with GPUs [fork of Re: Gromacs 2018 and GPU PME ]

The question asked there was what one should do with fewer GPUs than replicas.

For a given research problem, I have 44 replicas that I would like to simulate. I tried to follow the suggestions in the thread linked above, where the following advice was given:

#single node 2 GPUs 4 replicas 1 rank each 2-way sharing 
mpirun -np 4 gmx mdrun -multi 4 -pme gpu -nb gpu -gputasks 0011
mpirun -np 4 gmx mdrun -multi 4 # equivalent with the above assuming 2 GPUs

It can be worth trying at least 2-4 sims per GPU (especially if there are enough replicas and individual run performance is less important). What you **need** to make sure for performance reasons is that you have at least 1 core per GPU (cores, not hardware threads).
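To put the quoted guidance in numbers for 44 replicas, here is a small bash sketch. The function name `gpus_needed` is hypothetical (not part of GROMACS); it only does a ceiling division of replicas by simulations per GPU.

```shell
#!/usr/bin/env bash
# Hypothetical helper: number of GPUs needed to host N replicas when each
# GPU is shared by K simulations (ceiling division in integer arithmetic).
gpus_needed() {
  local replicas=$1 sims_per_gpu=$2
  echo $(( (replicas + sims_per_gpu - 1) / sims_per_gpu ))
}

# 44 replicas at the suggested 2-4 simulations per GPU
for k in 2 3 4; do
  echo "$k sims/GPU -> $(gpus_needed 44 "$k") GPUs"
done
```

At 2, 3, and 4 simulations per GPU this gives 22, 15, and 11 GPUs respectively; putting all 44 on one GPU (`gpus_needed 44 44`) gives 1, which is the case this post asks about.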

Here is what I have attempted:
mpirun -np 1 gmx_mpi mdrun -v -deffnm md_0 -multidir md_* -replex 1250 -reseed -1 -pme gpu -nb gpu -gputasks 0

where each md_* directory contains its own run input file, named md_0.tpr in every case.
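Because `-multidir md_*` relies on the shell expanding the glob to exactly the replica directories, a quick pre-flight check can confirm each one holds its md_0.tpr. A minimal bash sketch, assuming (hypothetically, since the post only shows the `md_*` pattern) that the directories are numbered md_0 through md_43; the function name `count_missing_tpr` is also hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check: count replica directories under a base
# directory that lack the shared run input file md_0.tpr.
# Assumes directories are named md_0 ... md_<N-1>.
count_missing_tpr() {
  local base=$1 nrepl=$2 missing=0 i
  for ((i = 0; i < nrepl; i++)); do
    if [[ ! -f "$base/md_$i/md_0.tpr" ]]; then
      echo "missing: $base/md_$i/md_0.tpr" >&2
      missing=$((missing + 1))
    fi
  done
  echo "$missing"
}

# Example: check 44 replicas in the current directory
echo "replicas missing md_0.tpr: $(count_missing_tpr . 44)"
```

A result of 0 means every replica directory has its input; otherwise the missing paths are listed on stderr.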

Whenever the value I pass to -np is not a multiple of 44, I get the error “The number of ranks (1) is not a multiple of the number of simulations (44).” (this example is from -np 1).

However, when I set the number of ranks to a multiple of the number of simulations, everything runs fine. But that is not my goal here: I would like to run more simulations than my current resources would normally allow.
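The rule behind the error message can be restated as a small bash check. The function name `valid_rank_count` is hypothetical, and it only encodes the condition stated in the error text; it is not GROMACS's own implementation.

```shell
#!/usr/bin/env bash
# Restates the rule from the error message: the number of MPI ranks must be
# a positive multiple of the number of simulations. Not GROMACS source code.
valid_rank_count() {
  local nranks=$1 nsims=$2
  (( nranks > 0 && nranks % nsims == 0 ))
}

# Try a few -np values for 44 replicas
nsims=44
for np in 1 22 44 88 132; do
  if valid_rank_count "$np" "$nsims"; then
    printf '%s\n' "-np $np: accepted ($(( np / nsims )) rank(s) per simulation)"
  else
    printf '%s\n' "-np $np: rejected"
  fi
done
```

Under this rule, the smallest accepted value for 44 replicas is `-np 44`, one rank per replica; any smaller -np value fails the check.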

I know that running 44 simulations simultaneously on a single GPU would not be the most efficient setup. But is there any way to run more replicas than I have computing resources for? Either option would be extremely helpful: running all 44 replicas on a single GPU, or running the 44 replicas on a number of processors that is not a multiple of 44.

Sincerely,
Austin Weigle
University of Illinois