Correct values of -ntomp and -ntmpi across multiple nodes (multi-replica simulation)

GROMACS version: 2023
GROMACS modification: patched with Plumed

Dear Experts,

Below is one PBS script for a multiple replica simulation that works, and one that does not work.

When I try to increase the number of cores, I also seem to need to change the mpirun flags and the values of -ntomp and/or -ntmpi. However, it is unclear from the online documentation what I need to change to get the simulation to work with 1248 cores.

Setting -ntomp to 1 and mpirun -np to 1248 does work below; however, GROMACS complains about the efficiency of the simulation. Any help would be appreciated.

Works:
#PBS -q normalsr
#PBS -v PROJECT=dd7
#PBS -l ncpus=104
#PBS -l mem=500gb
#PBS -l walltime=24:00:00
#PBS -l wd
#PBS -N meta-WTE
#PBS -l storage=scratch/dd7+gdata/dd7

echo $PWD
module load gromacs/2023

mpirun -np 96 gmx_mpi mdrun -v -deffnm meta -plumed plumed-WTE -multidir 00 01 02 03 04 05 06 07 08 09 10 11 -replex 25000 -ntomp 1 -pf meta_pullf.xvg -px meta_pullx.xvg -maxh 23.90 > meta-all.log

Does not work:
#PBS -q normalsr
#PBS -v PROJECT=dd7
#PBS -l ncpus=1248
#PBS -l mem=500gb
#PBS -l walltime=24:00:00
#PBS -l wd
#PBS -N meta-WTE
#PBS -l storage=scratch/dd7+gdata/dd7

echo $PWD
module load gromacs/2023

mpirun -np 1248 gmx_mpi mdrun -v -deffnm meta -plumed plumed-WTE -multidir 00 01 02 03 04 05 06 07 08 09 10 11 -replex 25000 -ntomp 1 -pf meta_pullf.xvg -px meta_pullx.xvg -maxh 23.90 > meta-all.log

What is the error you are encountering?

Note that -ntmpi is not the right flag: it should only be used for thread-MPI runs, whereas you have an MPI build (required for multisim).
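
As a rough sketch of the arithmetic for an MPI build: the number of ranks given to mpirun -np, multiplied by -ntomp, should equal the cores you requested, and the rank count has to be a multiple of the number of -multidir directories. The value ntomp=4 below is only an assumed example, not a recommendation for your system, and the variable names are made up for illustration. When using more than one OpenMP thread per rank you will probably also need rank placement/binding options for mpirun, which depend on your MPI implementation and are not shown here.

```shell
#!/bin/bash
# Hypothetical helper: derive an mpirun -np / -ntomp layout from the core count.
ncpus=1248      # total cores requested ("#PBS -l ncpus=1248")
nreplicas=12    # number of -multidir directories (00 ... 11)
ntomp=4         # ASSUMED OpenMP threads per MPI rank; tune for your system

# Total MPI ranks, chosen so that ranks * threads == requested cores.
if (( ncpus % ntomp != 0 )); then
    echo "ncpus ($ncpus) is not divisible by ntomp ($ntomp)" >&2
    exit 1
fi
np=$(( ncpus / ntomp ))

# With -multidir, the ranks are split evenly over the replicas,
# so the rank count must be a multiple of the number of replicas.
if (( np % nreplicas != 0 )); then
    echo "np ($np) is not a multiple of the replica count ($nreplicas)" >&2
    exit 1
fi
ranks_per_replica=$(( np / nreplicas ))

echo "MPI ranks (mpirun -np): $np"
echo "OpenMP threads per rank (-ntomp): $ntomp"
echo "MPI ranks per replica: $ranks_per_replica"
```

With these example numbers, each of the 12 replicas would get 26 ranks of 4 threads each, i.e. 104 cores per replica, instead of 104 single-threaded ranks.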