REMD wall-clock time is extremely slow despite the .log file reporting ~3 ns/day

Dear GROMACS users,

I am running 24-replica REMD simulations using GROMACS 2023 on our local cluster.
Each node has 48 cores and I launch the job as shown below:

----------------------------------------------------
export OMP_NUM_THREADS=1

for i in {0..23}; do
    mpirun -machinefile $PBS_NODEFILE -np 48 \
        /apps/codes/gromacs/2023/bin/gmx_mpi convert-tpr \
        -s replica_$i/remd_61_65.tpr -extend 5000 -o replica_$i/remd_66_70.tpr
done

mpirun -np 48 /apps/codes/gromacs/2023/bin/gmx_mpi mdrun \
    -multidir replica_{0..23} \
    -s remd_66_70.tpr -deffnm remd_66_70 -cpi \
    -replex 1000
----------------------------------------------------
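If I understand `-multidir` correctly, the MPI ranks are divided evenly among the replica directories. The sketch below only spells out that arithmetic for my launch line; the variable names are mine, and the even split is my assumption about how mdrun distributes ranks:

```shell
#!/bin/sh
# Resource split implied by the mdrun launch line (values copied from the job script).
total_ranks=48     # mpirun -np 48
n_replicas=24      # replica_0 .. replica_23
omp_threads=1      # OMP_NUM_THREADS

# Assumption: ranks are shared equally, so the rank count must be
# a multiple of the replica count.
if [ $((total_ranks % n_replicas)) -eq 0 ]; then
    ranks_per_replica=$((total_ranks / n_replicas))
    cores_per_replica=$((ranks_per_replica * omp_threads))
    echo "ranks per replica: $ranks_per_replica"
    echo "cores per replica: $cores_per_replica"
else
    echo "rank count is not a multiple of the replica count" >&2
fi
```

With these numbers each replica gets 2 ranks with 1 thread each, i.e. 2 cores per replica.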

Inside the *.log* files of the replicas, the reported performance is:

Performance: 3.11 ns/day

So I expected each 5 ns extension (`-extend 5000`, in ps) to finish in about **1.6 days**.
In practice, however, the job takes **15–20 days** of wall-clock time, roughly ten times longer than that estimate.
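For reference, this is how I arrived at the estimate and at the effective rate implied by the observed run time. It is only a sketch of the arithmetic; the variable names are mine, and it assumes the 3.11 ns/day figure from the log would hold for the whole run:

```shell
#!/bin/sh
# Expected vs. observed throughput for one extension.
extend_ps=5000       # value passed to convert-tpr -extend (picoseconds)
ns_per_day=3.11      # "Performance" value reported in the replica log

# Expected wall-clock time if the reported rate were sustained.
expected_days=$(awk -v ps="$extend_ps" -v rate="$ns_per_day" \
    'BEGIN { printf "%.2f", (ps / 1000) / rate }')
echo "expected wall time: $expected_days days"

# Effective rate implied by the observed 15 and 20 day run times.
for observed_days in 15 20; do
    effective=$(awk -v ps="$extend_ps" -v d="$observed_days" \
        'BEGIN { printf "%.2f", (ps / 1000) / d }')
    echo "observed $observed_days days -> $effective ns/day"
done
```

So the effective throughput is about 0.25 to 0.33 ns/day, compared with the 3.11 ns/day printed in the log.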

Any guidance on why the wall-clock time differs so much from the reported performance, and on how to run REMD more efficiently, would be greatly appreciated.

Best regards,  
Sankar Maity
NIT ROURKELA