[MD Simulation Error] Fatal PME Rank Issue: Particles Moving Out of Domain Decomposition Cell

GROMACS version: VERSION 2023.3-plumed_2.9.0

I am running a GROMACS simulation of a 7.5-million-atom all-atom system using gmx mdrun (version 2023.3-plumed_2.9.0) with MPI parallelization (40 ranks), and I encountered the following fatal error:

Program:     gmx mdrun, version 2023.3-plumed_2.9.0
Source file: src/gromacs/ewald/pme_redistribute.cpp (line 305)
MPI rank:    18 (out of 40)

Fatal error:
3 particles communicated to PME rank 18 are more than 2/3 times the cut-off
out of the domain decomposition cell of their charge group in dimension y.
This usually means that your system is not well equilibrated.

Program:     gmx mdrun, version 2023.3-plumed_2.9.0
Source file: src/gromacs/ewald/pme_redistribute.cpp (line 305)
MPI rank:    38 (out of 40)

Fatal error:
5 particles communicated to PME rank 38 are more than 2/3 times the cut-off
out of the domain decomposition cell of their charge group in dimension y.
This usually means that your system is not well equilibrated.

However, the system is well equilibrated and runs fine with a lower MPI rank count.
My mdp file:

integrator              = md
dt                      = 0.002
nsteps                  = 250000000
nstxout                 = 0
nstxout-compressed      = 50000 
nstvout                 = 0
nstfout                 = 0
nstcalcenergy           = 1000
nstenergy               = 50000
nstlog                  = 50000
;
cutoff-scheme           = Verlet
nstlist                 = 20
rlist                   = 1.2
coulombtype             = pme
rcoulomb                = 1.2
vdwtype                 = Cut-off
vdw-modifier            = Force-switch
rvdw_switch             = 1.0
rvdw                    = 1.2
;
tcoupl                  = v-rescale
tc_grps                 = LIPIDS SOLVENTS
tau_t                   = 0.1 0.1
ref_t                   = 310 310
;
pcoupl                  = Parrinello-Rahman
pcoupltype              = isotropic
tau_p                   = 5.0
compressibility         = 4.5e-5
ref_p                   = 1.0
;
constraints             = h-bonds
constraint_algorithm    = LINCS
continuation            = no
lincs-order             = 4
lincs-iter              = 1
;
gen-vel                 = yes
gen-seed                = 234353
gen-temp                = 310
;
nstcomm                 = 100
comm_mode               = linear
comm_grps               = System
;
refcoord_scaling        = com
; Pull parameters
pull 			= no

My SLURM run script is:

#SBATCH --nodes=10
#SBATCH --ntasks-per-node=4   # 4 MPI tasks per node (one per GPU)
#SBATCH --cpus-per-task=6     # 6 OpenMP threads per MPI task
#SBATCH --gres=gpu:4          # Request 4 GPUs per node

mpirun --bind-to none -np 40 gmx_mpi mdrun -deffnm production -dlb yes -ntomp 6 -npme auto -nb gpu -pin on  -v

To address this, I have tried the following steps:

  1. Performed energy minimization successfully.
  2. Extended the NVT and NPT equilibration, but the issue still persists.
  3. Adjusted the PME ranks (-npme auto as well as manual settings; see the example after this list) without success.
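
For reference, a manual-PME variant of the run command looked roughly like the line below; the explicit PME rank count varied between attempts, so the value shown here is only illustrative, not the exact setting I used:

mpirun --bind-to none -np 40 gmx_mpi mdrun -deffnm production -dlb yes -ntomp 6 -npme 8 -nb gpu -pin on -v   # "-npme 8" is an example value; 32 PP + 8 PME = 40 ranks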

Given these observations, could you suggest additional ways to troubleshoot or optimize my domain decomposition settings, PME parameters, or simulation stability? Are there any specific .mdp settings or MPI decomposition strategies that could help mitigate this issue?

Any insights would be greatly appreciated! Thank you.

I suggest first checking whether a non-PLUMED build of a more recent GROMACS release, e.g. 2024.5, runs without issues.
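
Something along these lines, keeping essentially your current mdrun options; the module name is only a placeholder for however the plain (non-PLUMED) build is provided on your cluster, and I have left out -npme so that mdrun picks the PME rank count itself:

module load gromacs/2024.5    # placeholder; substitute your cluster's plain GROMACS 2024.5 build
mpirun --bind-to none -np 40 gmx_mpi mdrun -deffnm production -dlb yes -ntomp 6 -nb gpu -pin on -v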

I tried it with the newer version and without PLUMED, but it gives the same error. However, if I decrease the number of ranks, it runs without throwing any errors.
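
For example, a reduced-rank run along these lines completes without errors; the exact count is not important here, 20 ranks (2 per node) is just an illustration of the kind of reduction that works for me:

mpirun --bind-to none -np 20 gmx_mpi mdrun -deffnm production -dlb yes -ntomp 6 -nb gpu -pin on -v   # half the ranks; illustrative, not the exact count I used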