I am trying to run a gas-phase simulation (i.e. in vacuum, without solvent) with GROMACS 2021.5 on a high-end cluster. mdrun starts fine, but after a couple of thousand steps the job ends, producing multiple numbered output and checkpoint files, and the log file says:
How do you launch your simulation? The fact that several numbered output files are created suggests to me that several separate simulations (four?) are being started on the cluster, instead of a single simulation using that number of MPI ranks.
Please post your job submission script (if you have one) or the exact command used to launch the simulation.
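To illustrate what I mean (a minimal sketch, assuming your module provides both a thread-MPI gmx and an MPI-enabled gmx_mpi binary, and a run input named md.tpr):

# Thread-MPI (non-MPI) build under mpirun: starts 4 independent copies of mdrun,
# which all write the same file names and back each other's outputs up
mpirun -np 4 gmx mdrun -deffnm md

# MPI-enabled build: one simulation decomposed over 4 MPI ranks
mpirun -np 4 gmx_mpi mdrun -deffnm md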
; Output control
nstxtcout = 5000
xtc_precision = 5000
nstxout = 0 ; suppress bulky .trr file by specifying
nstvout = 0 ; 0 for output frequency of nstxout,
nstfout = 0 ; nstvout, and nstfout
nstenergy = 5000 ; save energies every 10.0 ps
nstlog = 5000 ; update log file every 10.0 ps
compressed-x-grps = System ; save the whole system
; Bond parameters
continuation = yes ; Restarting after NVT
constraint_algorithm = lincs ; holonomic constraints
constraints = h-bonds ; bonds involving H are constrained
lincs_iter = 1 ; accuracy of LINCS
lincs_order = 8 ; also related to accuracy
; Neighborsearching
cutoff-scheme = Verlet ; Buffered neighbor searching
ns_type = grid ; search neighboring grid cells
nstlist = 50 ; 100 fs, largely irrelevant with Verlet scheme
rcoulomb = 333.3 ; short-range electrostatic cutoff (in nm)
rvdw = 333.3 ; short-range van der Waals cutoff (in nm)
; Electrostatics
coulombtype = Cut-off ; Plain cut-off with neighborlist radius and Coulomb cut-off
pme_order = 4 ; cubic interpolation
fourierspacing = 0.16 ; grid spacing for FFT
; Temperature coupling is on
tcoupl = V-rescale ; modified Berendsen thermostat
tc-grps = Protein ; single coupling group (no solvent in vacuum)
tau_t = 0.5 ; time constant, in ps
ref_t = 300 ; reference temperature, in K
; Pressure coupling is off
pcoupl = no ; No pressure coupling in NVT
pbc = xyz ; 3-D PBC
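For completeness, I build the run input from this .mdp with an ordinary grompp call (the file names below are placeholders, not my actual ones):

gmx grompp -f md.mdp -c nvt.gro -t nvt.cpt -p topol.top -o md.tpr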
Job submission script:
#SBATCH -J Gromacs
#SBATCH --job-name=XXX_5ns
#SBATCH --partition=big              # Choose either "gpu" or "infer" node type
#SBATCH --nodes=1                    # Resources from one node
#SBATCH --gres=gpu:4                 # Four GPUs per node (plus 100% of node CPU and RAM per node)
#SBATCH --ntasks-per-node=4
#SBATCH --gres-flags=enforce-binding
#SBATCH --mail-type=ALL
#SBATCH --mail-user=xxx@xx.xx.xx
Run commands:
module purge # Removes all modules still loaded
module load gromacs/2021.5
module load openmpi/4.0.5
Thank you so much! This was very helpful. I removed "mpirun" from the command, since this GROMACS build runs without an external MPI library, and it worked!
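For anyone who finds this later, the working launch line is essentially just the plain mdrun call with mpirun dropped (run name and thread counts below are placeholders, not my exact values):

gmx mdrun -deffnm md -ntmpi 4 -ntomp 8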
I am encountering the same problem (failure after the first checkpoint backup, as well as lower performance than expected) when using the command mpirun -np 8 followed by the standard gmx binary.
I then tried running gmx_mpi with the cluster's srun, but no matter what I did I could not avoid the error about the number of ranks having to be a multiple of the number of simulations, despite the SLURM set-up.
Since both variations of the mdrun command fail, I have no clue what the problem could be.
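For reference, what I am trying to end up with is a single simulation spread over all the requested ranks, i.e. something along these lines (the run name and task counts are just placeholders, not my exact set-up):

#SBATCH --ntasks=8
#SBATCH --cpus-per-task=4

srun gmx_mpi mdrun -deffnm md -ntomp $SLURM_CPUS_PER_TASK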