File input/output error: Cannot rename checkpoint file; maybe you are out of disk space?

GROMACS version: 2021.5
GROMACS modification: No

I am trying to run a gas-phase simulation (i.e. in vacuum, without solvent) with GROMACS 2021.5 on a high-end cluster. The mdrun starts fine, but after a couple of thousand steps the job ends, producing multiple numbered output and checkpoint files, and the log file says:

#test_md.edr.1#
#test_md.edr.2#
#test_md.edr.3#
#test_md.xtc.1#
#test_md.xtc.2#
#test_md.xtc.3#
#test_md_step291000.cpt.1#

Program: gmx mdrun, version 2021.5
Source file: src/gromacs/mdlib/mdoutf.cpp (line 463)

File input/output error:
Cannot rename checkpoint file; maybe you are out of disk space?

I have not been able to find a solution for this anywhere. What could be causing this error? Any help is very much appreciated.

How do you launch your simulation? The fact that several numbered output files are created suggests to me that several separate simulations (4?) are being started on the cluster, instead of a single simulation using that number of MPI ranks.

Please post your job submission script (if you have one) or the exact command used to launch the simulation.

This is my .mdp file:

title = OPLS NVT equilibration
; Run parameters
integrator = md ; Leap-frog integrator
nsteps = 5000000 ; 0.001 * 5000000 = 5000 ps (5 ns)
dt = 0.001 ; 1 fs

; Output control
nstxtcout = 5000
xtc_precision = 5000
nstxout = 0 ; suppress bulky .trr file by specifying
nstvout = 0 ; 0 for output frequency of nstxout,
nstfout = 0 ; nstvout, and nstfout
nstenergy = 5000 ; save energies every 5.0 ps
nstlog = 5000 ; update log file every 5.0 ps
compressed-x-grps = System ; save the whole system

; Bond parameters
continuation = yes ; Restarting after NVT
constraint_algorithm = lincs ; holonomic constraints
constraints = h-bonds ; bonds involving H are constrained
lincs_iter = 1 ; accuracy of LINCS
lincs_order = 8 ; also related to accuracy

; Neighborsearching
cutoff-scheme = Verlet ; Buffered neighbor searching
ns_type = grid ; search neighboring grid cells
nstlist = 50 ; 50 fs, largely irrelevant with Verlet scheme
rcoulomb = 333.3 ; short-range electrostatic cutoff (in nm)
rvdw = 333.3 ; short-range van der Waals cutoff (in nm)

; Electrostatics
coulombtype = Cut-off ; Plain cut-off with neighborlist radius and Coulomb cut-off
pme_order = 4 ; cubic interpolation
fourierspacing = 0.16 ; grid spacing for FFT

; Temperature coupling is on
tcoupl = V-rescale ; modified Berendsen thermostat
tc-grps = Protein ; one coupling group
tau_t = 0.5 ; time constant, in ps
ref_t = 300 ; reference temperature, in K

; Pressure coupling is off
pcoupl = no ; No pressure coupling in NVT
pbc = xyz ; 3-D PBC

; Dispersion correction
DispCorr = EnerPres ; account for cut-off vdW scheme

; Velocity generation
gen_vel = no ; Velocity generation is off

And my submission script is as follows:

#!/bin/bash

# Generic options:

#SBATCH --time=24:00:00 # Run for a max of 24 hours

# Node resources:

#SBATCH -J Gromacs
#SBATCH --job-name=XXX_5ns
#SBATCH --partition=big # Choose either "gpu" or "infer" node type
#SBATCH --nodes=1 # Resources from one node
#SBATCH --gres=gpu:4 # Four GPUs per node (plus 100% of the node's CPU and RAM)
#SBATCH --ntasks-per-node=4
#SBATCH --gres-flags=enforce-binding
#SBATCH --mail-type=ALL
#SBATCH --mail-user=xxx@xx.xx.xx

# Run commands:

module purge # Removes all modules still loaded
module load gromacs/2021.5
module load openmpi/4.0.5

#export GMX_GPU_DD_COMMS=true
#export GMX_GPU_PME_PP_COMMS=true
#export GMX_FORCE_UPDATE_DEFAULT_GPU=true

mpirun -np ${SLURM_NTASKS_PER_NODE} gmx mdrun -s 1u7g_md.tpr -deffnm 1u7g_md -o 1u7g_md.trr -c 1u7g_md_confout.gro -e 1u7g_md.edr -g md.log -nb gpu -bonded gpu

echo "end of job"

You don't seem to be using an MPI-enabled version of GROMACS; that binary is typically called gmx_mpi, not just gmx.

If so, mpirun launches several separate copies of the simulation, which likely causes your error, as they all try to write to the same files at the same time.
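A quick way to check which flavour you have is something like the following (a sketch only; the module name is taken from your script, and the exact wording of the version output may differ between installations):

module load gromacs/2021.5
# An MPI-enabled build normally installs a separate gmx_mpi binary
which gmx_mpi
# The version banner reports the MPI library the binary was built against:
# a real MPI build reports "MPI library: MPI", the default build "thread_mpi"
gmx --version | grep -i "MPI library"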

The solution is to run gmx_mpi. If that is not installed, compile Gromacs with -DGMX_MPI=on when running cmake. More information here: https://manual.gromacs.org/current/install-guide/index.html#id1
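As an illustration only (file and module names taken from your script, and the rank count is just an example), the run section would then look roughly like:

module load gromacs/2021.5
module load openmpi/4.0.5

# One MPI rank per GPU, all ranks belonging to a single simulation
mpirun -np ${SLURM_NTASKS_PER_NODE} gmx_mpi mdrun -s 1u7g_md.tpr -deffnm 1u7g_md -nb gpu -bonded gpu

And if you have to build the MPI flavour yourself, the configure step is roughly (the GPU flag shown assumes a CUDA build):

cmake .. -DGMX_MPI=on -DGMX_GPU=CUDA
make && make install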

Thank you so much! This was very helpful. I removed mpirun from the command, since this GROMACS build runs without an external MPI library, and it worked!
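Concretely, that just means dropping mpirun from the launch line, roughly like this (a sketch based on the script above; GROMACS chooses the thread-MPI rank count automatically, or it can be set explicitly with -ntmpi):

gmx mdrun -s 1u7g_md.tpr -deffnm 1u7g_md -nb gpu -bonded gpu
# optionally pin one thread-MPI rank per GPU:
gmx mdrun -ntmpi 4 -s 1u7g_md.tpr -deffnm 1u7g_md -nb gpu -bonded gpu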

Dear Gromacs users,

I am encountering the same problem (failure after the first checkpoint backup, in addition to lower-than-expected performance) when using the command mpirun -np 8 followed by the standard gmx binary.

I then tried to run gmx_mpi with the cluster's srun, but no matter what I tried I could not avoid the error about the number of ranks (not being a multiple of the number of simulations), despite the SLURM set-up.

Since neither variation of the mdrun command works, I have no clue what the problem could be.

Thanks for the suggestions.