I am trying to run a gas-phase simulation (i.e. in vacuum, without solvent) with GROMACS 2021.5 on a high-end cluster. mdrun starts fine, but after a couple of thousand steps the job ends, producing multiple numbered output and checkpoint files, and the log file says:
How do you launch your simulation? The fact that several numbered output files are created suggests to me that several separate simulations (four?) are being started on the cluster, instead of a single simulation using that number of MPI ranks.
Please post your job submission script (if you have one) or the exact command used to launch the simulation.
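To illustrate what I mean (a minimal sketch, assuming your module provides both a thread-MPI gmx and an MPI-enabled gmx_mpi binary, and a run input named md.tpr):

# Thread-MPI (non-MPI) build under mpirun: starts 4 independent copies of mdrun,
# which all write the same file names and back each other's outputs up
mpirun -np 4 gmx mdrun -deffnm md

# MPI-enabled build: one simulation decomposed over 4 MPI ranks
mpirun -np 4 gmx_mpi mdrun -deffnm md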
; Output control
nstxtcout = 5000
xtc_precision = 5000
nstxout = 0 ; suppress bulky .trr file by specifying
nstvout = 0 ; 0 for output frequency of nstxout,
nstfout = 0 ; nstvout, and nstfout
nstenergy = 5000 ; save energies every 10.0 ps
nstlog = 5000 ; update log file every 10.0 ps
compressed-x-grps = System ; save the whole system
; Bond parameters
continuation = yes ; Restarting after NVT
constraint_algorithm = lincs ; holonomic constraints
constraints = h-bonds ; bonds involving H are constrained
lincs_iter = 1 ; accuracy of LINCS
lincs_order = 8 ; also related to accuracy
; Neighborsearching
cutoff-scheme = Verlet ; Buffered neighbor searching
ns_type = grid ; search neighboring grid cells
nstlist = 50 ; 100 fs, largely irrelevant with Verlet scheme
rcoulomb = 333.3 ; short-range electrostatic cutoff (in nm)
rvdw = 333.3 ; short-range van der Waals cutoff (in nm)
; Electrostatics
coulombtype = Cut-off ; Plain cut-off with neighborlist radius and Coulomb cut-off
pme_order = 4 ; cubic interpolation
fourierspacing = 0.16 ; grid spacing for FFT
; Temperature coupling is on
tcoupl = V-rescale ; modified Berendsen thermostat
tc-grps = Protein ; single coupling group (no solvent in vacuum)
tau_t = 0.5 ; time constant, in ps
ref_t = 300 ; reference temperature, in K
; Pressure coupling is off
pcoupl = no ; No pressure coupling in NVT
pbc = xyz ; 3-D PBC
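For completeness, I build the run input from this .mdp with an ordinary grompp call (the file names below are placeholders, not my actual ones):

gmx grompp -f md.mdp -c nvt.gro -t nvt.cpt -p topol.top -o md.tpr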
Job submission script:
#SBATCH -J Gromacs
#SBATCH --job-name=XXX_5ns
#SBATCH --partition=big              # Choose either "gpu" or "infer" node type
#SBATCH --nodes=1                    # Resources from one node
#SBATCH --gres=gpu:4                 # Four GPUs per node (plus 100% of node CPU and RAM per node)
#SBATCH --ntasks-per-node=4
#SBATCH --gres-flags=enforce-binding
#SBATCH --mail-type=ALL
#SBATCH --mail-user=xxx@xx.xx.xx
Run commands:
module purge # Removes all modules still loaded
module load gromacs/2021.5
module load openmpi/4.0.5
Thank you so much! This was very helpful. I removed "mpirun" from the command, since this GROMACS build runs without an external MPI library, and it worked!
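For anyone who finds this later, the working launch line is essentially just the plain mdrun call with mpirun dropped (run name and thread counts below are placeholders, not my exact values):

gmx mdrun -deffnm md -ntmpi 4 -ntomp 8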
I am encountering the same problem (failure after the first checkpoint backup, as well as lower performance than expected) when using the command mpirun -np 8 followed by the standard gmx binary.
I then tried running gmx_mpi with the cluster's srun, but no matter what I did I could not avoid the error about the number of ranks having to be a multiple of the number of simulations, despite the SLURM set-up.
Since both variations of the mdrun command fail, I have no clue what the problem could be.
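For reference, what I am trying to end up with is a single simulation spread over all the requested ranks, i.e. something along these lines (the run name and task counts are just placeholders, not my exact set-up):

#SBATCH --ntasks=8
#SBATCH --cpus-per-task=4

srun gmx_mpi mdrun -deffnm md -ntomp $SLURM_CPUS_PER_TASK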