Checkpoint file is not written

GROMACS version: 2022.5
GROMACS modification: No
Hello,
I am using an Umbrella Sampling/Simulated Tempering strategy. Since I started using gmx-2022.5 I have observed that the checkpoint file is no longer written. The log file prints:

Expanded ensemble with the legacy simulator does not always checkpoint correctly, so checkpointing is disabled. You will not be able to do a checkpoint restart of this simulation. If you use the modular simulator (e.g. by choosing md-vv integrator) then checkpointing is enabled. See GROMACS issue #4629 ("GROMACS 2022.3 has issues with checkpointing expanded ensemble simulations") on the GROMACS GitLab for details.

I went to the issue tracker, but in the end I did not understand what the best option is in my case: go back to a previous GROMACS version where the checkpoint was still written (though with some bugs whose repercussions for my case I do not know), or add some settings to the MDP file that mitigate the issue so that the checkpoint is written. I am already using the md-vv integrator. I also do not understand what the modular simulator and the legacy simulator referenced in the printed note are.

This should not happen; the modular simulator should be chosen automatically.

You can force the use of the modular simulator by setting this environment variable when running mdrun: GMX_USE_MODULAR_SIMULATOR=ON
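
A minimal sketch of two common ways to pass the variable to mdrun (the file names here are only placeholders): export it in the shell that launches mdrun, or set it just for the one mdrun invocation.

export GMX_USE_MODULAR_SIMULATOR=ON
gmx mdrun -deffnm production

# or equivalently, only for this single command:
GMX_USE_MODULAR_SIMULATOR=ON gmx mdrun -deffnm production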

Hi @hess. Thanks for your reply! I set GMX_USE_MODULAR_SIMULATOR=ON in my job script and then called mdrun as: gmx mdrun -nt 12 -cpi -stepout 5000 -v -deffnm production -px production_pullx -pf production_pullf >& production.lis. But I am still not getting the checkpoint file, which should be production.cpt. Here are some fragments of my MDP file:

define                                   = -DPOSRES -DPOSRES_FC_BB=0.0 -DPOSRES_FC_SC=0.0 -DPOSRES_FC_LIPID=0.0 -DDIHRES -DDIHRES_FC=0.0 -DPOSRES_LIG=0.0
integrator                               = md-vv
dt                                       = 0.004
tinit                                    = 0
nsteps                                   = 37500000
nstcomm                                  = 100
nstxout                                  = 0
nstvout                                  = 0
nstfout                                  = 0
nstcalcenergy                            = 100
nstenergy                                = 5000
nstlog                                   = 18750
nstxout_compressed                       = 18750
(...)
; Simulated Tempering
free_energy                              = expanded
init_lambda_state                        = 0
nstdhdl                                  = 50
temperature_lambdas                      = 0.0 0.06666666666666667 0.13333333333333333 0.2 0.26666666666666666 0.3333333333333333 0.4 0.4666666666666667 0.5333333333333333 0.6 0.6666666666666666 0.7333333333333333 0.8 0.8666666666666667 0.9333333333333333 1.0
init_lambda_weights                      = 0.0 3881.13794 7695.36963 11455.08984 15156.50684 18796.63477 22372.41992 25899.44141 29367.31055 32777.19531 36129.94531 39438.44922 42699.17188 45909.59375 49072.89062 52181.53125
simulated_tempering                      = yes
simulated_tempering_scaling              = linear
sim_temp_low                             = 303.15
sim_temp_high                            = 333.15
nstexpanded                              = 100
lmc_stats                                = wang-landau
lmc_move                                 = metropolis
lmc_weights_equil                        = no
wl_scale                                 = 0.999999
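
For reference, assuming linear simulated_tempering_scaling interpolates the temperature between sim_temp_low and sim_temp_high via temperature_lambdas, a lambda value λ corresponds to T = 303.15 K + λ · (333.15 K − 303.15 K); e.g. the second state, λ ≈ 0.0667, runs at about 305.15 K.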

Then I suppose you are doing something wrong in setting the environment variable. Unfortunately mdrun doesn’t print which simulator it is using.
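
Since mdrun doesn't report the simulator choice, one indirect check (just a sketch, assuming the default file names that follow from -deffnm production) is to look for the "checkpointing is disabled" note in the log and to see whether the checkpoint file appears once the run has been going for a while:

grep -i "checkpoint" production.log    # the disable note should no longer appear
ls -l production.cpt                   # should exist after the first checkpoint is written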

Do you need to export the env var?
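
For context, a plain assignment is local to the shell; a child process such as gmx only sees the variable after it has been exported. A minimal sketch:

GMX_USE_MODULAR_SIMULATOR=ON                              # not exported: child processes do not see it
bash -c 'echo "${GMX_USE_MODULAR_SIMULATOR:-not set}"'    # prints "not set"

export GMX_USE_MODULAR_SIMULATOR=ON                       # exported: inherited by child processes
bash -c 'echo "${GMX_USE_MODULAR_SIMULATOR:-not set}"'    # prints "ON"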

This is my script:

#!/bin/bash
#SBATCH --job-name=15_production
#SBATCH --output=myjob.out
#SBATCH --error=myjob.err
#SBATCH --nice=0
#SBATCH --nodes=1
#SBATCH --gpus=1
#SBATCH --cpus-per-task=12
#SBATCH --partition=deflt
#SBATCH --exclude=fang41,fang49
#SBATCH --time=2-00:00:00


# This block is echoing some SLURM variables
echo "Job execution start: $(date)"
echo "JobID = $SLURM_JOBID"
echo "Host = $SLURM_JOB_NODELIST"
echo "Jobname = $SLURM_JOB_NAME"
echo "Subcwd = $SLURM_SUBMIT_DIR"
echo "SLURM_TASKS_PER_NODE = $SLURM_TASKS_PER_NODE"
echo "SLURM_CPUS_PER_TASK = $SLURM_CPUS_PER_TASK"
echo "SLURM_CPUS_ON_NODE = $SLURM_CPUS_ON_NODE"

source /data/shared/spack-0.19.1/shared.bash
module load gromacs/2022.5

cd $(pwd)

export GMX_USE_MODULAR_SIMULATOR=ON
echo "GMX_USE_MODULAR_SIMULATOR = $GMX_USE_MODULAR_SIMULATOR"
gmx mdrun -nt 12 -cpi -stepout 5000 -v -deffnm production -px production_pullx -pf production_pullf  >& production.lis

The line

echo "GMX_USE_MODULAR_SIMULATOR = $GMX_USE_MODULAR_SIMULATOR"

is wrong. There must not be white space before and after the "=" sign. In your line, you try to execute a command GMX_USE_MODULAR_SIMULATOR with the two arguments "=" and "$GMX_USE_MODULAR_SIMULATOR":

A=5      # correct: assigns 5 to A
A = 5    # wrong: bash parses "A" as a command with arguments "=" and "5"
-bash: A: command not found
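
Applied to the variable in question, the same rule looks like this (just a sketch restating the point above):

GMX_USE_MODULAR_SIMULATOR=ON      # correct: assigns the value in the current shell
GMX_USE_MODULAR_SIMULATOR = ON    # wrong: bash looks for a command named GMX_USE_MODULAR_SIMULATOR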