GROMACS version: 2024
GROMACS modification: No
I am attempting to simulate a fairly straightforward system (folded protein + tip3p water + charge-balancing ions) with a 4 fs time step using HMR. In different minimization runs I was able to get Fmax < 500, < 100, and even < 50, so I don't think my problem is insufficient energy minimization.
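For reference, the minimization .mdp looks roughly like this (a sketch from memory, not a verbatim copy; emtol is what I lowered between the runs that reached Fmax < 500, < 100, and < 50):

; minim.mdp (sketch; values approximate)
integrator     = steep
emtol          = 50.0        ; also ran with 500 and 100
emstep         = 0.01
nsteps         = 50000
cutoff-scheme  = Verlet
coulombtype    = PME
rcoulomb       = 1.0
rvdw           = 1.0
pbc            = xyz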
NVT equilibration with a 4 fs time step and position restraints crashes almost immediately (with several different errors; see below), so I tried increasing the time step gradually over multiple equilibration runs. 500,000 steps of 2 fs nvtEQ ran without problems, as did another 1,000,000 steps at 3 fs. Total energy and temperature appear nicely converged. However, a 4 fs nvtEQ run started from the end of the 3 fs run (coordinates and velocities) crashes within 5,000 steps. So, am I still not minimizing / equilibrating the system enough to run with a 4 fs time step, or does something about the topology of the system make it fundamentally unstable with a time step that long?
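For context, here is a sketch of how the equilibration runs are chained (file names are placeholders, only dt and nsteps differ between the 2, 3, and 4 fs .mdp files, and the settings shown are my best reconstruction rather than an exact copy of my inputs):

; nvt_4fs.mdp (sketch)
define                = -DPOSRES
integrator            = md
dt                    = 0.004        ; 0.002 and 0.003 in the earlier runs
nsteps                = 500000
constraints           = h-bonds
constraint-algorithm  = lincs
lincs-iter            = 1
lincs-order           = 4
cutoff-scheme         = Verlet
coulombtype           = PME
tcoupl                = V-rescale
tc-grps               = Protein Non-Protein
tau-t                 = 0.1 0.1
ref-t                 = 300 300
gen-vel               = no           ; velocities carried over from the 3 fs run

# 4 fs run started from the end of the 3 fs run (coordinates + velocities via the checkpoint)
gmx grompp -f nvt_4fs.mdp -c nvt_3fs.gro -r em.gro -t nvt_3fs.cpt -p topol.top -o nvt_4fs.tpr
gmx mdrun -deffnm nvt_4fs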
@MagnusL, the error behavior I’m observing is similar to what Ryan observed in his recent post, so I am tagging you here.
The runs have crashed in at least four different ways, including a core dump. One of the errors points to a hydrogen atom in a water molecule near the edge of the box, nowhere near the protein:
Step 2240:
The update group starting at atom 21709 moved more than the distance
allowed by the domain decomposition (2.204964) in direction X
distance out of cell 6.098123
Old coordinates: 22.099 17.146 11.171
New coordinates: 18.048 4.994 5.442
Old cell boundaries in direction X: 5.401 8.102
New cell boundaries in direction X: 5.401 8.102
Program: gmx mdrun, version 2024
Source file: src/gromacs/domdec/redistribute.cpp (line 219)
MPI rank: 5 (out of 7)
Fatal error:
One or more atoms moved too far between two domain decomposition steps.
This usually means that your system is not well equilibrated.
At other times the error is this:
step 1400, remaining wall clock time: 167 s imb F 3% pme/F 0.81
Program: gmx mdrun, version 2024
Source file: src/gromacs/gpu_utils/device_stream.cu (line 107)
Function: DeviceStream::synchronize() const::<lambda()>
MPI rank: 6 (out of 7)
Assertion failed:
Condition: stat == cudaSuccess
cudaStreamSynchronize failed. CUDA error #700 (cudaErrorIllegalAddress):
an illegal memory access was encountered.
or this:
step 1200, remaining wall clock time: 175 s imb F 3% pme/F 0.84
Program: gmx mdrun, version 2024
Source file: src/gromacs/mdlib/sim_util.cpp (line 555)
Function: void checkPotentialEnergyValidity(int64_t, const
gmx_enerdata_t&, const t_inputrec&)
MPI rank: 0 (out of 7)
Internal error (bug):
Step 1300: The total potential energy is nan, which is not finite. The LJ and
electrostatic contributions to the energy are 0 and -429026, respectively. A
non-finite potential energy can be caused by overlapping interactions in
bonded interactions or very large or Nan coordinate values. Usually this is
caused by a badly- or non-equilibrated initial configuration, incorrect
interactions or parameters in the topology.