Bug report

GROMACS version: 2021 - MODIFIED
GROMACS modification: Yes - noted as modified on LUMI

I got this error from GROMACS, and the message says it is a bug. Any suggestions?

starting mdrun 'Title'
500000000000 steps, 1000000000.0 ps (continuing from step 69799500, 139599.0 ps).
step 69799500
Program:     gmx mdrun, version 2021-MODIFIED
MPI rank:    0 (out of 640)

Standard library logic error (bug):
(exception type: St12out_of_range)
basic_string::erase: __pos (which is 18446744073709551615) > this->size()
(which is 1024)

For more information and tips for troubleshooting, please check the GROMACS
website at http://www.gromacs.org/Documentation/Errors
MPICH Notice [Rank 0] [job id 929543.0] [Fri Mar 18 17:47:34 2022] [nid001458] - Abort(1) (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0

aborting job:
application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
srun: error: nid001458: task 0: Exited with exit code 255
srun: launch/slurm: _step_signal: Terminating StepId=929543.0
slurmstepd: error: *** STEP 929543.0 ON nid001458 CANCELLED AT 2022-03-18T17:47:34 ***


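This is an error thrown from inside the C++ standard library, likely because some integer value overflowed or wrapped around and caused trouble: the position 18446744073709551615 in the message is 2^64 - 1, which is what -1 becomes when stored in an unsigned 64-bit size_t.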

Could you share the TPR and checkpoint files so I can try to reproduce this? Of course, this shouldn't happen during a normal run.



Hi Paul,

I literally just copied the files from LUMI to Dardel and ran the simulation there, and it works perfectly fine.

So I guess this is not a GROMACS issue but a cluster issue? Should I just let the LUMI support team know?



Hello Will,

Did you use the same Slurm settings, number of ranks and so on? If so, it might be an issue with the modified GROMACS build on LUMI. Otherwise, I would first try to reproduce it fully, to make sure we are not doing something wrong during restarts.