Signalling bug in gro file writer

The *.gro file created by gmx has a problem.
(I’m using both gmx_mpi version 2023.3 and gmx 2024.2, and the bug appears the same.)

The third whitespace-separated column of my_config.gro contains the atom number, say num_atom. When num_atom is between 10000 and 99999, it abuts the second column (the atom name). Technically this is not yet a bug, because MATLAB and Fortran can read it easily, but for CSV readers it is a complication, because the number of columns appears to change from 9 to 8.
For num_atom from 100000 to 109999, the atom numbering is wrong: it restarts at 0, 1, 2, …, which leaves enough space for 9 columns again, but with the wrong atom number. From 110000 atoms on, the columns join again, and this pattern repeats.
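The symptom can be reproduced with a short sketch that builds atom lines using the fixed .gro field widths and counts the fields a whitespace-splitting (CSV-style) reader would see. The modulo-100000 wrap is an assumption chosen to match the 0, 1, 2, … restart reported above, and `gro_atom_line` / `naive_field_count` are hypothetical helpers, not GROMACS code.

```python
# Sketch reproducing the symptom: fixed-width .gro atom lines versus a reader
# that splits on whitespace. Assumption: numbers are printed modulo 100000,
# matching the reported restart at 0, 1, 2, ... (hypothetical helpers).

def gro_atom_line(resnr, resname, atomname, atomnr, xyz, vel):
    """One atom line: %5d%-5s%5s%5d, then 8-wide positions and velocities."""
    return "%5d%-5s%5s%5d%8.3f%8.3f%8.3f%8.4f%8.4f%8.4f" % (
        resnr % 100000, resname, atomname, atomnr % 100000, *xyz, *vel)


def naive_field_count(line):
    """Number of fields found by splitting the line on whitespace."""
    return len(line.split())


if __name__ == "__main__":
    xyz, vel = (1.0, 2.0, 3.0), (0.1, 0.2, 0.3)
    for n in (9999, 10000, 99999, 100000, 109999, 110000):
        line = gro_atom_line(1, "SOL", "OW", n, xyz, vel)
        # Characters 11-20 hold the atom name and atom number fields.
        print(n, repr(line[10:20]), naive_field_count(line))
    # Field counts printed: 9, 8, 8, 9, 9, 8
```

The field count flips exactly at the thresholds described above, which is why splitting at fixed character positions is safer than splitting on whitespace.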

Voilà, I’ve reported this (perhaps harmless and well-known) bug.

This has become an issue because I’d like to read and edit gro files (in order to reconfigure the computational domain of a long simulation; topic posted separately).

If I try to find and fix the C++ code that produces these problems in gro files, I’m afraid I will break the routines that read these files.

Perhaps one of the developers could advise me whether it would be sensible to modify the program that generates gro files, at least on my laptop installation? (I’m not going to reinstall on the Cray supercomputer where I’m running gmx_mpi mdrun.) Would the read routines be robust to gro files formatted differently (simply with a bigger space between columns 2 and 3)?

Dear @D-Gly

I am pretty sure this is not a bug, but a property of the .gro format. Back in the day the files had well-defined, limited widths for atom names, numbers, etc. Atom numbers have 5 characters, so they can go from 1 to 99999; from 100000 on, the printed number wraps around (atom 100000 is printed as 0, atom 100001 as 1, and so on). In very large systems you see the same for residue numbers, which also have 5 characters, and it is typical to see

...
99999SOL
     1SOL
     2SOL
...
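As worked arithmetic, the wrap-around amounts to printing each number modulo 100000 in a 5-character field. The sketch below assumes exactly that (it agrees with the 0, 1, 2, … restart reported in the question); `printed_field` is a hypothetical name, not a GROMACS function.

```python
# Sketch of the 5-character wrap-around, assuming the writer prints each
# number modulo 100000 so that it always fits in "%5d".
FIELD_WIDTH = 5
WRAP = 10 ** FIELD_WIDTH  # 100000, the first value that no longer fits


def printed_field(n):
    """Return the 5-character field emitted for atom (or residue) number n."""
    return "%5d" % (n % WRAP)


if __name__ == "__main__":
    for n in (1, 99999, 100000, 100001, 200001):
        print(n, repr(printed_field(n)))
    # Atoms 1, 100001 and 200001 all print as "    1", so the printed number
    # alone cannot identify an atom once a system exceeds 99999 atoms.
```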

This might be a problem for some visualization software and/or some in-house scripts. It is not a problem for GROMACS, as the binary .tpr file stores the index numbers properly. In fact, the index file does not restart from 1 but can go beyond 100000, so you always point at the correct atom. Seen another way, an index refers to the position (line) of the atom in the gro file rather than to the atom number printed on that line.
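Following that view, a script can recover the true atom number from line order and use the printed field only as a consistency check. This is a minimal sketch assuming the atom lines are passed without the title and atom-count lines, and that atoms were written consecutively starting from 1; `true_atom_numbers` is a hypothetical helper.

```python
# Sketch: recover true 1-based atom numbers from line order, assuming the
# atom lines were written consecutively from atom 1 (hypothetical helper).
WRAP = 100000


def true_atom_numbers(atom_lines):
    """Return true atom numbers, checking each against the printed field."""
    numbers = []
    for i, line in enumerate(atom_lines):
        true_nr = i + 1                # position in the file defines the atom
        printed = int(line[15:20])     # atom-number field, characters 16-20
        if printed != true_nr % WRAP:
            # File line is i + 3 because of the title and atom-count lines.
            raise ValueError("file line %d: printed %d, expected %d"
                             % (i + 3, printed, true_nr % WRAP))
        numbers.append(true_nr)
    return numbers
```

For a file with 100003 atoms, the last number returned is 100003 even though that line prints 3.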

Indeed, the gro file uses an old-style fixed format, just like the old PDB format. Each entry has a fixed number of characters and there are no separators between entries (although in practice there are often spaces when an entry does not use all of its width).

https://manual.gromacs.org/current/reference-manual/file-formats.html#gro
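For reading and editing gro files, the fixed format suggests slicing each atom line by character position instead of splitting on whitespace. The sketch below assumes the default widths from the linked manual page (5+5+5+5 characters, then 8-wide positions with 3 decimals and optional 8-wide velocities with 4 decimals) and does not handle files written with a different precision. `GroAtom`, `read_gro` and `write_gro` are hypothetical names, not GROMACS APIs, and the 10.5f box spacing on output is an assumption (the box line is read as free format).

```python
# Sketch of a fixed-width .gro reader/writer (hypothetical names). Assumes
# default precision: positions %8.3f (nm), velocities %8.4f (nm/ps).
from dataclasses import dataclass
from typing import List, Optional, Tuple

WRAP = 100000


@dataclass
class GroAtom:
    resnr: int                                 # printed value (wrapped)
    resname: str
    atomname: str
    atomnr: int                                # printed value (wrapped)
    xyz: Tuple[float, float, float]
    vel: Optional[Tuple[float, float, float]]  # None if absent


def parse_atom_line(line: str) -> GroAtom:
    # Slice by character position; fields can touch, e.g. "   OW10000".
    xyz = tuple(float(line[20 + 8 * k:28 + 8 * k]) for k in range(3))
    vel = None
    if len(line.rstrip()) >= 68:               # velocities present
        vel = tuple(float(line[44 + 8 * k:52 + 8 * k]) for k in range(3))
    return GroAtom(int(line[0:5]), line[5:10].strip(), line[10:15].strip(),
                   int(line[15:20]), xyz, vel)


def format_atom_line(a: GroAtom) -> str:
    # %-5.5s / %5.5s cap names at 5 characters; numbers wrap modulo 100000.
    line = "%5d%-5.5s%5.5s%5d%8.3f%8.3f%8.3f" % (
        a.resnr % WRAP, a.resname, a.atomname, a.atomnr % WRAP, *a.xyz)
    if a.vel is not None:
        line += "%8.4f%8.4f%8.4f" % a.vel
    return line


def read_gro(path: str):
    with open(path) as fh:
        lines = fh.read().splitlines()
    natoms = int(lines[1])
    atoms = [parse_atom_line(l) for l in lines[2:2 + natoms]]
    box = [float(v) for v in lines[2 + natoms].split()]  # free-format line
    return lines[0], atoms, box


def write_gro(path: str, title: str, atoms: List[GroAtom], box: List[float]):
    with open(path, "w") as fh:
        fh.write(title + "\n")
        fh.write("%5d\n" % len(atoms))
        for a in atoms:
            fh.write(format_atom_line(a) + "\n")
        fh.write(" ".join("%10.5f" % v for v in box) + "\n")
```

Because the writer keeps the original widths, files it produces should stay in the documented layout rather than introducing a wider gap between columns.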