Storing protein velocities at a high frame rate from a long simulation

GROMACS version: 2020.4
GROMACS modification: Yes/No

Hi,

For the analysis I need to do, I need to store very high time-resolution positions and velocities for the protein atoms from a several hundred ns simulation. I am struggling to figure out how to do this in GROMACS in a way that doesn’t require absurd amounts of storage.

The .trr stores positions and velocities in a compact format, but it includes the solvent atoms and so results in incredibly large files regardless. I see no way to output only the protein atoms.

I can output an .xtc file instead, which gives me reasonable file sizes with only the protein atoms, but I cannot get velocities from the integrator this way. I could recover the velocities from sufficiently precise positions, but that would require unpacking the .xtc file before I do my coarse-graining instead of after. (To save time and storage space, I currently process the .xtc with gmx traj to get some center-of-mass trajectories before I convert anything to ASCII.) I have tried simply calculating the velocities of my coarse-grained sites using finite differences, but it turns out single-precision data isn’t enough to recover good velocities at this stage: the CG sites move too slowly, so I get lots of spurious zero velocities and the rest become obviously quantized. Converting the all-protein-atom .xtc to ASCII to get velocities first is slow and storage-intensive. Double precision would probably work here, but I don’t currently have access to double-precision GROMACS.
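To illustrate the quantization problem described above, here is a minimal sketch in plain Python. It assumes positions are rounded to single precision before differencing; the site velocity, frame spacing, and starting coordinate are hypothetical numbers chosen only to make the effect visible, not values from my simulation.

```python
import struct

def to_float32(x):
    """Round a Python float (double) to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def finite_difference_velocities(positions, dt):
    """Forward-difference velocities between consecutive frames."""
    return [(b - a) / dt for a, b in zip(positions, positions[1:])]

# Hypothetical numbers, for illustration only.
true_velocity = 1.0e-4   # nm/ps, a slowly drifting coarse-grained site
dt = 0.002               # ps between stored frames
x0 = 5.0                 # nm, starting coordinate
n_frames = 1000

# Exact trajectory in double precision, then rounded to single precision
# as it would be after being stored in a single-precision file.
exact = [x0 + true_velocity * dt * i for i in range(n_frames)]
stored = [to_float32(x) for x in exact]

velocities = finite_difference_velocities(stored, dt)

zero_fraction = sum(1 for v in velocities if v == 0.0) / len(velocities)
distinct_values = sorted(set(velocities))
mean_velocity = sum(velocities) / len(velocities)

print(f"fraction of zero velocities: {zero_fraction:.2f}")
print(f"distinct velocity values:    {distinct_values}")
print(f"mean velocity:               {mean_velocity:.3e} nm/ps")
```

With these numbers the per-frame displacement (2e-7 nm) is smaller than the single-precision spacing near 5 nm (2^-21 nm, about 4.8e-7 nm), so every finite-difference velocity is either exactly zero or exactly one grid step per frame. Only the long-time average comes out close to the true value.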

So my problem is the asymmetric way GROMACS treats positions and velocities. The positions are no problem at all: they can go into the .xtc file with the solvent dropped, and I can then coarse-grain using gmx traj before converting my results to ASCII. But there seems to be no equivalent pipeline for velocities that avoids gigantic file sizes.

I have also considered periodically stopping the MD simulation, using trjconv to strip the solvent out of the .trr, and appending the stripped output as I go. The problem I ran into here is that versions of GROMACS more than a couple of years old develop errors in the timestamps during long simulations, and the concatenation routines in GROMACS align frames by timestamp rather than by frame number, so I get errors at every stitch. So again I would need to convert the results to ASCII and stitch them manually, which is again quite prohibitive. I do not have access to a version of GROMACS newer than 2020. Repeatedly stopping and restarting also drastically slows down the simulations.
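One possible contributor to the timestamp errors, sketched below under the assumption that frame times are stored as single-precision floats in ps (I have not confirmed that this is the cause in my case): at several hundred ns, the spacing between adjacent single-precision values is larger than a femtosecond-scale frame interval. The time and frame interval used here are hypothetical.

```python
import struct

def to_float32(x):
    """Round a Python float (double) to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def float32_spacing(t):
    """Gap between t (rounded to float32) and the next larger float32.

    Valid for positive, finite t.
    """
    bits = struct.unpack("<I", struct.pack("<f", t))[0]
    next_value = struct.unpack("<f", struct.pack("<I", bits + 1))[0]
    return next_value - to_float32(t)

# Hypothetical values: 300 ns into the run, frames stored every 2 fs.
t_ps = 300000.0
frame_interval_ps = 0.002

spacing = float32_spacing(t_ps)
same_stamp = to_float32(t_ps) == to_float32(t_ps + frame_interval_ps)

print(f"float32 spacing near {t_ps} ps: {spacing} ps")
print(f"consecutive frames share a timestamp: {same_stamp}")

def frame_time(index, dt_ps, t0_ps=0.0):
    """Rebuild a frame time from its index in double precision."""
    return t0_ps + index * dt_ps
```

If this is what is happening, aligning segments by frame index (as in the hypothetical `frame_time` helper) rather than by stored timestamp would sidestep the problem, but that means stitching outside the standard concatenation tools.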

If anyone who is more familiar with what’s possible with GROMACS has any ideas, I would greatly appreciate it.

Hi,

Why not compile a newer (and double-precision) version of the code yourself?

Cheers,
Szilárd

I’ve been looking into that since I posted this. I managed to compile the double-precision version, but I can’t get the output .xtc files to store the positions in double precision. Setting compressed-x-precision = 1e8 works fine and gives essentially full single-precision data, but if I set it any higher (e.g. 1e9) I get an immediate fatal error in mdrun, even though this build of GROMACS is double precision. I don’t really understand why.

Fatal Error - XTC error - maybe you are out of disk space

I checked that the .trr is working fine on short runs, so double precision is working in general, but those files are prohibitively large for what I’m doing. I need to use an .xtc so that I can selectively store the protein without the solvent.
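A possible explanation for the fatal error at compressed-x-precision = 1e9, offered as a sketch rather than a confirmed diagnosis: if the XTC compression rounds each coordinate times the precision to a signed 32-bit integer (an assumption about the file format here, independent of whether mdrun itself is built in double precision), then a larger precision shrinks the largest coordinate that can be stored. The 10 nm coordinate below is hypothetical.

```python
INT32_MAX = 2**31 - 1  # largest signed 32-bit integer

def max_coordinate_nm(precision):
    """Largest |coordinate| (nm) whose scaled value still fits in int32."""
    return INT32_MAX / precision

def fits_in_int32(coord_nm, precision):
    """True if round(coord * precision) is representable as int32."""
    return abs(round(coord_nm * precision)) <= INT32_MAX

# Hypothetical atom coordinate, for illustration only.
coord_nm = 10.0

for precision in (1e3, 1e8, 1e9):
    print(
        f"precision {precision:.0e}: "
        f"max |x| = {max_coordinate_nm(precision):.3f} nm, "
        f"x = {coord_nm} nm fits: {fits_in_int32(coord_nm, precision)}"
    )
```

Under this assumption, 1e8 allows coordinates up to about 21 nm, while 1e9 allows only about 2.1 nm, so any atom further than that from the origin would fail to encode. The "out of disk space" wording may then just be a generic message for any XTC write failure.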