GROMACS version: 2020.4
GROMACS modification: Yes/No
Hi,
My analysis requires positions and velocities of the protein atoms at very high time resolution, taken from a simulation several hundred ns long. I am struggling to find a way to do this in GROMACS that doesn’t require absurd amounts of storage.
The .trr stores positions and velocities in a compact format, but it includes the solvent atoms and so results in incredibly large files regardless. I see no way to output only the protein atoms.
I can output an .xtc file instead, which gives me reasonable file sizes with only the protein atoms, but then I cannot get the velocities from the integrator. I could recover the velocities from sufficiently precise positions, but that would mean unpacking the .xtc file before I do my coarse-graining instead of after. (To save time and storage space, I currently process the .xtc with gmx traj to get some center-of-mass trajectories before I convert anything to ASCII.) I have tried calculating the velocities of my coarse-grained (CG) sites directly by finite differences, but single-precision data isn’t enough to recover good velocities at this stage: the CG sites move too slowly, so I get lots of spurious zero velocities and the rest are obviously quantized. Converting the all-protein-atom .xtc to ASCII to get velocities first is slow and storage-intensive. Double precision would probably work here, but I don’t currently have access to a double-precision build of GROMACS.
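To make the quantization problem concrete, here is a minimal Python sketch of what I think is happening. It assumes the stored positions sit on a fixed grid of 0.001 nm; the speed, frame spacing, and frame count are made-up numbers chosen only for illustration, and `quantize` and `finite_difference` are hypothetical helper names, not GROMACS functions.

```python
def quantize(x_nm, precision=1000):
    """Round a coordinate in nm onto a grid of 1/precision nm (assumed storage grid)."""
    return round(x_nm * precision) / precision


def finite_difference(positions, dt_ps):
    """Forward-difference velocities (nm/ps) between consecutive frames."""
    return [(b - a) / dt_ps for a, b in zip(positions, positions[1:])]


# Hypothetical numbers, for illustration only.
true_v = 0.02    # nm/ps, a slowly moving CG site
dt = 0.01        # ps between saved frames
n_frames = 200

# Exact straight-line motion, then the same positions as they would be stored.
exact = [1.0 + true_v * dt * i for i in range(n_frames)]
stored = [quantize(x) for x in exact]

v_fd = finite_difference(stored, dt)

# Count frames whose recovered velocity is exactly zero, and list the
# distinct velocity values that survive quantization.
n_zero = sum(1 for v in v_fd if v == 0.0)
levels = sorted({round(v, 6) for v in v_fd})

print("zero-velocity frames:", n_zero, "of", len(v_fd))
print("distinct velocity levels (nm/ps):", levels)
```

With these numbers the site moves only 0.0002 nm per frame, a fifth of the grid spacing, so most frame-to-frame differences come out as exactly zero and the remainder land on a single nonzero level set by the grid spacing divided by the frame interval. Averaging over many frames recovers the mean, but the instantaneous velocities are lost.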
So my problem is the asymmetric way GROMACS treats positions and velocities. The positions are no problem at all: they can go into the .xtc file with the solvent dropped, and I can then coarse-grain with gmx traj before converting my results to ASCII. But there seems to be no equivalent pipeline for velocities that avoids gigantic file sizes.
I have also considered periodically stopping the MD simulation, using trjconv to strip the solvent out of the .trr, and appending the stripped segments as I go. Besides drastically slowing down the simulation, this runs into a second problem: versions of GROMACS more than a couple of years old develop errors in the timestamps during long simulations, and the concatenation routines in GROMACS align frames by timestamp instead of by frame number, so I get errors at every stitch. I do not have access to a version of GROMACS newer than 2020. So again I would need to convert the results to ASCII and stitch them manually, which is again prohibitive.
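One possible explanation for the timestamp errors, which is an assumption on my part and not something I have confirmed in the GROMACS source, is that frame times are held as single-precision values in ps. The following Python sketch only shows how a 32-bit float loses time resolution late in a long run; the 0.002 ps frame spacing and the `to_float32` helper are hypothetical.

```python
import struct


def to_float32(x):
    """Round a Python float to the nearest 32-bit float and back."""
    return struct.unpack("f", struct.pack("f", x))[0]


dt = 0.002  # ps between saved frames (hypothetical)


def stamps_from(t0_ps, n=5):
    """Timestamps of n consecutive frames starting at t0_ps, stored as float32."""
    return [to_float32(t0_ps + i * dt) for i in range(n)]


early = stamps_from(100.0)       # 0.1 ns into the run
late = stamps_from(300000.0)     # 300 ns into the run

# Number of distinguishable timestamps among five consecutive frames.
distinct_early = len(set(early))
distinct_late = len(set(late))

print("distinct timestamps near 0.1 ns:", distinct_early)
print("distinct timestamps near 300 ns:", distinct_late)
```

Near 300 ns the spacing between adjacent 32-bit floats is about 0.03 ps, far larger than the frame interval used here, so consecutive frames collapse onto the same timestamp. If something like this is what happens, any tool that aligns segments by time would see duplicate or misordered frames at the joins.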
If anyone who is more familiar with what’s possible with GROMACS has any ideas, I would greatly appreciate it.