GROMACS version: 2025.1
GROMACS modification: No
I’m trying to perform analysis using only part of a molecule (Removing the ends of a nucleic acid strand). Because of how my downstream analysis pipeline works, I determined that the best way to handle this would be to use trjconv and convert-tpr with an index group to get xtc and tpx files excluding the unwanted residues. Modifying gro and xtc files with trjconv worked well, but when I generate a truncated tpx file with convert-tpr, it keeps the whole molecule topology, including the removed residues, resulting in incorrect atom types when loaded in downstream tools (MDAnalysis in this case). I unfortunately need to use the tpr file because there are more than 100k atoms in my system, so using the gro file as the topology for analysis results in indexing problems as my downstream analysis uses atom indexes. Any idea how to get a clean tpx file where the molecule topology is also truncated?
convert-tpr/trjconv commands:
gmx convert-tpr -s md.tpr -n noloops.ndx -o noloops.tpx
gmx trjconv -s md.tpr -f md.gro -n noloops.ndx -o noloops.gro
Looking at the gro file, it starts on the residue I want to start my analysis on:
$ head noloops.gro
RNA_chain_A
126618
3U P 1 9.307 5.736 8.243 0.1655 -0.1133 -0.3574
3U O1P 2 9.337 5.809 8.363 -0.6448 0.5286 -0.5350
3U O2P 3 9.305 5.818 8.119 0.1602 0.0273 0.7769
3U O5' 4 9.419 5.616 8.225 0.6936 -0.8166 -0.4415
3U C5' 5 9.479 5.552 8.334 -0.3628 0.0396 -0.4869
3U H5'1 6 9.402 5.507 8.397 0.5341 -2.7104 -1.2128
3U H5'2 7 9.523 5.621 8.406 -2.3819 1.2063 -0.3085
3U C4' 8 9.569 5.436 8.292 0.7754 -0.4253 0.1886
But when I load the tpx file with MDA:
import MDAnalysis as mda
u = mda.Universe('noloops.tpx', 'noloops.gro') # Atom types are wrong
print(u.atoms[0].type, u.atoms[1].type, u.atoms[2].type)
u = mda.Universe('noloops.gro', 'noloops.gro') # atom types are right, but atom IDs reset at 99999
print(u.atoms[0].type, u.atoms[1].type, u.atoms[2].type)
u = mda.Universe('md.tpr', 'md.gro') # Full tpr as reference
print(u.atoms[0].type, u.atoms[1].type, u.atoms[2].type)
This prints:
OH HO CI
P O O
OH HO CI
As you can see, loading the truncated tpr file still results in atom types matching the removed 5’ end of the nucleic acid strand, rather than starting on nucleotide 3.