PCA on simulations of different MD structures

GROMACS version: 2024.1
GROMACS modification: No
Hello everyone, I’ve run a set of 3 independent simulations of a protein-DNA complex, each with a DNA of a different sequence. The simulations ran fine and the results make sense. Now I am trying to make a visual comparison of the motions of each DNA with respect to the protein, to support my statement that there are sequence-dependent motions. However, while I can use gmx covar and then gmx anaeig to do a PCA on an individual simulation, my understanding is that doing this for each one would just give 3 different sets of principal components that cannot be compared to each other. From a quick search, it seems that doing PCA on one simulation and then projecting the trajectories of all simulations onto that one set of eigenvectors is the way to go. Doing this for the first structure would go something like:
gmx covar -s struc1.tpr -f traj1.xtc -o eigen1.xvg -v eigenvec1.trr -n index1.ndx
gmx anaeig -s struc1.tpr -f traj1.xtc -v eigenvec1.trr -2d proj2.xvg -first 1 -last 2 -n index1.ndx
However, when trying to map those eigenvectors to the trajectory of the second simulation like:
gmx anaeig -s struc1.tpr -f traj2.xtc -v eigenvec1.trr -2d proj2.xvg -first 1 -last 2 -n index2.ndx
I get an error about inconsistent shifts over periodic boxes, which makes sense: PBC was removed from each trajectory using its own reference structure, so the shifts differ. My questions are:

  • Does my approach make sense?
  • Do I need to use the trajectory files without removing PBC?
  • How can I fit different trajectories to the same starting point when there is a different number of atoms in each simulation? Should I make a .tpr and .xtc file containing only the atoms of the DNA chains I’m interested in comparing?
    Any guidance is appreciated.

Option 1) Convert your trajectories so that you keep only the DNA backbone + protein; that way all three systems will contain the same set of atoms and can be fitted and projected onto the same eigenvectors.
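
To illustrate what projecting onto a shared set of eigenvectors means once the atom sets match, here is a rough NumPy sketch. It is not what gmx covar/anaeig do internally, just the same idea in a few lines. It assumes each trajectory has already been reduced to the common atoms, fitted to the same reference, and loaded as an (n_frames, 3*n_atoms) array; the random arrays and the names traj1, traj2, pca_basis and project are placeholders made up for the example.

```python
import numpy as np


def pca_basis(coords):
    """Return the mean, eigenvalues and eigenvectors (largest variance first)
    of the covariance matrix of coords, shape (n_frames, 3 * n_atoms)."""
    mean = coords.mean(axis=0)
    centered = coords - mean
    cov = centered.T @ centered / (coords.shape[0] - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]        # reorder to descending
    return mean, eigvals[order], eigvecs[:, order]


def project(coords, mean, eigvecs, n_components=2):
    """Project frames onto the first n_components eigenvectors,
    after subtracting the mean of the reference simulation."""
    return (coords - mean) @ eigvecs[:, :n_components]


# Placeholder data standing in for two fitted trajectories with the SAME atoms.
rng = np.random.default_rng(0)
n_atoms = 10
traj1 = rng.normal(size=(200, 3 * n_atoms))
traj2 = rng.normal(size=(150, 3 * n_atoms)) + 0.5

# Eigenvectors come from simulation 1 only; both trajectories are projected
# onto them, so the resulting 2D coordinates share the same axes.
mean1, eigvals1, eigvecs1 = pca_basis(traj1)
proj1 = project(traj1, mean1, eigvecs1)
proj2 = project(traj2, mean1, eigvecs1)
print(proj1.shape, proj2.shape)
```

An alternative is to build the eigenvectors from the concatenation of all three trajectories rather than from simulation 1 alone, so that no single sequence defines the axes.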

Option 2) Extract helical parameters from the base pairs (X3DNA/Curves+ style) and perform PCA in the space of helical parameters of the whole duplex (using, say, scikit-learn).
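
A minimal sketch of Option 2, assuming the helical parameters have already been exported to one table per simulation (rows = frames, columns = parameters such as rise, twist and roll for each base-pair step) and that all duplexes have the same length, so the columns line up. Plain NumPy is used here to keep the example dependency-free; sklearn.decomposition.PCA together with StandardScaler should give equivalent results. The random arrays and the names seq1, seq2 and seq3 are placeholders, not real data.

```python
import numpy as np

# Placeholder tables: frames x helical parameters, one per simulation.
rng = np.random.default_rng(1)
n_features = 12  # hypothetical: e.g. 6 parameters x 2 base-pair steps
sims = {
    "seq1": rng.normal(loc=0.0, size=(100, n_features)),
    "seq2": rng.normal(loc=0.3, size=(120, n_features)),
    "seq3": rng.normal(loc=-0.3, size=(80, n_features)),
}

# Pool all frames so that one common set of components is defined.
pooled = np.vstack(list(sims.values()))

# Standardize each column: helical parameters mix distances and angles,
# so without scaling the angles would dominate the variance.
mean = pooled.mean(axis=0)
std = pooled.std(axis=0)
z = (pooled - mean) / std

# PCA via SVD of the standardized, pooled data.
_, s, vt = np.linalg.svd(z, full_matrices=False)
components = vt[:2]                    # first two principal axes
explained = s**2 / np.sum(s**2)        # fraction of variance per component

# Project each simulation separately onto the shared components.
scores = {name: ((x - mean) / std) @ components.T for name, x in sims.items()}

for name, sc in scores.items():
    print(name, sc.shape)
```

Plotting the scores of each simulation in a different colour on the same PC1/PC2 axes then gives a direct visual comparison between sequences.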