PCA: how to make 2 2d-PCA scatter plots from both wild-type and mutant structures into one figure?

GROMACS version: 2021.1
GROMACS modification: No

Hello, After finishing 300-ns production MD, I want to make a scatter figure for first 2 components for both wild-type (WT) and mutant (MUT) protein structures. Importantly, I want to put them together in order to compare how different betwenn WT and MUT structures in the 2d PCA.

What I did is listed below:

For WT, in the WT folder:
echo 3 1 | gmx trjconv -s md_300ns.tpr -f md_300ns_noPBC.xtc -fit rot+trans -o mdfit.xtc
echo 3 3 | gmx covar -s md_300ns.gro -f mdfit.xtc -o eigenvalues.xvg -v eigenvectors.trr -xpma covapic.xpm
echo 3 3 | gmx anaeig -s md_300ns.gro -f mdfit.xtc -v eigenvectors.trr -first 1 -last 2 -2d 2d_WT.xvg

For MUT, in the MUT folder:
echo 3 1 | gmx trjconv -s md_300ns.tpr -f md_300ns_noPBC.xtc -fit rot+trans -o mdfit.xtc
echo 3 3 | gmx covar -s md_300ns.gro -f mdfit.xtc -o eigenvalues.xvg -v eigenvectors.trr -xpma covapic.xpm
echo 3 3 | gmx anaeig -s md_300ns.gro -f mdfit.xtc -v eigenvectors.trr -first 1 -last 2 -2d 2d_MUT.xvg

Then, I can obtain 2 files: 2d_WT.xvg and 2d_MUT.xvg. I exported these 2 xvg files into Excel and after removing the header comments, I can make scatter figure for each of them like below.
WT
MUT

It is ok for individual 2d PCA plot, but I want to know how to put them together in one figure like this below one:
example_pca

What is the proper way to get this? Is it ok just simply putting the MUT points into WT figure?

Thanks for your help.

I tried to do like this:

Suppose in 2d_WT.xvg and 2d_MUT.xvg, I have 5 structures for each of them like below.
class PC1 PC2
1 WT 1.0 2.0
2 WT 3.0 5.0
3 WT 5.0 3.0
4 WT 7.0 6.0
5 WT 2.0 8.0
6 MUT 2.0 3.0
7 MUT 4.0 7.0
8 MUT 5.0 5.0
9 MUT 5.5 6.5
10 MUT 3.0 6.8

First 5 from wild-type PC1 and PC2, and last 5 from mutant PC1 and PC2. Then I can make a PCA graph using R where I put both WT and MUT PC1 and PC2 scatter graph together.

library(ggplot2)

pcWT ← data.frame(class = rep(“WT”, 5), PC1 = c(1, 3, 5, 7, 2), PC2 = c(2, 5, 3, 6, 8))
pcMUT ← data.frame(class = rep(“MUT”, 5), PC1 = c(2, 4, 5, 5.5, 3), PC2 = c(3, 7, 5, 6.5, 6.8))
pcWTMUT ← rbind(pcWT, pcMUT)

ggplot(pcWTMUT, aes(x = PC1, y = PC2)) +
geom_point(aes(color = factor(class), shape = factor(class))) +
guides(color = guide_legend(title = NULL), shape = guide_legend(title = NULL))

The result figure is shown below:
pca_test

Can you give some suggestions about what I did? Is it right or wrong to put two PCA graphs together directly (showing in one graph) like I cited one? Finally, using this graph, I expect to explain WT and MUT global motion, for example if MUT is more flexible (larger PC1 and PC2 range) compared with WT.

One thing I am most not sure is that if 2d_WT.xvg and 2d_MUT.xvg can be used in my way to make one combined PCA graph toghter for both WT and MUT, because the first 2 PCs from WT and first 2 PCs from MUT were obtained separately. Are they ok to compare together or need other steps to get a proper PCA graph showing these two?
Thanks.

Hi,
To plot the two trajectories in the same plot, you have to plot them respect the same eigenvectors.
Now you have generated eigenvectors for the wild type and eigenvectors for mutant independently. Thus the eigenvector are different. What you have to do it is to project the trajectory on the same eigenvector (that also required that the trajectories for the PCA have identical number of atoms).
Best regards
Alessandra

Hello @alevilla Thanks for your help. It seems what I am worried about is correct (ie, what I did for plotting two trajectories in the same graph is wrong like above one). For (that also required that the trajectories for the PCA have the identical number of atoms), yes, they have identical number of atoms because I want to use C-alpha for the PCA like other literature did. In fact the WT and MUT I used is from the same protein with only one points mutation, but they have the same number of atoms for C-alpha (eg, saying 300).

For now, the question is that I have done two MD simulations for WT and MUT separately. The main files I obtained then are (for better distinguishing file names, I use WT and MUT in the front of the file names) WT_md_300ns.tpr and WT_md_300ns_noPBC.xtc for WT; and MUT_md_300ns.tpr and MUT_md_300ns_noPBC.xtc for MUT.

Your suggestion is What you have to do it is to project the trajectory on the same eigenvector. Can you give suggestions how can I do it (I have totally no ideas about it). Or can you instruct what gmx commands should I use for projecting the trajectory on the same eigenvector?

Thanks,
Ming

According to searching on the internet, I found below script seems to be OK.

gmx anaeig -s WT_md.gro -f WT_md.xtc -v WT_eigenvectors.trr -first 1 -last 2 -2d WT_2d.xvg
gmx anaeig -s WT_md.gro -f MUT_md.xtc -v WT_eigenvectors.trr -first 1 -last 2 -2d MUT_2d.xvg

That means that two ‘gmx anaeig’ commands use the same WT_md.gro and WT_eigenvectors.trr, by doing so, the MUT can be projected into the WT eigen vector.

Then I can plot WT_2d.xvg and MUT_2d.xvg together.

Can help confirm if I am right about this?

Thanks,
Ming