PCA analysis

GROMACS version: 2022
GROMACS modification: No

Hi everyone,
I have simulations of the wt and mutated isoforms of a protein. I want to study the system with PCA, but I have a couple of questions because I’m not an expert. After running the covar command (on only part of the protein, to reduce the computational cost), I checked the size of the eigenvalues. As expected, the first value is larger than the others, but relative to the sum of all of them, the first few eigenvalues account for only slightly more than 50% of the total variance. Does that make them suitable for the subsequent analyses, or are they not representative enough? I was wondering whether the first 2 eigenvectors should account for the majority of the variance (say 60% or more) before being used for further analysis. I also tried selecting a smaller region of the protein, but the results are similar. Moreover, does a PCA with such a small share in the first eigenvalues suggest that something is wrong with the simulation?

Thank you very much in advance,
Martina

It is OK, but you need to do more analysis to reach a conclusive answer. The dynamics are complex here (assuming there are no mistakes in the PCA calculations).
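
To put a number on how much of the motion the first eigenvectors capture, the cumulative fraction of the eigenvalue sum can be computed from eigenval.xvg. The following is a minimal Python sketch, not a GROMACS tool. It assumes the file holds two numeric columns (eigenvector index, eigenvalue) after the '#' and '@' header lines; the helper names parse_xvg, cumulative_variance_fraction and components_needed are made up for illustration.

```python
import numpy as np


def parse_xvg(lines):
    """Parse numeric rows from the lines of an .xvg file, skipping header lines."""
    rows = []
    for line in lines:
        stripped = line.strip()
        # '#' lines are comments, '@' lines are plot directives, '&' separates data sets.
        if not stripped or stripped[0] in "#@&":
            continue
        rows.append([float(field) for field in stripped.split()])
    return np.array(rows)


def cumulative_variance_fraction(eigenvalues):
    """Fraction of the eigenvalue sum captured by the first k eigenvalues, for every k."""
    values = np.asarray(eigenvalues, dtype=float)
    return np.cumsum(values) / values.sum()


def components_needed(fractions, threshold):
    """Smallest number of leading components whose cumulative fraction reaches threshold."""
    count = int(np.searchsorted(fractions, threshold)) + 1
    return min(count, len(fractions))


# Toy eigenvalues standing in for the second column of eigenval.xvg.
toy_lines = ["# toy data", "1 4.0", "2 3.0", "3 2.0", "4 1.0"]
toy_fractions = cumulative_variance_fraction(parse_xvg(toy_lines)[:, 1])
print(toy_fractions)
print(components_needed(toy_fractions, 0.6))
```

With the real output, passing an open file handle for eigenval.xvg to parse_xvg should supply the same kind of array. A slowly rising curve is not an error in itself; it only indicates that the motion is spread over many modes.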

Thank you very much. More specifically, I used this command, selecting Calpha as the fitting group:

gmx covar -f md_noPBC.xtc -s md.tpr -n index.ndx -o eigenval.xvg -tu ns -v eigenvec.trr

And then I want to analyze the results with something like this:

gmx anaeig -v eigenvec.trr -f md_noPBC.xtc -eig eigenval.xvg -s md.tpr -first -last -2d 2dproj.xvg -comp eigcomp.xvg -rmsf eigrmsf.xvg -tu ns

choosing the most representative frames as first and last. Then I wanted to use the first 2 eigenvectors for a free energy landscape, but I was wondering whether this is OK even though they account for less than 50% of the total. Maybe that analysis won’t be particularly representative, while the other results from anaeig are still valuable?
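
For reference, a free energy landscape over two projections is usually estimated as G = -kT ln(P / P_max), where P is the population of each grid cell. GROMACS ships gmx sham for this; the Python sketch below only illustrates the idea. It assumes the two projections are already loaded as arrays (for instance from the two data columns of 2dproj.xvg) and that the simulation temperature is 300 K, which is not stated in this thread. The function name free_energy_landscape is hypothetical.

```python
import numpy as np

K_B = 0.0083144626  # Boltzmann constant in kJ/(mol K)


def free_energy_landscape(pc_a, pc_b, temperature=300.0, bins=50):
    """Estimate G = -kT ln(P / P_max) on a 2D grid of two PC projections.

    temperature=300.0 is an assumed value; use the one from the simulation.
    Grid cells that were never visited are returned as NaN.
    """
    counts, a_edges, b_edges = np.histogram2d(pc_a, pc_b, bins=bins)
    prob = counts / counts.sum()
    free_energy = np.full(prob.shape, np.nan)
    visited = prob > 0
    free_energy[visited] = -K_B * temperature * np.log(prob[visited] / prob.max())
    return free_energy, a_edges, b_edges


# Synthetic projections standing in for the real PC1/PC2 time series.
rng = np.random.default_rng(0)
demo_a = rng.normal(size=5000)
demo_b = rng.normal(size=5000)
demo_fel, _, _ = free_energy_landscape(demo_a, demo_b, bins=20)
print(np.nanmax(demo_fel))
```

The most populated cell is set to zero and every other visited cell is positive, so the result is in kJ/mol relative to the deepest minimum. Such a map only describes the part of the motion captured by the two chosen components.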

You need to check the FEL with other PCs. How big is the receptor?

Is it possible to check the FEL with more than 2 principal components? I only knew about the possibility of choosing 2 of them. Or do you mean I should run more than one FEL analysis (selecting, for instance, the first 2 and then the 3rd and 4th PCs) and then compare them?
The protein is 1046 amino acids long. I’m doing the PCA with an index that includes about 500 residues close to the mutation site.
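
One way to read the suggestion is to build a separate 2D landscape for every pair among the first few PCs and compare them. The Python sketch below shows that loop under some assumptions: the projections are already available as an array with one column per PC, the temperature is 300 K (not stated in this thread), and the function name pairwise_landscapes is invented for illustration.

```python
from itertools import combinations

import numpy as np

K_B = 0.0083144626  # Boltzmann constant in kJ/(mol K)


def pairwise_landscapes(projections, temperature=300.0, bins=30):
    """Return {(i, j): 2D free energy grid} for every pair of PC columns.

    projections has shape (n_frames, n_pcs); keys use 1-based PC numbers.
    temperature=300.0 is an assumed value; unvisited cells are NaN.
    """
    data = np.asarray(projections, dtype=float)
    landscapes = {}
    for i, j in combinations(range(data.shape[1]), 2):
        counts, _, _ = np.histogram2d(data[:, i], data[:, j], bins=bins)
        prob = counts / counts.sum()
        grid = np.full(prob.shape, np.nan)
        visited = prob > 0
        grid[visited] = -K_B * temperature * np.log(prob[visited] / prob.max())
        landscapes[(i + 1, j + 1)] = grid
    return landscapes


# Synthetic projections on 4 PCs standing in for real data.
demo = np.random.default_rng(0).normal(size=(2000, 4))
print(sorted(pairwise_landscapes(demo, bins=15)))
```

Four PCs give six pairs, including the (1, 2) and (3, 4) pairs mentioned above. A joint histogram over three or more PCs at once is also possible in principle, but the number of grid cells grows quickly and many of them end up unsampled.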

How many ns is your simulation, and which protein is it?

It is a 1 µs simulation.

1 microsecond is a reasonable length; still, it depends on which protein it is and what you are simulating.