Principal Component Regression

GROMACS modification: No

Dear all,
I am a GROMACS newbie, and I would like to ask about some potential analyses that can be carried out on an MD trajectory.

I am currently studying a peculiar molecular crystal, in which I have identified a couple of interesting configurations that I would like to analyze further. On top of the MD simulation, I have carried out a Principal Component Analysis (PCA) with the GROMACS tools gmx covar and gmx anaeig.
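For reference, the PCA that gmx covar performs amounts to diagonalising the covariance matrix of atomic positional fluctuations. The sketch below reproduces that idea in NumPy, assuming the frames are already least-squares fitted and that no mass weighting is used; the function name `covariance_pca` and the normalisation by the number of frames are illustrative choices, not gmx covar's exact implementation.

```python
import numpy as np


def covariance_pca(coords):
    """PCA of positional fluctuations, in the spirit of gmx covar (sketch).

    coords: array of shape (n_frames, n_atoms, 3), assumed already
    least-squares fitted to a reference structure (no mass weighting).
    Returns the average structure (3N,), eigenvalues in descending
    order, and the matching eigenvectors as columns of a (3N, 3N) matrix.
    """
    n_frames = coords.shape[0]
    x = coords.reshape(n_frames, -1)        # flatten each frame to 3N values
    mean = x.mean(axis=0)                   # average structure
    dx = x - mean                           # fluctuations about the average
    cov = dx.T @ dx / n_frames              # 3N x 3N covariance matrix
    evals, evecs = np.linalg.eigh(cov)      # symmetric matrix: ascending eigenvalues
    order = np.argsort(evals)[::-1]         # largest-variance modes first
    return mean, evals[order], evecs[:, order]
```

The sum of the eigenvalues equals the total positional variance, so `evals / evals.sum()` gives the fraction of motion captured by each PC.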

I was wondering whether it is possible to carry out a Principal Component Regression (PCR) analysis: take the two relevant configurations I sampled as references, compute their RMSD and dot product with respect to the previously computed PCs, and filter out the motion related to these two configurations.
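One way to read this request is sketched below in NumPy: treat each sampled configuration as a displacement from the trajectory average, take its dot products with the eigenvectors, report its RMSD from the average, and filter the trajectory onto (or away from) selected eigenvectors, similar in spirit to the filtered trajectory gmx anaeig can write. The helper names are hypothetical; the reference is assumed to be fitted the same way as the trajectory, and `mean`/`evecs` are assumed to come from a PCA such as gmx covar's, with eigenvector columns sorted by eigenvalue.

```python
import numpy as np


def compare_to_pcs(ref, mean, evecs, n_pcs=5):
    """Relate one reference configuration to the principal components.

    ref: (n_atoms, 3) coordinates, assumed fitted like the trajectory.
    mean: (3N,) average structure; evecs: (3N, 3N) eigenvectors as columns.
    Returns the RMSD of ref from the average, the dot products of its
    displacement with the first n_pcs PCs, and the cosine overlaps.
    """
    delta = ref.reshape(-1) - mean               # displacement from the average
    n_atoms = ref.shape[0]
    rmsd = np.sqrt(np.sum(delta ** 2) / n_atoms)  # unweighted RMSD
    proj = evecs[:, :n_pcs].T @ delta            # dot product with each PC
    cos = proj / np.linalg.norm(delta)           # overlap in [-1, 1]
    return rmsd, proj, cos


def filter_trajectory(coords, mean, evecs, pcs, remove=False):
    """Keep (or, with remove=True, subtract) the motion along selected PCs.

    coords: (n_frames, n_atoms, 3); pcs: 0-based eigenvector indices.
    """
    n_frames = coords.shape[0]
    dx = coords.reshape(n_frames, -1) - mean
    v = evecs[:, pcs]
    along = dx @ v @ v.T                  # part of each fluctuation in the PC subspace
    kept = dx - along if remove else along
    return (mean + kept).reshape(coords.shape)
```

A natural choice for `pcs` is the few eigenvectors with the largest `|cos|` for the two reference configurations; the overlap is undefined if a reference coincides exactly with the average structure.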

Thanks in advance for any suggestion and tip!
Best,
Tommy

Hi Tommy,

Of course you can carry out PCR. If you want to work with trajectory data in this way, I suggest using scipy / scikit-learn in combination with MDAnalysis or similar tools to get your data into Python first, because the GROMACS analysis tools do not support the whole zoo of regression methods that you will find in that type of Python package.
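As a concrete starting point, principal component regression is PCA on the (fitted, flattened) coordinates followed by ordinary least squares on the leading PC scores. The NumPy sketch below assumes `X` holds one flattened frame per row (for example positions collected by iterating over the trajectory of an MDAnalysis `Universe`) and `y` holds one observable value per frame; `fit_pcr` and `predict_pcr` are hypothetical names.

```python
import numpy as np


def fit_pcr(X, y, n_components):
    """Principal component regression: PCA, then least squares on the scores."""
    x_mean = X.mean(axis=0)
    y_mean = y.mean()
    Xc = X - x_mean                          # centre features (PCA needs this)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    V = vt[:n_components].T                  # (n_features, k) principal axes
    T = Xc @ V                               # PC scores of each sample
    gamma, *_ = np.linalg.lstsq(T, y - y_mean, rcond=None)
    beta = V @ gamma                         # coefficients mapped back to features
    return x_mean, y_mean, beta


def predict_pcr(model, X):
    """Predict the observable for new samples with a fitted PCR model."""
    x_mean, y_mean, beta = model
    return (X - x_mean) @ beta + y_mean
```

In scikit-learn the same model is `make_pipeline(PCA(n_components=k), LinearRegression())`, which becomes convenient once you want cross-validation to choose the number of components `k`.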

However, it is good to have a clear goal in mind: what behaviour do you want to predict with this regression model? If you have an observable and want to find out which motion is related to changes in that observable, I suggest you have a look at this method.
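The method referred to above is not reproduced here. As a much simpler, purely illustrative alternative, one can regress the observable on the leading PC scores and map the coefficients back to Cartesian space, giving a single collective displacement vector whose projection tracks the observable. The sketch assumes fitted coordinates and one observable value per frame; `observable_motion` is a hypothetical helper.

```python
import numpy as np


def observable_motion(coords, y, n_pcs=10):
    """Sketch: collective motion linearly related to an observable y.

    coords: (n_frames, n_atoms, 3), assumed already fitted.
    y: (n_frames,) observable value per frame.
    Returns a unit motion vector of shape (n_atoms, 3) and the Pearson
    correlation between y and the trajectory projected onto that vector.
    """
    n_frames, n_atoms, _ = coords.shape
    dx = coords.reshape(n_frames, -1)
    dx = dx - dx.mean(axis=0)                   # fluctuations about the average
    _, _, vt = np.linalg.svd(dx, full_matrices=False)
    V = vt[:n_pcs].T                            # leading principal axes
    scores = dx @ V                             # PC projections per frame
    gamma, *_ = np.linalg.lstsq(scores, y - y.mean(), rcond=None)
    direction = V @ gamma                       # regression vector in Cartesian space
    direction /= np.linalg.norm(direction)      # normalise to a unit motion
    proj = dx @ direction
    r = np.corrcoef(proj, y)[0, 1]
    return direction.reshape(n_atoms, 3), r
```

The returned vector can be drawn as per-atom arrows. With many PCs relative to the number of frames this overfits, so it is worth checking `r` on frames that were not used for the fit.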

If you just want to explore what your system does during the simulation, I suggest using the GROMACS tools to project your trajectory onto the principal components (for example gmx anaeig), or, alternatively, looking at the frames with extreme values in the PCA projections to get an idea of the range of motion in your system.
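The projection and extreme-frame idea can be sketched in a few lines of NumPy, assuming fitted coordinates. This mirrors the kind of projection gmx anaeig produces, but `project_and_extremes` is a hypothetical helper, and because the sign of each eigenvector is arbitrary, which end counts as "min" or "max" is arbitrary too.

```python
import numpy as np


def project_and_extremes(coords, n_pcs=2):
    """Project frames onto the first PCs and locate the extreme frames.

    coords: (n_frames, n_atoms, 3), assumed already fitted.
    Returns projections (n_frames, n_pcs) plus, per PC, the indices of
    the frames with the minimum and maximum projection.
    """
    n_frames = coords.shape[0]
    x = coords.reshape(n_frames, -1)
    dx = x - x.mean(axis=0)                     # fluctuations about the average
    _, _, vt = np.linalg.svd(dx, full_matrices=False)
    proj = dx @ vt[:n_pcs].T                    # time series of PC projections
    lo = proj.argmin(axis=0)                    # frame index of minimum per PC
    hi = proj.argmax(axis=0)                    # frame index of maximum per PC
    return proj, lo, hi
```

The returned frame indices can then be used to extract and visualise those two structures for each PC.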

Best,
Christian

Hi Christian!
Oh I see, thanks a lot for the suggestions and the paper; I will definitely have a look at it.

With the plethora of codes out there, it is really hard to figure out which one is best suited for this kind of analysis.

Thanks a lot again for the suggestion!
Best,
Tommy