Using dynamic selections with tools such as trjconv

GROMACS version: 2020.4
GROMACS modification: No

Hi everyone,

does anyone know whether there’s a way to use dynamic selections with tools such as trjconv?

I’d like to use some analysis tools such as gmx msd to understand the behavior of my solvent, but only in the vicinity of my polymer.

One approach I tried was to use gmx select to get the indices that were in the vacinity of the polymer at each timestep, i.e. using gmx select and going:

resname SOL and  within 0.5 of group "polymer" 

… which gives me the atom indices which satisfy this criteria as a function of timestep in an index file.

Passing this index file to analysis tools such as trjconv to output those atoms in the viciniity of the polymer at each timestep doesn’t work - the closest I get is a trajectory which includes all atoms that will at some point be in the vacinity, rather than only those at each timestep.

Any suggestions would be greatly appreciated.

Cheers,
Ben

Hi Ben,
As you have noticed all gmx tools support dynamics selection.
Maybe gmx trjorder gmx trjorder — GROMACS 2021.2 documentation or gmx trajectory gmx trajectory — GROMACS 2021.2 documentation may be useful for your scope.
\Alessandra

Hi Alessandra,

have dynamic selections only been implemented in the 2021 version? I’m using 2020.4 and it doesn’t seem trjconv, trjorder, msd etc support dynamic selections. What I mean by that is, the only option to select for output/analysis is to select a group (either default or supplied by an index file). For instance, trying to use the selection:

SOL and  within 0.5 of group "polymer" 

within gmx msd, where both “SOL” and “polymer” are groups supplied by an index file, I simply get…

Selected 8: 'SOL'
Reading frame ....

It’s possible to use gmx select to create an index file for the selection, but the groups in that file are (using the above selection as an example) all SOL within 0.5 of polymer at time=0 (group 1), at time=1 (group 2) etc… But when passing this index file to tools such as gmx msd, the only options seem to be to select the first group (i.e. only SOL near polymer at t=0, which in subsequent frames won’t be near polymer) or select all the groups (in which case all SOL which will at some point be near polymer are included). Ideally the tool would only use the group in the index file that’s relevant to the current frame in the trajectory it’s reading in.

Perhaps this is just a syntax issue, but for the life of me I can’t find any info on selection syntax for tools like gmx msd.

Cheers,
Ben

@alevilla is this true? I thought it was the opposite - for years, it has been the case that basically none of the analysis tools support dynamic selections.

I don’t see how one could make gmx msd even work with dissimilar numbers of atoms/molecules in each frame without a complete re-write of the code (if it could even be done).

Hi Justin,

thanks for getting back to me, and for your years of work on Gromacs.

I was able to use a combination of gmx trjorder and some code I wrote to make trajectories only including the N closest molecules to a group of interest. Therefore dissimilar numbers of molecules wasn’t an issue, but I presumed having a different set of atoms between each frame might also cause issue with gmx msd. I was able to write some more code to take those trajectories and, making extensive use of dictionaries, track each molecule during its time in the region of interest and calculate MSD and rotational correlation functions.

Cheers,
Ben

Only a few tools so far support dynamic selections. Tools using dynamic selections need use the modern trajectory analysis framework, only about 10 of those have been migrated to the framework so far. I’ve been working on a few migrations, but the core devs are busy with higher priority work and I only have the occasional weekend to work on it.

I actually did just rewrite gmx msd for the trajectory analysis framework, but selections are still static. It’s not super clear how valid dynamic selections would be for this tool - for longer time deltas, you surely have some particles that have diffused out of the selection range. Do you keep or discard those points? If you keep them, you’re not necessarily sampling what you wanted to. If you discard them, you’re artificially suppressing observables and your observed diffusion coefficient will be lower than the true one.

I suppose we could make the behavior in that case configurable. I think It’s feasible if there’s a strong demand for it, but I’m not convinced there’s a valid use case, unless you restrict to short time deltas.

Hi Kevin,

thanks for your input on this.

It’s not super clear how valid dynamic selections would be for this tool - for longer time deltas, you surely have some particles that have diffused out of the selection range. Do you keep or discard those points? If you keep them, you’re not necessarily sampling what you wanted to. If you discard them, you’re artificially suppressing observables and your observed diffusion coefficient will be lower than the true one.

Yes, all issues I encountered. The dodgy python code I’ve conjured up builds a set of all ‘complete’ trajectories, i.e. if a molecule diffuses out of the zone, then back in later on, it’s treated as two unique molecules. But as you say, this is effectively suppressing observables. For instance as you eluded to, data points at long tau are all from molecules that haven’t diffused out of the zone.

The MSD vs tau plots I’m getting are in line with gmx msd for short tau. At long tau, they approach an asymptotic value which I presume is proportional to the size of the zone of interest, looking much like a particle stuck in a well or pore.

Considering the issues with this approach, would you suggest some other form of analysis for the solvent dynamics within a region of interest? I’ve also made something similar to gmx rotacf to work on a region of interest, which may be less susceptible to the forementioned issues.

Cheers,
Ben

Sounds like you’ve got the right concerns. I don’t have great suggestions (kinda gave up on those sorts of analyses myself for similar reasons), but I wonder if for some meta-analysis it might be useful to run the script for just a bulk water segment and see how boundary size and tau ranges affect the observed diffusion coefficient.

Actually - I just remembered what I had planned to do before shelving the project where I was thinking about this problem.

I’m struggling to find a reference, but from any diffusion coefficient you can calculate an expected probability distribution of individual displacements for a given tau. So you could probably calculate your distribution, and then use a fit to back-calculate the MSDs. That would let you discard boundary-violating particles in a nicer way. Instead of deciding whether or not these excursions contribute to the average (which will bias it either way), you just know that those are outside of your fit range, so whether or not diffusion sped up outside of your boundary doesn’t matter - those particles were going to diffuse out of your boundary range regardless, so you can add larger taus to your fit without much in the way of artifacts, since you still have a lot of valid data points at those taus.

The only point I can think of where you might still have issues is if a particle diffuses back into your boundary region. Those are “affected” by your bulk conditions, but some of them would have diffused back into the boundary regardless, so discarding them is still biasing. Not sure how to deal with that condition, but it’s a lot fewer particles than the original analysis

A more math-intense solution might be to account for conditions outside of your boundary explicitly. It’s easy to calculate bulk MSDs, there’s probably a model you can set up that accounts for a particles diffusion in two different environments with crossover, where your inside-boundary coefficient is the only unknown. I wouldn’t envy trying to work that out but I’d guess it’s feasible

Hi Kevin,

like a Van Hove plot? If I’m understanding correctly, you would end up with a Gaussian (or similar) distribution of particle displacements for a given tau. Then we’re assuming the particles that diffuse outside of the region of interest (ROI) would be in the tails of that distribution, and the fit would only be performed on the range [-L/2,L/2] where L is the size of the ROI? The issue that I can see with this approach (if this is indeed what you meant) is that the displacement is measured relative to each particle in the distribution’s starting position, so those particles starting near the edge of the ROI might diffuse out with only a small displacement (and hence not all particles diffusing outside the ROI end up in the tails of the distribution).

I think I’ll stick with a combination of your suggestion of fitting MSD only at short tau, and well as rotational correlation analysis, which I don’t think is as prone to the issues mentioned.

Cheers,
Ben