To calculate the distance between the C1 and C2 atoms of each molecule during a simulation, I loop over the 3000 molecules in the trajectory with the following (inefficient) bash script:
# One gmx distance call per molecule: the whole trajectory is re-read every time.
for i in $(seq 1 3000)
do
    gmx distance -s topol.tpr -f traj.trr -n index.ndx -select "group \"r_${i}_&_C1\" plus group \"r_${i}_&_C2\"" -oxyz "$i.xvg"
done
As I expected, the process is very time-consuming, because for each $i (each molecule) the trajectory data belonging to the other 2999 molecules are also loaded, processed, and then discarded.
So, is there a better way in bash to read and store the whole trajectory just once, and then do the distance calculations?
I also tried first splitting the trajectory into single-molecule trajectories ($i.trr) and then running gmx distance on each $i.trr, but I noticed no improvement.
Thanks, Micholas, for your comment.
The nodes have 44 or 32 cores. However, even if I had a node with 3000 cores, one per molecule, I believe that sending the distance commands to the background still wouldn't solve the problem of needlessly reading the whole trajectory 2999 extra times. I am looking for a way in bash to read the trajectory only once.
I'm not really sure that you can tell bash to read the trajectory only once, because it is the GROMACS tool that reads the trajectory and does the distance calculation. The only thing bash is useful for there is the looping that reruns the command over and over.
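That said, bash can still help by building one command instead of 3000. gmx distance accepts several selections in a single run (each selection is a set of position pairs), and, like other GROMACS selection-based tools, it can read selections from a file with -sf. The sketch below writes one C1/C2 pair selection per molecule and then calls gmx distance once, so the trajectory is read a single time. It assumes that the r_<i>_&_C1 and r_<i>_&_C2 groups from the original loop exist in index.ndx and that one selection per line is accepted in the -sf file; the file names pairs.sel and all_pairs.xvg are hypothetical, and the column layout of the resulting -oxyz file should be checked against its legend lines.

```shell
#!/usr/bin/env bash
# Sketch: compute all C1-C2 distances with ONE gmx distance call,
# so the trajectory is read only once.
# ASSUMPTION: index.ndx contains groups r_<i>_&_C1 and r_<i>_&_C2 for i = 1..NMOL.
# pairs.sel and all_pairs.xvg are hypothetical file names.
set -euo pipefail

NMOL=${NMOL:-3000}             # number of molecules (3000 in the original post)
SELFILE=${SELFILE:-pairs.sel}  # selection file, one pair selection per line

# Each line is one selection: the C1 position followed by the C2 position.
for i in $(seq 1 "$NMOL"); do
    printf 'group "r_%d_&_C1" plus group "r_%d_&_C2"\n' "$i" "$i"
done > "$SELFILE"

# Single gmx distance run; selections are read from the file via -sf.
if command -v gmx >/dev/null 2>&1 && [ -f topol.tpr ] && [ -f traj.trr ]; then
    gmx distance -s topol.tpr -f traj.trr -n index.ndx -sf "$SELFILE" -oxyz all_pairs.xvg
else
    echo "gmx or input files not found; selections written to $SELFILE" >&2
fi
```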
If you're comfortable using Python, MDAnalysis might work better for you.
For most applications with these tools, you load the trajectory only once and then iterate over its frames, calculating the distances of interest at each frame. Hopefully this helps.
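If all pairs are computed in one gmx distance run (one selection per molecule) but per-molecule files like the original $i.xvg are still wanted, the combined -oxyz output can be split in a single pass with awk. This is a minimal sketch assuming that, after the time column, each selection contributes three columns (x, y, z) in selection order; the function name split_oxyz and the file name all_pairs.xvg are hypothetical. Because awk keeps one output file open per molecule, the open-file limit (ulimit -n) may need to be raised for 3000 molecules, depending on the awk implementation.

```shell
#!/usr/bin/env bash
# Sketch: split a combined gmx distance -oxyz file into <i>.xvg files
# containing "time x y z", one file per molecule.
# ASSUMPTION: columns are time, then x y z for selection 1, x y z for selection 2, ...
set -euo pipefail

# split_oxyz FILE NMOL  (hypothetical helper)
split_oxyz() {
    local infile=$1 nmol=$2
    awk -v nmol="$nmol" '
        /^[#@]/ || NF == 0 { next }   # skip comments, xmgrace directives, blank lines
        {
            for (i = 1; i <= nmol; i++) {
                c = 3 * (i - 1) + 1   # column just before the x column of molecule i
                print $1, $(c + 1), $(c + 2), $(c + 3) > (i ".xvg")
            }
        }
    ' "$infile"
}

# Only run on real data if the combined file is present.
if [ -f all_pairs.xvg ]; then
    split_oxyz all_pairs.xvg "${NMOL:-3000}"
fi
```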