MSD differs when processing the same trajectory using command gmx_mpi msd

Hi every one,
I have a question regarding calculating MSD and diffusion coefficients using GROMACS:

In my calculations, when computing the diffusion coefficient of potassium atoms using a simulation of 40 picoseconds (ps), I noticed that the results vary depending on the time interval considered.
Specifically, when considering intervals of 1-8 ps, extracting data from trajectory (trr) files ranging from 0-10 ps, 0-20 ps, and 0-40 ps yield different results. The values obtained are as follows: 40 ps: 1.9623, 20 ps: 2.2871, 10 ps: 3.8900 (in units of 10^-5 cm^2/s). I would like to inquire about the possible reasons behind this discrepancy.

Yesterday, I attempted to visualize the trajectories of the first 10 picoseconds (ps) for the three sets of trajectory (trr) files using VMD. Surprisingly, I found that all three sets of trajectories appeared identical. However, when fitting the mean squared displacement (MSD) against time for the 1-8 ps interval using GROMACS, the results varied for each set of trajectories (as illustrated in Figure). I am curious about the potential reasons behind this inconsistency.


Figure. The MSD plots obtained from fitting the trajectories of the three sets of files are presented below, with each set representing calculations based on files spanning 10 ps, 20 ps, and 40 ps, respectively.

Obviously the trajectories are the same, I don’t understand why the diffusion coefficients differ.

Thank you for reading and answering the question.

Best Wishes.

I think you need to explain a bit more.

If the three trajectories are from three different simulations it is expected that they are different. When you say that the trajectories appeared identical, how did you compare them? It is difficult to spot minor differences by just comparing them by eye. It is also expected that the drift, from the starting configuration, will be larger the longer the simulation.

With longer simulations, the difference is expected to be different. I.e., if all three had been 10ns instead I think you would find the MSD plots fairly similar. But even then you would have differences, that is the reason why you should run multiple replicas to get an uncertainty approximation of your measured values.

Thank you for your consideration!
The three trajectories are from the same simulation, I just took 3 trajectories with different time length. I have already checked the trajectories, when I use -b 0 -e 10 command, they showed the same result. Once I remove the command, the result varies.

I see. Then I think it might be explained by (sorry about the automatic highlights):

"By default, gmx msd compares all trajectory frames against every frame stored
at -trestart intervals, so the number of frames stored scales linearly with
the number of frames processed. This can lead to long analysis times and
out-of-memory errors for long/large trajectories, and often the data at higher
time deltas lacks sufficient sampling, often manifesting as a wobbly line on
the MSD plot after a straighter region at lower time deltas. The -maxtau
option can be used to cap the maximum time delta for frame comparison, which
may improve performance and can be used to avoid out-of-memory issues."

See what happens if you set -trestart to 2 ps.

Thank you again for your warm assist.
Unfortunately, due to my model of MD, which was supposed to be used to estimate electrical conductivity of molten hydroxide, the total time span of my simulation using CP2K is only 40 ps. If I set -trestart to 2 ps, probably there are too few points to calculate a correct MSD. So I have to set -trestart to 0.001-0.05 ps to utilize the frames I calculated as many as possible to get a reliable result.
I would be very grateful to hear any other possible solutions. Thank you!

By the way, when I calculate the electrical conductivity using the Nernst-Einstein relations (σ=Dnq^2/kT, D=diffusion coefficient, n=number density, q=charge)in molten systems, I always get larger results. Even though I used the data of diffusion coefficient published in the paper, I always get a much larger result than the paper (molten KCl, 4.8S/cm vs 2.75mS/cm). As for molten KOH-NaOH eutectic hydroxide, the results would be 4-5 times larger. Do you have any suggestions?

OK. I didn’t you already used such a low -trestart. 0.001 ps sounds very low. What timestep do you use? Do you see any difference between the three trajectories with -trestart 0.001, -trestart 0.05 and -trestart 0.1? It doesn’t really matter if you’re getting a correct MSD. This is just to understand the differences between the three trajectory lengths.

I’m afraid I can’t explain why your results are different from the published ones. Are you sure you are using the correct units? I haven’t used the Nernst-Einstein relations myself, so I don’t know if there’s anything specific to keep in mind.

Thank you so much for consideration.
When I use the command
gmx_mpi msd -s K5N5-250.gro -f K5N5-250-A.trr -trestart 0.1 -beginfit 1 -endfit 8 ,
the result is
D[ K] 3.8735 (+/- 0.5477) 1e-5 cm^2/s;
When I use
gmx_mpi msd -s K5N5-250.gro -f K5N5-250-A.trr -trestart 0.05 -beginfit 1 -endfit 8
the result is
D[ K] 3.8711 (+/- 0.5472) 1e-5 cm^2/s
NO significant changes happened. It seems that the MSD is OK.

The timestep is 1 fs per frame.

I’ve been reading up a bit more on this. See, e.g., molecular dynamics - How to calculate diffusion coefficient from MSD graph using GROMACS? - Matter Modeling Stack Exchange. The problem is that you will get a noisy curve when you are approaching the end of the trajectory, due to less available data to sample - worse statistics. That uncertain region will start earlier with shorter trajectories.

With your data I would use the full 40 ps trajectory and fit from 2 to 20 ps.

I hope that helps.

Thank you so much for your advice! I would try to fit it.