Viscosity, einstein_blocks, and error-bars

GROMACS version: 2025.2
GROMACS modification: No

Hi,

Another question regarding viscosity calcs using Einstein. I noticed that, when analyzing a given edr file, with different values for the einstein_blocks parameter, e.g. as in:

gmx energy -f e.edr -o e1 -evisco v1 -eviscoi i1 -einstein_blocks 1

gmx energy -f e.edr -o e2 -evisco v2 -eviscoi i2 -einstein_blocks 2

gmx energy -f e.edr -o e4 -evisco v4 -eviscoi i4 -einstein_blocks 4

and compare the resulting files (v1.xvg vs v2.xvg vs v4.xvg), I find that the major difference between the three files is that they cover different total time spans (e.g. 500, 250 and 125 ns, if the total trajectory covers 500 ns). However, the actual values at the same time points are close to identical in all three cases (see figure attached). It looks as if there is no good reason to choose any value for einstein_blocks other than 1. Is there something I overlooked here? The way einstein_blocks is described in the documentation is somewhat ambiguous.
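For what it's worth, the comparison can be made quantitative with a few lines of Python. This is just a sketch, assuming the usual .xvg layout (two data columns, @/# comment lines); the file names are the ones from the commands above:

```python
import numpy as np

def read_xvg(path):
    """Read the first two columns of an .xvg file, skipping @/# comment lines."""
    rows = []
    with open(path) as fh:
        for line in fh:
            if line.startswith(("#", "@")) or not line.strip():
                continue
            rows.append([float(x) for x in line.split()[:2]])
    return np.array(rows)

def max_diff_common_window(a, b):
    """Maximum absolute difference of two (time, value) arrays over their
    overlapping time span; assumes both share the same time grid at the start."""
    n = min(len(a), len(b))
    assert np.allclose(a[:n, 0], b[:n, 0]), "time grids differ"
    return np.max(np.abs(a[:n, 1] - b[:n, 1]))
```

Something like `max_diff_common_window(read_xvg('v1.xvg'), read_xvg('v2.xvg'))` should then return a number on the order of round-off if the curves really are identical on the shared window.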

Also: I searched the literature and failed to find any account of how to properly calculate error bars for such results. If I do it the naive way, just calculating the viscosity for repeated/independent runs, e.g. 15 runs of 500 ns each, then the averages of the values in evisco and eviscoi do look reasonable (they give me viscosities close to the expected/experimental values), but when I use the standard deviation or RMSD calculated from the 15 individual viscosities I get ridiculously large error bars. Is there a statistically sound way to calculate error bars here?

I should add that I found this: Shear viscosity coefficient of aqueous glycerol from equilibrium Molecular Dynamics simulations … and here some kind of bootstrapping is used to get error bars. When I try something similar for my data, I do get much smaller error bars compared to the naive RMSD mentioned above. But then, this is just a single Python script, and there is no discussion of the way the error bars are calculated. Is anybody aware of a more systematic account of that topic?
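For reference, the generic bootstrap over per-run viscosities can be sketched as below. This is my reading of the general idea, not necessarily what the cited script does: resample the run averages with replacement, recompute the mean each time, and take the spread of those means as the error bar.

```python
import numpy as np

def bootstrap_sem(values, n_boot=10000, seed=0):
    """Bootstrap estimate of the error of the mean over independent runs."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    means = np.empty(n_boot)
    for i in range(n_boot):
        # resample the run averages with replacement
        sample = rng.choice(values, size=len(values), replace=True)
        means[i] = sample.mean()
    # spread of the resampled means = error bar on the overall mean
    return means.std(ddof=1)
```

Applied to run *averages*, this should land close to the plain standard error of the mean; it only gives much smaller numbers if something else (e.g. individual data points instead of run averages) is being resampled.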

thanks!

Michael

The -einstein_blocks option is a bit confusing. Effectively it “just” sets the maximum time of the Einstein output. But conceptually it doesn’t make much sense to go below 4 as you would only have 4 independent samples for the longest times.

I am not sure if I understand your “standard deviation” error estimate. The best way is to compute a standard error estimate from independent simulations, which is stddev/sqrt(N-1). This would be stddev/sqrt(14) for your 15 simulations.
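In code, the standard error over independent runs as described above is a one-liner (sketch only; note that dividing the population stddev by sqrt(N-1) is algebraically the same as dividing the sample stddev by sqrt(N)):

```python
import numpy as np

def standard_error(run_viscosities):
    """Standard error of the mean over N independent runs:
    population stddev (ddof=0) divided by sqrt(N - 1), which equals
    the sample stddev (ddof=1) divided by sqrt(N)."""
    v = np.asarray(run_viscosities, dtype=float)
    return v.std(ddof=0) / np.sqrt(len(v) - 1)
```

For 15 runs this divides the scatter of the run averages by sqrt(14), so the error bar should be roughly a quarter of the run-to-run standard deviation.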

But conceptually it doesn’t make much sense to go below 4 as you would only have 4 independent samples for the longest times.

that’s what I thought at first, but as suggested by the figures included above, the setting of einstein_blocks does not seem to do what I expected (splitting the data in the edr into, e.g., 4 subsets, followed by averaging). I am not sure I fully understand the code, but when I look at the subroutine “einstein_visco” in gmx_energy.cpp, it looks as if the code tries, in any case and for each time point, to use as much of the provided information as possible (i.e. ALL stress data points in the edr). As a result, the output in evisco.xvg is basically independent of the choice of -einstein_blocks; the only difference is that (as you write) for einstein_blocks > 1 the maximum time of the output is smaller. The results within the time window that is actually printed are independent of einstein_blocks (apart from very small differences, probably due to round-off errors).
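That behaviour is consistent with an Einstein-style analysis that averages over all time origins for every lag time. A minimal numpy sketch of that idea (not the actual GROMACS code; units and the V/(2 k_B T) prefactor omitted):

```python
import numpy as np

def einstein_msd(p_xy, dt, max_lag):
    """Mean-square displacement of the time-integrated stress,
    <(G(t0 + t) - G(t0))^2>, averaged over ALL available time origins t0.
    Because every origin contributes at every lag, the curve up to max_lag
    is the same no matter how much longer the trajectory is."""
    # running integral G(t) of the off-diagonal stress component
    g = np.concatenate(([0.0], np.cumsum(p_xy) * dt))
    msd = np.empty(max_lag)
    for lag in range(1, max_lag + 1):
        d = g[lag:] - g[:-lag]
        msd[lag - 1] = np.mean(d * d)
    return msd
```

The viscosity would follow from the slope of this curve; the point here is only that restricting the output to shorter lags (as a larger einstein_blocks does) leaves the values at the remaining lags unchanged, because all origins are still used.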

regarding the error bars … I know how a standard deviation is defined; my question refers to the choice between what I’d call the naive way, using the stddev of the viscosities calculated in a series of independent simulation runs, versus doing a bootstrap (as in the script from Michele’s Zenodo repo). Overall my results, judging from the averages, look very reasonable, but when I use the stddev from multiple independent simulations to calculate error bars, the resulting error bars are huge compared to the differences I see between my different systems. If I do bootstrapping (as done in Michele’s script), the resulting error bars are much smaller. I am not a statistician, so I wonder how to justify the bootstrap solution. Well, I guess this is probably off-topic …

cheers

Michael

The block count only affects the end time in the output and, necessarily, the time the computation takes. The results for shorter times are identical. That’s why I said that calling this “blocks” is confusing.

The standard error estimate should be nearly identical to the bootstrapping error estimate. I don’t understand what you have done with the “stddev” then. Do you somehow use the stddev of individual simulations instead of only the averages?

thanks for the swift reply! … I guess einstein_blocks is clear now … as for the error bars: after I read your msg I double-checked, and it’s a typo in one of my scripts ;) - sorry for wasting your time!

thanks!

michael