Thank you very much for your prompt reply and for pointing out these critical issues.
I completely agree with your assessment regarding the L-OPLS force field. Given its specific optimization for liquid properties and long-chain hydrocarbons, it seems perfectly suited for our heavy diesel systems. We will definitely incorporate it into our future force field benchmarking to establish a more accurate baseline.
You hit the nail on the head regarding the GROMACS version. Our university’s supercomputing center maintains a very conservative software environment for “stability” reasons, which is why the default module is still stuck on a version from 4 years ago.
Your warning about the necessity of GROMACS 2025/2026 for properly simulating shear flow and maximizing hardware utilization is a vital wake-up call for us. I realize now that relying on the outdated version is limiting the physical accuracy of our NEMD simulations and wasting our GPU resources.
I will immediately follow your advice and compile the latest GROMACS 2025 from source in my local user directory to ensure our shear flow methodology is physically rigorous and highly optimized.
Thank you again for your invaluable guidance and for developing such an incredible tool for the community. Your insights have tremendously helped us correct our research trajectory.
Thank you so much for your previous recommendation to use the deform option. It has been incredibly helpful! I applied it to my 22 complex diesel surrogate mixtures, each run for 20 ns. Overall, the results are very encouraging, with an average viscosity error of ~17% compared to experimental data.
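For anyone following the thread, the setup under discussion uses the deform .mdp option, which takes six box-vector deformation velocities, a(x) b(y) c(z) b(x) c(x) c(y), in nm/ps; shearing c(x) sets up the velocity gradient probed by P_xz. A sketch with placeholder numbers (illustrative, not the values used in this study):

```
; NEMD shear via box deformation (illustrative value, not from this study)
deform = 0 0 0 0 0.005 0    ; deform c(x) at 0.005 nm/ps
; the nominal shear rate is then 0.005 / L_z, with L_z the box height in nm
```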
However, I am encountering extreme variance with a few specific systems. For example, when I ran the exact same system (S_2) three times with different random velocity seeds, I got drastically different macroscopic viscosities: 1.71 cP, 8.58 cP, and 2.23 cP.
Upon checking the instantaneous P_xz data, I noticed massive thermal noise, fluctuating between +400 and -500 bar (standard deviation ~245 bar).
Given this context, I would love to hear your insights:
Is this extreme variance across independent replicas expected for complex hydrocarbon mixtures in NEMD (perhaps due to transient molecular entanglements or the low signal-to-noise ratio)?
To obtain a reliable viscosity for these highly fluctuating outlier systems, would you recommend simply averaging 3-5 independent replicas, extending the simulation time significantly (e.g., >100 ns), or another approach?
Thanks again for your continuous guidance and time!
The instantaneous pressure can fluctuate a lot. Your numbers look very high though. What is the size of your system?
Running multiple replicates or one long simulation makes no difference, as you should be running much longer than the autocorrelation time of the stress. Compute the autocorrelation time to see what times you need. Note that you might need to run extremely long, depending on the viscosity of your system.
If your system is shear thinning and ordering, you might get even longer correlation times due to the ordering.
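As a sketch of the analysis suggested above: assuming the P_xz time series has been extracted to a plain array (e.g. from gmx energy output), the integrated autocorrelation time could be estimated along these lines. Truncating the sum at the first zero crossing of the autocorrelation function is one common, simple choice to suppress the noisy tail:

```python
import numpy as np

def correlation_time(x, dt):
    """Integrated autocorrelation time of a time series x with spacing dt.
    The normalized autocorrelation function is summed up to its first
    zero crossing, a simple truncation that suppresses the noisy tail."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    n = len(x)
    acf = np.correlate(x, x, mode="full")[n - 1:]  # lags 0 .. n-1
    acf = acf / acf[0]                             # normalize so acf[0] == 1
    neg = np.nonzero(acf < 0)[0]
    cut = neg[0] if neg.size else n                # first zero crossing
    return dt * (0.5 + acf[1:cut].sum())           # integrated ACT
```

Note that, as the reply warns, this simple estimate is dominated by the fastest decay; slow processes (e.g. shear-induced ordering) may only show up in longer-lag analyses such as block averaging.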
Thank you so much for the incredibly insightful and prompt reply. Your point regarding the stress autocorrelation time and the potential for shear ordering perfectly explains the underlying physics of the massive variance I am observing.
To answer your question about the system size: My system contains exactly 1,000 molecules of a diesel surrogate mixture (primarily C10-C20 hydrocarbons), which translates to approximately 30,000 atoms. After NPT equilibration, the simulation box dimensions are 4.8 x 4.8 x 14.5 nm (a volume of roughly 334 nm³). I realize now that this relatively small total number of atoms, combined with the narrow cross-section in the X and Y dimensions, strongly contributes to the extreme relative fluctuations (the ~245 bar standard deviation) in the instantaneous pressure tensor.
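As a rough, back-of-the-envelope cross-check (my own estimate, not from the thread): the kinetic contribution alone to the off-diagonal pressure in a box of N atoms fluctuates with a standard deviation of roughly

```latex
\sigma\!\left(P_{xz}^{\mathrm{kin}}\right) = \frac{\sqrt{N}\,k_B T}{V}
\approx \frac{\sqrt{3\times 10^{4}} \times 4.1\times 10^{-21}\,\mathrm{J}}{334\,\mathrm{nm}^{3}}
\approx 21\,\mathrm{bar},
```

and in a dense liquid the configurational (virial) part is typically several times larger still, so an instantaneous standard deviation of ~245 bar for a ~30,000-atom box is entirely plausible. Since V grows in proportion to N, this fluctuation only shrinks as 1/sqrt(N) with system size.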
Following your expert guidance, my immediate next step is to compute the stress autocorrelation time for this specific system. This will give me a quantitative baseline for exactly how long these simulations need to be extended to achieve true statistical convergence, especially considering the slow alignment and ordering dynamics of these long-chain hydrocarbons under shear.
Thank you again for pointing me in the exact right direction. Your advice is invaluable!
I followed your advice and calculated the stress autocorrelation time. Interestingly, it is extremely short, only about 0.69 ps. Since my production runs are 20 ns long, they are already orders of magnitude longer than the correlation time, but the signal-to-noise ratio remains extremely poor at low shear rates.
I also noticed that even when I increase the shear rate and push the system into the shear-thinning regime, the background noise (the instantaneous standard deviation of P_xz) stays roughly constant at around 245 bar.
Given this constant 245 bar noise and the short correlation time, I am facing a dilemma to get a reliable zero-shear viscosity. Which of the following approaches is more physically rigorous in GROMACS?
Ensemble Averaging: Stick to the very low shear rate and run multiple independent replicates (with different random gen_seed) to average out the noise?
Extrapolation: Run at several higher shear rates (where the stress signal easily overcomes the 200 bar noise, but the fluid is shear-thinning) and extrapolate back to zero-shear viscosity using an empirical model like the Carreau equation?
Any further advice on which path is standard practice would be greatly appreciated.
You cannot look at just a single correlation time. Those are the very fast correlations you observe. But there are usually also correlations at much longer time scales, and those contribute much more to the statistical error. You need to do a bit more advanced analysis to get both.
I would run one long simulation or average over multiple shorter ones. Extrapolation always uses assumptions.
Thank you very much for your insightful advice. It was extremely helpful. Following your suggestion, I performed block averaging (gmx analyze -ee) instead of relying on the fast correlation time, and I am now running multiple independent replicates.
I have attached a few error estimate plots from my 20 ns runs. I observed that the true statistical error reaches a plateau at a block size of roughly 0.3 to 1.0 ns. This yields a stabilized error estimate of about 2.0 to 3.5 bar (which is much better than the ~245 bar instantaneous noise I initially worried about).
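What gmx analyze -ee reports can be approximated by hand with non-overlapping block averages; a minimal sketch of the plateau analysis described above (assuming the P_xz trace is available as an array):

```python
import numpy as np

def block_error(x, block_size):
    """Standard error of the mean from non-overlapping block averages.
    Once blocks are much longer than the slowest correlation time, this
    estimate plateaus at the true statistical error of the mean."""
    x = np.asarray(x, dtype=float)
    nblocks = len(x) // block_size
    blocks = x[:nblocks * block_size].reshape(nblocks, block_size).mean(axis=1)
    return blocks.std(ddof=1) / np.sqrt(nblocks)

# Typical use: scan block sizes and look for the plateau, e.g.
# errors = [block_error(pxz, bs) for bs in (10, 100, 1000, 10000)]
```

The block size at which the estimate levels off is itself a practical measure of the slowest relevant correlation time.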
Question 1: Given this slow relaxation time (~1 ns) and the stabilized error of ~2-3 bar, would you consider a 20 ns production run per replicate sufficient to sample the local phase space reliably for these fluids?
Question 2: I have noticed a very interesting composition-dependent behavior when running the ensemble averages (multiple independent 20 ns replicates with different gen_seed):
For my “thinner” (less viscous/lower density) diesel models, the steady-state average stress P_xz is very consistent across all replicates.
However, for diesel models with a high proportion of polycyclic naphthenes, the discrepancy between replicates is massive. For example, in one 20 ns replicate the average stress is -171 bar, while another independent replicate of the exact same system averages -63 bar.
Could this huge variation be directly related to the bulky and rigid structures of multi-ring cycloalkanes causing long-lived metastable entangled conformations (phase space trapping)? In other words, are these specific mixtures highly non-ergodic within a 20 ns timeframe? Is it still physically sound to simply take the arithmetic mean of these 20 ns trajectories?
Thank you again for your time and invaluable expertise!
A more viscous liquid could have much longer correlation times. It seems, then, that your correlation time there is more than 20 ns. I would say that a simulation of 20 ns is rather short. You should be able to run much longer.