GROMACS version: 2021
GROMACS modification: No
I've recently built a system with a 32-core Threadripper and two RTX 3090s. Installation of GROMACS 2021 on Ubuntu Linux went well using
cmake … -DGMX_BUILD_OWN_FFTW=ON -DREGRESSIONTEST_DOWNLOAD=ON -DGMX_GPU=CUDA -DGMX_USE_OPENCL=off -DGMX_CUDA_TARGET_SM=75
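One detail worth flagging in that build line: `SM=75` is the Turing architecture (e.g. RTX 2080 Ti), while the RTX 3090 is Ampere, compute capability 8.6. With CUDA 11+ the binary still runs via PTX JIT, but a native target avoids that. A sketch of the adjusted invocation (build directory and other options as in your original command):

```shell
# Sketch: target Ampere natively. RTX 3090 = compute capability 8.6,
# so SM=86 rather than SM=75 (Turing). Other flags unchanged; the
# elided options from the original command line are assumed.
cmake .. -DGMX_BUILD_OWN_FFTW=ON -DREGRESSIONTEST_DOWNLOAD=ON \
         -DGMX_GPU=CUDA -DGMX_USE_OPENCL=off -DGMX_CUDA_TARGET_SM=86
```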
In comparing three systems, each with two GPUs (GTX 1080, RTX 2080 Ti, and RTX 3090) at 30,000, 300,000, and 3,000,000 atoms (1AKI in water), these scale as expected at about 1:2:3. The RTX 3090 runs 130,000 atoms at 162 ns/day and 3,000,000 atoms at 6 ns/day. In each case the GPUs run at about 50-70% utilization; the CPU is always at 99%.
For academic work I have access to Schrodinger's Maestro. No matter which model is used in the simulation, Maestro runs the same systems at about twice the speed, using one CPU core and 99% of a single GPU. I understand GROMACS is built for large systems and the difference shrinks with larger atom counts (Maestro: 4M atoms at 7 ns/day), but Maestro is still using only a single GPU.
I've gone through "Creating Faster Molecular Dynamics Simulations with GROMACS 2020" on the NVIDIA Developer Blog and the "bang for your buck" paper, and benchmarked the 2M-atom ribosome model at 17 ns/day on the RTX 3090, but I am still perplexed.
A typical run command is: gmx mdrun -deffnm XXX.npt -bonded gpu -nb gpu -pme gpu -ntomp 4 -ntmpi 16 -npme 1. I've attached a log of a run.
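For comparison, the NVIDIA blog post mentioned above describes opt-in GPU-direct communication and GPU-resident update paths for multi-GPU runs in GROMACS 2020/2021. A sketch of a run using them, assuming two GPUs and keeping the XXX.npt placeholder name from the command above (whether -update gpu is usable depends on the system's constraint settings):

```shell
# Opt-in GPU-direct paths from the NVIDIA GROMACS 2020 blog post;
# experimental environment variables in GROMACS 2020/2021.
export GMX_GPU_DD_COMMS=true
export GMX_GPU_PME_PP_COMMS=true
export GMX_FORCE_UPDATE_DEFAULT_GPU=true

# Fewer, fatter thread-MPI ranks often map better onto two GPUs than
# 16 ranks do. With -ntmpi 4 -npme 1 there are 3 PP ranks + 1 PME
# rank = 4 GPU tasks; -gputasks 0011 spreads them over GPUs 0 and 1.
gmx mdrun -deffnm XXX.npt \
    -nb gpu -pme gpu -bonded gpu -update gpu \
    -ntmpi 4 -ntomp 8 -npme 1 \
    -gputasks 0011
```

The idea is to keep the whole inner loop resident on the GPUs instead of staging forces through the CPU every step, which may explain some of the utilization gap you are seeing.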
So for now, I just want to manage my expectations. Are the speeds I see typical? Is Maestro from another world? Granted, Maestro is a tad more expensive, but I feel I must not be configuring the system anywhere close to optimum. A factor of two (really 4x: twice the speed with half the GPUs) for most models is puzzling at best.
SR.isop.npt.iso.log (39.9 KB)