GROMACS version: 2022
My research group is considering buying a new GPU unit to be used with GROMACS simulations.
Which of the following NVIDIA GPUs is best suited for GROMACS? RTX3090, A5500, or A30?
I would also appreciate your input on the reason behind your choice.
Thank you!
A regular gaming GPU is enough.
It depends on the size of the system you use. Any decent gaming card (RTX 3090/4080/4090) will be fine for system sizes up to 500,000 atoms. Even with these GPUs, the CPU will become a bottleneck, and you will need to get a CPU with anything between 12 and 24 cores.
With any version since Gromacs-2022, you have the option to run entirely on the GPU (as long as you only run on a single GPU), so then you won’t be bottlenecked by any CPU.
Everything depends on price; we just got some ~40 new RTX 3070 & 3080 Ti cards, primarily because the price was awesome and these cards had models where we could fit four of them in each 1U node we upgraded.
If you are buying a new card for a workstation, a 4090 might be a good deal, or you might want to wait for the 4070 cards - unless you find good prices for a 3080.
AMD cards are fine too, but unfortunately the drivers are not yet as stable as NVIDIA's, so they might not be the best bet for the first cards you are getting. The same goes for Intel; there's promise there, and potentially good value for money, but the drivers are not mature, and absolute performance cannot compete with high-end NVIDIA cards yet.
Cheers,
Erik
Hello Erik,
“With any version since Gromacs-2022, you have the option to run entirely on the GPU (as long as you only run on a single GPU), so then you won’t be bottlenecked by any CPU.”
What is this option? I couldn't find it through a quick internet search.
Thanks!
Hello, have you figured out how to run the simulation entirely on the GPU? I am facing the same situation with a low-priced CPU (i5-10600KF) and a 3080 GPU.
I recently upgraded to Gromacs 2023 but found GPU utilisation at 45-60% (using nvidia-smi) while the 16 CPU cores were >97% utilised. The GPU and CPU utilisation was comparable on the 2021.3 version. Is there any specific option to offload to the GPU?
To make the simulation GPU resident, use the following flags along with your mdrun command: -update gpu -pme gpu -nb gpu
This should bring the GPU utilization up to ~90%
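For example, a full command might look like this (a minimal sketch; topol.tpr is just a placeholder for your own run input):
gmx mdrun -s topol.tpr -nb gpu -pme gpu -bonded gpu -update gpu
Note that -update gpu generally requires the run to fit on a single GPU/domain; mdrun will print an error if the update task cannot be run on the GPU.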
Thanks for the info
Please read the following user guide section: https://manual.gromacs.org/documentation/current/user-guide/mdrun-performance.html#running-mdrun-with-gpus
gmx mdrun -deffnm run03 -s run03.tpr -c run03.pdb -nb gpu -pme gpu -pmefft gpu -npme 1 -ntmpi 6 -nt 6 -bonded gpu -update gpu -gputasks 012345
I use this kind of command in Gromacs 2023.2 running on a six-V100 GPU server. However, the utilization of each GPU is only 50-60%. Sometimes I need more threads (12 threads).
If someone has even a moderately powerful CPU, like a 12th-gen or later i7 (this is just an approximation; sometimes a 10th- or 11th-gen i7 with 16 threads would be enough), and a good GPU like an RTX 3080 Ti, RTX 3090, or RTX 4070-4090, he/she can offload the bonded calculations to the CPU and the other calculations to the GPU using this kind of command:
gmx mdrun -deffnm production -cpi production.cpt -s production.tpr -c 300ns.pdb -nb gpu -pme gpu -pmefft gpu -npme 1 -ntmpi 20 -ntomp 1 -ntomp_pme 4 -bonded cpu -update gpu
In this command, I assume there are 20 threads available.
Check the performance while changing the -npme value; the PP:PME ratio must be optimal. He/she can also change -nt to see how the performance changes. Sometimes a lower total thread count gives better GPU utilization.
Check the GPU options mentioned in the gmx mdrun options. There are several ways to change the CPU/GPU offload. One has to find the optimal values by trial: start mdrun with one set of values, wait about 1-2 minutes, and cancel it with Ctrl+C to see the performance. During each run you can check the GPU usage using the "nvidia-smi" command.
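Instead of cancelling by hand, you can also let mdrun time short trial runs for you. A rough sketch (topol.tpr and the thread counts here are placeholders for your own setup):
for npme in 1 2 3; do
    gmx mdrun -s topol.tpr -deffnm bench_npme${npme} -nb gpu -pme gpu -npme ${npme} -ntmpi 20 -nsteps 10000 -resethway
    grep "Performance:" bench_npme${npme}.log
done
Here -nsteps caps each trial at 10,000 steps and -resethway restarts the internal timers halfway through, so the ns/day reported in each log excludes the startup cost.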
Additionally,
During the mdrun, CPU usage must not be interrupted by other processes. The CPU always does the sequential part of each step that runs in parallel on the GPU; if the sequential part of any parallel task is disturbed, the other parts have to wait for it, which creates a bottleneck. If someone needs to start another run (for example, a molecular docking run), he/she must reserve a separate number of threads for the mdrun via -nt and give the remaining threads to the other run, limiting that process's CPU usage if possible. (If it is not possible to limit the CPU usage of that process, it is better to wait for the mdrun to end before starting that run.) A sketch of such a split is given below.
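As a sketch of such a core split (assuming a 16-thread machine; ./docking_job is a hypothetical second process), you could pin mdrun to the first 12 hardware threads and confine the other job to the rest:
gmx mdrun -s topol.tpr -nt 12 -pin on -pinoffset 0 -pinstride 1
taskset -c 12-15 ./docking_job
Here -pin on together with -pinoffset and -pinstride locks mdrun's threads to specific cores, and taskset (Linux) restricts the other process to the remaining ones.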