GROMACS version: 2024.4
GROMACS modification: No
Hi all.
I have access to a workstation with a Ryzen 9 5900X CPU (12 cores, 24 threads) that now happens to have two GPUs (an RTX 3060 12 GB and an RTX 4060 Ti 8 GB).
I decided to benchmark the machine with a system I had at hand: two interacting proteins.
I always use the option “-pin on”.
Using only the CPU, performance peaked at 14 ns/day.
The RTX 3060 12 GB peaked near 56 ns/day, whether I requested all 24 threads or only 6 or 12.
The RTX 4060 Ti 8 GB peaked at 95 ns/day, again for all three thread counts tested.
Then I tried 4 simultaneous runs (so that each GPU would serve 2 processes at the same time, each with 6 dedicated CPU threads). I used “-pinoffset” and limited the number of threads per run so that no CPU thread would be requested twice. Performance on the RTX 3060 and RTX 4060 Ti fell to 22 and 31 ns/day, respectively. When I enabled all the options that offload as much as possible to the GPU, it rose slightly to 25 and 36 ns/day, respectively.
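For reference, the thread layout described above can be sketched in a few lines of Python that only build and print the command lines. This is an illustration under assumptions, not the exact commands that were run: the run names `run1` to `run4`, the mapping of GPU ids 0 and 1 to the two cards, and the use of `-ntmpi 1` and `-pinstride 1` are all guesses.

```python
# Sketch: build mdrun command lines for 4 concurrent runs on a 24-thread CPU.
# Assumptions (not taken from the post): file names run1..run4, GPU ids 0 and 1,
# one thread-MPI rank per run, and a pin stride of 1 logical core.

HW_THREADS = 24                          # Ryzen 9 5900X: 12 cores, 24 hardware threads
N_RUNS = 4
THREADS_PER_RUN = HW_THREADS // N_RUNS   # 6 threads for each run


def mdrun_command(run_index, gpu_id, threads):
    """Return the argument list for one pinned mdrun process."""
    offset = run_index * threads         # first logical core used by this run
    return [
        "gmx", "mdrun", "-deffnm", f"run{run_index + 1}",
        "-ntmpi", "1", "-ntomp", str(threads),
        "-pin", "on", "-pinoffset", str(offset), "-pinstride", "1",
        "-gpu_id", str(gpu_id),
    ]


# Runs 0 and 1 share GPU 0, runs 2 and 3 share GPU 1.
commands = [mdrun_command(i, gpu_id=i // 2, threads=THREADS_PER_RUN)
            for i in range(N_RUNS)]

for cmd in commands:
    print(" ".join(cmd))
```

With these settings the pin offsets come out as 0, 6, 12 and 18, so the four runs occupy disjoint blocks of six logical cores.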
After that, I tried just two simultaneous runs: each one got half of the CPU threads and one of the GPUs. A typical run with no special options reached only 24 and 27 ns/day.
However, things got interesting when I once again enabled all the “run it on the GPU” options: with only two processes at once, performance rose to 43 and 83 ns/day.
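To make the two-run setup concrete, here is a minimal Python sketch that builds the two command lines with and without GPU offload. Which flags count as “run it on the GPU” is an assumption here: I take it to mean `-nb gpu -pme gpu -bonded gpu -update gpu`. The run names and GPU ids are likewise placeholders, and `-update gpu` is not supported for every simulation setup.

```python
# Sketch: two concurrent runs, each with half the CPU threads and one GPU.
# Assumptions (not taken from the post): file names run1/run2, GPU ids 0 and 1,
# and the exact set of offload flags listed in GPU_OFFLOAD.

GPU_OFFLOAD = ["-nb", "gpu", "-pme", "gpu", "-bonded", "gpu", "-update", "gpu"]


def two_run_commands(hw_threads=24, offload=True):
    """Return argument lists for two pinned mdrun processes on separate GPUs."""
    threads = hw_threads // 2            # 12 threads per run on a 24-thread CPU
    cmds = []
    for run_index in range(2):
        cmd = [
            "gmx", "mdrun", "-deffnm", f"run{run_index + 1}",
            "-ntmpi", "1", "-ntomp", str(threads),
            "-pin", "on", "-pinoffset", str(run_index * threads),
            "-gpu_id", str(run_index),
        ]
        if offload:
            cmd += GPU_OFFLOAD           # move as much work as possible to the GPU
        cmds.append(cmd)
    return cmds


for cmd in two_run_commands():
    print(" ".join(cmd))
```

Calling `two_run_commands(offload=False)` gives the plain variant, which leaves the choice of what runs on the GPU to mdrun's defaults.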
Is this to be expected, considering that no CPU thread is shared between processes? Is there a way to improve the performance of several simultaneous simulations?
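To put the question in terms of total throughput, the figures above can be tallied per setup. The short Python sketch below assumes that every number quoted is a per-run value, so that with 4 runs each GPU hosts two runs at the quoted speed; if the numbers were per-GPU totals instead, the 4-run sums would be halved.

```python
# Sketch: aggregate ns/day for each concurrent setup reported in this post.
# Assumption: each quoted figure is the speed of a single run, not a per-GPU sum.

scenarios = {
    "4 runs, default":     [22, 22, 31, 31],   # two runs per GPU
    "4 runs, GPU offload": [25, 25, 36, 36],
    "2 runs, default":     [24, 27],           # one run per GPU
    "2 runs, GPU offload": [43, 83],
}

# Sum the per-run speeds to get the combined throughput of each setup.
totals = {name: sum(speeds) for name, speeds in scenarios.items()}

for name, total in totals.items():
    print(f"{name}: {total} ns/day in total")
```

Under that assumption the totals are 106, 122, 51 and 126 ns/day, compared with 56 or 95 ns/day for a single run on one GPU with the whole CPU available.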
Thanks a lot in advance!