RTX 3090 vs RTX 3080//CPU-GPU ratio//best performance

GROMACS version: 2020.1
GROMACS modification: Yes/No
Hi everyone,

I am planning to upgrade my workstation with a new GPU and CPU, mainly for MD simulations using GROMACS.

I currently have a Titan Xp and an Intel 7820X. It looks like the Titan Xp doesn't offer any advantage for MD over the GTX 1080 Ti, but its price is significantly higher. Doesn't the larger GPU memory matter for MD? The CPU has 8 cores, which is definitely not enough for me: it can cover the MD simulation only and doesn't leave room for any other work at the same time when I am running a system of more than ~100K atoms.

What ratio of CPU to GPU should I use for the best performance on systems of 250K atoms or fewer? Does it really matter whether I use an RTX 3090 instead of an RTX 3080, or are the performance differences between the two not significant for MD? How much RAM should I choose for that?

Sincerely,
Vlad

Hi,
I found the RTX 3070 to be the best option, paired with a 12+ core processor if possible. It is cheaper than the other cards and offers a high CUDA core count.
For MD simulation you need at most 32 GB of RAM, but 64 GB would be nice.
You should use at least a 12+ core processor with RTX cards.

Note that for many years GROMACS releases have not needed 12+ cores (the rule of thumb was 3-4 fast or 6-8 slow cores up until 2019), and recent releases run well with just a few cores per GPU. This will of course depend on the type of simulation and the resulting workload.

Secondly, for the simulation itself you do not need a lot of main memory, e.g. 1 GB per core is sufficient. Of course, other tasks like analysis or visualization will often require more memory, so 32-64 GB is a good choice.

The amount of GPU memory needed is also quite moderate, e.g. running a 1 million atom system with GROMACS 2020 full step offload requires ~9 GB of GPU memory.

Through careful design, GROMACS supports as many of its features as possible in combination with all acceleration and parallelization features. This however means that the amount of work (relative to the total) assigned to the CPU will depend on the input and settings; for instance, pull or free energy simulations will require more CPU work than "vanilla" MD with the same simulation system (and "base" settings).
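As an illustration, with GROMACS 2020 a "vanilla" MD run can offload essentially the whole inner loop to the GPU via the per-task offload flags; the input file name below is a placeholder, and `-update gpu` only applies when no CPU-only features (e.g. pull code) are in use:

```shell
# Sketch of a full-offload GROMACS 2020 run (topol.tpr is a placeholder input).
# Short-range nonbonded, PME, bonded, and the update/constraints step all run
# on the GPU, so only a few CPU cores per GPU are needed.
gmx mdrun -s topol.tpr \
    -nb gpu -pme gpu -bonded gpu -update gpu \
    -ntmpi 1 -ntomp 4
```

With pull or free-energy settings, some of these tasks fall back to the CPU, which is why those runs benefit from more cores.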

However, it is safe to say that in general, a few fast desktop/workstation CPU cores (or 1.5-2x as many server cores) per GPU are sufficient to get close to peak performance.

For more details on the ideas behind, and principles of, the heterogeneous design, as well as some recent performance benchmarks, see our recent paper: https://arxiv.org/abs/2006.09167. There you will also find plots showing how performance depends on the number of CPU cores per GPU for both single-trajectory and ensemble simulations.

I have not measured the performance difference between the two, so I cannot comment.

The highest-end GPU will generally offer the best performance, but it will often not be the best value for money. A faster GPU may also gain slightly more from a couple of extra CPU cores than a previous-gen or lower-end GPU would, but as explained before, this depends on a few things.

Yes, adding a few more cores to newer GPUs makes a significant difference in the results.
I have tested a GTX 1070 with an 8-core and a 12-core processor and found a difference of +20-30 ns per day.

Please report the concrete GROMACS version, simulation system details/settings (at least size, force field, and whether it is vanilla MD or not), and hardware, as well as the relative improvement (+30 is 50% extra w.r.t. 60, but only 5% w.r.t. 600 ;)). Otherwise your experience is hard to extrapolate from and risks creating the false impression that everyone needs X CPU cores.

That you need 12 cores with a GTX 1070 to get ~peak performance is, in general, simply not the case, not even for GROMACS 2018 (as we've shown in https://arxiv.org/pdf/1903.05918.pdf).

As I said, the resource balance will depend on your use case, and in many if not most cases of vanilla MD the performance will be approximately flat from 4-6 cores per GPU onward; see the paper I linked.

Case 1: 3x GTX 1070 with 2x 8-core processors gives 25 ns/day (typical protein-ligand system, approx. 85k atoms).
Case 2: 3x GTX 1070 with 2x 12-core processors gives 50 ns/day (same system).

The case 1 configuration with a protein-membrane simulation system of approx. 130k atoms gives 10 ns/day.
The case 2 configuration with the same protein-membrane system gives approx. 30 ns/day.
I have used GROMACS version 5.1+.
The thread assignment was `-ntmpi 3 -ntomp 14` for the three GPUs 0,1,2.
I designed this as per your previous work (https://onlinelibrary.wiley.com/doi/10.1002/jcc.24030). I will go through the new paper.
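For reference, the thread layout described above corresponds to an mdrun invocation along these lines (a sketch; the input file name is a placeholder):

```shell
# One thread-MPI rank per GPU, 14 OpenMP threads each (3 ranks x 14 threads).
# -gpu_id 012 maps the three ranks to GPUs 0, 1, and 2.
gmx mdrun -s topol.tpr -ntmpi 3 -ntomp 14 -gpu_id 012
```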

That is ancient. Please use a more recent version.

Thanks a lot!