Hardware advice for maximizing throughput (CpHMD/Replica Exchange)

GROMACS version: 2021-beta1-plumed-2.9.2-dev-UNCHECKED
GROMACS modification: Yes

Dear GROMACS community,

I have some questions about hardware for running replica exchange. After benchmarking on VMs and reading the literature, it seems that the best value for money for a normal MD simulation is about 8 CPU cores per GPU. However, modern CPUs usually have more than 8 cores, so pairing one CPU with a single GPU leaves cores idle and seems inefficient in terms of maximum throughput for normal MD.

As Berk Hess states, running 2 simulations per GPU gives better maximum throughput. Could we further argue that running 1 thread per replica would also reduce CPU communication time, further increasing the maximum throughput? If so, would it make sense to prefer a CPU with more but slower threads over one with fewer but faster threads? Following this logic, the most cost-efficient setup for maximizing throughput would be 1 slow, high-thread-count CPU with 2 fast GPUs.
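To make the two layouts I am comparing concrete, here is a small Python sketch of the arithmetic. The function name `replica_layout` and the 32-thread, 2-GPU node are made up for illustration and are not benchmark results; the function only counts how many replicas fit on the node and how many would share each GPU.

```python
import math


def replica_layout(n_hw_threads, n_gpus, threads_per_replica):
    """Return (n_replicas, replicas_per_gpu) for a single node.

    Hypothetical helper: it only does the bookkeeping and says
    nothing about the performance of either layout.
    """
    if n_hw_threads <= 0 or n_gpus <= 0 or threads_per_replica <= 0:
        raise ValueError("all arguments must be positive")
    # Each replica gets a fixed number of CPU threads.
    n_replicas = n_hw_threads // threads_per_replica
    if n_replicas == 0:
        raise ValueError("not enough hardware threads for one replica")
    # Replicas are spread as evenly as possible over the GPUs,
    # so the busiest GPU hosts the rounded-up share.
    replicas_per_gpu = math.ceil(n_replicas / n_gpus)
    return n_replicas, replicas_per_gpu


# Assumed example node: 32 hardware threads, 2 GPUs.
many_thin = replica_layout(32, 2, threads_per_replica=1)
few_fat = replica_layout(32, 2, threads_per_replica=8)
print("1 thread per replica :", many_thin)
print("8 threads per replica:", few_fat)
```

On this assumed node, 1 thread per replica gives 32 replicas with 16 sharing each GPU, while 8 threads per replica gives 4 replicas with 2 per GPU. Whether that many replicas sharing one GPU still scales is exactly what I am unsure about.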

Lastly, I will be running the phbuilder version of GROMACS, where “-update gpu” is not available for replica exchange. Any feedback or suggestions are greatly appreciated.
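For reference, this is roughly the launch line I have in mind, written as a short Python sketch that only assembles the command string. The directory names `rep0` ... `rep3`, the exchange interval of 1000 steps, and the `mpirun`/`gmx_mpi` executable names are assumptions for illustration; the mdrun options used (`-multidir`, `-replex`, `-ntomp`, `-nb`, `-update`, `-pin`) are standard ones, but I have not verified how the phbuilder build handles each of them.

```python
import shlex


def mdrun_command(replica_dirs, ntomp=1, replex=1000,
                  mpirun="mpirun", gmx="gmx_mpi"):
    """Build a replica-exchange mdrun command line (one MPI rank per replica).

    Hypothetical helper: all defaults are illustrative assumptions.
    """
    if not replica_dirs:
        raise ValueError("need at least one replica directory")
    cmd = [
        mpirun, "-np", str(len(replica_dirs)),  # one rank per replica
        gmx, "mdrun",
        "-multidir", *replica_dirs,             # one directory per replica
        "-replex", str(replex),                 # exchange attempt interval (steps)
        "-ntomp", str(ntomp),                   # OpenMP threads per rank
        "-nb", "gpu",                           # nonbonded interactions on the GPU
        "-update", "cpu",                       # integration stays on the CPU
        "-pin", "on",                           # pin threads to cores
    ]
    return shlex.join(cmd)


command = mdrun_command(["rep0", "rep1", "rep2", "rep3"])
print(command)
```

With `-update cpu`, coordinates have to move between CPU and GPU every step, which is part of why I wonder how much the CPU side matters for this setup.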

Thank you :)