GROMACS modification: No
I run GROMACS on a supercomputer. The node configuration is 64 vCPUs + 8 × V100 32 GB.
How should I allocate resources to achieve the best balance?
I tried running with 1 core + 1 GPU, which did not perform well.
GROMACS scales poorly across multiple GPUs; the speedup from additional GPUs is much smaller than what the first GPU provides. According to: GitHub - Biu-G/gromacs-rocm: Gromacs that can be accelerated at AMD GPU
Typically 3-6 cores per GPU are sufficient to get near-peak performance. However, running across multiple GPUs requires a simulation system large enough to scale, a fast enough GPU interconnect, and direct GPU communication enabled. You also need to make sure to use an appropriate task assignment (e.g. 3 PP + 1 PME GPU).
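As a hedged sketch of the task assignment described above (the input name `topol.tpr` and the exact rank/thread counts are placeholders, and the flags assume a reasonably recent CUDA build of GROMACS, 2022 or later for the direct-communication variable), a 4-GPU run with 3 PP ranks and 1 dedicated PME rank might look like:

```shell
# Assumption: GROMACS >= 2022 built with CUDA; topol.tpr is a placeholder input file.
# 4 thread-MPI ranks mapped to GPUs 0-3: 3 PP ranks + 1 dedicated PME rank,
# with 6 OpenMP threads per rank (24 of the 64 vCPUs in use).
export GMX_ENABLE_DIRECT_GPU_COMM=1   # enable direct GPU-GPU communication

gmx mdrun -s topol.tpr \
          -ntmpi 4 -ntomp 6 \
          -nb gpu -pme gpu -npme 1 -bonded gpu -update gpu \
          -gputasks 0123
```

Offloading the update/constraints (`-update gpu`) in addition to nonbonded, PME, and bonded work keeps the CPU largely out of the inner loop, which is what makes a few cores per GPU sufficient.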
What you linked does not contain any information on what was benchmarked or how, hence it cannot be verified or assessed. Please refer to reliable and reproducible performance data.
OK, mine is a single node with 8 V100s and 64 vCPUs. Computationally, it does not achieve better performance than a CPU-only run on a 256-core AMD machine; the eight V100 cards show no advantage.
Have you tested the scalability of your simulations on your hardware (i.e. that you get a speedup going from 1 to 2 to >2 GPUs)? It is quite common that a simulation system is simply too small, and/or the CPU-GPU (or GPU-GPU) interconnect too limiting, to scale efficiently across many GPUs.
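The scaling test suggested above can be sketched as a simple loop, rerunning the same system on 1, 2, 4, and 8 GPUs and comparing the reported ns/day. This is only an illustrative sketch: `topol.tpr`, the step count, and the per-rank thread count are placeholder choices, not recommendations.

```shell
# Hypothetical scaling check: same input, increasing GPU (thread-MPI rank) count.
# -resethway restarts the performance timers halfway through, so load-balancing
# startup cost does not distort the measured ns/day.
for ngpu in 1 2 4 8; do
    gmx mdrun -s topol.tpr -deffnm scale_${ngpu}gpu \
              -ntmpi ${ngpu} -ntomp 6 \
              -nb gpu -pme gpu \
              -nsteps 20000 -resethway
done

# Compare the ns/day figures printed at the end of each log:
grep "Performance" scale_*gpu.log
</imports>
```

If ns/day stops improving (or drops) beyond 1-2 GPUs, the system size or interconnect is the bottleneck, and running several independent simulations with one GPU each will use the node far more efficiently.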