Performance on Fujitsu A64FX

GROMACS version: 2021.1
GROMACS modification: No
Dear All,
I am running a protein-water simulation on 500 cores (10 nodes x 50 cores each). The performance I am able to achieve is 120 ns/day. When I increase the node count to 20, i.e. 1000 cores, I do not get any improvement in performance.
The administrator has informed me that GROMACS performance on A64FX processors is not up to par with Intel processors.

Can someone comment on this and explain why I am not able to get better performance per core?

I have tried using the Arm Performance Libraries and got a 2x speedup.
Also, enabling ARM_SVE SIMD support (the GMX_SIMD=ARM_SVE CMake option) could increase your performance.
Can you share your build steps and compiler details so that I can recommend some flags?

There are two types of cost

  • Computation cost
  • Communication cost

With increasing core counts, communication cost dominates. When you use more cores, each core has less work (computation) to do, but more communication is needed. As a result, performance does not scale linearly; it flattens out beyond a certain core count. I do not think this behavior has anything to do with Fujitsu’s processor.
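
To make the trade-off concrete, here is a minimal sketch of a toy cost model in Python. It is not how GROMACS measures or predicts anything: the function names and the numbers for `compute_work` and `comm_cost` are hypothetical, picked only for illustration, and treating communication as a fixed per-step cost is a simplification. It only shows why doubling the cores gives much less than a 2x gain once the communication term is comparable to the per-core computation.

```python
def step_time(cores, compute_work=1.0, comm_cost=0.002):
    """Toy time per MD step (arbitrary units).

    compute_work is divided evenly across cores; comm_cost is assumed
    to be a fixed per-step cost that does not shrink with more cores.
    Both default values are hypothetical, not measured.
    """
    return compute_work / cores + comm_cost


def speedup(cores, base_cores, **model):
    """Speedup of running on `cores` relative to `base_cores`."""
    return step_time(base_cores, **model) / step_time(cores, **model)


def parallel_efficiency(cores, base_cores, **model):
    """Fraction of the ideal (linear) speedup that is actually achieved."""
    return speedup(cores, base_cores, **model) * base_cores / cores


if __name__ == "__main__":
    base = 125
    for cores in (125, 250, 500, 1000, 2000):
        print(
            f"{cores:5d} cores: "
            f"speedup {speedup(cores, base):5.2f}, "
            f"efficiency {parallel_efficiency(cores, base):5.2f}"
        )
```

In this toy model the speedup can never exceed `step_time(base) / comm_cost`, no matter how many cores are added. Where the plateau sits for a real run depends on the system size and the interconnect, which is why the log file is needed to say anything specific.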

If you share your log file, it may be possible to recommend changes that improve your performance.