Odd performance scaling on AWS

GROMACS version: 2020.2
GROMACS modification: No
MPI Library: OpenMPI 4.0.5
Job input: ion_channel.tpr from Unified European Applications Benchmark Suite

I’ve been running some performance tests on the AWS public cloud using EC2 instances (c5.4xlarge) in a private VPC and a cluster placement group. The scaling is really odd: two nodes perform just 20-30% better than a single node, with similarly poor gains at 4 and 8 nodes. I see the same trend with a smaller job input.

I have also tried larger instances with better networking (i.e. Elastic Fabric Adapter) and saw the same behaviour.

Has anybody else noticed the same or is this an inherent feature of running HPC in the cloud?

Thanks!

Hard to say without specific data, but in general:

  1. I wouldn’t even try scaling to many nodes without elastic fabric adapters.

  2. Post scaling data for e.g. 1, 2, and 4 nodes for one system, so we can compare it to performance on traditional clusters.

  3. For very small systems you will run into scaling limitations, but the ion channel benchmark should scale reasonably well.
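
To make the scaling data in point 2 easy to compare, it may help to report speedup and parallel efficiency alongside the raw ns/day figures (the Performance line at the end of md.log). Below is a minimal sketch in Python; the function name `scaling_table` and the numbers in `example` are placeholders made up for illustration, not measurements from this thread. For reference, a 25% gain on two nodes corresponds to a speedup of 1.25 and a parallel efficiency of about 62%.

```python
def scaling_table(ns_per_day):
    """Compute speedup and parallel efficiency per node count.

    ns_per_day: dict mapping node count -> measured performance in ns/day.
    The smallest node count is used as the baseline.
    Returns a list of (nodes, ns_per_day, speedup, efficiency) tuples.
    """
    base_nodes = min(ns_per_day)
    base_perf = ns_per_day[base_nodes]
    rows = []
    for nodes in sorted(ns_per_day):
        perf = ns_per_day[nodes]
        speedup = perf / base_perf                  # relative to the baseline run
        efficiency = speedup / (nodes / base_nodes)  # 1.0 would be ideal scaling
        rows.append((nodes, perf, speedup, efficiency))
    return rows


if __name__ == "__main__":
    # Placeholder values only: replace with your own ns/day per node count.
    example = {1: 10.0, 2: 12.5, 4: 15.0}
    print(f"{'nodes':>5} {'ns/day':>8} {'speedup':>8} {'eff.':>6}")
    for nodes, perf, speedup, eff in scaling_table(example):
        print(f"{nodes:>5} {perf:>8.2f} {speedup:>8.2f} {eff:>6.0%}")
```

Posting a table like this for 1, 2, and 4 nodes, together with the ranks and threads used per node, makes it straightforward to set the numbers side by side with results from a traditional cluster.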