GROMACS version: 2021.4
GROMACS modification: No
Dear All,
I’m trying to set up an MD simulation of a fairly large membrane protein (~300k atoms in the system) and would be grateful if somebody could provide suggestions on how to improve the performance (if at all possible).
I have a node with 8 x RTX 2080 Ti GPUs, 64 cores (HT; 2.6 GHz Xeon Gold 6142), and 384 GB of RAM.
If I run this command:
gmx mdrun -v -deffnm ${istep} -ntmpi 8 -nb gpu -pme gpu -npme 1
I get the following report:
Dynamic load balancing report:
DLB was turned on during the run due to measured imbalance.
Average load imbalance: 11.6%.
The balanceable part of the MD step is 66%, load imbalance is computed from this.
Part of the total run time spent waiting due to load imbalance: 7.6%.
Steps where the load balancing was limited by -rdd, -rcon and/or -dds: Z 0 %
Average PME mesh/force load: 1.028
Part of the total run time spent waiting due to PP/PME imbalance: 1.3 %

NOTE: 7.6 % of the available CPU time was lost due to load imbalance
in the domain decomposition.
You can consider manually changing the decomposition (option -dd);
e.g. by using fewer domains along the box dimension in which there is
considerable inhomogeneity in the simulated system.

               Core t (s)   Wall t (s)        (%)
       Time:    91964.923     1436.961     6400.0
                 (ns/day)    (hour/ns)
Performance:       60.127        0.399
The tasks were split in the following way:
8 GPUs selected for this run.
Mapping of GPU IDs to the 8 GPU tasks in the 8 ranks on this node:
PP:0,PP:1,PP:2,PP:3,PP:4,PP:5,PP:6,PME:7
PP tasks will do (non-perturbed) short-ranged interactions on the GPU
PP task will update and constrain coordinates on the CPU
PME tasks will do all aspects on the GPU
Using 8 MPI threads
Using 8 OpenMP threads per tMPI thread
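For reference, this is the kind of variant I was thinking of trying next, based on the GROMACS 2021 documentation for offloading bonded interactions to the GPUs as well. Whether -bonded gpu (or -update gpu, which I understand has restrictions when running with multiple ranks) actually helps on this system is exactly what I am unsure about:

gmx mdrun -v -deffnm ${istep} -ntmpi 8 -ntomp 8 -nb gpu -pme gpu -npme 1 -bonded gpu

I could also try setting the domain decomposition grid manually with -dd, as the note in the log suggests, but I have not experimented with that yet.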
Is this the most I can squeeze out of this hardware? Thanks in advance for any suggestions.
With best wishes,
Andrija