GROMACS version: 2024
GROMACS modification: No
Hello everyone,
I have been testing different configurations to optimize CPU/GPU usage and simulation speed in GROMACS. Below is a summary of my test runs:
| Speed (ns/day) | Command Used | CPU Usage | GPU Usage |
|---|---|---|---|
| 895.51 | `gmx mdrun -s md2.tpr -deffnm md2 -v -cpi step7_1.cpt -noappend -ntmpi 1 -ntomp 24 -gpu_id 0 -pme gpu -bonded gpu -nb gpu -pin on` | One core at 100%, others at 40-60% | 80-90% |
| 754.19 | `gmx mdrun -s md2.tpr -deffnm md2 -v -cpi step7_1.cpt -noappend` | All cores fully utilized | 80-90% |
| 439.29 | `gmx mdrun -s md2.tpr -deffnm md2 -v -cpi step7_1.cpt -noappend -ntmpi 2 -ntomp 12 -gpu_id 0 -pme gpu -npme 1 -bonded gpu -nb gpu -pin on` | Half of cores at full usage, rest idle | 80-90% |
| 419.79 | `gmx mdrun -s md2.tpr -deffnm md2 -v -cpi step7_1.cpt -noappend -ntmpi 8 -ntomp 6 -gpu_id 0 -pme gpu -npme 1 -bonded gpu -nb gpu -pin on` | CPU usage 30-60% | 80-90% |
| 389.95 | `gmx mdrun -s md2.tpr -deffnm md2 -v -cpi step7_1.cpt -noappend -ntmpi 8 -ntomp 3 -gpu_id 0 -pme gpu -npme 1 -bonded gpu -nb gpu -update gpu -pin on` | All cores used, but two at 20-30% | 80-90% |
| 201.18 | `gmx mdrun -s md2.tpr -deffnm md2 -v -cpi step7_1.cpt -noappend -ntmpi 8 -ntomp 6 -gpu_id 0 -pme cpu -npme 3 -bonded gpu -nb gpu -pin on` | CPU cores at 50-70% | Below 30% |
- The fastest run (895.51 ns/day) had one CPU core at 100% while others were at 40-60%. Does this imbalance affect CPU longevity?
- Most configurations had GPU usage at 80-90%, except the slowest one (PME on CPU). Should I adjust the PP/PME load distribution?
- Some setups had underutilized CPU cores (20-30%), while others had full CPU usage. What is the best way to balance MPI ranks (`-ntmpi`) against OpenMP threads (`-ntomp`)?
- The fastest run used `-ntmpi 1 -ntomp 24`, but would it be better to distribute work across more MPI ranks?
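In case it helps to reproduce my comparison, this is roughly how I sweep the `-ntmpi`/`-ntomp` combinations and collect the throughput numbers. It is only a sketch: the `bench_*` file names are my own convention, and the `-nsteps`/`-resetstep` values are arbitrary short-run settings (`-resetstep` restarts the performance counters mid-run so startup and load-balancing costs don't skew the ns/day figure).

```shell
#!/bin/sh
# Sweep MPI-rank / OpenMP-thread splits on a 24-core node and report ns/day.
# Assumes md2.tpr is present; output names (bench_*) are hypothetical.
for ntomp in 6 12 24; do
  ntmpi=$((24 / ntomp))
  gmx mdrun -s md2.tpr -deffnm "bench_${ntmpi}x${ntomp}" \
      -ntmpi "${ntmpi}" -ntomp "${ntomp}" \
      -nb gpu -pme gpu -bonded gpu -pin on \
      -nsteps 20000 -resetstep 10000
  # The md log ends with a line like:  Performance:   895.511   0.027
  # Field 2 of that line is the throughput in ns/day.
  awk '/^Performance:/ {print FILENAME ": " $2 " ns/day"}' \
      "bench_${ntmpi}x${ntomp}.log"
done
```

The short `-nsteps` runs are not identical to production throughput, but they rank the configurations the same way for me.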
I'm looking for insights on how to fine-tune CPU/GPU settings for the best balance between speed and efficiency. Any suggestions would be appreciated!