COM pull force and AWH

GROMACS version: 2020
GROMACS modification: Yes/No

Dear all ,

I am trying to accelerate sampling with the AWH method. I see a performance loss that comes mainly from the COM pull force (23.8% of the run time):

 Computing:            Num   Num      Call    Wall time   Giga-Cycles
                       Ranks Threads  Count      (s)       total sum    %

 Domain decomp.           7     5        34       0.675        58.952   1.7
 DD comm. load            7     5        34       0.001         0.112   0.0
 Send X to PME            7     5     10001       3.817       333.193   9.4
 Neighbor search          7     5        34       0.494        43.145   1.2
 Launch GPU ops.          7     5     20002       0.750        65.486   1.8
 Comm. coord.             7     5      9967       3.521       307.356   8.7
 Force                    7     5     10001       5.093       444.554  12.6
 Wait + Comm. F           7     5     10001       2.768       241.606   6.8
 PME mesh *               1     5     10001      18.072       225.363   6.4
 PME wait for PP *                               17.433       217.396   6.1
 Wait + Recv. PME F       7     5     10001       1.426       124.476   3.5
 Wait PME GPU gather      7     5     10001       2.500       218.248   6.2
 Wait GPU NB nonloc.      7     5     10001       0.048         4.222   0.1
 Wait GPU NB local        7     5     10001       0.037         3.218   0.1
 NB X/F buffer ops.       7     5     39936       2.245       195.950   5.5
 COM pull force           7     5     10001       9.666       843.764  23.8
 AWH                      7     5     10001       0.093         8.085   0.2
 Write traj.              7     5         1       0.158        13.766   0.4
 Update                   7     5     10001       0.984        85.913   2.4
 Constraints              7     5     10001       2.016       175.968   5.0
 Comm. energies           7     5      1001       1.103        96.303   2.7

 Total                                           35.505      3542.055 100.0

(*) Note that with separate PME ranks, the walltime column actually sums to
twice the total reported, but the cycle count total and % are correct.

                Core t (s)   Wall t (s)        (%)
        Time:     1420.104       35.505     3999.8
                  (ns/day)    (hour/ns)
 Performance:       48.675        0.493

Any suggestions or advice would be very much appreciated!

Thank you so much

Amnah

This high percentage for the COM pull force does not necessarily come from the COM pulling itself; it could also come from load imbalance in the force calculation before it.

But why are you running 7 MPI ranks with 5 threads each? That seems like a very sub-optimal setup. What hardware are you running on?

Thank you so much for your reply!
What parameters do I need to change in the mdp file to reduce the imbalance? Could you please help me with that?
awh.mdp (7.4 KB)

Regarding the hardware, this is the hardware description. The system I am simulating has 700K atoms; I thought 54-55 ns/day using 1 node (8 V100 GPUs) was optimal.

I really appreciate your help
Thank you so much!

Amnah

I don’t understand how you got to using 7 ranks and 5 threads. Did you use mpirun? If so, with how many ranks and on how many nodes? Did you specify the -nt, -ntmpi and/or -ntomp option?

Dear Berk,

I used 1 node (8 GPUs).
This is the command I am using:
mpirun -np $SLURM_NPROCS gmx_mpi mdrun -deffnm 2 -s AWH.tpr -v -nb gpu -pme gpu -npme 1 -nstlist 300

So $SLURM_NPROCS is 7 then, I suppose?
You would like to use 8 ranks with 8 GPUs, I would think. That should give you much better performance. Why does $SLURM_NPROCS get set to 7?

Hi Berk,

I have no idea why $SLURM_NPROCS gets set to 7. I just tried the following command (-np 8), but it changed to 7 according to the log file.

mpirun -np 8 gmx_mpi mdrun -deffnm 2 -s AWH.tpr -v -nb gpu -pme gpu -npme 1 -nstlist 200

I misunderstood what is going on. You ask for -npme 1, so you get 7 PP ranks and 1 PME rank.

But I expect that the performance is fully limited by the fact that only 1 GPU is used for PME. My guess would be that you get better performance using half of the node. Then you can run two runs on one node and get more than double the performance.
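For illustration only (the run names, tpr files, -ntomp value, and GPU id strings below are placeholders to adapt to your node, and you may also need -pinoffset so the two runs do not share cores), two independent runs on one node, each restricted to four GPUs, could be launched roughly like this:

mpirun -np 4 gmx_mpi mdrun -deffnm runA -s AWH_A.tpr -nb gpu -pme gpu -npme 1 -ntomp 5 -gpu_id 0123 &
mpirun -np 4 gmx_mpi mdrun -deffnm runB -s AWH_B.tpr -nb gpu -pme gpu -npme 1 -ntomp 5 -gpu_id 4567 &
wait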

Another option is to run PME on the CPU. But even then there are not many systems that scale to 8 V100 GPUs. You should try 4, 2 and 1 GPUs per simulation.
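As a rough sketch of that comparison (the rank counts are assumptions; match the thread counts to your actual cores):

mpirun -np 8 gmx_mpi mdrun -deffnm 2 -s AWH.tpr -v -nb gpu -pme cpu
mpirun -np 1 gmx_mpi mdrun -deffnm 2 -s AWH.tpr -v -nb gpu -pme gpu -gpu_id 0

The first line keeps all 8 GPUs for the short-range work and moves PME to the CPU cores; the second is a single-GPU baseline, so you can see how ns/day per GPU changes as you add GPUs.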