How to improve performance on multiple GPUs

GROMACS version: 2018.8
GROMACS modification: No

Dear all GROMACS users,

I’m using an HPC cluster whose nodes consist of 2x E5-2670v2 CPUs and 2x V100 32 GB cards.

When I use only one V100 card, performance is 187 ns/day.

But when using two V100 cards, performance drops to 165 ns/day.

I found in the log file that “PME wait for PP” takes a large share of the run time, about 16.5%:
Domain decomp. 2.2
DD comm. load 0.0
DD comm. bounds 0.0
Vsite constr. 4.1
Send X to PME 2.6
Neighbor search 1.3
Launch GPU ops. 24.0
Comm. coord. 4.1
Force 4.7
Wait + Comm. F 3.8
PME mesh * 8.5
PME wait for PP * 16.5
Wait + Recv. PME F 2.5
Wait PME GPU gather 1.7
Wait GPU NB nonloc. 1.0
Wait GPU NB local 0.8
NB X/F buffer ops. 3.1
Vsite spread 6.3
Write traj. 0.0
Update 1.7
Constraints 18.2
Comm. energies 0.2

I tried multiple combinations of thread-MPI ranks and OpenMP threads with the dual V100 cards, but I could not exceed the performance of a single V100 card.

My system has 18240 atoms: 3264 TIP4P/Ice water molecules, 192 THF molecules, and 896 H2 molecules with virtual sites.

I ran the simulations with this .mdp file and command line
–mdp file–
integrator = md
dt = 0.001 ; 1 fs
nsteps = 1000000 ; 1 ns
nstenergy = 10000
nstlog = 10000
nstxout-compressed = 10000
gen-vel = yes
gen-temp = 260
constraint-algorithm = lincs
constraints = none
cutoff-scheme = Verlet
coulombtype = PME
rcoulomb = 0.95
lj-pme-comb-rule = Lorentz-Berthelot
vdwtype = Cut-off
rvdw = 0.95
DispCorr = EnerPres
tcoupl = Nose-Hoover
tc-grps = System
tau-t = 0.2
ref-t = 260
nhchainlength = 1
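As a quick sanity check, the simulated time implied by dt and nsteps can be computed directly (a minimal arithmetic sketch; note that times in a GROMACS .mdp file are in ps):

```python
# dt is given in ps in a GROMACS .mdp file
dt_ps = 0.001        # 0.001 ps = 1 fs per step
nsteps = 1000000     # number of MD steps

total_ps = nsteps * dt_ps        # total simulated time in ps
total_ns = total_ps / 1000.0     # convert ps -> ns

print(f"{total_ps:g} ps = {total_ns:g} ns")  # prints "1000 ps = 1 ns"
```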

command lines:
gmx mdrun -deffnm eql -nb gpu -ntomp 5 -dlb yes -ntmpi 4 -gputasks 0011 -pme gpu -npme 1

Is there any way to maximize performance in this situation with dual V100 cards?

Any recommendations and advice would be very helpful to me.

Thanks in advance


An 18240-atom system cannot even fully saturate a single V100 GPU, so running on multiple GPUs is unlikely to give a performance benefit. There may, however, be opportunities to slightly improve performance on a single GPU; please share the full log file so we can assess whether improvements can be made.

Dear pszilard,

Thanks for reply.

Right, my system is quite small, so a calculation with dual cards would be slow, as you said.

I want to share my log file, but it is too long to post here, and I don’t know how to upload it: when I select the file, the forum says “new users cannot upload attachments.”

If you let me know how to attach my file, I will do that.

Thanks again for your advice.


Upload your file somewhere online that you can share, such as Dropbox, Google Drive, or another file-sharing service.

Dear Dr_DBW,

Thanks for reply.

As you recommended, I uploaded my log file to Google drive.

Here is the link:

Thanks in advance.



Looks fine, but your run seems very PME-bound, so I expect 1 PP rank + 1 PME rank (i.e. the second GPU dedicated to doing only PME, -ntmpi 2 -npme 1) would run faster. Also, have you tried running on a single GPU? I’m still not sure that using two GPUs instead of one will be faster with only ~14k atoms.
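A minimal sketch of such a launch, assuming the same input files as in the original command; the -ntomp value of 10 is an assumption based on the 2x 10-core E5-2670v2 CPUs and should be tuned for the actual node:

```shell
# Sketch: one PP rank on GPU 0, one dedicated PME rank on GPU 1
# (-ntomp 10 is an assumed value for 2x 10-core CPUs; adjust as needed)
gmx mdrun -deffnm eql -nb gpu -pme gpu -ntmpi 2 -npme 1 -ntomp 10 -gputasks 01
```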

Dear pszilard,

Thanks for reply.

In fact, I was using HPC nodes with a single V100 card. But the node was upgraded to dual V100 cards, and it costs 1.5 times as much as the previous single-V100 node. So I need to increase my performance to compensate for the higher charge.

To your question: yes, I was running my simulations on a single V100 card, and the performance on that node was about 180 ns/day.

When I increased the number of atoms in my system, performance degraded, so I judged that my system saturated the V100. But with dual V100 cards, performance did not increase.

Anyway, I will try as you recommended.