GROMACS version: 2018.8
GROMACS modification: No
Dear all GROMACS users,
I’m running on an HPC cluster node with 2x E5-2670v2 CPUs and 2x V100 32 GB GPUs.
When I use only one V100 card, the performance is 187 ns/day.
But when I use two V100 cards, the performance drops to 165 ns/day.
In the log file I found that "PME wait for PP" takes a large share of the run time, about 16.5%:
Computing activity        % of run time
Domain decomp.                      2.2
DD comm. load                       0.0
DD comm. bounds                     0.0
Vsite constr.                       4.1
Send X to PME                       2.6
Neighbor search                     1.3
Launch GPU ops.                    24.0
Comm. coord.                        4.1
Force                               4.7
Wait + Comm. F                      3.8
PME mesh *                          8.5
PME wait for PP *                  16.5
Wait + Recv. PME F                  2.5
Wait PME GPU gather                 1.7
Wait GPU NB nonloc.                 1.0
Wait GPU NB local                   0.8
NB X/F buffer ops.                  3.1
Vsite spread                        6.3
Write traj.                         0.0
Update                              1.7
Constraints                        18.2
Comm. energies                      0.2
I tried several combinations of thread-MPI ranks and OpenMP threads with the dual V100 cards, but I could not exceed the performance of a single V100 card.
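For example, I tried rank/thread splits along these lines (the exact values below are illustrative, not a complete list of what I ran):

# 2 thread-MPI ranks: 1 PP rank on GPU 0, 1 PME rank on GPU 1, 10 OpenMP threads each
gmx mdrun -deffnm eql -nb gpu -pme gpu -ntmpi 2 -ntomp 10 -npme 1 -gputasks 01
# 4 thread-MPI ranks: 3 PP ranks sharing GPU 0, dedicated PME rank alone on GPU 1
gmx mdrun -deffnm eql -nb gpu -pme gpu -ntmpi 4 -ntomp 5 -npme 1 -gputasks 0001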
My system has 18,240 atoms: 3264 TIP4P/Ice water molecules, 192 THF molecules, and 896 H2 molecules with virtual sites.
I ran the simulations with the following .mdp file and command line:
–mdp file–
integrator = md
dt = 0.001 ; 1 fs
nsteps = 1000000 ; 1 ns
nstenergy = 10000
nstlog = 10000
nstxout-compressed = 10000
gen-vel = yes
gen-temp = 260
constraint-algorithm = lincs
constraints = none
cutoff-scheme = Verlet
coulombtype = PME
rcoulomb = 0.95
lj-pme-comb-rule = Lorentz-Berthelot
vdwtype = Cut-off
rvdw = 0.95
DispCorr = EnerPres
tcoupl = Nose-Hoover
tc-grps = System
tau-t = 0.2
ref-t = 260
nhchainlength = 1
command line:
gmx mdrun -deffnm eql -nb gpu -ntomp 5 -dlb yes -ntmpi 4 -gputasks 0011 -pme gpu -npme 1
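For comparison, the single-GPU run that reaches 187 ns/day was launched along these lines (flags illustrative; both the short-range nonbonded and PME tasks on GPU 0):

gmx mdrun -deffnm eql -nb gpu -pme gpu -ntmpi 1 -ntomp 20 -gputasks 00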
Is there any way to maximize the performance in this situation with dual V100 cards?
Any recommendations and advice would be very helpful to me.
Thanks in advance,
DW