Simulations with free energy calculations on GPU are several times slower than normal simulations

GROMACS version: 2021
GROMACS modification: No

I am using GROMACS to compute the hydration free energy of a single capped residue in a 4x4x4 nm^3 simulation box. On 8 CPU cores alone, the free energy simulation runs at about 60 ns/day, not far from the ~110 ns/day of a normal simulation without free energy calculations. On a GPU, however, the normal simulation reaches ~1100 ns/day while the free energy simulation reaches only ~160 ns/day, a severalfold difference. I also found that GPU usage is only 20% when running simulations with free energy calculations. Is this normal, or is something wrong with my setup?
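To put numbers on the gap, here is a minimal sketch that works out the slowdown factors from the throughput figures quoted above. The variable and function names are illustrative only; the four ns/day values are the ones reported in this post.

```python
# Throughput figures (ns/day) quoted in this post.
cpu_plain, cpu_fe = 110.0, 60.0     # 8 CPU cores: normal vs. free energy run
gpu_plain, gpu_fe = 1100.0, 160.0   # GPU: normal vs. free energy run


def slowdown(plain, fe):
    """Return how many times slower the free energy run is than the normal run."""
    return plain / fe


cpu_slowdown = slowdown(cpu_plain, cpu_fe)
gpu_slowdown = slowdown(gpu_plain, gpu_fe)
# How much the free energy run itself gains from moving to the GPU.
gpu_gain_fe = gpu_fe / cpu_fe

print(f"CPU: free energy run is {cpu_slowdown:.2f}x slower than normal")
print(f"GPU: free energy run is {gpu_slowdown:.2f}x slower than normal")
print(f"Free energy run: GPU is {gpu_gain_fe:.2f}x faster than 8 CPU cores")
```

So the free energy penalty is under 2x on CPU but close to 7x on GPU, even though the free energy run is still faster on the GPU than on 8 cores.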

The following is the mdp file:
integrator = sd
dt = 0.002
nsteps = 250000

pbc = xyz

nstlist = 100
rlist = 1.2
ns_type = grid

coulombtype = PME
pme_order = 4
rcoulomb = 1.2
fourierspacing = 0.16
ewald_rtol = 1e-5
DispCorr = EnerPres

vdwtype = cut-off
vdw-modifier = potential-switch
rvdw_switch = 1.0
rvdw = 1.1

tc-grps = Protein Non-Protein
tau_t = 0.1 0.1
ref_t = 298 298

; Pressure coupling is on
pcoupl = Parrinello-Rahman
pcoupltype = isotropic
tau_p = 2.0
ref_p = 1.0
compressibility = 4.5e-5

continuation = no
gen_vel = yes
gen_temp = 298
gen_seed = -1

; For the GPU version, h-bonds is faster than all-bonds; see the GROMACS 2021 manual.
constraints = h-bonds
constraint_algorithm = lincs
lincs_iter = 1
lincs_order = 4

nstcomm = 100
comm-mode = Linear
comm-grps = Protein Non-Protein

nstxout = 1000000
nstvout = 1000000
nstfout = 1000000
compressed_x_grps = System
nstxout-compressed = 10000

nstlog = 1000000
nstenergy = 1000000

; Free energy calculation
free_energy = yes
init_lambda_state = 0
delta_lambda = 0
calc_lambda_neighbors = 1 ; only immediate neighboring windows
couple-moltype = Protein_chain_A ; name of moleculetype to decouple
couple-lambda0 = none ;
couple-lambda1 = vdw-q ;
couple-intramol = no
; Vectors of lambda specified here
; Each combination is an index that is retrieved from init_lambda_state for each simulation
; init_lambda_state 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
vdw_lambdas = 0.00 0.10 0.20 0.30 0.40 0.50 0.60 0.70 0.80 0.90 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00
coul_lambdas = 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.10 0.20 0.30 0.40 0.50 0.60 0.70 0.80 0.90 1.00
; Options for the decoupling
sc-alpha = 0.5
sc-coul = no
sc-power = 1
sc-sigma = 0.3
nstdhdl = 100
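For anyone reusing this setup, the two lambda vectors above follow a simple pattern: van der Waals interactions are switched on over the first ten windows, then charges over the next ten. The following is a hedged sketch of how such a 21-state schedule could be generated; `build_lambda_schedule` and `mdp_line` are hypothetical helper names, not part of GROMACS.

```python
def build_lambda_schedule(n_steps=10):
    """Return (vdw, coul) lambda vectors: vdW ramps up first, then Coulomb."""
    # Evenly spaced ramp 0.00, 0.10, ..., 1.00 (n_steps + 1 values).
    ramp = [round(i / n_steps, 2) for i in range(n_steps + 1)]
    vdw = ramp + [1.0] * n_steps    # ramp up, then hold at 1
    coul = [0.0] * n_steps + ramp   # hold at 0, then ramp up
    return vdw, coul


def mdp_line(key, values):
    """Format one lambda vector as an mdp-style 'key = v0 v1 ...' line."""
    return f"{key} = " + " ".join(f"{v:.2f}" for v in values)


vdw, coul = build_lambda_schedule()
print(mdp_line("vdw_lambdas", vdw))
print(mdp_line("coul_lambdas", coul))
```

With the default `n_steps=10` this reproduces the 21 states listed above, so `init_lambda_state` can range from 0 to 20.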

Thanks a lot!

This is typical. Free energy calculations on GPU will be much faster in the upcoming 2022 release.

Thanks for the reply, Justin. BTW, I have benefited a lot from your GROMACS tutorial and really appreciate it. I’m looking forward to the new release!