Segmentation fault on NVT, free energy calculation

GROMACS version: 2020.3
GROMACS modification: No

I have a free energy calculation setup of a small ligand in water. The energy minimization went fine and the system is stable, but I cannot go any further because mdrun crashes with a segmentation fault:
gmx_gpu mdrun -nb gpu -ntomp 4 -ntmpi 1 -stepout 1000 -s NVT.tpr -deffnm NVT

My nvt.0.mdp file:

define = -DPOSRES
integrator = sd ; stochastic leap-frog integrator
nsteps = 5000 ; 5,000 steps * 2 fs = 10 ps
dt = 0.002 ; 2 fs
comm-mode = Linear ; remove center of mass translation
nstcomm = 100 ; frequency for center of mass motion removal

nstxout = 0 ; don’t save coordinates to .trr
nstvout = 0 ; don’t save velocities to .trr
nstfout = 0 ; don’t save forces to .trr
nstxout-compressed = 5000 ; xtc compressed trajectory output every 5000 steps
compressed-x-precision = 1000 ; precision with which to write to the compressed trajectory file
nstlog = 5000 ; update log file every 10 ps
nstenergy = 5000 ; save energies every 10 ps
nstcalcenergy = 100 ; calculate energies every 100 steps

constraint_algorithm = lincs ; holonomic constraints
constraints = all-bonds ; all bonds are constrained (not just those involving hydrogen)
lincs_iter = 1 ; accuracy of LINCS (1 is default)
lincs_order = 4 ; also related to accuracy (4 is default)
lincs-warnangle = 30 ; maximum angle that a bond can rotate before LINCS will complain (30 is default)
continuation = no ; formerly known as ‘unconstrained-start’ - useful for exact continuations and reruns

cutoff-scheme = Verlet
ns-type = grid ; search neighboring grid cells
nstlist = 10 ; 20 fs (default is 10)
rlist = 1.0 ; short-range neighborlist cutoff (in nm)
pbc = xyz ; 3D PBC

coulombtype = PME ; Particle Mesh Ewald for long-range electrostatics
rcoulomb = 1.0 ; short-range electrostatic cutoff (in nm)
ewald_geometry = 3d ; Ewald sum is performed in all three dimensions
pme-order = 6 ; interpolation order for PME (default is 4)
fourierspacing = 0.10 ; grid spacing for FFT
ewald-rtol = 1e-6 ; relative strength of the Ewald-shifted direct potential at rcoulomb

vdw-type = PME
rvdw = 1.0
vdw-modifier = Potential-Shift
ewald-rtol-lj = 1e-3
lj-pme-comb-rule = Geometric
DispCorr = EnerPres

tc_grps = System
tau_t = 1.0
ref_t = 300
pcoupl = no

gen_vel = yes ; Velocity generation is on (if gen_vel is ‘yes’, continuation should be ‘no’)
gen_seed = -1 ; Use random seed
gen_temp = 300

free-energy = yes
couple-moltype = ligand
couple-lambda0 = vdw-q
couple-lambda1 = none
couple-intramol = no
separate-dhdl-file = yes
sc-alpha = 0.5
sc-power = 1
sc-sigma = 0.3
init-lambda-state = 0
coul-lambdas = 0.00 0.25 0.50 0.75 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00
vdw-lambdas  = 0.00 0.00 0.00 0.00 0.00 0.05 0.10 0.20 0.30 0.40 0.50 0.60 0.65 0.70 0.75 0.80 0.85 0.90 0.95 1.00
nstdhdl = 100
calc-lambda-neighbors = -1

I can start this NVT run without the GPU, but the performance is very slow (~10-30 ns/day).
What could be the problem with the GPU setup in this case?

Thank you.

P.S. Here is the output of my grompp run before NVT:
gmx grompp -f …/…/MDP/NVT/nvt.1.mdp -c …/ENMIN/ENMIN.gro -r …/ENMIN/ENMIN.gro -p …/…/ligand.top -o NVT.tpr
Ignoring obsolete mdp entry ‘ns-type’
Setting the LD random seed to -224055080
Generated 2775 of the 2775 non-bonded parameter combinations
Generating 1-4 interactions: fudge = 0.5
Generated 2775 of the 2775 1-4 parameter combinations
Excluding 3 bonded neighbours molecule type ‘ligand’
Excluding 2 bonded neighbours molecule type ‘SOL’
Coupling 1 copies of molecule type ‘ligand’
NOTE 1 [file unknown]:
You are using constraints on all bonds, whereas the forcefield has been
parametrized only with constraints involving hydrogen atoms. We suggest
using constraints = h-bonds instead, this will also improve performance.
Setting gen_seed to -1886237300
Velocities were taken from a Maxwell distribution at 300 K
Number of degrees of freedom in T-Coupling group System is 4203.00
NOTE 2 [file …/…/MDP/NVT/nvt.0.mdp]:
Removing center of mass motion in the presence of position restraints
might cause artifacts. When you are using position restraints to
equilibrate a macro-molecule, the artifacts are usually negligible.
NOTE 3 [file …/…/MDP/NVT/nvt.0.mdp]:
You are using geometric combination rules in LJ-PME, but your non-bonded
C6 parameters do not follow these rules. This will introduce very small
errors in the forces and energies in your simulations. Dispersion
correction will correct total energy and/or pressure for isotropic
systems, but not forces or surface tensions.
Estimate for the relative computational load of the PME mesh part: 0.80
NOTE 4 [file …/…/MDP/NVT/nvt.0.mdp]:
The optimal PME mesh load for parallel simulations is below 0.5
and for highly parallel simulations between 0.25 and 0.33,
for higher performance, increase the cut-off and the PME grid spacing.
NOTE 5 [file …/…/MDP/NVT/nvt.0.mdp]:
For free energy simulations, the optimal load limit increases from 0.5 to
0.667

Version 2020.4 fixed an issue with LJ-PME and dispersion correction on GPUs. Maybe you are being hit by the same bug. I would suggest updating to 2020.6 or 2021.2.
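
If you are unsure exactly which build you are running, you can check it first (using your gmx_gpu binary name):

gmx_gpu --version

This prints the GROMACS version, precision, and the GPU/CUDA support it was built with.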

And note that grompp warns you that you are constraining all bonds, while the force field was parametrized with constraints only on bonds involving hydrogen. Changing to h-bonds is more correct and will improve performance.
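
For example, the change in your .mdp would be:

constraints = h-bonds ; constrain only bonds involving hydrogen, matching the force-field parametrization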

Hess, thanks a lot for the explanation. Version 2021.2 with CUDA 11.2 works for me on the GPU. However, my FEP calculation is 10-20 times slower than a plain MD simulation on a GPU (e.g. a simple protein in water). Is this inherent to FEP, or should I look through my .mdp parameters?

Perturbed non-bonded interactions cannot be computed on the GPU, only on the CPU. So if you have a fast GPU and a weak CPU, runs with free energy enabled can be significantly slower. You should be able to see in the timings table at the end of the log file that a large fraction of the CPU time is spent in the free-energy non-bonded kernel.
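
To check, look at the cycle and time accounting table that mdrun prints at the end of the .log file; the exact row names vary between versions, but something like this will show it (assuming your log is named NVT.log):

grep -A 30 "R E A L   C Y C L E" NVT.log

If the CPU force rows dominate the wall time there, that confirms the CPU-side free-energy kernel is the bottleneck.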