Segmentation fault in MARTINI 2.2 with pulling

GROMACS version: 2024.2
GROMACS modification: No

Dear community,

I am using the mdp file below to pull the C-terminus of the peptide into the water.
The initial configuration consists of two identical peptides positioned on the membrane from both sides (one on the upper leaflet and one on the lower leaflet).
I then start the simulation with these settings so that the C-terminus gradually extends into the water (which is needed for the subsequent window selection in umbrella sampling).
Unfortunately, under these settings the simulation crashes with a segmentation fault after about 40 million steps. Each time I restart, the number of steps before the crash varies: it can fail after 70 million steps, 50 million, and so on.
Please advise on what might be causing this issue. Thank you.

integrator = md
dt = 0.02
nsteps = 120000000
nstcomm = 100
comm-grps = Protein_lipids W_Ions

nstxout = 0
nstvout = 0
nstfout = 0
nstlog = 5000
nstenergy = 1000
nstxout-compressed = 10000
compressed-x-precision = 100
compressed-x-grps =
;energygrps = Protein_lipids W_Ions

cutoff-scheme = Verlet
nstlist = 20
ns_type = grid
pbc = xyz
verlet-buffer-tolerance = 0.005

coulombtype = reaction-field
rcoulomb = 1.1
epsilon_r = 15
epsilon_rf = 0
vdw_type = cutoff
vdw-modifier = Potential-shift-verlet
rvdw = 1.1

tcoupl = v-rescale
tc-grps = Protein_lipids W_Ions
tau_t = 1.0 1.0
ref_t = 310 310

Pcoupl = parrinello-rahman
Pcoupltype = semiisotropic
tau_p = 12.0
compressibility = 3e-4 3e-4
ref_p = 1.0 1.0
refcoord_scaling = com

gen_vel = no
gen_temp = 310
gen_seed = -1

constraints = none
constraint_algorithm = Lincs

; umbrella sampling
pull = yes
pull-ngroups = 3
pull-ncoords = 2
pull-group1-name = Membrane
pull-group2-name = first_threeC
pull-group3-name = second_threeC
pull-coord1-type = umbrella
pull-coord1-geometry = cylinder
pull-coord1-groups = 1 2
pull-coord1-dim = N N Y
pull-coord2-type = umbrella
pull-coord2-geometry = cylinder
pull-coord2-groups = 1 3
pull-coord2-dim = N N Y
pull-coord1-init = 1.4
pull-coord2-init = 1.4
pull-coord1-k = 1000
pull-coord2-k = 1000
pull-cylinder-r = 2.0
pull-nstxout = 100
pull-nstfout = 100
pull-coord1-vec = 0 0 1
pull-coord2-vec = 0 0 -1
pull-group1-pbcatom = 1764
pull-group2-pbcatom = 1764 ;
pull-group3-pbcatom = 1764
pull-pbc-ref-prev-step-com = yes
pull-coord1-rate = 0.00000275 ;
pull-coord2-rate = 0.00000275 ;
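(For scale: with these settings the total distance pulled over the whole run is rate × dt × nsteps = 2.75e-6 nm/ps × 0.02 ps/step × 1.2e8 steps ≈ 6.6 nm along z for each coordinate, assuming the pull rate is in the usual nm/ps.)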

The pbc atoms are wrong. Unless you have a very special case, the pbc atom should be part of the pull group. I don’t know if this would resolve your issue though.
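If I remember the mdp options correctly, setting pull-group-pbcatom to 0 makes grompp use the middle atom (number-wise) of the group, so a minimal sketch could look like this (the membrane index is just the one from your file):

pull-group1-pbcatom = 1764 ; an atom inside the Membrane group
pull-group2-pbcatom = 0    ; 0 = use the middle atom of first_threeC
pull-group3-pbcatom = 0    ; 0 = use the middle atom of second_threeC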

And why are you using pull-pbc-ref-prev-step-com = yes? That should not be necessary.

My PBC atom is a lipid atom located near the center of the membrane. It is part of the pull group Membrane.

I use the option pull-pbc-ref-prev-step-com = yes just in case. I didn’t think it could cause any issues.

Oh, it seems I now understand what is going on… so I need to set a pbc atom for every pull group…

Thanks a lot!

Is it better to remove pull-pbc-ref-prev-step-com = yes?

I would not use pull-pbc-ref-prev-step-com = yes unless you are sure it is necessary (which I think it is not here).

Thank you so much for your help! The issue is resolved, and the system is now completing its calculations fully and without any issues.

Good that it works now. But from our point of view the issue is not resolved: GROMACS should never give a segmentation fault. Unfortunately, a crash after 40 million steps, and at different points when restarting, does not sound like something we can use for debugging.

I can provide you with any information, but I do not know how to obtain it. The md.log file does not contain any useful information, and the terminal only outputs a segmentation fault error after an indefinite number of simulation steps (the number of steps varies each time). GROMACS does not provide any additional information. There may be a debugging mode that can reveal more details…

Did you get a core file?
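If mdrun did not leave one behind, core dumps are often disabled by default; in most shells they can be enabled before starting the run with, for example:

ulimit -c unlimited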

Hope this is it

Ah, nice! I likely can’t do anything with this core file.

Could you run:
gdb [executable-file] [core-file]

I don’t know if you get a stack trace directly. You might need to type “where” and “Enter”. Please post the stack trace.
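With placeholder file names, a minimal session would look roughly like:

gdb gmx core
(gdb) bt

(bt prints the same backtrace as where.)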

GNU gdb (Fedora Linux) 14.2-1.fc39
Copyright (C) 2023 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type “show copying” and “show warranty” for details.
This GDB was configured as “x86_64-redhat-linux-gnu”.
Type “show configuration” for configuration details.
For bug reporting instructions, please see:
https://www.gnu.org/software/gdb/bugs/.
Find the GDB manual and other documentation resources online at:
http://www.gnu.org/software/gdb/documentation/.

For help, type “help”.
Type “apropos word” to search for commands related to “word”…
Reading symbols from gmx…

This GDB supports auto-downloading debuginfo from the following URLs:
https://debuginfod.fedoraproject.org/
Enable debuginfod for this session? (y or [n]) n
Debuginfod has been disabled.
To make this setting permanent, add ‘set debuginfod enabled off’ to .gdbinit.
(No debugging symbols found in gmx)

warning: Can’t open file /dev/zero (deleted) during file-backed mapping note processing
[New LWP 262603]
[New LWP 262615]
[New LWP 262617]
[New LWP 262616]
[New LWP 262618]
[New LWP 262619]
[New LWP 262621]
[New LWP 262620]
[New LWP 262622]
[New LWP 262623]
[New LWP 262624]
[New LWP 262625]
[New LWP 262626]
[New LWP 262627]
[New LWP 262628]
[New LWP 262629]
[New LWP 262631]
[New LWP 262632]
[New LWP 262633]
[New LWP 262635]
[New LWP 262634]
[New LWP 262636]
[New LWP 262637]
[New LWP 262614]
[New LWP 262630]
[New LWP 262613]
[New LWP 262604]
[Thread debugging using libthread_db enabled]
Using host libthread_db library “/lib64/libthread_db.so.1”.
Core was generated by `gmx mdrun -s pulling.tpr -cpi state.cpt -cpo state.cpt -append -g md.log -e ene’.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x0000000000f5b1bd in Nbnxm::Grid::calcColumnIndices(Nbnxm::Grid::Dimensions const&, gmx::UpdateGroupsCog const*, gmx::Range, gmx::ArrayRef<gmx::BasicVector const>, int, int const*, int, int, gmx::ArrayRef, gmx::ArrayRef) ()
[Current thread is 1 (Thread 0x7f0ea9501000 (LWP 262603))]
Missing separate debuginfos, use: dnf debuginfo-install fftw-libs-double-3.3.10-10.fc39.x86_64 fftw-libs-single-3.3.10-10.fc39.x86_64 glibc-2.38-18.fc39.x86_64 hwloc-libs-2.10.0-1.fc39.x86_64 libcap-2.48-9.fc39.x86_64 libevent-2.1.12-9.fc39.x86_64 libgcc-13.3.1-1.fc39.x86_64 libgomp-13.3.1-1.fc39.x86_64 libstdc++-13.3.1-1.fc39.x86_64 nvidia-driver-cuda-libs-555.42.06-1.fc39.x86_64 openmpi-4.1.5-8.fc39.x86_64 systemd-libs-254.14-1.fc39.x86_64 zlib-1.2.13-4.fc39.x86_64
(gdb)
(gdb)
(gdb) bt
#0 0x0000000000f5b1bd in Nbnxm::Grid::calcColumnIndices(Nbnxm::Grid::Dimensions const&, gmx::UpdateGroupsCog const*, gmx::Range, gmx::ArrayRef<gmx::BasicVector const>, int, int const*, int, int, gmx::ArrayRef, gmx::ArrayRef) ()
#1 0x0000000000f5b585 in Nbnxm::generateAndFill2DGrid(Nbnxm::Grid*, gmx::ArrayRefNbnxm::GridWork, std::vector<int, gmx::Allocator<int, gmx::HostAllocationPolicy> >, float const, float const*, gmx::UpdateGroupsCog const*, gmx::Range, float*, float, bool, gmx::ArrayRef<gmx::BasicVector const>, int, int const*, int, bool) [clone ._omp_fn.0] ()
#2 0x00007f0eba68a286 in GOMP_parallel () from /lib64/libgomp.so.1
#3 0x0000000000f5e26e in Nbnxm::generateAndFill2DGrid(Nbnxm::Grid*, gmx::ArrayRefNbnxm::GridWork, std::vector<int, gmx::Allocator<int, gmx::HostAllocationPolicy> >, float const, float const*, gmx::UpdateGroupsCog const*, gmx::Range, float*, float, bool, gmx::ArrayRef<gmx::BasicVector const>, int, int const*, int, bool) ()
#4 0x0000000000505241 in Nbnxm::GridSet::putOnGrid(float const () [3], int, float const, float const*, gmx::UpdateGroupsCog const*, gmx::Range, float, gmx::ArrayRef, gmx::ArrayRef<gmx::BasicVector const>, int, int const*, nbnxn_atomdata_t*) ()
#5 0x000000000057f21b in nonbonded_verlet_t::putAtomsOnGrid(float const () [3], int, float const, float const*, gmx::UpdateGroupsCog const*, gmx::Range, float, gmx::ArrayRef, gmx::ArrayRef<gmx::BasicVector const>, int, int const*) ()
#6 0x0000000001363ae5 in doPairSearch(t_commrec const*, t_inputrec const&, gmx::MDModulesNotifiers const&, long, t_nrnb*, gmx_wallcycle*, gmx_localtop_t const&, float const () [3], gmx::ArrayRefWithPadding<gmx::BasicVector >, gmx::ArrayRef<gmx::BasicVector >, t_mdatoms const&, t_forcerec, gmx::MdrunScheduleWorkload const&) ()
#7 0x0000000001367524 in do_force(IO_FILE*, t_commrec const*, gmx_multisim_t const*, t_inputrec const&, gmx::MDModulesNotifiers const&, gmx::Awh*, gmx_enfrot*, gmx::ImdSession*, pull_t*, long, t_nrnb*, gmx_wallcycle*, gmx_localtop_t const*, float const () [3], gmx::ArrayRefWithPadding<gmx::BasicVector >, gmx::ArrayRef<gmx::BasicVector >, history_t const, gmx::ForceBuffersView*, float () [3], t_mdatoms const, gmx_enerdata_t*, gmx::ArrayRef, t_forcerec*, gmx::MdrunScheduleWorkload const&, gmx::VirtualSitesHandler*, float*, double, gmx_edsam*, CpuPpLongRangeNonbondeds*, DDBalanceRegionHandler const&) ()
#8 0x0000000001207ac0 in gmx::LegacySimulator::do_md() ()
#9 0x0000000000ad2659 in gmx::Mdrunner::mdrunner() ()
#10 0x00000000004ebf46 in gmx::gmx_mdrun(tmpi_comm*, gmx_hw_info_t const&, int, char**) ()
#11 0x00000000004ec040 in gmx::gmx_mdrun(int, char**) ()
#12 0x000000000059da69 in gmx::CommandLineModuleManager::run(int, char**) ()
#13 0x00000000004cc13c in main ()
(gdb)

Thanks, this narrows the issue down. But unfortunately I can’t pin down the exact source location from a build without debug symbols. One would need to run a RelWithDebInfo build.
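In case it helps: such a build is a normal GROMACS build configured with a different CMake build type, along these lines (the install prefix is only an example):

cmake .. -DCMAKE_BUILD_TYPE=RelWithDebInfo -DCMAKE_INSTALL_PREFIX=$HOME/gromacs-relwithdebinfo
make -j 8
make install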

How many (thread-)MPI ranks and OpenMP threads are you running this on?

nproc
24
lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 40 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 24
On-line CPU(s) list: 0-23
Vendor ID: GenuineIntel
Model name: Intel(R) Xeon(R) CPU X5660 @ 2.80GHz
CPU family: 6
Model: 44
Thread(s) per core: 2
Core(s) per socket: 6
Socket(s): 2
Stepping: 2
Frequency boost: enabled
CPU(s) scaling MHz: 85%
CPU max MHz: 2801.0000
CPU min MHz: 1600.0000
BogoMIPS: 5600.76
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 popcnt lahf_lm epb pti ssbd ibrs ibpb stibp tpr_shadow flexpriority ept vpid dtherm ida arat vnmi flush_l1d
Virtualization features:
Virtualization: VT-x
Caches (sum of all):
L1d: 384 KiB (12 instances)
L1i: 384 KiB (12 instances)
L2: 3 MiB (12 instances)
L3: 24 MiB (2 instances)
NUMA:
NUMA node(s): 2
NUMA node0 CPU(s): 0-5,12-17
NUMA node1 CPU(s): 6-11,18-23
Vulnerabilities:
Gather data sampling: Not affected
Itlb multihit: KVM: Mitigation: VMX disabled
L1tf: Mitigation; PTE Inversion; VMX conditional cache flushes, SMT vulnerable
Mds: Vulnerable: Clear CPU buffers attempted, no microcode; SMT vulnerable
Meltdown: Mitigation; PTI
Mmio stale data: Unknown: No mitigations
Reg file data sampling: Not affected
Retbleed: Not affected
Spec rstack overflow: Not affected
Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Spectre v2: Mitigation; Retpolines; IBPB conditional; IBRS_FW; STIBP conditional; RSB filling; PBRSB-eIBRS Not affected; BHI Not affected
Srbds: Not affected
Tsx async abort: Not affected

Thanks, but that doesn’t tell me how those resources were used in the run. The log file reports something like this; what do you have?
Using 3 MPI threads
Using 10 OpenMP threads per tMPI thread
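These lines should be easy to spot in md.log, e.g. with something like (assuming the standard log layout):

grep "^Using" md.log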

Running on 1 node with total 12 cores, 24 processing units, 1 compatible GPU:
CPU info:
Vendor: Intel
Brand: Intel(R) Xeon(R) CPU X5660 @ 2.80GHz
Family: 6 Model: 44 Stepping: 2
Features: apic clfsh cmov cx8 cx16 htt intel lahf mmx msr nonstop_tsc pcid pdcm pdpe1gb popcnt pse rdtscp sse2 sse3 sse4.1 sse4.2 ssse3
Hardware topology: Basic
Packages, cores, and logical processors:
[indices refer to OS logical processors]
Package 0: [ 0 12] [ 1 13] [ 2 14] [ 3 15] [ 4 16] [ 5 17]
Package 1: [ 6 18] [ 7 19] [ 8 20] [ 9 21] [ 10 22] [ 11 23]
CPU limit set by OS: -1 Recommended max number of threads: 24
GPU info:
Number of GPUs detected: 1
#0: NVIDIA NVIDIA GeForce RTX 4070 Ti, compute cap.: 8.9, ECC: no, stat: compatible

GPU selected for this run.
Mapping of GPU IDs to the 1 GPU task in the 1 rank on this node:
PP:0
PP tasks will do (non-perturbed) short-ranged interactions on the GPU
PP task will update and constrain coordinates on the GPU
Using 1 MPI thread
Using 24 OpenMP threads

Thanks. I don’t think we can do more with the current information.