Hi.
When I run an AWH simulation of a protein solution with GROMACS 2024.3, I get a segmentation fault (core dumped) as soon as the simulation starts. I do not think it is due to my simulation parameters or system size, because I used to simulate the same system with the same .mdp and .gro files on the same computer with GROMACS 2024 without any error. The only difference is that the computer's base OS used to be Windows 11 and is now Windows 10; in both cases I run GROMACS under Ubuntu on WSL2.
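For reference, this is roughly how I start the run; the file names below are placeholders, not my actual files:

```
# placeholder names; the real .mdp/.gro/topology are the same files that worked with GROMACS 2024
gmx grompp -f awh.mdp -c system.gro -p topol.top -o awh.tpr
gmx mdrun -deffnm awh
```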
I also tried different GROMACS versions (2024.3, 2024, 2023.3, 2022.3) but kept getting the same error.
I am running GROMACS on WSL2 Ubuntu 22.04 (I also tried Ubuntu 24.04 but got the same error) on Windows 10.
The CUDA toolkit version is 11.7, and g++ and gcc are both version 11.4.
The PC has 32 logical CPU cores, an NVIDIA GeForce RTX 3090 with 24 GB of GPU RAM, and 128 GB of system RAM.
I would appreciate it if anyone could offer some help to resolve this frustrating problem.
Best regards.
I tried to resolve the issue by upgrading CUDA 11.7 to CUDA 12.2 and reinstalling GROMACS 2024.3 built against the upgraded toolkit, but got the same error. I also tried reducing the simulation box to make sure the error is not caused by a large system size; as expected, I got the same segmentation fault (core dumped) message, which confirms that it is not really the simulation box size that is causing the problem. I am really baffled.
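For completeness, the rebuild against CUDA 12.2 was done roughly like this (the install prefix, FFTW option, and -j value are examples, not necessarily my exact commands):

```
# example rebuild of GROMACS 2024.3 against CUDA 12.2 (options and paths are examples)
cd gromacs-2024.3 && mkdir -p build && cd build
cmake .. -DGMX_BUILD_OWN_FFTW=ON -DGMX_GPU=CUDA \
         -DCUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda-12.2
make -j 16
sudo make install
source /usr/local/gromacs/bin/GMXRC
```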
Below is the output of nvidia-smi for further information:

```
Sun Mar  2 16:17:12 2025
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.256.02    Driver Version: 560.94       CUDA Version: 12.6    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...   On  | 00000000:01:00.0  On |                  Off |
|  0%   40C    P8    17W / 450W |   1007MiB / 24564MiB |      1%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                   |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A        21    G   /Xwayland                              N/A |
+-----------------------------------------------------------------------------+
```
And the output of nvcc --version:

```
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Tue_Jun_13_19:16:58_PDT_2023
Cuda compilation tools, release 12.2, V12.2.91
Build cuda_12.2.r12.2/compiler.32965470_0
```
Also, here is the link to my .tpr file in case it is needed to reproduce the error (GROMACS 2024.3, CUDA 12.2, WSL2 Ubuntu 24.04.2 on Windows 10, GPU: RTX 3090 Ti).
As I investigated the error further, I came upon a strange observation:
No matter what I try, when I do not explicitly set the -nt flag in the mdrun command, I get the note below:
> NOTE: The number of threads is not equal to the number of (logical) cpus
> and the -pin option is set to auto: will not pin threads to cpus.
> This can lead to significant performance degradation.
> Consider using -pin on (and -pinoffset in case you run multiple jobs)
Accordingly, I tried increasing -nt to 32 (the maximum number of logical cores available on my machine) to get optimum performance, but much to my surprise I keep getting the segmentation fault (core dumped) error for any -nt greater than 16!
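To illustrate, these are the kinds of invocations I am comparing (the -deffnm name is a placeholder, and the exact -nt values above 16 are just examples):

```
# only the -nt value changes between runs; the file name is a placeholder
gmx mdrun -deffnm awh -nt 16   # runs without crashing
gmx mdrun -deffnm awh -nt 24   # segmentation fault (core dumped)
gmx mdrun -deffnm awh -nt 32   # segmentation fault (core dumped)
```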
Here is the output of the lscpu command, which clearly shows that 32 logical CPUs are available:

```
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 46 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 32
On-line CPU(s) list: 0-31
Vendor ID: GenuineIntel
Model name: Intel(R) Core(TM) i9-14900K
CPU family: 6
Model: 183
Thread(s) per core: 2
Core(s) per socket: 16
Socket(s): 1
Stepping: 1
BogoMIPS: 6374.39
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology tsc_reliable nonstop_tsc cpuid pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced fsgsbase bmi1 avx2 smep bmi2 erms invpcid rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves umip gfni vaes vpclmulqdq rdpid fsrm md_clear flush_l1d arch_capabilities
Hypervisor vendor: Microsoft
Virtualization type: full
L1d cache: 768 KiB (16 instances)
L1i cache: 512 KiB (16 instances)
L2 cache: 32 MiB (16 instances)
L3 cache: 36 MiB (1 instance)
Vulnerability Gather data sampling: Not affected
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Mmio stale data: Unknown: No mitigations
Vulnerability Reg file data sampling: Vulnerable: No microcode
Vulnerability Retbleed: Mitigation; Enhanced IBRS
Vulnerability Spec rstack overflow: Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2: Mitigation; Enhanced / Automatic IBRS; IBPB conditional; RSB filling; PBRSB-eIBRS SW sequence; BHI BHI_DIS_S
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Not affected
```
I am just curious what could be the cause and how I can increase the number of threads to 32 for maximum performance. Any advice is greatly appreciated.
I can't access your .tpr file through the link; maybe the link expired?
Could you compile a debug build, run it with gdb in front of your gmx command line, and report the stack trace that gdb gives? You might need to install gdb.
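Something along these lines should do it (the install prefix and file names are just examples):

```
# configure and install a Debug build of GROMACS (prefix is an example)
cmake .. -DCMAKE_BUILD_TYPE=Debug -DGMX_GPU=CUDA \
         -DCMAKE_INSTALL_PREFIX=$HOME/gromacs-debug
make -j 16 && make install
source $HOME/gromacs-debug/bin/GMXRC

# reproduce the crash under gdb and collect a backtrace
gdb --args gmx mdrun -deffnm awh -nt 32
# at the (gdb) prompt:
#   run
#   bt        <- paste the backtrace this prints into your reply
```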