Why are there "core.xxxx" files in a production run?

GROMACS version: 2021
GROMACS modification: No
md_0_1.log (74.8 KB)

Core files (e.g. core.245674, core.134940, core.31056) appear from time to time during my production run. I can see that the simulation time keeps advancing, so they do not seem to affect the MD. But why do they appear? The log file is attached; could this be related to my high-performance computing resources?

These are probably stdout/stderr from the compute cores. They are not GROMACS output.
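If in doubt, the file command will report what they actually are; on Linux a crash dump identifies itself (a minimal check, assuming the files are inspected on a cluster login or compute node):

# Ask what kind of file core.245674 is; a crash dump reports
# something like "ELF 64-bit LSB core file, x86-64, ..."
file core.245674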

output.log (3.1 MB)

Here is the output log. I can see "Segmentation fault (core dumped)". Can I ask how I can fix it? Thank you.

GERun: Contents of machinefile:
GERun: node-e00a-014
GERun: node-e00a-014
GERun: node-e00a-014
GERun: node-e00a-014
GERun: node-e00a-014
GERun: node-e00a-014
GERun: node-e00a-014
GERun: node-e00a-014
GERun: node-e00a-014
GERun: node-e00a-014
GERun: node-e00a-014
GERun: node-e00a-014
GERun: 
GERun: GErun command being run:
GERun:  mpirun --rsh=ssh -machinefile /tmpdir/job/9970527.undefined/machines.unique -np 12 -rr mdrun_mpi -v -deffnm md_0_1 -cpi -append -maxh 1
/shared/ucl/apps/intel/2020/impi/2019.6.166/intel64/bin/mpirun: line 103: 245674 Segmentation fault      (core dumped) mpiexec.hydra -machinefile $machinefile "$@" 0<&0
/var/opt/sge/node-j00a-002/active_jobs/9970528.1/pe_hostfile
node-j00a-002
node-j00a-002
node-j00a-002
node-j00a-002
node-j00a-002
node-j00a-002
node-j00a-002
node-j00a-002
node-j00a-002
node-j00a-002
node-j00a-002
node-j00a-002
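The numeric suffix of a core.NNNN file is the PID of the process that crashed; here core.245674 matches PID 245674 in the segmentation-fault line above, which names mpiexec.hydra as the crashed process. To see where the crash occurred, the core file can be loaded into gdb together with the binary that produced it (a sketch, assuming gdb is installed and the same Intel MPI module is loaded; without debug symbols the backtrace shows only addresses, but it can still indicate whether the crash is inside the MPI launcher or elsewhere):

# Print a non-interactive backtrace from the core file, pairing it
# with the binary the shell message reported as crashed
gdb -batch -ex bt $(which mpiexec.hydra) core.245674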

Hi, here is an update on the core-dump issue. I submit sequentially appending jobs to achieve a long MD duration: when a job completes (or dies with a core dump), the next job follows on from the checkpoint. A rough sketch of the pattern is below.
job_conti.sh.log (483 Bytes)
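For reference, the continuation pattern might look roughly like this (hypothetical sketch; the real script is attached above, and the mdrun flags are the ones visible in the GERun log: -cpi -append -maxh 1):

#!/bin/bash
# Run for at most one hour (-maxh 1); -cpi -append makes mdrun
# restart from the checkpoint and append to the existing output files.
mpirun -np 12 mdrun_mpi -v -deffnm md_0_1 -cpi -append -maxh 1
# Resubmit this script so the next job continues the run
# (qsub is the SGE submit command; site-specific options omitted)
qsub job_conti.sh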

I have tested:

gromacs/2020.4/intel-2020 with error file md_0_1.e9979959.log (1.2 KB) and
core.184587.log (612 KB);

gromacs/2019.3/intel-2018 with error file md_0_1.e9979960.log (289.2 KB).

With both GROMACS versions the MD proceeds; the "core dumped" issue happens only occasionally with the 2020 build and does not affect the run.
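If the stray core.* files are merely cluttering the working directory, they can usually be suppressed by disabling core dumps in the job script before mdrun starts (a sketch, assuming a bash job script; note this only hides the symptom, it does not fix the underlying segfault):

# Disable writing of core-dump files for processes started by this job
ulimit -c 0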

I will ask our high-performance computing support team about possible causes, but I would appreciate it if someone here knows the reason.