Why are there "core.xxxx" files in a production run?

GROMACS version: 2021
GROMACS modification: No
md_0_1.log (74.8 KB)

Core files (e.g. core.245674, core.134940, core.31056) appear from time to time during my production run. I can see that the simulation time keeps advancing, so they do not seem to affect the MD. But why do they appear? The log file is attached; could this be related to my high-performance computing resources?

These are probably stdout/stderr from the compute cores. They are not GROMACS output.
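If in doubt, the file command will report what they actually are; on Linux a crash dump identifies itself (a minimal check, assuming the files are inspected on a cluster login or compute node):

# Ask what kind of file core.245674 is; a crash dump reports
# something like "ELF 64-bit LSB core file, x86-64, ..."
file core.245674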

output.log (3.1 MB)

Here is the output log. I can see "Segmentation fault (core dumped)". Can I ask how I can fix it? Thank you.

GERun: Contents of machinefile:
GERun: node-e00a-014
GERun: node-e00a-014
GERun: node-e00a-014
GERun: node-e00a-014
GERun: node-e00a-014
GERun: node-e00a-014
GERun: node-e00a-014
GERun: node-e00a-014
GERun: node-e00a-014
GERun: node-e00a-014
GERun: node-e00a-014
GERun: node-e00a-014
GERun: 
GERun: GErun command being run:
GERun:  mpirun --rsh=ssh -machinefile /tmpdir/job/9970527.undefined/machines.unique -np 12 -rr mdrun_mpi -v -deffnm md_0_1 -cpi -append -maxh 1
/shared/ucl/apps/intel/2020/impi/2019.6.166/intel64/bin/mpirun: line 103: 245674 Segmentation fault      (core dumped) mpiexec.hydra -machinefile $machinefile "$@" 0<&0
/var/opt/sge/node-j00a-002/active_jobs/9970528.1/pe_hostfile
node-j00a-002
node-j00a-002
node-j00a-002
node-j00a-002
node-j00a-002
node-j00a-002
node-j00a-002
node-j00a-002
node-j00a-002
node-j00a-002
node-j00a-002
node-j00a-002
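The numeric suffix of a core.NNNN file is the PID of the process that crashed; here core.245674 matches PID 245674 in the segmentation-fault line above, which names mpiexec.hydra as the crashed process. To see where the crash occurred, the core file can be loaded into gdb together with the binary that produced it (a sketch, assuming gdb is installed and the same Intel MPI module is loaded; without debug symbols the backtrace shows only addresses, but it can still indicate whether the crash is inside the MPI launcher or elsewhere):

# Print a non-interactive backtrace from the core file, pairing it
# with the binary the shell message reported as crashed
gdb -batch -ex bt $(which mpiexec.hydra) core.245674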

Hi, here is an update on the core-dump issue. I submit sequentially appending jobs to achieve a long MD duration: when a job completes (or dies with a core dump), the next job follows on from the checkpoint. A rough sketch of the pattern is below.
job_conti.sh.log (483 Bytes)
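For reference, the continuation pattern might look roughly like this (hypothetical sketch; the real script is attached above, and the mdrun flags are the ones visible in the GERun log: -cpi -append -maxh 1):

#!/bin/bash
# Run for at most one hour (-maxh 1); -cpi -append makes mdrun
# restart from the checkpoint and append to the existing output files.
mpirun -np 12 mdrun_mpi -v -deffnm md_0_1 -cpi -append -maxh 1
# Resubmit this script so the next job continues the run
# (qsub is the SGE submit command; site-specific options omitted)
qsub job_conti.sh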

I have tested:

gromacs/2020.4/intel-2020 with error file md_0_1.e9979959.log (1.2 KB) and
core.184587.log (612 KB);

gromacs/2019.3/intel-2018 with error file md_0_1.e9979960.log (289.2 KB).

With both GROMACS versions the MD proceeds; the "core dumped" issue happens only occasionally with the 2020 build and does not affect the run.
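If the stray core.* files are merely cluttering the working directory, they can usually be suppressed by disabling core dumps in the job script before mdrun starts (a sketch, assuming a bash job script; note this only hides the symptom, it does not fix the underlying segfault):

# Disable writing of core-dump files for processes started by this job
ulimit -c 0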

I will ask our high-performance computing support team about possible causes, but I would appreciate it if someone here knows the reason.