Simulations end unexpectedly after microsecond time scales

GROMACS version: 2019.6
GROMACS modification: No

I am using the SIRAH force field (a coarse-grained force field). On the institutional cluster (CPU only), my jobs (systems) are expected to sample 10 µs; nevertheless, they all ended before reaching 5 µs, at varying times after 1 µs. If I restart a simulation, it continues for more than 1 µs and then ends unexpectedly again.

Using the same .tpr file and GROMACS version, I ran the simulation on both the cluster and my personal laptop (GPU Nvidia 1050Ti, i7 7700HQ). The cluster run ended unexpectedly at 3,500,000 steps, while my laptop reached > 16,900,000 steps. Due to cooling concerns, I stopped the laptop run myself.

I raised the issue with institutional support, but we have not been able to solve the problem, so any insight is appreciated. The job output and error message are appended below.

Thanks
Juuel

Your job looked like:


LSBATCH: User input

#!/bin/bash +H
#BSUB -R "same[model] span[ptile='!',Intel_EM64T:16,Intel_a:20,Intel_b:20,Intel_c:32,Intel_d:32]"
#BSUB -q q_hpc
#BSUB -n 16
#BSUB -oo nptsalida_tat.out
#BSUB -eo npterror_tat.out
module load gromacs/2019

mpirun gmx_mpi mdrun -cpt 180 -cpnum yes -rdd 1.4 -deffnm cg_sys_md

Exited with exit code 139.

Resource usage summary:

CPU time :                                   1068627.62 sec.
Max Memory :                                 601.95 MB
Average Memory :                             562.96 MB
Total Requested Memory :                     -
Delta Memory :                               -
Max Swap :                                   5256 MB
Max Processes :                              17
Max Threads :                                17

The output (if any) follows:

===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= PID 62331 RUNNING AT mn74
= EXIT CODE: 139
= CLEANING UP REMAINING PROCESSES
= YOU CAN IGNORE THE BELOW CLEANUP MESSAGES

===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= PID 62331 RUNNING AT mn74
= EXIT CODE: 11
= CLEANING UP REMAINING PROCESSES
= YOU CAN IGNORE THE BELOW CLEANUP MESSAGES

Intel® MPI Library troubleshooting guide:
https://software.intel.com/node/561764
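
As far as I understand, the two exit codes in the output above describe the same event: by shell convention an exit status above 128 means the process was killed by signal (status - 128), so 139 corresponds to signal 11, which on Linux is SIGSEGV (a segmentation fault). Below is a minimal sketch for decoding such a status; `decode_exit` is a hypothetical helper name of my own, not part of GROMACS, LSF, or Intel MPI.

```shell
#!/bin/sh
# decode_exit is a hypothetical helper (not part of any of the tools above).
# It applies the shell convention: status > 128 means "killed by signal
# number (status - 128)".
decode_exit() {
    status=$1
    if [ "$status" -gt 128 ]; then
        sig=$((status - 128))
        # kill -l N prints the name of signal number N
        echo "signal $sig ($(kill -l "$sig"))"
    else
        echo "exit status $status"
    fi
}

# Exit code reported by LSF for the job above
decode_exit 139
```

This only identifies the kind of failure; it does not say which part of mdrun crashed or why.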