Ending simulation after microsecond time scale

JuuelHerza · November 23, 2020, 2:40pm

GROMACS version: 2019.6
GROMACS modification: No

I am using the SIRAH force field (a coarse-grained force field). On the institutional cluster (CPU only), my jobs are expected to sample 10 µs each; nevertheless, they all ended before reaching 5 µs, at varying times beyond 1 µs. If I restart a simulation, it continues for more than 1 µs and then ends unexpectedly again.

Using the same .tpr file and GROMACS version, I ran the simulation both on the cluster and on my personal laptop (GPU Nvidia 1050Ti, i7 7700HQ). The cluster run ended unexpectedly at 3,500,000 steps, while the laptop run reached more than 16,900,000 steps. I stopped the laptop run myself because of cooling concerns.

I raised the issue with institutional support, but we have not been able to solve it, so any insight is appreciated. The job report and error message are appended below.

Thanks
Juuel

Your job looked like:

LSBATCH: User input

#!/bin/bash +H
#BSUB -R "same[model] span[ptile='!',Intel_EM64T:16,Intel_a:20,Intel_b:20,Intel_c:32,Intel_d:32]"
#BSUB -q q_hpc
#BSUB -n 16
#BSUB -oo nptsalida_tat.out
#BSUB -eo npterror_tat.out
module load gromacs/2019

mpirun gmx_mpi mdrun -cpt 180 -cpnum yes -rdd 1.4 -deffnm cg_sys_md

Exited with exit code 139.

Resource usage summary:

CPU time :                                   1068627.62 sec.
Max Memory :                                 601.95 MB
Average Memory :                             562.96 MB
Total Requested Memory :                     -
Delta Memory :                               -
Max Swap :                                   5256 MB
Max Processes :                              17
Max Threads :                                17

The output (if any) follows:

===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= PID 62331 RUNNING AT mn74
= EXIT CODE: 139
= CLEANING UP REMAINING PROCESSES
= YOU CAN IGNORE THE BELOW CLEANUP MESSAGES

===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= PID 62331 RUNNING AT mn74
= EXIT CODE: 11
= CLEANING UP REMAINING PROCESSES
= YOU CAN IGNORE THE BELOW CLEANUP MESSAGES

Intel® MPI Library troubleshooting guide:
https://software.intel.com/node/561764
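
For reference, the two exit codes in the report (139 and 11) appear to describe the same event. By the usual shell convention, an exit status above 128 means the process was killed by signal (status - 128), so 139 would correspond to signal 11, a segmentation fault on Linux. The snippet below is only a minimal sketch of that decoding; the function name decode_exit_status is made up for illustration and is not part of GROMACS, LSF, or Intel MPI.

```shell
#!/bin/bash
# Decode a shell-style exit status.
# Assumption: the usual convention that a status above 128 means
# "terminated by signal (status - 128)".
decode_exit_status() {
    local status=$1
    if [ "$status" -gt 128 ]; then
        local sig=$((status - 128))
        # kill -l <number> prints the signal name without the SIG prefix
        echo "signal $sig (SIG$(kill -l "$sig"))"
    else
        echo "exit code $status"
    fi
}

# Exit code reported by LSF for the failed job
decode_exit_status 139
```

This only identifies how the process died, not why. It does not show whether the segmentation fault comes from the simulation itself, the MPI build, or the node.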

