GROMACS version: 2022.3, 2022
GROMACS modification: No
I ran a 2D AWH simulation on Piz Daint and encountered this error:
slurmstepd: error: Detected 1 oom-kill event(s) in StepId=44077037.0 cgroup. Some of your processes may have been killed by the cgroup out-of-memory handler.
At first I thought this was related to the recent Piz Daint update. It wasn't: when I ran the same job on my local workstation, I got this error:
Killed
The job was killed with no warning and no segmentation fault, so I have nothing I can use to diagnose the failure myself.
To help with the diagnosis, here is the pull and AWH section of my mdp file:
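A bare "Killed" together with the cluster's oom-kill message suggests the process ran out of memory on both machines. One way to confirm that locally is to run the job through a small wrapper that reports the exit code and peak resident memory of the child process. The sketch below is only an illustration: `run_and_report` is a hypothetical helper, and the demo command stands in for the real mdrun call, which is not shown in this post.

```python
import resource
import subprocess
import sys

def run_and_report(cmd):
    """Run cmd and return (return code, peak resident memory of child processes)."""
    proc = subprocess.run(cmd)
    # ru_maxrss is reported in kilobytes on Linux (bytes on macOS).
    peak = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    return proc.returncode, peak

# Stand-in command that allocates about 50 MB; replace it with the actual
# mdrun command line (an assumption, e.g. ["gmx", "mdrun", "-deffnm", "..."]).
demo_cmd = [sys.executable, "-c", "x = b'x' * 50_000_000"]
code, peak = run_and_report(demo_cmd)

# A negative return code is the signal number that ended the process,
# so -9 means SIGKILL, which is what the out-of-memory killer sends.
print(f"return code: {code}, peak RSS of children: {peak}")
```

If the reported peak is close to the machine's total RAM when the return code is -9, the kill was almost certainly the kernel's out-of-memory handler rather than a GROMACS error.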
pull-group1-name = S349_A
pull-group2-name = S349_C
pull-group3-name = S349_B
pull-group4-name = S349_D
pull_group5_name = POX
pull-group6-name = 361_C_A
pull-group7-name = 362_N_A
pull-group8-name = 362_CA_A
pull-group9-name = 362_C_A
pull-group10-name = 363_N_A
pull-group11-name = 361_C_B
pull-group12-name = 362_N_B
pull-group13-name = 362_CA_B
pull-group14-name = 362_C_B
pull-group15-name = 363_N_B
pull-group16-name = 361_C_C
pull-group17-name = 362_N_C
pull-group18-name = 362_CA_C
pull-group19-name = 362_C_C
pull-group20-name = 363_N_C
pull-group21-name = 361_C_D
pull-group22-name = 362_N_D
pull-group23-name = 362_CA_D
pull-group24-name = 362_C_D
pull-group25-name = 363_N_D
pull-coord1-groups = 1 2
pull-coord1-geometry = distance
pull-coord2-groups = 3 4
pull-coord2-geometry = distance
pull-coord3-geometry = dihedral
pull-coord3-groups = 8 6 6 7 7 9
pull-coord4-geometry = dihedral
pull-coord4-groups = 9 7 7 8 8 10
pull-coord5-geometry = dihedral
pull-coord5-groups = 13 11 11 12 12 14
pull-coord6-geometry = dihedral
pull-coord6-groups = 14 12 12 13 13 15
pull-coord7-geometry = dihedral
pull-coord7-groups = 18 16 16 17 17 19
pull-coord8-geometry = dihedral
pull-coord8-groups = 19 17 17 18 18 20
pull-coord9-geometry = dihedral
pull-coord9-groups = 23 21 21 22 22 24
pull-coord10-geometry = dihedral
pull-coord10-groups = 24 22 22 23 23 25
pull-coord11-geometry = transformation
pull-coord11-groups = []
pull-coord11-type = external-potential
pull-coord11-potential-provider = AWH
pull-coord11-expression = 0.5*x1+0.5*x2
;
pull-coord12-geometry = transformation
pull-coord12-groups = []
pull-coord12-type = external-potential
pull-coord12-potential-provider = AWH
pull-coord12-expression = (x3+x4+x5+x6+x7+x8+x9+x10)/8
;
awh = yes
awh-potential = convolved
awh-share-multisim = yes
awh-nbias = 1
awh-nstout = 50000
awh1-ndim = 2
awh1-equilibrate-histogram = yes
awh1-target = constant
awh1-share-group = 1
awh1-dim1-coord-index = 11
awh1-dim2-coord-index = 12
awh1-dim1-start = 0.60
awh1-dim1-end = 2.50
awh1-dim1-force-constant = 20000
awh1-dim1-diffusion = 0.0002
awh1-dim1-cover-diameter = 0.1
awh1-dim2-start = -180
awh1-dim2-end = 180
awh1-dim2-diffusion = 2e-4
awh1-dim2-force-constant = 12800
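As a rough sanity check on the resolution these settings imply, the sketch below estimates the width of the harmonic umbrella in each AWH dimension, sigma = sqrt(kT / k), and how many such widths fit into the sampling range. It assumes T = 303.15 K (the value in the grompp output) and that the dihedral force constant is given in kJ/mol/rad^2 while the range is in degrees. How many grid points GROMACS places per width is not stated anywhere in this post, so the result is only an order-of-magnitude indicator, not the actual grid size.

```python
import math

KB = 0.0083144626   # Boltzmann constant in kJ/(mol K)
T = 303.15          # K, assumed from the grompp output

def umbrella_width(force_constant, kT=KB * T):
    """Width sigma = sqrt(kT / k) of a harmonic umbrella with force constant k."""
    return math.sqrt(kT / force_constant)

# Dimension 1: averaged distance, range in nm, k in kJ/mol/nm^2.
sigma1 = umbrella_width(20000)
widths1 = (2.50 - 0.60) / sigma1

# Dimension 2: averaged dihedral, range in degrees.
# Assumption: k is in kJ/mol/rad^2, so sigma is converted from rad to degrees.
sigma2 = math.degrees(umbrella_width(12800))
widths2 = (180 - (-180)) / sigma2

print(f"dim1: sigma = {sigma1:.4f} nm,  range spans {widths1:.0f} widths")
print(f"dim2: sigma = {sigma2:.3f} deg, range spans {widths2:.0f} widths")
print(f"2D area in units of sigma1 * sigma2: {widths1 * widths2:.0f}")
```

With these force constants the two ranges span roughly 170 and 450 umbrella widths, so the 2D grid has to resolve tens of thousands of width-sized cells before any finer subdivision. If memory use scales with the grid, lowering the force constants or narrowing the ranges would be the first thing to test.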
These are the contents of my index file:
[ S349_A ]
3936 3937 3938 3939 3940 3941 3942 3943 3944 3945 3946
[ S349_B ]
9809 9810 9811 9812 9813 9814 9815 9816 9817 9818 9819
[ S349_C ]
15682 15683 15684 15685 15686 15687 15688 15689 15690 15691 15692
[ S349_D ]
21555 21556 21557 21558 21559 21560 21561 21562 21563 21564 21565
[ r_361_&_C ]
4153 10026 15899 21772
[ r_362_&_N ]
4155 10028 15901 21774
[ r_362_&_CA ]
4157 10030 15903 21776
[ r_362_&_C ]
4175 10048 15921 21794
[ r_363_&_N ]
4177 10050 15923 21796
[ 361_C_A ]
4153
[ 361_C_B ]
10026
[ 361_C_C ]
15899
[ 361_C_D ]
21772
[ 362_N_A ]
4155
[ 362_N_B ]
10028
[ 362_N_C ]
15901
[ 362_N_D ]
21774
[ 362_CA_A ]
4157
[ 362_CA_B ]
10030
[ 362_CA_C ]
15903
[ 362_CA_D ]
21776
[ 362_C_A ]
4175
[ 362_C_B ]
10048
[ 362_C_C ]
15921
[ 362_C_D ]
21794
[ 363_N_A ]
4177
[ 363_N_B ]
10050
[ 363_N_C ]
15923
[ 363_N_D ]
21796
The grompp stage completes without problems. I have copied only the relevant part of the grompp output:
Pull group 1 'S349_A' has 11 atoms
Pull group 2 'S349_C' has 11 atoms
Pull group 3 'S349_B' has 11 atoms
Pull group 4 'S349_D' has 11 atoms
Pull group 5 'POX' has 1 atoms
Pull group 6 '361_C_A' has 1 atoms
Pull group 7 '362_N_A' has 1 atoms
Pull group 8 '362_CA_A' has 1 atoms
Pull group 9 '362_C_A' has 1 atoms
Pull group 10 '363_N_A' has 1 atoms
Pull group 11 '361_C_B' has 1 atoms
Pull group 12 '362_N_B' has 1 atoms
Pull group 13 '362_CA_B' has 1 atoms
Pull group 14 '362_C_B' has 1 atoms
Pull group 15 '363_N_B' has 1 atoms
Pull group 16 '361_C_C' has 1 atoms
Pull group 17 '362_N_C' has 1 atoms
Pull group 18 '362_CA_C' has 1 atoms
Pull group 19 '362_C_C' has 1 atoms
Pull group 20 '363_N_C' has 1 atoms
Pull group 21 '361_C_D' has 1 atoms
Pull group 22 '362_N_D' has 1 atoms
Pull group 23 '362_CA_D' has 1 atoms
Pull group 24 '362_C_D' has 1 atoms
Pull group 25 '363_N_D' has 1 atoms
Number of degrees of freedom in T-Coupling group PROT is 80758.73
Number of degrees of freedom in T-Coupling group MEMB is 111110.27
Number of degrees of freedom in T-Coupling group SOL_ION is 331359.00
Determining Verlet buffer for a tolerance of 0.005 kJ/mol/ps at 303.15 K
Calculated rlist for 1x1 atom pair-list as 1.285 nm, buffer size 0.085 nm
Set rlist, assuming 4x4 atom pair-list, to 1.210 nm, buffer size 0.010 nm
Note that mdrun will redetermine rlist based on the actual pair-list setup
Calculating fourier grid dimensions for X Y Z
Using a fourier grid of 80x80x112, spacing 0.149 0.149 0.148
Pull group natoms pbc atom distance at start reference at t=0
1 11 3941
2 11 15687 0.943 nm 0.000 nm
3 11 9814
4 11 21560 0.906 nm 0.000 nm
8 1 0
6 1 0 -35.520 deg 0.000 deg
9 1 0
7 1 0 22.639 deg 0.000 deg
13 1 0
11 1 0 -30.483 deg 0.000 deg
14 1 0
12 1 0 15.009 deg 0.000 deg
18 1 0
16 1 0 -35.782 deg 0.000 deg
19 1 0
17 1 0 9.406 deg 0.000 deg
23 1 0
21 1 0 -31.776 deg 0.000 deg
24 1 0
22 1 0 25.560 deg 0.000 deg
24 1 0
22 1 0 0.925 nm 0.000 nm
24 1 0
22 1 0 -0.133 nm 0.000 nm
Estimate for the relative computational load of the PME mesh part: 0.14
NOTE 6 [file awh-2d-new.mdp]:
This run will generate roughly 5957 Mb of data
What do I need to do to solve this issue?
Best wishes
Will