Extreme RAM consumption of an MD simulation

GROMACS version: 2022.5
GROMACS modification: No
I encountered an ‘OUT_OF_MEMORY’ error for an MD simulation with ~250,000 particles. The job details were:

Nodes: 16
Cores per node: 64
CPU Utilized: 310-17:42:30
CPU Efficiency: 97.49% of 318-17:50:56 core-walltime
Job Wall-clock time: 07:28:14
Memory Utilized: 5.31 TB (estimated maximum)
Memory Efficiency: 392.86% of 1.35 TB (86.43 GB/node)

srun -n 512 -c 2 gmx_mpi mdrun -ntomp 2 -deffnm md
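
(For reference, a SLURM allocation consistent with that layout would be roughly the following; these directives are reconstructed to match the numbers above, not copied verbatim from the batch script:)

#SBATCH --nodes=16
#SBATCH --ntasks-per-node=32
#SBATCH --cpus-per-task=2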

I’m interested in understanding why the simulation required such an excessive amount of RAM and what steps I can take to optimize memory usage in future GROMACS simulations on this cluster.

The workflow up to the MD run (solvation, ions, energy minimization, …) is very similar to the GROMACS tutorial.

Any insights or recommendations would be greatly appreciated.

Thanks in advance!

Max

We can’t say much without more information about your setup. But it is strange that you can run energy minimization but not MD. What changes did you make in the mdp parameters between EM and MD?

Thank you for your answer!

Here is the setup:

  • I am investigating a protein
  • ff: amber99sb-star-ildn-q-tip4pd.ff
  • As solvent model I used TIP4P in a dodecahedron box (-c -d 1.0 -bt dodecahedron); see the box-setup sketch after this list
  • EM.mdp is the same as in the lysozyme-in-water tutorial:
    emtol = 1000.0
    emstep = 0.01
    nsteps = 50000
  • MD.mdp changes (the changed lines are also sketched after this list):
    nsteps = 500000000 (1 µs with dt = 0.002)
    dt = 0.002
    continuation = no
    gen_vel = yes
    gen_seed = -1
    For the rest of the MD.mdp file I used the default values.
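
For reference, the box setup corresponds to a command along these lines (the file names here are placeholders, not the actual ones):

gmx editconf -f protein.gro -o newbox.gro -c -d 1.0 -bt dodecahedron

And the changed md.mdp lines, with everything else left at the GROMACS defaults:

nsteps       = 500000000    ; 1 µs at dt = 0.002 ps
dt           = 0.002
continuation = no
gen_vel      = yes
gen_seed     = -1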

CPU: Intel Gold 6130 S2/C16/T2

In our group we have also had this problem with other simulations as soon as 16 nodes are requested.
With 8 nodes the simulations run, but the performance is not satisfying.

I hope this is useful information.
Thanks for the help

Does it make a difference if you run fewer tasks per node?
E.g. srun -n 64 -c 16 gmx_mpi mdrun -ntomp 16 -deffnm md

It is strange that it works on 8 but not on 16 nodes. Using more nodes reduces the memory requirement per node.

What is the memory usage on 8 nodes?

@hess

The MaxRSS on 8 nodes was 87148K.
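
(Assuming this value comes from SLURM accounting, a query along these lines, with a placeholder job ID, reports MaxRSS per job step:)

sacct -j <jobid> --format=JobID,JobName,MaxRSS,Elapsed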

@MagnusL

I haven't done it yet, but that's something I will try.

87 MB can’t be correct.

@hess

Yes, that was a mistake, sorry.
This should be right:

Nodes: 8
Cores per node: 64
CPU Utilized: 2027-09:21:42
CPU Efficiency: 98.99% of 2048-01:25:20 core-walltime
Job Wall-clock time: 4-00:00:10
Memory Utilized: 21.28 GB (estimated maximum)
Memory Efficiency: 3.08% of 691.41 GB (86.43 GB/node)

So the memory usage goes from 21 GB to 5.3 TB when doubling the number of nodes. Or do I misunderstand something? Such a change is very unlikely to come from GROMACS. It could be some hidden bug that suddenly triggers orders of magnitude more memory usage when doubling the number of nodes.

What are the last few lines in the log file of the run that goes out of memory?
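
For example, something like:

tail -n 50 md.log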

Yes, exactly. The problem is that I have no idea where it is coming from.
I already talked to our cluster support, but their only answer was to double the memory capacity for the next run, which won't help if we are talking about terabytes.

I don't have the log files anymore, but I am currently waiting for my SLURM job to start with the adapted command line MagnusL provided.
If I encounter the same problem again, I can provide the log file.

Thanks for the input!!

Hey @MagnusL

the simulation with your provided command is running smoothly at 90 ns/day.
Thank you for your advice. But I still do not really understand the problem; can you explain on what grounds you decided to change the values of -n, -c and -ntomp?

Thanks in advance!

Good to hear that it helped at least.

Unfortunately I don’t have any specific grounds for that recommendation. It’s just that my personal experience has shown that when running on more than 3 or 4 nodes it has been more efficient not to increase the total number of MPI tasks, i.e. to lower the number of tasks per node. I haven’t had the reported RAM issues, though, so it was just that I thought that 512 MPI tasks sounded a bit high.

Edit: It’s quite possible that srun -n 128 -c 8 gmx_mpi mdrun -ntomp 8 -deffnm md would be worth trying, perhaps also srun -n 256 -c 4 gmx_mpi mdrun -ntomp 4 -deffnm md. You might see a difference in performance.
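
(Assuming the same 16 nodes with 64 cores each, -n 64 -c 16, -n 128 -c 8 and -n 256 -c 4 correspond to 4, 8 and 16 MPI ranks per node with 16, 8 and 4 OpenMP threads each; every layout fills the same 1024 cores as the original 512 × 2 run.)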