A system with more than 2,000,000 atoms

GROMACS version: 2020.2
GROMACS modification: Yes/No
Dear gromacs users,
I have a system with more than 2,000,000 atoms, but I cannot simulate it with mdrun.

How can I run a simulation of this system?

Best regards,
Jinyoung

Hi,

Please supply some information about what problems you are running into with your simulation.

Regards,
Petter

I have tried to run a transmembrane protein system with the martini force field.
My system has more than 3,700,000 beads (there are more than 3,700,000 lines in the structure gro file).
But when I try to run the NVT equilibration, I get the error below:

[cpu7:97508] *** Process received signal ***
[cpu7:97508] Signal: Segmentation fault (11)
[cpu7:97508] Signal code: Address not mapped (1)
[cpu7:97508] Failing at address: 0x1de81b9c0
[cpu7:97508] [ 0] /lib64/libpthread.so.0(+0xf5f0)[0x7ff975bae5f0]
[cpu7:97508] [ 1] /home/byun/apps/GROMACS/gromacs_cpu-2020.2_gcc-7.5.0_openmpi-4.0.2/lib64/libgromacs_mpi.so.5(+0xb8d0fa)[0x7ff976e8c0fa]
[cpu7:97508] [ 2] /appl/compiler/gcc-7.5.0/lib64/libgomp.so.1(GOMP_parallel+0x3f)[0x7ff975dc8acf]
[cpu7:97508] [ 3] /home/byun/apps/GROMACS/gromacs_cpu-2020.2_gcc-7.5.0_openmpi-4.0.2/lib64/libgromacs_mpi.so.5(_Z14spread_on_gridPK9gmx_pme_tP11PmeAtomCommPK10pmegrids_tbbPfbi+0x73c)[0x7ff976e8ce7c]
[cpu7:97508] [ 4] /home/byun/apps/GROMACS/gromacs_cpu-2020.2_gcc-7.5.0_openmpi-4.0.2/lib64/libgromacs_mpi.so.5(_Z10gmx_pme_doP9gmx_pme_tN3gmx8ArrayRefIKNS1_11BasicVectorIfEEEENS2_IS4_EEPfS8_S8_S8_S8_S8_PA3_KfPK9t_commreciiP6t_nrnbP13gmx_wallcyclePA3_fSK_S8_S8_ffS8_S8_i+0x746)[0x7ff976e6d866]
[cpu7:97508] [ 5] /home/byun/apps/GROMACS/gromacs_cpu-2020.2_gcc-7.5.0_openmpi-4.0.2/lib64/libgromacs_mpi.so.5(_Z17do_force_lowlevelP10t_forcerecPK10t_inputrecPK6t_idefPK9t_commrecPK14gmx_multisim_tP6t_nrnbP13gmx_wallcyclePK9t_mdatomsN3gmx19ArrayRefWithPaddingINSK_11BasicVectorIfEEEEP9history_tPNSK_12ForceOutputsEP14gmx_enerdata_tP8t_fcdataPA3_KfPSX_PK7t_graphSZ_RKNSK_12StepWorkloadERK22DDBalanceRegionHandler+0x8f5)[0x7ff976dafb75]
[cpu7:97508] [ 6] /home/byun/apps/GROMACS/gromacs_cpu-2020.2_gcc-7.5.0_openmpi-4.0.2/lib64/libgromacs_mpi.so.5(_Z8do_forceP8_IO_FILEPK9t_commrecPK14gmx_multisim_tPK10t_inputrecPN3gmx3AwhEP10gmx_enfrotPNSA_10ImdSessionEP6pull_tlP6t_nrnbP13gmx_wallcyclePK14gmx_localtop_tPA3_KfNSA_19ArrayRefWithPaddingINSA_11BasicVectorIfEEEEP9history_tSW_PA3_fPK9t_mdatomsP14gmx_enerdata_tP8t_fcdataNSA_8ArrayRefIfEEP7t_graphP10t_forcerecPNSA_21MdrunScheduleWorkloadEPK11gmx_vsite_tPfdP9gmx_edsamiRK22DDBalanceRegionHandler+0x1170)[0x7ff976df06f0]
[cpu7:97508] [ 7] /home/byun/apps/GROMACS/gromacs_cpu-2020.2_gcc-7.5.0_openmpi-4.0.2/lib64/libgromacs_mpi.so.5(_ZN3gmx15LegacySimulator5do_mdEv+0x4162)[0x7ff976eac212]
[cpu7:97508] [ 8] /home/byun/apps/GROMACS/gromacs_cpu-2020.2_gcc-7.5.0_openmpi-4.0.2/lib64/libgromacs_mpi.so.5(_ZN3gmx15LegacySimulator3runEv+0x7d)[0x7ff976ea77ad]
[cpu7:97508] [ 9] /home/byun/apps/GROMACS/gromacs_cpu-2020.2_gcc-7.5.0_openmpi-4.0.2/lib64/libgromacs_mpi.so.5(_ZN3gmx8Mdrunner8mdrunnerEv+0x48f4)[0x7ff976edff84]
[cpu7:97508] [10] gmx_mpi[0x408443]
[cpu7:97508] [11] /home/byun/apps/GROMACS/gromacs_cpu-2020.2_gcc-7.5.0_openmpi-4.0.2/lib64/libgromacs_mpi.so.5(_ZN3gmx24CommandLineModuleManager3runEiPPc+0x237)[0x7ff9768f0f57]
[cpu7:97508] [12] gmx_mpi[0x404f6c]
[cpu7:97508] [13] /lib64/libc.so.6(__libc_start_main+0xf5)[0x7ff974f57505]
[cpu7:97508] [14] gmx_mpi[0x405031]
[cpu7:97508] *** End of error message ***
/var/spool/slurmd/job937846/slurm_script: line 179: 97508 Segmentation fault      (core dumped) gmx_mpi mdrun -v -deffnm eq_nvt > _eq_nvt_mdrun.log 2>&1

As far as I can tell, this error comes from the number of beads in the system…
Is that right?

A segmentation fault could be due to running out of memory when simulating a large system, yes, although 4 million atoms should not take up that much memory. Maybe someone familiar with Martini simulations knows of a setting that would use a lot of memory. Do you encounter this error if you run a smaller system?
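
If you want to rule out memory, SLURM can report the peak memory use of the failed job. A minimal sketch, using the job ID from your error output (the exact accounting fields available depend on how your cluster's SLURM is configured):

    sacct -j 937846 --format=JobID,State,MaxRSS,ReqMem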

But I’d suspect just from the error message that there is something else going on with the compiled binary. You could try recompiling Gromacs with static linking.
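
For reference, a statically linked rebuild would look something like the sketch below. This is only an illustration: the build directory and install prefix are placeholders, and you should keep whatever compiler/MPI settings you already use.

    cd gromacs-2020.2/build          # your existing out-of-tree build directory
    cmake .. \
        -DCMAKE_INSTALL_PREFIX=$HOME/apps/GROMACS/gromacs-2020.2-static \
        -DGMX_MPI=ON \
        -DGMX_BUILD_OWN_FFTW=ON \
        -DBUILD_SHARED_LIBS=OFF \
        -DGMX_PREFER_STATIC_LIBS=ON
    make -j 8 && make install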

The most common case where I see a segfault is when the system blows up.

This means your system may have large forces in it, which can cause overflows and other mathematical (and physical!) weirdness.

You say gromacs segfaults when you try to run an NVT equilibration.

  • Did you try to run an energy minimization before? Did it work? Did it converge to a reasonable force?
  • Are you using any elastic networks? Are they correctly set up?
  • How did you build the coordinate file (and topology)? Does it look sane (bead distances, constraints, etc.)? A quick file-level sanity check is sketched right after this list.
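
As a very first pass on the coordinate file, you can at least confirm that the atom count declared in the .gro header matches the number of coordinate lines (a .gro file is a title line, an atom count, one line per atom, and a box line). The file name below is just an example:

    declared=$(sed -n '2p' system.gro)        # atom count from the header
    actual=$(( $(wc -l < system.gro) - 3 ))   # total lines minus title, count and box lines
    echo "declared: $declared   actual: $actual"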

/var/spool/slurmd/job937846/slurm_script tells me you’re running this on a cluster:

  • Are you running grompp and mdrun with the same version of gromacs? I.e., if you are running grompp on your workstation and transferring the tpr file, are the gromacs versions the same on both machines?
  • Did you forget to module load something in your SLURM script?
  • Did you call srun gmx_mpi [...] or mpirun gmx_mpi [...]? Even if you’re running on a single node, it’s better practice to call gmx_mpi through srun (or mpirun) instead of simply gmx_mpi [...] (see the sketch after this list).
  • Does gromacs work when called by the same SLURM script but with a lighter tpr file? (i.e.: is the gromacs install on this cluster sane?)
  • Did you try resubmitting the job, excluding the node where your job failed, to rule out a hardware failure?
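
For illustration, a minimal job script along those lines could look like this; the module name, node/task counts and file names are assumptions, not your actual setup:

    #!/bin/bash
    #SBATCH --job-name=eq_nvt
    #SBATCH --nodes=1
    #SBATCH --ntasks-per-node=28

    # Module name is a placeholder; load whatever your cluster provides.
    module load gromacs/2020.2

    # Launch through srun (or mpirun) so the MPI ranks are set up properly.
    srun gmx_mpi mdrun -v -deffnm eq_nvt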

Cheers!

  1. Firstly, yes, I built my system with martinize2 (the latest martinize python module) and insane.py for the martini2.2p force field.
  2. And as you mentioned, my system did not converge during energy minimization; it terminated before step 1000 with the message below:
Step=   18, Dmax= 1.5e-05 nm, Epot= -5.28107e+07 Fmax= 5.51434e+07, atom= 4397
Step=   19, Dmax= 7.3e-06 nm, Epot= -5.28107e+07 Fmax= 5.61273e+07, atom= 4397
Step=   20, Dmax= 3.6e-06 nm, Epot= -5.28107e+07 Fmax= 5.72483e+07, atom= 4397
Step=   21, Dmax= 1.8e-06 nm, Epot= -5.28107e+07 Fmax= 5.77911e+07, atom= 4397
Energy minimization has stopped, but the forces have not converged to the
requested precision Fmax < 1000 (which may not be possible for your system).
It stopped because the algorithm tried to make a new step whose size was too
small, or there was no change in the energy since last step. Either way, we
regard the minimization as converged to within the available machine
precision, given your starting configuration and EM parameters.

Double precision normally gives you higher accuracy, but this is often not
needed for preparing to run molecular dynamics.
You might need to increase your constraint accuracy, or turn
off constraints altogether (set constraints = none in mdp file)

writing lowest energy coordinates.

Steepest Descents converged to machine precision in 22 steps,
but did not reach the requested Fmax < 1000.
Potential Energy  = -5.2810672e+07
Maximum force     =  5.7791104e+07 on atom 4397
Norm of force     =  3.8185081e+04
  3. I used the same version of gromacs for both grompp and mdrun.
  4. I only called gmx_mpi mdrun -v -deffnm en_min; I did not use srun or mpirun.
  5. As you mentioned, when I run the energy minimization with
    mpirun -np 28 gmx_mpi mdrun -v -deffnm en_min
    the message below is output:
Fatal error:
There is no domain decomposition for 28 ranks that is compatible with the
given box and a minimum cell size of 30.4482 nm
Change the number of ranks or mdrun option -rdd or -dds
Look in the log file for details on the domain decomposition

How can I solve this problem?

Thank you for the reply.
Regarding the static linking you mentioned, should I re-compile gromacs as below?

cmake3 .. \
    -DCMAKE_INSTALL_PREFIX=/home/byun/apps/GROMACS/gromacs-2020.2-gcc-7.5 \
    -DGMX_BUILD_OWN_FFTW=ON \
    -DREGRESSIONTEST_DOWNLOAD=OFF \
    -DGMX_GPU=ON \
    -DGMX_MPI=ON \
    -DBUILD_SHARED_LIBS=OFF \
    -DCMAKE_BUILD_TYPE=Debug \
    2>&1 | tee  cmake.log

Look at what’s going on around atom 4397. mdrun tells you there is a massive force there. A system with forces on the order of 10^7 is completely unstable.

However, it terminated before reaching the selected step=1000

mdrun exited prematurely because the system is physically unstable and the minimizer cannot make any additional improvements. This suggests a fundamental issue with the coordinates, the topology, or both.
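
If it helps, one way to look at that neighbourhood is to pull bead 4397 and its surroundings out of the minimization output. This is only a sketch: the file names follow the -deffnm en_min run above, and the 0.5 nm cutoff is an arbitrary choice.

    # Build an index group with everything within 0.5 nm of bead 4397 ...
    gmx select -s en_min.tpr -f en_min.gro \
        -select 'within 0.5 of atomnr 4397' -on around_4397.ndx
    # ... and write just that neighbourhood to a small PDB for visual inspection.
    gmx editconf -f en_min.gro -n around_4397.ndx -o around_4397.pdb

Loading around_4397.pdb into a viewer (together with the starting structure) usually makes it obvious whether two beads overlap or a bond/constraint is stretched across the box.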

Thank you :) I need to check and rebuild my system.
