Running on several nodes

GROMACS version: 2020.1
GROMACS modification: No

Dear all,

I am trying to set up a run using several nodes. Each node contains 24 cores, divided into 2 sockets (2 x 12 cores). As a control, I have a run on 1 node that takes 8 days. This is the main part of the script for that run:

## SLURM options ##
#SBATCH --job-name=test
#SBATCH --partition=compute24
#SBATCH --nodes=1
#SBATCH --tasks-per-node=24

SCRATCH=/scratch/$USER/$SLURM_JOB_ID # $SCRATCH is the output dir for the calculation

module load mpi/openmpi/4.0.1/gcc

source $HOME/Softwares/gromacs-2020.1/gromacs-2020.1_built/bin/GMXRC

cd $SLURM_SUBMIT_DIR

# Creating dir for the output -> $SCRATCH
mkdir -p $SCRATCH

ROOT_NAME=out_test

# Copying the necessary inputs to $SCRATCH
cp $ROOT_NAME.tpr $SCRATCH

## RUNNING...
# Launching GROMACS
cd $SCRATCH

gmx_mpi mdrun -v -deffnm $ROOT_NAME -ntomp 24

# Copying the output back
cp  $ROOT_NAME.* $SLURM_SUBMIT_DIR

# Moving back and removing $SCRATCH
cd $SLURM_SUBMIT_DIR
rm -rf $SCRATCH

I’ve tried mpirun for the multi-node run, following the examples in "Getting good performance from mdrun" (GROMACS 2020.4 documentation), but I get no acceleration. For instance:

#SBATCH --nodes=2
#SBATCH --tasks-per-node=24

(...)

mpirun -np 2 gmx_mpi mdrun -v -deffnm $ROOT_NAME -ntomp 24

…gives poor performance. What am I missing? Any suggestion or help will be appreciated.
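
For completeness, since each node has two 12-core sockets, another layout I could try is one MPI rank per socket (4 ranks with 12 OpenMP threads each). This is only a sketch I have not benchmarked, and the per-socket placement is my own guess at what might help:

# Sketch: one MPI rank per socket on 2 nodes with 2 x 12 cores each (untested)
#SBATCH --nodes=2
#SBATCH --tasks-per-node=2
#SBATCH --cpus-per-task=12

mpirun -np 4 gmx_mpi mdrun -v -deffnm $ROOT_NAME -ntomp 12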

Best,
-Yasser

Could you try the following?

#SBATCH --nodes=2
#SBATCH --tasks-per-node=1 
#SBATCH --cpus-per-task=24 

mpirun -np 2 gmx_mpi mdrun -v -deffnm $ROOT_NAME -ntomp 24

Thanks for your reply.

I’ve tried that before, but GROMACS cannot find the .tpr, even if I explicitly pass it with the -s flag. The file is definitely in the work directory and is copied to the calculation $SCRATCH directory:

#SBATCH --job-name=multinode_test
#SBATCH --partition=compute24
#SBATCH --nodes=2
#SBATCH --tasks-per-node=1
#SBATCH --cpus-per-task=24

cd $SLURM_SUBMIT_DIR

(...)

# Creating dir for the output -> $SCRATCH
mkdir -p $SCRATCH

ROOT_NAME=md_MULTINODE_test

# Copying the necessary inputs to $SCRATCH
cp $ROOT_NAME.tpr  $SCRATCH

## RUNNING...
# Launching GROMACS
cd $SCRATCH

mpirun -np 2 gmx_mpi mdrun -v -deffnm $ROOT_NAME -ntomp $SLURM_CPUS_PER_TASK -s $ROOT_NAME.tpr

(...)

slurm.out:

Program:     gmx mdrun, version 2020.1
Source file: src/gromacs/commandline/cmdlineparser.cpp (line 275)
Function:    void gmx::CommandLineParser::parse(int*, char**)
MPI rank:    1 (out of 2)

Error in user input:
Invalid command-line options
  In command-line option -s
    File 'md_MULTINODE_test.tpr' does not exist or is not accessible.
    The file could not be opened.
      Reason: No such file or directory
      (call to fopen() returned error code 2)
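
My guess (not confirmed) is that /scratch is node-local storage: the batch script itself only runs on the first node, so the mkdir and cp above never happen on the second node, and rank 1 there cannot see the .tpr. If that is the case, one way to stage the input on every allocated node would be a per-node srun step, roughly like this (a sketch, untested):

# Assuming /scratch is node-local: create the dir and copy the .tpr once on each node
srun --ntasks=$SLURM_NNODES --ntasks-per-node=1 mkdir -p $SCRATCH
srun --ntasks=$SLURM_NNODES --ntasks-per-node=1 cp $SLURM_SUBMIT_DIR/$ROOT_NAME.tpr $SCRATCH/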

I solved the issue with the .tpr file by eliminating the step of copying it to the $SCRATCH directory, and it runs. However, it does not scale on 2 nodes.

Running on 1 node:

#SBATCH --job-name=multinode_test
#SBATCH --partition=compute24
#SBATCH --nodes=1
#SBATCH --tasks-per-node=24

gmx_mpi mdrun -v -deffnm 1_NODE -s $ROOT_NAME.tpr -ntomp $SLURM_TASKS_PER_NODE

slurm_1_node.out:

Using 1 MPI process
Non-default thread affinity set, disabling internal thread affinity
Using 24 OpenMP threads

… will finish Thu Jan 14 (7 days)…

Running on 2 nodes, with mpirun:

#SBATCH --nodes=2
#SBATCH --tasks-per-node=24

mpirun -np 2 gmx_mpi mdrun -v -deffnm 2_NODES -s $ROOT_NAME.tpr -ntomp 24

slurm_2_nodes.out:

Using 2 MPI processes
Non-default thread affinity set, disabling internal thread affinity
Using 24 OpenMP threads per MPI process

… will finish Sun Feb 21 (1 month, 14 days)

Running on 2 nodes (2nd try, as suggested):

#SBATCH --nodes=2
#SBATCH --tasks-per-node=1
#SBATCH --cpus-per-task=24

mpirun -np 2 gmx_mpi mdrun -v -deffnm 2_NODES_2 -s $ROOT_NAME.tpr -ntomp $SLURM_CPUS_PER_TASK

slurm_2_nodes_2.out:

Using 2 MPI processes
Non-default thread affinity set, disabling internal thread affinity
Using 24 OpenMP threads per MPI process

… will finish Thu Mar 11 (2 months, 4 days)

I am definitely missing something. Any help or suggestion would be very much appreciated.
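
One extra check I could run is to print which host each launched rank actually lands on, to confirm that both nodes are really being used (a sketch I have not run as part of these jobs):

# Where do the launched ranks land?
mpirun -np 2 hostname
# The same through SLURM's own launcher, for comparison
srun --ntasks=2 --ntasks-per-node=1 hostname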

Best

I suspect mpirun is placing both tasks on the same node even though 1 task per node was requested. Can you check how an MPI-only run behaves?

#SBATCH --nodes=2
#SBATCH --ntasks=48

export OMP_NUM_THREADS=1

mpirun -np 48 gmx_mpi mdrun -v -deffnm 2_NODES_2 -s $ROOT_NAME.tpr

List of things that can possibly go wrong:

  • Wrong placement of MPI processes in a hybrid (MPI+OpenMP) job
  • Using the wrong communication protocol (e.g., TCP vs. InfiniBand)
  • Slow internode interconnect (Ethernet vs. InfiniBand); a quick check for the last two points is sketched below
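
One way to see which transport Open MPI actually selects is to turn up its selection verbosity for a throwaway run (a sketch; the MCA parameter is for Open MPI 4.x, and the exact component names depend on how the library was built):

# Which byte-transfer-layer components does this Open MPI build provide?
ompi_info | grep -i btl
# One throwaway run with transport selection made verbose
mpirun --mca btl_base_verbose 30 -np 2 hostname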

Regards,
Masrul

I tried your suggestion and it returned this:

-------------------------------------------------------
Program:     gmx mdrun, version 2020.1
Source file: src/gromacs/domdec/domdec.cpp (line 2277)
MPI rank:    0 (out of 48)

Fatal error:
There is no domain decomposition for 36 ranks that is compatible with the
given box and a minimum cell size of 1.83 nm
Change the number of ranks or mdrun option -rdd or -dds
Look in the log file for details on the domain decomposition

For more information and tips for troubleshooting, please check the GROMACS
website at http://www.gromacs.org/Documentation/Errors
-------------------------------------------------------
--------------------------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:

  Process name: [[34868,1],18]
  Exit code:    1
--------------------------------------------------------------------------

It didn’t work with other values of $OMP_NUM_THREADS (2, 4) or with -np 24 either.

Regards.

Your box is probably too small for that many MPI tasks. For the sake of a scalability test of the machine/GROMACS build, try running a bigger system that can accommodate 48 tasks. By the way, how many atoms does your system have?

My system has 15277 atoms (a DNA system with a small ligand, in water and KCl, AMBER99bsc1 FF) and the box is cubic, 5.36494 nm on each side.

But I am a bit confused. The error reports 36 MPI ranks (I asked for 48) for that box size. Why 36? Is that the maximum number of ranks for that size? I ran the system on a single node with 32 CPUs and it worked well.

A full .log file would be informative. Some ranks may be assigned to PME rather than PP. Regardless, your system is quite small, so you likely won’t benefit from using as many processors as you’re asking for. I would not expect much performance enhancement above 16 or 24 processors.
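
As a rough check using only the numbers already given (my arithmetic, not taken from the log): the run had 48 ranks in total and the error mentions 36, so mdrun apparently reserved 48 - 36 = 12 ranks for PME and left 36 PP ranks for the domain decomposition. Assuming the 1.83 nm minimum cell size applies along each box dimension, a 5.36494 nm cubic box fits at most floor(5.36494 / 1.83) = 2 cells per dimension, i.e. at most 2 x 2 x 2 = 8 PP domains, far fewer than 36.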

I’ve attached the .log file: 2_NODES.log (15.2 KB). Right, it seems that the box is too small for that many ranks. I just wanted to be sure I understood it.

Thanks.