Set MPI GROMACS to run on a certain number of CPU cores/threads

GROMACS version: 2020.2_cu9.2.88 for linux x86_64 + CUDA
GROMACS modification: Yes/No

Hi All,

I understand how to set up an MPI run specifying the number of GPUs with -gpu_id. However, I am confused as to how to set up an MPI run specifying a number of CPU cores or threads. I have 2x 16-core CPUs.

gmx_mpi mdrun -gpu_id 01 -s protein.tpr -v -deffnm protein_mdout

When I try to specify the number of CPU threads using -nt, I get:
“Setting the total number of threads is only supported with thread-MPI and
GROMACS was compiled without thread-MPI”

Is there some syntax I’m misunderstanding?

thanks!

Hello!

By default, Gromacs will use all available cores when it launches.

If you want to specify how many MPI processes to use, launch it using your MPI task manager. For example, to start with 4 processes using mpiexec:

$ mpiexec -np 4 gmx_mpi mdrun

To further specify how many OpenMP threads per MPI task to use, use -ntomp. With 2 MPI processes and 8 OpenMP threads per process:

$ mpiexec -np 2 gmx_mpi mdrun -ntomp 8

By default, Gromacs should also handle that automatically.

But, if you are running on a single machine, it may be sufficient to compile Gromacs with thread-MPI instead of MPI.
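For reference, with a thread-MPI build (typically installed as gmx rather than gmx_mpi) the rank and thread counts are given directly on the mdrun command line; a rough sketch for a 16-core machine (the numbers are just placeholders) would be

$ gmx mdrun -ntmpi 2 -ntomp 8

or, to set only the total thread count and let mdrun decide the split,

$ gmx mdrun -nt 16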

Regards,
Petter


Hi Petter,

Thank you so much for your reply! I will give it a try. Another question: does GROMACS “bow” to other programs? If I start a run with no thread parameters specified, it will take advantage of all available hardware. However, if someone then starts another process, will GROMACS tone down its usage to make room for it, and then ramp back up when the other process is done?

Well, Gromacs itself will throw everything it has at all the CPUs it’s assigned to, pushing them to 100%. But your operating system’s scheduler will typically tone it down if other tasks need to do something. That’s external to Gromacs itself, though. In the end, it’s typically possible to “use” the computer in a limited capacity while a simulation is running.

If you want to share CPU resources of a single compute-node, you have two options:

  • partition the CPU resources between the jobs, which can be done by launching the desired number of total threads (adjusting #ranks × #threads per rank), e.g. on a 16-core machine run 2x4=8 threads assigned to GROMACS, leaving half of the cores empty. Thread pinning is also important (it is done by default when all resources are used) and can be done manually using the mdrun -pin on option (or using an MPI launcher/job scheduler, e.g. you can tell SLURM to assign N cores of a node to a job); see the sketch after this list;
  • oversubscribe the CPU resources by launching multiple jobs that, in total, require more CPU resources than are available (hence competing for them); as @pjohansson noted, the operating system will make a best effort to allow execution of all work, but unless the jobs launched alongside mdrun are quite lightweight, I would recommend against this, as it can often lead to worse performance than the former approach (e.g. because it can cause imbalance).
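As a minimal sketch of the first (partitioning) approach, assuming the 16-core machine and the 2x4=8 thread split from the example above (the exact pin offset and stride depend on your hardware layout):

$ mpirun -np 2 gmx_mpi mdrun -ntomp 4 -pin on -pinoffset 0 -pinstride 1

A second job sharing the node could then be pinned to the remaining cores with a different -pinoffset (e.g. 8 on this example machine, assuming one hardware thread per core).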

Hello !

I recently installed GROMACS 2020 on my cluster, and I have found a particular issue…

As pjohansson said, GROMACS tries to allocate all available resources; instead, setting -np X makes it run on X ranks.

However, in our cluster the nodes are shared and rarely fully occupied by a single job.
We therefore tried to run GROMACS across several nodes, but with a number of CPUs smaller
than the maximum number of CPUs present in each node.

For example: we have 4 nodes of 28 cores each, but we are able to allocate just 8 cores on
each node (so a total of 32 MPI tasks).
We tried allocating the resources via our TORQUE scheduler: #PBS -l nodes=4:ppn=8
and calling GROMACS with: mpirun -np 32 gmx_mpi mdrun -deffnm name
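For reference, the full submission script looks roughly like this (the shebang and working-directory lines are just what we typically use; only the resource request and the mpirun line matter here):

#!/bin/bash
#PBS -l nodes=4:ppn=8
cd $PBS_O_WORKDIR
export OMP_NUM_THREADS=1
mpirun -np 32 gmx_mpi mdrun -deffnm name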

But GROMACS gives the error:

There are not enough slots available in the system to satisfy the 32 slots
that were requested by the application:
gmx_mpi

Either request fewer slots for your application, or make more slots available
for use.

Do you know how I can avoid this? (Note that I have also set OMP_NUM_THREADS=1.)

Best regards
quim