Using multiple GPUs on one machine

Hi Roman,

When running on a single node, you can use thread-MPI with GPU direct communication (see the example below), which uses CUDA directly for inter-GPU communication. CUDA-aware MPI is required for GPU direct communication across multiple nodes.

It is unlikely that you will need PME decomposition when running on 4 GPUs, since it typically only pays off at larger scales. Assigning one GPU (or part of one; again, see below) to PME, with the other three handling the more expensive short-range force calculations, usually gives good load balance. In any case, see the last link below for more on PME decomposition.
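As a minimal, hypothetical sketch of that split (not part of the benchmark example below), assuming one thread-MPI rank per GPU on a 4-GPU node, the command is printed here rather than executed, since it needs a GPU node to run:

```shell
# Hedged sketch: 4 thread-MPI ranks across GPUs 0-3; -npme 1 dedicates one
# rank (and its GPU) to PME, leaving 3 ranks for short-range non-bonded work.
# Printed for illustration; run the command itself on a GPU node.
echo gmx mdrun -ntmpi 4 -nb gpu -pme gpu -npme 1 -gpu_id 0123
```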

Here is an example with STMV (where I am pasting from some other documentation). Note that reference performance, on a range of systems, can be found at https://developer.nvidia.com/hpc-application-performance.

Download the benchmark:
wget https://zenodo.org/record/3893789/files/GROMACS_heterogeneous_parallelization_benchmark_info_and_systems_JCP.tar.gz
tar xf GROMACS_heterogeneous_parallelization_benchmark_info_and_systems_JCP.tar.gz
cd GROMACS_heterogeneous_parallelization_benchmark_info_and_systems_JCP/stmv

Run GROMACS using 4 GPUs (with IDs 0,1,2,3). Here we use 2 thread-MPI ranks per GPU (-ntmpi 8), which we find gives good performance, and 16 OpenMP threads per thread-MPI rank (assuming at least 128 CPU cores in the system). Both settings can be adjusted to map to any specific hardware system and are worth experimenting with for best performance.

export GMX_ENABLE_DIRECT_GPU_COMM=1
gmx mdrun -ntmpi 8 -ntomp 16 -nb gpu -pme gpu -npme 1 -update gpu -bonded gpu -nsteps 100000 -resetstep 90000 -noconfout -dlb no -nstlist 300 -pin on -v -gpu_id 0123
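To adapt -ntomp to a machine with a different core count, one simple heuristic (my sketch, not an official GROMACS recipe) is to divide the logical core count reported by nproc by the number of thread-MPI ranks:

```shell
# Hedged sketch: derive OpenMP threads per rank from the node's core count.
# Assumptions: 8 thread-MPI ranks as in the example above; Linux `nproc`.
NTMPI=8
NCORES=$(nproc)              # logical cores on this node
NTOMP=$(( NCORES / NTMPI ))  # threads per thread-MPI rank
if [ "$NTOMP" -lt 1 ]; then NTOMP=1; fi  # floor at 1 on small machines
echo "suggested: -ntmpi ${NTMPI} -ntomp ${NTOMP}"
```

On the 128-core case above this reproduces -ntomp 16; GROMACS will also warn at startup if the resulting pinning is a poor match for the hardware, so it is worth checking the log.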

For more info, please see:

Creating Faster Molecular Dynamics Simulations with GROMACS 2020 | NVIDIA Technical Blog

Maximizing GROMACS Throughput with Multiple Simulations per GPU Using MPS and MIG | NVIDIA Technical Blog

Massively Improved Multi-node NVIDIA GPU Scalability with GROMACS | NVIDIA Technical Blog

Alan Gray (NVIDIA)