Running Simulations in Parallel Across Multiple Nodes/GPUs

GROMACS version: 2020.4
GROMACS modification: No

I’m having trouble running simulations in parallel, and I’m not sure if it’s something wrong with the flags I’m using or if it’s because of the way my versions of GROMACS are compiled.

If I want to run using 2 GPUs, my submission file (the relevant part) looks like this:

#SBATCH --nodes=1
#SBATCH --gres=gpu:V100:2
#SBATCH --ntasks-per-node=2
#SBATCH --cpus-per-task=20

mpirun -np 2 mdrun_mpi -ntomp 20 -deffnm pull

My understanding is that this runs 2 MPI ranks (one for each GPU), and each rank uses 20 OpenMP threads (so 20 CPUs per GPU). Did I understand this correctly?
Either way, this is giving me segmentation faults (output file “out-gpu.log” attached). out-gpu.log (5.2 KB)
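
One way to check the intended layout (one MPI rank per Slurm task, one OpenMP thread per allocated CPU) is to build the launch line from the Slurm environment instead of hard-coding the numbers. This is only a sketch: build_mdrun_cmd is a hypothetical helper name, the fallback values are assumptions for when the variables are unset, and whether SLURM_NTASKS is set may depend on how the job was requested.

```shell
# Sketch only: mdrun_mpi and the "pull" prefix are taken from the post above.
# The fallbacks (2 ranks, 20 threads) are assumptions used when Slurm has not
# set the corresponding variables.
build_mdrun_cmd() {
    # One MPI rank per Slurm task, one OpenMP thread per allocated CPU.
    ranks="${SLURM_NTASKS:-2}"
    threads="${SLURM_CPUS_PER_TASK:-20}"
    echo "mpirun -np ${ranks} mdrun_mpi -ntomp ${threads} -deffnm pull"
}

# Print the launch line so the rank/thread layout ends up in the job output.
build_mdrun_cmd
```

If the printed line does not show the expected 2 ranks with 20 threads each, the mismatch is in the #SBATCH request rather than in the mdrun flags.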

On the other hand, if I want to run a simulation using CPUs only across multiple nodes, this is what my submission file looks like:

#SBATCH --nodes=2
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=20

mpirun -np 2 mdrun_mpi -ntomp 20 -deffnm pull -pf pullf.xvg -px pullx.xvg

So here I think I’m using 1 MPI process for each node, and 20 CPUs per node. I also got a segmentation fault (output file “out-cpu.log” attached). out-cpu.log (5.8 KB)

Any help would be highly appreciated.

Hi Carmen,

In your case, the error indicates that there is an issue with the simulation box rather than with the way you use CPUs and GPUs. If you have a look at the final line of the .gro file you use for starting the simulation (or the CRYST1 record if you use a PDB file), you should see how your box is defined and should find some extreme angle there.
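
A quick way to inspect the box is sketched below. It assumes the standard .gro layout, where the last line holds the box vectors in the order v1(x) v2(y) v3(z) v1(y) v1(z) v2(x) v2(z) v3(x) v3(y), with three values for a rectangular box and nine for a triclinic one. box_angles is a hypothetical helper name, and start.gro and start.pdb stand in for whatever structure file you actually use.

```shell
# Sketch: print the box angles (in degrees) from the last line of a .gro file.
box_angles() {
    tail -n 1 "$1" | awk '
    function acosd(x) { return atan2(sqrt(1 - x * x), x) * 180 / 3.141592653589793 }
    function norm(x, y, z) { return sqrt(x * x + y * y + z * z) }
    # Three fields: rectangular box, so every angle is 90 degrees.
    NF == 3 { print "rectangular box: alpha=90.0 beta=90.0 gamma=90.0"; next }
    # Nine fields: triclinic box, rebuild the three box vectors a, b, c.
    NF == 9 {
        ax = $1; ay = $4; az = $5
        bx = $6; by = $2; bz = $7
        cx = $8; cy = $9; cz = $3
        la = norm(ax, ay, az); lb = norm(bx, by, bz); lc = norm(cx, cy, cz)
        alpha = acosd((bx * cx + by * cy + bz * cz) / (lb * lc))  # between b and c
        beta  = acosd((ax * cx + ay * cy + az * cz) / (la * lc))  # between a and c
        gamma = acosd((ax * bx + ay * by + az * bz) / (la * lb))  # between a and b
        printf "alpha=%.1f beta=%.1f gamma=%.1f\n", alpha, beta, gamma
    }'
}

# Usage (file names are placeholders):
#   box_angles start.gro
#   grep '^CRYST1' start.pdb   # for a PDB file, the angles are listed directly
```

Angles far from 90 degrees in the output would be the "extreme angle" to look for.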