-pme gpu is not working?

GROMACS version: 2018
GROMACS modification: Yes/No

Hi
I have tried running the following command to enable the PME calculation on the GPU, but according to the log file it is not working and PME is not being calculated on the GPU:
mpirun -np $SLURM_NPROCS gmx_mpi mdrun -deffnm productionnew -resethway -noconfout -nsteps 30000 -v -pin on -nb gpu -pme gpu -npme 1 -gputasks 0123 -ntomp 16 -resetstep 10000 -s npttestnew.tpr -dlb yes

Using 8 MPI processes
No option -multi
No option -multi
No option -multi
No option -multi
No option -multi
No option -multi
Using 16 OpenMP threads per MPI process

On host gpu007.pvt.bridges.psc.edu 4 GPUs user-selected for this run.
Mapping of GPU IDs to the 4 GPU tasks in the 4 ranks on this node:
PP:0,PP:1,PP:2,PP:3

NOTE: Your choice of number of MPI ranks and amount of resources results in using 16 OpenMP threads per rank, which is most likely inefficient. The optimum is usually between 2 and 6 threads per rank.

NOTE: GROMACS was configured without NVML support hence it can not exploit
application clocks of the detected Tesla K80 GPU to improve performance.
Recompile with the NVML library (compatible with the driver used) or set application clocks manually.

WARNING: On rank 0: oversubscribing the available 28 logical CPU cores per node with 64 threads.
This will cause considerable performance loss.

Overriding thread affinity set outside gmx mdrun

NOTE: Oversubscribing the CPU, will not pin threads

I want to run the PME calculation on the GPU to improve performance.
Could you please help me with this issue?
Thank you very much

Hi Amnah,

A few points that might help:

  • From the output you posted, I cannot see why the GPU would not run PME (maybe I missed something).
  • If you can, using GROMACS 2020 will give you better GPU usage.
  • You don’t have to set -npme.
  • For performance optimisation, it might be easier to start without setting -ntomp etc., i.e. with everything at its default, then look at your log files and optimise from there.
  • Oversubscribing the CPU will most likely be very bad for your performance.
  • If you run on a single node, you can get better results just running gmx mdrun with thread-MPI (a rough sketch follows below).
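
As a rough sketch only (the rank and thread counts below are assumptions based on the 28 logical cores reported in your log, so adapt them to your node), a single-node thread-MPI launch could look like:

  gmx mdrun -deffnm productionnew -s npttestnew.tpr -nb gpu -pme gpu -npme 1 -ntmpi 4 -ntomp 7 -pin on

Here the plain gmx binary (not gmx_mpi) is the thread-MPI build, so no mpirun is needed; mdrun spawns the 4 thread-MPI ranks itself, and with 4 GPUs each rank gets its own device.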

Thank you very much for your reply

I am able to run PME on the GPU when removing the -npme option, using the following command,

  • mpirun -np 2 gmx_mpi mdrun -deffnm p100vbbtest -resethway -noconfout -nsteps 30000 -pin on -v -nb gpu -pme gpu -ntomp 6 -resetstep 10000 -s npttestnew.tpr -dlb yes
    but I got this error:

On host gpu042.pvt.bridges.psc.edu 2 GPUs auto-selected for this run.
Mapping of GPU IDs to the 4 GPU tasks in the 2 ranks on this node:
PP:0,PME:0,PP:1,PME:1

NOTE: GROMACS was configured without NVML support hence it can not exploit
application clocks of the detected Tesla P100-PCIE-16GB GPU to improve performance.
Recompile with the NVML library (compatible with the driver used) or set application clocks manually.

NOTE: GROMACS was configured without NVML support hence it can not exploit
application clocks of the detected Tesla P100-PCIE-16GB GPU to improve performance.
Recompile with the NVML library (compatible with the driver used) or set application clocks manually.

Overriding thread affinity set outside gmx mdrun


Program: gmx mdrun, version 2018
Source file: src/gromacs/ewald/pme-gpu-internal.cpp (line 284)
Function: void pme_gpu_init(gmx_pme_t*, gmx_device_info_t*)
MPI rank: 1 (out of 2)

Feature not implemented:
PME GPU does not support: PME decomposition.

You need to specify a single PME rank: as the error message states, PME decomposition is not supported, hence the default assignment (PP:0,PME:0,PP:1,PME:1) will not work; see the sketch below.
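
For illustration (a sketch only, reusing the file names from your post; adjust rank and thread counts to your node), adding -npme 1 to your two-rank launch gives one PP rank and one separate PME rank, each mapped to its own GPU:

  mpirun -np 2 gmx_mpi mdrun -deffnm p100vbbtest -s npttestnew.tpr -nb gpu -pme gpu -npme 1 -ntomp 6 -pin on

With a single PME rank there is no PME decomposition, so the limitation reported in the error no longer applies.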

However, you are still using a very dated version, one that is not even the last release of the 2018 series. To avoid chasing issues that have long been solved, please at least use the last patch release of the 2018 series and preferably, as @cblau noted, the latest 2020 release.

Cheers,
Szilard

Thank you for your reply.
I have started using 2020.2, but I have a question: I use the following script with version 2020.2, yet it reports two performance summaries and I am not sure why.
This is a single simulation, not a multi-simulation!

#!/bin/bash
#SBATCH -N 1 --tasks-per-node=2
#SBATCH -t 01:00:00
#SBATCH -p GPU --gres=gpu:p100:2

# Set up the module command and echo commands as they run
set -x
module load gromacs/2020.2_notplumed

# Move to the working directory; this job assumes:
# - all input data is stored in this directory
# - all output should be stored in this directory
cd $SLURM_SUBMIT_DIR
cd /pylon5/bio200035p/amnah/6PIJnew

# Run the GPU program
./mygpu

# Replace "em" below by the default name for your files:
mpirun -np 2 gmx_mpi mdrun -deffnm p100bvbbbtest -resethway -noconfout -nsteps 30000 -resetstep 10000 -s npttestnew.tpr -dlb yes

step 15000: resetting all time and cycle counters

               Core t (s)   Wall t (s)        (%)
       Time:    12765.681      398.934     3199.9
                 (ns/day)    (hour/ns)
Performance:        6.498        3.694

GROMACS reminds you: “Never Get a Chance to Kick Ass” (The Amps)

               Core t (s)   Wall t (s)        (%)
       Time:    12745.811      398.308     3200.0
                 (ns/day)    (hour/ns)
Performance:        6.508        3.688

Are you sure you are not running two mdrun instances? Please post complete log files to help diagnose your issue.
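
While you collect the logs, two quick checks you could try (sketches under the assumption that the 2020.2 module also provides the plain, thread-MPI gmx binary):

  # look for backup files left behind if two mdrun instances wrote to the same -deffnm
  ls -l p100bvbbbtest.log \#p100bvbbbtest.log.*\# 2>/dev/null

  # launch a single mdrun process that manages both GPUs itself, so no second instance can start
  gmx mdrun -deffnm p100bvbbbtest -s npttestnew.tpr -ntmpi 2 -npme 1 -nb gpu -pme gpu -pin on

If the mpirun you use does not match the MPI library gmx_mpi was built with, each "rank" can end up as an independent run, which would also produce two performance summaries.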