PME calculation on GPU?

GROMACS version: 2018
GROMACS modification: Yes/No

Hi GROMACS users,

I used the script below to run my simulation, but for some reason I was not able to assign a GPU to do the PME calculation. How do I do that?
The system is large (600,000 atoms), but I still think 30 ns/day is slow.
Could you please also advise me on ways to speed up the simulation, for example by increasing the number of nodes, GPUs, or CPU cores?
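
For reference, my understanding from the 2018 docs is that PME offload is requested with -pme gpu, and that the simplest case is a single-rank run where everything lands on one GPU. A minimal sketch of that baseline (assuming a thread-MPI gmx binary is available from the module; I have not tested these exact values):

# Hypothetical single-rank baseline: non-bonded and PME work both
# offloaded to one GPU; no -npme needed with only one rank.
gmx mdrun -deffnm p100test -s npttestnew.tpr -nb gpu -pme gpu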

Thank you in advance

#!/bin/bash
#SBATCH -N 4 --tasks-per-node=2
#SBATCH -t 01:00:00
#SBATCH -p GPU --gres=gpu:p100:2

# Set up the module command and trace commands as they run
set -x
module load gromacs/2018_gpu

# Move to the submit directory
cd $SLURM_SUBMIT_DIR

# This job assumes:
# - all input data is stored in this directory
# - all output should be stored in this directory
cd /pylon5/bio200035p/amnah/6PIJnew

# Run GPU program
./mygpu

# Replace "p100test" below with the default name for your files:
mpirun -np 8 gmx_mpi mdrun -deffnm p100test -resethway -noconfout \
       -nsteps 30000 -v -pin on -nb gpu -pme gpu -npme 1 -gputasks 01 \
       -ntomp 12 -resetstep 10000 -s npttestnew.tpr -dlb yes

This is the output I got:

Using 8 MPI processes
No option -multi
No option -multi
No option -multi
Using 12 OpenMP threads per MPI process

No option -multi
No option -multi
No option -multi
No option -multi
On host gpu017.pvt.bridges.psc.edu 2 GPUs user-selected for this run.
Mapping of GPU IDs to the 2 GPU tasks in the 2 ranks on this node:
PP:0,PP:1
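
That mapping block is how I am checking: both GPU tasks show up as PP, so PME apparently stayed on the CPU. A quick way I pull this out of the md.log after a run (the file name follows -deffnm):

grep -A 2 "Mapping of GPU IDs" p100test.log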

NOTE: Your choice of number of MPI ranks and amount of resources results in using 12 OpenMP threads per rank, which is most likely inefficient. The optimum is usually between 2 and 6 threads per rank.

               Core t (s)   Wall t (s)        (%)
       Time:     7860.125       81.876     9600.0
                 (ns/day)    (hour/ns)
Performance:       31.660        0.758
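
Following the NOTE's suggestion of 2-6 OpenMP threads per rank, this is the kind of single-node rebalancing I plan to try next; the counts here are guesses to tune, not known-good settings:

# Hypothetical rebalance: 8 thread-MPI ranks x 3 OpenMP threads on one node,
# 7 PP ranks + 1 PME rank; -gputasks 00001111 splits the GPU tasks across
# the two P100s, with the final (PME) task on GPU 1.
gmx mdrun -deffnm p100test -s npttestnew.tpr \
    -ntmpi 8 -ntomp 3 \
    -nb gpu -pme gpu -npme 1 \
    -gputasks 00001111 \
    -pin on -dlb yes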