GROMACS 2021 on WSL2 with GPU

GROMACS version: 2021
GROMACS modification: Yes/No
Dear All,
I recently upgraded to GROMACS 2021. I have an NVIDIA GeForce RTX 2060 GPU and run WSL2 on Windows 10. I installed the CUDA Toolkit, container support, etc. following the instructions (CUDA on WSL :: CUDA Toolkit Documentation). Then I compiled GROMACS 2021 as follows:

sudo cmake … -DGMX_BUILD_OWN_FFTW=OFF -DREGRESSIONTEST_DOWNLOAD=OFF -DCMAKE_C_COMPILER=gcc -DGMX_GPU=CUDA -DREGRESSIONTEST_PATH=/mnt/c/Users/veeru/Downloads/gromacs/regressiontests-2021

Everything went fine. I am now running a simulation: I prepared the input files with the CHARMM-GUI web service and am running them with the attached script. The system has 275773 atoms in total, including TIP water. GROMACS appears to be running on the GPU, as shown below:
Command line:
gmx mdrun -v -deffnm step5_2

Reading file step5_2.tpr, VERSION 2021 (single precision)
Changing nstlist from 20 to 100, rlist from 1.225 to 1.346

1 GPU selected for this run.
Mapping of GPU IDs to the 2 GPU tasks in the 1 rank on this node:
  PP:0,PME:0
PP tasks will do (non-perturbed) short-ranged interactions on the GPU
PP task will update and constrain coordinates on the CPU
PME tasks will do all aspects on the GPU
Using 1 MPI thread
Using 12 OpenMP threads

starting mdrun 'Title'
500000 steps, 1000.0 ps.
step 600: timed with pme grid 120 120 120, coulomb cutoff 1.200: 3943.0 M-cycles
step 800: timed with pme grid 108 108 108, coulomb cutoff 1.294: 4282.4 M-cycles
step 1000: timed with pme grid 100 100 100, coulomb cutoff 1.398: 4681.8 M-cycles
step 1200: timed with pme grid 104 104 104, coulomb cutoff 1.344: 12492.1 M-cycles
step 1400: timed with pme grid 108 108 108, coulomb cutoff 1.294: 10829.8 M-cycles
step 1600: timed with pme grid 112 112 112, coulomb cutoff 1.248: 10153.6 M-cycles
step 1800: timed with pme grid 120 120 120, coulomb cutoff 1.200: 9483.0 M-cycles
optimal pme grid 120 120 120, coulomb cutoff 1.200
step 58100, will finish Mon Apr 26 16:54:37 2021
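
The task mapping in this log leaves update and constraints on the CPU. A minimal sketch of the same command with every offload choice spelled out is below. The thread counts are copied from the log; `-nb`, `-pme`, `-bonded`, `-update`, `-ntmpi`, `-ntomp` and `-pin` are regular mdrun options, but whether `-update gpu` is accepted for this system depends on the .mdp settings, so treat that part as an assumption. It is untested on this setup.

```shell
#!/bin/sh
# Sketch only: the same run as above with each offload choice made explicit.
# Assumption: the .mdp settings allow GPU update; mdrun will refuse to start
# with -update gpu if they do not.
cmd="gmx mdrun -v -deffnm step5_2 -nb gpu -pme gpu -bonded gpu -update gpu -ntmpi 1 -ntomp 12 -pin on"

if command -v gmx >/dev/null 2>&1 && [ -f step5_2.tpr ]; then
    # gmx and the run input are present: launch the simulation.
    $cmd
else
    # Otherwise just print the command so it can be inspected.
    echo "dry run: $cmd"
fi
```

If mdrun rejects `-update gpu`, dropping only that flag falls back to the CPU update shown in the log.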

Even with the GPU in use, I do not see much improvement in speed: the run takes about 5 hours per ns. Am I doing anything wrong? How can I improve the speed? Thank you.

readme.log (1.9 KB)
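
To put the numbers in the units mdrun itself reports, here is a small sketch of the arithmetic. It uses only figures quoted in this post (500000 steps, 1000 ps, about 5 hours per ns); the variable names are illustrative.

```shell
#!/bin/sh
# Figures taken from the post: 500000 steps cover 1000 ps, at roughly 5 hours per ns.
steps=500000
run_ps=1000
hours_per_ns=5

# Time step in femtoseconds (1 ps = 1000 fs).
dt_fs=$(awk -v s="$steps" -v p="$run_ps" 'BEGIN { printf "%.0f", p * 1000 / s }')

# Throughput in ns/day, the unit mdrun prints on its Performance line.
ns_per_day=$(awk -v h="$hours_per_ns" 'BEGIN { printf "%.1f", 24 / h }')

echo "time step: ${dt_fs} fs"
echo "throughput: ${ns_per_day} ns/day"
```

Once the run finishes, the `Performance:` line near the end of step5_2.log gives the measured ns/day for comparison.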