How can I offload more work to the GPU instead of the CPU?

GROMACS version: 2021.4
GROMACS modification: No

Hi,
I am running production MD on a remote server with 1 MPI process, 8 OpenMP threads, and an NVIDIA A100 GPU.
My command: nohup gmx_mpi mdrun -v -deffnm step7 -nb gpu &
The simulation ran much slower than I expected. When I checked nvidia-smi, I saw 100% CPU use but only 40% GPU utilization.
When I tried to offload the bonded and update tasks to the GPU as well, I got the following error:
Inconsistency in user input:
Update task on the GPU was required,
but the following condition(s) were not satisfied:
Nose-Hoover temperature coupling is not supported.

So I could only offload the nonbonded (nb) calculation.

What is the reason for the low GPU utilization? And will offloading bonded and update as well (after changing tcoupl to berendsen, for example) give me higher utilization?

Thank you in advance.

The GPU cannot be fully utilized if it has to wait for the CPU to finish its tasks before the next simulation step can start (and vice versa). You can increase the number of CPU processes to improve the balance (say, 2 or 4 MPI processes with 8 OpenMP threads each), or shift more work to the GPU, as you are already trying to do (though I cannot say how much effect that will have).
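
To see which option actually helps, a short benchmark for each offload setting is usually enough. Below is a minimal sketch, assuming the step7.tpr from the command above. The bench_* output names and the step count are made up for illustration; -ntomp, -pin, -nsteps, -resethway and -noconfout are standard mdrun options. The script only prints the commands so they can be checked before running.

```shell
#!/bin/sh
# Print short benchmark commands for three offload settings (sketch only).
# step7.tpr is the run input from this thread; the bench_* names are made up.
NTOMP=8          # OpenMP threads for the single MPI rank
NSTEPS=20000     # short run; -resethway resets the timers halfway through

bench_cmd() {
    # $1 = label used for the output files, $2 = GPU offload flags
    echo "gmx_mpi mdrun -s step7.tpr -deffnm bench_$1 -ntomp $NTOMP -pin on" \
         "-nsteps $NSTEPS -resethway -noconfout $2"
}

bench_cmd nb     "-nb gpu"
bench_cmd bonded "-nb gpu -bonded gpu"
bench_cmd update "-nb gpu -bonded gpu -update gpu"
```

The ns/day reported at the end of each bench_*.log can then be compared. Note that the last command will stop with the same "Update task on the GPU" error for as long as the .tpr uses Nose-Hoover temperature coupling.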

Thank you for your answer. I am using a remote server at an HPC center, and increasing the number of CPU processes was not an option there.

I changed tcoupl and ran the simulation with this command:
nohup gmx_mpi mdrun -v -deffnm step7 -nb gpu -bonded gpu -update gpu &

I now get 80% GPU utilization and 540 ns/day for a system with 42000 atoms (previously only 211 ns/day for the same system).
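
For anyone repeating this, here is a rough sketch of the thermostat change as a small shell helper. The file names are hypothetical (the thread does not give the .mdp name), and V-rescale is only an example value: as far as I know, GPU update in this GROMACS version accepts Berendsen and V-rescale temperature coupling, but please check the documentation for your version. The demo runs on a two-line stand-in file, not a complete .mdp.

```shell
#!/bin/sh
set -e

switch_thermostat() {
    # $1 = input .mdp, $2 = output .mdp, $3 = new tcoupl value
    # Rewrites only the tcoupl line and copies every other line unchanged.
    sed "s/^[[:space:]]*[Tt]coupl[[:space:]]*=.*/tcoupl = $3/" "$1" > "$2"
}

# Demo on a stand-in file in a temporary directory
tmp=$(mktemp -d)
printf 'integrator = md\ntcoupl = Nose-Hoover\n' > "$tmp/in.mdp"
switch_thermostat "$tmp/in.mdp" "$tmp/out.mdp" V-rescale
cat "$tmp/out.mdp"
rm -r "$tmp"
```

After editing the real .mdp, the .tpr has to be regenerated with gmx grompp before mdrun will pick up the new thermostat.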