Hi - the commands you’re using both use gpu_id 0,1; one of them should be 2,3, otherwise you’re pointing both simulations at the same GPUs.
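If you did want to keep 2 GPUs per run, the fix is just to give the second run its own pair (keeping whatever other options you already have):
gmx mdrun -gpu_id 0,1 ... &
gmx mdrun -gpu_id 2,3 ...
That said, see below - one GPU per simulation is usually the better layout.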
Saying that you have 2 CPUs is misleading - you appear to have 2 sockets, each with 14 physical cores and 28 logical CPUs.
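(I’m inferring that from your numbers - you can double-check the topology with lscpu if you like:)
lscpu | grep -E 'Socket\(s\)|Core\(s\) per socket|Thread\(s\) per core|^CPU\(s\)'
# expect something like: CPU(s): 56, Thread(s) per core: 2, Core(s) per socket: 14, Socket(s): 2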
You will get maximum total throughput by limiting each simulation to 1 GPU. You can also take a big performance hit from not pinning threads. Something like this might get you decent performance:
gmx mdrun -nt 14 -pin on -pinoffset 0 -gpu_id 0 &
gmx mdrun -nt 14 -pin on -pinoffset 14 -gpu_id 1 &
gmx mdrun -nt 14 -pin on -pinoffset 28 -gpu_id 2 &
gmx mdrun -nt 14 -pin on -pinoffset 42 -gpu_id 3
Note that the first sim is now pinned to logical cores 0-13, the second to logical cores 14-27, and so on. If you can’t use the whole machine, feel free to reduce those thread counts, but try not to have a single simulation span the boundary between cores 27 and 28, since that’s probably the socket boundary unless your machine numbers its CPUs oddly.
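For example, if you only wanted to use half the machine, something like this keeps both runs inside the first socket (the thread counts here are just an illustration):
gmx mdrun -nt 10 -pin on -pinoffset 0 -gpu_id 0 &
gmx mdrun -nt 10 -pin on -pinoffset 10 -gpu_id 1
# cores 0-9 and 10-19, both within 0-27, so neither run crosses the socket boundary
(lscpu -e will show you the actual CPU-to-socket numbering if you’re unsure.)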
There is also a whole lot you can do to try to optimize per-simulation performance. Have you checked out the performance guide? It has lots of good examples.
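Just to give a taste of what’s in there - depending on your GROMACS version and your system’s settings, you can often offload more of the work to the GPU, e.g. (the guide explains when each of these is supported):
gmx mdrun -nt 14 -pin on -pinoffset 0 -gpu_id 0 -nb gpu -pme gpu -bonded gpu -update gpu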
Another thing - to really maximize GPU utilization: if you have enough CPU cores (which you appear to), you can run 2 simulations per GPU.
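A sketch of how that could look with 8 runs at 7 threads each - this assumes each run sits in its own directory sim0..sim7 containing a topol.tpr, so adjust the names to yours:
# two runs share each GPU; pin offsets step by 7 so the runs don't overlap
for i in $(seq 0 7); do
  (cd sim$i && gmx mdrun -nt 7 -pin on -pinoffset $((i*7)) -gpu_id $((i/2))) &
done
wait
It’s worth benchmarking whether 1 or 2 runs per GPU actually gives you more total ns/day for your system.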