Dear all,
We have two cluster nodes with two GPU cards each, and I would like to run GROMACS using all four cards. Does anyone here have experience running GROMACS simulations across multiple nodes? If so, kindly let me know how to do that.
Just follow the instructions to create an MPI build.
Having said that, parallelizing over multiple GPUs is difficult unless the system is large enough and you have a very low-latency network (e.g. InfiniBand). If you try to use gigabit ethernet with cheap cards/switches, you might have round-trip times of 300 microseconds. Since we need to send a handful of messages in a typical simulation step, you can easily spend 1-2 milliseconds per step just waiting for the network.
… and since a single GPU will typically get you more than 500 steps/second, you will be bottlenecked by your network, rather than GPUs.
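To make the arithmetic above concrete, here is a rough back-of-the-envelope sketch in Python. It uses the figures quoted in this thread (300 microsecond round trips, about 500 steps/second on one GPU) and assumes 5 messages per step as a stand-in for "a handful". It also assumes the messages are sent one after another and do not overlap with computation, which is a simplification, so treat the output as an order-of-magnitude estimate and not a prediction for any particular machine.

```python
# Back-of-the-envelope estimate of network cost per MD step.
# All inputs are rough figures from the discussion above;
# messages_per_step = 5 is an assumed value for "a handful".

round_trip_s = 300e-6      # ~300 microseconds per message on gigabit ethernet
messages_per_step = 5      # assumption for illustration
gpu_steps_per_s = 500      # rough single-GPU throughput


def network_wait_per_step(round_trip_s, messages_per_step):
    """Time spent waiting on the network per step, assuming the
    messages are sequential and not overlapped with compute."""
    return round_trip_s * messages_per_step


def steps_per_second(compute_s, wait_s):
    """Throughput when each step costs compute time plus network wait."""
    return 1.0 / (compute_s + wait_s)


compute_s = 1.0 / gpu_steps_per_s                      # time per step on one GPU
wait_s = network_wait_per_step(round_trip_s, messages_per_step)

# Optimistic case: a second GPU on another node halves the compute time,
# but every step now also pays the network wait.
two_node_rate = steps_per_second(compute_s / 2, wait_s)

print(f"compute per step on one GPU: {compute_s * 1e3:.2f} ms")
print(f"network wait per step:       {wait_s * 1e3:.2f} ms")
print(f"single GPU, no network:      {gpu_steps_per_s} steps/s")
print(f"two nodes, ideal scaling:    {two_node_rate:.0f} steps/s")
```

Under these assumptions, even perfect scaling of the compute part across two nodes ends up slower than the single GPU, because the network wait is comparable to the whole single-GPU step time.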
So is it better to add more cards to a single node than to add more nodes with cards?
Yes, with some caveats:
If you are looking for the highest absolute performance, start with the fastest GPUs you can afford.
You need a recent version of GROMACS.
You still need a decent number of CPU cores per GPU (although far fewer than it used to be).
The CPU-to-GPU connectivity matters, so you can’t just plug four GPUs into a cheapo consumer motherboard. Dual CPU sockets also mean that communication between some of the cards will be slower.
There is a reason some people shell out lots of $$$ for professional cards/nodes with direct GPU-to-GPU interconnects (such as DGX-1).
You can find some multi-GPU benchmarks in the recent paper by @pszilard :