Dear users, I have never used clusters before. Could you please help me choose the best-suited cluster to run a standard MD simulation with PME of a system of ~250K atoms (protein, membrane, and explicit water) into the microsecond range (2-30 µs)?
Dell Intel Gold 6148:
a) 216 nodes have 40 cores per node and 192 GB of memory per node
b) 32 nodes have 40 cores, 384 GB of memory, and 2 NVIDIA Tesla V100 GPUs
Dell Intel Xeon E5-2680 v4:
c) 648 nodes have 28 cores per node and 128 GB of memory per node
d) 160 nodes have 28 cores, 128 GB of memory, and 1 NVIDIA Tesla P100 GPU
How much cluster time should this be expected to take?
If you can, use a GPU, but make sure that GROMACS is actually using it by having a look at the .log file (it’s quite long, but somewhere in the upper part it should read GPU support: CUDA, and further down GPU info: Number of GPUs detected: 1).
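A quick way to pull out those lines without scrolling through the whole log (the file name md.log is an assumption, following from -deffnm md):

```shell
# Confirm the build has CUDA support and that a GPU was detected at runtime.
grep -i "GPU support" md.log
grep -i "Number of GPUs detected" md.log
```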
For your system, you will most likely get the best performance per compute-power/cost if you use a thread-MPI build (typically gmx, not gmx_mpi) on a single node with a single GPU, so your option d).
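A minimal single-node launch along those lines might look like this (the module name and thread count are assumptions; adjust them to your site):

```shell
# Hypothetical single-node run with the thread-MPI build of GROMACS
# on one node of setup d): 1 rank, 28 OpenMP threads, non-bonded work on the GPU.
module load gromacs
gmx mdrun -deffnm md -ntmpi 1 -ntomp 28 -nb gpu
```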
If you’re out for lots of sampling, using the two GPUs per node for two simulations running at the same time might be a good idea; however, there you would have to take care of the NUMA architecture of the node, so that the two simulations don’t keep stealing resources from one another. (So you’d pick b).)
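A sketch of running two independent simulations side by side on one 40-core, 2-GPU node of setup b), assuming two prepared run directories (run1/run2 and the 20/20 core split are illustrative assumptions):

```shell
# Pin each simulation to its own half of the cores and its own GPU,
# so the two runs stay on separate NUMA domains and don't contend.
(cd run1 && gmx mdrun -deffnm md -ntmpi 1 -ntomp 20 -gpu_id 0 -pin on -pinoffset 0)  &
(cd run2 && gmx mdrun -deffnm md -ntmpi 1 -ntomp 20 -gpu_id 1 -pin on -pinoffset 20) &
wait
```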
If you’re out for the longest trajectory, no matter the cost, you might want to use multiple nodes with gmx_mpi and setup d). How many nodes you can use efficiently depends on the system size (the smaller the system, the harder it is to scale to many nodes; aim for at least around 1000 atoms per node) and on the communication bandwidth between nodes, but you’ll see that especially the GPUs will be poorly utilized.
Note that if best absolute performance is your goal, depending on the network, you could get quite a lot of performance running on a larger number of the Intel Gold 6148 CPU nodes!
Single-GPU runs will of course give better bang for the buck, but if you need longer trajectories you will need to scale out to get better performance, which is easier on CPUs – especially if those 216 nodes are less occupied than the 32 nodes with GPUs.
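For the multi-node case, a launch might look like the following (a sketch assuming a SLURM cluster with srun as the MPI launcher; the 4-rank-per-node / 7-thread split on the 28-core nodes is an assumption to tune for your site):

```shell
# Hypothetical 4-node run with the MPI build (gmx_mpi) under SLURM:
# 4 MPI ranks per node, 7 OpenMP threads per rank (4 x 7 = 28 cores per node).
srun --nodes=4 --ntasks-per-node=4 --cpus-per-task=7 \
     gmx_mpi mdrun -deffnm md -ntomp 7
```

Checking the performance summary at the end of md.log for a few different node counts is the usual way to find where scaling stops paying off.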
Thanks for the help. I chose a cluster, but when I tried to run the simulation on 20 of its nodes with the command:
mpiexec gmx_mpi mdrun -deffnm md
I got a fatal error:
17240 of the 902076 bonded interactions could not be calculated because some atoms involved moved further apart than the multi-body cut-off distance (0.987808 nm) or the two-body cut-off distance (1.60975 nm), see option -rdd, for pairs and tabulated bonds also see option -ddcheck
Before trying to run it on the cluster, I ran it on my desktop. On the desktop (16 threads, 1 GPU) I got 12 ns/day with no fatal errors… I am confused. I have a system of ~350K atoms of docked proteins (output of the Rosetta software) within a POPC membrane. The system was built using CHARMM-GUI Membrane Builder with the CHARMM36 force field. I manually added disulfide bonds using CHARMM-GUI – might that be the reason for the fatal error?