GROMACS version: 2023.3 and 2023.2
GROMACS modification: No
Hello GROMACS community,
I am looking for help improving GROMACS performance on our local supercomputer.
I compared the same 125 ns equilibration run on OSC (Ohio Supercomputer Center) and on our local supercomputer. The local supercomputer took 28 hours, whereas OSC took only 40 minutes. I have attached the log files of both simulations for your reference.
I used all the logical CPUs on our local supercomputer (2 sockets × 24 cores × 2 threads = 96 CPUs) and 48 CPUs on OSC (2 sockets × 24 cores × 1 thread = 48 CPUs). Please see below for the CPU comparison.
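As a quick sanity check on the numbers above, here is a minimal sketch that recomputes the logical CPU counts and the wall-clock ratio. It uses only the figures quoted in this post; the variable names are just for illustration.

```python
# Logical CPU counts as described in the post:
# sockets * cores per socket * threads per core.
local_cpus = 2 * 24 * 2   # local server, both SMT threads used
osc_cpus = 2 * 24 * 1     # OSC node, one thread per core

# Wall-clock times reported for the same equilibration run, in minutes.
local_minutes = 28 * 60
osc_minutes = 40

slowdown = local_minutes / osc_minutes
print(f"local: {local_cpus} logical CPUs, OSC: {osc_cpus} logical CPUs")
print(f"local run is about {slowdown:.0f}x slower")
```

So the local machine is roughly 42 times slower while using twice as many logical CPUs, which seems far more than the hardware difference alone would explain.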
Judging by this comparison, the production run will take far longer on the local supercomputer than on OSC, so I would like to resolve this before starting it. Could you please share your ideas on how to deal with this issue?
Here is what I plan to try:
- Use one thread per core instead of two, to see whether SMT is causing the problem.
- Manually set the domain decomposition, as suggested at line 2923 of the local supercomputer output file.
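The two attempts above might translate into `gmx mdrun` command lines roughly like the ones this sketch prints. Everything specific here is an assumption on my part: a thread-MPI build of GROMACS, a CPU-only run, the file prefix `step6.1_equilibration`, and the 8 ranks × 6 OpenMP threads split (chosen only because it multiplies to the 48 physical cores). The helper `mdrun_command` is hypothetical and not part of GROMACS; it only assembles the arguments and does not run anything.

```python
import shlex

def mdrun_command(deffnm, ntmpi, ntomp, dd=None):
    """Build a gmx mdrun argument list (thread-MPI build assumed)."""
    cmd = ["gmx", "mdrun", "-deffnm", deffnm,
           "-ntmpi", str(ntmpi),   # number of thread-MPI ranks
           "-ntomp", str(ntomp),   # OpenMP threads per rank
           "-pin", "on"]           # pin threads to cores
    if dd is not None:
        nx, ny, nz = dd
        # With no separate PME ranks, the grid must match the rank count.
        if nx * ny * nz != ntmpi:
            raise ValueError("-dd grid must multiply to the number of ranks")
        cmd += ["-npme", "0", "-dd", str(nx), str(ny), str(nz)]
    return cmd

# Attempt 1: 8 * 6 = 48 threads, i.e. one per physical core (assumed split).
one_thread_per_core = mdrun_command("step6.1_equilibration", ntmpi=8, ntomp=6)
# Attempt 2: same layout with an explicit 2 x 2 x 2 domain decomposition.
manual_dd = mdrun_command("step6.1_equilibration", ntmpi=8, ntomp=6, dd=(2, 2, 2))

print(shlex.join(one_thread_per_core))
print(shlex.join(manual_dd))
```

The actual rank count and grid should follow what the log suggests for this system; the values here are placeholders.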
step6.1_equilibration_test_GPU_server.log (135.0 KB)
step6.1_equilibration_test_OSC.log (119.4 KB)

+------------------+------------------+--------------------------+
|                  | Local GPU server | OSC                      |
+------------------+------------------+--------------------------+
| CPU              | AMD EPYC 7352    | Intel Xeon Platinum 8268 |
| Cores            | 24               | 24                       |
| Threads          | 48 (SMT)         | 48 (Hyper-Threading)     |
| Base Clock Speed | 2.3 GHz          | 2.9 GHz                  |
| L3 Cache         | 16 MB            | Varies                   |
| Architecture     | Zen 2            | Cascade Lake             |
| Socket           | SP3 (PCIe 4.0)   | FCLGA3647 (PCIe 3.0)     |
| PCIe Support     | PCIe 4.0         | PCIe 3.0                 |
| TDP              | Not specified    | Not specified            |
| Virtualization   | AMD-V            | VT-x, VT-d               |
| Manufacturer     | AMD              | Intel                    |
+------------------+------------------+--------------------------+