GROMACS version: 2023.3
GROMACS modification: No
Hi folks
I know that there have been several threads on running GROMACS on Apple M1 or M2 chips (e.g. Error compiling Gromacs 2023's checks on Mac M2), but I recently got a MacBook Pro with the M3 chip, so I was interested to see how it would perform. I am starting this thread in case other people have concerns or suggestions, or just need a starting point.
Here are the details:
My MacBook Pro is the 14" version with 18 GB of RAM and an M3 Pro chip (12-core CPU, 18-core GPU). The OS is macOS Sonoma 14.1.2.
I am comparing it to a Linux PC running Ubuntu 20.04 with 32 GB of RAM, a 24-core i9 processor and an RTX 3080 Ti card with CUDA 11.2.
Test runs used HIV-1 protease (1ajx.pdb) with an inhibitor and a 1.5 nm solvent jacket (11,137 atoms in total).
Note: the Mac and PC are approximately equally fast at running ML in Keras, with the Mac using the TensorFlow Metal plugin and the PC using CUDA.
…
Installation
Following Error compiling Gromacs 2023's checks on Mac M2 (post #5 by hess) and related threads.
Install Homebrew and the build dependencies:
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
brew install wget
brew install cmake
brew install hwloc
brew install subversion
brew install gcc
brew install libomp
brew install opencl-headers
wget https://ftp.gromacs.org/regressiontests/regressiontests-2023.3.tar.gz
cd gromacs-2023.3
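The steps above change into gromacs-2023.3 without showing the source tarball being downloaded or either archive being unpacked. A minimal sketch of those steps follows. It assumes the source lives at https://ftp.gromacs.org/gromacs/gromacs-2023.3.tar.gz (the usual GROMACS download location) and that both archives are unpacked side by side, as the REGRESSIONTEST_PATH used in the builds suggests. `run` and `fetch_and_unpack` are hypothetical helper names.

```bash
#!/usr/bin/env bash
# Sketch: download and unpack the GROMACS source and regression tests.
# Assumption (not from the original post): the source tarball URL below.
set -euo pipefail

GMX_VERSION="${GMX_VERSION:-2023.3}"

# Echo the command instead of executing it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Fetch both tarballs into the current directory and unpack them there,
# giving gromacs-<version>/ and regressiontests-<version>/ side by side.
fetch_and_unpack() {
  local src="gromacs-${GMX_VERSION}.tar.gz"
  local tests="regressiontests-${GMX_VERSION}.tar.gz"
  run wget "https://ftp.gromacs.org/gromacs/${src}"
  run wget "https://ftp.gromacs.org/regressiontests/${tests}"
  run tar xzf "${src}"
  run tar xzf "${tests}"
}
```

Usage: `fetch_and_unpack && cd "gromacs-${GMX_VERSION}"`; `DRY_RUN=1 fetch_and_unpack` only prints the commands.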
Build with OpenMP
Note: Always start builds in a new build directory
rm -rf build ; mkdir build ; cd build
cmake .. -DGMX_BUILD_OWN_FFTW=ON -DREGRESSIONTEST_DOWNLOAD=OFF \
-DCMAKE_C_COMPILER=/opt/homebrew/bin/gcc-13 \
-DCMAKE_CXX_COMPILER=/opt/homebrew/bin/g++-13 \
-DGMX_MPI=no \
-DGMX_OPENMP=ON \
-DOpenMP_{C,CXX}_FLAGS="-Xpreprocessor -fopenmp -I/opt/homebrew/opt/libomp/include" \
-DOpenMP_{C,CXX}_LIB_NAMES=omp \
-DOpenMP_omp_LIBRARY=/opt/homebrew/opt/libomp/lib/libomp.dylib \
-DREGRESSIONTEST_PATH=/Users/gvigers/Work/programs/gromacs/regressiontests-2023.3
Nope, CMake cannot detect the CPU features:
CMake Warning at cmake/gmxDetectCpu.cmake:100 (message):
Did not detect build CPU features - detection program did not compile.
Please file a bug report if this is a common platform.
Call Stack (most recent call first):
cmake/gmxDetectSimd.cmake:69 (gmx_run_cpu_detection)
cmake/gmxDetectSimd.cmake:155 (gmx_suggest_simd)
cmake/gmxManageSimd.cmake:91 (gmx_detect_simd)
CMakeLists.txt:650 (gmx_manage_simd)
Try the Apple clang compiler instead:
cd .. ; rm -rf build ; mkdir build ; cd build
cmake .. -DGMX_BUILD_OWN_FFTW=ON -DREGRESSIONTEST_DOWNLOAD=OFF \
-DCMAKE_C_COMPILER=/usr/bin/clang \
-DCMAKE_CXX_COMPILER=/usr/bin/clang++ \
-DGMX_MPI=no \
-DGMX_OPENMP=ON \
-DOpenMP_{C,CXX}_FLAGS="-Xpreprocessor -fopenmp -I/opt/homebrew/opt/libomp/include" \
-DOpenMP_{C,CXX}_LIB_NAMES=omp \
-DOpenMP_omp_LIBRARY=/opt/homebrew/opt/libomp/lib/libomp.dylib \
-DREGRESSIONTEST_PATH=/Users/gvigers/Work/programs/gromacs/regressiontests-2023.3
make check
100% tests passed, 0 tests failed out of 83
sudo make install
source /usr/local/gromacs/bin/GMXRC
See below for test results.
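The -DOpenMP_{C,CXX}_... options in these cmake commands rely on bash brace expansion: each such word becomes two arguments, one for C and one for CXX. A small sketch that collects the clang-build options in a bash array (so no line-continuation backslashes are needed) and prints what cmake actually receives; the array name and the LIBOMP variable are hypothetical, and the paths are the Homebrew defaults used in this post.

```bash
#!/usr/bin/env bash
# Sketch: the clang build options from this post, held in a bash array.
# cmake_args and LIBOMP are illustrative names, not GROMACS conventions.
set -euo pipefail

LIBOMP=/opt/homebrew/opt/libomp

cmake_args=(
  -DGMX_BUILD_OWN_FFTW=ON
  -DREGRESSIONTEST_DOWNLOAD=OFF
  -DCMAKE_C_COMPILER=/usr/bin/clang
  -DCMAKE_CXX_COMPILER=/usr/bin/clang++
  -DGMX_MPI=no
  -DGMX_OPENMP=ON
  # Brace expansion: each of the next two words yields a _C_ and a _CXX_ option.
  -DOpenMP_{C,CXX}_FLAGS="-Xpreprocessor -fopenmp -I${LIBOMP}/include"
  -DOpenMP_{C,CXX}_LIB_NAMES=omp
  -DOpenMP_omp_LIBRARY="${LIBOMP}/lib/libomp.dylib"
)

# One argument per line, exactly as cmake will see them.
printf '%s\n' "${cmake_args[@]}"
```

From the build directory this would be used as `cmake .. "${cmake_args[@]}" -DREGRESSIONTEST_PATH=/Users/gvigers/Work/programs/gromacs/regressiontests-2023.3`; for the GPU build, append -DGMX_GPU=OpenCL, -DBUILD_SHARED_LIBS=off and -DGMX_HWLOC=ON to the array.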
Build again without shared libraries, this time adding OpenCL GPU support and hwloc (still using clang):
cd .. ; rm -rf build ; mkdir build ; cd build
cmake .. -DGMX_BUILD_OWN_FFTW=ON -DREGRESSIONTEST_DOWNLOAD=OFF \
-DCMAKE_C_COMPILER=/usr/bin/clang \
-DCMAKE_CXX_COMPILER=/usr/bin/clang++ \
-DGMX_MPI=no \
-DGMX_OPENMP=ON \
-DGMX_GPU=OpenCL \
-DBUILD_SHARED_LIBS=off \
-DGMX_HWLOC=ON \
-DOpenMP_{C,CXX}_FLAGS="-Xpreprocessor -fopenmp -I/opt/homebrew/opt/libomp/include" \
-DOpenMP_{C,CXX}_LIB_NAMES=omp \
-DOpenMP_omp_LIBRARY=/opt/homebrew/opt/libomp/lib/libomp.dylib \
-DREGRESSIONTEST_PATH=/Users/gvigers/Work/programs/gromacs/regressiontests-2023.3
make check
sudo make install
source /usr/local/gromacs/bin/GMXRC
Looks good (although Hwloc support is reported as disabled despite -DGMX_HWLOC=ON):
gmx --version
:-) GROMACS - gmx, 2023.3 (-:
Executable: /usr/local/gromacs/bin/gmx
Data prefix: /usr/local/gromacs
Working dir: /Users/gvigers/Work/programs/gromacs
Command line:
gmx --version
GROMACS version: 2023.3
Precision: mixed
Memory model: 64 bit
MPI library: thread_mpi
OpenMP support: enabled (GMX_OPENMP_MAX_THREADS = 128)
GPU support: OpenCL
NB cluster size: 8
SIMD instructions: ARM_NEON_ASIMD
CPU FFT library: fftw-3.3.8
GPU FFT library: VkFFT internal (1.2.26-b15cb0ca3e884bdb6c901a12d87aa8aadf7637d8) with OpenCL backend
Multi-GPU FFT: none
TNG support: enabled
Hwloc support: disabled
Tracing support: disabled
C compiler: /usr/bin/clang AppleClang 12.0.5.12050022
C compiler flags: -Wno-missing-field-initializers -fno-stack-check -fno-stack-check -O3 -DNDEBUG
C++ compiler: /usr/bin/clang++ AppleClang 12.0.5.12050022
C++ compiler flags: -Wno-missing-field-initializers -fno-stack-check -fno-stack-check -Weverything -Wno-c++98-compat -Wno-c++98-compat-pedantic -Wno-return-std-move-in-c++11 -Wno-source-uses-openmp -Wno-c++17-extensions -Wno-documentation-unknown-command -Wno-covered-switch-default -Wno-switch-enum -Wno-extra-semi-stmt -Wno-weak-vtables -Wno-shadow -Wno-padded -Wno-reserved-id-macro -Wno-double-promotion -Wno-exit-time-destructors -Wno-global-constructors -Wno-documentation -Wno-format-nonliteral -Wno-used-but-marked-unused -Wno-float-equal -Wno-conditional-uninitialized -Wno-conversion -Wno-disabled-macro-expansion -Wno-unused-macros -Wno-unused-parameter -Wno-unused-variable -Wno-newline-eof -Wno-old-style-cast -Wno-zero-as-null-pointer-constant -Wno-sign-compare SHELL:-Xpreprocessor -fopenmp -I/opt/homebrew/opt/libomp/include -O3 -DNDEBUG
BLAS library: External - detected on the system
LAPACK library: External - detected on the system
OpenCL include dir: /Library/Developer/CommandLineTools/SDKs/MacOSX11.3.sdk/System/Library/Frameworks/OpenCL.framework
OpenCL library: /Library/Developer/CommandLineTools/SDKs/MacOSX11.3.sdk/System/Library/Frameworks/OpenCL.framework
OpenCL version: 1.2
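To check the key feature lines of a build at a glance (for example, that GPU support reads OpenCL, or that hwloc is still disabled even though -DGMX_HWLOC=ON was passed), the version output can be filtered. A small sketch; report_features is a hypothetical helper, and the field names are copied from the output above.

```bash
#!/usr/bin/env bash
# Sketch: print selected `gmx --version` feature lines as key=value pairs.
# report_features is a hypothetical helper, not part of GROMACS.
set -euo pipefail

report_features() {
  # Split on a colon plus any padding spaces: $1 is the field name and
  # $2 its value (assumes the value itself contains no further colon).
  awk -F': *' '
    /^[ \t]*(GPU support|SIMD instructions|OpenMP support|Hwloc support):/ {
      key = $1
      sub(/^[ \t]+/, "", key)   # drop any leading indentation
      print key "=" $2
    }'
}
```

Usage: `gmx --version 2>&1 | report_features` (the `2>&1` simply makes sure nothing is missed whichever stream the lines go to).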
…
Timings on 1ajx.pdb:
Mac, Steepest-descents minimization:
Steepest Descents converged to Fmax < 500 in 1932 steps
Potential Energy = -5.4745700e+05
Maximum force = 4.9502078e+02 on atom 3131
Norm of force = 1.1838092e+01
real 0m9.929s
PC, Steepest-descents minimization:
Steepest Descents converged to Fmax < 500 in 1913 steps
Potential Energy = -5.4743575e+05
Maximum force = 4.8873819e+02 on atom 3131
Norm of force = 1.1845837e+01
real 0m4.175s
Mac, 500 ps MD run, CPU only:
               Core t (s)   Wall t (s)        (%)
       Time:    27530.204     2294.186     1200.0
                         38:14
                 (ns/day)    (hour/ns)
Performance:       18.830        1.275
real 38m21.783s
Mac, 500 ps MD run, with GPU support:
               Core t (s)   Wall t (s)        (%)
       Time:     6834.195      569.527     1200.0
                 (ns/day)    (hour/ns)
Performance:       75.853        0.316
real 9m35.661s
PC, 500 ps MD run:
               Core t (s)   Wall t (s)        (%)
       Time:     1802.629       75.117     2399.8
                 (ns/day)    (hour/ns)
Performance:      575.103        0.042
real 1m21.512s
Mac, calculating interaction energies (CPU-only task):
real 10m6.758s
PC, calculating interaction energies (CPU-only task):
real 13m2.150s
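The ns/day figures in the MD timings follow directly from the 500 ps run length and the wall time. A minimal sketch that recomputes them and also reads the reported value from an mdrun log; ns_per_day, reported_ns_per_day and the log name md.log are hypothetical, for illustration only.

```bash
#!/usr/bin/env bash
# Sketch: recompute mdrun throughput from wall time, and read the reported
# figure from a log. Function names and "md.log" are hypothetical.
set -euo pipefail

# ns/day = simulated length (ns) * 86400 (s/day) / wall time (s)
ns_per_day() {
  awk -v ns="$1" -v wall="$2" 'BEGIN { printf "%.3f\n", ns * 86400 / wall }'
}

# Print the ns/day column of the "Performance:" line in an mdrun log file.
reported_ns_per_day() {
  awk '/^[ \t]*Performance:/ { print $2 }' "$1"
}

# 500 ps = 0.5 ns; the wall times are the "Wall t (s)" values listed above.
for run in "Mac CPU only:2294.186" "Mac with GPU:569.527" "PC:75.117"; do
  echo "${run%%:*}: $(ns_per_day 0.5 "${run##*:}") ns/day"
done
```

The loop prints 18.830, 75.852 and 575.103 ns/day. The middle value differs from mdrun's own 75.853 in the last digit, so treat the third decimal as approximate; the ratios (about 4.0x for GPU versus CPU-only on the Mac, about 7.6x for the PC versus the Mac with GPU) are in line with the conclusions. `reported_ns_per_day md.log` pulls the figure mdrun itself printed.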
…
Conclusions:
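The Mac is surprisingly capable, given its low energy consumption :)
I couldn't get the Homebrew gcc-13/g++-13 compilers to work (CMake could not detect the CPU features), but the Apple clang compiler looks good.
Compiling with GPU (OpenCL) support on the M3 chip gives a ~4x speed boost for MD (75.9 vs 18.8 ns/day).
My PC is ~7.5x as fast for MD runs (575 vs 76 ns/day), but takes ~1.3x as long for the CPU-only interaction-energy calculation (13m02s vs 10m07s).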
Minimizations have worked well on ~450 test cases (data not shown). I have not tested MD runs extensively.
Side note: I tried the same tests on a Mac Studio with the M2 Ultra chip and got similar results, except that compiling with GPU support made the runs ~2x slower! I did not pursue this very far.
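The minimization and CPU-only comparisons rest on the `real` times printed by `time`. A small sketch that converts those strings to seconds and takes the ratios; to_seconds and ratio are hypothetical helpers, and the parser assumes bash's default `XmY.YYYs` format.

```bash
#!/usr/bin/env bash
# Sketch: convert `time` "real" strings to seconds and compare machines.
# to_seconds and ratio are hypothetical helpers.
set -euo pipefail

# "10m6.758s" -> 606.758 (expects a minutes part, as bash's `time` prints)
to_seconds() {
  awk -v t="$1" 'BEGIN {
    split(t, parts, "m")      # parts[1] = minutes, parts[2] = "S.SSSs"
    sub(/s$/, "", parts[2])   # strip the trailing "s"
    printf "%.3f\n", parts[1] * 60 + parts[2]
  }'
}

# How many times longer the first duration is than the second.
ratio() {
  awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a / b }'
}

em_mac=$(to_seconds 0m9.929s);  em_pc=$(to_seconds 0m4.175s)
ie_mac=$(to_seconds 10m6.758s); ie_pc=$(to_seconds 13m2.150s)

echo "Minimization: Mac takes $(ratio "$em_mac" "$em_pc")x as long as the PC"
echo "Interaction energies: PC takes $(ratio "$ie_pc" "$ie_mac")x as long as the Mac"
```

This prints 2.38 for minimization (the PC finishes about 2.4x faster) and 1.29 for the interaction-energy task.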
Questions:
Any suggestions for improvements?
Are there other tests that folks would like to see?
Guy Vigers
P.S. Thanks to the developers for a great program, as always!