I ran into some problems when using gprof to profile GROMACS and couldn't find a relevant solution. Can you help me?
Here is my environment configuration and what I have tried so far:
**1.** Source version: gromacs-2020.5
**2.** CPU architecture: x64
**3.** I added the following information in CMakeLists.txt so that I can use gprof for performance testing.
Can you run the file command on the gmx binary, and the same on libgromacs.so.7.0.0? I suspect the debug/profiling information may have been removed somehow further down in CMakeLists.txt.
What is the size of gmon.out? Is it significantly larger than zero?
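Both checks can be scripted roughly as follows. This is a minimal sketch: the function names are made up for illustration, and the paths in the usage comment are placeholders for your own gmx binary and libgromacs library. For ELF files, file reports "not stripped" when symbols are present and, in recent versions, "with debug_info" when DWARF sections are present.

```shell
#!/bin/sh
# Hypothetical helpers for the two checks above; names are illustrative.

# Show what file(1) reports for each ELF file. Look for "not stripped"
# and (with recent versions of file) "with debug_info".
describe_elf() {
    for f in "$@"; do
        if command -v file >/dev/null 2>&1; then
            file -L "$f"    # -L: follow symlinks such as libgromacs.so
        else
            echo "file(1) is not installed; cannot inspect $f" >&2
            return 1
        fi
    done
}

# Print the size of a gmon.out file in bytes.
# Prints 0 and returns non-zero if the file is missing or empty.
gmon_bytes() {
    gmon="${1:-gmon.out}"
    if [ ! -s "$gmon" ]; then
        echo 0
        return 1
    fi
    wc -c < "$gmon" | tr -d ' '
}

# Usage (placeholder paths):
#   describe_elf /path/to/gmx /path/to/libgromacs.so.7.0.0
#   gmon_bytes gmon.out
```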
```
XXX@dell:~$ gprof -v
GNU gprof (GNU Binutils for Ubuntu) 2.34
Based on BSD gprof, copyright 1983 Regents of the University of California.
This program is free software. This program has absolutely no warranty.
```
So I tried a simple C example program: I got only a small gmon.out, and gprof worked as expected with an invocation similar to the one in your first post. That could be a good sanity test for you to run.
Otherwise, I don't have more ideas. At 20 KB, your gmon.out does look like it contains data, so I would suspect gprof itself, but that is not a solution to your problem. As far as profilers go, I do know perf works fine with GROMACS.
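A sanity test along those lines could look like this sketch. It assumes a C compiler reachable as cc (GCC or Clang) and GNU gprof on the PATH; the function name and the test program are made-up examples, not the program mentioned above. If busy shows up in the flat profile, the gprof toolchain itself is working.

```shell
#!/bin/sh
# Hypothetical gprof sanity test. Assumes "cc" (GCC or Clang) and GNU gprof.
gprof_sanity_test() {
    dir=$(mktemp -d) || return 1
    cat > "$dir/sanity.c" <<'EOF'
#include <stdio.h>

/* noinline keeps this as a separate entry in the flat profile. */
__attribute__((noinline)) double busy(long n)
{
    double s = 0.0;
    for (long i = 1; i <= n; ++i)
        s += 1.0 / (double)i;
    return s;
}

int main(void)
{
    printf("%f\n", busy(100000000L));
    return 0;
}
EOF
    # -pg instruments the code and links the profiling runtime.
    cc -pg -O2 -o "$dir/sanity" "$dir/sanity.c" || return 1
    # gmon.out is written to the current directory when the program exits.
    (cd "$dir" && ./sanity > /dev/null) || return 1
    if [ ! -s "$dir/gmon.out" ]; then
        echo "gmon.out is missing or empty" >&2
        return 1
    fi
    # -b: brief output without the explanatory text.
    gprof -b "$dir/sanity" "$dir/gmon.out" | head -n 15
    rm -rf "$dir"
}
```

Compiling with -O2 rather than -O0 is deliberate: it checks that gprof copes with an optimized build, which is what matters for performance work.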
Yes, I made some further attempts. It turns out the problem is not gprof itself: it seems the -O0 optimization option needs to be added when GROMACS is compiled. Here is what I changed:
```
XXX@dell:~/exp/gromacs-2020.5$ git diff
diff --git a/CMakeLists.txt b/CMakeLists.txt
index 0911eb2..fecc34d 100644
--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@@ -337,8 +337,8 @@ gmx_c_flags()
 # EXTRA_COMPILER_FLAGS so that we we don't perpetrate bugs where
 # things that work in C compilation (e.g. merging from old branches)
 # might not also work for C++ compilation.
-set(EXTRA_C_FLAGS "")
-set(EXTRA_CXX_FLAGS "")
+set(EXTRA_C_FLAGS "-O0")
+set(EXTRA_CXX_FLAGS "-O0")
 
 # Run through a number of tests for buggy compilers and other issues
 include(gmxTestCompilerProblems)
@@ -929,3 +929,7 @@ ADD_CUSTOM_TARGET(uninstall
 ###########################
 set_directory_properties(PROPERTIES
 ADDITIONAL_MAKE_CLEAN_FILES "install_manifest.txt")
+SET(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -pg -g -O0")
+SET(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -pg -g -O0")
+SET(CMAKE_EXE_LINKER_FLAGS "${CMAKE_EXE_LINKER_FLAGS} -pg")
+SET(CMAKE_SHARED_LINKER_FLAGS "${CMAKE_SHARED_LINKER_FLAGS} -pg")
```
☑ gprof now writes data to gmon.out. I still have to test more to pin down the exact reason, and I will post an update here.
Excellent! One note, though: with -O0 there is no optimization at all. That is fine if you use the data for unit-test coverage, but if you want to profile performance, the results might be misleading (perf does work on optimized binaries).
Thank you for the reminder. 😁 My goal really is to measure the performance of GROMACS.
Then I may have to rethink my approach: for example, keep using gprof but build with -O2 or -O1, or find out what exactly affects the output.
In addition, I used Linux perf to profile GROMACS and draw a flame graph, but perf's function granularity goes down to Linux system calls, whereas my goal is to find out which GROMACS functions themselves take the most time.
Perhaps perf has other ways to attribute time to GROMACS's own functions? I will also look into this myself. Do you have any experience with this? 😃
gprof does not support profiling shared libraries (at least on Linux). There are other tools (see the link), but GROMACS can be linked statically, so there is no need to ditch gprof.
For me, gprof worked fine with optimizations (-O3) when GROMACS was built as a static library:
```
# Configure and build
$ cmake .. -DCMAKE_BUILD_TYPE=Profile -DBUILD_SHARED_LIBS=OFF -DGMX_BUILD_SHARED_EXE=OFF && cmake --build .
[....]
# Check that optimizations are there
$ /path/to/gmx -version | grep 'C++ compiler flags'
C++ compiler flags: -fexcess-precision=fast -funroll-all-loops -mavx2 -mfma -Wall -Wextra -Wpointer-arith -Wmissing-declarations -Wundef -Wstringop-truncation -Wno-missing-field-initializers -Wno-cast-function-type-strict -fopenmp -O3 -DNDEBUG -pg
# Run
$ /path/to/gmx mdrun [....]
[....]
# Check file size (after a ~10-second mdrun)
$ ls -sh gmon.out
8.1M gmon.out
# Check that there is meaningful output
$ gprof /path/to/gmx | head
Flat profile:

Each sample counts as 0.01 seconds.
 no time accumulated

  %   cumulative   self              self     total
 time   seconds   seconds    calls  Ts/call  Ts/call  name
  0.00      0.00     0.00     8191     0.00     0.00  gmx::erfinv(double)
  0.00      0.00     0.00     1763     0.00     0.00  gmx::SelectionParserSymbolIterator::~SelectionParserSymbolIterator()
  0.00      0.00     0.00      861     0.00     0.00  gmx::SelectionParserSymbolIterator::SelectionParserSymbolIterator(gmx::SelectionParserSymbolIterator const&)
```
No modifications to CMake were required. I tried with GROMACS 2023-rc1, but I don’t think 2020 would be any different.
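To double-check that a build really avoids the shared-library limitation, one option is to inspect the dynamic section of the executable. This sketch assumes readelf from GNU Binutils (the same package that ships gprof); the function name is hypothetical. A fully static gmx has no dynamic section at all, so the check simply finds nothing.

```shell
#!/bin/sh
# Hypothetical check: does this executable load libgromacs as a shared
# library? readelf -d lists the DT_NEEDED entries of the dynamic section.
uses_shared_libgromacs() {
    readelf -d "$1" 2>/dev/null | grep -q '(NEEDED).*libgromacs'
}

# Usage (placeholder path):
#   if uses_shared_libgromacs /path/to/gmx; then
#       echo "gmx uses a shared libgromacs; gprof will not see that code" >&2
#   fi
```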
A few notes:
As noted in our manual, gprof is not the best tool if you want to profile mdrun. For one, it is not that great at profiling multi-threaded applications. You can build without OpenMP and run with -ntmpi 1, but the results would have limited practical relevance. For other GROMACS tools, such as grompp, this is less of a problem.
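A sketch of that single-threaded setup, assuming the static Profile build shown above, configured additionally with -DGMX_OPENMP=OFF, and a run input file you supply. The function name and the report file name gprof-mdrun.txt are illustrative; as noted, treat the resulting numbers as rough.

```shell
#!/bin/sh
# Hypothetical helper: run mdrun on one thread-MPI rank and save a gprof
# report. Assumes gmx was configured roughly like this:
#   cmake .. -DCMAKE_BUILD_TYPE=Profile -DBUILD_SHARED_LIBS=OFF \
#            -DGMX_BUILD_SHARED_EXE=OFF -DGMX_OPENMP=OFF
gprof_mdrun_single_rank() {
    gmx="$1"    # path to the gmx binary built with -pg
    tpr="$2"    # run input file, e.g. topol.tpr
    [ -x "$gmx" ] || { echo "not executable: $gmx" >&2; return 1; }
    [ -f "$tpr" ] || { echo "missing run input: $tpr" >&2; return 1; }
    # gmon.out is written to the current directory; drop any stale copy.
    rm -f gmon.out
    # No OpenMP and one rank, so the main computation runs in one thread.
    "$gmx" mdrun -ntmpi 1 -s "$tpr" || return 1
    gprof -b "$gmx" gmon.out > gprof-mdrun.txt
}

# Usage (placeholder paths):
#   gprof_mdrun_single_rank /path/to/gmx topol.tpr && head gprof-mdrun.txt
```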
Compiling with -DGMX_SIMD=None is not recommended when looking at mdrun performance, especially for non-GPU builds: you are deliberately disabling the optimizations in the hottest part of the code, and, unlike parallelism, SIMD is highly unlikely to interfere with the profiler.
Regarding perf: for me, flamegraph-rs, which is perf-based, works fine even with a shared-library build in Release mode (RelWithDebInfo might produce better traces, though). If you use OpenMP and/or thread-MPI for parallelism, the results are messy, but the GROMACS functions are there.
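On the plain perf side, one way to hide the kernel and system-call entries is to sort the report by shared object and keep only the GROMACS rows. This is a sketch under assumptions: the function name is made up, and it assumes that perf report --stdio --no-children prints rows as "<overhead>%  <shared object>  [.] <symbol>", with [.] marking user-space and [k] kernel samples. Check that layout on your perf version before relying on it.

```shell
#!/bin/sh
# Hypothetical filter for "perf report --stdio" output: keep only
# user-space rows ("[.]") whose shared object is gmx or libgromacs*.
gromacs_only() {
    # $1 = overhead ("12.34%"), $2 = shared object, $3 = "[.]" or "[k]"
    awk '$1 ~ /%$/ && $3 == "[.]" && ($2 == "gmx" || $2 ~ /^libgromacs/)'
}

# Usage (placeholder paths; -g records call graphs):
#   perf record -g -o perf.data -- /path/to/gmx mdrun -s topol.tpr
#   perf report -i perf.data --stdio --no-children -g none \
#       --sort dso,symbol | gromacs_only | head -n 20
```

Sorting by dso,symbol keeps each GROMACS function on its own row, and --no-children leaves a single overhead column so the field positions stay fixed.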