Does OpenMP respect LD_PRELOAD?

GROMACS version: 2023.1
GROMACS modification: No

I’m currently profiling GROMACS 2023.1 with CUDA acceleration. I’m trying to trace UVM page faults using NVIDIA Nsight Systems together with a custom cudaMalloc shim library. It seems, however, that GMX doesn’t call the CUDA API itself; instead, OpenMP forks threads that make the CUDA library calls. If that is truly the case, does OpenMP respect the LD_PRELOAD setting for my shim library? Currently the forked threads don’t seem to be using it. I’ve verified that the shim library works outside of GROMACS.

Hi,

Firstly, I think most, if not all, CUDA API calls are made from the main thread, outside of any OpenMP region. Secondly, OpenMP threads are ordinary application threads that belong to the GMX process, so anything preloaded through an environment variable like LD_PRELOAD applies to the whole process and all of its threads, including those launched by the OpenMP runtime.

As a sanity check, you could compile GROMACS with OpenMP support disabled (for example by configuring with -DGMX_OPENMP=OFF) to see whether there is some unexpected interference between your shim library and OpenMP.
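
Another check that does not require a rebuild is to confirm that the shim is mapped into the running process at all. Below is a minimal, Linux-only sketch that scans /proc/self/maps for a substring. The function name mapping_contains and the library name used in the note afterwards are hypothetical and not part of GROMACS or CUDA. Because memory mappings belong to the process rather than to individual threads, a positive result holds for every thread, including those started by the OpenMP runtime.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: returns 1 if any line of /proc/self/maps contains
 * `needle`, 0 if no line does, and -1 if the file cannot be opened
 * (for example on a non-Linux system). */
int mapping_contains(const char *needle)
{
    assert(needle != NULL);

    FILE *maps = fopen("/proc/self/maps", "r");
    if (maps == NULL) {
        return -1;
    }

    char line[4096];
    int found = 0;
    /* Each line describes one mapping; file-backed mappings end with a path. */
    while (fgets(line, sizeof line, maps) != NULL) {
        if (strstr(line, needle) != NULL) {
            found = 1;
            break;
        }
    }

    fclose(maps);
    return found;
}
```

For example, calling mapping_contains("libcudamalloc_shim.so") (an assumed file name) from a constructor inside the shim would show whether the preload took effect. If the shim is mapped but its cudaMalloc is still never called, one possibility worth ruling out is that the CUDA runtime was linked statically into the application, in which case the call may not go through the dynamic linker and LD_PRELOAD cannot interpose it.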

Cheers,
Szilárd