I’m trying to compile GROMACS 2020.4 with GPU support on a cluster that has both GPU and CPU-only nodes. However, when I try to compile on the GPU nodes, I fail due to missing libraries that I don’t have permission to install.
So I tried on the non-GPU nodes, where the libraries are available, but cmake does not detect any GPUs during configuration, and hence the compiled program (even though it passes all checks) has no GPU support, as the md.log from a test run shows. Moreover, that test run does not proceed without GPU support: it hangs at the first step (md.log remains at step 0).
My question is whether there is a way to compile GROMACS with GPU support without the cards actually being present on the node I am compiling on. If so, does anyone spot what I am doing wrong in my settings?
Compiling a GPU-enabled build does not require GPU hardware, so I suspect a software configuration issue here.
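For reference, a GPU-enabled configure of GROMACS 2020 only needs the CUDA toolkit (nvcc and headers) at build time, not an actual GPU. A minimal sketch of the configure step on a login node; the CUDA path and install prefix are assumptions, adjust them to your cluster:

```shell
# Hypothetical configure for GROMACS 2020.x on a node without GPUs.
# GMX_GPU=ON enables the CUDA build (in GROMACS 2021+ this became GMX_GPU=CUDA).
cmake .. \
  -DGMX_GPU=ON \
  -DCUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda-10.0 \
  -DCMAKE_INSTALL_PREFIX=$HOME/gromacs-2020.4
make -j 8 && make install
```

The resulting binary can then be run on the GPU nodes, provided a compatible CUDA driver is present there.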
If library dependencies are not found at build time, you won’t be able to run binaries built elsewhere that link against those libraries, unless the dependencies are optional and you make sure not to link against them.
Your GROMACS version header shows “GPU support: CUDA”, which means the binary was built with CUDA support. Further below, “CUDA runtime: N/A” indicates that no (compatible) CUDA runtime is available, which is why no GPUs are detected. The “(GPU detection deactivated)” message also suggests that some error condition, e.g. an incompatible CUDA runtime, disabled the GPU detection. Have you made sure that the CUDA runtime is available and functional on the host where you are trying to run?
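A few quick checks you could run on the compute node to see what driver and runtime the binary actually finds; the gmx location is an assumption, use the path to your own install:

```shell
# Driver version and visible GPUs (requires the NVIDIA driver to be installed)
nvidia-smi

# Which CUDA libraries, if any, the gmx binary resolves at runtime
ldd $(which gmx) | grep -i cuda
```

If `ldd` reports “not found” for a CUDA library, the runtime is missing on that host; if `nvidia-smi` fails, the driver itself is the problem.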
First of all, thanks for making clear that I don’t need to compile on the compute nodes! Now I can go on looking for the error elsewhere. I learned that on our system it is not the complete libraries that are missing on the compute nodes, only the headers (to save space). So linking against these libraries should not be a problem; only the compilation must be done on the login nodes, where the headers are available.
It is more likely that an incompatible runtime is being used. What does not make sense is that your initial report already shows “CUDA driver: 10.10” and a 10.0 runtime, as suggested by “Cuda compilation tools, release 10.0, V10.0.130”. These should be compatible; I am not sure why they are not. The only thing I can think of is to try CUDA 10.1 (or an earlier version like 9.2).
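To confirm which toolkit a reconfigured build would actually pick up, you can check the nvcc on your path before running cmake; the module name is an assumption, use whatever your cluster provides:

```shell
# Load the alternative toolkit (hypothetical module name) and verify it
module load cuda/10.1
which nvcc
nvcc --version   # should report "release 10.1" rather than 10.0
```

Make sure to configure in a fresh build directory afterwards, so no cached paths from the old toolkit are reused.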
Side note: the GROMACS build system will by default link statically against the CUDA runtime (see the value of the CUDA_cudart_static_LIBRARY cmake cache variable in CMakeCache.txt), so a missing runtime library cannot be the issue, and in any case that would prevent even launching gmx.
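You can verify this in your own build directory; a sketch, assuming you run it where CMakeCache.txt lives:

```shell
# Check how the CUDA runtime was linked into the build
grep CUDA_cudart_static_LIBRARY CMakeCache.txt
# A FILEPATH ending in libcudart_static.a means the runtime is linked
# statically, so a missing libcudart.so on the compute node would not matter.
```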