GROMACS version: 2025.2 (build gromacs-2025.2-mpi_openmpi_cuda_h550cf53_1.conda)
GROMACS modification: No
I’m using Snakemake to parallelize simulations on my institution’s GPU cluster. This required a few redirection tricks to simulate user input for the commands that expect it, but I was able to get everything working on my local machine and on a GPU node during an interactive session.
When I run my workflow with Snakemake on a login node with remote execution to queue the jobs to SLURM, however, the GROMACS commands mysteriously hang, seemingly doing nothing. This happens early in my workflow during the execution of solvate, but later commands (like genion) also hang if I manually create the expected intermediates and allow the workflow to progress.
If I log into the GPU node, the commands have been successfully submitted to the node, but the solvate command is sleeping (ps output attached). If I strace the process, I get the following two lines looping continuously until I kill the process or it times out.
epoll_wait(3, [], 32, 0) = 0
clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=0, tv_nsec=100000}, NULL) = 0
Given that everything works in interactive sessions on the GPU nodes connected to a terminal, I’m wondering if this is related to GROMACS being executed remotely without a controlling terminal, but other than that I’m at a loss for what the issue could be and how to resolve it.
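One way to test the controlling-terminal hypothesis outside of Snakemake and SLURM is to launch a command in a new session, so that it has no controlling terminal, with its input supplied through a pipe and a timeout in place of an indefinite hang. The sketch below is only an illustration: `run_detached` is a hypothetical helper name, and the GROMACS binary name and file names in the commented usage line are assumptions, not taken from the actual workflow.

```python
import subprocess
import sys


def run_detached(cmd, stdin_text="", timeout=60):
    """Run cmd with piped stdin and no controlling terminal.

    Returns (returncode, stdout), or (None, "") if the command
    did not finish within `timeout` seconds.
    """
    try:
        result = subprocess.run(
            cmd,
            input=stdin_text,          # piped input instead of a terminal
            capture_output=True,
            text=True,
            timeout=timeout,           # the child is killed when this expires
            start_new_session=True,    # setsid() in the child: no controlling tty
        )
    except subprocess.TimeoutExpired:
        return None, ""
    return result.returncode, result.stdout


if __name__ == "__main__":
    # Sanity check with a harmless command that echoes its stdin.
    echo = [sys.executable, "-c", "import sys; print(sys.stdin.read().strip())"]
    print(run_detached(echo, "SOL\n"))
    # Hypothetical GROMACS call (binary and file names are assumptions):
    # run_detached(["gmx_mpi", "genion", "-s", "ions.tpr", "-o", "ions.gro",
    #               "-p", "topol.top", "-neutral"], "SOL\n", timeout=300)
```

If a command finishes this way in an interactive session on the GPU node but still hangs under SLURM, the missing terminal is probably not the cause on its own.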
Thanks for any help!
Marc
gromacs_hanging_ps.txt (7.9 KB)