MD run using MPI on HPC

GROMACS version: 2022.1
GROMACS modification: Yes/No
Good morning everyone,
I am using GROMACS installed on our institute’s HPC:

:-) GROMACS - gmx_mpi, 2022.1 (-:

When I ran:
nohup mpirun -hosts r2n1 r2n2 r2n3 r3n1 r3n2 -np 120 gmx_mpi mdrun -v -deffnm md_100
(where r2n1, r2n2, … are the names of the nodes used for the MPI run), the run terminated immediately and I got the following error:
[proxy:0:0@r2n1] HYDU_create_process (utils/launch/launch.c:75): execvp error on file r2n2 (No such file or directory)

===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= PID 148862 RUNNING AT r2n1
= EXIT CODE: 255
= CLEANING UP REMAINING PROCESSES
= YOU CAN IGNORE THE BELOW CLEANUP MESSAGES

Thanks for your time and guidance; looking forward to your suggestions.

Perhaps you need to give the full path to the GROMACS executable? Regardless, I would first try running a simple program like /bin/hostname to troubleshoot issues with the MPI and queue system’s setup.
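
For example, something like the following (a minimal sketch, assuming an MPICH/Hydra-style mpirun and that the nodes are reachable by those names) should just print each node’s hostname:

mpirun -hosts r2n1,r2n2,r2n3 -np 3 /bin/hostname

If even that fails, the problem is in the MPI or cluster setup rather than in GROMACS.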

Giacomo

Respected @giacomo.fiorin,
Thanks for replying and suggesting the fix. I tried adding the path, but it now runs on 2 nodes (at least) when I try to use all 10 nodes.
Also, I ran the following (bold) command for all 10 nodes available to me on our institute’s HPC:

(base) [neeraj@master 6cu_pro_MD]$ **mpirun -hosts </etc/hosts> r1n1 r1n2 r1n3 r1n4 r2n1 r2n2 r2n3 r2n4 r3n1 r3n2 -np 240 gmx_mpi mdrun -v -deffnm npt**
[proxy:0:0@r1n2] HYDU_create_process (utils/launch/launch.c:75): execvp error on file **r1n3 (No such file** or directory)

(base) [neeraj@master 6cu_pro_MD]$ mpirun -hosts </etc/hosts> r1n1 r1n2 r1n4 r2n1 r2n2 r2n3 r2n4 r3n1 r3n2 -np 240 gmx_mpi mdrun -v -deffnm npt

And if I remove the node that is not found (like r1n3), then the following happens:

[proxy:0:0@r1n2] HYDU_create_process (utils/launch/launch.c:75): execvp error on file r1n4 (No such file or directory)
(base) [neeraj@master 6cu_pro_MD]$ mpirun -hosts </etc/hosts> r1n1 r1n2 r2n1 r2n2 r2n3 r2n4 r3n1 r3n2 -np 240 gmx_mpi mdrun -v -deffnm npt
[proxy:0:0@r1n2] HYDU_create_process (utils/launch/launch.c:75): execvp error on file r2n1 (No such file or directory)

So basically only the first 2 nodes are taken/used by MPI, and the 3rd consecutive node is not found even though the path is given.
Sir, please suggest something. What should I do?

Sorry, but I was simply suggesting a way to identify potential issues with MPI and the queuing system. Nobody is in a position to suggest a fix yet.

My best recommendation is to make sure that you are launching an MPI program correctly for your cluster. This is something that only the people who maintain it can know for sure.
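
For illustration only, if the cluster happens to use SLURM (an assumption; the real scheduler, partition and module names are site-specific), the launch would typically go through a job script rather than a bare mpirun on the login node:

#!/bin/bash
#SBATCH --nodes=10
#SBATCH --ntasks-per-node=24
#SBATCH --partition=standard   # hypothetical partition name
module load gromacs/2022.1     # hypothetical module name
srun gmx_mpi mdrun -v -deffnm npt

Your administrators can tell you which scheduler and launch command apply on your system.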

Giacomo

I agree with Giacomo. You first need to check that your syntax for running the commands on your system is correct. Your local system administrators are probably better at that than we are. But … unless you have a very customized MPI version, I would also try, e.g.:
mpirun --host r1n1,r1n2,r2n1,r2n2,r2n3,r2n4,r3n1,r3n2 -np 240 gmx_mpi mdrun -v -deffnm npt
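
If the host list gets long, a hostfile may be easier. A minimal sketch, assuming an MPICH/Hydra-style mpirun (Open MPI uses --hostfile instead of -f), with a file named hosts.txt containing one node name per line:

r1n1
r1n2
r2n1
r2n2
r2n3
r2n4
r3n1
r3n2

mpirun -f hosts.txt -np 240 gmx_mpi mdrun -v -deffnm npt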