GROMACS version: 2022.4
GROMACS modification: Yes/No
Dear GROMACS users and developers, these are my commands (below) for running a simulation. The simulation is running very slowly. Can anyone please suggest the reason and a possible solution to this problem?
installation details
gmx_mpi mdrun -version
GROMACS version: 2022.4
Precision: mixed
Memory model: 64 bit
MPI library: MPI (CUDA-aware)
OpenMP support: enabled (GMX_OPENMP_MAX_THREADS = 128)
GPU support: CUDA
SIMD instructions: AVX2_256
CPU FFT library: fftw-3.3.8-sse2-avx-avx2-avx2_128
GPU FFT library: cuFFT
RDTSCP usage: enabled
TNG support: enabled
Hwloc support: disabled
Tracing support: disabled
C compiler: /usr/bin/cc GNU 8.5.0
C compiler flags: -mavx2 -mfma -Wno-missing-field-initializers -fexcess-precision=fast -funroll-all-loops -pthread -O3 -DNDEBUG
C++ compiler: /usr/bin/c++ GNU 8.5.0
C++ compiler flags: -mavx2 -mfma -pthread -Wno-missing-field-initializers -fexcess-precision=fast -funroll-all-loops -pthread -fopenmp -O3 -DNDEBUG
CUDA compiler: /usr/local/cuda-11.6/bin/nvcc nvcc: NVIDIA (R) Cuda compiler driver;Copyright (c) 2005-2022 NVIDIA Corporation;Built on Tue_Mar__8_18:18:20_PST_2022;Cuda compilation tools, release 11.6, V11.6.124;Build cuda_11.6.r11.6/compiler.31057947_0
CUDA compiler flags: -std=c++17;--generate-code=arch=compute_35,code=sm_35;--generate-code=arch=compute_37,code=sm_37;--generate-code=arch=compute_50,code=sm_50;--generate-code=arch=compute_52,code=sm_52;--generate-code=arch=compute_60,code=sm_60;--generate-code=arch=compute_61,code=sm_61;--generate-code=arch=compute_70,code=sm_70;--generate-code=arch=compute_75,code=sm_75;--generate-code=arch=compute_80,code=sm_80;--generate-code=arch=compute_86,code=sm_86;-Wno-deprecated-gpu-targets;--generate-code=arch=compute_53,code=sm_53;--generate-code=arch=compute_80,code=sm_80;-use_fast_math;-D_FORCE_INLINES;-mavx2 -mfma -pthread -Wno-missing-field-initializers -fexcess-precision=fast -funroll-all-loops -pthread -fopenmp -O3 -DNDEBUG
CUDA driver: 11.60
CUDA runtime: 11.60
commands
gmx_mpi -quiet grompp -f md.mdp -c npt_confout.gro -n index.ndx -p npt_processed.top -t npt_state.cpt -po md_mdout.mdp -pp md_processed.top -o md.tpr -maxwarn 3
gmx_mpi -quiet mdrun -s md.tpr -mp md_processed.top -mn index.ndx -o md_traj.trr -x md_traj_comp.xtc -cpo md_state.cpt -c md_confout.gro -e md_ener.edr -g md_md.log -xvg xmgrace -nb gpu &>> all_output.txt
gmx_mpi -quiet check -f md_traj_comp.xtc -s1 md.tpr -c md_confout.gro -e md_ener.edr -n index.ndx -m doc.tex &>> check.txt
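For reference, this is the kind of mdrun variant I am considering testing next, with an explicit OpenMP thread count, thread pinning, and the coordinate update offloaded to the GPU as well (the -ntomp value and the offload choices are guesses on my part and may need tuning for this system and hardware):
gmx_mpi -quiet mdrun -s md.tpr -x md_traj_comp.xtc -cpo md_state.cpt -c md_confout.gro -e md_ener.edr -g md_md.log -ntomp 20 -pin on -nb gpu -pme gpu -bonded gpu -update gpu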
output
1 GPU selected for this run.
Mapping of GPU IDs to the 2 GPU tasks in the 1 rank on this node:
PP:0,PME:0
PP tasks will do (non-perturbed) short-ranged interactions on the GPU
PP task will update and constrain coordinates on the CPU
PME tasks will do all aspects on the GPU
Using 1 MPI process
Non-default thread affinity set, disabling internal thread affinity
Using 40 OpenMP threads
starting mdrun 'Generic title'
5000000 steps, 10000.0 ps.
Writing final coordinates.
               Core t (s)   Wall t (s)        (%)
       Time:  2112165.704    52804.182     4000.0
                         14h40:04
                 (ns/day)    (hour/ns)
Performance:       16.362        1.467
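For context on the numbers above: 5,000,000 steps over 10,000 ps corresponds to a 2 fs time step, and at 16.362 ns/day the 10 ns run should take 10 / 16.362 × 24 ≈ 14.7 h, which matches the 14h40 wall time reported, so the run completed normally but at this throughput.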