Hello, I’ve encountered the same problem as you. GROMACS used to run very fast on my Ubuntu Linux system, but I accidentally upgraded the kernel while updating CUDA and the drivers. Now, even after downgrading the kernel (5.15.0-71-generic), GROMACS performance has dropped from 130 ns/day to 3 ns/day. (CPU usage used to be 2400% and is now only 1200%; moreover, GPU usage is below 10%.) Here is the configuration and kernel information of the server.
The CPU model of the server is: 13th Gen Intel(R) Core™ i7-13700KF, the GPU model is: 01:00.0 VGA compatible controller: NVIDIA Corporation Device 2782 (rev a1), and the driver information of the graphics card is as follows:
I would appreciate your help in checking this out and if you need any more information, please do not hesitate to contact me and I look forward to hearing from you.
Additionally, I have attached the NPT log file from before the CUDA and driver update (npt_win0_conf0.log) and the one from after the update (npt_win26_conf455.log). npt_win0_conf0.log (59.8 KB) npt_win26_conf455.log (61.7 KB)
Looking forward to your reply!
Do you have any other compute-intensive workload running on your machine?
The following line from your log indicates that CPU performance is the bottleneck:
Force 1 24 50001 1904.652 156221.791 81.9
And your observation that CPU usage dropped from 2400% to 1200% (while still running 24 threads) suggests that all 24 threads are now somehow being squeezed onto 12 (logical) cores.
Since you ask GROMACS to use all logical cores (-nt 24), it expects to have them fully available. If you have another application running, you should tell GROMACS to use only some of the cores. Since your CPU has both P- and E-cores, I’d suggest -ntmpi 1 -ntomp 8 -pin on to limit GROMACS to using only P-cores and with one thread per physical core.
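To make the Force row quoted above easier to read, here is a minimal Python sketch that splits such a row into named fields. The column meanings (ranks, threads, call count, wall time in seconds, giga-cycles, percent of run time) are an assumption based on the usual layout of the cycle accounting table in md.log; `CycleRow` and `parse_cycle_row` are hypothetical helper names, not part of GROMACS.

```python
from typing import NamedTuple


class CycleRow(NamedTuple):
    activity: str
    ranks: int
    threads: int
    calls: int
    wall_s: float
    gcycles: float
    percent: float


def parse_cycle_row(line: str) -> CycleRow:
    """Split one cycle accounting row into named fields (assumed layout)."""
    parts = line.split()
    if len(parts) < 7:
        raise ValueError(f"expected a name plus six numeric columns: {line!r}")
    # Activity names may contain spaces (e.g. "Neighbor search"), so the six
    # numeric columns are taken from the right-hand side of the row.
    name = " ".join(parts[:-6])
    ranks, threads, calls = (int(p) for p in parts[-6:-3])
    wall_s, gcycles, percent = (float(p) for p in parts[-3:])
    return CycleRow(name, ranks, threads, calls, wall_s, gcycles, percent)


# The row quoted in this thread.
row = parse_cycle_row("Force 1 24 50001 1904.652 156221.791 81.9")
ms_per_call = 1000.0 * row.wall_s / row.calls
print(f"{row.activity}: {row.percent}% of run time, {ms_per_call:.1f} ms per call")
```

Read this way, the Force row accounts for 81.9% of the run time, which is why it points to the CPU side as the limiting factor.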
For the same system (with 381731 atoms), the performance changed from 3.714 ns/day to 19.936 ns/day; CPU utilization was 600% and GPU utilization was low, about 20-40%.
CPU utilization was 960%, and a lot of PME-related output appeared on the screen. GPU utilization was also very low, at about 30%. The simulation ran at 8.925 ns/day.
In addition, I used kill -STOP PID to suspend the application that was computing in the background before submitting the GROMACS calculations, so all of the tests above were run with no other application computing in the background.
The log files of the two NPT runs above are attached. How can I speed up the GROMACS calculation and improve GPU and CPU utilization?
In the top screenshot, gmx is using 600% of the CPU, but the “load average” is around 30 (the three numbers correspond to the last 1, 5 and 15 minutes). That means around 30 active threads are competing for the 24 (logical) cores, so some other processes are using the CPU.
A large “Rest” time (from your log) and the observation that a lower value of ntomp seems to give better performance also suggest that something else on your system is heavily using the CPU.
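The load-average reasoning above can be turned into a quick check. The sketch below is only a rough estimate: it assumes 24 logical cores (matching the -nt 24 run), treats the load average as a count of runnable threads, and uses hypothetical helper names `threads_per_core` and `foreign_load`.

```python
import os


def threads_per_core(load_average: float, logical_cores: int) -> float:
    """Runnable threads per logical core; values above 1 suggest contention."""
    if logical_cores <= 0:
        raise ValueError("logical_cores must be positive")
    return load_average / logical_cores


def foreign_load(load_average: float, gmx_cpu_percent: float) -> float:
    """Rough share of the load not explained by gmx (100% = one busy thread)."""
    return max(0.0, load_average - gmx_cpu_percent / 100.0)


# Numbers reported in this thread: load average ~30, gmx at 600% CPU.
print(threads_per_core(30.0, 24))  # 1.25
print(foreign_load(30.0, 600.0))   # 24.0

# The same check on the current machine, where the OS exposes a load average.
try:
    one_min, _, _ = os.getloadavg()
    cores = os.cpu_count() or 1
    print(f"this machine: {threads_per_core(one_min, cores):.2f} runnable threads per core")
except (AttributeError, OSError):
    print("load average not available on this platform")
```

With the numbers from the screenshot, roughly 24 threads' worth of load comes from something other than gmx, which is enough to occupy every logical core.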
Those are just status reports from PME autotuning; no need to worry.
Offloading bonded interactions to the GPU (-bonded gpu) could help move more load from the CPU to the GPU. You can also increase the neighbor search interval (e.g., -nstlist 200) while keeping -update gpu; this will reduce the “Neighbor search” time.
If you do all this, there will be very little CPU work, so you can reduce -ntomp to 4 or 6, which will reduce interference with whatever else is using the CPU.
Ideally, you could pin each load to a separate set of CPU cores (in GROMACS, you can use -pin, -pinoffset, -pinstride options; for other applications, you can use, for example, the taskset utility).
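As an illustration of how these options fit together, here is a small Python sketch that assembles the two command lines without running them. The mdrun flags are the ones discussed in this thread plus -deffnm; the run name "npt", the program "./other_app", and the CPU list "16-23" are placeholders, so check the actual core numbering (for example with lscpu) before using anything like this.

```python
import shlex
from typing import List, Optional


def mdrun_command(deffnm: str, ntomp: int, pinoffset: int = 0,
                  pinstride: Optional[int] = None,
                  nstlist: int = 200) -> List[str]:
    """Build a gmx mdrun argument list with GPU offload and thread pinning."""
    cmd = ["gmx", "mdrun", "-deffnm", deffnm,
           "-ntmpi", "1", "-ntomp", str(ntomp),
           "-bonded", "gpu", "-update", "gpu",
           "-nstlist", str(nstlist),
           "-pin", "on", "-pinoffset", str(pinoffset)]
    if pinstride is not None:
        # Only passed when set explicitly; otherwise mdrun picks the stride.
        cmd += ["-pinstride", str(pinstride)]
    return cmd


def taskset_command(cpu_list: str, program: List[str]) -> List[str]:
    """Wrap another program so that it only runs on the given CPUs."""
    return ["taskset", "-c", cpu_list, *program]


# Placeholder values: 6 OpenMP threads for GROMACS starting at logical core 0,
# and a hypothetical second job confined to logical CPUs 16-23.
gmx_cmd = mdrun_command("npt", ntomp=6)
other_cmd = taskset_command("16-23", ["./other_app"])
print(shlex.join(gmx_cmd))
print(shlex.join(other_cmd))
```

Building the commands as lists keeps each flag and its value together, which makes it easy to vary -ntomp or -pinoffset between test runs and compare the resulting ns/day.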
Dear al42and!
Thank you very much for your patient replies. After restarting my server, I ran htop and found that all CPUs were being utilized, and finally realized that my server had a virus. The problem caused by the virus has now been resolved. Currently, a system of 160,000 atoms simulated with GROMACS 2023.5 (GPU: 4070 Ti) runs at 150 ns/day. Thank you for your help and for getting my problem solved so quickly.