GROMACS version: 2020.1
GROMACS modification: No
Hello everyone! I’m facing a serious and intermittent problem during computationally intensive simulations on my Linux system (latest version), and I’ve exhausted the basic solutions. Below I detail what happens and my PC configuration.
During heavy simulation runs, the system starts working normally, but completely crashes in the middle of the process, losing the video signal (DisplayPort disconnects) and requiring a forced reboot (manual power off).
- Sometimes the simulation finishes correctly.
- Other times, the system freezes completely, and the case fans always run at full speed (turbo mode).
- I can’t switch to TTY (Ctrl+Alt+F3 doesn’t respond).
- When frozen, the monitor shows “no signal” and goes into standby mode.
This behavior does not occur during normal tasks such as browsing, watching videos, or light work.
PC Configuration
- CPU: Intel Core i9-14900F
- GPU: Gigabyte GeForce RTX 5070 Ti Aorus Master (16 GB GDDR7)
- Motherboard: Asus Prime Z790-A WIFI (DDR5, LGA1700)
- RAM: 2x32 GB Kingston Fury Beast DDR5 5200 MHz
- SSD: Lexar NM790 2 TB NVMe
- PSU: Corsair RM1200x 1200W 80 Plus Gold
- Case: Redragon Wideload Pro
- Fans: 6x Aigo Darkflash CL12 Rainbow
- Monitor: Pichau Nexus Wide S34 (Ultrawide, WQHD, 165Hz) via DisplayPort
- NVIDIA Driver: Proprietary, version 570.144
- CUDA: 12.8
- Command nvidia-smi shows everything normal, no errors, GPU recognized.
- Linux version: 24.04
I would greatly appreciate any help or suggestions you can offer!
Hi!
It is tricky. The RTX 50 series is still not super stable under Linux, so it could be a driver issue or a case for an RMA of the card. I had similar issues years ago with a 3090 that turned out to be faulty; on the other hand, I have a 5090 around that is totally fine as hardware, and Ubuntu just does not agree with it and the current drivers :/ What is the card load? If it happens after a while, randomly, under heavy load, it smells like hardware.
Cheers,
Thank you for your reply, my friend. As I said, everything seems to be working normally. GROMACS even starts the simulation, but after a while the GPU dynamics freeze and stop, and the fans go into turbo mode.
I will share here some screenshots of commands and diagnostics that I ran during a simple peptide simulation in water lasting 500 ns, before a crash occurs.
Your GPU load seems to be about 31%… which should not be a problem at all for the card (you are only drawing 100 W of power). The CPU is not overloaded either, and all the temperatures look all right. I was going to say that it could also be a PSU issue but, with that total system load, maybe that is not the case. To troubleshoot this, you can try stressing the GPU and see whether the computer stays stable. A good tool for this is FurMark, which is available on Linux.
Let it run for a few hours and see what happens. If the computer crashes while running, you can swap the GPU for another one that draws roughly the same power (and has stable Linux drivers!) and repeat the test. If everything is stable with this second GPU, the problem is probably the card, either at the hardware level or due to the drivers; otherwise it may be a PSU issue. If the 5070 Ti turns out to be the problematic part, I would recommend testing it with FurMark on Windows using Studio drivers. If it crashes there too, you will probably need to RMA it.
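While the stress test (or a GROMACS run) is going, it can help to log the GPU readings to disk so that the last values before a freeze survive the forced reboot. Below is a minimal Python sketch, not a tested tool: it polls `nvidia-smi --query-gpu` once per second and syncs every row to disk. The file name `gpu_log.csv`, the one-second interval, and the helper names `parse_sample`, `read_sample`, and `log_forever` are my own choices for illustration, and it only reads the first GPU.

```python
import csv
import os
import subprocess
import time

# Properties requested from `nvidia-smi --query-gpu`.
FIELDS = ["timestamp", "power.draw", "temperature.gpu",
          "utilization.gpu", "clocks.sm", "memory.used"]


def parse_sample(line):
    """Turn one CSV line printed by nvidia-smi into a dict keyed by field name."""
    values = [v.strip() for v in line.strip().split(",")]
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} values, got {len(values)}")
    return dict(zip(FIELDS, values))


def read_sample():
    """Query nvidia-smi once and return the readings of the first GPU."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=" + ",".join(FIELDS),
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True, timeout=10,
    ).stdout
    # nvidia-smi prints one line per GPU; this sketch keeps only the first.
    return parse_sample(out.splitlines()[0])


def log_forever(path="gpu_log.csv", interval_s=1.0):
    """Append one row per interval until interrupted (Ctrl+C) or the machine hangs."""
    new_file = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=FIELDS)
        if new_file:
            writer.writeheader()
        while True:
            writer.writerow(read_sample())
            # Flush and fsync so the row is on disk before a possible hard freeze.
            fh.flush()
            os.fsync(fh.fileno())
            time.sleep(interval_s)
```

Call `log_forever()` in a second terminal before starting the run. After a crash and reboot, the last rows of the CSV show whether power draw, temperature, or clocks spiked right before the freeze, or whether everything looked calm, which would point away from simple overheating.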
It is a bit annoying to do all this testing, but it is possibly necessary in your case :/. As I was saying, I have run into problems myself with Linux and the new RTX 50 series: my 5090 is not recognized at all by the NVIDIA drivers (I have a 4060 in the same machine that works perfectly fine, though). However, the card is absolutely fine in Windows. I can live with this because I like to use the new generations to play games and the previous ones for more “serious” hobbyist computation, simply due to the maturity of the older platforms under Linux, but it is definitely not ideal.
Cheers and good luck!