GROMACS version: 2020.2
GROMACS modification: No
Hi all,
I’m running a set of GPU-accelerated GPCR simulations in a POPC:SDPC:CHL1 bilayer with three different ligands bound. So far I’ve run two of the three ligands in duplicate for 600 ns each, but I’m running into issues with the third system.
The settings are identical to those for the other two ligands, and the system was generated in the same way as before (using CHARMM-GUI Membrane Builder). About 200 ns into the simulation it crashed, so I used the checkpoint file to resume. It ran okay for a while but then crashed again. I repeated this a few times and left the lab. When I came back, the computer had become unresponsive and was not outputting any video to the monitors. The first time this happened, I did a hard reset and noticed that only one of my two monitors was working. In the graphics settings (Ubuntu 20.04 LTS) the one monitor that was working was no longer recognised. nvidia-smi also gave me an error message (I forgot what it was). I purged the NVIDIA drivers with apt-get and autoinstalled the newest version, which fixed both the monitor and the nvidia-smi issues. When I checked the simulation, it had crashed at ~320 ns with the same error as before:
The Z-size of the box (9.272714) times the triclinic skew factor (1.000000) is smaller than the number of DD cells (7) times the smallest allowed cell size (1.326000)
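For reference, the arithmetic behind that message can be checked with a short sketch. The numbers are copied from the error above; the helper name dd_fits is made up for illustration and is not part of GROMACS. Comparing against 6 cells is only an illustrative what-if, not a recommended setting.

```python
# Values copied from the error message above.
box_z = 9.272714          # Z-size of the box (nm)
skew_factor = 1.000000    # triclinic skew factor
n_dd_cells = 7            # number of DD cells along Z
min_cell_size = 1.326000  # smallest allowed cell size (nm)


def dd_fits(box_len, skew, n_cells, min_cell):
    """Hypothetical helper: True if n_cells cells of at least
    min_cell each fit along a box edge of length box_len * skew."""
    return box_len * skew >= n_cells * min_cell


required = n_dd_cells * min_cell_size        # 7 * 1.326 nm
shortfall = required - box_z * skew_factor   # how far the box falls short

print(f"required  : {required:.6f} nm")
print(f"available : {box_z * skew_factor:.6f} nm")
print(f"shortfall : {shortfall:.6f} nm")
print("fits with 7 cells:", dd_fits(box_z, skew_factor, 7, min_cell_size))
print("fits with 6 cells:", dd_fits(box_z, skew_factor, 6, min_cell_size))
```

The box is short by less than 0.01 nm in Z, so the error is triggered by a small shrinkage of the box along Z relative to what the 7-cell decomposition needs.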
I also noticed that in the last frame before the crash, the mdrun -v output reported vol as 1.00!
I thought maybe the system wasn’t equilibrated enough, so I dumped a frame from a few ns into the production run, re-minimised and re-equilibrated it, and started a new run. This time it ran fine up until ~150 ns, which I only found out after rebooting because there was no video output again. I didn’t need to reinstall the drivers this time, though; they were still working. I checked the volume from the EDR file and there was a noticeable sharp dip just before the simulation crashed. Hoping it was just a one-off, I resumed the simulation, but my colleague now tells me the computer has become unresponsive once again. I haven’t been back in the lab to check where the run crashed, but regardless, I don’t think finishing the 600 ns run under these circumstances is realistic.
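In case it helps anyone reproduce the volume check, here is a minimal sketch of how such a dip could be located automatically. It assumes the volume was first exported to an XVG file (for example with gmx energy -f ener.edr -o volume.xvg and selecting the Volume term; the file names are assumptions). The function names, the window of 5 points and the 2% threshold are arbitrary illustrative choices, and the sample numbers are made up, not taken from my run.

```python
def read_xvg(lines):
    """Parse XVG text into (time, value) pairs,
    skipping '#' comment lines and '@' plot directives."""
    data = []
    for line in lines:
        line = line.strip()
        if not line or line[0] in "#@":
            continue
        fields = line.split()
        data.append((float(fields[0]), float(fields[1])))
    return data


def find_sharp_drops(data, window=5, rel_drop=0.02):
    """Return (time, value, baseline) for every point whose value is more
    than rel_drop below the mean of the preceding `window` points."""
    hits = []
    for i in range(window, len(data)):
        baseline = sum(v for _, v in data[i - window:i]) / window
        t, v = data[i]
        if v < baseline * (1.0 - rel_drop):
            hits.append((t, v, baseline))
    return hits


# Made-up sample data for illustration only (time in ps, volume in nm^3).
sample = """# illustrative values, not from a real run
@ title "Volume"
0 900.0
1 900.5
2 899.8
3 900.2
4 900.1
5 899.9
6 870.0
7 860.0
"""

data = read_xvg(sample.splitlines())
hits = find_sharp_drops(data)
for t, v, baseline in hits:
    print(f"t = {t:g}: volume {v:.1f} vs. running mean {baseline:.1f}")
```

For a real file, the same functions can be used by passing the open file object, e.g. with open("volume.xvg") as fh: data = read_xvg(fh). Running the same check on the Box-Z term rather than the total volume may be more directly relevant to the error above, since it is the Z dimension that becomes too small.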
Does anyone know why a simulation would suddenly blow up so far into the run, on a system pretty much identical to the others, which ran fine?
I’m considering increasing the pressure coupling time constant, but would that affect the dynamics to the extent that comparisons with the previous runs become invalid? I compared the initial cell size to that of the other runs and there wasn’t much difference. The only irregularity I noticed was a paucity of CHL1 in one particular part of the membrane. Perhaps rebuilding the system would fix the issue?