LINCS error on newer but not older machines after installing gmx 2020.3

GROMACS version: 2020.3
GROMACS modification: none
Dear Users,

I am running two machines, both with gmx 2020.3 on Linux Mint 19.x. The slower machine (A) is an AMD Ryzen 7, 12 cores, with 2x 1080 Ti, and the other (B) is an AMD Threadripper, 32 cores, with 2x 2080 Ti; both have 32 GB of RAM. GROMACS and the NVIDIA CUDA toolkit 11 were recently compiled/installed on both machines with the latest g++ and gcc.

I copied all the files (mdp, restraints, itp, top, etc.) of a model of 200 nylon molecules, 650 atoms each, from A to B. The itp was created with gmx x2top, and the charges were modified slightly to bring the total charge for all 200 molecules to < 0.01. A surface of roughly 300 x 700 x 50 Å was created in Packmol, which condenses down to about 300 x 39 x 3.4 on system A, with x constraints on the end atoms.

The model runs nicely on A, but not on B. On B I cannot get the model even to minimize - well, it converges to > +10e8. If I use 1-5 molecules, minimization reaches ~ -1. In contrast, on A, minimization reaches ~ -6e5 for all 200 molecules.

A bond-rotation error appears on B under all conditions in NVT, and I have tried using SHAKE instead of LINCS to reach an equilibrated state. On A, gmx selects ntomp=4 and ntmpi=4 (other settings work as well); on B I have adjusted these parameters to as low as 1 each and have tried as few as 10 molecules during NVT, but the LINCS warnings and many #step…# backup files still appear. The errors appear with and without constraints that apparently cross domain boundaries. I have tried adjusting cutoffs and grid spacings.

It turns out that none of the models tried so far - models that were built and run with GROMACS 2020.1 on system B as recently as a few weeks ago - now run. Other CUDA-based programs (Maestro, VMD) run normally. I have several hundred models that were created and run on B, so I do not believe it is the hardware.

This information is probably insufficient for solid suggestions (please do offer them if you know what is going on), other than perhaps comments on 'typical' differences between the older A and the middle-aged B, but it may be enough for you to ask me for the proper info.

Regards,
Paul Buscemi
UMN, BICB

OK, my half-bad. gmx 2020.3 is on both machines, but gmx on machine A was compiled with g++ 7.4 and CUDA 10.1, while on (B) - I thought - it was compiled with g++ 8.4 and CUDA 11; but then it would not have compiled at all. I had assumed that gmx would support CUDA 11 by now, but apparently not. I must have been running the prior version of GROMACS, compiled with 10.1, but trying to run it with CUDA 11, which did install - hence nothing worked. By the way, I do not think CUDA 10.2 will compile with gcc/g++ 8.4.

After many tries, I have not been able to downgrade to CUDA 10. Using the commands from the NVIDIA legacy 10.2 instructions, all seems to go well, but the system reverts to 11.0 even after purging nvidia and cuda.

Can anyone tell me if, and under what conditions, beta3 will work with CUDA 11? I tried using gcc/g++ 8.4 to compile beta3 against CUDA 11, but segmentation faults / LINCS errors still appear. The compilation itself seems to proceed normally, which was not the case with 2020.3 and CUDA 11.
Regards
Paul

I was able to track the error down to gmxManageNvccConfig.cmake. There, the RTX 2080 Ti with sm_75 was not referenced, but the system would compile when OpenCL was included in the build. When OpenCL was not included, an incorrect-architecture error appeared: Unsupported gpu architecture 'compute_30'.
By adding:
if(CUDA_VERSION VERSION_EQUAL "11.1")
    list(APPEND GMX_CUDA_NVCC_GENCODE_FLAGS "-gencode;arch=compute_75,code=compute_75")
endif()
This forced the use of sm_75 for CUDA and all errors were removed.

This is a known issue and has been fixed for 2020.4.

I think you should be able to work around this by adding GMX_CUDA_TARGET_SM=75 to your cmake command line.
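For example, a minimal sketch of such a cmake invocation (the other flags simply mirror the ones already used earlier in this thread):

cmake .. -DGMX_BUILD_OWN_FFTW=ON -DREGRESSIONTEST_DOWNLOAD=ON -DGMX_GPU=on -DGMX_CUDA_TARGET_SM=75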

Thank you for your response.
That is basically what I have done by going into the gmxManageNvccConfig.cmake file and telling it to look for sm_75.
Your way is easier !
Paul

As Berk noted, you do not need to edit source files.

However, to your original issue, did you figure out whether the simulation instability was in your setup?

Thank you for the follow-up.

I did not like going to the source code either, but that is where the search and fix mission led me. If I had known how to change the input in the build, I would have done so.

On the instability: the whole issue started with a hopeful update of Linux, CUDA, compilers, and gmx using updated repositories and run files that I had used several times in the past. That is when the problems started and I first reported to the forum.

On the most recent build of 2020.3 I used the NVIDIA deb for the CUDA toolkit, with no added drivers or alternative gcc compilers (I used those installed with the toolkit - version 9 - in the clean build) and the latest cmake/make,
with:

cmake .. -DGMX_BUILD_OWN_FFTW=ON -DREGRESSIONTEST_DOWNLOAD=ON -DGMX_GPU=on -DCMAKE_CXX_COMPILER=/usr/bin/g++-9 -DCMAKE_C_COMPILER=/usr/bin/gcc-9 -DGMX_USE_OPENCL=on

The computer is a 64-core Threadripper with 32 GB RAM.

Installation went smoothly

Unfortunately no; LINCS warnings are copious on models that I have used on this computer running 2019.x.

Using identical models - other than ntomp and ntmpi during the run - two other computers have no difficulty. Those computers are a Ryzen 2700 and an Intel i7, running 2020.3 but compiled with g++ 7.4. If you think it worth a shot, I will compile with gcc 7 instead of gcc 9.

Or perhaps you could point me to a known working model (with NPTs) that I could use as a control to determine whether the problem is in my build or in the GPU configuration.

Regards,
Paul

In the current build, an example of the errors is:


Step 5, time 0.005 (ps) LINCS WARNING
relative constraint deviation after LINCS:
rms 27578.398438, max 1943094.500000 (between atoms 3741 and 3740)
bonds that rotated more than 30 degrees:
atom 1 atom 2 angle previous, current, constraint length
1868 1867 43.4 0.2026 0.1810 0.1530
5549 5548 45.0 0.0972 0.0976 0.0972

3741 3740 30.9 2.8485 211797.4062 0.1090
5961 5960 46.1 0.1037 0.1140 0.1090
5964 5960 30.6 0.1521 0.1458 0.1530
5963 5962 90.0 0.0966 0.1274 0.0972
3743 3742 82.5 0.2249 10.1204 0.0972

Back Off! I just backed up step5b_n3.pdb to ./#step5b_n3.pdb.1#

Back Off! I just backed up step5c_n3.pdb to ./#step5c_n3.pdb.1#
Wrote pdb files with previous and current coordinates

“-DGMX_CUDA…”, not “GMX_CUDA…”, yes?

Would you please tell me what distribution of Linux is recommended? Because gmx 2019.x worked on the Threadripper with Linux Mint 19.2, and at the same time gmx 2020.3 works on two other machines running Mint 19.2, it is beginning to appear that the issue is with Linux Mint 20 (or Ubuntu 20.04).

regards

Paul

Yes, you pass values to cmake (that is, set cache entries) using the -D VARIABLE=VALUE or -DVARIABLE=VALUE syntax.
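For example, both of the following set the same cache entry (illustrated here with the variable discussed above):

cmake .. -D GMX_CUDA_TARGET_SM=75
cmake .. -DGMX_CUDA_TARGET_SM=75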

I doubt the Ubuntu OS itself is the issue – we use it regularly on both desktops and servers. It could be that the issue is caused by a combination of the CPU or GPU compilers/toolkits. You mentioned that on the other machines you use gcc 7.4. I suggest starting by eliminating the differences between the working and non-working cases, i.e. use the same gcc version and CUDA version, and only after that look further.

Also, have you run make check before installation on the machine where you observed the issues?
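For reference, a typical configure/build/check sequence from a clean build directory might look like the following (the job count and default install location are illustrative):

cmake .. -DGMX_BUILD_OWN_FFTW=ON -DREGRESSIONTEST_DOWNLOAD=ON -DGMX_GPU=on
make -j 8
make check        # run the unit and regression tests before installing
sudo make install
source /usr/local/gromacs/bin/GMXRC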

Thank you for the response.
Today I (ugh) wiped the disk, installed Mint 19.3, installed CUDA 11.1 from the local deb, and installed gcc/g++ 7.4 from the toolkit for gmx, so all items and methods were the same as on the other two machines, except cmake 3.1 from the PPA repository rather than the latest 3.9 from the tar; I tried to keep it as simple as possible. I compiled with and without OpenCL; the same errors occurred. Now minimization runs, but with energies of +e6 with a simple model (will try without CUDA tomorrow). What are you referring to when you mention the CPU compiler?

The make check in the first half/section ran with no errors, but the second half did show - as I recall - 17/46 fail/pass. It is far too large to post, but perhaps you could suggest a next move.

By the way - for hopefully the not-too-distant future - must GROMACS be compiled with MPI for NVLink to function?

Regards
pb

Even though 2020.3 works on a Ryzen 2700 (8 cores) and two other Intel i7s, it was not happy with the 32-core Threadripper. Moving to GROMACS 2020.4 seems to have corrected the issue. Conditions below.

With this out of the way, I would like to move on to using NVLink. Would a proper cmake command be:

make .. -DGMX_BUILD_OWN_FFTW=ON -DREGRESSIONTEST_DOWNLOAD=ON -DGMX_GPU=on -DGMX_USE_OPENCL=off -DGMX_CUDA_TARGET_SM=75 -DGMX_MPI=on -DGMX_GPU_PME_PP_COMMS=1 -DGMX_FORCE_UPDATE_DEFAULT_GPU=1

I take it that MPI must be set to “on”?? The system is 2x RTX 2080 Ti, 32-core AMD TR.

Thanks for your comments along this ‘interesting’ journey.

Paul

Unfortunately, OpenCL on NVIDIA is broken (most likely due to an NVIDIA compiler bug). That is the reason for the make check errors too; otherwise you should not observe any!
Our documentation does note this, but admittedly it may not be easy to find.

Have your previous issues also been observed with OpenCL?

I am referring to the compiler that produces the code which will run on the CPU, that is gcc/g++ in this case (in contrast to the GPU compiler, e.g. nvcc).
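As an illustration (the paths are assumptions, and CUDA_HOST_COMPILER - the variable for nvcc's host compiler - is optional if nvcc already picks up the right one), the CPU compiler can be pinned explicitly at configure time:

cmake .. -DGMX_GPU=on -DCMAKE_C_COMPILER=/usr/bin/gcc-7 -DCMAKE_CXX_COMPILER=/usr/bin/g++-7 -DCUDA_HOST_COMPILER=/usr/bin/g++-7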

There are ongoing efforts to enable direct GPU communication with CUDA-aware MPI, but this feature is not part of any official release yet.

Good to hear that 2020.4 works well.

In case you want to diagnose this (later?), it would help to be specific about the way 2020.3 was “not happy”. A Ryzen issue has been fixed in 2020.4, but that only affects the 3000-series, not the 2000-series Threadripper.

Two issues with the above: the first half looks like a cmake invocation (rather than make); the last two options should be environment variables rather than cmake cache variables.

No, GPU direct communication is not supported with MPI. If you want to evaluate the experimental direct communication / NVLink support, you should:

  • configure a standard thread-MPI build with CUDA (note, not MPI!)
  • set the GMX_GPU_PME_PP_COMMS and GMX_GPU_DD_COMMS environment variables to enable the direct communication path (a minimal shell sketch follows after this list). As a result, when you run gmx mdrun you should see a message in the log about peer access between GPUs; e.g. for a 4-GPU machine this is how the message will look:
Note: Peer access enabled between the following GPU pairs in the node:
 0->1 0->2 0->3 1->0 1->2 1->3 2->0 2->1 2->3 3->0 3->1 3->2 
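A minimal shell sketch of the above (the -deffnm name and the thread/rank counts are just examples; the two variables are simply set to 1 here):

export GMX_GPU_DD_COMMS=1
export GMX_GPU_PME_PP_COMMS=1
gmx mdrun -deffnm pdms.nvt -nb gpu -pme gpu -ntmpi 16 -ntomp 4 -npme 1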

Thank you again. My responses follow your questions. I have only two questions at the end regarding the environment variables.

Have your previous issues also been observed with OpenCL? <=== There were compiling issues when OpenCL was not included in the cmake, notably “OpenCL not found”, so I had to explicitly install it even though it was not summoned.

I am referring to the compiler that produces the code which will run on the CPU, that is gcc/g++ in this case (in contrast to the GPU compiler, e.g. nvcc). <== Oh, of course.

Even though 2020.3 works on a Ryzen 2700 (8 cores) and two other Intel i7s, it was not happy with the 32-core Threadripper. Moving to GROMACS 2020.4 seems to have corrected the issue. Conditions below. … Good to hear that 2020.4 works well.

In case you want to diagnose this (later?), it would help to be specific about the way 2020.3 was “not happy”. <=== An apology, I was much too cute. The errors were never consistent, but included: angle/torsion too high, cutoff too long, core dumps, and errors that appeared to be caused by non-equilibrium conditions or molecules moving too far too fast; a poorly equilibrated system may have been a large part of the issue. With 2020.4, minimization with a good negative potential occurred before I could pick up my coffee. The TR is my workhorse and I am thrilled to have it working. I probably will not (purposely) break it again any time soon.

A Ryzen issue has been fixed in 2020.4 but that only affects the 3000-series, not the 2000-series Threadripper. <===== The TR is a 2990, 2nd gen, but it is curious that, after at least a half-dozen installations with different downloads of the gmx tar, 2020.3 never worked properly in my hands. 2020.4 was a quick install with no errors.

“C"make … -DGMX_BUILD_OWN_FFTW=ON -DREGRESSIONTEST_DOWNLOAD=ON -DGMX_GPU=on -DGMX_USE_OPENCL=off -DGMX_CUDA_TARGET_SM=75 ```-DGMX_MPI=on` -DGMX_GPU_PME_PP_COMMS=1
-DGMX_FORCE_UPDATE_DEFAULT_GPU=1

Two issues with the above: the first half looks like a cmake invocation (rather than make); <=== Yes, I cut off the “c” in copying the command.

the last two options should be environment variables rather than cmake cache variables. <== The last two were my attempt at invoking NVLink. I did not use these - or MPI - in the current build. Regarding MPI: I was unsure how to interpret the note in the performance-improvements documentation on the use of NVLink, “…GROMACS to be built with its internal thread-MPI library rather than an external MPI library…”, not understanding that an internal thread-MPI is included automatically. I suppose -ntomp / -ntmpi should have given me a clue.

No, GPU direct communication is not supported with MPI. <=== Good; I am not a fan of MPI, not fully understanding its benefit - if any - to or with CUDA.

If you want to evaluate the experimental direct communication / NVLink support, you should: <=== Will take a deep breath for this…. but

  • configure a standard thread-MPI build with CUDA (note, not MPI!) <== No specific setting needed?
  • set the GMX_GPU_PME_PP_COMMS and GMX_GPU_DD_COMMS environment variables to enable the direct communication path. <=== Add these to GMXRC? Do I need to set these to a value?
  • As a result, when you run gmx mdrun you should see a message in the log about peer access between GPUs; e.g. for a 4-GPU machine this is how the message will look:
Note: Peer access enabled between the following GPU pairs in the node:
 0->1 0->2 0->3 1->0 1->2 1->3 2->0 2->1 2->3 3->0 3->1 3->2 

I will let you know how it turns out. I can tell you now that the simple installation of an NVLink bridge does not negatively affect performance.
Best,
Paul

A new thread is probably appropriate but to wind this one up:

Hardware: NVIDIA Titan RTX NVLink HB bridge (4-slot), 2x MSI RTX 2080 Ti, AMD 2990
With the standard build (described in the earlier posts of this thread - no MPI) and the suggested environment variables, the run recognized the NVLink but failed when DLB was invoked.

Command line: gmx mdrun -deffnm pdms.nvt -nb gpu -pme gpu -ntomp 4 -ntmpi 16 -npme 1 -nsteps 100000
Reading file pdms.nvt.tpr, VERSION 2020.4 (single precision)
Enabling GPU buffer operations required by GMX_GPU_DD_COMMS (equivalent with GMX_USE_GPU_BUFFER_OPS=1).

This run uses the ‘GPU halo exchange’ feature, enabled by the GMX_GPU_DD_COMMS environment variable.

This run uses the ‘GPU PME-PP communications’ feature, enabled by the GMX_GPU_PME_PP_COMMS environment variable.

Overriding nsteps with value passed on the command line: 100000 steps, 100 ps
Changing nstlist from 10 to 100, rlist from 1 to 1
On host TR1 2 GPUs selected for this run.
Mapping of GPU IDs to the 16 GPU tasks in the 16 ranks on this node:
PP:0,PP:0,PP:0,PP:0,PP:0,PP:0,PP:0,PP:0,PP:1,PP:1,PP:1,PP:1,PP:1,PP:1,PP:1,PME:1
PP tasks will do (non-perturbed) short-ranged interactions on the GPU
PP task will update and constrain coordinates on the CPU
PME tasks will do all aspects on the GPU
Using 16 MPI threads
Using 4 OpenMP threads per tMPI thread

NOTE: This run uses the ‘GPU halo exchange’ feature, enabled by the GMX_GPU_DD_COMMS environment variable.
NOTE: DLB will not turn on during the first phase of PME tuning
starting mdrun ‘PDMS’
100000 steps, 100.0 ps.

NOTE: DLB can now turn on, when beneficial

The GPUs stopped and the run hung at this point. There is no mention of the GPU halo exchange feature in the log file. Opening a new terminal does not invoke the environment features - a new run starts and runs normally. I will report any changes in a new thread.

pb

You are using experimental GPU code which we know contains some issues. We do not plan to fix these in the 2020 release (we should probably disable them). But the 2021 release will have more solid support for these features.

I was forewarned that the code was experimental and thought it would not hurt too much to try. I was not asking for or anticipating help, but simply reporting the results for a particular system. Consider it a mini beta test. Looking forward to 2021…
pb

Your feedback is appreciated. I just wanted to warn that one can (unfortunately) expect issues with experimental features.