Turning off DD

GROMACS version: 5.0
GROMACS modification: Yes
Hello everyone,

I have been trying to test our algorithm with Gromacs. Basically, I need to modify the atomic forces according to the atomic coordinates, so I pass the force pointer (rvec *f) to a customized object and modify the force on every atom. However, in the next iteration the program gets stuck at the domain decomposition repartitioning. It looks like a memory-access-out-of-range issue appears there. I carefully checked all the matrices and vectors I use and could not find a problem.

So I would like to know whether I can just turn off the DD algorithm. The -pd (particle decomposition) flag used to exist, but I don't think it is still there in version 5.0. Alternatively, if someone knows how to modify the atomic forces properly, could they point out how to achieve that? Thank you, and I hope you have a nice Sunday.

In the 5.0 release DD is only used when running on multiple MPI ranks, so you can avoid it by using a single MPI rank.
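For example (binary names depend on how your GROMACS was built): with a thread-MPI build you can still use all cores of a node through OpenMP, e.g. `gmx mdrun -ntmpi 1 -ntomp 8`; with an MPI-enabled build, start only one rank, e.g. `mpirun -np 1 gmx_mpi mdrun -ntomp 8`.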

Hi hess,

Thank you for your reply. I will try that and see if it helps. Unfortunately, I am developing something intended to improve MD performance, so I believe I will eventually still need to figure out what is going wrong in the DD code, even if running with 1 MPI rank works for me for now.

You can access the global atom coordinates for local atoms. If that is sufficient for computing your forces, the solution is simple.

Thank you for your help.

I calculate the customized forces in serial, so I first collect the x vector (the coordinates; this algorithm needs the coordinates of all non-solvent atoms) and use them to compute the updated atomic forces. In this process, rvec *f is passed into a customized object as the force array, and I then directly modify its values with f[atom_index][dim] = f_modified[i][dim]. Since I test with a relatively small protein, I print a message once each write is done. The result is that I get 3 x Natoms outputs, which suggests there is no mismatch here.
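To be concrete, the override is essentially the loop below (a minimal self-contained sketch; f_modified, atom_indices and n_selected are my own illustrative names, and real/rvec are redefined here only so that the snippet stands on its own):

```c
typedef float real;        /* GROMACS uses single precision by default */
typedef real  rvec[3];     /* mirrors the GROMACS rvec type            */

/* Overwrite the forces of the selected (non-solvent) atoms.
 * f            : force array handed over from mdrun (rvec *f)
 * f_modified   : forces computed by our serial algorithm
 * atom_indices : for each selected atom i, its index into f
 * n_selected   : number of selected atoms
 */
static void override_forces(rvec *f, const rvec *f_modified,
                            const int *atom_indices, int n_selected)
{
    for (int i = 0; i < n_selected; i++)
    {
        int a = atom_indices[i];
        for (int d = 0; d < 3; d++)
        {
            f[a][d] = f_modified[i][d];   /* f[atom_index][dim] = f_modified[i][dim] */
        }
    }
}
```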

Initially f is a null pointer, and when DD is invoked it is still null. Its address is then passed to the function dd_partition_system(), where a comment says that "When f != NULL, *f will be reallocated to the size of state_local."

I have two questions about this process:

  1. When dd_partition_system() is called for the first time (f is still NULL), what size will *f be allocated to? I still cannot figure this out by reading the code of dd_partition_system().
  2. My program cannot get past the DD repartitioning when f is not NULL: I get a 'memory access out of range' error in the MD iteration following the force modification. bMasterState is FALSE by default here; will f still be reallocated? Should I set bMasterState to TRUE and collect the state before the repartitioning?

Thank you for your patience!

With DD there is no single force vector for the whole system. Each domain has a local force vector (with a local and a non-local part). The size of this vector changes at each repartitioning, and so does the set of local atoms.
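Schematically, the per-rank storage looks something like the sketch below (the struct and field names here are illustrative only, not the actual GROMACS data structures):

```c
typedef float real;
typedef real  rvec[3];

/* Conceptual sketch of per-rank force storage under DD.
 * These are NOT the real GROMACS struct fields, just an illustration. */
typedef struct {
    int   n_home;     /* number of "home" atoms owned by this rank          */
    int   n_local;    /* home + communicated (non-local) atoms              */
    int  *loc2glob;   /* local index i  ->  global atom index               */
    rvec *f_local;    /* local force buffer; its size (and loc2glob)        */
                      /* change at every DD repartitioning                  */
} local_domain_t;

/* Example: add this rank's home-atom forces into a buffer sized for the
 * whole system, using the local-to-global index map. */
static void scatter_home_forces(const local_domain_t *dom, rvec *f_global)
{
    for (int i = 0; i < dom->n_home; i++)
    {
        int g = dom->loc2glob[i];
        for (int d = 0; d < 3; d++)
        {
            f_global[g][d] += dom->f_local[i][d];
        }
    }
}
```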

Note that performance will be bad when you collect all coordinates on a single rank. It might not be worth implementing the MPI-parallel case; you can always run on many cores within one node using OpenMP.

Thank you so much for your explanation. It really helps me understand the data structure in Gromacs.

We actually compute the forces we want in serial. The code that collects all the coordinates was written by another developer; for now I would like to keep it unchanged, and maybe in the future I can get his permission to improve this part.

The workflow of my program is the following:

  1. Use MASTER(cr) to determine whether we are on rank 0;
  2. If so, initialize an object with state_global, top_global and f as input (they are pointers);
  3. Copy them to PETSc Vec;
  4. Enter the MD loop;
  5. At the end of each MD iteration, collect the coordinate vectors; then, if (MASTER(cr)), calculate the modified forces and override the original atomic forces with them;
  6. Distribute the coordinates back;
  7. Enter the next MD iteration.

You mentioned that each rank manages a portion of the atoms and therefore owns a force vector for those atoms. This is confusing to me, since at steps 2 and 5 I can reach the atomic forces of all the non-solvent atoms by using top_global with gmx_mtop_atomloop_all_init/next, without any out-of-range or mismatch issue. Does this mean f is contiguous in memory but divided into segments belonging to different ranks?

Although each rank has its own f, I don't see why an out-of-range problem would be triggered in the DD repartitioning after my modification, because I don't redistribute f among the ranks. But I noticed something odd. I tried printing f (the address it points to) in different locations: it stays the same after being initialized by dd_partition_system() and in the constructor of my customized object, but after I override the original atomic forces, the address has changed. I only print on the master rank. I will look into whether this is the reason the repartitioning fails.
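For reference, the way I scan the atoms with the atomloop is roughly the following (header paths and exact signatures are quoted from memory for 5.0, so they may differ slightly in your tree):

```c
/* Header locations changed between GROMACS versions; adjust as needed. */
#include "gromacs/legacyheaders/typedefs.h"
#include "gromacs/legacyheaders/mtop_util.h"

/* Scan every atom of the global topology; at_global is the global atom
 * index, which is also what I use to index the force array. */
static int scan_all_atoms(const gmx_mtop_t *top_global, rvec *f)
{
    int                     at_global, n = 0;
    t_atom                 *atom;
    gmx_mtop_atomloop_all_t aloop = gmx_mtop_atomloop_all_init(top_global);

    while (gmx_mtop_atomloop_all_next(aloop, &at_global, &atom))
    {
        (void) atom;   /* atom->m, atom->q, etc. are available here        */
        (void) f;      /* in the real code f[at_global][dim] is overwritten */
        n++;
    }
    return n;          /* number of atoms visited */
}
```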

Whether you have access to a buffer says nothing about whether the data in there is valid or used. Often (or always?) there is a global force buffer available on the master rank for writing out the forces. But the forces in there are only valid when trajectory writing is called with the force flag on. Those forces are never used for integration.

But the more fundamental question is how expensive your additional force calculation is. If that takes more time than the force calculation in GROMACS, there is no point in running GROMACS with MPI and DD.

Hi hess,

Thank you so much for your patience.

The original version of our method is faster than conventional MD, and what I am developing is meant to make it even faster. Since performance is important for evaluating the new algorithm, I am afraid we have to keep using MPI and DD.

I also tried to use that f_global with nstfout != 0, but it brings some other problems (it looks like a deadlock). Anyway, could you tell me whether this f_global is synchronized with the f vectors of the different ranks? Thank you again.

As I said in my previous reply, f_global is only used for output and for nothing else.

Collecting all coordinates to one rank and then distributing the forces quickly gets prohibitively expensive. The cheapest way to do this is to let all ranks send their list of local coordinates to the master rank. But to do that without reordering requires your method to update the order of the atoms every DD step.
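In plain MPI, that kind of collection looks roughly like the sketch below (this is independent of the GROMACS helpers; the counts and displacements, and any local-to-global index bookkeeping, have to be rebuilt after every repartitioning):

```c
#include <mpi.h>
#include <stdlib.h>

typedef float real;   /* use MPI_DOUBLE below if real is double */
typedef real  rvec[3];

/* Gather the local coordinates of every rank onto rank 0, in whatever
 * (DD-dependent) order the ranks currently store them.  n_local is this
 * rank's number of home atoms; it changes after each repartitioning.
 * Returns a buffer (non-NULL only on rank 0) and sets *n_total_out there. */
static rvec *gather_local_coords(rvec *x_local, int n_local,
                                 MPI_Comm comm, int *n_total_out)
{
    int rank, nranks;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &nranks);

    int  *counts = NULL, *displs = NULL;
    rvec *x_all  = NULL;

    if (rank == 0)
    {
        counts = malloc(nranks * sizeof(*counts));
        displs = malloc(nranks * sizeof(*displs));
    }

    int n_reals = 3 * n_local;                 /* counts in units of real */
    MPI_Gather(&n_reals, 1, MPI_INT, counts, 1, MPI_INT, 0, comm);

    int n_total = 0;
    if (rank == 0)
    {
        for (int r = 0; r < nranks; r++)
        {
            displs[r]  = n_total;
            n_total   += counts[r];
        }
        x_all = malloc((n_total / 3) * sizeof(rvec));
    }

    MPI_Gatherv(x_local, n_reals, MPI_FLOAT,
                x_all, counts, displs, MPI_FLOAT, 0, comm);

    if (rank == 0)
    {
        free(counts);
        free(displs);
        *n_total_out = n_total / 3;
    }
    return x_all;
}
```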

PS: Why are you working with a 7-year-old version of GROMACS?

Thank you for your explanation.

This is because I am adding features to a platform that is essentially a derivative development of Gromacs. Most of the code was written in or before 2016, and to avoid introducing more problems I prefer not to modify the existing workflow logic.

I actually found the problem described below this morning. It is really similar to mine, and the workflow is also similar; the only difference is that I don't modify the coordinates.
https://mailman-1.sys.kth.se/pipermail/gromacs.org_gmx-developers/2011-April/005141.html

In that thread, he also uses dd_collect_vec(), which sends the local data to the master rank with dd_collect_vec_sendrecv() (I generally only use one node).

As for the order of the atoms: every time the customized forces are calculated in an MD iteration, there is a step that synchronizes the coordinates between MD and the customized object, and gmx_mtop_atomloop_all_init/next is called for scanning and selecting atoms. Could you tell me whether this matches what you meant by 'update the order of the atoms' in your last reply?

dd_collect_vec() gets the atoms in the order of the global topology. This is a bit more expensive because of the cost of reordering.
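As a usage sketch (the prototype is quoted from memory from the 5.0 sources, so double-check it; f_local stands for the rank-local force array and f_collected for a master-rank buffer sized for the whole system):

```c
/* Collect a distributed per-atom vector (here the forces) onto the master
 * rank in global-topology order.  Prototype as I remember it from 5.0:
 *
 *     void dd_collect_vec(gmx_domdec_t *dd, t_state *state_local,
 *                         rvec *lv, rvec *v);
 *
 * lv is this rank's local vector; v only needs to be allocated (for the
 * full number of atoms) on the master rank. */
if (DOMAINDECOMP(cr))
{
    dd_collect_vec(cr->dd, state_local, f_local, f_collected);
}
if (MASTER(cr))
{
    /* f_collected[global_atom_index] is now valid, in top_global order */
}
```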

Thank you for being patient!

Could you explain specifically where this reordering happens? Is it in the DD partitioning?

In the previous implementation, our program read and wrote the coordinates by iterating over all the atoms according to top_global. But in my project I only need to read the coordinates and only write the forces. When you say reordering, do you mean reordering the atoms according to the order in top_global?

Yes. dd_collect_vec() does the reordering, so the result is in global topology order. How you access the elements of the force vector is completely up to you.

Thank you so much!

Now I have a better understanding of the programming logic of Gromacs. I am going to see whether the method mentioned in that thread solves my problem.

After testing our algorithm, I may update the code to patch against the latest version of Gromacs.