Calling "do_force" twice in one step (plumed hrex) computes wrong energy (LJ-SR and Coul-SR)

GROMACS version: 2018.8
GROMACS modification: Yes, PLUMED 2.6.0 patch

I’ve found that when using PLUMED HREX in gromacs 2018.8 (which supports -rerun option for perturbed mass), sometimes LJ-SR and Coul-SR are computed wrongly when do_force() is called at the second time.

In patched md.cpp, the first time, when replica exchange in HREX, reads

if (plumedswitch && bHREX) {
      gmx_enerdata_t *hrex_enerd;
      snew(hrex_enerd, 1);
      init_enerdata(top_global->groups.grps[egcENER].nr,ir->fepvals->n_lambda,hrex_enerd);
      ......
      do_force(fplog, cr, ir, step, nrnb, wcycle, top, groups,
               state->box, state->x, &state->hist,
               f, force_vir, mdatoms, hrex_enerd, fcd,
               state->lambda, graph,
               fr, vsite, mu_tot, t, ed, bBornRadii,
               GMX_FORCE_STATECHANGED |
               GMX_FORCE_DYNAMICBOX |
               GMX_FORCE_ALLFORCES |
               GMX_FORCE_VIRIAL |
               GMX_FORCE_ENERGY |
               GMX_FORCE_DHDL |
               GMX_FORCE_NS,
               ddOpenBalanceRegion, ddCloseBalanceRegion);

      fprintf(fplog, "\nLocalUSwap: %10.3e\n", hrex_enerd->term[F_EPOT]);
      plumed_cmd(plumedmain,"GREX cacheLocalUSwap",&hrex_enerd->term[F_EPOT]);
      sfree(hrex_enerd);

the second time reads (where the unpatched code computes energies and forces):

/* PLUMED */
        plumedNeedsEnergy=0;
        if(plumedswitch){
          int pversion=0;
          plumed_cmd(plumedmain,"getApiVersion",&pversion);
          long int lstep=step; plumed_cmd(plumedmain,"setStepLong",&lstep);
          plumed_cmd(plumedmain,"setPositions",&state->x[0][0]);
          plumed_cmd(plumedmain,"setMasses",&mdatoms->massT[0]);
          plumed_cmd(plumedmain,"setCharges",&mdatoms->chargeA[0]);
          plumed_cmd(plumedmain,"setBox",&state->box[0][0]);
          plumed_cmd(plumedmain,"prepareCalc",NULL);
          plumed_cmd(plumedmain,"setStopFlag",&plumedWantsToStop);
          int checkp=0; if(bCPT) checkp=1;
          if(pversion>3) plumed_cmd(plumedmain,"doCheckPoint",&checkp);
          plumed_cmd(plumedmain,"setForces",&f[0][0]);
          plumed_cmd(plumedmain,"isEnergyNeeded",&plumedNeedsEnergy);
          if(plumedNeedsEnergy) force_flags |= GMX_FORCE_ENERGY | GMX_FORCE_VIRIAL;
          clear_mat(plumed_vir);
          plumed_cmd(plumedmain,"setVirial",&plumed_vir[0][0]);
        }
        /* END PLUMED */
        do_force(fplog, cr, ir, step, nrnb, wcycle, top, groups,
                 state->box, state->x, &state->hist,
                 f, force_vir, mdatoms, enerd, fcd,
                 state->lambda, graph,
                 fr, vsite, mu_tot, t, ed, bBornRadii,
                 (bNS ? GMX_FORCE_NS : 0) | force_flags,
                 ddOpenBalanceRegion, ddCloseBalanceRegion);
        /* PLUMED */
        if(plumedswitch){
          if(plumedNeedsEnergy){
            msmul(force_vir,2.0,plumed_vir);
            plumed_cmd(plumedmain,"setEnergy",&enerd->term[F_EPOT]);
            plumed_cmd(plumedmain,"performCalc",NULL);
            msmul(plumed_vir,0.5,force_vir);
          } else {
            msmul(plumed_vir,0.5,plumed_vir);
            m_add(force_vir,plumed_vir,force_vir);
          }
          if(bDoReplEx) plumed_cmd(plumedmain,"GREX savePositions",NULL);
          if(plumedWantsToStop) ir->nsteps=step_rel+1;
          if(bHREX){
            fprintf(fplog, "\nLocalUNow: %10.3e\n", enerd->term[F_EPOT]);
            plumed_cmd(plumedmain,"GREX cacheLocalUNow",&enerd->term[F_EPOT]);
          }
        }
        /* END PLUMED */

Energy calculations are always correct when do_force() is called the first time. But when replica exchanging, the LJ-SR and Coul-SR parts of energy are calculated wrong for all the energy groups, when do_force() is called the second time, for not exchanged topology and coordinates.

However… When I use -rerun, the energies are all correct (which calls do_force() only once).

Figures and manually-added outputs are attached. I wonder why do these phenomena happen. It seems there is a bug here. Maybe calling do_force() twice causes problem.

Fig 1,2 at mdrun, Fig 3,4 at rerun, Fig 5: energies at replica exchange (energy groups are “water” and “non-water”)