CgenFF parameters showing high penalty scores

GROMACS version:2022.3
GROMACS modification: CUDA enabled
Parameterization of an unprotonated form of Histidine is showing high penalty scores for CG and other sidechain atoms. How do I optimize / validate the parameters that I have generated from paramchem /CgenFF.

GROUP ! CHARGE CH_PENALTY
ATOM N1 NG321 -1.072 ! 7.610
ATOM H2 HGPAM2 0.396 ! 2.558
ATOM C3 CG311 0.338 ! 21.571
ATOM H4 HGA1 0.090 ! 0.000
ATOM C5 CG321 -0.186 ! 96.922
ATOM C6 CG2O2 0.777 ! 6.227
ATOM H7 HGA2 0.090 ! 2.270
ATOM H8 HGA2 0.090 ! 2.270
ATOM C9 CG2R51 0.206 ! 145.125
ATOM O10 OG2D1 -0.548 ! 2.558
ATOM N11 NG2D1 -0.525 ! 165.188
ATOM C12 CG2R51 0.205 ! 46.208
ATOM C13 CG2R53 0.118 ! 91.834
ATOM N14 NG2R50 -0.469 ! 36.744
ATOM H15 HGR52 0.169 ! 23.894
ATOM H16 HGR52 0.188 ! 65.342
ATOM H17 HGPAM2 0.396 ! 2.558
ATOM O18 OG311 -0.553 ! 3.286
ATOM H19 HGP1 0.290 ! 0.000

Direct answer: ff validation is typically done via some type of empirical comparison of simulated thermodynamics to experimental values (e.g., logP, NMR, free energy of solvation) or comparison to QM/DFT (or simply by refitting to QM/DFT and thereby ensuring that match).

Better answer: are you sure parameters are not already available? I.e, why use an autogeneration approach to reparameterize a molecule that is presumably already available based on some more rigorous methods than parameter-by-analogy? By deprotonated do you mean neutral or anionic histidine?

Hi, the inputted structure was DFT optimized using B3LYP and a high level basis pople basis set. The scores are still high. I looked for the parameters for this negatively charged form of histidine on the charmm forum but to no avail.

I believe the main chain parameters for the amino acid will be very slightly deviated / unchanged as compared to regular histidine, which is reiterated by the fact those atoms do have low penalties. The problem is mainly localised to the imidazole ring parameters.

I think the penalty score is not so much a measure of how right or wrong it is, but how similar the build-by-analogy base is (I could be wrong, but that’s worth looking up). With a pKa of 14.5 for imidazole, it’s perhaps unsurprising that there are not a lot of analogies here. If this is for something like an iron complex, you may find that you need to add artificial angles/dihedrals to get the correct coordination geometries anyway, in which case you might ask how relevant the base parameters even are if they’re going to be overridden by additional potentials. We found that we had to add additional non-physics-based parameters to get proper geometries for neutral his/anionic cys coordinated zinc (Exploring CRD mobility during RAS/RAF engagement at the membrane - ScienceDirect).

I had made similar considerations for my unprotonated histidines. But due to lack of any background data on Beta sheet unprotonated histidine as well as any experimental data, only main chain psi and phi torsional constraints could be derived and applied. i too am working with an overall negatively charged Zn-His ensemble. Would you recommend I use the generated parameters and try to optimize them according to experimental data?

The geometry of the file you supply to CGenFF isn’t relevant, it’s only the topology and atom types which are used to generate the parameters (by analogy to existing similar systems).

<On a side note: B3LYP is quite outdated and there are better hybrid functionals with similar cost. Pople basis sets are also very old and not well suited to DFT. In fact, 6-311G isn’t even really of triple-zeta quality. They were designed 50 years ago for use with HF/MP2 calculations but there are far better options for both nowadays.>

None of that is particularly relevant to validating and optimising CGenFF parameters though, as you’ll be using MP2/6-31G* for that, to ensure consistency with the rest of the CHARMM force field :)

  1. Validate your parameters. Optimise the geometries of each conformer at MP2/6-31G* and compare to the MM-minimised geometries with your test parameter set. Compare the individual bond lengths, valence angles, and dihedral angles. I think the accepted tolerance was 0.02 A and 3 deg for bonds and valence angles, but check with the CGenFF literature.

Validating partial charges is trickier. I like to compare the CGenFF charges with a set of different partial charge algorithms. ESP, NPA, Hirshfeld (anything but Mulliken! :P). Are the signs the same? Note that CGenFF charges tend to be more polarised due to the method used to fit them (water interaction energies at each charge site). You’ll have to use your chemical intuition here.

  1. Once you’ve identified which parameters are of poor quality, you can start optimising them. I suggest a thorough reading of the CGenFF and CHARMM FF literature, as they go over many examples.

  2. Revalidate your new parameters to see if you’ve actually improved them. If not, you’ll have to figure out why and try again.

This overall process is likely going to take several weeks to several months depending on how many parameters are off, how fast you learn, and whether the molecule likes you or not.

I guess my first question is how sure are you that they are anionic histidines? When people say “unprotonated histidine” or “deprotonated histidine” they usually mean neutral histidine. The neutral and cationic histidines are the only ones relevant to biology (as far as I know). The pKa for neutral vs. anionic imidazole is extremely high, and pKa should be similar for histidine to imidazole. I suspect if you have a protein that is folded in aqueous solution and coordinated Zn, then your histidines are neutral (i.e., deprotonated at either the ND1 or NE2, but not at both) and it is the deprotonated nitrogen that coordinates the Zn. In this case, you can use the standard parameters for your histidines. In our usage, we had an NMR structure that provided some information on the coordination geometry, and we found that it was wildly unstable unless we added some additional parameters (in this case artificial bonds and angles, but I think we also evaluated including dihedral terms at one point as well and the person doing that study decided that the artificial dihedrals were unnecessary).

@plagueis Thank you for the concise reply. To address your side note, I completely agree that B3LYp is a obsolete method due to the BSSE and missing dispersion forces. Keeping that in mind, 6-311G(d,p) and empericial dispersion functional gd3bj were inputted during DFT optimization. Keeping in mind that my non standard amino acid will be solvent exposed, SMD solvation model was also included.

Regarding your query about the partial charges, I did compare cGenFF charges with ESP charges and while the partial charges of the sidechain do match ( variance of +- 0.16) the charge sign of the C-alpha do not. But again, I was hoping to use the main chain charges of regular HIS parameters instead to rectify this

I will go forward with side-by-side comparisons of dihedrals/angles and bond distances, but would you recommend I use the lsqPar package that is provided as a way to optimize CgenFF parameters?

@therealchrisneale I am not working on an enzyme system but instead an artificially created B-sheet catalytic aggregate. The Non-protonated ( No H on ND1 or NE2) HIS was suspected due to the formation of multiple coordinations only via HIS in experimental data

Well, that part of the SC is rigid. And nobody’s going to convince me that the charmm HSE partial charges are exact out of any QM theory (-0.70000 is too round a number for that to be true). I’ll be in the minority view on this list for saying this, but why not just take HSE, remove the HE2 atom, change NE2 charge from -0.36 to -0.7 and CG from 0.22 to -0.05 (by analogy to HSD), and make some other minor charge adjustments to get a full anion. You may even need to add some new null dihedral parameters after changing the nitrogen atom type from NR1 to NR2, but it’s an aromatic ring so that stuff should be simple by analogy. If the difference between -0.7 and -0.6 is going to be a big deal, then how can one anyway argue that a point charge/LJ representation of Zn is realistic or that that the competition between regular charmm his and this differently parameterized version is not going to cause its own problems?

@therealchrisneale That is sound advice.I did already test that and created a set of Charmm parameters by analogy to HSD and HSE. The following were my parameters.

N NH1 -0.47 0
HN H 0.31 1
CA CT1 0.07 2
HA HB 0.09 3
CB CT2 -0.09 4
HB1 HA 0.09 5
HB2 HA 0.09 6
ND1 NR1 -0.70 7
CG CPH1 -0.05 8
CE1 CPH2 0.25 9
HE1 HR1 0.13 10
NE2 NR2 -0.70 11
CD2 CPH1 0.22 12
HD2 HR3 0.10 13
C C 0.51 14
O O -0.51 15

I reused the parameters for mainchain atoms from the HSE and HSD entries and in a way fused the parameters for the imidazole ring from both standard HIS entries. I also kept the bonds, impropers and cmaps similar to regular histidine and only removed a hydrogen entry from aminoacids.hdb file of regular HIS to treat the unprotonated histidine.
But using these parameters give me a more tetraheadrel conformation with an explicit water molecule than an expected trigonal planar or trigonal pyramidal (with one water). Which is why I’m interested in using other parametrization methods to see if I can still replicate that result.

Indeed, they’re not. The target data for partial charge assignment is not even the QM dipole moment directly, it’s a value overestimated by 15-20% to reflect mean-field, condensed-phase polarization. Water interactions are then used to validate and further refine partial charges, using a cheap QM method (HF/6-31G*) that serendipitously provides an appropriately overestimated strength of interaction (with some additional shifting of minima and scaling of the interaction energy based on TIP3P properties).