Issues with pdb2gmx for an expanded protein

GROMACS version: version 2020.1-Ubuntu-2020.1-1
GROMACS modification: Yes/No

Hello,

I expanded 2Y3J four times (pdb given below) on Mercury CCDC. I then ran the following command to create a gro file:
gmx pdb2gmx -f 2y3j_exp4.pdb -o 2y3j.gro -water none -ff oplsaa -p topol.top -n index_pdb2gmx.ndx -ignh

However, I run into the error:
Fatal error:
Atom OXT in residue MET 6 was not found in rtp entry MET with 17 atoms
while sorting atoms.

I read that adding -ignh used to resolve this, but that option is apparently no longer valid. How can I fix this?

Thank you.

gmx pdb2gmx -f input.pdb -o processed.gro -ignh -ter -water spc
Then select the termini and the force field when prompted.
The residue names in your protein do not match the rtp entries of the GROMACS force field. You could customize the rtp file instead, but that would be tedious, since GROMACS will keep throwing errors for each amino acid.
Hope it helps.

Dear Yogesh,

Thank you for the suggestion. I tried:

gmx pdb2gmx -f 2y3j_exp4.pdb -o processed.gro -ignh -ter -water tip4p

However, it now gives a dangling bond error:

There is a dangling bond at at least one of the terminal ends. Fix your
coordinate file, add a new terminal database entry (.tdb), or select the
proper existing terminal entry.

Thanks.

pdb2gmx is not detecting the separate chains correctly. Each chain needs either its own unique chain identifier or needs to be separated by TER.
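
To see how many chains a file contains under that rule, a rough check along the following lines may help. This is only a sketch in Python, not pdb2gmx's actual code: the function name count_chain_blocks is made up for illustration, and it assumes the file is in proper fixed-column PDB format, so that the chain ID sits in column 22.

```python
def count_chain_blocks(pdb_lines):
    """Group ATOM/HETATM records into blocks.

    A new block starts after every TER record or whenever the chain ID
    (column 22, i.e. index 21) changes. Returns (chain_id, atom_count) tuples.
    """
    blocks = []        # each entry is [chain_id, atom_count]
    start_new = True   # True until the first atom, and again after each TER
    for line in pdb_lines:
        if line.startswith("TER"):
            start_new = True
        elif line.startswith(("ATOM", "HETATM")):
            chain_id = line[21] if len(line) > 21 else " "
            if start_new or chain_id != blocks[-1][0]:
                blocks.append([chain_id, 0])
                start_new = False
            blocks[-1][1] += 1
    return [tuple(block) for block in blocks]


# Two copies of the peptide meet here with no TER and the same chain ID.
demo = [
    "ATOM     41  CE  MET A   6       1.673  -1.242  26.340  1.00 54.57           C",
    "ATOM     42  OXT MET A   6       3.476   4.532  24.892  1.00 59.23           O",
    "ATOM     43  N   ALA A   1       1.807  20.028  44.815  1.00 28.70           N",
]
print(count_chain_blocks(demo))
```

If the number of blocks is smaller than the number of peptide copies you expect, the copies are not being separated by chain ID or by TER.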

Hi Justin,

Each chain is separated by TER, e.g.

TER 86 MET B 6

Please post the entire screen output from pdb2gmx.

This is the GROMACS terminal output:

All occupancies are one
Opening force field file /usr/share/gromacs/top/oplsaa.ff/atomtypes.atp
Reading residue database… (Oplsaa)
Opening force field file /usr/share/gromacs/top/oplsaa.ff/aminoacids.rtp
Opening force field file /usr/share/gromacs/top/oplsaa.ff/aminoacids.hdb
Opening force field file /usr/share/gromacs/top/oplsaa.ff/aminoacids.n.tdb
Opening force field file /usr/share/gromacs/top/oplsaa.ff/aminoacids.c.tdb

Back Off! I just backed up topol.top to ./#topol.top.26#
Processing chain 1 ‘A’ (168 atoms, 24 residues)
Identified residue ALA1 as a starting terminus.
Identified residue MET6 as a ending terminus.
8 out of 8 lines of specbond.dat converted successfully
Special Atom Distance matrix:
MET6 MET6 MET6
SD40 SD82 SD124
MET6 SD82 1.564
MET6 SD124 3.128 1.564
MET6 SD166 4.692 3.128 1.564
Select start terminus type for ALA-1
0: NH3+
1: ZWITTERION_NH3+ (only use with zwitterions containing exactly one residue)
2: NH2
3: None
0
Start terminus ALA-1: NH3+
Select end terminus type for MET-6
0: COO-
1: ZWITTERION_COO- (only use with zwitterions containing exactly one residue)
2: COOH
3: None
0
End terminus MET-6: COO-


Program: gmx pdb2gmx, version 2020.1-Ubuntu-2020.1-1
Source file: src/gromacs/gmxpreprocess/pdb2gmx.cpp (line 745)

Fatal error:
Atom OXT in residue MET 6 was not found in rtp entry MET with 17 atoms
while sorting atoms.

Hi, try changing your terminus selections, e.g. 0 and 0, 0 and 2, 2 and 0, 3 and 3, etc.

@wcc6571 you left out a lot of screen output about all the WARNINGS of non-sequential residues. The file has a crazy amount of fragmented stuff at the end of it. I suggest you delete redundant and incorrectly formatted coordinate entries.

The actual problem is that you have four chains with the same chain ID. So there are four polypeptides labeled A, each with a Met-6 as the C-terminus (and correctly having OXT). pdb2gmx sees residue 6 with another residue following it, which it interprets as meaning that Met-6 is not actually at a terminus:

ATOM     40  SD  MET A   6       3.159  -0.365  25.912  1.00 52.74           S
ATOM     41  CE  MET A   6       1.673  -1.242  26.340  1.00 54.57           C
ATOM     42  OXT MET A   6       3.476   4.532  24.892  1.00 59.23           O
ATOM     43  N   ALA A   1       1.807  20.028  44.815  1.00 28.70           N
ATOM     44  CA  ALA A   1       2.511  20.987  43.923  1.00 26.55           C
ATOM     45  C   ALA A   1       2.923  20.345  42.578  1.00 26.52           C

Each hexapeptide needs to be defined as its own chain and needs to be separated by TER. If you correct all of those problems, pdb2gmx will work.
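
One way to make that correction without hand-editing could look like the sketch below. It is an illustration under assumptions, not a tested tool: relabel_chains and CHAIN_IDS are hypothetical names, the input is assumed to already be in correct fixed-column PDB format, and a new copy is assumed to begin wherever the residue number (columns 23-26) drops, as it does between MET 6 and ALA 1 in the excerpt above.

```python
import string

# Single-character chain IDs to hand out in order (62 available).
CHAIN_IDS = string.ascii_uppercase + string.ascii_lowercase + string.digits


def relabel_chains(pdb_lines):
    """Give each peptide copy its own chain ID and end it with a TER line."""
    out = []
    chain_index = -1
    prev_resnum = None  # None means no chain is currently open
    for line in pdb_lines:
        if line.startswith(("ATOM", "HETATM")):
            resnum = int(line[22:26])
            # A drop in residue number is taken as the start of a new copy.
            if prev_resnum is not None and resnum < prev_resnum:
                out.append("TER")
                prev_resnum = None
            if prev_resnum is None:
                chain_index += 1
                if chain_index >= len(CHAIN_IDS):
                    raise ValueError("more chains than single-character IDs")
            prev_resnum = resnum
            # Overwrite only column 22 (index 21), keeping all other columns.
            out.append(line[:21] + CHAIN_IDS[chain_index] + line[22:])
        else:
            # Any other record closes the open chain; old TER lines are replaced.
            if prev_resnum is not None:
                out.append("TER")
                prev_resnum = None
            if not line.startswith("TER"):
                out.append(line)
    if prev_resnum is not None:
        out.append("TER")
    return out


demo = [
    "ATOM     41  CE  MET A   6       1.673  -1.242  26.340  1.00 54.57           C",
    "ATOM     42  OXT MET A   6       3.476   4.532  24.892  1.00 59.23           O",
    "ATOM     43  N   ALA A   1       1.807  20.028  44.815  1.00 28.70           N",
]
for fixed_line in relabel_chains(demo):
    print(fixed_line)
```

In this sketch the ALA 1 line comes out with chain ID B and a TER line is written before it. Check the result by eye before passing it to pdb2gmx, since the residue-number rule would not hold for a file whose copies are numbered continuously.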

Thanks for your reply, @jalemkul @Yogesh. I’ve given the four chains different IDs (i.e. A to AF) and added a TER record after each MET 6 in the PDB file. I ran the same command as before:

gmx pdb2gmx -f 2y3j_hbond_edit3.pdb -o processed.gro -ignh -ter -water tip4p

Now I get the following error, where GROMACS doesn’t seem to read the different chain IDs:

GROMACS: gmx pdb2gmx, version 2020.1-Ubuntu-2020.1-1
Executable: /usr/bin/gmx
Data prefix: /usr
Working dir: /home/cwc53_unix/2Y3J_expanded_edit2
Command line:
gmx pdb2gmx -f 2y3j_hbond_edit3.pdb -o 2Y3J_processed_hbondedit3.gro -water tip4p -missing

Select the Force Field:
From ‘/usr/share/gromacs/top’:
1: AMBER03 protein, nucleic AMBER94 (Duan et al., J. Comp. Chem. 24, 1999-2012, 2003)
2: AMBER94 force field (Cornell et al., JACS 117, 5179-5197, 1995)
3: AMBER96 protein, nucleic AMBER94 (Kollman et al., Acc. Chem. Res. 29, 461-469, 1996)
4: AMBER99 protein, nucleic AMBER94 (Wang et al., J. Comp. Chem. 21, 1049-1074, 2000)
5: AMBER99SB protein, nucleic AMBER94 (Hornak et al., Proteins 65, 712-725, 2006)
6: AMBER99SB-ILDN protein, nucleic AMBER94 (Lindorff-Larsen et al., Proteins 78, 1950-58, 2010)
7: AMBERGS force field (Garcia & Sanbonmatsu, PNAS 99, 2782-2787, 2002)
8: CHARMM27 all-atom force field (CHARM22 plus CMAP for proteins)
9: GROMOS96 43a1 force field
10: GROMOS96 43a2 force field (improved alkane dihedrals)
11: GROMOS96 45a3 force field (Schuler JCC 2001 22 1205)
12: GROMOS96 53a5 force field (JCC 2004 vol 25 pag 1656)
13: GROMOS96 53a6 force field (JCC 2004 vol 25 pag 1656)
14: GROMOS96 54a7 force field (Eur. Biophys. J. (2011), 40, 843-856, DOI: 10.1007/s00249-011-0700-9)
15: OPLS-AA/L all-atom force field (2001 aminoacid dihedrals)
15

Using the Oplsaa force field in directory oplsaa.ff

going to rename oplsaa.ff/aminoacids.r2b
Opening force field file /usr/share/gromacs/top/oplsaa.ff/aminoacids.r2b
Reading 2y3j_hbond_edit3.pdb…
Read ‘CSD ENTRY 0001’, 8 atoms
Analyzing pdb file
Splitting chemical chains based on TER records or chain id changing.
WARNING: Chain identifier ‘0’ is used in two non-sequential blocks.
They will be treated as separate chains unless you reorder your file.
There are 2 chains and 0 blocks of water and 3 residues with 8 atoms

chain #res #atoms
1 ‘0’ 1 4
2 ‘0’ 2 4

All occupancy fields zero. This is probably not an X-Ray structure
Opening force field file /usr/share/gromacs/top/oplsaa.ff/atomtypes.atp
Reading residue database… (Oplsaa)
Opening force field file /usr/share/gromacs/top/oplsaa.ff/aminoacids.rtp
Opening force field file /usr/share/gromacs/top/oplsaa.ff/aminoacids.hdb
Opening force field file /usr/share/gromacs/top/oplsaa.ff/aminoacids.n.tdb
Opening force field file /usr/share/gromacs/top/oplsaa.ff/aminoacids.c.tdb

Back Off! I just backed up topol.top to ./#topol.top.1#
Processing chain 1 ‘0’ (4 atoms, 1 residues)
Problem with chain definition, or missing terminal residues.
This chain does not appear to contain a recognized chain molecule.
If this is incorrect, you can edit residuetypes.dat to modify the behavior.
8 out of 8 lines of specbond.dat converted successfully


Program: gmx pdb2gmx, version 2020.1-Ubuntu-2020.1-1
Source file: src/gromacs/gmxpreprocess/resall.cpp (line 557)

Fatal error:
Residue ‘B20’ not found in residue topology database

For more information and tips for troubleshooting, please check the GROMACS
website at http://www.gromacs.org/Documentation/Errors

I’m struggling to fix this, would you have any suggestions?

Your file is still misformatted: instead of A-F, your chain IDs are being read as 0, and according to the warning they are still non-sequential.

Would you know how to correct for this? Thank you.

You need to share what the file actually looks like for anyone to be able to say. Something is wrong with the way your input file is formatted: it is not what pdb2gmx expects, so the information is not being read “correctly”, or at least not the way you think it should be.

@Dr_DBW

This is the pdb I’m working with: https://drive.google.com/file/d/1QYg2GkNbCp2Ja31aBjLMGL07099mOKhE/view?usp=sharing

I’ve attached the command output. I don’t understand why it cannot detect the different protein chains, even though I manually added a TER record. It’s definitely a formatting issue, because I quadrupled the protein structure in Mercury (CCDC software) beforehand, and things work on the single 2Y3J structure. I would really appreciate it if you could have a look.

Command line:
gmx pdb2gmx -f 2y3j_exp.pdb -o processed_exp.gro -ignh -ter

Select the Force Field:
From ‘/usr/share/gromacs/top’:
1: AMBER03 protein, nucleic AMBER94 (Duan et al., J. Comp. Chem. 24, 1999-2012, 2003)
2: AMBER94 force field (Cornell et al., JACS 117, 5179-5197, 1995)
3: AMBER96 protein, nucleic AMBER94 (Kollman et al., Acc. Chem. Res. 29, 461-469, 1996)
4: AMBER99 protein, nucleic AMBER94 (Wang et al., J. Comp. Chem. 21, 1049-1074, 2000)
5: AMBER99SB protein, nucleic AMBER94 (Hornak et al., Proteins 65, 712-725, 2006)
6: AMBER99SB-ILDN protein, nucleic AMBER94 (Lindorff-Larsen et al., Proteins 78, 1950-58, 2010)
7: AMBERGS force field (Garcia & Sanbonmatsu, PNAS 99, 2782-2787, 2002)
8: CHARMM27 all-atom force field (CHARM22 plus CMAP for proteins)
9: GROMOS96 43a1 force field
10: GROMOS96 43a2 force field (improved alkane dihedrals)
11: GROMOS96 45a3 force field (Schuler JCC 2001 22 1205)
12: GROMOS96 53a5 force field (JCC 2004 vol 25 pag 1656)
13: GROMOS96 53a6 force field (JCC 2004 vol 25 pag 1656)
14: GROMOS96 54a7 force field (Eur. Biophys. J. (2011), 40, 843-856, DOI: 10.1007/s00249-011-0700-9)
15: OPLS-AA/L all-atom force field (2001 aminoacid dihedrals)
15

Using the Oplsaa force field in directory oplsaa.ff

Opening force field file /usr/share/gromacs/top/oplsaa.ff/watermodels.dat

Select the Water Model:
1: TIP4P TIP 4-point, recommended
2: TIP4PEW TIP 4-point with Ewald
3: TIP3P TIP 3-point
4: TIP5P TIP 5-point (see http://redmine.gromacs.org/issues/1348 for issues)
5: TIP5P TIP 5-point improved for Ewald sums
6: SPC simple point charge
7: SPC/E extended simple point charge
8: None
1
going to rename oplsaa.ff/aminoacids.r2b
Opening force field file /usr/share/gromacs/top/oplsaa.ff/aminoacids.r2b
Reading 2y3j_exp.pdb…
Read ‘AMYLOID BETA A4 PROTEIN; 6 AMYLOID PROTEIN, CEREBRAL VASCULAR AMYLOID PEPTIDE, CVAP, PREA4,; 7 PROTEASE NEXIN-II, PN-II’, 8 atoms
Analyzing pdb file
Splitting chemical chains based on TER records or chain id changing.
WARNING: Chain identifier ‘0’ is used in two non-sequential blocks.
They will be treated as separate chains unless you reorder your file.
There are 2 chains and 0 blocks of water and 4 residues with 8 atoms

chain #res #atoms
1 ‘0’ 2 4
2 ‘0’ 2 4

All occupancy fields zero. This is probably not an X-Ray structure
Opening force field file /usr/share/gromacs/top/oplsaa.ff/atomtypes.atp
Reading residue database… (Oplsaa)
Opening force field file /usr/share/gromacs/top/oplsaa.ff/aminoacids.rtp
Opening force field file /usr/share/gromacs/top/oplsaa.ff/aminoacids.hdb
Opening force field file /usr/share/gromacs/top/oplsaa.ff/aminoacids.n.tdb
Opening force field file /usr/share/gromacs/top/oplsaa.ff/aminoacids.c.tdb

Back Off! I just backed up topol.top to ./#topol.top.22#
Processing chain 1 ‘0’ (4 atoms, 2 residues)
Problem with chain definition, or missing terminal residues.
This chain does not appear to contain a recognized chain molecule.
If this is incorrect, you can edit residuetypes.dat to modify the behavior.
8 out of 8 lines of specbond.dat converted successfully


Program: gmx pdb2gmx, version 2020.1-Ubuntu-2020.1-1
Source file: src/gromacs/gmxpreprocess/resall.cpp (line 557)

Fatal error:
Residue ‘B20’ not found in residue topology database

The spacing of the file is totally wrong and you have an unknown residue as the error states. PDB is a fixed-column format so you need to properly align all the fields.
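
To illustrate what "fixed-column" means in practice, here is a minimal Python sketch that writes an ATOM record with every field in its standard PDB column. The function names are hypothetical, and reformat_loose_line rests on a guess about the broken file: that each ATOM line holds exactly twelve whitespace-separated fields in the usual order (record, serial, atom name, residue name, chain, residue number, x, y, z, occupancy, B-factor, element). If your file differs, the parsing step needs adapting.

```python
def format_atom_line(serial, name, resname, chain, resseq, x, y, z,
                     occupancy=1.0, bfactor=0.0, element=""):
    """Build one ATOM record with each field in its fixed PDB column."""
    # Names shorter than four characters conventionally start in column 14.
    name_field = name if len(name) >= 4 else " " + name.ljust(3)
    return (
        f"ATOM  {serial:5d} {name_field:4s} {resname:>3s} {chain:1s}{resseq:4d}    "
        f"{x:8.3f}{y:8.3f}{z:8.3f}{occupancy:6.2f}{bfactor:6.2f}"
        f"          {element:>2s}"
    )


def reformat_loose_line(line):
    """Re-emit a whitespace-separated ATOM line in fixed columns.

    Assumes exactly 12 fields in standard order (an assumption about the input).
    """
    fields = line.split()
    if len(fields) != 12 or fields[0] != "ATOM":
        raise ValueError(f"unexpected field layout: {line!r}")
    return format_atom_line(
        int(fields[1]), fields[2], fields[3], fields[4], int(fields[5]),
        float(fields[6]), float(fields[7]), float(fields[8]),
        float(fields[9]), float(fields[10]), fields[11],
    )


loose = "ATOM 42 OXT MET A 6 3.476 4.532 24.892 1.00 59.23 O"
fixed = reformat_loose_line(loose)
print(fixed)
# Column 22 (index 21) holds the chain ID; columns 23-26 hold the residue number.
print(repr(fixed[21]), repr(fixed[22:26]))
```

Printing the slices for a few lines of the original file is also a quick way to see what pdb2gmx is likely reading as the chain ID and residue number.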