Which strategy is best for creating topology files for large molecules?

GROMACS version: 22.04
GROMACS modification: Yes/No
I want to create a topology file for a copolymer such as PLGA with 1100 atoms, to be used with the CHARMM force field for MD simulations of protein encapsulation. As you know, CGenFF only generates topology files for molecules with fewer than 384 atoms. Should I split the copolymer into smaller pieces, generate a topology file for each piece with CGenFF and then combine them, or should I use SwissParam to generate the topology? Which of the two is the better choice?

Any polymer should be parametrized from its constituent monomers, e.g. terminal residues and internal (repeating) building blocks.

Thank you very much for your helpful suggestion. Following your advice, I used Schrödinger Maestro to build the copolymer from residues corresponding to its constituent monomers; it contains 1000 atoms and 123 residues. I created topology files (itp, prm, pdb, top) for one monomer each of PLA and PGA with CGenFF and tried to add them to the charmm36-jul2022 force field. Then, for my polymer (PLGA), I used gmx pdb2gmx to generate the gro file, but I got this error: “Residue ‘PLA’ not found in residue topology database”. I tried the strategies below, but I got the same error again. Could you please tell me whether the strategy I have chosen is correct, and how I can fix the errors?

I opened “residuetypes.dat” and added entries for my monomers:
PLA myforcefield.ff
PGA myforcefield.ff
I also appended the contents of my monomer topology files (top or itp) to the charmm36-jul2022 force field itp files.

If you have a topology for the monomers, you need to write .rtp entries (and perhaps .tdb entries) for these species. The presence of .itp files is irrelevant for the purposes of pdb2gmx.

Editing residuetypes.dat is unnecessary here, and the syntax you used is also wrong. You only need to edit residuetypes.dat if you are introducing a new residue into an existing biopolymer type (like Protein, etc.), which is not the case for this kind of system.

Thank you very much for your reply and help. Following your suggestion, I parametrized my polymer (PEG-PLA-PGA) from its constituent monomers, i.e. terminal residues (for the beginning and end of the chain) and internal (repeating) building blocks. For example, for PGA I introduced three residues (PGI, PGA and PGT) with the help of CGenFF and some manual edits. I then copied and pasted my new residues at the end of aminoacids.rtp. Finally, I appended the new residue names to the file residuetypes.dat in the directory:
PEI Protein
PEG Protein
PET Protein
PLI Protein
PLA Protein
PLT Protein
PGI Protein
PGA Protein
PGT Protein
I then ran “gmx pdb2gmx -f PEG40-PLGA38.pdb -o PEG40-PLGA38_processed.gro -p PLGA_PEG.top -i PLGA_PEG-posre.itp -ter -ignh”, but it stopped with a fatal error: “Atom O in residue PGT 121 was not found in rtp entry PGT with 7 atoms while sorting atoms.” (The whole output has been uploaded as “outputs.dat” along with this text.) Please advise me on how to solve this problem.

For example, this is the PGT entry I added to aminoacids.rtp:
[ PGT ]
;
[ atoms ]
C1 CG321 0.076 1
C2 CG2O2 0.729 2
O1 OG311 -0.602 2
O2 OG2D1 -0.561 2
H3 HGA2 0.090 1
H2 HGA2 0.090 1
H4 HGP1 0.430 2
[ bonds ]
C1 -O1
C1 C2
C1 H3
C1 H2
C2 O1
C2 O2
O1 H4
[ impropers ]

This is the part of the PEG40-PLGA38.pdb file related to PGT:
.
.
.
ATOM 884 C1 PGT A 121 -1.422 -14.696 18.023 1.00 0.00 C
ATOM 885 C2 PGT A 121 -1.885 -16.158 18.129 1.00 0.00 C
ATOM 886 O1 PGT A 121 -0.880 -16.978 18.531 1.00 0.00 O
ATOM 887 O2 PGT A 121 -3.021 -16.546 17.870 1.00 0.00 O
ATOM 888 H3 PGT A 121 -0.781 -14.612 17.147 1.00 0.00 H
ATOM 889 H2 PGT A 121 -0.813 -14.422 18.885 1.00 0.00 H
ATOM 890 H4 PGT A 121 -1.217 -17.860 18.558 1.00 0.00 H
outputs.dat (8.6 KB)

There is no need to define the new residues as protein. This is what is causing the problem, because xlateat.dat is translating O1 to O when it finds the atom in a C-terminal residue of a protein chain. If you do not define your polymer as protein, the match to “protein-cterm” should not be satisfied and it won’t be internally renamed by pdb2gmx.
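One way to convince yourself that the mismatch comes from internal renaming rather than from the input files is to compare the atom names in the .rtp entry with those in the PDB residue. The following is only a minimal Python sketch using the PGT entry and PDB excerpt quoted above. The helper names (`rtp_atom_names`, `pdb_atom_names`, `compare_names`) are hypothetical, and the PDB parsing splits on whitespace to suit the excerpt as pasted here; a real PDB file is column-based, so this is a simplification rather than a general parser.

```python
# Sketch: compare atom names in an .rtp residue entry with a PDB residue.
# All helper names are illustrative; they are not part of GROMACS.

RTP_SECTIONS = {"atoms", "bonds", "angles", "dihedrals", "impropers",
                "cmap", "exclusions", "bondedtypes"}

def rtp_atom_names(rtp_text, residue):
    """Return the atom names listed under [ atoms ] for one residue entry."""
    names, in_residue, in_atoms = [], False, False
    for raw in rtp_text.splitlines():
        line = raw.split(";", 1)[0].strip()  # drop ';' comments
        if not line:
            continue
        if line.startswith("["):
            header = line.strip("[] \t")
            if header in RTP_SECTIONS:
                in_atoms = in_residue and header == "atoms"
            else:  # any other header starts a new residue entry
                in_residue, in_atoms = header == residue, False
            continue
        if in_atoms:
            names.append(line.split()[0])
    return names

def pdb_atom_names(pdb_text, residue, resnum):
    """Return atom names of one residue (assumes whitespace-separated fields)."""
    names = []
    for line in pdb_text.splitlines():
        f = line.split()
        if len(f) >= 6 and f[0] in ("ATOM", "HETATM"):
            if f[3] == residue and f[5] == str(resnum):
                names.append(f[2])
    return names

def compare_names(rtp_names, pdb_names):
    """Return (names only in the PDB, names only in the rtp entry)."""
    only_pdb = [n for n in pdb_names if n not in rtp_names]
    only_rtp = [n for n in rtp_names if n not in pdb_names]
    return only_pdb, only_rtp

RTP = """[ PGT ]
;
 [ atoms ]
 C1 CG321 0.076 1
 C2 CG2O2 0.729 2
 O1 OG311 -0.602 2
 O2 OG2D1 -0.561 2
 H3 HGA2 0.090 1
 H2 HGA2 0.090 1
 H4 HGP1 0.430 2
 [ bonds ]
 C1 -O1
 C1 C2
"""

PDB = """ATOM 884 C1 PGT A 121 -1.422 -14.696 18.023 1.00 0.00 C
ATOM 885 C2 PGT A 121 -1.885 -16.158 18.129 1.00 0.00 C
ATOM 886 O1 PGT A 121 -0.880 -16.978 18.531 1.00 0.00 O
ATOM 887 O2 PGT A 121 -3.021 -16.546 17.870 1.00 0.00 O
ATOM 888 H3 PGT A 121 -0.781 -14.612 17.147 1.00 0.00 H
ATOM 889 H2 PGT A 121 -0.813 -14.422 18.885 1.00 0.00 H
ATOM 890 H4 PGT A 121 -1.217 -17.860 18.558 1.00 0.00 H
"""

only_pdb, only_rtp = compare_names(rtp_atom_names(RTP, "PGT"),
                                   pdb_atom_names(PDB, "PGT", 121))
print("only in PDB:", only_pdb)
print("only in rtp:", only_rtp)
```

For the excerpt above both lists come out empty: the file contains no atom named O, which is consistent with the name being introduced by the translation step inside pdb2gmx.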

Thank you for the excellent explanation and reply. Following your suggestion, I changed the type of the new residues in residuetypes.dat (Protein → Other) and ran pdb2gmx again, but it stopped with a fatal error: “There were 412 missing atoms in molecule Other_chain_A, if you want to use this incomplete topology anyhow, use the option -missing”. Elsewhere in the output it says:
“atom H4 is missing in residue PGT 121 in the pdb file. You might need to add atom H4 to the hydrogen database of building block PGT in the file aminoacids.hdb (see the manual)” (similar messages appear for the other missing hydrogen atoms, 412 missing atoms in total).
I ran pdb2gmx again with -missing. This way I was able to get the .gro file, but without the hydrogen atoms, which is no good for me. Please advise me on how to solve this problem. If I need to build hydrogen database entries and add them to aminoacids.hdb, is it possible to generate the hdb file automatically with the charmm2gmx program, using the str file (created with CGenFF) as input?

Thanking you.

The chemical nature of PGA and PLA is very simple so it should be straightforward to write the .hdb entries by hand using the function types noted in the manual.
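As a rough illustration only, an .hdb entry for the PGT residue quoted earlier in the thread might look like the fragment below. It uses hydrogen-addition type 6 (two tetrahedral hydrogens on a CH2 carbon) and type 2 (a single hydroxyl hydrogen) from the manual. The control atoms are my reading of the bonds in the .rtp entry and should be checked against your structure. Note also that when pdb2gmx adds more than one hydrogen from a single line it appends 1, 2, ... to the base name, so the first line below would produce H11 and H12; the .rtp entry above names these atoms H3 and H2, so the names in the .rtp would have to be made consistent with whatever the .hdb generates. The H11/H12 naming is an assumption for this sketch, not something taken from the thread.

```
PGT     2
2       6       H1      C1      C2      -O1
1       2       H4      O1      C2      C1
```

The first line gives the residue name and the number of hydrogen-addition lines that follow; each subsequent line lists the number of hydrogens to add, the addition type, the hydrogen (base) name, and the control atoms, starting with the heavy atom the hydrogens are attached to.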