We modified the GROMACS source code so that it accepts CIF-format structure files as direct input and generates molecular topology files from them. For very large protein complexes that have no PDB-format file, this streamlines the preprocessing phase of molecular dynamics simulations.
Command: gmx pdb2gmx -f x.cif -o x.gro -p x.top -i x.itp
GitHub link: https://github.com/zyzhangGroup/Gromacs-CIF
Contact person: Hengyue Wang email@example.com
How the work has been tested/reviewed: We compared md5sum checksums of the topology files generated from CIF input against those generated from the corresponding PDB input. The checksums matched, which confirms that both routes produce the same topology.
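For readers who want to repeat this kind of check, here is a minimal Python sketch of an MD5 comparison between two topology files. The file names in the usage comment are hypothetical, and the `skip_comments` option is an assumption of this sketch rather than part of the original test: it ignores lines starting with `;` (the comment character in GROMACS topology files) in case header comments differ between the two runs. With `skip_comments=False` the digest is computed over the raw file bytes, like `md5sum`.

```python
import hashlib


def md5_of_file(path, skip_comments=False):
    """Return the hex MD5 digest of a file, optionally ignoring ';' comment lines."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for line in handle:
            # Assumption of this sketch: comment lines may legitimately differ,
            # so they can be left out of the comparison.
            if skip_comments and line.lstrip().startswith(b";"):
                continue
            digest.update(line)
    return digest.hexdigest()


def same_topology(path_a, path_b, skip_comments=True):
    """True if both files have the same MD5 digest under the chosen comparison mode."""
    return md5_of_file(path_a, skip_comments) == md5_of_file(path_b, skip_comments)


# Usage (hypothetical file names):
#   same_topology("from_cif.top", "from_pdb.top")
```

A byte-for-byte comparison is the strictest option; skipping comments only relaxes it for lines that carry no topology data.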
CIF support has already been discussed, notably whether to use an external library or an internal solution such as yours.
GROMACS is mostly concerned with atoms, coordinates, and possibly also cell dimensions. I am not very familiar with the CIF format, but I see that it supports a large number of data items (such as details of the experimental setup, chemical data, citations, …). These are irrelevant for MD, but they should be skipped cleanly so that files from any (standard-conforming) software are supported.
- Can you confirm whether your implementation is likely to work with any (standard-conforming) CIF file, skipping the data items that are irrelevant for GROMACS without raising an error?
- Did you test with a few files produced by a variety of software to ensure this?
- Does your code support writing out to a CIF file (for visualizing such large structures after simulation)?
Glad you are interested in my work! Here are the answers to your questions:
We have tested our software only with CIF files downloaded from the Protein Data Bank. We skip all data that does not correspond to the ATOM and HETATM records of a PDB file, so a CIF file from another source should be supported as long as it contains that information. Additionally, our code does not support writing CIF files.
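To illustrate what "skipping everything except the ATOM and HETATM data" can look like, here is a minimal Python sketch. It is not the code from the repository. It assumes PDB-style mmCIF, where atoms sit in a `loop_` block whose columns start with `_atom_site.` and each row fits on one line. The function names are made up for this example.

```python
def split_cif_row(line):
    """Split one CIF data row into values, honouring quoted values such as "O5'"."""
    tokens, i, n = [], 0, len(line)
    while i < n:
        if line[i].isspace():
            i += 1
        elif line[i] in "'\"":
            quote, j = line[i], i + 1
            # A closing quote only counts if followed by whitespace or end of line.
            while j < n and not (line[j] == quote and (j + 1 == n or line[j + 1].isspace())):
                j += 1
            tokens.append(line[i + 1:j])
            i = j + 1
        else:
            j = i
            while j < n and not line[j].isspace():
                j += 1
            tokens.append(line[i:j])
            i = j
    return tokens


def read_atom_site(lines):
    """Collect ATOM/HETATM rows from the _atom_site loop; ignore every other category."""
    atoms, columns = [], []
    in_text = False   # inside a ';'-delimited multi-line text field
    state = "skip"    # one of "skip", "header", "rows"
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith(";"):
            in_text = not in_text
            continue
        if in_text:
            continue
        stripped = line.strip()
        if stripped == "loop_":
            state, columns = "header", []
            continue
        if state == "header":
            if stripped.startswith("_"):
                columns.append(stripped)
                continue
            # Column names are finished: keep rows only for the _atom_site loop.
            is_atoms = bool(columns) and columns[0].startswith("_atom_site.")
            state = "rows" if is_atoms else "skip"
        if state == "rows":
            if not stripped or stripped.startswith(("#", "_", "data_")):
                state = "skip"
                continue
            values = split_cif_row(stripped)
            if len(values) != len(columns):
                continue  # assumption: rows spanning several lines are not handled
            row = dict(zip(columns, values))
            if row.get("_atom_site.group_PDB") in ("ATOM", "HETATM"):
                atoms.append(row)
    return atoms
```

Reading a file would then be `read_atom_site(open("x.cif"))`, with coordinates available under keys such as `_atom_site.Cartn_x`. Because unrelated categories are never interpreted, unknown data items do not cause errors. A file that wraps atom rows across lines would need a full CIF tokenizer.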