In need of guidance to use make_ndx

GROMACS version: 2024.1
GROMACS modification: No

Hello GROMACS community.

I am attempting to follow jalemkul’s umbrella sampling tutorial.
Everything went well until I reached the section Generating configurations.

In particular, I cannot explain to myself how the following instructions actually work:

gmx make_ndx -f npt.gro
...
 > r 1-27
 > name 19 Chain_A
 > r 28-54
 > name 20 Chain_B
 > q

To be more specific:

a) How does make_ndx know from which of the five chains I am selecting residues 1-27?

b) What does name 19 mean? Why not, for example, name 5, which would save one character?

Could a kind-hearted member explain those commands verbosely?

Things that I have done before deciding to post:

  1. I have agreed with an old post stating that “make_ndx is quite cryptic to understand” (May 2021).

  2. I have tried gmx make_ndx followed by h. Interesting but still fully cryptic.

  3. I have tried gmx make_ndx -h, which offers no explanation as to how the logic works.

  4. I have read the reference manual, specifically the part about Using groups. (And lost all hope.)

Thank you.

Ivan

Yes, it would be great to make make_ndx more user-friendly. Anyhow …

a) make_ndx uses the residue numbering in the .gro file, which starts at 1 and does not restart for each subsequent chain. So r 1-27 selects the first 27 residues (chain A), and r 28-54 selects the next 27 residues (chain B).

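b) name 19 Chain_A means that you are renaming selection group 19 to Chain_A. Each r command creates a new group that is appended to the end of the group list, and in the tutorial's system the first new group gets number 19; name 5 would instead rename one of the groups that already exist. I think you will find a list of the other selection groups (generated automatically unless you specify an index file as input to gmx make_ndx) if you look in the make_ndx output. Otherwise, the list should be printed if you just enter a blank line in its interface.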
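
For later readers, here is a small Python sketch of the bookkeeping described in a) and b). It is only a toy model, not GROMACS code: it assumes 19 pre-existing groups (numbered 0 to 18, inferred from the tutorial using 19 for the first new group), five chains of 27 residues each, and the function name make_ndx_commands is made up for illustration.

```python
# Toy model of the make_ndx bookkeeping; this is not GROMACS code.
N_DEFAULT_GROUPS = 19  # assumption: groups 0..18 already exist (inferred from the tutorial)
CHAIN_LENGTH = 27      # residues per chain, including the ACE cap
CHAIN_IDS = "ABCDE"    # five chains, as in the starting PDB file


def make_ndx_commands(n_default=N_DEFAULT_GROUPS,
                      chain_length=CHAIN_LENGTH,
                      chain_ids=CHAIN_IDS):
    """Return make_ndx input lines that create and name one group per chain."""
    commands = []
    next_group = n_default  # groups are numbered from 0, so the first new one gets this number
    for i, chain_id in enumerate(chain_ids):
        first = i * chain_length + 1   # residue numbering runs on across chains
        last = (i + 1) * chain_length
        commands.append(f"r {first}-{last}")                    # creates a new group at the end of the list
        commands.append(f"name {next_group} Chain_{chain_id}")  # renames that newly created group
        next_group += 1
    commands.append("q")  # save the index file and quit
    return commands


if __name__ == "__main__":
    print("\n".join(make_ndx_commands()))
```

The printed lines start with r 1-27, name 19 Chain_A, r 28-54, name 20 Chain_B, exactly as in the tutorial, and continue up to name 23 Chain_E. They could presumably be piped into gmx make_ndx -f npt.gro in the same way as the echo example elsewhere in this thread, but check the group list printed by make_ndx first, because the number of default groups depends on the system.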

I hope that helps for now at least.

Hi MagnusL.

I understand better now.

So, to make it a bit clearer to the next user getting acquainted with GROMACS, the .gro format is leaner than you might expect.

This is the starting structure for the tutorial:

from pymol import cmd
cmd.load('2BEG_model1_capped.pdb')
cmd.get_chains()
['A', 'B', 'C', 'D', 'E']

which shows that there are five chains. All identical, all 27 residues long (not shown).

However, this is the .gro derived from that .pdb:

from pymol import cmd
cmd.load('npt.gro')
cmd.get_chains()
['']

As shown here, the .gro does not have a chain concept.
Residues are included (not shown) but they must be picked from a continuum of concatenated positions:

1, 2, ..., 27, 28, ..., 54, 55, ...
|<-chain A-->|<-chain B-->|<-chain C, etc->
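
To make the point concrete, here is a minimal Python sketch of what one atom line of a .gro file contains and how a chain letter can be inferred from the running residue number. The fixed-width layout is the standard .gro one; the example line is made up (its atom name and coordinates are placeholders, not taken from npt.gro), and the helper names parse_gro_atom_line and chain_of are hypothetical. The chain length of 27 and the five chains are specific to this tutorial system.

```python
import string


def parse_gro_atom_line(line):
    """Split one fixed-width .gro atom line into its fields.

    Layout: residue number (5 chars), residue name (5), atom name (5),
    atom number (5), then x, y, z in nm (8 chars each). There is no chain field.
    """
    return {
        "resnr": int(line[0:5]),
        "resname": line[5:10].strip(),
        "atomname": line[10:15].strip(),
        "atomnr": int(line[15:20]),
        "xyz": tuple(float(line[20 + 8 * k:28 + 8 * k]) for k in range(3)),
    }


def chain_of(resnr, chain_length=27, n_chains=5):
    """Infer the chain letter from the running residue number.

    Returns None for residues beyond the protein (solvent and ions here).
    """
    index = (resnr - 1) // chain_length
    return string.ascii_uppercase[index] if 0 <= index < n_chains else None


# Made-up atom line, built with the .gro format string (placeholder values)
example = "%5d%-5s%5s%5d%8.3f%8.3f%8.3f" % (1, "ACE", "CH3", 1, 1.0, 2.0, 3.0)

if __name__ == "__main__":
    print(parse_gro_atom_line(example))
    for resnr in (27, 28, 135, 136):
        print(resnr, chain_of(resnr))
```

Residue 27 maps to chain A and residue 28 to chain B, while residue 136 (the first SOL) maps to no chain at all.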

Thank you MagnusL.

Ivan

And there is an additional twist to this story:
For make_ndx, the word r (residue) is not synonymous with amino acid.

Residue 1 is not an amino acid.

echo 'list residues' | gmx make_ndx -f npt.gro

output

>
1 ACE     2 LEU     3 VAL     4 PHE     5 PHE     6 ALA     7 GLU     8 ASP     9 VAL
10 GLY    11 SER    12 ASN    13 LYS    14 GLY    15 ALA    16 ILE    17 ILE    18 GLY
19 LEU    20 MET    21 VAL    22 GLY    23 GLY    24 VAL    25 VAL    26 ILE    27 ALA
28 ACE    29 LEU    30 VAL    31 PHE    32 PHE    33 ALA    34 GLU    35 ASP    36 VAL
37 GLY    38 SER    39 ASN    40 LYS    41 GLY    42 ALA    43 ILE    44 ILE    45 GLY
46 LEU    47 MET    48 VAL    49 GLY    50 GLY    51 VAL    52 VAL    53 ILE    54 ALA
55 ACE    56 LEU    57 VAL    58 PHE    59 PHE    60 ALA    61 GLU    62 ASP    63 VAL
64 GLY    65 SER    66 ASN    67 LYS    68 GLY    69 ALA    70 ILE    71 ILE    72 GLY
73 LEU    74 MET    75 VAL    76 GLY    77 GLY    78 VAL    79 VAL    80 ILE    81 ALA
82 ACE    83 LEU    84 VAL    85 PHE    86 PHE    87 ALA    88 GLU    89 ASP    90 VAL
91 GLY    92 SER    93 ASN    94 LYS    95 GLY    96 ALA    97 ILE    98 ILE    99 GLY
100 LEU   101 MET   102 VAL   103 GLY   104 GLY   105 VAL   106 VAL   107 ILE   108 ALA
109 ACE   110 LEU   111 VAL   112 PHE   113 PHE   114 ALA   115 GLU   116 ASP   117 VAL
118 GLY   119 SER   120 ASN   121 LYS   122 GLY   123 ALA   124 ILE   125 ILE   126 GLY
127 LEU   128 MET   129 VAL   130 GLY   131 GLY   132 VAL   133 VAL   134 ILE   135 ALA
136 - 11224 SOL     11225 - 11255 NA      11256 - 11276 CL

Notice that residues 1, 28, 55, 82, and 109 are not amino acids but N-terminal acetylations.
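
The same observation can be reproduced with a few lines of Python. This is only a sketch: the residue names of one chain are copied from the listing above, the chain is repeated five times as in this system, and the set AMINO_ACIDS contains just the 20 standard three-letter codes (force-field-specific variants are not covered). The function name non_amino_acid_residues is made up for illustration.

```python
# Residue names of one chain, copied from the 'list residues' output (residues 1-27)
CHAIN = ("ACE LEU VAL PHE PHE ALA GLU ASP VAL "
         "GLY SER ASN LYS GLY ALA ILE ILE GLY "
         "LEU MET VAL GLY GLY VAL VAL ILE ALA").split()

# The 20 standard amino acids only; anything else is reported as "not an amino acid"
AMINO_ACIDS = {
    "ALA", "ARG", "ASN", "ASP", "CYS", "GLN", "GLU", "GLY", "HIS", "ILE",
    "LEU", "LYS", "MET", "PHE", "PRO", "SER", "THR", "TRP", "TYR", "VAL",
}


def non_amino_acid_residues(resnames):
    """Return (residue number, name) pairs for residues that are not amino acids.

    Residue numbers start at 1 and run on across chains, as in the .gro file.
    """
    return [(number, name)
            for number, name in enumerate(resnames, start=1)
            if name not in AMINO_ACIDS]


protein = CHAIN * 5  # five identical chains, concatenated
caps = non_amino_acid_residues(protein)

if __name__ == "__main__":
    print(caps)
```

So each "27-residue" chain selected with r 1-27, r 28-54, and so on consists of one ACE cap followed by 26 amino acids.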

What a twist for the unwary structural biologist!
:)

Indeed, this can be a bit confusing. In most computational tools, a residue is not defined as strictly as one would expect from a biochemical point of view.

I believe this has to do with the assumption that everything belongs to a residue. This has been the case in, e.g., PDB files for a very long time. The residue is a convenient unit for telling apart monomers as well as the separate small molecules in the system.