Issues when merging trajectories in bulk with trjcat

GROMACS version:2021
GROMACS modification: Yes

The directory contains 1250 .trr trajectory files named MHPC_post_data_0000400000_4.trr, MHPC_post_data_0002000000_4.trr, MHPC_post_data_0003600000_4.trr, …, MHPC_post_data_1998800000_4.trr.
The script I am using is as follows:

#!/bin/bash
# Specify the input file directory
input_directory="/data/Liws2021/hexanediol/million_3_result/trr/"
# Specify the output file name
output_file="/data/Liws2021/hexanediol/million_3_result/million_3pc_4us_combined.xtc"
# Use the ls command to get the list of input files, sorted by file name
file_list=$(ls -v "${input_directory}"MHPC_post_data_*.trr)
# Use the gmx trjcat command to concatenate all files
# gmx trjcat -f $file_list -o "$output_file" -cat -settime
# $file_list is left unquoted on purpose so that it splits into one argument per file
gmx trjcat -f $file_list -o "$output_file"

The aim is to obtain a trajectory of 5000 frames spanning 4 microseconds.
However, after merging, I found topology issues in frames with timestamps 1162400ps, 1163200ps, 1164000ps, 1164800ps in the resulting .xtc file. I used the gmx check command to inspect the file:

Command line:
gmx check -f million_3pc_4us_combined_protein.xtc

Checking file million_3pc_4us_combined_protein.xtc
Reading frame 0 time 800.000
#Atoms 1008
Precision 0.001 (nm)
Reading frame 1400 time 1120800.000
Warning at frame 1452: there are 637 particles with all coordinates zero
Warning at frame 1453: there are 38 particles with all coordinates zero
Warning at frame 1454: there are 75 particles with all coordinates zero
Warning at frame 1455: there are 497 particles with all coordinates zero
Reading frame 4000 time 3200800.250

Item #frames Timestep (ps)
Step 5000 800
Time 5000 800
Lambda 0
Coords 5000 800
Velocities 0
Forces 0
Box 5000 800

However, the .gro file extracted from the untreated .trr files showed no abnormalities in atomic coordinates.
I suspect that the issue arises while merging the .trr files into the .xtc file, but I am unsure of the specific cause. I would appreciate assistance and clarification from the community.

Have you tested if the error occurs in a recent version of GROMACS (2023.3)?

Thank you for your reply.
Oh, I haven’t tried that yet. I can install the new version to test it.
Actually, I have previously processed another batch of .trr files with the same version and the same operations, and I didn’t encounter this issue during merging or the subsequent analysis. I wonder whether something different about the .trr files for these frames, introduced when the run was continued, caused this error.
Thanks again for your suggestions and help.

I don’t think it will be solved in 2023.3 either, but it’s always a good thing to try first. If it’s a bug (which is not certain yet), it is important to check whether it has already been fixed.

If it still fails in 2023.3, it would be interesting to get some more details about the files. You say that “the .gro file extracted from the untreated .trr files showed no abnormalities”. Do you mean the .gro file from the last frame, or have you converted all frames to .gro files? I suspect that some of the frames in one of the trr files are broken for some reason. Could you run gmx check on the trr file corresponding to frames 1453-1455 in the xtc file?
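With 1250 input files it may be easier to run that check over every file than to pick one by hand. The following is only a minimal sketch: the function name `find_zero_coord_files` is made up for illustration, it assumes `gmx` is on the PATH, and it relies on the warning containing the text "coordinates zero", as in the gmx check output quoted in this thread.

```shell
#!/bin/bash
# Hypothetical helper: list every trajectory file for which `gmx check`
# warns about particles with all coordinates zero.
# Prints "<file><TAB><number of warning lines>" for each affected file.
find_zero_coord_files() {
    local f n
    for f in "$@"; do
        # Merge stderr into stdout so the warnings are caught whichever
        # stream gmx check writes them to.
        n=$(gmx check -f "$f" 2>&1 | grep -c "coordinates zero")
        if [ "$n" -gt 0 ]; then
            printf '%s\t%s\n' "$f" "$n"
        fi
    done
}
```

It could be called as `find_zero_coord_files /data/Liws2021/hexanediol/million_3_result/trr/MHPC_post_data_*.trr`. Files that produce no such warning are not listed.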

I converted the four frames to .gro files from million_3pc_4us_combined_protein.xtc and found that some atomic coordinates were set to zero.
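For reference, zeroed atoms in an extracted .gro file can be counted rather than spotted by eye. This is a rough sketch with assumptions: `count_zero_atoms_gro` is a hypothetical helper name, the file holds a single frame, and it uses the default fixed-column .gro layout (three 8-character coordinate fields starting at column 21).

```shell
#!/bin/bash
# Hypothetical helper: count atoms whose x, y and z are all exactly zero
# in a single-frame .gro file with the default fixed-column layout.
count_zero_atoms_gro() {
    awk '
        NR == 2 { natoms = $1 + 0; next }      # second line holds the atom count
        NR > 2 && NR <= natoms + 2 {           # atom lines only, not the box line
            x = substr($0, 21, 8) + 0
            y = substr($0, 29, 8) + 0
            z = substr($0, 37, 8) + 0
            if (x == 0 && y == 0 && z == 0) count++
        }
        END { print count + 0 }
    ' "$1"
}
```

A call such as `count_zero_atoms_gro frame.gro` (file name chosen for illustration) prints a single number. An atom that really does sit at the origin would be counted too, so a count of one or two is not necessarily a sign of corruption.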

I checked the two original .trr files, MHPC_post_data_1162000000_4.trr and MHPC_post_data_1163600000_4.trr, using gmx check, and no errors were detected. Here are the results:

Command line:
gmx check -f MHPC_post_data_1162000000_4.trr
Checking file MHPC_post_data_1162000000_4.trr
trr version: GMX_trn_file (single precision)
Reading frame 0 time 2324000.000
#Atoms 909854
Last frame 3 time 2326400.000
Item #frames Timestep (ps)
Step 4 800
Time 4 800
Lambda 4 800
Coords 4 800
Velocities 4 800
Forces 0
Box 4 800

Command line:
gmx check -f MHPC_post_data_1163600000_4.trr
Checking file MHPC_post_data_1163600000_4.trr
trr version: GMX_trn_file (single precision)
Reading frame 0 time 2327200.000
#Atoms 909854
Last frame 3 time 2329600.000
Item #frames Timestep (ps)
Step 4 800
Time 4 800
Lambda 4 800
Coords 4 800
Velocities 4 800
Forces 0
Box 4 800

I also extracted the corresponding .gro files from the original .trr files, and it seems that there are no atomic coordinates set to zero either.

Thanks again for your suggestions and help.

Hi,

I’m a bit confused by your file names and times. MHPC_post_data_1162000000_4.trr starts at time 2324000 ps, so it seems to come after the broken frames in the xtc file (at times 1162400, 1163200, 1164000 and 1164800 ps), right? Do you have the corresponding trr files, i.e., something like MHPC_post_data_581000000_4.trr and MHPC_post_data_581200000_4.trr (or similar)? I’d expect the broken frames to be somewhere there.
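The mapping from a time in the merged trajectory back to a file name can be sketched as below. Everything in it is inferred from numbers quoted in this thread and not from the actual run input: the number in the file name is taken to be the MD step of the first frame in that file, the time step is taken to be 0.002 ps (so 500 steps per ps), and consecutive files are 1600000 steps apart starting at step 400000. The function name `trr_file_for_time` is hypothetical.

```shell
#!/bin/bash
# Assumed constants, inferred from the file names and gmx check times in this thread.
STEPS_PER_PS=500          # assumes a 0.002 ps time step
FIRST_FILE_STEP=400000    # first file is MHPC_post_data_0000400000_4.trr
STEPS_PER_FILE=1600000    # spacing between consecutive file names

# Print the name of the .trr file expected to contain the frame at the
# given time (an integer number of ps, at or after the first frame).
trr_file_for_time() {
    local time_ps=$1
    local step=$(( time_ps * STEPS_PER_PS ))
    local index=$(( (step - FIRST_FILE_STEP) / STEPS_PER_FILE ))
    local start=$(( FIRST_FILE_STEP + index * STEPS_PER_FILE ))
    printf 'MHPC_post_data_%010d_4.trr\n' "$start"
}
```

Under these assumptions, `trr_file_for_time 1163200` prints `MHPC_post_data_0581200000_4.trr`, and `trr_file_for_time 2324000` prints `MHPC_post_data_1162000000_4.trr`, which agrees with the start time gmx check reports for that file.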

I apologize for my oversight. I misunderstood the meaning of the file names.

Following your advice, I have located the MHPC_post_data_0581200000_4.trr file along with the corresponding entries in its log file:

580900000 299.941742 0.000000 1.701561 0.000000 0.000000 210.000000 210.000000 210.000000
581000000 300.292358 0.000000 1.703550 0.000000 0.000000 210.000000 210.000000 210.000000
581100000 299.741882 0.000000 1.700427 0.000000 0.000000 210.000000 210.000000 210.000000
581200000 300.394958 0.000000 1.704132 0.000000 0.000000 210.000000 210.000000 210.000000
581300000 299.922394 0.000000 1.701452 0.000000 0.000000 210.000000 210.000000 210.000000
581400000 299.869507 0.000000 1.701151 0.000000 0.000000 210.000000 210.000000 210.000000
581500000 300.224152 0.000000 1.703164 0.000000 0.000000 210.000000 210.000000 210.000000
581600000 299.812256 0.000000 1.700827 0.000000 0.000000 210.000000 210.000000 210.000000
581700000 299.658508 0.000000 1.699955 0.000000 0.000000 210.000000 210.000000 210.000000
581800000 300.102234 0.000000 1.702472 0.000000 0.000000 210.000000 210.000000 210.000000
581900000 300.129486 0.000000 1.702626 0.000000 0.000000 210.000000 210.000000 210.000000
582000000 299.907349 0.000000 1.701366 0.000000 0.000000 210.000000 210.000000 210.000000
582100000 299.947296 0.000000 1.701593 0.000000 0.000000 210.000000 210.000000 210.000000
582200000 300.343048 0.000000 1.703838 0.000000 0.000000 210.000000 210.000000 210.000000
582300000 299.867889 0.000000 1.701142 0.000000 0.000000 210.000000 210.000000 210.000000
582400000 299.865204 0.000000 1.701127 0.000000 0.000000 210.000000 210.000000 210.000000

These entries correspond to the frames with issues. After running gmx check on this file, the problem was indeed identified:

Checking file MHPC_post_data_0581200000_4.trr
trr version: GMX_trn_file (single precision)
Reading frame 0 time 1162400.000
#Atoms 909854
Reading frame 1 time 1163200.000 Warning at frame 1: there are 208723 particles with all coordinates zero
Reading frame 2 time 1164000.000 Warning at frame 2: there are 211680 particles with all coordinates zero
Reading frame 3 time 1164800.000 Warning at frame 3: there are 212727 particles with all coordinates zero
Last frame 3 time 1164800.000
Item #frames Timestep (ps)
Step 4 800
Time 4 800
Lambda 4 800
Coords 4 800
Velocities 4 800
Forces 0
Box 4 800

Now I understand that the issue lies in the original data and is unrelated to the trjcat process.

I sincerely appreciate your prompt response and assistance amidst your busy schedule.

I’m glad the problem was identified. I hope you can work around it.