Post-run folder cleanup to free space

GROMACS version: All
GROMACS modification: No

Hi all,
I’m the sysadmin in charge of the servers for a group of computational chemists at a university.
I have a couple of folders to archive, and some of them are extremely large.
As the user has left, I’m wondering whether there are files we can safely delete without compromising the experiment results.

For instance, I have a directory containing “frames-xxxx-xxxx” folders, from
frames_0-100 to frames_9901-10000, which all contain framexxxx.gro files.
That is 100 folders of 1.2 GB each.
Yes, it’s only 120 GB for this run, but I have 14 TB in total and I’m not really excited to gzip all that :-)

So, is there documentation somewhere, or a list of files we can erase before archiving things?

Many thanks in advance for your help

Quick clarification: I’m not a chemist at all and I don’t know how to use GROMACS.

It would be invaluable to have the cooperation of the ex-user, if possible.

I suspect these frames come from a single trajectory file, which will be much smaller than the sum of all the files, and that they were written out individually for analysis. In that case, it would be perfectly fine to delete them. However, without the original trajectory file and instructions on how to re-generate them if necessary, just deleting them would hurt reproducibility. Is there a .xtc file and a script or instructions somewhere, by any chance? (Note that a collection of .gro files should compress relatively well, though.)
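To get a feel for how well the .gro files would compress before committing to gzipping everything, one could measure a single sample file. This is only a rough sketch: the function name gzip_estimate is made up for illustration, and it assumes a POSIX shell with gzip and wc available. It writes nothing to disk.

```shell
# Print "<original bytes> <gzipped bytes>" for one file.
# The compressed stream is only counted, never stored.
gzip_estimate() {
    # $(( ... )) strips the padding some wc implementations add.
    orig=$(( $(wc -c < "$1") ))
    packed=$(( $(gzip -c "$1" | wc -c) ))
    echo "$orig $packed"
}

# Example call (the path is a placeholder, adjust to a real frame file):
#   gzip_estimate ./Asp40-Asp115/frames_0-100/framexxxx.gro
```

Multiplying the ratio from one or two sample files by the total size gives a ballpark figure only, since individual files may compress differently.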

Thanks for your answer!
It looks like I do have other files.
So the .gro files can be re-created from the .xtc?

-rw-r--r-- 1 1071 1013 9.1M Mar 10  2022 index-ecosystem-his225p.ndx
-rw-r--r-- 1 1071 1013 9.3G Mar 10  2022 trj-1us-dt100ps.xtc

and some other files in each “frames_xxx” folder:

./Asp40-Asp115/frames_9901-10000/topol-ecodmt-system-his225p.top
./Asp40-Asp115/frames_9901-10000/topo-ecodmt-his225p.itp
./Asp40-Asp115/frames_9901-10000/posre-ecodmt-his225p.itp
./Asp40-Asp115/frames_9901-10000/CHOL.itp
./Asp40-Asp115/frames_9901-10000/POPC.itp
./Asp40-Asp115/frames_9901-10000/POPE.itp
./Asp40-Asp115/frames_9901-10000/amberslipid.ff
./Asp40-Asp115/frames_9901-10000/amberslipid.ff/spc.itp
./Asp40-Asp115/frames_9901-10000/amberslipid.ff/ffnonbonded-orig.itp
./Asp40-Asp115/frames_9901-10000/amberslipid.ff/forcefield.doc
./Asp40-Asp115/frames_9901-10000/amberslipid.ff/readme
./Asp40-Asp115/frames_9901-10000/amberslipid.ff/ffbonded.itp
./Asp40-Asp115/frames_9901-10000/amberslipid.ff/ffnonbonded-old.itp
./Asp40-Asp115/frames_9901-10000/amberslipid.ff/tip3p.itp
./Asp40-Asp115/frames_9901-10000/amberslipid.ff/atomtypes.atp
./Asp40-Asp115/frames_9901-10000/amberslipid.ff/test.pdb
./Asp40-Asp115/frames_9901-10000/amberslipid.ff/spce.itp
./Asp40-Asp115/frames_9901-10000/amberslipid.ff/lipids.rtp
./Asp40-Asp115/frames_9901-10000/amberslipid.ff/ffbonded-old.itp
./Asp40-Asp115/frames_9901-10000/amberslipid.ff/ffbonded-orig.itp
./Asp40-Asp115/frames_9901-10000/amberslipid.ff/ffnonbonded.itp
./Asp40-Asp115/frames_9901-10000/amberslipid.ff/ions.itp
./Asp40-Asp115/frames_9901-10000/amberslipid.ff/watermodels.dat
./Asp40-Asp115/frames_9901-10000/amberslipid.ff/forcefield.itp
./Asp40-Asp115/frames_9901-10000/amberslipid.ff/tip4p.itp
./Asp40-Asp115/frames_9901-10000/wb_asp40-asp115.py

But as you said, it’s probably better to contact the user first to see what can go and how :-/

The frames folders go up to 10000, which is the number of frames likely to be in trj-1us-dt100ps.xtc (1000000 ps, saved every 100 ps, so 10000 in total).
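The arithmetic behind that estimate, as a quick shell check. The run length and saving interval are read off the file name only, and the variable names are just for illustration. Whether the starting frame at t=0 is stored in addition (which would give one extra frame) is an assumption to verify against the actual file.

```shell
# Values inferred from the file name trj-1us-dt100ps.xtc (not verified):
total_ps=1000000   # 1 us expressed in ps
dt_ps=100          # one frame saved every 100 ps

# Number of saving intervals over the whole run.
intervals=$(( total_ps / dt_ps ))

# If the frame at t=0 was written as well, there is one more frame.
frames_with_t0=$(( intervals + 1 ))

echo "intervals: $intervals, frames if t=0 is stored: $frames_with_t0"
```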

To check it:

gmx dump -f trj-1us-dt100ps.xtc | head

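Look for time=0.0000000e+00 in the output. The value could be different; it is the time of the first frame in the file.

Then use this command to output the very first frame, replacing 0 with the time found in the previous step, and 10 with that time + 10 (gmx trjconv will ask which group of atoms to write; pick the one covering the whole system, typically System):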

gmx trjconv -f trj-1us-dt100ps.xtc -o test-frame_1.gro -b 0 -e 10

Similarly for the second frame (it’s at t=100 ps if the first time is 0, so the interval 80-110 contains only that frame):

gmx trjconv -f trj-1us-dt100ps.xtc -o test-frame_2.gro -b 80 -e 110

You can then diff test-frame_1.gro and test-frame_2.gro against the first and second frames you have in the first folder. If they match, it’s quite likely that the .xtc file is the source of the frames. Then keep one of the folders (you need all those .top and .itp files, which I suspect are the same in each folder) minus the .gro files, and write a text file explaining the setup.
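One practical wrinkle with a plain diff: the first line of a .gro file is a free-text title, which can differ between tools or runs even when the coordinates are identical. A minimal sketch of a comparison that skips that line follows; the helper name same_gro_body is hypothetical, and it assumes a POSIX shell with tail and cksum.

```shell
# Compare two .gro files while ignoring line 1 (the free-text title).
# Returns 0 if everything from line 2 onward has the same checksum,
# 1 if it differs, 2 if either file cannot be read.
same_gro_body() {
    [ -r "$1" ] || return 2
    [ -r "$2" ] || return 2
    sum_a=$(tail -n +2 "$1" | cksum)
    sum_b=$(tail -n +2 "$2" | cksum)
    [ "$sum_a" = "$sum_b" ]
}

# Example call (the stored frame path is a placeholder):
#   same_gro_body test-frame_1.gro frames_0-100/framexxxx.gro && echo "match"
```

A mismatch does not prove the .xtc is unrelated: the stored frames may contain only a subset of atoms, or extra columns such as velocities, so it is worth looking at an ordinary diff of the two files before drawing conclusions.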

But yeah, ideally the user should do this kind of cleanup before leaving, if storage space is limited.


Quick update after talking with the user: she cleaned up her files “a bit”…
We went from 14 TB down to 720 GB :-D

Thanks for your time, we can close this thread.
