Garbage Collection
Garbage collection removes data that is no longer referenced from a repository.
Generally this involves locking the repository, scanning all of its branches, and then generating a new repository with less data.
Least work we can hope to perform
Read all branches to get initial references - tips + tags.
Read through the revision graph to find unreferenced revisions. A cheap HEADS list might help here by allowing comparison of the initial references to the HEADS - any unreferenced head is garbage.
Walk out via inventory deltas to get the full set of texts and signatures to preserve.
Copy to a new repository.
Bait and switch back to the original.
Remove the old repository.
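The first two steps above can be sketched as a reachability walk over the revision graph. This is only an illustration under stated assumptions: the graph is modelled as a plain dict mapping a revision id to its parent ids, and the names `find_live_revisions`, `find_heads`, and the toy revision ids are hypothetical, not part of any existing API.

```python
from collections import deque


def find_live_revisions(parents, references):
    """Return every revision reachable from the referenced tips and tags."""
    live = set()
    pending = deque(rev for rev in references if rev in parents)
    while pending:
        rev = pending.popleft()
        if rev in live:
            continue
        live.add(rev)
        # Queue parents that exist in this repository and are not yet seen.
        pending.extend(p for p in parents[rev] if p in parents and p not in live)
    return live


def find_heads(parents):
    """Heads are revisions that no other revision names as a parent."""
    has_child = {p for parent_ids in parents.values() for p in parent_ids}
    return set(parents) - has_child


# Hypothetical toy graph: revision id -> tuple of parent ids.
graph = {
    "rev-1": (),
    "rev-2": ("rev-1",),
    "rev-3": ("rev-2",),
    "rev-4": ("rev-2",),  # start of an abandoned line of development
    "rev-5": ("rev-4",),
}
references = {"rev-3"}  # branch tips + tags

heads = find_heads(graph)
garbage_heads = heads - references  # any unreferenced head is garbage
if garbage_heads:
    # Only pay for the full graph walk when a garbage head exists.
    garbage = set(graph) - find_live_revisions(graph, references)
else:
    garbage = set()
```

If every head is referenced, the comparison alone shows there is nothing to collect, which is why a cheap HEADS list is attractive: the full walk is only needed when `garbage_heads` is non-empty.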
One possibility for reducing this work would be to keep a set of grouped ‘known garbage free’ data (‘ancient history’) that can be preserved in its entirety if its HEADS are fully referenced, and whose HEADS list is deliberately cheap to read (e.g. stored at the top of some index).
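As a rough sketch of how such grouping might be used, the check below keeps a group whole when all of its heads are live, and otherwise marks it for a full scan. The `HistoryGroup` type, the `split_groups` helper, and the example ids are assumptions made for illustration; how groups would actually be stored is left open.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HistoryGroup:
    """Hypothetical group of revisions with a cheaply readable HEADS list."""
    name: str
    heads: frozenset


def split_groups(groups, live_revisions):
    """Separate groups that can be kept whole from those needing a scan."""
    keep_whole, needs_scan = [], []
    for group in groups:
        # If every head of the group is referenced, nothing in it is garbage.
        if group.heads <= live_revisions:
            keep_whole.append(group.name)
        else:
            needs_scan.append(group.name)
    return keep_whole, needs_scan


groups = [
    HistoryGroup("ancient", frozenset({"rev-100"})),
    HistoryGroup("recent", frozenset({"rev-250", "rev-260"})),
]
live = {"rev-100", "rev-250"}  # revisions known to be referenced

keep_whole, needs_scan = split_groups(groups, live)
```

The design relies on each group being closed under its own heads, i.e. every revision in the group is an ancestor of one of the group's heads; that property is assumed here rather than checked.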
Another possibility: null out the garbage data in place, without saving any size.