index-pack: retain child bases in delta cache #2131
When resolving a delta whose result has children of its own, index-pack adds the result to work_head, accounts its data in base_cache_used, and calls prune_base_data(). It then immediately frees that same data. This bypasses the existing delta base cache policy and can force later descendants to reconstruct the queued base again.

Let the existing delta_base_cache_limit pruning policy decide whether to keep or evict the data instead.

Signed-off-by: Arijit Banerjee <arijit@effectiveailabs.com>
/submit

Error: User arijit91 is not yet permitted to use GitGitGadget

/allow

User arijit91 is now allowed to use GitGitGadget.

/submit

Submitted as pull.2131.git.1780070763044.gitgitgadget@gmail.com
Speed up the local pack indexing phase of clone/fetch for large
delta-compressed packs by keeping reconstructed delta bases available for
reuse when they are queued for later delta resolution.
When index-pack reconstructs a child base and queues it for resolving
descendant deltas, it currently frees that data immediately. This can force
the same base to be reconstructed again. Instead, keep it in the existing
delta base cache and let the existing delta_base_cache_limit policy decide
whether to retain or evict it.
This does not add a new cache or increase the cache limit. The object data is
already accounted in base_cache_used, and prune_base_data() is already
called at this point.
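The effect described above can be illustrated with a small toy model. This is only a sketch and not index-pack's code: every name here (toy_base, toy_cache, toy_run, and so on) is hypothetical, the oldest-first eviction order is an assumption made for simplicity, and the model is single-threaded. It counts how often base data must be reconstructed when each queued base is freed right after pruning versus when the limit alone decides.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a reconstructed delta base. */
struct toy_base {
	void *data;	/* NULL once freed or evicted */
	size_t size;
};

struct toy_cache {
	size_t used;	/* plays the role of base_cache_used */
	size_t limit;	/* plays the role of delta_base_cache_limit */
};

static int rebuild_count;	/* number of reconstructions so far */

/* Return the base's data, reconstructing and accounting it if missing. */
static void *toy_get_data(struct toy_cache *c, struct toy_base *b)
{
	if (!b->data) {
		b->data = malloc(b->size);
		if (!b->data)
			abort();
		c->used += b->size;
		rebuild_count++;
	}
	return b->data;
}

/* Drop the base's data (if any) and undo its accounting. */
static void toy_release(struct toy_cache *c, struct toy_base *b)
{
	if (b->data) {
		assert(c->used >= b->size);
		free(b->data);
		b->data = NULL;
		c->used -= b->size;
	}
}

/* Evict queued bases, oldest first (an assumption), until under the limit. */
static void toy_prune(struct toy_cache *c, struct toy_base *queue, size_t n)
{
	for (size_t i = 0; i < n && c->used > c->limit; i++)
		toy_release(c, &queue[i]);
}

/*
 * Queue n bases of the given size, then revisit each one once on behalf
 * of a descendant delta. Returns the total number of reconstructions,
 * or -1 on allocation failure.
 */
static int toy_run(size_t n, size_t size, size_t limit, int free_after_queue)
{
	struct toy_cache c = { 0, limit };
	struct toy_base *q = calloc(n, sizeof(*q));
	int count;

	if (!q)
		return -1;
	rebuild_count = 0;
	for (size_t i = 0; i < n; i++) {
		q[i].size = size;
		toy_get_data(&c, &q[i]);	/* initial reconstruction */
		toy_prune(&c, q, i + 1);
		if (free_after_queue)
			toy_release(&c, &q[i]);	/* free right after pruning */
	}
	for (size_t i = 0; i < n; i++) {
		toy_get_data(&c, &q[i]);	/* a descendant needs the base */
		toy_release(&c, &q[i]);		/* toy skips pruning here for brevity */
	}
	count = rebuild_count;
	free(q);
	return count;
}
```

In this model, toy_run(4, 10, 100, 1) performs 8 reconstructions while toy_run(4, 10, 100, 0) performs 4. With a tight limit, toy_run(4, 10, 20, 0) performs 6, because the limit evicts two of the queued bases; the limit still bounds memory, and retention only helps for the bases that fit.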
Correctness: t/t5302-pack-index.sh passed all 36 tests.
Benchmarks on a quiet Ubuntu 24.04 VM, 16 vCPU, 32 GiB RAM, local SSD:
Five-repeat public-repo medians also improved: git.git 13.1%, libgit2
14.0%, redis 13.5%, cpython 4.8%.
Perf on the linux blobless pack showed the same direction under profiling:
76.64s baseline vs 61.09s patched, with similar RSS.
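For comparison with the percentages quoted for the other repositories, the profiled run can be expressed the same way. The sketch below assumes that "improvement" means wall time saved relative to the baseline, which the description does not state explicitly; percent_saved is a hypothetical helper name.

```c
#include <assert.h>

/* Percentage of wall time saved when going from baseline to patched. */
static double percent_saved(double baseline, double patched)
{
	assert(baseline > 0.0);
	return 100.0 * (baseline - patched) / baseline;
}
```

Under that assumption, percent_saved(76.64, 61.09) comes out at roughly 20.3, since 15.55 / 76.64 is about 0.203.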
CC: Ævar Arnfjörð Bjarmason <avarab@gmail.com>, Junio C Hamano <gitster@pobox.com>, Derrick Stolee <stolee@gmail.com>