Skip to content

repository: add replace_pack helper#9798

Open
mr-raj12 wants to merge 3 commits into
borgbackup:masterfrom
mr-raj12:pack-files-remove-objects-helper
Open

repository: add replace_pack helper#9798
mr-raj12 wants to merge 3 commits into
borgbackup:masterfrom
mr-raj12:pack-files-remove-objects-helper

Conversation

@mr-raj12

Copy link
Copy Markdown
Contributor

Description

Adds Repository.replace_pack(old_pack_id, keep_ids): rewrites a pack to keep only the given objects and deletes the old one.

  • Kept objects are copied into a new pack with store.defrag (runs store/server-side, so their bytes are not pulled over the network on remote repos).
  • The survivors' chunk index entries are repointed at the new pack; index entries of the dropped objects are removed by the caller.
  • An empty keep set drops the pack whole, without writing a replacement.

This is the pack-level primitive that compact, check and delete will use to remove objects once a pack holds more than one (N greater than 1). No callers are wired up yet; that follows in separate PRs.

Tests cover the copy-forward of a mixed pack (forced max_count greater than 1) and the whole-pack drop.

refs #8572 #8514

Checklist

  • PR is against master
  • New code has tests
  • Tests pass
  • Commit messages are clean and reference related issues

borgbackup#8572 borgbackup#8514

Copies the kept objects into a new pack via store.defrag, repoints their chunk
index entries, and deletes the old pack; an empty keep set drops the pack whole.
@codecov

codecov Bot commented Jun 21, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 84.41%. Comparing base (ab2d78b) to head (a56518e).
⚠️ Report is 3 commits behind head on master.
✅ All tests successful. No failed tests found.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #9798      +/-   ##
==========================================
- Coverage   84.68%   84.41%   -0.28%     
==========================================
  Files          92       92              
  Lines       15216    15253      +37     
  Branches     2279     2287       +8     
==========================================
- Hits        12886    12876      -10     
- Misses       1631     1671      +40     
- Partials      699      706       +7     

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

Document the old_pack_id return and the caller's index-durability duty, reword
the delete-old-last rationale, and test the reproduced-pack skip and survivor sort.

@ThomasWaldmann ThomasWaldmann left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some stuff i found.

Comment thread src/borg/testsuite/repository_test.py Outdated
Comment thread src/borg/testsuite/repository_test.py Outdated
Comment thread src/borg/testsuite/repository_test.py Outdated
Comment thread src/borg/testsuite/repository_test.py Outdated
Comment thread src/borg/testsuite/repository_test.py Outdated
Comment thread src/borg/repository.py Outdated
Comment thread src/borg/repository.py Outdated
Comment thread src/borg/repository.py Outdated
Rename replace_pack to compact_pack; it now repoints kept and deletes dropped
index entries, asserting keep + drop tile the whole pack. refs borgbackup#8572 borgbackup#8514
@mr-raj12 mr-raj12 marked this pull request as ready for review June 22, 2026 12:14
# Bundle several objects (N>1) into one pack: max_count == object count makes the last put flush the
# whole batch as a single pack. Caller reopens afterwards to read it back from disk.
with repository:
repository._pack_writer.max_count = len(objects)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you don't need that, it flushes on close (CM exit) anyway.

chunk2 = fchunk(b"DATA2", chunk_id=H(2))
repository = get_repository_from_fixture(repo_fixtures, request)
build_one_pack(repository, [(H(0), chunk0), (H(1), chunk1), (H(2), chunk2)])
with reopen(repository) as repository:

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, did you check what reopen does exactly? would the normal CM also work?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

normal == "with repository: ..."

Comment thread src/borg/repository.py
Comment on lines 831 to +839
for keep_id in keep_ids:
entry = self.chunks[keep_id]
assert entry.pack_id == old_pack_id, f"{bin_to_hex(keep_id)} is not in pack {bin_to_hex(old_pack_id)}"
survivors.append((entry.obj_offset, keep_id, entry.obj_size))
survivors.sort()
assert entry.pack_id == pack_id, f"{bin_to_hex(keep_id)} is not in pack {bin_to_hex(pack_id)}"
located.append((entry.obj_offset, keep_id, entry.obj_size, True))
for drop_id in drop_ids:
entry = self.chunks[drop_id]
assert entry.pack_id == pack_id, f"{bin_to_hex(drop_id)} is not in pack {bin_to_hex(pack_id)}"
located.append((entry.obj_offset, drop_id, entry.obj_size, False))
located.sort()

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's duplicate code. How about:

for obj_id in keep_ids | drop_ids: ...
    keep = obj_id in keep_ids
    info = entry.obj_offset, obj_id, entry.obj_size
    ...

For better efficiency, func params should be sets of ids.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

another assertion could be that the intersection of both sets is empty.

Comment thread src/borg/repository.py
located.sort()

# keep + drop must tile the whole pack; pick out the survivors in the same pass.
survivors = [] # (obj_offset, obj_id, obj_size), offset-ordered

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

less dramatic? :-)

Comment thread src/borg/repository.py
sources = [(bin_to_hex(old_pack_id), offset, size) for offset, _, size in survivors]
# copy survivors into a new pack (named sha256 of its content)
sources = [(bin_to_hex(pack_id), offset, size) for offset, _, size in survivors]
new_pack_id = hex_to_bin(self.store.defrag(sources, algorithm="sha256", namespace="packs"))

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

did you check the return value of defrag? is it a single id?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants