repository: add replace_pack helper#9798
Conversation
borgbackup#8572 borgbackup#8514 Copies the kept objects into a new pack via store.defrag, repoints their chunk index entries, and deletes the old pack; an empty keep set drops the pack whole.
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #9798 +/- ##
==========================================
- Coverage 84.68% 84.41% -0.28%
==========================================
Files 92 92
Lines 15216 15253 +37
Branches 2279 2287 +8
==========================================
- Hits 12886 12876 -10
- Misses 1631 1671 +40
- Partials 699 706 +7 ☔ View full report in Codecov by Harness. |
Document the old_pack_id return and the caller's index-durability duty, reword the delete-old-last rationale, and test the reproduced-pack skip and survivor sort.
Rename replace_pack to compact_pack; it now repoints kept and deletes dropped index entries, asserting keep + drop tile the whole pack. refs borgbackup#8572 borgbackup#8514
| # Bundle several objects (N>1) into one pack: max_count == object count makes the last put flush the | ||
| # whole batch as a single pack. Caller reopens afterwards to read it back from disk. | ||
| with repository: | ||
| repository._pack_writer.max_count = len(objects) |
There was a problem hiding this comment.
you don't need that, it flushes on close (CM exit) anyway.
| chunk2 = fchunk(b"DATA2", chunk_id=H(2)) | ||
| repository = get_repository_from_fixture(repo_fixtures, request) | ||
| build_one_pack(repository, [(H(0), chunk0), (H(1), chunk1), (H(2), chunk2)]) | ||
| with reopen(repository) as repository: |
There was a problem hiding this comment.
hmm, did you check what reopen does exactly? would the normal CM also work?
There was a problem hiding this comment.
normal == "with repository: ..."
| for keep_id in keep_ids: | ||
| entry = self.chunks[keep_id] | ||
| assert entry.pack_id == old_pack_id, f"{bin_to_hex(keep_id)} is not in pack {bin_to_hex(old_pack_id)}" | ||
| survivors.append((entry.obj_offset, keep_id, entry.obj_size)) | ||
| survivors.sort() | ||
| assert entry.pack_id == pack_id, f"{bin_to_hex(keep_id)} is not in pack {bin_to_hex(pack_id)}" | ||
| located.append((entry.obj_offset, keep_id, entry.obj_size, True)) | ||
| for drop_id in drop_ids: | ||
| entry = self.chunks[drop_id] | ||
| assert entry.pack_id == pack_id, f"{bin_to_hex(drop_id)} is not in pack {bin_to_hex(pack_id)}" | ||
| located.append((entry.obj_offset, drop_id, entry.obj_size, False)) | ||
| located.sort() |
There was a problem hiding this comment.
That's duplicate code. How about:
for obj_id in keep_ids | drop_ids: ...
keep = obj_id in keep_ids
info = entry.obj_offset, obj_id, entry.obj_size
...
For better efficiency, func params should be sets of ids.
There was a problem hiding this comment.
another assertion could be that the intersection of both sets is empty.
| located.sort() | ||
|
|
||
| # keep + drop must tile the whole pack; pick out the survivors in the same pass. | ||
| survivors = [] # (obj_offset, obj_id, obj_size), offset-ordered |
| sources = [(bin_to_hex(old_pack_id), offset, size) for offset, _, size in survivors] | ||
| # copy survivors into a new pack (named sha256 of its content) | ||
| sources = [(bin_to_hex(pack_id), offset, size) for offset, _, size in survivors] | ||
| new_pack_id = hex_to_bin(self.store.defrag(sources, algorithm="sha256", namespace="packs")) |
There was a problem hiding this comment.
did you check the return value of defrag? is it a single id?
Description
Adds
Repository.replace_pack(old_pack_id, keep_ids): rewrites a pack to keep only the given objects and deletes the old one.store.defrag(runs store/server-side, so their bytes are not pulled over the network on remote repos).This is the pack-level primitive that compact, check and delete will use to remove objects once a pack holds more than one (N greater than 1). No callers are wired up yet; that follows in separate PRs.
Tests cover the copy-forward of a mixed pack (forced max_count greater than 1) and the whole-pack drop.
refs #8572 #8514
Checklist
master