Skip to content

Fix storage byte-range, logging, zip-close, getsize_prefix, and codec-order validation bugs#182

Closed
d-v-b wants to merge 8 commits into
mainfrom
claude/storage-codec-bugfixes
Closed

Fix storage byte-range, logging, zip-close, getsize_prefix, and codec-order validation bugs#182
d-v-b wants to merge 8 commits into
mainfrom
claude/storage-codec-bugfixes

Conversation

@d-v-b

@d-v-b d-v-b commented Jun 12, 2026

Copy link
Copy Markdown
Owner

Summary

This PR addresses five pre-verified bugs (B1–B5 of #181):

B1 — MemoryStore SuffixByteRequest clamp (src/zarr/storage/_utils.py)
When SuffixByteRequest.suffix > len(data), the computed start index went
negative, causing numpy slicing to return too few bytes. Fixed by clamping:
start = max(0, len(data) - byte_range.suffix). This matches how ZipStore
and LocalStore already handle this case.

B2 — LoggingStore.get_partial_values generator exhaustion (src/zarr/storage/_logging.py)
The log-hint string was built by iterating over key_ranges with a list
comprehension, which exhausted a one-shot generator before passing it to
the wrapped store. Fixed by materialising key_ranges = list(key_ranges)
first.

B3 — getsize_prefix substring over-count (src/zarr/abc/store.py)
getsize_prefix("foo") called list_prefix("foo"), which matched both
"foo/a" and "foobar/b" in prefix-matching stores. Fixed by normalising
the prefix to end with "/" (matching delete_dir / is_empty), so only
keys under the exact directory are counted.

B4 — ZipStore.close() AttributeError when never opened (src/zarr/storage/_zip.py)
_lock is only created in _sync_open(). If close() was called on a
store that was never opened (e.g. with ZipStore(...): pass), it raised
AttributeError: _lock. Fixed by returning early from close() when
_is_open is False.

B5 — Missing raise in codecs_from_list (src/zarr/core/codec_pipeline.py)
In the BytesBytesCodec branch, when prev_codec is an ArrayArrayCodec,
the error message was assigned to msg but never raised. The sibling
branches (ArrayArrayCodec and ArrayBytesCodec cases) both do
raise TypeError(msg); this one was missing the raise. Fixed by adding
raise TypeError(msg).

Test results

All five fixes have regression tests added to the relevant existing test
files. Observed test outcomes:

tests/test_store/test_utils.py            18 passed
tests/test_store/test_logging.py          75 passed, 5 skipped
tests/test_store/test_memory.py           (getsize_prefix test: 1 passed)
tests/test_store/test_zip.py              70 passed, 8 skipped
tests/test_codec_pipeline.py             4 passed, 4 skipped
tests/test_store/ (full suite)           828 passed, 177 skipped
mypy (5 changed source files)            Success: no issues found

Notes

🤖 Generated with Claude Code

… close, codec order validation

- storage/_utils.py: clamp SuffixByteRequest start to max(0, ...) so
  suffix > len(data) returns all available bytes (B1)
- storage/_logging.py: materialise key_ranges into a list before building
  the log hint string, preventing one-shot generator exhaustion (B2)
- abc/store.py: normalise getsize_prefix argument to end with "/" before
  calling list_prefix, matching delete_dir/is_empty behaviour so sibling
  keys are not over-counted (B3)
- storage/_zip.py: guard ZipStore.close() with an early return when the
  store was never opened, avoiding AttributeError on missing _lock (B4)
- core/codec_pipeline.py: add the missing raise TypeError(msg) in the
  BytesBytesCodec-after-ArrayArrayCodec branch of codecs_from_list (B5)

Add regression tests for all five fixes.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
d-v-b and others added 6 commits June 16, 2026 14:23
Augment the existing property-based test infrastructure (and convert two
example-based regression tests to property tests) so that each of the five
bugs fixed in this PR is caught by a property/stateful test, verified
red-green against the pre-fix source:

- B1 (suffix clamp): broaden the `key_ranges` strategy to emit
  SuffixByteRequest / OffsetByteRequest / None with bounds that can exceed
  the value length, and give the stateful `get_partial_values` rule an
  independent oracle for every ByteRequest variant.
- B2 (generator exhaustion): the shared `StoreTests.test_get_partial_values`
  now passes `key_ranges` as a one-shot generator and asserts one result per
  request, so a store/wrapper that exhausts the iterable early is caught
  (previously masked because the expected loop was driven by the observed
  count). The stateful rule likewise passes a generator.
- B3 (getsize_prefix sibling over-counting): add a `getsize_prefix` rule to
  the store state machine that asserts only keys under `node/` are counted.
- B4 (ZipStore.close before open): replace the two example-based regression
  tests with a stateful lifecycle machine asserting `close()` never raises,
  regardless of open/IO history.
- B5 (missing raise in codec order validation): replace the single example
  with a property test over random {AA,AB,BB} codec orderings whose expected
  outcome (TypeError / ValueError / ok) is modelled independently.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Make each of the five bugs fixed in this PR catchable by a property/stateful
or shared-harness test (verified red-green against the pre-fix source), and
remove the per-store special-case tests whose coverage is now subsumed:

- B1 (suffix clamp): broaden the `key_ranges` strategy to emit
  SuffixByteRequest / OffsetByteRequest / None with bounds that can exceed
  the value length, and give the stateful `get_partial_values` rule an
  independent oracle for every ByteRequest variant. The pure-function unit
  test `TestNormalizeByteRangeIndex` is kept as a fast deterministic guard.
- B2 (generator exhaustion): the shared `StoreTests.test_get_partial_values`
  now passes `key_ranges` as a one-shot generator and asserts one result per
  request, so any store/wrapper that exhausts the iterable early is caught
  across all stores. The logging-only regression test is removed as redundant.
- B3 (getsize_prefix sibling over-counting): the shared
  `StoreTests.test_getsize_prefix` now includes a sibling key ("cc/0") that
  must be excluded, covering every store deterministically. Add a matching
  `getsize_prefix` rule to the store state machine. The memory-only
  regression test is removed as redundant.
- B4 (ZipStore.close before open): replace the two example-based regression
  tests with a stateful lifecycle machine asserting `close()` never raises,
  regardless of open/IO history.
- B5 (missing raise in codec order validation): replace the single example
  with a property test over random {AA,AB,BB} codec orderings whose expected
  outcome (TypeError / ValueError / ok) is modelled independently.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
FsspecStore.get_partial_values guarded its body with a bare `if key_ranges:`
before materialising the argument. Because `key_ranges` is typed as a generic
Iterable, a one-shot generator is always truthy even when empty, so the
`else: return []` fast path was unreachable for generator inputs (it happened
to be harmless only because fsspec's _cat_ranges returns [] for empty input).

Materialise `key_ranges` into a list first, then check `if not key_ranges`,
matching the iterable contract for any Iterable (list or generator).

Also enable mypy's `truthy-iterable` error code, which flags exactly this
class of bug (truthiness test on a non-Collection Iterable). There are no
existing violations in src/zarr or tests, so this is a zero-cost guard against
recurrence.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The pre-fix MemoryStore behavior returned incorrect data (a wrong slice via a
negative start index), not merely fewer bytes; note the HTTP suffix-range
semantics that define the correct behavior.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@d-v-b

d-v-b commented Jun 18, 2026

Copy link
Copy Markdown
Owner Author

these changes were merged upstream via zarr-developers#4074

@d-v-b d-v-b closed this Jun 18, 2026
@d-v-b d-v-b deleted the claude/storage-codec-bugfixes branch June 18, 2026 14:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant