From 8da13336dfa60c2b4412097bc5e58488aebbb66c Mon Sep 17 00:00:00 2001 From: "dependabot[bot]" <49699333+dependabot[bot]@users.noreply.github.com> Date: Sun, 31 May 2026 19:28:07 +0200 Subject: [PATCH 1/7] chore(deps): bump the actions group across 1 directory with 8 updates (#176) Bumps the actions group with 8 updates in the / directory: | Package | From | To | | --- | --- | --- | | [prefix-dev/setup-pixi](https://github.com/prefix-dev/setup-pixi) | `0.9.5` | `0.9.6` | | [codecov/codecov-action](https://github.com/codecov/codecov-action) | `6.0.0` | `6.0.1` | | [github/issue-metrics](https://github.com/github/issue-metrics) | `4.2.2` | `4.2.7` | | [j178/prek-action](https://github.com/j178/prek-action) | `2.0.3` | `2.0.4` | | [actions/upload-artifact](https://github.com/actions/upload-artifact) | `7.0.0` | `7.0.1` | | [actions/download-artifact](https://github.com/actions/download-artifact) | `7.0.0` | `8.0.1` | | [pypa/gh-action-pypi-publish](https://github.com/pypa/gh-action-pypi-publish) | `1.13.0` | `1.14.0` | | [zizmorcore/zizmor-action](https://github.com/zizmorcore/zizmor-action) | `0.5.3` | `0.5.6` | Updates `prefix-dev/setup-pixi` from 0.9.5 to 0.9.6 - [Release notes](https://github.com/prefix-dev/setup-pixi/releases) - [Commits](https://github.com/prefix-dev/setup-pixi/compare/1b2de7f3351f171c8b4dfeb558c639cb58ed4ec0...5185adfbffb4bd703da3010310260805d89ebb11) Updates `codecov/codecov-action` from 6.0.0 to 6.0.1 - [Release notes](https://github.com/codecov/codecov-action/releases) - [Changelog](https://github.com/codecov/codecov-action/blob/main/CHANGELOG.md) - [Commits](https://github.com/codecov/codecov-action/compare/57e3a136b779b570ffcdbf80b3bdc90e7fab3de2...e79a6962e0d4c0c17b229090214935d2e33f8354) Updates `github/issue-metrics` from 4.2.2 to 4.2.7 - [Release notes](https://github.com/github/issue-metrics/releases) - [Commits](https://github.com/github/issue-metrics/compare/c9e9838147fd355dace335ba787f01b6641a400a...1e38d5e62363e14db8019ed7d106b9855bdba6cc) Updates `j178/prek-action` from 2.0.3 to 2.0.4 - [Release notes](https://github.com/j178/prek-action/releases) - [Commits](https://github.com/j178/prek-action/compare/6ad80277337ad479fe43bd70701c3f7f8aa74db3...bdca6f102f98e2b4c7029491a53dfd366469e33d) Updates `actions/upload-artifact` from 7.0.0 to 7.0.1 - [Release notes](https://github.com/actions/upload-artifact/releases) - [Commits](https://github.com/actions/upload-artifact/compare/v7...043fb46d1a93c77aae656e7c1c64a875d1fc6a0a) Updates `actions/download-artifact` from 7.0.0 to 8.0.1 - [Release notes](https://github.com/actions/download-artifact/releases) - [Commits](https://github.com/actions/download-artifact/compare/v7...3e5f45b2cfb9172054b4087a40e8e0b5a5461e7c) Updates `pypa/gh-action-pypi-publish` from 1.13.0 to 1.14.0 - [Release notes](https://github.com/pypa/gh-action-pypi-publish/releases) - [Commits](https://github.com/pypa/gh-action-pypi-publish/compare/v1.13.0...cef221092ed1bacb1cc03d23a2d87d1d172e277b) Updates `zizmorcore/zizmor-action` from 0.5.3 to 0.5.6 - [Release notes](https://github.com/zizmorcore/zizmor-action/releases) - [Commits](https://github.com/zizmorcore/zizmor-action/compare/b1d7e1fb5de872772f31590499237e7cce841e8e...5f14fd08f7cf1cb1609c1e344975f152c7ee938d) --- updated-dependencies: - dependency-name: prefix-dev/setup-pixi dependency-version: 0.9.6 dependency-type: direct:production update-type: version-update:semver-patch dependency-group: actions - dependency-name: codecov/codecov-action dependency-version: 6.0.1 dependency-type: direct:production update-type: version-update:semver-patch dependency-group: actions - dependency-name: github/issue-metrics dependency-version: 4.2.7 dependency-type: direct:production update-type: version-update:semver-patch dependency-group: actions - dependency-name: j178/prek-action dependency-version: 2.0.4 dependency-type: direct:production update-type: version-update:semver-patch dependency-group: actions - dependency-name: actions/upload-artifact dependency-version: 7.0.1 dependency-type: direct:production update-type: version-update:semver-patch dependency-group: actions - dependency-name: actions/download-artifact dependency-version: 8.0.1 dependency-type: direct:production update-type: version-update:semver-major dependency-group: actions - dependency-name: pypa/gh-action-pypi-publish dependency-version: 1.14.0 dependency-type: direct:production update-type: version-update:semver-minor dependency-group: actions - dependency-name: zizmorcore/zizmor-action dependency-version: 0.5.6 dependency-type: direct:production update-type: version-update:semver-patch dependency-group: actions ... Signed-off-by: dependabot[bot] Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> --- .github/workflows/downstream.yml | 2 +- .github/workflows/gpu_test.yml | 2 +- .github/workflows/hypothesis.yaml | 2 +- .github/workflows/issue-metrics.yml | 2 +- .github/workflows/lint.yml | 2 +- .github/workflows/test.yml | 4 ++-- .github/workflows/zarr-metadata-release.yml | 12 ++++++------ .github/workflows/zizmor.yml | 2 +- 8 files changed, 14 insertions(+), 14 deletions(-) diff --git a/.github/workflows/downstream.yml b/.github/workflows/downstream.yml index 74026233c4..3eb6898895 100644 --- a/.github/workflows/downstream.yml +++ b/.github/workflows/downstream.yml @@ -34,7 +34,7 @@ jobs: persist-credentials: false - name: Set up pixi - uses: prefix-dev/setup-pixi@1b2de7f3351f171c8b4dfeb558c639cb58ed4ec0 # v0.9.5 + uses: prefix-dev/setup-pixi@5185adfbffb4bd703da3010310260805d89ebb11 # v0.9.6 with: manifest-path: xarray/pixi.toml diff --git a/.github/workflows/gpu_test.yml b/.github/workflows/gpu_test.yml index 403441b306..333769cb9e 100644 --- a/.github/workflows/gpu_test.yml +++ b/.github/workflows/gpu_test.yml @@ -76,7 +76,7 @@ jobs: hatch env run --env "$HATCH_ENV" run-coverage - name: Upload coverage - uses: codecov/codecov-action@57e3a136b779b570ffcdbf80b3bdc90e7fab3de2 # v6.0.0 + uses: codecov/codecov-action@e79a6962e0d4c0c17b229090214935d2e33f8354 # v6.0.1 with: token: ${{ secrets.CODECOV_TOKEN }} flags: gpu diff --git a/.github/workflows/hypothesis.yaml b/.github/workflows/hypothesis.yaml index 4f9467be7d..a456b2aa0a 100644 --- a/.github/workflows/hypothesis.yaml +++ b/.github/workflows/hypothesis.yaml @@ -93,7 +93,7 @@ jobs: key: cache-hypothesis-${{ runner.os }}-${{ github.run_id }} - name: Upload coverage - uses: codecov/codecov-action@57e3a136b779b570ffcdbf80b3bdc90e7fab3de2 # v6.0.0 + uses: codecov/codecov-action@e79a6962e0d4c0c17b229090214935d2e33f8354 # v6.0.1 with: token: ${{ secrets.CODECOV_TOKEN }} flags: tests diff --git a/.github/workflows/issue-metrics.yml b/.github/workflows/issue-metrics.yml index 14fba5b9ec..510849ef3e 100644 --- a/.github/workflows/issue-metrics.yml +++ b/.github/workflows/issue-metrics.yml @@ -33,7 +33,7 @@ jobs: echo "last_month=$first_day..$last_day" >> "$GITHUB_ENV" - name: Run issue-metrics tool - uses: github/issue-metrics@c9e9838147fd355dace335ba787f01b6641a400a # v4.2.2 + uses: github/issue-metrics@1e38d5e62363e14db8019ed7d106b9855bdba6cc # v4.2.7 env: GH_TOKEN: ${{ secrets.GITHUB_TOKEN }} SEARCH_QUERY: 'repo:zarr-developers/zarr-python is:issue created:${{ env.last_month }} -reason:"not planned"' diff --git a/.github/workflows/lint.yml b/.github/workflows/lint.yml index 768e660ec2..fec211b4dd 100644 --- a/.github/workflows/lint.yml +++ b/.github/workflows/lint.yml @@ -30,4 +30,4 @@ jobs: uses: astral-sh/setup-uv@08807647e7069bb48b6ef5acd8ec9567f424441b # v8.1.0 with: enable-cache: true - - uses: j178/prek-action@6ad80277337ad479fe43bd70701c3f7f8aa74db3 # v2.0.3 + - uses: j178/prek-action@bdca6f102f98e2b4c7029491a53dfd366469e33d # v2.0.4 diff --git a/.github/workflows/test.yml b/.github/workflows/test.yml index 03143d3e5b..62e571856b 100644 --- a/.github/workflows/test.yml +++ b/.github/workflows/test.yml @@ -78,7 +78,7 @@ jobs: hatch env run --env "$HATCH_ENV" run-coverage - name: Upload coverage if: ${{ matrix.dependency-set == 'optional' && matrix.os == 'ubuntu-latest' }} - uses: codecov/codecov-action@57e3a136b779b570ffcdbf80b3bdc90e7fab3de2 # v6.0.0 + uses: codecov/codecov-action@e79a6962e0d4c0c17b229090214935d2e33f8354 # v6.0.1 with: token: ${{ secrets.CODECOV_TOKEN }} flags: tests @@ -125,7 +125,7 @@ jobs: run: | hatch env run --env "$HATCH_ENV" run-coverage - name: Upload coverage - uses: codecov/codecov-action@57e3a136b779b570ffcdbf80b3bdc90e7fab3de2 # v6.0.0 + uses: codecov/codecov-action@e79a6962e0d4c0c17b229090214935d2e33f8354 # v6.0.1 with: token: ${{ secrets.CODECOV_TOKEN }} flags: tests diff --git a/.github/workflows/zarr-metadata-release.yml b/.github/workflows/zarr-metadata-release.yml index 809d502f16..9639fcfdd3 100644 --- a/.github/workflows/zarr-metadata-release.yml +++ b/.github/workflows/zarr-metadata-release.yml @@ -35,7 +35,7 @@ jobs: - name: Build run: hatch build - - uses: actions/upload-artifact@bbbca2ddaa5d8feaa63e36b76fdaad77386f024f # v7.0.0 + - uses: actions/upload-artifact@043fb46d1a93c77aae656e7c1c64a875d1fc6a0a # v7.0.1 with: name: zarr-metadata-dist path: packages/zarr-metadata/dist @@ -45,7 +45,7 @@ jobs: needs: [build] runs-on: ubuntu-latest steps: - - uses: actions/download-artifact@37930b1c2abaa49bbe596cd826c3c89aef350131 # v7.0.0 + - uses: actions/download-artifact@3e5f45b2cfb9172054b4087a40e8e0b5a5461e7c # v8.0.1 with: name: zarr-metadata-dist path: dist @@ -76,7 +76,7 @@ jobs: id-token: write # required for OIDC trusted publishing attestations: write # required for artifact attestations steps: - - uses: actions/download-artifact@37930b1c2abaa49bbe596cd826c3c89aef350131 # v7.0.0 + - uses: actions/download-artifact@3e5f45b2cfb9172054b4087a40e8e0b5a5461e7c # v8.0.1 with: name: zarr-metadata-dist path: dist @@ -87,7 +87,7 @@ jobs: subject-path: dist/* - name: Publish package to PyPI - uses: pypa/gh-action-pypi-publish@ed0c53931b1dc9bd32cbe73a98c7f6766f8a527e # v1.13.0 + uses: pypa/gh-action-pypi-publish@cef221092ed1bacb1cc03d23a2d87d1d172e277b # v1.14.0 upload_testpypi: name: Upload to TestPyPI @@ -101,7 +101,7 @@ jobs: id-token: write attestations: write steps: - - uses: actions/download-artifact@37930b1c2abaa49bbe596cd826c3c89aef350131 # v7.0.0 + - uses: actions/download-artifact@3e5f45b2cfb9172054b4087a40e8e0b5a5461e7c # v8.0.1 with: name: zarr-metadata-dist path: dist @@ -112,6 +112,6 @@ jobs: subject-path: dist/* - name: Publish package to TestPyPI - uses: pypa/gh-action-pypi-publish@ed0c53931b1dc9bd32cbe73a98c7f6766f8a527e # v1.13.0 + uses: pypa/gh-action-pypi-publish@cef221092ed1bacb1cc03d23a2d87d1d172e277b # v1.14.0 with: repository-url: https://test.pypi.org/legacy/ diff --git a/.github/workflows/zizmor.yml b/.github/workflows/zizmor.yml index da19f22421..7ac4fe5d0e 100644 --- a/.github/workflows/zizmor.yml +++ b/.github/workflows/zizmor.yml @@ -32,4 +32,4 @@ jobs: persist-credentials: false - name: Run zizmor - uses: zizmorcore/zizmor-action@b1d7e1fb5de872772f31590499237e7cce841e8e # v0.5.3 + uses: zizmorcore/zizmor-action@5f14fd08f7cf1cb1609c1e344975f152c7ee938d # v0.5.6 From 78c99230cd30eb5ed6b1d9c18dad26e57965cf24 Mon Sep 17 00:00:00 2001 From: Davis Vann Bennett Date: Fri, 12 Jun 2026 13:31:14 +0200 Subject: [PATCH 2/7] fix: storage byte-range clamp, logging generator, getsize_prefix, zip close, codec order validation - storage/_utils.py: clamp SuffixByteRequest start to max(0, ...) so suffix > len(data) returns all available bytes (B1) - storage/_logging.py: materialise key_ranges into a list before building the log hint string, preventing one-shot generator exhaustion (B2) - abc/store.py: normalise getsize_prefix argument to end with "/" before calling list_prefix, matching delete_dir/is_empty behaviour so sibling keys are not over-counted (B3) - storage/_zip.py: guard ZipStore.close() with an early return when the store was never opened, avoiding AttributeError on missing _lock (B4) - core/codec_pipeline.py: add the missing raise TypeError(msg) in the BytesBytesCodec-after-ArrayArrayCodec branch of codecs_from_list (B5) Add regression tests for all five fixes. Co-Authored-By: Claude Fable 5 --- src/zarr/abc/store.py | 2 ++ src/zarr/core/codec_pipeline.py | 1 + src/zarr/storage/_logging.py | 1 + src/zarr/storage/_utils.py | 2 +- src/zarr/storage/_zip.py | 2 ++ tests/test_codec_pipeline.py | 17 +++++++++++++++- tests/test_store/test_logging.py | 26 +++++++++++++++++++++++++ tests/test_store/test_memory.py | 19 ++++++++++++++++++ tests/test_store/test_utils.py | 33 +++++++++++++++++++++++++++++++- tests/test_store/test_zip.py | 14 ++++++++++++++ 10 files changed, 114 insertions(+), 3 deletions(-) diff --git a/src/zarr/abc/store.py b/src/zarr/abc/store.py index 304d0cddb5..7c187594df 100644 --- a/src/zarr/abc/store.py +++ b/src/zarr/abc/store.py @@ -536,6 +536,8 @@ async def getsize_prefix(self, prefix: str) -> int: from zarr.core.common import concurrent_map from zarr.core.config import config + if prefix != "" and not prefix.endswith("/"): + prefix += "/" keys = [(x,) async for x in self.list_prefix(prefix)] limit = config.get("async.concurrency") sizes = await concurrent_map(keys, self.getsize, limit=limit) diff --git a/src/zarr/core/codec_pipeline.py b/src/zarr/core/codec_pipeline.py index 5c26681d6b..032703fc03 100644 --- a/src/zarr/core/codec_pipeline.py +++ b/src/zarr/core/codec_pipeline.py @@ -679,6 +679,7 @@ def codecs_from_list( "must be preceded by either another BytesBytesCodec, or an ArrayBytesCodec. " f"Got {type(prev_codec)} instead." ) + raise TypeError(msg) bytes_bytes += (cur_codec,) else: raise TypeError diff --git a/src/zarr/storage/_logging.py b/src/zarr/storage/_logging.py index 5de300c144..c6f58ccd61 100644 --- a/src/zarr/storage/_logging.py +++ b/src/zarr/storage/_logging.py @@ -179,6 +179,7 @@ async def get_partial_values( key_ranges: Iterable[tuple[str, ByteRequest | None]], ) -> list[Buffer | None]: # docstring inherited + key_ranges = list(key_ranges) keys = ",".join([k[0] for k in key_ranges]) with self.log(keys): return await self._store.get_partial_values(prototype=prototype, key_ranges=key_ranges) diff --git a/src/zarr/storage/_utils.py b/src/zarr/storage/_utils.py index 1f8e9b0a29..b100f862cf 100644 --- a/src/zarr/storage/_utils.py +++ b/src/zarr/storage/_utils.py @@ -153,7 +153,7 @@ def _normalize_byte_range_index(data: Buffer, byte_range: ByteRequest | None) -> start = byte_range.offset stop = len(data) + 1 elif isinstance(byte_range, SuffixByteRequest): - start = len(data) - byte_range.suffix + start = max(0, len(data) - byte_range.suffix) stop = len(data) + 1 else: raise ValueError(f"Unexpected byte_range, got {byte_range}.") diff --git a/src/zarr/storage/_zip.py b/src/zarr/storage/_zip.py index 897797e999..430b0c3e2a 100644 --- a/src/zarr/storage/_zip.py +++ b/src/zarr/storage/_zip.py @@ -120,6 +120,8 @@ def __setstate__(self, state: dict[str, Any]) -> None: def close(self) -> None: # docstring inherited + if not self._is_open: + return super().close() with self._lock: self._zf.close() diff --git a/tests/test_codec_pipeline.py b/tests/test_codec_pipeline.py index fa41c2867b..780f7c98bd 100644 --- a/tests/test_codec_pipeline.py +++ b/tests/test_codec_pipeline.py @@ -4,9 +4,10 @@ import pytest import zarr -from zarr.codecs import BytesCodec, CastValue +from zarr.codecs import BytesCodec, CastValue, GzipCodec, TransposeCodec from zarr.core.array import _get_chunk_spec from zarr.core.buffer.core import default_buffer_prototype +from zarr.core.codec_pipeline import codecs_from_list from zarr.core.indexing import BasicIndexer from zarr.storage import MemoryStore @@ -120,3 +121,17 @@ def test_codec_pipeline_threads_dtype_through_evolve(source_dtype: str, target_d ) arr[:] = np.asarray([0, 1, 2, 3], dtype=source_dtype) np.testing.assert_array_equal(arr[:], np.asarray([0, 1, 2, 3], dtype=source_dtype)) + + +def test_codecs_from_list_bytes_bytes_after_array_array_raises() -> None: + """Regression: placing a BytesBytesCodec immediately after an ArrayArrayCodec + (with no ArrayBytesCodec in between) must raise TypeError. Previously the error + message was assigned to ``msg`` but never raised, so the function silently + continued and later raised the unrelated ValueError "Required ArrayBytesCodec + was not found." instead of the descriptive TypeError.""" + aa_codec = TransposeCodec(order=(0, 1)) + bb_codec = GzipCodec() + + # [ArrayArrayCodec, BytesBytesCodec] — no ArrayBytesCodec between them. + with pytest.raises(TypeError, match="Invalid codec order"): + codecs_from_list([aa_codec, bb_codec]) diff --git a/tests/test_store/test_logging.py b/tests/test_store/test_logging.py index 96cd184938..e08f8958b9 100644 --- a/tests/test_store/test_logging.py +++ b/tests/test_store/test_logging.py @@ -168,3 +168,29 @@ async def test_logging_store_counter(store: Store) -> None: else: assert wrapped.counter["get"] == 1 assert wrapped.counter["delete_dir"] == 0 + + +async def test_logging_store_get_partial_values_generator() -> None: + """Regression: get_partial_values must not exhaust a one-shot generator when + building the log string, leaving the wrapped store with an empty iterable.""" + from zarr.storage import MemoryStore + + inner = MemoryStore() + proto = default_buffer_prototype() + await inner.set("a", proto.buffer.from_bytes(b"hello")) + await inner.set("b", proto.buffer.from_bytes(b"world")) + + wrapped = LoggingStore(store=inner, log_level="DEBUG") + + # Pass a generator (one-shot iterator) rather than a list. + key_ranges_gen = ((key, None) for key in ("a", "b")) + results = await wrapped.get_partial_values( + prototype=proto, + key_ranges=key_ranges_gen, + ) + + assert len(results) == 2 + assert results[0] is not None + assert results[1] is not None + assert results[0].to_bytes() == b"hello" + assert results[1].to_bytes() == b"world" diff --git a/tests/test_store/test_memory.py b/tests/test_store/test_memory.py index 1e3ee89e92..9435f470c0 100644 --- a/tests/test_store/test_memory.py +++ b/tests/test_store/test_memory.py @@ -410,3 +410,22 @@ def test_garbage_collection(self) -> None: # URL should no longer resolve with pytest.raises(ValueError, match="garbage collected"): ManagedMemoryStore.from_url(url) + + +async def test_getsize_prefix_no_sibling_overcounting() -> None: + """Regression: getsize_prefix("foo") must not count keys under "foobar/" because + list_prefix matches any key that starts with the given prefix string, not just + those under that directory. The fix normalizes the prefix to end with "/".""" + from zarr.core.buffer.core import default_buffer_prototype + + store = MemoryStore() + proto = default_buffer_prototype() + + # "foo/a" belongs to the "foo" prefix tree; "foobar/b" does not. + await store.set("foo/a", proto.buffer.from_bytes(b"hello")) # 5 bytes + await store.set("foobar/b", proto.buffer.from_bytes(b"world!")) # 6 bytes + + size_foo = await store.getsize_prefix("foo") + assert size_foo == 5, ( + f"getsize_prefix('foo') should count only 'foo/a' (5 bytes), got {size_foo}" + ) diff --git a/tests/test_store/test_utils.py b/tests/test_store/test_utils.py index b1934e7eae..291526fab8 100644 --- a/tests/test_store/test_utils.py +++ b/tests/test_store/test_utils.py @@ -5,7 +5,9 @@ import pytest -from zarr.storage._utils import ParsedStoreUrl, parse_store_url +from zarr.abc.store import SuffixByteRequest +from zarr.core.buffer.core import default_buffer_prototype +from zarr.storage._utils import ParsedStoreUrl, _normalize_byte_range_index, parse_store_url class TestParseStoreUrl: @@ -95,3 +97,32 @@ def test_drive_letter_not_special_on_non_windows(self, url: str) -> None: result = parse_store_url(url) # urlparse interprets the drive letter as a scheme assert result.scheme == "c" + + +class TestNormalizeByteRangeIndex: + """Tests for _normalize_byte_range_index.""" + + def test_suffix_larger_than_data_returns_all_bytes(self) -> None: + """Regression: SuffixByteRequest with suffix > len(data) must not produce a + negative start index that causes numpy to return fewer bytes than available.""" + prototype = default_buffer_prototype() + data = prototype.buffer.from_bytes(b"hello") # 5 bytes + byte_range = SuffixByteRequest(suffix=7) + start, stop = _normalize_byte_range_index(data, byte_range) + assert start == 0, f"start should be 0 (clamped), got {start}" + result = data[start:stop] + assert len(result) == 5, f"expected all 5 bytes, got {len(result)}" + + def test_suffix_exact_length(self) -> None: + """SuffixByteRequest with suffix == len(data) returns all bytes.""" + prototype = default_buffer_prototype() + data = prototype.buffer.from_bytes(b"hello") + start, _stop = _normalize_byte_range_index(data, SuffixByteRequest(suffix=5)) + assert start == 0 + + def test_suffix_shorter_than_data(self) -> None: + """SuffixByteRequest with suffix < len(data) returns the last n bytes.""" + prototype = default_buffer_prototype() + data = prototype.buffer.from_bytes(b"hello") + start, _stop = _normalize_byte_range_index(data, SuffixByteRequest(suffix=3)) + assert start == 2 diff --git a/tests/test_store/test_zip.py b/tests/test_store/test_zip.py index be51bcedcb..3e4b2b10d2 100644 --- a/tests/test_store/test_zip.py +++ b/tests/test_store/test_zip.py @@ -177,3 +177,17 @@ async def test_move(self, tmp_path: Path) -> None: assert destination.exists() assert not origin.exists() assert np.array_equal(array[...], np.arange(10)) + + def test_close_without_open_does_not_raise(self, tmp_path: Path) -> None: + """Regression: ZipStore.close() must not raise AttributeError when the store + was never opened (i.e., _sync_open was never called and _lock does not exist).""" + store = ZipStore(tmp_path / "never_opened.zip", mode="w") + assert not store._is_open + # Should complete without raising AttributeError. + store.close() + + def test_context_manager_without_io_does_not_raise(self, tmp_path: Path) -> None: + """Regression: using ZipStore as a context manager without doing any I/O + must not raise AttributeError on __exit__ (which calls close()).""" + with ZipStore(tmp_path / "ctx_manager.zip", mode="w"): + pass # no I/O; store is never explicitly opened From 68365b6c2e64a0fae8a5a0bc0eab55fe37583c48 Mon Sep 17 00:00:00 2001 From: Davis Vann Bennett Date: Thu, 18 Jun 2026 12:27:25 +0200 Subject: [PATCH 3/7] test: strengthen property/stateful tests to catch the five bugfixes Augment the existing property-based test infrastructure (and convert two example-based regression tests to property tests) so that each of the five bugs fixed in this PR is caught by a property/stateful test, verified red-green against the pre-fix source: - B1 (suffix clamp): broaden the `key_ranges` strategy to emit SuffixByteRequest / OffsetByteRequest / None with bounds that can exceed the value length, and give the stateful `get_partial_values` rule an independent oracle for every ByteRequest variant. - B2 (generator exhaustion): the shared `StoreTests.test_get_partial_values` now passes `key_ranges` as a one-shot generator and asserts one result per request, so a store/wrapper that exhausts the iterable early is caught (previously masked because the expected loop was driven by the observed count). The stateful rule likewise passes a generator. - B3 (getsize_prefix sibling over-counting): add a `getsize_prefix` rule to the store state machine that asserts only keys under `node/` are counted. - B4 (ZipStore.close before open): replace the two example-based regression tests with a stateful lifecycle machine asserting `close()` never raises, regardless of open/IO history. - B5 (missing raise in codec order validation): replace the single example with a property test over random {AA,AB,BB} codec orderings whose expected outcome (TypeError / ValueError / ok) is modelled independently. Co-Authored-By: Claude Opus 4.8 (1M context) --- src/zarr/testing/stateful.py | 53 ++++++++++++++++++--- src/zarr/testing/store.py | 15 ++++-- src/zarr/testing/strategies.py | 31 +++++++++---- tests/test_codec_pipeline.py | 84 +++++++++++++++++++++++++++++----- tests/test_store/test_zip.py | 82 ++++++++++++++++++++++++++++----- 5 files changed, 221 insertions(+), 44 deletions(-) diff --git a/src/zarr/testing/stateful.py b/src/zarr/testing/stateful.py index d6c43f4ecc..9817ebd618 100644 --- a/src/zarr/testing/stateful.py +++ b/src/zarr/testing/stateful.py @@ -1,6 +1,6 @@ import builtins import functools -from collections.abc import Callable +from collections.abc import Callable, Iterable from typing import Any, cast import hypothesis.extra.numpy as npst @@ -18,7 +18,12 @@ import zarr from zarr import Array -from zarr.abc.store import Store +from zarr.abc.store import ( + OffsetByteRequest, + RangeByteRequest, + Store, + SuffixByteRequest, +) from zarr.codecs.bytes import BytesCodec from zarr.core.buffer import Buffer, BufferPrototype, cpu, default_buffer_prototype from zarr.core.sync import SyncMixin @@ -460,7 +465,7 @@ def get(self, key: str, prototype: BufferPrototype) -> Buffer | None: return self._sync(self.store.get(key, prototype=prototype)) def get_partial_values( - self, key_ranges: builtins.list[Any], prototype: BufferPrototype + self, key_ranges: Iterable[Any], prototype: BufferPrototype ) -> builtins.list[Buffer | None]: return self._sync(self.store.get_partial_values(prototype=prototype, key_ranges=key_ranges)) @@ -476,6 +481,9 @@ def clear(self) -> None: def exists(self, key: str) -> bool: return self._sync(self.store.exists(key)) + def getsize_prefix(self, prefix: str) -> int: + return self._sync(self.store.getsize_prefix(prefix)) + def list_dir(self, prefix: str) -> None: raise NotImplementedError @@ -555,7 +563,9 @@ def get_partial_values(self, data: DataObject) -> None: key_ranges(keys=st.sampled_from(sorted(self.model.keys())), max_size=MAX_BINARY_SIZE) ) note(f"(get partial) {key_range=}") - obs_maybe = self.store.get_partial_values(key_range, self.prototype) + # Pass a one-shot generator rather than a list: stores (and wrappers such + # as LoggingStore) must not exhaust the iterable before using it. + obs_maybe = self.store.get_partial_values((kr for kr in key_range), self.prototype) observed = [] for obs in obs_maybe: @@ -565,9 +575,23 @@ def get_partial_values(self, data: DataObject) -> None: model_vals_ls = [] for key, byte_range in key_range: - start = byte_range.start - stop = byte_range.end - model_vals_ls.append(self.model[key][start:stop]) + # Independently model each ByteRequest variant (do NOT reuse the + # store's _normalize_byte_range_index helper, so this stays an + # independent oracle). Bounds may exceed the value length. + value = self.model[key] + n = len(value) + if byte_range is None: + expected = value[:] + elif isinstance(byte_range, RangeByteRequest): + expected = value[byte_range.start : byte_range.end] + elif isinstance(byte_range, OffsetByteRequest): + expected = value[byte_range.offset :] + elif isinstance(byte_range, SuffixByteRequest): + # "last suffix bytes"; suffix > n means the whole value. + expected = value[max(0, n - byte_range.suffix) :] + else: + raise AssertionError(f"unexpected byte_range {byte_range!r}") + model_vals_ls.append(expected) assert all( obs == exp.to_bytes() for obs, exp in zip(observed, model_vals_ls, strict=True) @@ -612,6 +636,21 @@ def exists(self, key: str) -> None: assert self.store.exists(key) == (key in self.model) + @precondition(lambda self: len(self.model.keys()) > 0) + @rule(data=st.data()) + def getsize_prefix(self, data: DataObject) -> None: + # Measure the size under the first path segment of some existing key. + # getsize_prefix(node) must count only keys under the directory "node/", + # not sibling keys that merely share the string prefix (e.g. measuring + # "a" must not include a sibling key "ab/..."). + key = data.draw(st.sampled_from(sorted(self.model.keys()))) + node = key.split("/")[0] + note(f"(getsize_prefix) {node=}") + + observed = self.store.getsize_prefix(node) + expected = sum(len(value) for k, value in self.model.items() if k.startswith(node + "/")) + assert observed == expected, (observed, expected, node) + @invariant() def check_paths_equal(self) -> None: note("Checking that paths are equal") diff --git a/src/zarr/testing/store.py b/src/zarr/testing/store.py index 81024c85c8..f0211dbc45 100644 --- a/src/zarr/testing/store.py +++ b/src/zarr/testing/store.py @@ -370,11 +370,19 @@ async def test_get_partial_values( for key, _ in key_ranges: await self.set(store, key, self.buffer_cls.from_bytes(bytes(key, encoding="utf-8"))) - # read back just part of it + # read back just part of it. Pass key_ranges as a one-shot generator + # (a valid Iterable per the method signature) to ensure stores and + # wrappers do not exhaust the iterable before handing it to the backend. observed_maybe = await store.get_partial_values( - prototype=default_buffer_prototype(), key_ranges=key_ranges + prototype=default_buffer_prototype(), + key_ranges=(kr for kr in key_ranges), ) + # One result must be returned per requested key range. Checking this + # explicitly guards against a store/wrapper exhausting the key_ranges + # iterable early and silently returning fewer (or no) results. + assert len(observed_maybe) == len(key_ranges) + observed: list[Buffer] = [] expected: list[Buffer] = [] @@ -382,8 +390,7 @@ async def test_get_partial_values( assert obs is not None observed.append(obs) - for idx in range(len(observed)): - key, byte_range = key_ranges[idx] + for key, byte_range in key_ranges: result = await store.get( key, prototype=default_buffer_prototype(), byte_range=byte_range ) diff --git a/src/zarr/testing/strategies.py b/src/zarr/testing/strategies.py index 7d6556a359..0ef1ba99bb 100644 --- a/src/zarr/testing/strategies.py +++ b/src/zarr/testing/strategies.py @@ -11,7 +11,13 @@ from hypothesis.strategies import SearchStrategy import zarr -from zarr.abc.store import RangeByteRequest, Store +from zarr.abc.store import ( + ByteRequest, + OffsetByteRequest, + RangeByteRequest, + Store, + SuffixByteRequest, +) from zarr.codecs.bytes import BytesCodec from zarr.codecs.crc32c_ import Crc32cCodec from zarr.codecs.sharding import SUBCHUNK_WRITE_ORDER, ShardingCodec, SubchunkWriteOrder @@ -654,22 +660,29 @@ def predicate(value: tuple[Any, ...]) -> bool: def key_ranges( keys: SearchStrategy[str] = node_names, max_size: int = sys.maxsize -) -> SearchStrategy[list[tuple[str, RangeByteRequest]]]: +) -> SearchStrategy[list[tuple[str, ByteRequest | None]]]: """ Function to generate key_ranges strategy for get_partial_values() returns list strategy w/ form:: - [(key, (range_start, range_end)), - (key, (range_start, range_end)),...] + [(key, byte_request), + (key, byte_request),...] + + where ``byte_request`` is ``None`` or any of the concrete ``ByteRequest`` + subtypes. The bounds are drawn independently of each value's length, so the + offsets/suffixes routinely exceed the data and exercise the clamping logic + in ``_normalize_byte_range_index``. """ - def make_request(start: int, length: int) -> RangeByteRequest: + def make_range(start: int, length: int) -> RangeByteRequest: return RangeByteRequest(start, end=min(start + length, max_size)) - byte_ranges = st.builds( - make_request, - start=st.integers(min_value=0, max_value=max_size), - length=st.integers(min_value=0, max_value=max_size), + bound = st.integers(min_value=0, max_value=max_size) + byte_ranges: SearchStrategy[ByteRequest | None] = st.one_of( + st.none(), + st.builds(make_range, start=bound, length=bound), + st.builds(OffsetByteRequest, offset=bound), + st.builds(SuffixByteRequest, suffix=bound), ) key_tuple = st.tuples(keys, byte_ranges) return st.lists(key_tuple, min_size=1, max_size=10) diff --git a/tests/test_codec_pipeline.py b/tests/test_codec_pipeline.py index 780f7c98bd..4d596164db 100644 --- a/tests/test_codec_pipeline.py +++ b/tests/test_codec_pipeline.py @@ -1,8 +1,15 @@ from __future__ import annotations +from typing import TYPE_CHECKING + import numpy as np import pytest +pytest.importorskip("hypothesis") + +import hypothesis.strategies as st +from hypothesis import given + import zarr from zarr.codecs import BytesCodec, CastValue, GzipCodec, TransposeCodec from zarr.core.array import _get_chunk_spec @@ -11,6 +18,11 @@ from zarr.core.indexing import BasicIndexer from zarr.storage import MemoryStore +if TYPE_CHECKING: + from collections.abc import Callable + + from zarr.abc.codec import Codec + @pytest.mark.parametrize( ("write_slice", "read_slice", "expected_statuses"), @@ -123,15 +135,63 @@ def test_codec_pipeline_threads_dtype_through_evolve(source_dtype: str, target_d np.testing.assert_array_equal(arr[:], np.asarray([0, 1, 2, 3], dtype=source_dtype)) -def test_codecs_from_list_bytes_bytes_after_array_array_raises() -> None: - """Regression: placing a BytesBytesCodec immediately after an ArrayArrayCodec - (with no ArrayBytesCodec in between) must raise TypeError. Previously the error - message was assigned to ``msg`` but never raised, so the function silently - continued and later raised the unrelated ValueError "Required ArrayBytesCodec - was not found." instead of the descriptive TypeError.""" - aa_codec = TransposeCodec(order=(0, 1)) - bb_codec = GzipCodec() - - # [ArrayArrayCodec, BytesBytesCodec] — no ArrayBytesCodec between them. - with pytest.raises(TypeError, match="Invalid codec order"): - codecs_from_list([aa_codec, bb_codec]) +# Property-based check of codecs_from_list ordering validation. +# +# Valid codec orderings are exactly: (ArrayArrayCodec)* (ArrayBytesCodec) +# (BytesBytesCodec)*. codecs_from_list walks adjacent pairs and must raise +# TypeError the moment a codec appears in a structurally invalid position -- +# notably, a BytesBytesCodec immediately following an ArrayArrayCodec with no +# ArrayBytesCodec in between (which previously built an error message but never +# raised it, falling through to an unrelated ValueError instead). +_AA = "AA" # ArrayArrayCodec -> TransposeCodec +_AB = "AB" # ArrayBytesCodec -> BytesCodec +_BB = "BB" # BytesBytesCodec -> GzipCodec + +_CODEC_FACTORY: dict[str, Callable[[], Codec]] = { + _AA: lambda: TransposeCodec(order=(0, 1)), + _AB: BytesCodec, + _BB: GzipCodec, +} + + +def _expected_codec_order_outcome(labels: list[str]) -> str: + """Independently predict codecs_from_list's outcome: 'TypeError', + 'ValueError' or 'ok', mirroring its left-to-right scan and the order in + which it checks ordering violations (TypeError) vs. the ArrayBytes-count + constraints (ValueError).""" + prev = None + seen_array_bytes = False + for cur in labels: + if cur == _AA: + if prev in (_AB, _BB): + return "TypeError" + elif cur == _AB: + if prev == _BB: + return "TypeError" + if seen_array_bytes: + return "ValueError" # two ArrayBytesCodecs + seen_array_bytes = True + else: # _BB + if prev == _AA: + return "TypeError" + prev = cur + if not seen_array_bytes: + return "ValueError" # Required ArrayBytesCodec was not found + return "ok" + + +@given(labels=st.lists(st.sampled_from([_AA, _AB, _BB]), min_size=1, max_size=5)) +def test_codecs_from_list_outcome_matches_order_rules(labels: list[str]) -> None: + codecs = [_CODEC_FACTORY[label]() for label in labels] + expected = _expected_codec_order_outcome(labels) + if expected == "TypeError": + with pytest.raises(TypeError): + codecs_from_list(codecs) + elif expected == "ValueError": + with pytest.raises(ValueError): + codecs_from_list(codecs) + else: + # Valid ordering: must classify without raising. + aa, _ab, bb = codecs_from_list(codecs) + assert labels.count(_AA) == len(aa) + assert labels.count(_BB) == len(bb) diff --git a/tests/test_store/test_zip.py b/tests/test_store/test_zip.py index 3e4b2b10d2..ed69114b51 100644 --- a/tests/test_store/test_zip.py +++ b/tests/test_store/test_zip.py @@ -8,11 +8,20 @@ import numpy as np import pytest +from hypothesis import settings +from hypothesis.stateful import ( + RuleBasedStateMachine, + initialize, + precondition, + rule, + run_state_machine_as_test, +) import zarr from zarr import create_array from zarr.core.buffer import Buffer, cpu, default_buffer_prototype from zarr.core.group import Group +from zarr.core.sync import sync from zarr.storage import ZipStore from zarr.testing.store import StoreTests @@ -178,16 +187,65 @@ async def test_move(self, tmp_path: Path) -> None: assert not origin.exists() assert np.array_equal(array[...], np.arange(10)) - def test_close_without_open_does_not_raise(self, tmp_path: Path) -> None: - """Regression: ZipStore.close() must not raise AttributeError when the store - was never opened (i.e., _sync_open was never called and _lock does not exist).""" - store = ZipStore(tmp_path / "never_opened.zip", mode="w") - assert not store._is_open - # Should complete without raising AttributeError. - store.close() - def test_context_manager_without_io_does_not_raise(self, tmp_path: Path) -> None: - """Regression: using ZipStore as a context manager without doing any I/O - must not raise AttributeError on __exit__ (which calls close()).""" - with ZipStore(tmp_path / "ctx_manager.zip", mode="w"): - pass # no I/O; store is never explicitly opened +class ZipStoreLifecycleMachine(RuleBasedStateMachine): + """Drive a ZipStore through construct / open / write / close transitions. + + Invariant under test: a constructed ZipStore can always be closed without + raising, regardless of whether it was ever opened or did any I/O. This is a + property-based generalization of the former example-based regression tests + for ZipStore.close() being called on a never-opened store (which raised + AttributeError because ``_lock`` is created lazily in ``_sync_open``). + """ + + def __init__(self, tmp_path: Path) -> None: + super().__init__() + self._tmp_path = tmp_path + self._counter = 0 + self.store: ZipStore | None = None + self._opened = False + + @initialize() + def start(self) -> None: + self.store = None + self._opened = False + + @precondition(lambda self: self.store is None) + @rule() + def construct(self) -> None: + # Fresh path each time so mode="w" never clobbers a closed archive. + self._counter += 1 + self.store = ZipStore(self._tmp_path / f"s{self._counter}.zip", mode="w") + self._opened = False + + @precondition(lambda self: self.store is not None and not self._opened) + @rule() + def open(self) -> None: + assert self.store is not None + self.store._sync_open() + self._opened = True + + @precondition(lambda self: self.store is not None and not self._opened) + @rule() + def write(self) -> None: + assert self.store is not None + # store.set auto-opens the store. + sync(self.store.set("a", cpu.Buffer.from_bytes(b"hi"))) + self._opened = True + + @precondition(lambda self: self.store is not None) + @rule() + def close(self) -> None: + assert self.store is not None + # The property under test: close() must never raise, even with no + # prior open or I/O. + self.store.close() + self.store = None + self._opened = False + + +def test_zipstore_close_lifecycle(tmp_path: Path) -> None: + run_state_machine_as_test( # type: ignore[no-untyped-call] + lambda: ZipStoreLifecycleMachine(tmp_path), + settings=settings(max_examples=50, deadline=None), + ) From daa124c218a8f04dc1e45405ef8cc8676e0d0d90 Mon Sep 17 00:00:00 2001 From: Davis Vann Bennett Date: Thu, 18 Jun 2026 12:27:25 +0200 Subject: [PATCH 4/7] test: consolidate the five bugfix regressions into property/shared tests Make each of the five bugs fixed in this PR catchable by a property/stateful or shared-harness test (verified red-green against the pre-fix source), and remove the per-store special-case tests whose coverage is now subsumed: - B1 (suffix clamp): broaden the `key_ranges` strategy to emit SuffixByteRequest / OffsetByteRequest / None with bounds that can exceed the value length, and give the stateful `get_partial_values` rule an independent oracle for every ByteRequest variant. The pure-function unit test `TestNormalizeByteRangeIndex` is kept as a fast deterministic guard. - B2 (generator exhaustion): the shared `StoreTests.test_get_partial_values` now passes `key_ranges` as a one-shot generator and asserts one result per request, so any store/wrapper that exhausts the iterable early is caught across all stores. The logging-only regression test is removed as redundant. - B3 (getsize_prefix sibling over-counting): the shared `StoreTests.test_getsize_prefix` now includes a sibling key ("cc/0") that must be excluded, covering every store deterministically. Add a matching `getsize_prefix` rule to the store state machine. The memory-only regression test is removed as redundant. - B4 (ZipStore.close before open): replace the two example-based regression tests with a stateful lifecycle machine asserting `close()` never raises, regardless of open/IO history. - B5 (missing raise in codec order validation): replace the single example with a property test over random {AA,AB,BB} codec orderings whose expected outcome (TypeError / ValueError / ok) is modelled independently. Co-Authored-By: Claude Opus 4.8 (1M context) --- src/zarr/testing/stateful.py | 53 +++++++++++++++++--- src/zarr/testing/store.py | 23 +++++++-- src/zarr/testing/strategies.py | 31 ++++++++---- tests/test_codec_pipeline.py | 84 +++++++++++++++++++++++++++----- tests/test_store/test_logging.py | 26 ---------- tests/test_store/test_memory.py | 19 -------- tests/test_store/test_zip.py | 82 ++++++++++++++++++++++++++----- 7 files changed, 228 insertions(+), 90 deletions(-) diff --git a/src/zarr/testing/stateful.py b/src/zarr/testing/stateful.py index d6c43f4ecc..9817ebd618 100644 --- a/src/zarr/testing/stateful.py +++ b/src/zarr/testing/stateful.py @@ -1,6 +1,6 @@ import builtins import functools -from collections.abc import Callable +from collections.abc import Callable, Iterable from typing import Any, cast import hypothesis.extra.numpy as npst @@ -18,7 +18,12 @@ import zarr from zarr import Array -from zarr.abc.store import Store +from zarr.abc.store import ( + OffsetByteRequest, + RangeByteRequest, + Store, + SuffixByteRequest, +) from zarr.codecs.bytes import BytesCodec from zarr.core.buffer import Buffer, BufferPrototype, cpu, default_buffer_prototype from zarr.core.sync import SyncMixin @@ -460,7 +465,7 @@ def get(self, key: str, prototype: BufferPrototype) -> Buffer | None: return self._sync(self.store.get(key, prototype=prototype)) def get_partial_values( - self, key_ranges: builtins.list[Any], prototype: BufferPrototype + self, key_ranges: Iterable[Any], prototype: BufferPrototype ) -> builtins.list[Buffer | None]: return self._sync(self.store.get_partial_values(prototype=prototype, key_ranges=key_ranges)) @@ -476,6 +481,9 @@ def clear(self) -> None: def exists(self, key: str) -> bool: return self._sync(self.store.exists(key)) + def getsize_prefix(self, prefix: str) -> int: + return self._sync(self.store.getsize_prefix(prefix)) + def list_dir(self, prefix: str) -> None: raise NotImplementedError @@ -555,7 +563,9 @@ def get_partial_values(self, data: DataObject) -> None: key_ranges(keys=st.sampled_from(sorted(self.model.keys())), max_size=MAX_BINARY_SIZE) ) note(f"(get partial) {key_range=}") - obs_maybe = self.store.get_partial_values(key_range, self.prototype) + # Pass a one-shot generator rather than a list: stores (and wrappers such + # as LoggingStore) must not exhaust the iterable before using it. + obs_maybe = self.store.get_partial_values((kr for kr in key_range), self.prototype) observed = [] for obs in obs_maybe: @@ -565,9 +575,23 @@ def get_partial_values(self, data: DataObject) -> None: model_vals_ls = [] for key, byte_range in key_range: - start = byte_range.start - stop = byte_range.end - model_vals_ls.append(self.model[key][start:stop]) + # Independently model each ByteRequest variant (do NOT reuse the + # store's _normalize_byte_range_index helper, so this stays an + # independent oracle). Bounds may exceed the value length. + value = self.model[key] + n = len(value) + if byte_range is None: + expected = value[:] + elif isinstance(byte_range, RangeByteRequest): + expected = value[byte_range.start : byte_range.end] + elif isinstance(byte_range, OffsetByteRequest): + expected = value[byte_range.offset :] + elif isinstance(byte_range, SuffixByteRequest): + # "last suffix bytes"; suffix > n means the whole value. + expected = value[max(0, n - byte_range.suffix) :] + else: + raise AssertionError(f"unexpected byte_range {byte_range!r}") + model_vals_ls.append(expected) assert all( obs == exp.to_bytes() for obs, exp in zip(observed, model_vals_ls, strict=True) @@ -612,6 +636,21 @@ def exists(self, key: str) -> None: assert self.store.exists(key) == (key in self.model) + @precondition(lambda self: len(self.model.keys()) > 0) + @rule(data=st.data()) + def getsize_prefix(self, data: DataObject) -> None: + # Measure the size under the first path segment of some existing key. + # getsize_prefix(node) must count only keys under the directory "node/", + # not sibling keys that merely share the string prefix (e.g. measuring + # "a" must not include a sibling key "ab/..."). + key = data.draw(st.sampled_from(sorted(self.model.keys()))) + node = key.split("/")[0] + note(f"(getsize_prefix) {node=}") + + observed = self.store.getsize_prefix(node) + expected = sum(len(value) for k, value in self.model.items() if k.startswith(node + "/")) + assert observed == expected, (observed, expected, node) + @invariant() def check_paths_equal(self) -> None: note("Checking that paths are equal") diff --git a/src/zarr/testing/store.py b/src/zarr/testing/store.py index 81024c85c8..11ceeee83a 100644 --- a/src/zarr/testing/store.py +++ b/src/zarr/testing/store.py @@ -299,10 +299,16 @@ async def test_getsize(self, store: S, key: str, data: bytes) -> None: async def test_getsize_prefix(self, store: S) -> None: """ Test the result of store.getsize_prefix(). + + Includes a sibling key ("cc/0") that shares the string prefix "c" but + belongs to a different directory: getsize_prefix("c") must not count it, + i.e. the prefix is matched as a directory ("c/...") not a raw substring. """ data_buf = self.buffer_cls.from_bytes(b"\x01\x02\x03\x04") keys = ["c/0/0", "c/0/1", "c/1/0", "c/1/1"] - keys_values = [(k, data_buf) for k in keys] + # Sibling directory sharing the "c" string prefix; must be excluded. + sibling_keys = ["cc/0"] + keys_values = [(k, data_buf) for k in keys + sibling_keys] await store._set_many(keys_values) expected = len(data_buf) * len(keys) observed = await store.getsize_prefix("c") @@ -370,11 +376,19 @@ async def test_get_partial_values( for key, _ in key_ranges: await self.set(store, key, self.buffer_cls.from_bytes(bytes(key, encoding="utf-8"))) - # read back just part of it + # read back just part of it. Pass key_ranges as a one-shot generator + # (a valid Iterable per the method signature) to ensure stores and + # wrappers do not exhaust the iterable before handing it to the backend. observed_maybe = await store.get_partial_values( - prototype=default_buffer_prototype(), key_ranges=key_ranges + prototype=default_buffer_prototype(), + key_ranges=(kr for kr in key_ranges), ) + # One result must be returned per requested key range. Checking this + # explicitly guards against a store/wrapper exhausting the key_ranges + # iterable early and silently returning fewer (or no) results. + assert len(observed_maybe) == len(key_ranges) + observed: list[Buffer] = [] expected: list[Buffer] = [] @@ -382,8 +396,7 @@ async def test_get_partial_values( assert obs is not None observed.append(obs) - for idx in range(len(observed)): - key, byte_range = key_ranges[idx] + for key, byte_range in key_ranges: result = await store.get( key, prototype=default_buffer_prototype(), byte_range=byte_range ) diff --git a/src/zarr/testing/strategies.py b/src/zarr/testing/strategies.py index 7d6556a359..0ef1ba99bb 100644 --- a/src/zarr/testing/strategies.py +++ b/src/zarr/testing/strategies.py @@ -11,7 +11,13 @@ from hypothesis.strategies import SearchStrategy import zarr -from zarr.abc.store import RangeByteRequest, Store +from zarr.abc.store import ( + ByteRequest, + OffsetByteRequest, + RangeByteRequest, + Store, + SuffixByteRequest, +) from zarr.codecs.bytes import BytesCodec from zarr.codecs.crc32c_ import Crc32cCodec from zarr.codecs.sharding import SUBCHUNK_WRITE_ORDER, ShardingCodec, SubchunkWriteOrder @@ -654,22 +660,29 @@ def predicate(value: tuple[Any, ...]) -> bool: def key_ranges( keys: SearchStrategy[str] = node_names, max_size: int = sys.maxsize -) -> SearchStrategy[list[tuple[str, RangeByteRequest]]]: +) -> SearchStrategy[list[tuple[str, ByteRequest | None]]]: """ Function to generate key_ranges strategy for get_partial_values() returns list strategy w/ form:: - [(key, (range_start, range_end)), - (key, (range_start, range_end)),...] + [(key, byte_request), + (key, byte_request),...] + + where ``byte_request`` is ``None`` or any of the concrete ``ByteRequest`` + subtypes. The bounds are drawn independently of each value's length, so the + offsets/suffixes routinely exceed the data and exercise the clamping logic + in ``_normalize_byte_range_index``. """ - def make_request(start: int, length: int) -> RangeByteRequest: + def make_range(start: int, length: int) -> RangeByteRequest: return RangeByteRequest(start, end=min(start + length, max_size)) - byte_ranges = st.builds( - make_request, - start=st.integers(min_value=0, max_value=max_size), - length=st.integers(min_value=0, max_value=max_size), + bound = st.integers(min_value=0, max_value=max_size) + byte_ranges: SearchStrategy[ByteRequest | None] = st.one_of( + st.none(), + st.builds(make_range, start=bound, length=bound), + st.builds(OffsetByteRequest, offset=bound), + st.builds(SuffixByteRequest, suffix=bound), ) key_tuple = st.tuples(keys, byte_ranges) return st.lists(key_tuple, min_size=1, max_size=10) diff --git a/tests/test_codec_pipeline.py b/tests/test_codec_pipeline.py index 780f7c98bd..4d596164db 100644 --- a/tests/test_codec_pipeline.py +++ b/tests/test_codec_pipeline.py @@ -1,8 +1,15 @@ from __future__ import annotations +from typing import TYPE_CHECKING + import numpy as np import pytest +pytest.importorskip("hypothesis") + +import hypothesis.strategies as st +from hypothesis import given + import zarr from zarr.codecs import BytesCodec, CastValue, GzipCodec, TransposeCodec from zarr.core.array import _get_chunk_spec @@ -11,6 +18,11 @@ from zarr.core.indexing import BasicIndexer from zarr.storage import MemoryStore +if TYPE_CHECKING: + from collections.abc import Callable + + from zarr.abc.codec import Codec + @pytest.mark.parametrize( ("write_slice", "read_slice", "expected_statuses"), @@ -123,15 +135,63 @@ def test_codec_pipeline_threads_dtype_through_evolve(source_dtype: str, target_d np.testing.assert_array_equal(arr[:], np.asarray([0, 1, 2, 3], dtype=source_dtype)) -def test_codecs_from_list_bytes_bytes_after_array_array_raises() -> None: - """Regression: placing a BytesBytesCodec immediately after an ArrayArrayCodec - (with no ArrayBytesCodec in between) must raise TypeError. Previously the error - message was assigned to ``msg`` but never raised, so the function silently - continued and later raised the unrelated ValueError "Required ArrayBytesCodec - was not found." instead of the descriptive TypeError.""" - aa_codec = TransposeCodec(order=(0, 1)) - bb_codec = GzipCodec() - - # [ArrayArrayCodec, BytesBytesCodec] — no ArrayBytesCodec between them. - with pytest.raises(TypeError, match="Invalid codec order"): - codecs_from_list([aa_codec, bb_codec]) +# Property-based check of codecs_from_list ordering validation. +# +# Valid codec orderings are exactly: (ArrayArrayCodec)* (ArrayBytesCodec) +# (BytesBytesCodec)*. codecs_from_list walks adjacent pairs and must raise +# TypeError the moment a codec appears in a structurally invalid position -- +# notably, a BytesBytesCodec immediately following an ArrayArrayCodec with no +# ArrayBytesCodec in between (which previously built an error message but never +# raised it, falling through to an unrelated ValueError instead). +_AA = "AA" # ArrayArrayCodec -> TransposeCodec +_AB = "AB" # ArrayBytesCodec -> BytesCodec +_BB = "BB" # BytesBytesCodec -> GzipCodec + +_CODEC_FACTORY: dict[str, Callable[[], Codec]] = { + _AA: lambda: TransposeCodec(order=(0, 1)), + _AB: BytesCodec, + _BB: GzipCodec, +} + + +def _expected_codec_order_outcome(labels: list[str]) -> str: + """Independently predict codecs_from_list's outcome: 'TypeError', + 'ValueError' or 'ok', mirroring its left-to-right scan and the order in + which it checks ordering violations (TypeError) vs. the ArrayBytes-count + constraints (ValueError).""" + prev = None + seen_array_bytes = False + for cur in labels: + if cur == _AA: + if prev in (_AB, _BB): + return "TypeError" + elif cur == _AB: + if prev == _BB: + return "TypeError" + if seen_array_bytes: + return "ValueError" # two ArrayBytesCodecs + seen_array_bytes = True + else: # _BB + if prev == _AA: + return "TypeError" + prev = cur + if not seen_array_bytes: + return "ValueError" # Required ArrayBytesCodec was not found + return "ok" + + +@given(labels=st.lists(st.sampled_from([_AA, _AB, _BB]), min_size=1, max_size=5)) +def test_codecs_from_list_outcome_matches_order_rules(labels: list[str]) -> None: + codecs = [_CODEC_FACTORY[label]() for label in labels] + expected = _expected_codec_order_outcome(labels) + if expected == "TypeError": + with pytest.raises(TypeError): + codecs_from_list(codecs) + elif expected == "ValueError": + with pytest.raises(ValueError): + codecs_from_list(codecs) + else: + # Valid ordering: must classify without raising. + aa, _ab, bb = codecs_from_list(codecs) + assert labels.count(_AA) == len(aa) + assert labels.count(_BB) == len(bb) diff --git a/tests/test_store/test_logging.py b/tests/test_store/test_logging.py index e08f8958b9..96cd184938 100644 --- a/tests/test_store/test_logging.py +++ b/tests/test_store/test_logging.py @@ -168,29 +168,3 @@ async def test_logging_store_counter(store: Store) -> None: else: assert wrapped.counter["get"] == 1 assert wrapped.counter["delete_dir"] == 0 - - -async def test_logging_store_get_partial_values_generator() -> None: - """Regression: get_partial_values must not exhaust a one-shot generator when - building the log string, leaving the wrapped store with an empty iterable.""" - from zarr.storage import MemoryStore - - inner = MemoryStore() - proto = default_buffer_prototype() - await inner.set("a", proto.buffer.from_bytes(b"hello")) - await inner.set("b", proto.buffer.from_bytes(b"world")) - - wrapped = LoggingStore(store=inner, log_level="DEBUG") - - # Pass a generator (one-shot iterator) rather than a list. - key_ranges_gen = ((key, None) for key in ("a", "b")) - results = await wrapped.get_partial_values( - prototype=proto, - key_ranges=key_ranges_gen, - ) - - assert len(results) == 2 - assert results[0] is not None - assert results[1] is not None - assert results[0].to_bytes() == b"hello" - assert results[1].to_bytes() == b"world" diff --git a/tests/test_store/test_memory.py b/tests/test_store/test_memory.py index 9435f470c0..1e3ee89e92 100644 --- a/tests/test_store/test_memory.py +++ b/tests/test_store/test_memory.py @@ -410,22 +410,3 @@ def test_garbage_collection(self) -> None: # URL should no longer resolve with pytest.raises(ValueError, match="garbage collected"): ManagedMemoryStore.from_url(url) - - -async def test_getsize_prefix_no_sibling_overcounting() -> None: - """Regression: getsize_prefix("foo") must not count keys under "foobar/" because - list_prefix matches any key that starts with the given prefix string, not just - those under that directory. The fix normalizes the prefix to end with "/".""" - from zarr.core.buffer.core import default_buffer_prototype - - store = MemoryStore() - proto = default_buffer_prototype() - - # "foo/a" belongs to the "foo" prefix tree; "foobar/b" does not. - await store.set("foo/a", proto.buffer.from_bytes(b"hello")) # 5 bytes - await store.set("foobar/b", proto.buffer.from_bytes(b"world!")) # 6 bytes - - size_foo = await store.getsize_prefix("foo") - assert size_foo == 5, ( - f"getsize_prefix('foo') should count only 'foo/a' (5 bytes), got {size_foo}" - ) diff --git a/tests/test_store/test_zip.py b/tests/test_store/test_zip.py index 3e4b2b10d2..ed69114b51 100644 --- a/tests/test_store/test_zip.py +++ b/tests/test_store/test_zip.py @@ -8,11 +8,20 @@ import numpy as np import pytest +from hypothesis import settings +from hypothesis.stateful import ( + RuleBasedStateMachine, + initialize, + precondition, + rule, + run_state_machine_as_test, +) import zarr from zarr import create_array from zarr.core.buffer import Buffer, cpu, default_buffer_prototype from zarr.core.group import Group +from zarr.core.sync import sync from zarr.storage import ZipStore from zarr.testing.store import StoreTests @@ -178,16 +187,65 @@ async def test_move(self, tmp_path: Path) -> None: assert not origin.exists() assert np.array_equal(array[...], np.arange(10)) - def test_close_without_open_does_not_raise(self, tmp_path: Path) -> None: - """Regression: ZipStore.close() must not raise AttributeError when the store - was never opened (i.e., _sync_open was never called and _lock does not exist).""" - store = ZipStore(tmp_path / "never_opened.zip", mode="w") - assert not store._is_open - # Should complete without raising AttributeError. - store.close() - def test_context_manager_without_io_does_not_raise(self, tmp_path: Path) -> None: - """Regression: using ZipStore as a context manager without doing any I/O - must not raise AttributeError on __exit__ (which calls close()).""" - with ZipStore(tmp_path / "ctx_manager.zip", mode="w"): - pass # no I/O; store is never explicitly opened +class ZipStoreLifecycleMachine(RuleBasedStateMachine): + """Drive a ZipStore through construct / open / write / close transitions. + + Invariant under test: a constructed ZipStore can always be closed without + raising, regardless of whether it was ever opened or did any I/O. This is a + property-based generalization of the former example-based regression tests + for ZipStore.close() being called on a never-opened store (which raised + AttributeError because ``_lock`` is created lazily in ``_sync_open``). + """ + + def __init__(self, tmp_path: Path) -> None: + super().__init__() + self._tmp_path = tmp_path + self._counter = 0 + self.store: ZipStore | None = None + self._opened = False + + @initialize() + def start(self) -> None: + self.store = None + self._opened = False + + @precondition(lambda self: self.store is None) + @rule() + def construct(self) -> None: + # Fresh path each time so mode="w" never clobbers a closed archive. + self._counter += 1 + self.store = ZipStore(self._tmp_path / f"s{self._counter}.zip", mode="w") + self._opened = False + + @precondition(lambda self: self.store is not None and not self._opened) + @rule() + def open(self) -> None: + assert self.store is not None + self.store._sync_open() + self._opened = True + + @precondition(lambda self: self.store is not None and not self._opened) + @rule() + def write(self) -> None: + assert self.store is not None + # store.set auto-opens the store. + sync(self.store.set("a", cpu.Buffer.from_bytes(b"hi"))) + self._opened = True + + @precondition(lambda self: self.store is not None) + @rule() + def close(self) -> None: + assert self.store is not None + # The property under test: close() must never raise, even with no + # prior open or I/O. + self.store.close() + self.store = None + self._opened = False + + +def test_zipstore_close_lifecycle(tmp_path: Path) -> None: + run_state_machine_as_test( # type: ignore[no-untyped-call] + lambda: ZipStoreLifecycleMachine(tmp_path), + settings=settings(max_examples=50, deadline=None), + ) From 3435afa5a8767edb3e85faca22841257a6e4c28c Mon Sep 17 00:00:00 2001 From: Davis Vann Bennett Date: Thu, 18 Jun 2026 13:10:33 +0200 Subject: [PATCH 5/7] fix(fsspec): make get_partial_values robust to one-shot key_ranges FsspecStore.get_partial_values guarded its body with a bare `if key_ranges:` before materialising the argument. Because `key_ranges` is typed as a generic Iterable, a one-shot generator is always truthy even when empty, so the `else: return []` fast path was unreachable for generator inputs (it happened to be harmless only because fsspec's _cat_ranges returns [] for empty input). Materialise `key_ranges` into a list first, then check `if not key_ranges`, matching the iterable contract for any Iterable (list or generator). Also enable mypy's `truthy-iterable` error code, which flags exactly this class of bug (truthiness test on a non-Collection Iterable). There are no existing violations in src/zarr or tests, so this is a zero-cost guard against recurrence. Co-Authored-By: Claude Opus 4.8 (1M context) --- pyproject.toml | 2 +- src/zarr/storage/_fsspec.py | 47 +++++++++++++++++++------------------ 2 files changed, 25 insertions(+), 24 deletions(-) diff --git a/pyproject.toml b/pyproject.toml index 9b372192e9..6f6c7265a5 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -393,7 +393,7 @@ show_error_code_links = true show_error_context = true strict = true warn_unreachable = true -enable_error_code = ["ignore-without-code", "redundant-expr", "truthy-bool"] +enable_error_code = ["ignore-without-code", "redundant-expr", "truthy-bool", "truthy-iterable"] [[tool.mypy.overrides]] module = [ diff --git a/src/zarr/storage/_fsspec.py b/src/zarr/storage/_fsspec.py index 29201a6fee..89d788af1a 100644 --- a/src/zarr/storage/_fsspec.py +++ b/src/zarr/storage/_fsspec.py @@ -424,30 +424,31 @@ async def get_partial_values( key_ranges: Iterable[tuple[str, ByteRequest | None]], ) -> list[Buffer | None]: # docstring inherited - if key_ranges: - # _cat_ranges expects a list of paths, start, and end ranges, so we need to reformat each ByteRequest. - key_ranges = list(key_ranges) - paths: list[str] = [] - starts: list[int | None] = [] - stops: list[int | None] = [] - for key, byte_range in key_ranges: - paths.append(_dereference_path(self.path, key)) - if byte_range is None: - starts.append(None) - stops.append(None) - elif isinstance(byte_range, RangeByteRequest): - starts.append(byte_range.start) - stops.append(byte_range.end) - elif isinstance(byte_range, OffsetByteRequest): - starts.append(byte_range.offset) - stops.append(None) - elif isinstance(byte_range, SuffixByteRequest): - starts.append(-byte_range.suffix) - stops.append(None) - else: - raise ValueError(f"Unexpected byte_range, got {byte_range}.") - else: + # Materialise first: key_ranges may be a one-shot iterable, so a bare + # truthiness check (e.g. `if key_ranges`) would be unreliable for an + # empty generator. _cat_ranges also expects lists of paths/starts/stops. + key_ranges = list(key_ranges) + if not key_ranges: return [] + paths: list[str] = [] + starts: list[int | None] = [] + stops: list[int | None] = [] + for key, byte_range in key_ranges: + paths.append(_dereference_path(self.path, key)) + if byte_range is None: + starts.append(None) + stops.append(None) + elif isinstance(byte_range, RangeByteRequest): + starts.append(byte_range.start) + stops.append(byte_range.end) + elif isinstance(byte_range, OffsetByteRequest): + starts.append(byte_range.offset) + stops.append(None) + elif isinstance(byte_range, SuffixByteRequest): + starts.append(-byte_range.suffix) + stops.append(None) + else: + raise ValueError(f"Unexpected byte_range, got {byte_range}.") # TODO: expectations for exceptions or missing keys? res = await self.fs._cat_ranges(paths, starts, stops, on_error="return") # the following is an s3-specific condition we probably don't want to leak From 2e3559c52f8b8beed3b6fb5a5b222fa2f7efb10e Mon Sep 17 00:00:00 2001 From: Davis Vann Bennett Date: Thu, 18 Jun 2026 13:19:34 +0200 Subject: [PATCH 6/7] docs: add changelog fragment for the storage and codec bugfixes Co-Authored-By: Claude Opus 4.8 (1M context) --- changes/4074.bugfix.md | 7 +++++++ 1 file changed, 7 insertions(+) create mode 100644 changes/4074.bugfix.md diff --git a/changes/4074.bugfix.md b/changes/4074.bugfix.md new file mode 100644 index 0000000000..43ca46e42f --- /dev/null +++ b/changes/4074.bugfix.md @@ -0,0 +1,7 @@ +Fixed several storage and codec bugs: + +- A `SuffixByteRequest` asking for more bytes than a value contains now returns the whole value, instead of fewer bytes than requested. +- `LoggingStore.get_partial_values` and `FsspecStore.get_partial_values` no longer return empty results when `key_ranges` is passed as a one-shot iterable (e.g. a generator). +- `Store.getsize_prefix` no longer over-counts sibling keys that merely share a string prefix (e.g. `getsize_prefix("foo")` no longer includes keys under `foobar/`). +- `ZipStore.close()` no longer raises `AttributeError` when the store was created but never opened (including when used as a context manager without any I/O). +- `codecs_from_list` now raises a descriptive `TypeError` when a `BytesBytesCodec` immediately follows an `ArrayArrayCodec`, instead of a misleading "Required ArrayBytesCodec was not found" `ValueError`. From 591ef453e69f9cdbad21599bc2903fa30fd25d5f Mon Sep 17 00:00:00 2001 From: Davis Vann Bennett Date: Thu, 18 Jun 2026 13:22:17 +0200 Subject: [PATCH 7/7] docs: clarify SuffixByteRequest changelog entry The pre-fix MemoryStore behavior returned incorrect data (a wrong slice via a negative start index), not merely fewer bytes; note the HTTP suffix-range semantics that define the correct behavior. Co-Authored-By: Claude Opus 4.8 (1M context) --- changes/4074.bugfix.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/changes/4074.bugfix.md b/changes/4074.bugfix.md index 43ca46e42f..d55a52b887 100644 --- a/changes/4074.bugfix.md +++ b/changes/4074.bugfix.md @@ -1,6 +1,6 @@ Fixed several storage and codec bugs: -- A `SuffixByteRequest` asking for more bytes than a value contains now returns the whole value, instead of fewer bytes than requested. +- Reading a value with a `SuffixByteRequest` larger than the value now correctly returns the whole value (matching HTTP `bytes=-N` suffix-range semantics), instead of silently returning incorrect data for `MemoryStore`. - `LoggingStore.get_partial_values` and `FsspecStore.get_partial_values` no longer return empty results when `key_ranges` is passed as a one-shot iterable (e.g. a generator). - `Store.getsize_prefix` no longer over-counts sibling keys that merely share a string prefix (e.g. `getsize_prefix("foo")` no longer includes keys under `foobar/`). - `ZipStore.close()` no longer raises `AttributeError` when the store was created but never opened (including when used as a context manager without any I/O).