Skip to content

Fix S3 suffix byte-range requests when reading sharded chunks#80

Open
konstibob wants to merge 2 commits into
zarr-developers:mainfrom
konstibob:fix/sub_shard_read
Open

Fix S3 suffix byte-range requests when reading sharded chunks#80
konstibob wants to merge 2 commits into
zarr-developers:mainfrom
konstibob:fix/sub_shard_read

Conversation

@konstibob

Copy link
Copy Markdown

Reading a single chunk from a sharded array stored on S3 failed with a checksum
error. To locate a chunk inside a shard, the sharding codec first reads the shard
index at the end of the shard object via a suffix read
(StoreHandle.read(-indexByteLength)).

S3Store.get(long start, ...) always formatted the HTTP range header as
bytes=<start>-. For a negative start this produced an invalid header like
bytes=-16389-, which S3 does not accept as a suffix range, causing the request
to fail.

Fix

When start is negative, emit the correct suffix-range form:

bytes=-<N>

instead of bytes=<start>-.

Testing

Added testReadSuffix to StoreTest, verifying that a suffix read returns
exactly the last 10 bytes of the object.

@konstibob konstibob force-pushed the fix/sub_shard_read branch 2 times, most recently from 726dea6 to 4b3311d Compare July 2, 2026 14:01
@sbesson

sbesson commented Jul 2, 2026

Copy link
Copy Markdown
Contributor

@konstibob thanks for opening this. Is this fix also addressing the bug described in #79 ?
We are happy to help with the functional testing but could you exclude the large code refactoring and keep the changes atomic and restricted to 4b3311d.

#78 will need to be reviewed separately especially given the backwards-incompatible implications for downstream libraries consuming zarr-java

@konstibob konstibob force-pushed the fix/sub_shard_read branch from 4b3311d to deb1de1 Compare July 3, 2026 09:23
@konstibob

konstibob commented Jul 3, 2026

Copy link
Copy Markdown
Author

@sbesson
Thanks for the quick review!

Yes it this PR fixes #79 only! this fixes exactly that.

Regarding keeping it atomic: Agreed. I've restructured this PR so it now contains only the checksum fix on top of main, with the module-extraction refactoring removed entirely. The fix doesn't depend on that refactor, so this PR is now fully standalone and safe to merge on its own.

@sbesson

sbesson commented Jul 3, 2026

Copy link
Copy Markdown
Contributor

Thanks @konstibob but deb1de1 still contains the whole module refactoring history

@konstibob konstibob force-pushed the fix/sub_shard_read branch from deb1de1 to 726dea6 Compare July 3, 2026 14:47
@konstibob

Copy link
Copy Markdown
Author

fixed it!

@konstibob

Copy link
Copy Markdown
Author

Move testReadSuffix to OnlineS3StoreTest; it only applies to S3Store's suffix-range read, not every store.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants