blosc2.compress2() followed by blosc2.decompress2() fails to round-trip an
all-zeros input whose byte length is not a multiple of typesize.
compress2 emits a 32-byte "all-zeros special value" frame that decompress2
then refuses to decode, raising:
ValueError: Error while decompressing, check the src data and/or the dparams
The data is silently lost at compress time (the 32-byte frame does not encode
the true length correctly), so this is a data-corruption bug, not merely a
decode-side error.
Environment
- python-blosc2: 4.5.1
- c-blosc2: 3.1.4 (2026-06-17)
- Python: 3.12.11
- numpy: 2.1.2 (not required to reproduce)
- OS: Linux 6.17 x86_64 (glibc 2.39)
Also reproduced on python-blosc2 4.3.3 / c-blosc2 earlier, so this is not a
recent regression.
Minimal reproduction (no numpy)
import blosc2
data = bytes(707658) # all zeros; 707658 % 8 == 2 (NOT a multiple of typesize 8)
c = blosc2.compress2(data, typesize=8)
print(len(c)) # -> 32 (all-zeros special-value frame)
blosc2.decompress2(c) # -> ValueError: Error while decompressing, ...
Trigger conditions (both required; the codec does not matter)
- The input is all zeros (triggers blosc2's zero special-value frame; the
compressed output is 32 bytes regardless of input size).
- The input byte length is not a multiple of typesize.
The codec is irrelevant: reproduced with ZSTD, LZ4, and BLOSCLZ.
If either condition does not hold, the round-trip succeeds.
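The trigger conditions listed above can be written as a small predicate, which is handy for flagging at-risk payloads before they reach compress2. This is only a sketch: `hits_zero_frame_bug` is a hypothetical helper name, not part of blosc2, and it encodes nothing beyond the content and length conditions observed in this report.

```python
def hits_zero_frame_bug(data: bytes, typesize: int) -> bool:
    """Return True if `data` matches the failing pattern described in this report.

    Hypothetical helper (not a blosc2 API): it checks only the observed
    conditions, all-zero content and a length that is not a multiple of typesize.
    """
    if typesize < 1:
        raise ValueError("typesize must be >= 1")
    all_zeros = data.count(0) == len(data)      # every byte is 0x00
    partial_tail = len(data) % typesize != 0    # trailing partial element
    return all_zeros and partial_tail


print(hits_zero_frame_bug(bytes(707658), 8))      # True  (707658 % 8 == 2)
print(hits_zero_frame_bug(bytes(707656), 8))      # False (length is a multiple of 8)
print(hits_zero_frame_bug(b"\x07" * 707658, 8))   # False (not all zeros)
```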
Controls (all behave correctly)
import os
import blosc2
# Length IS a multiple of typesize -> OK
blosc2.decompress2(blosc2.compress2(bytes(707656), typesize=8)) # OK (707656 % 8 == 0)
# typesize=1 -> every length is a multiple -> OK
blosc2.decompress2(blosc2.compress2(bytes(707658), typesize=1)) # OK
# Non-zero data at the same (non-multiple) length -> OK
blosc2.decompress2(blosc2.compress2(b"\x07" * 707658, typesize=8)) # OK (clen=86, not the 32-byte zero frame)
# Random/incompressible data at the same length -> OK
blosc2.decompress2(blosc2.compress2(os.urandom(707658), typesize=8)) # OK
Divisibility sweep (all-zeros, typesize=8)
| length | length % 8 | result |
| --- | --- | --- |
| 80000 | 0 | OK |
| 80001 | 1 | FAIL |
| 80007 | 7 | FAIL |
| 80008 | 0 | OK |
| 707656 | 0 | OK |
| 707658 | 2 | FAIL |
| 707664 | 0 | OK |
Same pattern for typesize=4 (fails unless len % 4 == 0) and typesize=2
(fails unless len % 2 == 0). typesize=1 always succeeds.
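A sweep like the one above could be automated roughly as follows. This is a sketch, not the script used for the report: `sweep` and `blosc2_roundtrip` are hypothetical names, and the round-trip function is passed in as a parameter so the harness can also be run against a stand-in implementation.

```python
from typing import Callable, Iterable


def sweep(roundtrip: Callable[[bytes, int], bytes],
          lengths: Iterable[int],
          typesize: int) -> list[tuple[int, int, str]]:
    """Round-trip an all-zeros buffer of each length and report OK or FAIL."""
    rows = []
    for n in lengths:
        data = bytes(n)  # n zero bytes
        try:
            ok = roundtrip(data, typesize) == data
        except ValueError:
            # decompress2 raised ValueError in the failing cases of this report
            ok = False
        rows.append((n, n % typesize, "OK" if ok else "FAIL"))
    return rows


def blosc2_roundtrip(data: bytes, typesize: int) -> bytes:
    """compress2 followed by decompress2, as in the minimal reproduction."""
    import blosc2  # imported lazily so `sweep` itself has no hard dependency
    return bytes(blosc2.decompress2(blosc2.compress2(data, typesize=typesize)))
```

To regenerate the table, call `sweep(blosc2_roundtrip, [80000, 80001, 80007, 80008, 707656, 707658, 707664], 8)` and print each returned row.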
Related observation
The blosc1-compatibility API guards against this by rejecting non-multiple
lengths up front:
blosc2.compress(bytes(707658), typesize=8)
# ValueError: len(src) can only be a multiple of typesize (8).
compress2 instead accepts the same input and produces a frame that cannot be
decompressed. It should either apply the same validation, or (preferably)
correctly handle a trailing partial element in the all-zeros special-value path.
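Until one of those fixes lands, the first option (validating up front) can be approximated on the caller's side. The sketch below is an assumption about what such a guard might look like, not the library's implementation: `check_length` and `checked_compress2` are hypothetical names, and the error message simply imitates the one `blosc2.compress` raises above.

```python
def check_length(nbytes: int, typesize: int) -> None:
    """Raise ValueError if nbytes is not a multiple of typesize.

    Hypothetical guard; the message imitates the one raised by
    blosc2.compress in this report.
    """
    if nbytes % typesize != 0:
        raise ValueError(
            f"len(src) can only be a multiple of typesize ({typesize})."
        )


def checked_compress2(data: bytes, typesize: int, **cparams):
    """Validate the length first, then delegate to blosc2.compress2."""
    check_length(len(data), typesize)
    import blosc2  # lazy import: only reached once validation has passed
    return blosc2.compress2(data, typesize=typesize, **cparams)
```

With this wrapper, `checked_compress2(bytes(707658), typesize=8)` fails loudly at compress time instead of producing a frame that cannot be decoded later.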
Impact
Real-world hit: we compress arbitrary numpy arrays as raw byte streams. An
all-zeros region (e.g. a cleared/blank segmentation tile) of 166*49*87 = 707658 bytes silently produced an undecodable frame, surfacing only at
decompress time on the receiving end. Passing typesize=1 is a safe workaround
for byte-stream payloads, but the underlying compress2/decompress2
inconsistency looks like a genuine bug.
Workaround
Pass typesize=1 when compressing a raw byte stream (or otherwise ensure the
length is a multiple of typesize).
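One way to apply this workaround without hard-coding typesize=1 everywhere is to fall back only when the length does not divide evenly. The helper below is a hypothetical sketch (`safe_typesize` is not a blosc2 function); note that falling back to typesize=1 may change the compression ratio for non-zero data.

```python
def safe_typesize(nbytes: int, typesize: int) -> int:
    """Return typesize if it divides nbytes evenly, otherwise fall back to 1."""
    return typesize if nbytes % typesize == 0 else 1


print(safe_typesize(707656, 8))  # 8 (707656 % 8 == 0, keep the requested typesize)
print(safe_typesize(707658, 8))  # 1 (707658 % 8 == 2, fall back to byte granularity)
```

A call site would then read `blosc2.compress2(data, typesize=safe_typesize(len(data), 8))`.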
---
*Repo to file against: https://github.com/Blosc/python-blosc2 (route to
c-blosc2 if the fault is in the special-value frame codec).*