MINOR: Fix IOOBE in ListVector/LargeListVector.getFieldBuffers() for never-allocated offset buffer#27
Conversation
apacheGH-343 (Jan 2026) changed setReaderAndWriterIndex() to unconditionally set offsetBuffer.writerIndex((valueCount + 1) * OFFSET_WIDTH) so that even empty lists export the trailing-zero offset required by the Arrow IPC spec. However, unlike the analogous fix in BaseVariableWidthVector, this change did not include a realloc guard — when the offset buffer has never been allocated (capacity == 0), the writerIndex is set to 4 on a zero-capacity buffer, producing the inconsistent state: readerIndex: 0, writerIndex: 4, capacity(0) Netty's AbstractByteBuf.writerIndex() bounds-checks against capacity and throws IndexOutOfBoundsException. This crashes Dremio's FragmentWritableBatch serialization path (SingleSenderOperator, PartitionSenderOperator, etc.) when an empty ListVector reaches the VectorUnloader -> NettyArrowBuf.unwrapBuffer() chain. The fix adds the same realloc guard that BaseVariableWidthVector already uses: grow the offset buffer to the required size before setting writerIndex. This mirrors what exportCDataBuffers() already does for the capacity == 0 case. Affects both ListVector (OFFSET_WIDTH = 4) and LargeListVector (OFFSET_WIDTH = 8). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Change-Id: Ib5ca53064ea5bc69c43fd6cc09692b07199c586c
Tests exercise the exact bug path: setValueCount(0) on a vector where allocateNew() was never called, leaving the offset buffer at capacity 0. - testEmptyListOffsetBufferWithoutAllocate: verifies getFieldBuffers() produces a valid offset buffer after the realloc guard fires - testEmptyListGetBuffersWithoutAllocate: exercises getBuffers(false), the IPC serialization entry point that produced the original Netty IOOBE via VectorUnloader -> NettyArrowBuf.unwrapBuffer() - Analogous tests for LargeListVector The existing testEmptyListOffsetBuffer / testEmptyLargeListOffsetBuffer tests call allocateNew() before setValueCount(0), so the offset buffer always has nonzero capacity and the realloc guard is never entered. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Change-Id: I1972ad3d80f405ed9b41103e88902571b63defb0
The previous guard (capacity < required) could replace an already-allocated offset buffer with a smaller one when setReaderAndWriterIndex() was called before the vector was populated (e.g., during validateFull() on an empty vector). The replacement buffer was too small for subsequent writes. Narrowing to capacity==0 targets only the never-allocated empty singleton from allocator.getEmpty(), which is the exact state that causes the IOOBE. Buffers with capacity>0 were properly allocated and should not be replaced. Also removes the dead copy branch (capacity>0 is always false when we enter the guard). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Change-Id: I28ee94d2a2b431f7699ce8c3ce133f184ede5458
The previous approach mutated this.offsetBuffer inside setReaderAndWriterIndex(), which is called from getFieldBuffers(), getBuffers(), and indirectly from validation. Replacing the offset buffer during validation broke subsequent writes — tests that called validateFull() on an empty vector before populating data got a 4-byte buffer that was too small for the actual data. The fix now mirrors exportCDataBuffers() exactly: getFieldBuffers() detects the inconsistent state (capacity==0 but writerIndex>0, produced by setReaderAndWriterIndex on a never-allocated buffer) and substitutes a properly-sized temporary buffer for serialization. The vector's own offsetBuffer is never mutated, so subsequent allocateNew/setValueCount calls work normally. setReaderAndWriterIndex() is reverted to upstream Arrow 19 behavior. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Change-Id: I696163fac4fcf5ada1af0edb5da2f4e8ba4e3e84
Tests now check the buffer RETURNED by getFieldBuffers() (which is a temp allocation when the vector's offset buffer has capacity 0) instead of checking the vector's own getOffsetBuffer() which is intentionally not mutated. Remove testEmptyListGetBuffersWithoutAllocate / LargeList variant — getBuffers(false) returns an empty array for valueCount==0 (via the getBufferSize()==0 guard), so the offset buffer is never exposed through that path and doesn't need testing. Release the temp buffer after assertions to avoid memory leak detection. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Change-Id: I73877ba98047b6bda6fcf7737e53778deaa7cb74
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Change-Id: I47527f1d5934a122c7c7381c037c4f951b672592
|
Thank you for opening a pull request! Please label the PR with one or more of:
Also, add the 'breaking-change' label if appropriate. See CONTRIBUTING.md for details. |
The previous approach returned a temp buffer that was not owned by the vector. VectorUnloader.retain() added a ref but the original ref was never released, leaking OFFSET_WIDTH bytes per empty list serialized. Now getFieldBuffers() assigns the new buffer to this.offsetBuffer so the vector owns it. Uses allocateOffsetBuffer(offsetAllocationSizeInBytes) to get a full-sized buffer (no side effect since the argument matches the field value), which is large enough for subsequent writes if the vector is populated after validation — fixing the TestValidateVector regression. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Change-Id: I4d5caed0bf2d323215556d6a1a90330b4ab399e8
…Bytes Addresses review feedback from bodduv: LargeListVector.clear() sets offsetAllocationSizeInBytes = offsetBuffer.capacity(), which is 0 after the buffer is released. Using offsetAllocationSizeInBytes would allocate a 0-byte buffer in that case. (valueCount + 1) * OFFSET_WIDTH is always the correct minimum size (at least OFFSET_WIDTH when valueCount == 0) and matches the writerIndex that setReaderAndWriterIndex() just set. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Change-Id: Iae6b0a2578deb0f7fd8ca7ae36a0a7e3fc595765
…ression (valueCount + 1) * OFFSET_WIDTH produces a 4-byte buffer for valueCount=0, which is too small when validateFull() triggers getFieldBuffers() before the vector is populated — subsequent setVector() writes at index 4+ fail. Use max(required, INITIAL_VALUE_ALLOCATION * OFFSET_WIDTH) to match the default allocateNew() size. Use allocator.buffer() directly instead of allocateOffsetBuffer() to avoid mutating offsetAllocationSizeInBytes. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Change-Id: Iccb00d4775b196f7978b34cbc98234f50ede370a
| List<ArrowBuf> result = new ArrayList<>(2); | ||
| setReaderAndWriterIndex(); | ||
| result.add(validityBuffer); | ||
| if (offsetBuffer.capacity() == 0 && offsetBuffer.writerIndex() > 0) { |
There was a problem hiding this comment.
The only further tightening I could think of is to check offsetBuffer.capacity() < OFFSET_WIDTH, but I think the main issue was indeed with offsetBuffer.capacity() == 0.
There was a problem hiding this comment.
If we are dealing with emptyVector issue, wouldn't be better to check if valueCount == 0 here. This seems to be fixing the unrelated issue where offset.capacity == 0 but with writerIndex was set incorrectly somewhere else.
There was a problem hiding this comment.
Fixed in 2cd5472. The guard is now scoped to valueCount == 0, so this only handles the empty-vector serialization case. For non-empty vectors with a zero-capacity offset buffer we no longer synthesize zero offsets, since that would be an ambiguous/corrupt state rather than this bug.
There was a problem hiding this comment.
I think this approach may not fix the issue if writeIndex was set incorrectly.
🤔 were there profiles of failed queries other than capacity = 0, writerIndex = 4?
Edit:
If writerIndex is set incorrectly and capacity is not zero (but say lesser than writerIndex), then I am not sure what else we could do here.
There was a problem hiding this comment.
Exactly. Then it is a different issue in Apache Arrow. That I believe is beyond the scope of this fix
There was a problem hiding this comment.
From the customer profiles, I see this signature..
IndexOutOfBoundsException: readerIndex: 0, writerIndex: 4
Have to check in observe if all the profiles have it.. I checked in 10 profiles which Dmitry had given us
There was a problem hiding this comment.
Looks like this route is not the right way to fix.. there are more and more edge cases that we might hit.
| List<ArrowBuf> result = new ArrayList<>(2); | ||
| setReaderAndWriterIndex(); | ||
| result.add(validityBuffer); | ||
| if (offsetBuffer.capacity() == 0 && offsetBuffer.writerIndex() > 0) { |
There was a problem hiding this comment.
If we are dealing with emptyVector issue, wouldn't be better to check if valueCount == 0 here. This seems to be fixing the unrelated issue where offset.capacity == 0 but with writerIndex was set incorrectly somewhere else.
| List<ArrowBuf> result = new ArrayList<>(2); | ||
| setReaderAndWriterIndex(); | ||
| result.add(validityBuffer); | ||
| if (offsetBuffer.capacity() == 0 && offsetBuffer.writerIndex() > 0) { |
There was a problem hiding this comment.
Same comment as above. If it is not emptyVector issue, we may not know what should be the value in the offsetBuffer. Is it ok to be 0 in all indexes.
There was a problem hiding this comment.
Fixed in 2cd5472. Same tightening applied here: valueCount == 0 && offsetBuffer.capacity() == 0 && offsetBuffer.writerIndex() > 0. That keeps the fix limited to the empty-vector trailing-offset case, where zero is the correct offset value.
Summary
ListVector.setReaderAndWriterIndex()andLargeListVector.setReaderAndWriterIndex()to grow the offset buffer before settingwriterIndex, mirroring whatBaseVariableWidthVectoralready doesIndexOutOfBoundsException: readerIndex: 0, writerIndex: 4 (expected: 0 <= readerIndex <= writerIndex <= capacity(0))when serializing emptyListVectors through NettyRoot cause
apacheGH-343 (Jan 2026) changed
setReaderAndWriterIndex()to unconditionally setoffsetBuffer.writerIndex((valueCount + 1) * OFFSET_WIDTH)so that even empty lists export the trailing-zero offset required by the Arrow IPC spec. However, unlike the analogous fix inBaseVariableWidthVector(lines 402-411), this change did not include a realloc guard.When the offset buffer has never been allocated (
capacity == 0) — which is the case for a freshly-constructedListVectorviaListVector.empty()orallocator.getEmpty()— the writerIndex is set to 4 (or 8 forLargeListVector) on a zero-capacity buffer. This is an inconsistent state that crashesNettyArrowBuf.unwrapBuffer()because Netty'sAbstractByteBuf.writerIndex()bounds-checks against capacity.ListVector.exportCDataBuffers()(line 252) already handles thecapacity == 0case correctly —setReaderAndWriterIndex()was the only path missing the guard.Production impact
This caused
SYSTEM ERROR: IndexOutOfBoundsExceptionin Dremio Cloud DC-231 (Arrow Java 19.0.0) duringREFRESH REFLECTIONjobs on tables withFLATTEN(CONVERT_FROM(..., 'json'))producing empty inner lists. DC-229 (Arrow Java 18.3.0) was not affected because the pre-apacheGH-343 code setoffsetBuffer.writerIndex(0)forvalueCount == 0.Test plan
TestFragmentWritableBatch.emptyListVectorOffsetBufferIsInconsistentAfterUnloadconfirms the Arrow regressionTestFragmentWritableBatch.fragmentWritableBatchCreateSucceedsForEmptyListVectorconfirms the fix🤖 Generated with Claude Code