Fix/list vector offset buffer ioobe#29
Open
lriggs wants to merge 6 commits into
Open
Conversation
apacheGH-343 (Jan 2026) changed setReaderAndWriterIndex() to unconditionally set offsetBuffer.writerIndex((valueCount + 1) * OFFSET_WIDTH) so that even empty lists export the trailing-zero offset required by the Arrow IPC spec. However, unlike the analogous fix in BaseVariableWidthVector, this change did not include a realloc guard — when the offset buffer has never been allocated (capacity == 0), the writerIndex is set to 4 on a zero-capacity buffer, producing the inconsistent state: readerIndex: 0, writerIndex: 4, capacity(0) Netty's AbstractByteBuf.writerIndex() bounds-checks against capacity and throws IndexOutOfBoundsException. This crashes Dremio's FragmentWritableBatch serialization path (SingleSenderOperator, PartitionSenderOperator, etc.) when an empty ListVector reaches the VectorUnloader -> NettyArrowBuf.unwrapBuffer() chain. The fix adds the same realloc guard that BaseVariableWidthVector already uses: grow the offset buffer to the required size before setting writerIndex. This mirrors what exportCDataBuffers() already does for the capacity == 0 case. Affects both ListVector (OFFSET_WIDTH = 4) and LargeListVector (OFFSET_WIDTH = 8). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Change-Id: Ib5ca53064ea5bc69c43fd6cc09692b07199c586c
Tests exercise the exact bug path: setValueCount(0) on a vector where allocateNew() was never called, leaving the offset buffer at capacity 0. - testEmptyListOffsetBufferWithoutAllocate: verifies getFieldBuffers() produces a valid offset buffer after the realloc guard fires - testEmptyListGetBuffersWithoutAllocate: exercises getBuffers(false), the IPC serialization entry point that produced the original Netty IOOBE via VectorUnloader -> NettyArrowBuf.unwrapBuffer() - Analogous tests for LargeListVector The existing testEmptyListOffsetBuffer / testEmptyLargeListOffsetBuffer tests call allocateNew() before setValueCount(0), so the offset buffer always has nonzero capacity and the realloc guard is never entered. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Change-Id: I1972ad3d80f405ed9b41103e88902571b63defb0
The previous guard (capacity < required) could replace an already-allocated offset buffer with a smaller one when setReaderAndWriterIndex() was called before the vector was populated (e.g., during validateFull() on an empty vector). The replacement buffer was too small for subsequent writes. Narrowing to capacity==0 targets only the never-allocated empty singleton from allocator.getEmpty(), which is the exact state that causes the IOOBE. Buffers with capacity>0 were properly allocated and should not be replaced. Also removes the dead copy branch (capacity>0 is always false when we enter the guard). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Change-Id: I28ee94d2a2b431f7699ce8c3ce133f184ede5458
The previous approach mutated this.offsetBuffer inside setReaderAndWriterIndex(), which is called from getFieldBuffers(), getBuffers(), and indirectly from validation. Replacing the offset buffer during validation broke subsequent writes — tests that called validateFull() on an empty vector before populating data got a 4-byte buffer that was too small for the actual data. The fix now mirrors exportCDataBuffers() exactly: getFieldBuffers() detects the inconsistent state (capacity==0 but writerIndex>0, produced by setReaderAndWriterIndex on a never-allocated buffer) and substitutes a properly-sized temporary buffer for serialization. The vector's own offsetBuffer is never mutated, so subsequent allocateNew/setValueCount calls work normally. setReaderAndWriterIndex() is reverted to upstream Arrow 19 behavior. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Change-Id: I696163fac4fcf5ada1af0edb5da2f4e8ba4e3e84
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What's Changed
Please fill in a description of the changes here.
This contains breaking changes.
Closes #NNN.