Describe the bug
BaseVariableWidthViewVector.handleSafe(int index, int dataLength) sizes the view buffer using the value's content length instead of the fixed per-element view size. As a result, when an empty (zero-length) value is written at a full view-buffer boundary, the view buffer is not reallocated and the subsequent setBytes reads/writes one element past the end.
    // BaseVariableWidthViewVector#handleSafe
    protected final void handleSafe(int index, int dataLength) {
      final long targetCapacity = roundUpToMultipleOf16((long) index * ELEMENT_SIZE + dataLength);
      if (viewBuffer.capacity() < targetCapacity) {
        reallocViewBuffer(targetCapacity);
      }
      ...
    }
Writing view slot index requires (index + 1) * ELEMENT_SIZE bytes in the view buffer, but handleSafe targets index * ELEMENT_SIZE + dataLength. For an empty value (dataLength == 0) the target is exactly index * ELEMENT_SIZE, which is already a multiple of 16 and therefore omits the slot's own 16 bytes. When the view buffer is exactly full (capacity() == index * ELEMENT_SIZE, e.g. 65536 at index == 4096), the capacity() < targetCapacity check is false, no reallocation happens, and setBytes then executes viewBuffer.getLong(index * ELEMENT_SIZE) past the end.
Short non-empty values (1..INLINE_SIZE) round up to index*ELEMENT_SIZE + ELEMENT_SIZE and reallocate correctly, so the defect is specific to zero-length values (e.g. empty strings, and ViewVarCharWriter-style null-slot writes that use EMPTY_BYTES).
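The arithmetic above can be checked in isolation. The following is a minimal sketch, not the Arrow source: the class name ViewTargetDemo and the helpers currentTarget and requiredBytes are hypothetical, ELEMENT_SIZE and INLINE_SIZE are assumed to be 16 and 12, and roundUpToMultipleOf16 is a stand-in that may differ from the library's implementation.

```java
public class ViewTargetDemo {
    // Assumed constants: one view is 16 bytes, up to 12 bytes are stored inline.
    static final int ELEMENT_SIZE = 16;
    static final int INLINE_SIZE = 12;

    // Stand-in for the library helper: round up to the next multiple of 16.
    static long roundUpToMultipleOf16(long num) {
        return (num + 15L) & ~15L;
    }

    // Target capacity as computed by the handleSafe shown in this report.
    static long currentTarget(int index, int dataLength) {
        return roundUpToMultipleOf16((long) index * ELEMENT_SIZE + dataLength);
    }

    // Bytes the view buffer actually needs in order to hold slot `index`.
    static long requiredBytes(int index) {
        return ((long) index + 1) * ELEMENT_SIZE;
    }

    public static void main(String[] args) {
        int index = 4096;       // first slot past a full 65536-byte view buffer
        long capacity = 65536L; // current view-buffer capacity
        for (int dataLength : new int[] {0, 1, INLINE_SIZE}) {
            long target = currentTarget(index, dataLength);
            System.out.println("dataLength=" + dataLength
                + " target=" + target
                + " required=" + requiredBytes(index)
                + " realloc=" + (capacity < target));
        }
    }
}
```

Under these assumptions only the dataLength == 0 row has a target (65536) below the required 65552 bytes, so it is the only case in which the reallocation is skipped.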
Reproduction
    try (BufferAllocator allocator = new RootAllocator();
        ViewVarCharVector v = new ViewVarCharVector("s", allocator)) {
      v.allocateNew(); // initial 4096-slot / 65536-byte view buffer
      for (int i = 0; i <= 4096; i++) {
        v.setSafe(i, new byte[0]); // empty value
      }
    }
Throws on i == 4096:
    java.lang.IndexOutOfBoundsException: index: 65536, length: 8 (expected: range(0, 65536))
        at org.apache.arrow.memory.ArrowBuf.checkIndexD(ArrowBuf.java:299)
        at org.apache.arrow.memory.ArrowBuf.getLong(ArrowBuf.java:312)
        at org.apache.arrow.vector.BaseVariableWidthViewVector.setBytes(BaseVariableWidthViewVector.java:1379)
        at org.apache.arrow.vector.BaseVariableWidthViewVector.setSafe(BaseVariableWidthViewVector.java:1183)
Component(s)
Java
Affected versions
handleSafe is byte-identical (and affected) in 18.3.0, 19.0.0, and main, so this is not a regression from a previously fixed release.
Expected behavior
The view buffer should always reserve a full ELEMENT_SIZE per slot regardless of content length, i.e. ensure capacity for at least (index + 1) * ELEMENT_SIZE. The out-of-line data buffer is sized separately in setBytes (allocateOrGetLastDataBuffer), so dataLength is not needed for the view buffer capacity check.
For reference, the C++ and Rust implementations always reserve one fixed-size view per element regardless of content length:
- C++ BinaryViewBuilder: Reserve(1) then data_builder_.UnsafeAppend(BinaryViewType::c_type{}) (and AppendNull/AppendEmptyValue likewise append a full zeroed 16-byte view).
- Rust GenericByteViewBuilder: views_buffer.push(view) onto a growable Vec<u128> (one 16-byte view per value).
A minimal fix would size the view-buffer target to roundUpToMultipleOf16((long) (index + 1) * ELEMENT_SIZE) (independent of dataLength).
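As a rough illustration of that proposal, the sketch below models only the capacity check and does not touch Arrow itself. The class ProposedViewTarget and the helpers proposedTarget and needsRealloc are hypothetical names, ELEMENT_SIZE is assumed to be 16, and roundUpToMultipleOf16 is a stand-in for the library helper. It widens index to long before adding 1, a small deviation from the expression above that avoids int overflow at Integer.MAX_VALUE.

```java
public class ProposedViewTarget {
    // Assumed constant: one view occupies 16 bytes.
    static final int ELEMENT_SIZE = 16;

    // Stand-in for the library helper: round up to the next multiple of 16.
    static long roundUpToMultipleOf16(long num) {
        return (num + 15L) & ~15L;
    }

    // Proposed target: room for slots 0..index, independent of dataLength.
    static long proposedTarget(int index) {
        return roundUpToMultipleOf16(((long) index + 1) * ELEMENT_SIZE);
    }

    // Mirrors the capacity check in handleSafe, using the proposed target.
    static boolean needsRealloc(long capacity, int index) {
        return capacity < proposedTarget(index);
    }

    public static void main(String[] args) {
        long capacity = 65536L; // a full 4096-slot view buffer
        // The last slot that fits does not trigger a reallocation.
        System.out.println("index=4095 realloc=" + needsRealloc(capacity, 4095));
        // The first slot past the end does, even for an empty value.
        System.out.println("index=4096 realloc=" + needsRealloc(capacity, 4096));
    }
}
```

With this target the check at index == 4096 asks for 65552 bytes, so the view buffer would be grown before setBytes touches the slot, whatever the value's length.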