[Java] BaseVariableWidthViewVector.handleSafe under-allocates the view buffer for empty (zero-length) values → IndexOutOfBoundsException #1190

@liming-ye

Describe the bug

BaseVariableWidthViewVector.handleSafe(int index, int dataLength) sizes the view buffer using the value's content length instead of the fixed per-element view size. As a result, when an empty (zero-length) value is written at a full view-buffer boundary, the view buffer is not reallocated and the subsequent setBytes accesses the view slot just past the end of the buffer.

// BaseVariableWidthViewVector#handleSafe
protected final void handleSafe(int index, int dataLength) {
    final long targetCapacity = roundUpToMultipleOf16((long) index * ELEMENT_SIZE + dataLength);
    if (viewBuffer.capacity() < targetCapacity) {
      reallocViewBuffer(targetCapacity);
    }
    ...
}

Writing view slot index requires (index + 1) * ELEMENT_SIZE bytes in the view buffer. But the target is index * ELEMENT_SIZE + dataLength. For an empty value (dataLength == 0) this rounds to index * ELEMENT_SIZE, omitting the slot's own 16 bytes. When the view buffer is exactly full (capacity() == index * ELEMENT_SIZE, e.g. 65536 at index == 4096), the capacity() < targetCapacity check is false, no reallocation happens, and setBytes then executes viewBuffer.getLong(index * ELEMENT_SIZE) past the end.
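The arithmetic at the boundary can be checked in isolation. The following is a minimal standalone sketch, not Arrow code: the class name ViewCapacityCheck and the helper requiredBytes are hypothetical, ELEMENT_SIZE is taken as the 16 bytes described above, and the body of roundUpToMultipleOf16 is an assumption about how that helper rounds.

```java
// Standalone model of the capacity check quoted above; it does not use Arrow.
final class ViewCapacityCheck {
    // Fixed size of one view slot, as described in this issue.
    static final int ELEMENT_SIZE = 16;

    // Assumed behavior of roundUpToMultipleOf16: round up to the next multiple of 16.
    static long roundUpToMultipleOf16(long n) {
        return (n + 15L) & ~15L;
    }

    // Target capacity as computed by the handleSafe snippet.
    static long currentTarget(int index, int dataLength) {
        return roundUpToMultipleOf16((long) index * ELEMENT_SIZE + dataLength);
    }

    // Hypothetical helper: bytes needed so that view slot `index` fits.
    static long requiredBytes(int index) {
        return ((long) index + 1) * ELEMENT_SIZE;
    }

    public static void main(String[] args) {
        int index = 4096;
        long capacity = 65536L; // a full 4096-slot view buffer
        long target = currentTarget(index, 0); // empty value
        System.out.println("target   = " + target);
        System.out.println("required = " + requiredBytes(index));
        System.out.println("realloc? = " + (capacity < target));
    }
}
```

Under these assumptions the target equals the current capacity (65536) while slot 4096 needs 65552 bytes, so the reallocation check does not fire.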

Short non-empty values (1..INLINE_SIZE) round up to index*ELEMENT_SIZE + ELEMENT_SIZE and reallocate correctly, so the defect is specific to zero-length values (e.g. empty strings, and ViewVarCharWriter-style null-slot writes that use EMPTY_BYTES).

Reproduction

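import org.apache.arrow.memory.BufferAllocator;
import org.apache.arrow.memory.RootAllocator;
import org.apache.arrow.vector.ViewVarCharVector;

try (BufferAllocator allocator = new RootAllocator();
     ViewVarCharVector v = new ViewVarCharVector("s", allocator)) {
    v.allocateNew(); // initial 4096-slot / 65536-byte view buffer
    // i == 4096 is the first slot past the initial view buffer
    for (int i = 0; i <= 4096; i++) {
        v.setSafe(i, new byte[0]); // empty value
    }
}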

Throws on i == 4096:

java.lang.IndexOutOfBoundsException: index: 65536, length: 8 (expected: range(0, 65536))
    at org.apache.arrow.memory.ArrowBuf.checkIndexD(ArrowBuf.java:299)
    at org.apache.arrow.memory.ArrowBuf.getLong(ArrowBuf.java:312)
    at org.apache.arrow.vector.BaseVariableWidthViewVector.setBytes(BaseVariableWidthViewVector.java:1379)
    at org.apache.arrow.vector.BaseVariableWidthViewVector.setSafe(BaseVariableWidthViewVector.java:1183)

Component(s)

Java

Affected versions

handleSafe is byte-identical (and affected) in 18.3.0, 19.0.0, and main, so this is not a regression from a release in which it was previously fixed.

Expected behavior

The view buffer should always reserve a full ELEMENT_SIZE per slot regardless of content length, i.e. ensure capacity for at least (index + 1) * ELEMENT_SIZE. The out-of-line data buffer is sized separately in setBytes (allocateOrGetLastDataBuffer), so dataLength is not needed for the view buffer capacity check.

For reference, the C++ and Rust implementations always reserve one fixed-size view per element regardless of content length:

  • C++ BinaryViewBuilder: Reserve(1) then data_builder_.UnsafeAppend(BinaryViewType::c_type{}) (and AppendNull/AppendEmptyValue likewise append a full zeroed 16-byte view).
  • Rust GenericByteViewBuilder: views_buffer.push(view) onto a growable Vec<u128> (one 16-byte view per value).

A minimal fix would size the view-buffer target to roundUpToMultipleOf16((long) (index + 1) * ELEMENT_SIZE) (independent of dataLength).
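To compare the two formulas side by side, here is a minimal standalone sketch, again not Arrow code. The class name ViewCapacityFix and the method names are hypothetical, the rounding helper is an assumed equivalent of roundUpToMultipleOf16, and 12 is assumed as the INLINE_SIZE limit for short values.

```java
// Standalone comparison of the current and proposed view-buffer targets.
final class ViewCapacityFix {
    // Fixed size of one view slot, as described in this issue.
    static final int ELEMENT_SIZE = 16;

    // Assumed behavior of roundUpToMultipleOf16: round up to the next multiple of 16.
    static long roundUpToMultipleOf16(long n) {
        return (n + 15L) & ~15L;
    }

    // Current formula: depends on the value's content length.
    static long currentTarget(int index, int dataLength) {
        return roundUpToMultipleOf16((long) index * ELEMENT_SIZE + dataLength);
    }

    // Proposed formula: one full slot per element, independent of dataLength.
    static long proposedTarget(int index) {
        return roundUpToMultipleOf16((long) (index + 1) * ELEMENT_SIZE);
    }

    public static void main(String[] args) {
        int index = 4096;
        int[] lengths = {0, 1, 12}; // empty, shortest, and assumed longest inline value
        for (int dataLength : lengths) {
            System.out.println("dataLength=" + dataLength
                + " current=" + currentTarget(index, dataLength)
                + " proposed=" + proposedTarget(index));
        }
    }
}
```

In this model the two formulas agree for the short non-empty lengths and differ only for dataLength == 0, which matches the observation that the defect is specific to zero-length values.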
