GH-44345: [C++][Parquet] Add Decimal32/64 support to Parquet #47427

Merged
pitrou merged 16 commits into apache:main from HuaHuaY:fix_gh_44345 on Sep 3, 2025

HuaHuaY commented Aug 26, 2025

Contributor

Rationale for this change

As described in #44345, Decimal32/Decimal64 have been implemented in Arrow, but Parquet support for them is poor. This change allows writing Decimal32/Decimal64 to a Parquet file in the same way as Decimal128/Decimal256, and reading Decimal32/Decimal64 from an existing Parquet file.

What changes are included in this PR?

  1. Support writing Decimal32/Decimal64 as INT32/INT64/BYTE_ARRAY/FIXED_LEN_BYTE_ARRAY into a Parquet file.
  2. Support reading Parquet columns with logical type Decimal. The Arrow decimal type can either be read from the stored metadata or inferred.
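
To illustrate how a decimal's precision relates to the Parquet physical types listed above, here is a minimal, self-contained sketch. It relies only on the Parquet format rule that a DECIMAL may annotate INT32 for precision up to 9 and INT64 for precision up to 18. The names `PhysicalType`, `MinBytesForPrecision`, `ChoosePhysicalType`, and the `store_as_integer` parameter are hypothetical and are not taken from this PR; the actual selection logic in column_writer.cc may differ.

```cpp
#include <cmath>

// Hypothetical names for illustration only; not the identifiers used in the PR.
enum class PhysicalType { kInt32, kInt64, kFixedLenByteArray };

// Smallest number of bytes whose signed two's-complement range can hold every
// unscaled value with `precision` decimal digits.
inline int MinBytesForPrecision(int precision) {
  // Magnitude bits: smallest k with 2^k >= 10^precision, plus one sign bit.
  const int bits = static_cast<int>(std::ceil(precision * std::log2(10.0))) + 1;
  return (bits + 7) / 8;  // round up to whole bytes
}

// Assumed policy: use an integer physical type only when the caller asks for
// it and the precision fits; otherwise fall back to FIXED_LEN_BYTE_ARRAY.
inline PhysicalType ChoosePhysicalType(int precision, bool store_as_integer) {
  if (store_as_integer && precision <= 9) return PhysicalType::kInt32;
  if (store_as_integer && precision <= 18) return PhysicalType::kInt64;
  return PhysicalType::kFixedLenByteArray;
}
```

Under these assumptions, a Decimal32 column with precision 9 fits in 4 bytes and a Decimal64 column with precision 18 fits in 8 bytes, which is why the integer physical types are natural targets for the two new Arrow types.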

Are these changes tested?

Yes.

Are there any user-facing changes?

Yes. A flag named smallest_decimal_enabled_ is added to ArrowReaderProperties. To maintain backward compatibility, Arrow infers Decimal32/Decimal64 instead of Decimal128 for decimals with small precision only when the flag is true.
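
The inference rule described above can be sketched as follows. This is an illustration, not the code in schema_internal.cc: `DecimalKind` and `InferDecimalKind` are hypothetical names, and the thresholds assume the maximum precisions of the Arrow decimal types (9 digits for Decimal32, 18 for Decimal64, 38 for Decimal128).

```cpp
// Hypothetical names for illustration only; not the identifiers used in the PR.
enum class DecimalKind { kDecimal32, kDecimal64, kDecimal128, kDecimal256 };

// Pick an Arrow decimal type for a Parquet DECIMAL column of the given
// precision when no Arrow type is stored in the file metadata.
inline DecimalKind InferDecimalKind(int precision, bool smallest_decimal_enabled) {
  if (smallest_decimal_enabled) {
    // Opt-in behavior: choose the narrowest type that can hold the precision.
    if (precision <= 9) return DecimalKind::kDecimal32;
    if (precision <= 18) return DecimalKind::kDecimal64;
  }
  // Backward-compatible behavior: never infer anything narrower than Decimal128.
  return precision <= 38 ? DecimalKind::kDecimal128 : DecimalKind::kDecimal256;
}
```

With the flag left at false, a column with precision 9 still maps to Decimal128 in this sketch, so existing readers would see no change in the inferred schema.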

Copilot AI review requested due to automatic review settings August 26, 2025 11:18
@HuaHuaY HuaHuaY requested a review from wgtmac as a code owner August 26, 2025 11:18

Copilot AI left a comment

Contributor

Pull Request Overview

This PR adds support for reading and writing Arrow Decimal32 and Decimal64 types in Parquet files. The implementation extends the existing Decimal128/256 support to include smaller decimal types, allowing for more efficient storage of decimal values with lower precision.

  • Extends Parquet I/O to support Decimal32/64 alongside existing Decimal128/256 types
  • Adds reader property for enabling smallest decimal type inference from Parquet
  • Consolidates decimal serialization logic to support all decimal types uniformly

Reviewed Changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| cpp/src/parquet/properties.h | Adds smallest_decimal_enabled_ flag to ArrowReaderProperties for backward compatibility |
| cpp/src/parquet/column_writer.cc | Extends decimal serialization to support Decimal32/64 with unified template logic |
| cpp/src/parquet/arrow/test_util.h | Refactors decimal test utilities to be generic across all decimal types |
| cpp/src/parquet/arrow/schema_internal.h | Updates function signatures to accept ArrowReaderProperties parameter |
| cpp/src/parquet/arrow/schema_internal.cc | Implements smallest decimal type selection logic using new reader property |
| cpp/src/parquet/arrow/schema.cc | Adds Decimal32/64 cases to schema conversion and metadata restoration |
| cpp/src/parquet/arrow/reader_internal.cc | Extends decimal reading logic to support all decimal types through generic templates |
| cpp/src/parquet/arrow/arrow_reader_writer_test.cc | Adds comprehensive test coverage for Decimal32/64 roundtrip scenarios |

Comment thread cpp/src/parquet/properties.h Outdated
Comment thread cpp/src/parquet/column_writer.cc Outdated
Comment thread cpp/src/parquet/column_writer.cc Outdated
Comment thread cpp/src/parquet/arrow/test_util.h Outdated
@HuaHuaY HuaHuaY changed the title GH-44345: [C++] arrow Decimal32/64 read/write parquet GH-44345: [C++][Parquet] arrow Decimal32/64 read/write parquet Aug 26, 2025
Comment thread cpp/src/parquet/properties.h Outdated
Comment thread cpp/src/parquet/properties.h Outdated
Comment thread cpp/src/parquet/arrow/schema_internal.h Outdated
Comment thread cpp/src/parquet/arrow/schema_internal.h Outdated
Comment thread cpp/src/parquet/column_writer.cc Outdated
Comment thread cpp/src/parquet/arrow/arrow_reader_writer_test.cc Outdated
@github-actions github-actions Bot added awaiting committer review Awaiting committer review and removed awaiting review Awaiting review labels Aug 26, 2025

HuaHuaY commented Aug 27, 2025

Contributor Author

@pitrou @mapleFU Could you spare some time to review this PR?

@wgtmac wgtmac changed the title GH-44345: [C++][Parquet] arrow Decimal32/64 read/write parquet GH-44345: [C++][Parquet] Add Decimal32/64 support to Parquet Aug 28, 2025

@wgtmac wgtmac left a comment

Member

+1

Left some nits. Thanks @HuaHuaY!

Comment thread cpp/src/parquet/arrow/arrow_reader_writer_test.cc Outdated
Comment thread cpp/src/parquet/properties.h Outdated
Comment thread cpp/src/parquet/properties.h Outdated

@pitrou pitrou left a comment

Member

Thanks for doing this @HuaHuaY . Can we also update https://arrow.apache.org/docs/dev/cpp/parquet.html#logical-types ?

Comment thread cpp/src/parquet/properties.h Outdated
Comment thread cpp/src/parquet/properties.h Outdated
Comment thread cpp/src/parquet/properties.h Outdated
Comment thread cpp/src/parquet/arrow/schema.cc Outdated
Comment thread cpp/src/parquet/arrow/reader_internal.cc Outdated
Comment thread cpp/src/parquet/arrow/arrow_reader_writer_test.cc Outdated
Comment thread cpp/src/parquet/arrow/arrow_reader_writer_test.cc Outdated
Comment thread cpp/src/parquet/arrow/arrow_reader_writer_test.cc Outdated
Comment thread cpp/src/parquet/arrow/arrow_reader_writer_test.cc Outdated

HuaHuaY commented Aug 28, 2025

Contributor Author

> Can we also update https://arrow.apache.org/docs/dev/cpp/parquet.html#logical-types ?

Is this web page generated by docs/source/cpp/parquet.rst? I have pushed a commit to add Decimal32/Decimal64.

Comment thread docs/source/cpp/parquet.rst Outdated

wgtmac commented Sep 3, 2025

Member

All CI failures are unrelated to this PR and share the same cause, shown below:

CMake Error at /opt/conda/envs/arrow/share/cmake-4.1/Modules/FindPackageHandleStandardArgs.cmake:227 (message):
  Could NOT find LLVMAlt (missing: LLVM_PACKAGE_VERSION CLANG_EXECUTABLE
  LLVM_FOUND LLVM_LINK_EXECUTABLE)

I think it is ready to merge. Do you have more comments? @pitrou

@pitrou pitrou left a comment

Member

LGTM, but can you please rebase or merge from main to avoid the current CI failures @HuaHuaY ?

HuaHuaY commented Sep 3, 2025

Contributor Author

I rebased the branch.

@pitrou pitrou merged commit a444380 into apache:main Sep 3, 2025
34 checks passed
@pitrou pitrou removed the awaiting committer review Awaiting committer review label Sep 3, 2025
@HuaHuaY HuaHuaY deleted the fix_gh_44345 branch September 3, 2025 08:44
@conbench-apache-arrow

After merging your PR, Conbench analyzed the 4 benchmarking runs that have been run so far on merge-commit a444380.

There weren't enough matching historic benchmark results to make a call on whether there were regressions.

The full Conbench report has more details.

Mottl pushed a commit to Mottl/arrow that referenced this pull request May 26, 2026
…pache#47427)

### Rationale for this change
As described in apache#44345, `Decimal32`/`Decimal64` have been implemented in Arrow, but Parquet support for them is poor. This change allows writing `Decimal32`/`Decimal64` to a Parquet file in the same way as `Decimal128`/`Decimal256`, and reading `Decimal32`/`Decimal64` from an existing Parquet file.

### What changes are included in this PR?
1. Support writing `Decimal32`/`Decimal64` as `INT32`/`INT64`/`BYTE_ARRAY`/`FIXED_LEN_BYTE_ARRAY` into a Parquet file.
2. Support reading Parquet columns with logical type Decimal. The Arrow decimal type can either be read from the stored metadata or inferred.

### Are these changes tested?
Yes.

### Are there any user-facing changes?
Yes. A flag named `smallest_decimal_enabled_` is added to `ArrowReaderProperties`. To maintain backward compatibility, Arrow infers `Decimal32`/`Decimal64` instead of `Decimal128` for decimals with small precision only when the flag is `true`.

* GitHub Issue: apache#44345

Authored-by: Zehua Zou <41586196+HuaHuaY@users.noreply.github.com>
Signed-off-by: Antoine Pitrou <antoine@python.org>
4 participants