Refactor Spark `format_string` integer conversion dispatch by kosiew · Pull Request #22388 · apache/datafusion

kosiew · 2026-05-20T08:14:51Z

Which issue does this PR close?

Closes #22163 (Refactor: Centralize numeric %c formatting dispatch in format_string.rs).

Rationale for this change

ConversionSpecifier::format contained substantial duplication across integer ScalarValue variants for %d, %x, %o, %s, and %c handling. Each integer width repeated nearly identical conversion logic, making the code harder to maintain and increasing the risk of inconsistent behavior across integer types.

This change consolidates integer formatting behavior into shared internal helpers while preserving existing Spark-compatible semantics.

What changes are included in this PR?

- Introduced a local IntegerValue enum to normalize signed and unsigned integer handling while preserving width-specific unsigned bit behavior for %x and %o.
- Replaced repeated per-variant integer dispatch branches in ConversionSpecifier::format with a shared format_integer helper.
- Added shared helper methods for:
  - decimal formatting
  - unsigned bit formatting
  - %c conversion
  - decimal string conversion
- Added small macro_rules! helpers that generate the From<T> for IntegerValue implementations for the signed and unsigned integer families, reducing repetitive conversion boilerplate while preserving width-specific unsigned formatting semantics.
- Added an invalid_integer_conversion helper to centralize integer conversion error generation.
- Added table-driven regression coverage for integer formatting behavior across:
  - signed integer widths
  - unsigned integer widths
  - %d, %x, %o, %s, and %c
  - null handling behavior
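
The enum-based design listed above can be sketched as follows. This is a minimal, hypothetical illustration, not the PR's code: the names IntegerValue, format_integer and invalid_integer_conversion come from the description, but the variant layout, the use of a plain `char` for the conversion, the `String` error type, and the rule that %c accepts any valid Unicode scalar value are all assumptions. It shows how pairing each signed width with its same-width unsigned type keeps %x / %o output width-specific.

```rust
// Hypothetical sketch; names mirror the PR description, bodies are illustrative.

/// Integer normalized from any primitive width. Signed values keep their
/// decimal value *and* the two's-complement bits of their original width.
#[derive(Debug, Clone, Copy)]
enum IntegerValue {
    Signed { value: i64, unsigned_bits: u64 },
    Unsigned(u64),
}

// Each signed type is paired with its same-width unsigned type, so the
// reinterpreting cast keeps only the original bits (-1i8 -> 0xff).
macro_rules! impl_from_signed {
    ($($signed:ty => $unsigned:ty),*) => {$(
        impl From<$signed> for IntegerValue {
            fn from(v: $signed) -> Self {
                IntegerValue::Signed { value: v as i64, unsigned_bits: (v as $unsigned) as u64 }
            }
        }
    )*};
}

macro_rules! impl_from_unsigned {
    ($($unsigned:ty),*) => {$(
        impl From<$unsigned> for IntegerValue {
            fn from(v: $unsigned) -> Self {
                IntegerValue::Unsigned(v as u64)
            }
        }
    )*};
}

impl_from_signed!(i8 => u8, i16 => u16, i32 => u32, i64 => u64);
impl_from_unsigned!(u8, u16, u32, u64);

impl IntegerValue {
    /// Text used by %d and %s.
    fn decimal(self) -> String {
        match self {
            IntegerValue::Signed { value, .. } => value.to_string(),
            IntegerValue::Unsigned(v) => v.to_string(),
        }
    }

    /// Bits used by %x and %o.
    fn unsigned_bits(self) -> u64 {
        match self {
            IntegerValue::Signed { unsigned_bits, .. } => unsigned_bits,
            IntegerValue::Unsigned(v) => v,
        }
    }

    /// Assumption: %c accepts any value that is a valid Unicode scalar value.
    fn as_char(self) -> Option<char> {
        let code = match self {
            IntegerValue::Signed { value, .. } => u32::try_from(value).ok()?,
            IntegerValue::Unsigned(v) => u32::try_from(v).ok()?,
        };
        char::from_u32(code)
    }
}

/// Single place that builds the conversion error message.
fn invalid_integer_conversion(conversion: char, value: IntegerValue) -> String {
    format!("cannot format integer {} with %{conversion}", value.decimal())
}

fn format_integer(value: impl Into<IntegerValue>, conversion: char) -> Result<String, String> {
    let value = value.into();
    match conversion {
        'd' | 's' => Ok(value.decimal()),
        'x' => Ok(format!("{:x}", value.unsigned_bits())),
        'o' => Ok(format!("{:o}", value.unsigned_bits())),
        'c' => value
            .as_char()
            .map(String::from)
            .ok_or_else(|| invalid_integer_conversion(conversion, value)),
        _ => Err(invalid_integer_conversion(conversion, value)),
    }
}

fn main() {
    println!("{}", format_integer(-1i8, 'x').unwrap());
}
```

The `(v as $unsigned) as u64` cast is what keeps `-1i8` at eight bits ("ff") instead of sign-extending it to 64 bits, which follows java.util.Formatter's rule of adding 2^n (n being the argument type's bit width) to negative values for %x and %o.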

Are these changes tested?

Yes.

Added test_integer_formatting_across_widths covering:

- Signed integer formatting across Int8, Int16, Int32, and Int64
- Unsigned integer formatting across UInt8, UInt16, UInt32, and UInt64
- %d, %x, %o, %s, and %c formatting behavior
- Null integer formatting behavior
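
Coverage like this fits naturally into a table of (input, conversion, expected) rows. The following is a hypothetical, trimmed-down illustration of that style, not the PR's test: `Scalar` is a simplified stand-in for DataFusion's ScalarValue (whose integer variants wrap `Option<T>`, with `None` meaning NULL), `format_one` is an illustrative formatter, only a few widths are covered, %c is omitted, and a NULL argument is assumed to render as the literal `null`.

```rust
// Hypothetical table-driven check; `Scalar` and `format_one` are illustrative.

#[derive(Debug, Clone, Copy)]
enum Scalar {
    Int8(Option<i8>),
    Int64(Option<i64>),
    UInt8(Option<u8>),
    UInt64(Option<u64>),
}

/// Returns the formatted text, or None for an unsupported conversion.
fn format_one(value: Scalar, conversion: char) -> Option<String> {
    // (decimal text, bits of the original width); None when the value is NULL.
    let parts = match value {
        Scalar::Int8(v) => v.map(|v| (v.to_string(), (v as u8) as u64)),
        Scalar::Int64(v) => v.map(|v| (v.to_string(), v as u64)),
        Scalar::UInt8(v) => v.map(|v| (v.to_string(), v as u64)),
        Scalar::UInt64(v) => v.map(|v| (v.to_string(), v)),
    };
    let Some((decimal, bits)) = parts else {
        // Assumption: a NULL argument renders as the literal "null".
        return Some("null".to_string());
    };
    match conversion {
        'd' | 's' => Some(decimal),
        'x' => Some(format!("{bits:x}")),
        'o' => Some(format!("{bits:o}")),
        _ => None,
    }
}

// (input, conversion, expected output)
const CASES: &[(Scalar, char, &str)] = &[
    (Scalar::Int8(Some(-1)), 'd', "-1"),
    (Scalar::Int8(Some(-1)), 'x', "ff"),
    (Scalar::Int8(Some(-1)), 'o', "377"),
    (Scalar::Int64(Some(-1)), 'x', "ffffffffffffffff"),
    (Scalar::UInt8(Some(255)), 'x', "ff"),
    (Scalar::UInt8(Some(255)), 's', "255"),
    (Scalar::UInt64(Some(u64::MAX)), 'd', "18446744073709551615"),
    (Scalar::Int64(None), 'd', "null"),
    (Scalar::UInt8(None), 'x', "null"),
];

/// Describes every row whose output differs from the table.
fn failures() -> Vec<String> {
    CASES
        .iter()
        .filter_map(|&(value, conversion, expected)| {
            let actual = format_one(value, conversion);
            (actual.as_deref() != Some(expected))
                .then(|| format!("{value:?} %{conversion}: expected {expected:?}, got {actual:?}"))
        })
        .collect()
}

fn main() {
    let failed = failures();
    assert!(failed.is_empty(), "{failed:#?}");
    println!("all {} cases passed", CASES.len());
}
```

Keeping each expectation on one row makes it cheap to add another width or conversion, and a failing run reports every mismatching row at once rather than stopping at the first.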

Are there any user-facing changes?

No intended user-facing behavior changes. This PR is a structural refactor intended to preserve existing Spark-compatible integer formatting semantics.

LLM-generated code disclosure

This PR includes LLM-generated code and comments. All LLM-generated content has been manually reviewed and tested.

…ize match arms
- Added a local IntegerValue adapter to format_string functionality.
- Collapsed duplicated match arms for %d, %x, %o, %s, and %c to improve code efficiency.
- Ensured preservation of signed width behavior for negative hex and octal values.
- Introduced a regression table test covering integer widths and null values.

- Removed IntegerFormatValue.
- Added direct IntegerValue helpers.
- Introduced private From implementations via local macros.
- Shortened integer ScalarValue arms for clarity.
- Deduplicated invalid integer conversion error handling.
- Made test argument counts explicit for better readability.

Jefffrey · 2026-05-25T09:28:12Z

+/// signed values format as decimal for `%d` / `%s` / `%c`, but use their original
+/// bit width for `%x` / `%o` via `unsigned_bits`.
+#[derive(Debug, Clone, Copy)]
+enum IntegerValue {


I wonder if we can achieve this without an enum; perhaps a trait that's implemented directly on i8/u8/u16 etc.

I find this intermediary enum a bit confusing because of the indirection it introduces: all its methods are essentially delegations between the signed & unsigned versions, which leaves the question of why these are being unified under an enum at all 🤔

💡 Good idea!

I will replace this with a narrow local trait implemented directly for the primitive integer types (i8/i16/i32/i64 and u8/u16/u32/u64). The shared format_integer helper can then be generic over that trait, while each primitive supplies the few operations where signedness/width matter: decimal formatting, %x/%o unsigned bit representation, %c, and %s string rendering.

This should preserve the current behavior and the added width/null regression coverage, but avoid the intermediary enum and make the dispatch intent more direct.
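
A minimal sketch of the trait-based direction suggested in the review and described in this reply. It assumes a local trait named IntegerFormatValue (the name used in the follow-up commit) with hypothetical method names decimal, unsigned_bits and code_point; the error type, the `char` conversion code, and the %c rule (any valid Unicode scalar value) are likewise assumptions. Each primitive implements the trait directly, so the generic format_integer needs no intermediate enum.

```rust
// Hypothetical sketch of the trait-based alternative; method names are illustrative.

trait IntegerFormatValue: Copy {
    /// Text used by %d and %s.
    fn decimal(self) -> String;
    /// Two's-complement bits at the type's own width, used by %x and %o.
    fn unsigned_bits(self) -> u64;
    /// Character for %c (assumption: any valid Unicode scalar value).
    fn code_point(self) -> Option<char>;
}

// `$t => $u` pairs each primitive with the same-width unsigned type for %x / %o.
macro_rules! impl_integer_format_value {
    ($($t:ty => $u:ty),*) => {$(
        impl IntegerFormatValue for $t {
            fn decimal(self) -> String {
                self.to_string()
            }
            fn unsigned_bits(self) -> u64 {
                (self as $u) as u64
            }
            fn code_point(self) -> Option<char> {
                u32::try_from(self).ok().and_then(char::from_u32)
            }
        }
    )*};
}

impl_integer_format_value!(
    i8 => u8, i16 => u16, i32 => u32, i64 => u64,
    u8 => u8, u16 => u16, u32 => u32, u64 => u64
);

/// Generic over the trait: one dispatch path for every integer width.
fn format_integer<T: IntegerFormatValue>(value: T, conversion: char) -> Result<String, String> {
    let invalid = || format!("cannot format integer {} with %{conversion}", value.decimal());
    match conversion {
        'd' | 's' => Ok(value.decimal()),
        'x' => Ok(format!("{:x}", value.unsigned_bits())),
        'o' => Ok(format!("{:o}", value.unsigned_bits())),
        'c' => value.code_point().map(String::from).ok_or_else(invalid),
        _ => Err(invalid()),
    }
}

fn main() {
    println!("{}", format_integer(-1i32, 'o').unwrap());
}
```

Compared with an intermediary enum, the signed/unsigned distinction here lives entirely in which unsigned type the macro pairs with each primitive, so there are no methods that merely delegate between variants.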

- Removed the IntegerValue enum.
- Added a local IntegerFormatValue trait.
- Implemented the trait directly for i8, i16, i32, i64, u8, u16, u32, and u64 types.
- Maintained existing behavior and tests.

github-actions Bot added the spark label May 20, 2026

kosiew added 2 commits May 20, 2026 16:44

docs: add IntegerValue invariant doc comment in format_string.rs

0820964

kosiew marked this pull request as ready for review May 20, 2026 09:43

Jefffrey reviewed May 25, 2026


feat: remove IntegerValue enum and introduce IntegerFormatValue trait

94b37b0


kosiew marked this pull request as draft May 26, 2026 07:04

kosiew marked this pull request as ready for review May 26, 2026 08:22

kosiew requested a review from Jefffrey May 30, 2026 06:41

kosiew wants to merge 4 commits into apache:main from kosiew:refactor-duplication-02-22163
