Refactor Spark format_string integer conversion dispatch#22388
Conversation
…ize match arms - Added a local IntegerValue adapter to format_string functionality. - Collapsed duplicated match arms for %d, %x, %o, %s, %c to improve code efficiency. - Ensured preservation of signed width behavior for negative hex and octal values. - Introduced a regression table test covering integer widths and null values.
- Removed IntegerFormatValue. - Added direct IntegerValue helpers. - Introduced private From implementations via local macros. - Shortened integer ScalarValue arms for clarity. - Deduplicated invalid integer conversion error handling. - Made test argument counts explicit for better readability.
| /// signed values format as decimal for `%d` / `%s` / `%c`, but use their original | ||
| /// bit width for `%x` / `%o` via `unsigned_bits`. | ||
| #[derive(Debug, Clone, Copy)] | ||
| enum IntegerValue { |
There was a problem hiding this comment.
I wonder if we can achieve this without an enum; perhaps a trait thats implemented directly on i8/u8/u16 etc.
I find this intermediary enum a bit confusing with the indirection it introduces, as all its methods are essentially delegations between signed & unsigned versions which leaves the question of why these are trying to be unified under an enum 🤔
There was a problem hiding this comment.
💡 Good idea!
I will replace this with a narrow local trait implemented directly for the primitive integer types (i8/i16/i32/i64 and u8/u16/u32/u64). The shared format_integer helper can then be generic over that trait, while each primitive supplies the few operations where signedness/width matter: decimal formatting, %x/%o unsigned bit representation, %c, and %s string rendering.
This should preserve the current behavior and the added width/null regression coverage, but avoid the intermediary enum and make the dispatch intent more direct.
- Removed the IntegerValue enum. - Added a local IntegerFormatValue trait. - Implemented the trait directly for i8, i16, i32, i64, u8, u16, u32, and u64 types. - Maintained existing behavior and tests.
Which issue does this PR close?
%cformatting dispatch in format_string.rs #22163Rationale for this change
ConversionSpecifier::formatcontained substantial duplication across integerScalarValuevariants for%d,%x,%o,%s, and%chandling. Each integer width repeated nearly identical conversion logic, making the code harder to maintain and increasing the risk of inconsistent behavior across integer types.This change consolidates integer formatting behavior into shared internal helpers while preserving existing Spark-compatible semantics.
What changes are included in this PR?
Introduced a local
IntegerValueenum to normalize signed and unsigned integer handling while preserving width-specific unsigned bit behavior for%xand%o.Replaced repeated per-variant integer dispatch branches in
ConversionSpecifier::formatwith a sharedformat_integerhelper.Added shared helper methods for:
%cconversionAdded small
macro_rules!helpers to generateFrom<T> for IntegerValueimplementations for signed and unsigned integer families, reducing repetitive conversion boilerplate while preserving width-specific unsigned formatting semantics.Added
invalid_integer_conversionhelper to centralize integer conversion error generation.Added table-driven regression coverage for integer formatting behavior across:
%d,%x,%o,%s, and%cAre these changes tested?
Yes.
Added
test_integer_formatting_across_widthscovering:Int8,Int16,Int32, andInt64UInt8,UInt16,UInt32, andUInt64%d,%x,%o,%s, and%cformatting behaviorAre there any user-facing changes?
No intended user-facing behavior changes. This PR is a structural refactor intended to preserve existing Spark-compatible integer formatting semantics.
LLM-generated code disclosure
This PR includes LLM-generated code and comments. All LLM-generated content has been manually reviewed and tested.