Skip to content

fix factorial(21) should not overflow when PostgreSQL returns a numeric answer#22306

Open
xiedeyantu wants to merge 2 commits into
apache:mainfrom
xiedeyantu:factorial
Open

fix factorial(21) should not overflow when PostgreSQL returns a numeric answer#22306
xiedeyantu wants to merge 2 commits into
apache:mainfrom
xiedeyantu:factorial

Conversation

@xiedeyantu
Copy link
Copy Markdown
Member

Which issue does this PR close?

Rationale for this change

factorial was previously implemented using an Int64 lookup table, which caused factorial(21) to fail with an overflow error. This differs from PostgreSQL, which returns an exact numeric result for this input.

This change improves PostgreSQL compatibility by allowing factorial to return exact results beyond the i64 boundary, while still preserving overflow behavior once the supported decimal range is exceeded.

What changes are included in this PR?

  • Change factorial to return Decimal256(76, 0) instead of Int64.
  • Preserve the existing lookup-table fast path for values up to 20!.
  • Extend factorial computation beyond the lookup-table range using checked i256 multiplication.
  • Keep overflow behavior for inputs whose factorial exceeds the supported Decimal256 range.
  • Add and update sqllogictest coverage for the new result and return type.

Are these changes tested?

Yes.

This PR updates sqllogictest coverage for factorial, including:

  • exact result validation for factorial(21)
  • return type validation for the new decimal result
  • updated expectations for existing scalar and columnar factorial queries

Validated with:

cargo test -p datafusion-sqllogictest --test sqllogictests scalar

Are there any user-facing changes?

Yes.

factorial(21) now returns 51090942171709440000 instead of raising an overflow error, and the return type of factorial is now Decimal256(76, 0) rather than Int64.

@github-actions github-actions Bot added sqllogictest SQL Logic Tests (.slt) functions Changes to functions implementation labels May 17, 2026
Copy link
Copy Markdown
Contributor

@kosiew kosiew left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@xiedeyantu
Thanks for the fix.
I found one blocking issue around Decimal256 precision handling in factorial.


let mut result = i256::from(FACTORIALS[FACTORIALS.len() - 1]);
for value in FACTORIALS.len() as i64..=n {
result = result.checked_mul(i256::from(value)).ok_or_else(|| {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

compute_factorial currently checks for i256 overflow, but it does not enforce the declared Decimal256(76, 0) precision limit.

For example, 57! has 77 digits, so it exceeds the supported decimal range even though it still fits in i256. That means the scalar path can produce ScalarValue::Decimal256(Some(result), 76, 0) values that violate the return type, while the array path correctly errors later in with_precision_and_scale.

Could we enforce the Decimal256 precision bound in compute_factorial, or in a shared helper used by both the scalar and array paths, so the behavior is consistent?

It would also be great to add a regression test around the boundary, for example verifying that 56! succeeds and 57! returns an overflow error.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kosiew I've added the tests, but it seems there's a problem with CI.

Copy link
Copy Markdown
Contributor

@kosiew kosiew left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@xiedeyantu
Thanks for addressing the Decimal256 precision concerns. The shared i256 factorial path and the new overflow coverage look good.

I did notice one regression introduced by the refactor.

Ok(FACTORIALS[n as usize])
} else {
exec_err!("Overflow happened on FACTORIAL({n})")
return Ok(i256::from(1));
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this change accidentally introduced a behavioral regression for negative inputs.

Previously, factorial(-1) returned an execution error with factorial of a negative number is undefined, which matches both the documented behavior and the existing PostgreSQL-compatible SLT coverage.

With the new branch returning Ok(i256::from(1)), negative factorials now silently succeed with 1.

datafusion/sqllogictest/test_files/math.slt still expects:

select factorial(-1);

to error, so this is probably the source of the CI failure mentioned in the PR.

Could we restore the negative-input error path while keeping the new i256 / Decimal256 overflow handling?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to fix it, but it seems that CI is having problems again.

@kosiew
Copy link
Copy Markdown
Contributor

kosiew commented May 27, 2026

I could reproduce the error locally with this PR

External error: 1 errors in file /__w/datafusion/datafusion/datafusion-testing/data/sqlite/random/groupby/slt_good_12.slt

1. query failed: DataFusion error: Arrow error: Divide by zero error
[SQL] SELECT DISTINCT col1 AS col1 FROM tab0 WHERE NOT + CASE + 87 WHEN col1 THEN col1 ELSE NULL END NOT BETWEEN - col1 / + col1 AND NULL GROUP BY col1 HAVING NOT col1 * 75 IS NOT NULL
at /__w/datafusion/datafusion/datafusion-testing/data/sqlite/random/groupby/slt_good_12.slt:12140

with
cargo test -q --features backtrace,parquet_encryption --profile ci-optimized --test sqllogictests -- --include-sqlite

@xiedeyantu
Copy link
Copy Markdown
Member Author

I could reproduce the error locally with this PR

External error: 1 errors in file /__w/datafusion/datafusion/datafusion-testing/data/sqlite/random/groupby/slt_good_12.slt

1. query failed: DataFusion error: Arrow error: Divide by zero error
[SQL] SELECT DISTINCT col1 AS col1 FROM tab0 WHERE NOT + CASE + 87 WHEN col1 THEN col1 ELSE NULL END NOT BETWEEN - col1 / + col1 AND NULL GROUP BY col1 HAVING NOT col1 * 75 IS NOT NULL
at /__w/datafusion/datafusion/datafusion-testing/data/sqlite/random/groupby/slt_good_12.slt:12140

with cargo test -q --features backtrace,parquet_encryption --profile ci-optimized --test sqllogictests -- --include-sqlite

@kosiew I don't know why, but after rebasing the code to the latest version, all the tests are now passing.

@xiedeyantu xiedeyantu requested a review from kosiew May 27, 2026 13:56
Copy link
Copy Markdown
Contributor

@kosiew kosiew left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@xiedeyantu

this iteration looks 👍 to me

@xiedeyantu
Copy link
Copy Markdown
Member Author

@xiedeyantu

this iteration looks 👍 to me

Thanks for your detail review!

ColumnarValue::Array(array) => match array.data_type() {
Int64 => {
let result: Int64Array = array
let result = array
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can keep the try_unary which should be much simpler, e.g.

                    let result = array
                        .as_primitive::<Int64Type>()
                        .try_unary::<_, Decimal256Type, _>(compute_factorial)?
                        .with_precision_and_scale(DECIMAL256_MAX_PRECISION, 0)?;
                    Ok(ColumnarValue::Array(Arc::new(result) as ArrayRef))

})?;
}

if result.to_string().len() > DECIMAL256_MAX_PRECISION as usize {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we could consider extending the lookup table to 56 entries which removes the need for the multiplication & validation path

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

functions Changes to functions implementation sqllogictest SQL Logic Tests (.slt)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

PostgreSQL compatibility: factorial(21) should not overflow when PostgreSQL returns a numeric answer

3 participants