fix factorial(21) should not overflow when PostgreSQL returns a numeric answer#22306
fix factorial(21) should not overflow when PostgreSQL returns a numeric answer#22306xiedeyantu wants to merge 2 commits into
Conversation
There was a problem hiding this comment.
@xiedeyantu
Thanks for the fix.
I found one blocking issue around Decimal256 precision handling in factorial.
|
|
||
| let mut result = i256::from(FACTORIALS[FACTORIALS.len() - 1]); | ||
| for value in FACTORIALS.len() as i64..=n { | ||
| result = result.checked_mul(i256::from(value)).ok_or_else(|| { |
There was a problem hiding this comment.
compute_factorial currently checks for i256 overflow, but it does not enforce the declared Decimal256(76, 0) precision limit.
For example, 57! has 77 digits, so it exceeds the supported decimal range even though it still fits in i256. That means the scalar path can produce ScalarValue::Decimal256(Some(result), 76, 0) values that violate the return type, while the array path correctly errors later in with_precision_and_scale.
Could we enforce the Decimal256 precision bound in compute_factorial, or in a shared helper used by both the scalar and array paths, so the behavior is consistent?
It would also be great to add a regression test around the boundary, for example verifying that 56! succeeds and 57! returns an overflow error.
There was a problem hiding this comment.
@kosiew I've added the tests, but it seems there's a problem with CI.
There was a problem hiding this comment.
@xiedeyantu
Thanks for addressing the Decimal256 precision concerns. The shared i256 factorial path and the new overflow coverage look good.
I did notice one regression introduced by the refactor.
| Ok(FACTORIALS[n as usize]) | ||
| } else { | ||
| exec_err!("Overflow happened on FACTORIAL({n})") | ||
| return Ok(i256::from(1)); |
There was a problem hiding this comment.
I think this change accidentally introduced a behavioral regression for negative inputs.
Previously, factorial(-1) returned an execution error with factorial of a negative number is undefined, which matches both the documented behavior and the existing PostgreSQL-compatible SLT coverage.
With the new branch returning Ok(i256::from(1)), negative factorials now silently succeed with 1.
datafusion/sqllogictest/test_files/math.slt still expects:
select factorial(-1);to error, so this is probably the source of the CI failure mentioned in the PR.
Could we restore the negative-input error path while keeping the new i256 / Decimal256 overflow handling?
There was a problem hiding this comment.
I tried to fix it, but it seems that CI is having problems again.
|
I could reproduce the error locally with this PR with |
@kosiew I don't know why, but after rebasing the code to the latest version, all the tests are now passing. |
kosiew
left a comment
There was a problem hiding this comment.
this iteration looks 👍 to me
Thanks for your detail review! |
| ColumnarValue::Array(array) => match array.data_type() { | ||
| Int64 => { | ||
| let result: Int64Array = array | ||
| let result = array |
There was a problem hiding this comment.
we can keep the try_unary which should be much simpler, e.g.
let result = array
.as_primitive::<Int64Type>()
.try_unary::<_, Decimal256Type, _>(compute_factorial)?
.with_precision_and_scale(DECIMAL256_MAX_PRECISION, 0)?;
Ok(ColumnarValue::Array(Arc::new(result) as ArrayRef))| })?; | ||
| } | ||
|
|
||
| if result.to_string().len() > DECIMAL256_MAX_PRECISION as usize { |
There was a problem hiding this comment.
we could consider extending the lookup table to 56 entries which removes the need for the multiplication & validation path
Which issue does this PR close?
factorial(21)should not overflow when PostgreSQL returns a numeric answer #22259.Rationale for this change
factorialwas previously implemented using anInt64lookup table, which causedfactorial(21)to fail with an overflow error. This differs from PostgreSQL, which returns an exact numeric result for this input.This change improves PostgreSQL compatibility by allowing
factorialto return exact results beyond thei64boundary, while still preserving overflow behavior once the supported decimal range is exceeded.What changes are included in this PR?
factorialto returnDecimal256(76, 0)instead ofInt64.20!.i256multiplication.Decimal256range.Are these changes tested?
Yes.
This PR updates sqllogictest coverage for
factorial, including:factorial(21)Validated with:
cargo test -p datafusion-sqllogictest --test sqllogictests scalarAre there any user-facing changes?
Yes.
factorial(21)now returns51090942171709440000instead of raising an overflow error, and the return type offactorialis nowDecimal256(76, 0)rather thanInt64.