GH-50111: [C++][Gandiva] Improve function error messages#50112
Open
lriggs wants to merge 1 commit into
Open
Conversation
|
|
| char err_msg[160]; \ | ||
| snprintf(err_msg, sizeof(err_msg), \ | ||
| "NEXT_DAY: '%.*s' is not a recognized weekday " \ | ||
| "(expected MON|TUE|WED|THU|FRI|SAT|SUN)", \ |
Contributor
There was a problem hiding this comment.
are these the only recognizable values? Can it be "mon" or "Monday"?
Contributor
There was a problem hiding this comment.
Based on tests function accepts two character version, three character versions as well as whole day so error is not correct/complete.
Lets also add a test to show if comparison is case sensitive.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Rationale for this change
Gandiva's runtime errors are surfaced to SQL users via
ExecutionContext::set_error_msg. An audit of all ~90 call sites incpp/src/gandiva/turned up two recurring problems:"divide by zero error"or"Output buffer length can't be negative"with no indication of which SQL function produced it, making errors hard to localize in long queries.A good runtime error should answer four questions: which function, what went wrong, what value triggered it, what's the valid range. This PR moves the highest-impact messages toward that bar.
What changes are included in this PR?
All edits follow the same pattern:
Where the old message contained a load-bearing substring (e.g.
divide by zero error,Output buffer length can't be negative), the substring is preserved so existing substring-based tests still match.REGEXP_EXTRACT (the original case study)
cpp/src/gandiva/regex_functions_holder.cc:239-247 — the message
"Index to extract out of range"now reads"REGEXP_EXTRACT: invalid group_index '<N>'; must be between 0 and <max> (the number of capture groups in the pattern)", matching the level of detail of the Java path.Divide-by-zero family
Eight call sites in
arithmetic_ops.cc,decimal_ops.cc, andextended_math_ops.cc(coveringDIVIDE,DIV,MOD,PMOD, decimalDivide/Mod,LOG) now prefix the SQL function name while preserving the originaldivide by zero errorsubstring.mod_float64_float64additionally echoes the dividend.Value-echoing
The following sites now include both the SQL function name and the offending value:
CAST_BIT— echoes the invalid input string, lists expected valuesCAST_VARCHAR/CAST_VARBINARYlength checks — echoes requested lengthREPEAT— echoes count (and on overflow: count × input length)CONVERT_REPLACE_INVALID_FROM_UTF8— echoes byte countLOCATE— echoes start positionFACTORIAL— echoes input value (both negative and >20 paths)NEGATIVE(integer overflow onINT_MIN, andnegative_daytimeintervalout-of-bounds)CRC32— echoes input lengthBASE64/UNBASE64— echoes input lengthAES_ENCRYPT/AES_DECRYPT— echoes data length on negative input;AES_ENCRYPTOOM message also rewritten to name the function clearlyNEXT_DAY— echoes the unrecognized weekday string, lists expected valuesCAST_INTERVAL_YEAR— echoes the overflowing source valueCAST_*_FROM_HEX(3 error paths in the macro) — echoes the input hex stringREPLACE— buffer-overflow message prefixed withREPLACE:Are these changes tested?
EXPECT_EQ(error, "exact string")assertions converted toHasSubstr/.find()so the test contract is the SQL function name plus a stable phrase, not the exact wording."Factorial of negative"→HasSubstr("FACTORIAL") + HasSubstr("non-negative")).arithmetic_ops_test.cc,decimal_ops_test.cc,extended_math_ops_test.cc,time_test.cc,gdv_function_stubs_test.cc.All affected error paths are exercised by the existing unit tests — that is how each site was located in the audit.
Are there any user-facing changes?
Yes, improved error messages.
Example before / after