GH-47642: [C++] Catch exceptions from initial_task in AsyncTaskScheduler#49860
Conversation
…Scheduler::Make to prevent plan hang If the initial_task passed to AsyncTaskScheduler::Make throws a C++ exception (rather than returning a failed Status), OnTaskFinished is never called, leaving running_tasks_ at 1 permanently. This causes a DCHECK failure in debug builds and an indefinite hang in release builds because the scheduler's finished future is never completed. Wrap the initial_task invocation in a try-catch that converts exceptions to a failed Status::UnknownError, ensuring OnTaskFinished is always called regardless of how the initial task fails.
|
|
Why would it throw at all? Arrow uses Status-based error reporting. If we were catching exceptions from any user-provided callable, this would litter the Arrow C++ code with ad hoc For reference, our |
Thanks for the pushback, but I'd like to make a case for keeping the catch here. The failure mode is asymmetric with the rest of Arrow: everywhere else a buggy user callable surfaces as a Pushing the catch down to call sites (e.g. |
Hmm, that's a good point. |
|
|
|
After merging your PR, Conbench analyzed the 4 benchmarking runs that have been run so far on merge-commit e2b4437. There were no benchmark performance regressions. 🎉 The full Conbench report has more details. It also includes information about 3 possible false positives for unstable benchmarks that are known to sometimes produce them. |
Rationale for this change
If the
initial_taskpassed toAsyncTaskScheduler::Makethrows a C++ exception (rather than returning a failedStatus),OnTaskFinishedis never called. This leavesrunning_tasks_permanently at 1, causing aDCHECKfailure in debug builds and an indefinite hang in release builds because the scheduler'sfinishedfuture is never completed. In Acero, this manifests asDeclarationToTable(and similar APIs) hanging forever when aSourceNodegenerator throws duringStartProducing.What changes are included in this PR?
Add exception handling logic.
Are these changes tested?
Yes.
Are there any user-facing changes?
No API changes.
initial_taskin Acero plan throws an exception the plan hangs indefinitely #47642