Skip to content

GH-47642: [C++] Catch exceptions from initial_task in AsyncTaskScheduler#49860

Merged
pitrou merged 4 commits into
apache:mainfrom
egolearner:fix/initial-task-exception
Jun 1, 2026
Merged

GH-47642: [C++] Catch exceptions from initial_task in AsyncTaskScheduler#49860
pitrou merged 4 commits into
apache:mainfrom
egolearner:fix/initial-task-exception

Conversation

@egolearner

@egolearner egolearner commented Apr 26, 2026

Copy link
Copy Markdown
Contributor

Rationale for this change

If the initial_task passed to AsyncTaskScheduler::Make throws a C++ exception (rather than returning a failed Status), OnTaskFinished is never called. This leaves running_tasks_ permanently at 1, causing a DCHECK failure in debug builds and an indefinite hang in release builds because the scheduler's finished future is never completed. In Acero, this manifests as DeclarationToTable (and similar APIs) hanging forever when a SourceNode generator throws during StartProducing.

What changes are included in this PR?

Add exception handling logic.

Are these changes tested?

Yes.

Are there any user-facing changes?

No API changes.

…Scheduler::Make to prevent plan hang

If the initial_task passed to AsyncTaskScheduler::Make throws a C++
exception (rather than returning a failed Status), OnTaskFinished is
never called, leaving running_tasks_ at 1 permanently. This causes a
DCHECK failure in debug builds and an indefinite hang in release builds
because the scheduler's finished future is never completed.

Wrap the initial_task invocation in a try-catch that converts exceptions
to a failed Status::UnknownError, ensuring OnTaskFinished is always
called regardless of how the initial task fails.
@github-actions

Copy link
Copy Markdown

⚠️ GitHub issue #47642 has been automatically assigned in GitHub to PR creator.

Comment thread cpp/src/arrow/util/async_util.cc
Comment thread cpp/src/arrow/util/async_util_test.cc
@github-actions github-actions Bot added awaiting committer review Awaiting committer review and removed awaiting review Awaiting review labels Apr 27, 2026
@pitrou

pitrou commented May 13, 2026

Copy link
Copy Markdown
Member

If the initial_task passed to AsyncTaskScheduler::Make throws a C++ exception

Why would it throw at all? Arrow uses Status-based error reporting. If we were catching exceptions from any user-provided callable, this would litter the Arrow C++ code with ad hoc catch statements. We would also certainly forget to catch exceptions in some place, and users would regularly have to report issues about that.

For reference, our ThreadPool doesn't catch exceptions from user tasks either. They are not supposed to throw exceptions.

@egolearner

Copy link
Copy Markdown
Contributor Author

If the initial_task passed to AsyncTaskScheduler::Make throws a C++ exception

Why would it throw at all? Arrow uses Status-based error reporting. If we were catching exceptions from any user-provided callable, this would litter the Arrow C++ code with ad hoc catch statements. We would also certainly forget to catch exceptions in some place, and users would regularly have to report issues about that.

For reference, our ThreadPool doesn't catch exceptions from user tasks either. They are not supposed to throw exceptions.

Thanks for the pushback, but I'd like to make a case for keeping the catch here.

The failure mode is asymmetric with the rest of Arrow: everywhere else a buggy user callable surfaces as a Status, but here it silently hangs forever in release builds (running_tasks_ stays at 1, the plan's finished future never completes). The ThreadPool comparison doesn't quite apply either — a throwing task there calls std::terminate, so the user gets a stack trace. Here they get silence.

Pushing the catch down to call sites (e.g. SourceNode::StartProducing) fixes the case I hit but is whack-a-mole — any future caller that hands a throwing callable to initial_task will hang again. The centralized fix is one try/catch on the scheduler's bootstrap path, not scattered through the codebase.

@pitrou

pitrou commented Jun 1, 2026

Copy link
Copy Markdown
Member

The failure mode is asymmetric with the rest of Arrow: everywhere else a buggy user callable surfaces as a Status, but here it silently hangs forever in release builds (running_tasks_ stays at 1, the plan's finished future never completes). The ThreadPool comparison doesn't quite apply either — a throwing task there calls std::terminate, so the user gets a stack trace. Here they get silence.

Hmm, that's a good point.

@pitrou pitrou left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, thank you @egolearner

Comment thread cpp/src/arrow/util/async_util.cc
@pitrou pitrou changed the title GH-47642: [C++] Catch exceptions from initial_task in AsyncTaskSchedu… GH-47642: [C++] Catch exceptions from initial_task in AsyncTaskScheduler Jun 1, 2026
@github-actions

github-actions Bot commented Jun 1, 2026

Copy link
Copy Markdown

⚠️ GitHub issue #47642 has been automatically assigned in GitHub to PR creator.

@pitrou pitrou merged commit e2b4437 into apache:main Jun 1, 2026
55 of 57 checks passed
@pitrou pitrou removed the awaiting committer review Awaiting committer review label Jun 1, 2026
@conbench-apache-arrow

Copy link
Copy Markdown

After merging your PR, Conbench analyzed the 4 benchmarking runs that have been run so far on merge-commit e2b4437.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details. It also includes information about 3 possible false positives for unstable benchmarks that are known to sometimes produce them.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants