Add opt-in parallel rule compilation for faster workflow warmup #741
Closed
benluersen wants to merge 1 commit into
Rule compilation during workflow registration was strictly serial. For workflows with very large rule counts (10k+), warmup is dominated by this loop even after expression parsing is fixed.

Adds ReSettings.EnableParallelRuleCompilation (default false). When enabled, rules are compiled with Parallel.For and results are added to the compiled-rule dictionary in the original order.

An AggregateException from the parallel loop is unwrapped so the first failing rule surfaces its original exception, preserving the serial error contract (verified by the existing ExecuteRule_MissingMethodInExpression_ReturnsRulesFailed test).

Benchmark, 20,000 unique rules with local params: 16.2s serial -> 4.7s parallel on a 16-thread machine.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
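The approach described above can be sketched roughly as follows. This is a minimal illustration, not the RulesEngine implementation: `CompileAll`, the `(Name, Source)` rule tuple, and the `compile` delegate are hypothetical stand-ins for the engine's internal types. It shows compiling into an index-addressed array inside `Parallel.For`, adding the results to the dictionary in the original order afterwards, and rethrowing the inner exception rather than the `AggregateException`.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.ExceptionServices;
using System.Threading.Tasks;

public static class ParallelCompileSketch
{
    // Hypothetical helper: compiles every rule with `compile`, optionally in parallel,
    // and returns the compiled delegates keyed by rule name.
    public static Dictionary<string, Func<int, bool>> CompileAll(
        IReadOnlyList<(string Name, string Source)> rules,
        Func<string, Func<int, bool>> compile,
        bool parallel)
    {
        // Each iteration writes only to its own slot, so no locking is needed here.
        var compiled = new Func<int, bool>[rules.Count];

        if (parallel)
        {
            try
            {
                Parallel.For(0, rules.Count, i => compiled[i] = compile(rules[i].Source));
            }
            catch (AggregateException ae)
            {
                // Rethrow the first collected inner exception with its original stack trace,
                // so callers see the same exception type as in the serial path.
                ExceptionDispatchInfo.Capture(ae.InnerExceptions[0]).Throw();
                throw; // not reached; Throw() above never returns
            }
        }
        else
        {
            for (var i = 0; i < rules.Count; i++)
            {
                compiled[i] = compile(rules[i].Source);
            }
        }

        // Single-threaded pass: entries are added in the original rule order.
        var result = new Dictionary<string, Func<int, bool>>(rules.Count);
        for (var i = 0; i < rules.Count; i++)
        {
            result.Add(rules[i].Name, compiled[i]);
        }
        return result;
    }
}
```

One caveat about this sketch: when several rules fail in the same parallel run, `InnerExceptions[0]` is the first exception the loop collected, which is not guaranteed to belong to the lowest-index failing rule.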
Contributor
Author
@microsoft-github-policy-service agree
Contributor
@benluersen Please review
YogeshPraj added a commit that referenced this pull request Jun 11, 2026
* Add opt-in parallel rule compilation for faster workflow warmup

  Rule compilation during workflow registration was strictly serial. For workflows with very large rule counts (10k+), warmup is dominated by this loop even after expression parsing is fixed.

  Adds ReSettings.EnableParallelRuleCompilation (default false). When enabled, rules are compiled with Parallel.For and results are added to the compiled-rule dictionary in the original order.

  An AggregateException from the parallel loop is unwrapped so the first failing rule surfaces its original exception, preserving the serial error contract (verified by the existing ExecuteRule_MissingMethodInExpression_ReturnsRulesFailed test).

  Benchmark, 20,000 unique rules with local params: 16.2s serial -> 4.7s parallel on a 16-thread machine.

  Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* Guard EnableParallelRuleCompilation and add tests

  Builds on #741 by @benluersen. The original opt-in parallel rule compilation is sound but had two latent footguns and no test coverage for the parallel path:

  1. UseFastExpressionCompiler interaction (~2.7× regression when both flags are on, per the PR description): users would flip the flag and silently get slower. The engine now declines to parallelize when UseFastExpressionCompiler = true and falls back to serial.
  2. Below ~32 rules, Parallel.For's scheduling overhead exceeds the speedup. Added a MinRulesForParallelCompilation threshold so small workflows aren't penalised by enabling the flag globally.
  3. catch (AggregateException ae) accessed ae.InnerExceptions[0] without bounds-checking. Replaced with a `when` filter so the catch only matches when there's actually an inner exception to rethrow.

  XML doc on ReSettings.EnableParallelRuleCompilation now spells out both fallback conditions so the contract is obvious without reading the implementation.

  New ParallelRuleCompilationTest covers:
  - Parallel and serial produce identical RuleResultTree shape and outcomes
  - The first compile failure surfaces as a per-rule ExceptionMessage, not an AggregateException
  - UseFastExpressionCompiler + parallel still produces correct results (the fallback is silent, only observable in benchmarks)
  - Sub-threshold workflows execute correctly with the flag enabled

  All 174 unit tests pass on net6 / net8 / net9 / net10.

  Co-authored-by: Ben Luersen <ben.luersen@gmail.com>

---------

Co-authored-by: Ben Luersen <ben.luersen@gmail.com>
Co-authored-by: Claude Fable 5 <noreply@anthropic.com>
Co-authored-by: Yogesh Prajapati <yogeshcprajapati@outlook.com>
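The three guards listed in that commit message might look something like the sketch below. It is an illustration under assumptions, not the merged code: `ShouldCompileInParallel` and `RunUnwrapped` are hypothetical helper names, the threshold value of 32 is taken from the "~32 rules" figure above, and whether the real comparison is inclusive is not stated in the source.

```csharp
using System;
using System.Runtime.ExceptionServices;

public static class ParallelGuardSketch
{
    // Name taken from the commit message; the exact value and how it is exposed are assumptions.
    public const int MinRulesForParallelCompilation = 32;

    // Guards 1 and 2: fall back to serial when the fast expression compiler is enabled
    // or when the workflow is too small for parallel scheduling to pay off.
    public static bool ShouldCompileInParallel(
        bool enableParallelRuleCompilation,
        bool useFastExpressionCompiler,
        int ruleCount)
    {
        return enableParallelRuleCompilation
            && !useFastExpressionCompiler
            && ruleCount >= MinRulesForParallelCompilation;
    }

    // Guard 3: the exception filter only matches when there is an inner exception to rethrow,
    // so InnerExceptions[0] is never read from an empty collection.
    public static void RunUnwrapped(Action body)
    {
        try
        {
            body();
        }
        catch (AggregateException ae) when (ae.InnerExceptions.Count > 0)
        {
            ExceptionDispatchInfo.Capture(ae.InnerExceptions[0]).Throw();
        }
    }
}
```

With the filter in place, an `AggregateException` that carries no inner exceptions is not caught at all and propagates unchanged, rather than causing an out-of-range access inside the handler.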
Contributor
#744 is merged. Closing this one.
Problem
Rule compilation during workflow registration is strictly serial. For workflows with very large rule counts (10k+), warmup time is dominated by this loop even after expression parsing is efficient.
Change
Adds `ReSettings.EnableParallelRuleCompilation` (default `false`, so existing behavior is unchanged). When enabled, rules are compiled with `Parallel.For`; compiled delegates are added to the rule dictionary in the original order, so result ordering is unaffected.

An `AggregateException` thrown by the parallel loop is unwrapped so the first failing rule surfaces its original exception, preserving the serial error contract (verified by the existing `ExecuteRule_MissingMethodInExpression_ReturnsRulesFailed` test, which fails without the unwrap).

Thread-safety notes: `CompileRule` paths share the `RuleExpressionParser` (immutable after construction aside from its `ConcurrentDictionary`-backed `MemCache`), the cached `ParsingConfig` (a benign last-writer-wins race on rebuild), and the `Lazy<RuleExpressionParameter[]>` global params (default `ExecutionAndPublication` mode). Dynamic LINQ's internal caches are `ConcurrentDictionary`-based.
Results
20,000 unique rules with local params, 16-thread machine: 16.2 s serial → 4.7 s parallel.

Note: `UseFastExpressionCompiler` interacts poorly with parallel compilation in our measurements (12.9 s vs 4.7 s with the default LINQ compiler). The two options work together, but combining them is not recommended.

All 170 existing unit tests pass.
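To illustrate the thread-safety notes in the Change section, here is a small standalone sketch of the two standard-library behaviors they rely on. It does not touch RulesEngine types; `SharedStateSketch`, the string array, and the cache keys are made up for the example. A `Lazy<T>` built with the default mode runs its factory once even when many threads read `Value` concurrently, and `ConcurrentDictionary.GetOrAdd` can be called from parallel iterations without external locking.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

public static class SharedStateSketch
{
    // Returns how many times the lazy factory ran and how many cache entries were created.
    public static (int FactoryRuns, int CacheEntries) Run(int iterations)
    {
        var factoryRuns = 0;

        // Default constructor mode is LazyThreadSafetyMode.ExecutionAndPublication:
        // exactly one thread executes the factory, the others wait for its result.
        var globals = new Lazy<string[]>(() =>
        {
            Interlocked.Increment(ref factoryRuns);
            return new[] { "g1", "g2" };
        });

        var cache = new ConcurrentDictionary<int, string>();

        Parallel.For(0, iterations, i =>
        {
            var prefix = globals.Value[0];
            // GetOrAdd is safe to call concurrently; the value factory may run more than
            // once for a contended key, but only one value is stored per key.
            cache.GetOrAdd(i % 8, key => prefix + ":" + key);
        });

        return (factoryRuns, cache.Count);
    }
}
```

The "benign last-writer-wins race" mentioned for the cached `ParsingConfig` is a different pattern (an unsynchronized field overwritten with an equivalent value) and is not modeled here.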