Add opt-in parallel rule compilation for faster workflow warmup #741

Closed
benluersen wants to merge 1 commit into microsoft:main from benluersen:perf/parallel-rule-compilation

@benluersen

Contributor

Problem

Rule compilation during workflow registration is strictly serial. For workflows with very large rule counts (10k+), warmup time is dominated by this loop even when expression parsing itself is already efficient.

Change

Adds ReSettings.EnableParallelRuleCompilation (default false, so existing behavior is unchanged). When enabled, rules are compiled with Parallel.For; compiled delegates are added to the rule dictionary in the original order, so result ordering is unaffected.
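
For readers who want the shape of the ordering guarantee, here is a minimal sketch, not the code in this PR. It assumes a hypothetical `CompileAll` helper and a caller-supplied `compile` function standing in for the engine's real compile step, and it returns an ordered list of pairs rather than the engine's actual rule dictionary. Each parallel iteration writes only to its own array slot, and a sequential pass then emits the results in input order.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class OrderedParallelCompileSketch
{
    // Hypothetical helper (not part of RulesEngine): compiles every rule name
    // in parallel, then returns the results in the original input order.
    public static List<KeyValuePair<string, Func<int, bool>>> CompileAll(
        IReadOnlyList<string> ruleNames,
        Func<string, Func<int, bool>> compile)
    {
        // One slot per rule. Each iteration writes only its own index,
        // so the parallel phase needs no locking.
        var compiled = new Func<int, bool>[ruleNames.Count];
        Parallel.For(0, ruleNames.Count, i =>
        {
            compiled[i] = compile(ruleNames[i]);
        });

        // Sequential phase: insertion order matches the input order,
        // regardless of which iteration finished first.
        var ordered = new List<KeyValuePair<string, Func<int, bool>>>(ruleNames.Count);
        for (var i = 0; i < ruleNames.Count; i++)
        {
            ordered.Add(new KeyValuePair<string, Func<int, bool>>(ruleNames[i], compiled[i]));
        }
        return ordered;
    }
}
```

The two-phase split is the design point: only the expensive compile step runs concurrently, while the cheap, order-sensitive insertion stays serial.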

An AggregateException thrown by the parallel loop is unwrapped so the first failing rule surfaces its original exception, preserving the serial error contract (verified by the existing ExecuteRule_MissingMethodInExpression_ReturnsRulesFailed test, which fails without the unwrap).
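
The unwrap described above could look roughly like the following. This is a sketch under assumptions, not the code in this PR: `RunUnwrapped` is a hypothetical name, and the `when` filter is a defensive addition. Note that `InnerExceptions[0]` is the first exception the loop collected, which is not necessarily the failure with the lowest rule index when several rules fail at once.

```csharp
using System;
using System.Runtime.ExceptionServices;
using System.Threading.Tasks;

public static class UnwrapSketch
{
    // Hypothetical wrapper: runs `body` for indices [0, count) in parallel and
    // surfaces a failure as its original exception type instead of an
    // AggregateException.
    public static void RunUnwrapped(int count, Action<int> body)
    {
        try
        {
            Parallel.For(0, count, body);
        }
        catch (AggregateException ae) when (ae.InnerExceptions.Count > 0)
        {
            // Rethrow the first collected inner exception while keeping
            // its original stack trace.
            ExceptionDispatchInfo.Capture(ae.InnerExceptions[0]).Throw();
        }
    }
}
```

A caller that previously caught, say, an `InvalidOperationException` from the serial loop would keep catching the same type with this wrapper in place.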

Thread-safety notes: CompileRule paths share the RuleExpressionParser (immutable after construction aside from its ConcurrentDictionary-backed MemCache), the cached ParsingConfig (a benign last-writer-wins race on rebuild), and the Lazy<RuleExpressionParameter[]> global params (default ExecutionAndPublication mode). Dynamic LINQ's internal caches are ConcurrentDictionary-based.
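
To illustrate the `Lazy<T>` point in isolation, the sketch below uses a hypothetical `CountFactoryRuns` helper and a placeholder `string[]` payload (the engine's real type is `RuleExpressionParameter[]`). With the default `LazyThreadSafetyMode.ExecutionAndPublication`, the factory should run once even when many parallel readers touch `Value` at the same time.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class LazyOnceSketch
{
    // Hypothetical demo: returns how many times the Lazy factory ran
    // while `readers` parallel iterations all read the same Lazy value.
    public static int CountFactoryRuns(int readers)
    {
        var runs = 0;

        // No mode argument: Lazy<T> defaults to ExecutionAndPublication,
        // so concurrent readers block until a single factory call completes.
        var globals = new Lazy<string[]>(() =>
        {
            Interlocked.Increment(ref runs);
            return new[] { "p1", "p2" };
        });

        Parallel.For(0, readers, i =>
        {
            var unused = globals.Value;
        });

        return runs;
    }
}
```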

Results

20,000 unique rules with local params, 16-thread machine: 16.2 s serial → 4.7 s parallel.

Note: UseFastExpressionCompiler interacts poorly with parallel compilation in our measurements (12.9 s with both enabled vs 4.7 s with the default LINQ compiler). The two options work together, but the combination is not recommended.

All 170 existing unit tests pass.

Rule compilation during workflow registration was strictly serial. For
workflows with very large rule counts (10k+), warmup is dominated by
this loop even after expression parsing is fixed.

Adds ReSettings.EnableParallelRuleCompilation (default false). When
enabled, rules are compiled with Parallel.For and results are added to
the compiled-rule dictionary in the original order. An
AggregateException from the parallel loop is unwrapped so the first
failing rule surfaces its original exception, preserving the serial
error contract (verified by the existing
ExecuteRule_MissingMethodInExpression_ReturnsRulesFailed test).

Benchmark, 20,000 unique rules with local params: 16.2s serial -> 4.7s
parallel on a 16-thread machine.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@benluersen

Contributor Author

@microsoft-github-policy-service agree

@YogeshPraj

Contributor

@benluersen Please review #744.

YogeshPraj added a commit that referenced this pull request Jun 11, 2026
* Add opt-in parallel rule compilation for faster workflow warmup

Rule compilation during workflow registration was strictly serial. For
workflows with very large rule counts (10k+), warmup is dominated by
this loop even after expression parsing is fixed.

Adds ReSettings.EnableParallelRuleCompilation (default false). When
enabled, rules are compiled with Parallel.For and results are added to
the compiled-rule dictionary in the original order. An
AggregateException from the parallel loop is unwrapped so the first
failing rule surfaces its original exception, preserving the serial
error contract (verified by the existing
ExecuteRule_MissingMethodInExpression_ReturnsRulesFailed test).

Benchmark, 20,000 unique rules with local params: 16.2s serial -> 4.7s
parallel on a 16-thread machine.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* Guard EnableParallelRuleCompilation and add tests

Builds on #741 by @benluersen. The original opt-in parallel rule
compilation is sound but had three latent footguns and no test coverage
for the parallel path:

1. UseFastExpressionCompiler interaction (~2.7× regression when both
   flags are on, per the PR description) — users would flip the flag and
   silently get slower. The engine now declines to parallelize when
   UseFastExpressionCompiler = true and falls back to serial.

2. Below ~32 rules, Parallel.For's scheduling overhead exceeds the
   speedup. Added a MinRulesForParallelCompilation threshold so small
   workflows aren't penalised by enabling the flag globally.

3. catch (AggregateException ae) accessed ae.InnerExceptions[0]
   without bounds-checking. Replaced with a `when` filter so the catch
   only matches when there's actually an inner exception to rethrow.

XML doc on ReSettings.EnableParallelRuleCompilation now spells out both
fallback conditions so the contract is obvious without reading the
implementation.

New ParallelRuleCompilationTest covers:
- Parallel and serial produce identical RuleResultTree shape and outcomes
- The first compile failure surfaces as a per-rule ExceptionMessage, not
  an AggregateException
- UseFastExpressionCompiler + parallel still produces correct results
  (the fallback is silent, only observable in benchmarks)
- Sub-threshold workflows execute correctly with the flag enabled

All 174 unit tests pass on net6 / net8 / net9 / net10.

Co-authored-by: Ben Luersen <ben.luersen@gmail.com>

---------

Co-authored-by: Ben Luersen <ben.luersen@gmail.com>
Co-authored-by: Claude Fable 5 <noreply@anthropic.com>
Co-authored-by: Yogesh Prajapati <yogeshcprajapati@outlook.com>
@YogeshPraj

Contributor

#744 is merged. Closing this one.

@YogeshPraj closed this on Jun 11, 2026