
fix: calcite optimization adds LITERAL_AGG(true) #963

Open
mbwhite wants to merge 2 commits into substrait-io:main from mbwhite:isthmus-literal-agg

@mbwhite mbwhite commented Jun 26, 2026

Contributor

fix: calcite optimization adds literal_agg

I was testing various Calcite optimisations; TPC-H query 16 in particular was optimised in such a way that the Isthmus SubstraitRelVisitor could not handle it.

The issue was that Calcite had added LITERAL_AGG(true).


In Apache Calcite, LITERAL_AGG(true) is an internal aggregate function injected during the decorrelation process (specifically when handling ANY, SOME, or IN subqueries).

Here is why it is there and what it does:

1. The Problem: Handling Empty Subqueries

In your "before" plan, there is a <> SOME(...) correlated subquery. Under SQL semantics, if the subquery inside a quantified comparison (SOME/ANY) returns zero rows, the entire condition evaluates to FALSE, not to NULL (UNKNOWN).

2. The Solution: Flagging Matches

When Calcite transforms the correlated subquery into a join (LogicalCorrelate / LogicalAggregate), it needs a foolproof way to know if the subquery actually produced any rows for a given correlated key ($cor0.L_PARTKEY).

  • LITERAL_AGG(true) evaluates the literal value true for every row entering the aggregate.
  • Because it is an aggregate function, if the subquery returns zero rows, LITERAL_AGG(true) will return NULL.
  • If the subquery returns one or more rows, it returns true.
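
The behaviour described in the bullets above can be modelled in a few lines of plain Java. This is only a sketch of the semantics as stated in this description, not Calcite's implementation: the class `LiteralAggSketch` and the method `literalAgg` are hypothetical names, and a Java `null` stands in for SQL NULL.

```java
import java.util.List;

public class LiteralAggSketch {

    /**
     * Hypothetical model of LITERAL_AGG(literal) as described above:
     * NULL when no rows reach the aggregate, the literal otherwise.
     */
    static Boolean literalAgg(boolean literal, List<?> rows) {
        // A Java null stands in for SQL NULL here.
        return rows.isEmpty() ? null : Boolean.valueOf(literal);
    }

    public static void main(String[] args) {
        System.out.println(literalAgg(true, List.of()));          // prints "null"
        System.out.println(literalAgg(true, List.of("a", "b")));  // prints "true"
    }
}
```

The return type is the boxed `Boolean` rather than `boolean` so that the "no rows" case stays distinguishable from both true and false.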

3. How Calcite Uses It

Look closely at the large LogicalFilter(condition=[OR(...)]) right above the correlate in your "after" plan.

Calcite uses the output of LITERAL_AGG(true) (which maps to column $19 or $20 in that flattened row) to evaluate the exact short-circuiting logic of the SOME operator:

  • If it's NULL: the subquery was empty, so the condition is handled as false/null.
  • If it's TRUE: the subquery returned data, so Calcite proceeds to check the actual value comparisons (like COUNT and MAX).

It essentially acts as a highly optimized, null-aware boolean flag for row existence during complex subquery unnesting.
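
As a rough illustration of how such a flag can drive a `<> SOME` check, here is a simplified plain-Java sketch. It is an assumption-laden model, not the filter Calcite generates: the real plan compares aggregates such as COUNT and MAX, whereas this sketch loops over the subquery values directly. The names `SomeFlagSketch`, `notEqualsSome` and `matchFlag` are hypothetical, and a Java `null` stands in for SQL NULL/UNKNOWN.

```java
import java.util.Arrays;
import java.util.List;

public class SomeFlagSketch {

    /**
     * Hypothetical evaluation of "x <> SOME (values)" guarded by an existence flag.
     * matchFlag == null models LITERAL_AGG(true) yielding NULL (no subquery rows).
     */
    static Boolean notEqualsSome(Integer x, Boolean matchFlag, List<Integer> values) {
        if (matchFlag == null) {
            // Empty subquery: a SOME comparison is FALSE.
            return Boolean.FALSE;
        }
        boolean sawUnknown = false;
        for (Integer v : values) {
            if (x == null || v == null) {
                sawUnknown = true;       // comparing with NULL gives UNKNOWN
            } else if (!x.equals(v)) {
                return Boolean.TRUE;     // one differing value is enough for SOME
            }
        }
        // No differing value found: UNKNOWN if any comparison was UNKNOWN, else FALSE.
        return sawUnknown ? null : Boolean.FALSE;
    }

    public static void main(String[] args) {
        System.out.println(notEqualsSome(5, null, List.of()));               // prints "false"
        System.out.println(notEqualsSome(5, true, List.of(5, 7)));           // prints "true"
        System.out.println(notEqualsSome(5, true, List.of(5)));              // prints "false"
        System.out.println(notEqualsSome(5, true, Arrays.asList(5, null)));  // prints "null"
    }
}
```

The point of the flag is visible in the first branch: without it, an empty subquery and a subquery whose comparisons are all UNKNOWN would be indistinguishable after the join.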


🤖 built with assistance from AI

mbwhite added 2 commits June 26, 2026 14:27
Signed-off-by: matthew brian white <whitemat@uk.ibm.com>
Signed-off-by: matthew brian white <whitemat@uk.ibm.com>
@nielspardon nielspardon changed the title fix: calcite optimization adds LITERAL_AGG(true) fix: calcite optimization adds LITERAL_AGG(true) Jun 29, 2026
@nielspardon

Member

for some reason your PR header had a trailing whitespace in it

mbwhite commented Jun 29, 2026

Contributor Author

thanks @nielspardon
