Support null aware hash mark-joins #21585
@AdamGS are you planning to continue on this soon?
Yes! I have some more changes locally that I haven't pushed. Codex also came up with a test case, and I'm trying to figure out whether it should be covered by this change or is a separate correctness corner case that DataFusion currently misses.
LMK when you are ready, I can allocate some time to review |
pushed some stuff I have locally, mostly to show a regression test that I think is relevant. |
@Dandandan got distracted by a lot of other stuff, but I think it's getting pretty close. I'm going to spend more time this weekend trying to review it and think through the change with the paper.
This is still a draft. I'm putting it up because others might want to weigh in, and I find it useful to be able to see the diff clearly.
Which issue does this PR close?
Rationale for this change
This change is about correctness/sql completeness, but is also a step towards better subquery de-correlation.
Before this change, mark joins only produced boolean true/false results, so queries such as (id NOT IN (...)) IS NULL could return incorrect results, especially for correlated scalar subqueries.
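To make the failure mode concrete, here is a minimal sketch of standard SQL three-valued NOT IN semantics next to a two-valued mark. It is not code from this PR: the function names `not_in` and `not_in_two_valued` are hypothetical, `None` stands in for SQL NULL, and a plain slice stands in for the subquery result.

```rust
/// SQL three-valued `x NOT IN (set)`. `None` models SQL NULL.
/// Hypothetical helper for illustration only, not a DataFusion API.
fn not_in(x: Option<i64>, set: &[Option<i64>]) -> Option<bool> {
    // Empty subquery result: NOT IN is TRUE, even for a NULL probe value.
    if set.is_empty() {
        return Some(true);
    }
    // NULL probe value against a non-empty set: the result is unknown.
    let x = x?;
    // A definite match makes NOT IN FALSE.
    if set.iter().any(|v| *v == Some(x)) {
        return Some(false);
    }
    // No match, but a NULL in the set might have matched: unknown.
    if set.iter().any(|v| v.is_none()) {
        return None;
    }
    Some(true)
}

/// A two-valued mark only records whether a match was found,
/// so every NULL outcome collapses to "no match".
fn not_in_two_valued(x: Option<i64>, set: &[Option<i64>]) -> bool {
    !matches!(x, Some(v) if set.contains(&Some(v)))
}
```

For a row with id = 2 and a subquery result of {1, NULL}, `not_in` yields `None`, so (id NOT IN (...)) IS NULL should be true. The two-valued version yields `true` for the same input, so an IS NULL check on top of it can never succeed.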
What changes are included in this PR?
NOT IN mark joins use null-aware semantics when the predicate can be represented as hash join keys.
Are these changes tested?
Are there any user-facing changes?
This PR changes planning behavior and introduces more public API around hash joins; I'll finalize this section as it gets closer to a reviewable state. It also introduces one minor public function: build_join_schema_with_null_aware.
AI Usage
AI was used in the process of developing this PR, mostly around testing and planning