[Compiler Refactor 6] Refactor Amber Workflow to use the new PhysicalPlan implementation #1807

Merged: zuozhiw merged 17 commits into master from zuozhi-refactor-workflow on Feb 2, 2023

@zuozhiw (Contributor) commented Jan 24, 2023

This PR makes the following changes:

  1. Refactors the `Workflow` class to use the new `PhysicalPlan` class. After this change, all helper functions for traversing the physical plan DAG live in `PhysicalPlan`.
  2. In all classes that deal with the actual physical plan, uses `LayerID` (the physical operator ID) instead of `OperatorID` (the logical operator ID) to identify an operator. An `OperatorID` consists of `(workflowID, operatorID)`, while a `LayerID` consists of `(workflowID, operatorID, layerID)`. One logical operator can correspond to multiple physical operators (for example, aggregate and visualization).
  3. Changes from [Compiler Refactor 7] Migrate simple operators to use the new OpExecConfig class #1794 and [Compiler Refactor 8] Migrate all operators to use the new OpExecConfig class #1817 are also merged into master through this branch, because those two PRs were previously merged into this base branch. In those two PRs, all operators were updated to adopt the new `OpExecConfig` API.
  4. Now that all operators have adopted the new `OpExecConfig` API, the old code is cleaned up. Specifically, the many old `XxxOpExecConfig` classes are no longer needed and are unified into the new `OpExecConfig` API.
  5. Refactors the `WorkflowPipelinedBuilder` class to use the new `PhysicalPlan` API. Separates the logic of deciding a region from the logic of adding a materialization operator by introducing a new `MaterializationRewriter` class.
  6. Splits the compilation phase into a logical-plan building phase and a physical-plan building phase. The physical-plan building phase adds a new `PartitionEnforcer` that decides the shuffle policy of each link. The old ad-hoc way of deciding shuffle policies (`DeploymentFilter`) is removed.
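
To make items 1 and 2 concrete, here is a minimal Python sketch of the two ID shapes and a plan that owns its own traversal helpers. It is an illustration only, not the code in this PR: the field names, the `layers_of` and `topological_order` helpers, and the `partial`/`final` layer names are all assumptions. Only the tuple shapes `(workflowID, operatorID)` and `(workflowID, operatorID, layerID)` come from the description above.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter  # standard library, Python 3.9+


@dataclass(frozen=True)
class OperatorID:
    """Logical operator identity: (workflowID, operatorID)."""
    workflow_id: str
    operator_id: str


@dataclass(frozen=True)
class LayerID:
    """Physical operator identity: (workflowID, operatorID, layerID)."""
    workflow_id: str
    operator_id: str
    layer_id: str

    def logical(self) -> OperatorID:
        # Drop the layer component to recover the owning logical operator.
        return OperatorID(self.workflow_id, self.operator_id)


class PhysicalPlan:
    """Toy DAG keyed by LayerID; traversal helpers live on the plan itself."""

    def __init__(self, links):
        # Each link is an (upstream LayerID, downstream LayerID) pair.
        self.links = list(links)

    def layers_of(self, op: OperatorID):
        # All physical layers that belong to one logical operator, in link order.
        found = []
        for link in self.links:
            for layer in link:
                if layer.logical() == op and layer not in found:
                    found.append(layer)
        return found

    def topological_order(self):
        # graphlib expects each node together with its predecessors.
        sorter = TopologicalSorter()
        for upstream, downstream in self.links:
            sorter.add(downstream, upstream)
        return list(sorter.static_order())


# Hypothetical workflow: scan -> aggregate (two layers) -> sink.
scan = LayerID("wf1", "scan", "main")
agg_partial = LayerID("wf1", "agg", "partial")
agg_final = LayerID("wf1", "agg", "final")
sink = LayerID("wf1", "sink", "main")

plan = PhysicalPlan([(scan, agg_partial), (agg_partial, agg_final), (agg_final, sink)])
agg_layers = plan.layers_of(OperatorID("wf1", "agg"))
order = plan.topological_order()
```

In this sketch the two aggregate layers share one `OperatorID` but are distinct `LayerID`s, which is why code that works on the physical plan needs the three-part key.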

@zuozhiw zuozhiw requested a review from shengquan-ni January 24, 2023 23:44

@shengquan-ni (Contributor) left a comment

Left some comments.

@zuozhiw zuozhiw requested a review from shengquan-ni January 25, 2023 23:22
Xiao-zhen-Liu pushed a commit that referenced this pull request Jan 27, 2023
…1812)

This PR fixes an issue introduced in #1793. In some scenarios, the
`inputPortMapping` is not correctly passed into the data processor due
to an initialization-order issue with a lazily evaluated variable. This
causes the join operator to produce no results because it wrongly
thinks all the input data comes from the build side.

This PR applies a temporary hot-fix; the issue will be
completely solved once #1807 is merged into master.
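
The failure mode described in this commit message can be reproduced in miniature. The Python sketch below is an analogy only, not the engine's code: `Worker`, `DataProcessor`, `port_of`, `BUILD_PORT`, and the link names are hypothetical, and the fallback to the build port is an assumption made to mirror the symptom. Only the name `inputPortMapping` comes from the message. The lazily evaluated value is modelled with `functools.cached_property`.

```python
from functools import cached_property

BUILD_PORT = 0  # hypothetical: port 0 feeds the join's build side


class DataProcessor:
    def __init__(self, input_port_mapping):
        # Snapshot the mapping; later changes on the worker are not seen here.
        self.input_port_mapping = dict(input_port_mapping)

    def port_of(self, link):
        # Assumed fallback: unknown links are treated as build-side input.
        return self.input_port_mapping.get(link, BUILD_PORT)


class Worker:
    def __init__(self):
        self.input_port_mapping = {}

    @cached_property
    def processor(self):
        # Evaluated once, on first access, with whatever mapping exists then.
        return DataProcessor(self.input_port_mapping)


def run(touch_processor_early):
    worker = Worker()
    if touch_processor_early:
        _ = worker.processor  # forces the lazy value before the mapping is set
    worker.input_port_mapping = {"build_link": 0, "probe_link": 1}
    return worker.processor.port_of("probe_link")
```

With `run(True)` the processor is created from the empty mapping, so the probe link resolves to the build port. With `run(False)` the mapping is set first and the probe link resolves to port 1.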
@zuozhiw zuozhiw changed the title Refactor Amber Workflow to use the new PhysicalPlan implementation [Compiler Refactor 6] Refactor Amber Workflow to use the new PhysicalPlan implementation Jan 30, 2023
@zuozhiw zuozhiw added the engine, refactor (Refactor the code), and java labels Jan 30, 2023
…onfig class (#1794)

This PR is a follow-up to #1791, which introduced a new `OpExecConfig`
class. This PR updates the simple operators, which are mostly
one-to-one, to use the new class. Most changes are one-line changes that
map directly from the old API to the new API. Changes to more
complicated operators (join, aggregate, etc.) will be completed in
subsequent PRs.
@zuozhiw zuozhiw merged commit bad5e29 into master Feb 2, 2023
@zuozhiw zuozhiw deleted the zuozhi-refactor-workflow branch February 2, 2023 20:12
yangzhang75 pushed a commit to yangzhang75/texera that referenced this pull request Jun 22, 2026
…pache#1812)

yangzhang75 pushed a commit to yangzhang75/texera that referenced this pull request Jun 22, 2026
…Plan implementation (apache#1807)

Labels

engine, refactor (Refactor the code)

3 participants