
[Compiler Refactor 2] Change operator executor interface to use port index#1793

Merged
zuozhiw merged 2 commits into
masterfrom
zuozhi-op-interface
Jan 17, 2023

zuozhiw (Contributor) commented Jan 17, 2023

In this PR, we changed the operator executor interface to use input/output port indices instead of links to other operators.

Original interface:
```
  def processTuple(
      tuple: Either[ITuple, InputExhausted],
      input: LinkIdentity,
      pauseManager: PauseManager,
      asyncRPCClient: AsyncRPCClient
  ): Iterator[(ITuple, Option[LinkIdentity])]

// LinkIdentity contains fromOperatorID and toOperatorID
```

New interface:
```
  def processTuple(
      tuple: Either[ITuple, InputExhausted],
      input: Int,
      pauseManager: PauseManager,
      asyncRPCClient: AsyncRPCClient
  ): Iterator[(ITuple, Option[Int])]
```
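As an illustration only, the sketch below implements a simplified version of the new signature for a split-like operator that routes each tuple to an output port by index. `SimpleTuple`, `InputExhausted`, `PortIndexedExecutor`, and `SplitExec` are hypothetical stand-ins (not Texera's real classes), and the `pauseManager`/`asyncRPCClient` parameters are omitted. The operator only picks a port number; it never needs to know which downstream operator is attached to that port.

```scala
// Hypothetical, simplified stand-ins; not Texera's real ITuple / InputExhausted.
final case class SimpleTuple(fields: Map[String, Any]) {
  def get(name: String): Any = fields(name)
}
final case class InputExhausted()

// Simplified executor trait with the port-index signature.
// PauseManager and AsyncRPCClient parameters are omitted in this sketch.
trait PortIndexedExecutor {
  def processTuple(
      tuple: Either[SimpleTuple, InputExhausted],
      input: Int
  ): Iterator[(SimpleTuple, Option[Int])]
}

// Hypothetical split operator: tuples matching the predicate go to
// output port 0, all other tuples go to output port 1.
class SplitExec(predicate: SimpleTuple => Boolean) extends PortIndexedExecutor {
  override def processTuple(
      tuple: Either[SimpleTuple, InputExhausted],
      input: Int
  ): Iterator[(SimpleTuple, Option[Int])] =
    tuple match {
      case Left(t) =>
        val port = if (predicate(t)) 0 else 1
        Iterator.single((t, Some(port)))
      case Right(_) =>
        // Stateless operator: nothing to flush when an input is exhausted.
        Iterator.empty
    }
}
```

Because the output is an `Option[Int]`, wiring port 0 or port 1 to a different downstream operator requires no change to `SplitExec` itself.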

The motivation for this change is to make each operator more independent: an operator should be able to run on its own without caring which operators it is connected to.

Before this change, an operator was aware of its specific upstream and downstream operators, which violates this independence principle.

After the change, an operator only cares which input port a tuple arrives on (for multi-input operators such as join) and which output port it sends a tuple to (for multi-output operators such as split).
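A minimal sketch of the input side, assuming a hash join whose build input is on port 0 and probe input is on port 1, and assuming the build input is fully consumed before any probe tuples arrive. `SimpleTuple`, `InputExhausted`, and `HashJoinSketch` are hypothetical names, and returning `None` as the output port is this sketch's convention for "no specific port", not necessarily Texera's. The join branches on the integer `input` alone, so it is unaffected by which upstream operators feed it.

```scala
import scala.collection.mutable

// Hypothetical, simplified stand-ins; not Texera's real ITuple / InputExhausted.
final case class SimpleTuple(fields: Map[String, Any]) {
  def get(name: String): Any = fields(name)
}
final case class InputExhausted()

// Hypothetical hash join keyed on a single field.
// Assumption: port 0 = build side, port 1 = probe side.
class HashJoinSketch(key: String) {
  private val buildTable =
    mutable.HashMap.empty[Any, mutable.ArrayBuffer[SimpleTuple]]

  def processTuple(
      tuple: Either[SimpleTuple, InputExhausted],
      input: Int
  ): Iterator[(SimpleTuple, Option[Int])] =
    (tuple, input) match {
      case (Left(t), 0) =>
        // Build side: store the tuple under its key; emit nothing yet.
        buildTable.getOrElseUpdate(t.get(key), mutable.ArrayBuffer.empty) += t
        Iterator.empty
      case (Left(t), 1) =>
        // Probe side: emit one merged tuple per matching build tuple.
        val matches =
          buildTable.getOrElse(t.get(key), mutable.ArrayBuffer.empty[SimpleTuple])
        matches.toList.iterator.map(b =>
          (SimpleTuple(b.fields ++ t.fields), Option.empty[Int]))
      case (Left(_), other) =>
        throw new IllegalArgumentException(s"unexpected input port $other")
      case (Right(_), _) =>
        // End of an input: this sketch has nothing left to flush.
        Iterator.empty
    }
}
```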

zuozhiw force-pushed the zuozhi-op-interface branch from 428e911 to f3c77e9 on January 17, 2023 20:28

shengquan-ni (Contributor) left a comment

LGTM

zuozhiw merged commit a7e2ebc into master Jan 17, 2023
zuozhiw deleted the zuozhi-op-interface branch January 17, 2023 22:01
zuozhiw changed the title from "Change operator executor interface to use port index" to "[Compiler Refactor #2] Change operator executor interface to use port index" Jan 18, 2023
zuozhiw changed the title from "[Compiler Refactor #2] Change operator executor interface to use port index" to "[Compiler Refactor 2] Change operator executor interface to use port index" Jan 19, 2023
Xiao-zhen-Liu pushed a commit that referenced this pull request Jan 27, 2023
…1812)

This PR fixes an issue introduced in #1793. In some scenarios, the
`inputPortMapping` is not correctly passed into the data processor due
to an initialization-order issue with a lazily evaluated variable. As a
result, the join operator produces no results because it wrongly treats
all input data as coming from the build side.

This PR is a temporary hotfix; the issue will be fully resolved once
#1807 is merged into master.
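The commit message does not show the fix itself. As a hedged illustration of the failure mode, suppose the engine translates the link a tuple arrived on into an input port index through a `Map` before calling `processTuple`. `LinkId`, `InputPortResolver`, and `LenientResolver` are hypothetical names, and treating port 0 as the join's build side is an assumption. If the mapping is still empty when it is read and the lookup silently defaults to port 0, every tuple appears to come from the build side, so a hash join never probes and emits nothing.

```scala
// Hypothetical link identifier; not Texera's real LinkIdentity.
final case class LinkId(fromOperatorId: String, toOperatorId: String)

// Engine-side lookup: translate the incoming link into the operator's
// input port index before calling processTuple.
final class InputPortResolver(inputPortMapping: Map[LinkId, Int]) {
  // Fails loudly if the mapping has no entry for this link, instead of
  // silently falling back to a default port.
  def portFor(link: LinkId): Int =
    inputPortMapping.getOrElse(
      link,
      throw new IllegalStateException(s"no input port registered for $link")
    )
}

// Lenient variant that defaults to port 0: if the mapping was never
// populated, every tuple is reported as arriving on port 0.
final class LenientResolver(inputPortMapping: Map[LinkId, Int]) {
  def portFor(link: LinkId): Int = inputPortMapping.getOrElse(link, 0)
}
```

Failing fast on a missing entry, as `InputPortResolver` does, turns an initialization-order bug of this kind into an immediate error rather than silently empty join output.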
yangzhang75 pushed a commit to yangzhang75/texera that referenced this pull request Jun 22, 2026
yangzhang75 pushed a commit to yangzhang75/texera that referenced this pull request Jun 22, 2026
…pache#1812)

This PR fixes an issue introduced in apache#1793 . In some scenarios, the
`inputPortMapping` is not correctly passed into the data processor due
to an initilaization order issue of a lazily evaluated variable. This
causes the join operator to not produce results because it wrongly
thinks all the input data are from the build side.

This PR does a temporary hot-fix of this issue, this issue will be
completely solved once apache#1807 is merged into master.

Labels

engine, refactor (Refactor the code)


3 participants