feat: Make Python Virtual Environment Persistent: Add Environments to Left Panel #5577

Merged
kunwp1 merged 35 commits into apache:main from SarahAsad23:add-pve-panel on Jun 12, 2026

Conversation

@SarahAsad23 SarahAsad23 commented Jun 9, 2026

Contributor

What changes were proposed in this PR?

This PR introduces persistent Python Virtual Environments (PVEs) by moving them out of the Computing Unit (CU) lifecycle and storing them in the database.

Previously, PVEs were managed through Computing Units and existed only within the CU they were created in. As a result, PVEs were lost when the corresponding CU was terminated. This PR adds a new virtual_environments table to persist PVE configurations and introduces a dedicated dashboard interface for managing them.

Users can now create, view, update, and delete their own Python virtual environments through a new "Environments" page in the dashboard sidebar. PVE definitions are stored as user-owned resources in the database and can be managed independently of Computing Units.

Screenshot 2026-06-08 at 6 39 55 PM Screenshot 2026-06-08 at 6 40 19 PM

Note: This PR only introduces persistence for PVE metadata and configuration. Creating, updating, and deleting a PVE in this PR only affects the corresponding database records. The execution-time behavior of materializing and using these virtual environments inside a Computing Unit is not part of this change and will be introduced in a future PR.
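
As a rough illustration of the metadata-only CRUD described above, here is a minimal sketch in which an in-memory map stands in for the `virtual_environments` table. The names `VirtualEnvironment`, `InMemoryPveStore`, and every field and method below are hypothetical and are not taken from the PR; the point is only that each operation touches a stored record scoped to its owner and nothing else.

```scala
// Hypothetical record shape; the real table columns are not shown in this PR page.
final case class VirtualEnvironment(id: Int, ownerUid: Int, name: String, packages: List[String])

// In-memory stand-in for the virtual_environments table (illustration only).
final class InMemoryPveStore {
  private var rows = Map.empty[Int, VirtualEnvironment]
  private var nextId = 1

  // Create a record owned by ownerUid; no environment is materialized anywhere.
  def create(ownerUid: Int, name: String, packages: List[String]): VirtualEnvironment = {
    val env = VirtualEnvironment(nextId, ownerUid, name, packages)
    rows += (env.id -> env)
    nextId += 1
    env
  }

  // List only the records that belong to the requesting user.
  def list(ownerUid: Int): List[VirtualEnvironment] =
    rows.values.filter(_.ownerUid == ownerUid).toList.sortBy(_.id)

  // Update succeeds only for the owner; otherwise None is returned.
  def update(ownerUid: Int, id: Int, packages: List[String]): Option[VirtualEnvironment] =
    rows.get(id).filter(_.ownerUid == ownerUid).map { env =>
      val updated = env.copy(packages = packages)
      rows += (id -> updated)
      updated
    }

  // Delete succeeds only for the owner and reports whether a record was removed.
  def delete(ownerUid: Int, id: Int): Boolean =
    rows.get(id).exists(_.ownerUid == ownerUid) && { rows -= id; true }
}
```

In this sketch a second user can neither see nor modify another user's record, which mirrors the "user-owned resources" wording in the description; how the actual resource enforces ownership is not shown here.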

K8s configurations for this feature will be added in a future PR.

Any related issues, documentation, discussions?

Related discussions and issues: #5360, #5361.

How was this PR tested?

Tested manually; unit tests were added to PveResourceSpec.

Was this PR authored or co-authored using generative AI tooling?

Co-authored using: Claude Code

@github-actions github-actions Bot added the engine, ddl-change (Changes to the TexeraDB DDL), frontend (Changes related to the frontend GUI), and platform (Non-amber Scala service paths) labels Jun 9, 2026
@SarahAsad23 SarahAsad23 marked this pull request as draft June 9, 2026 00:12
@codecov-commenter

codecov-commenter commented Jun 9, 2026


Codecov Report

❌ Patch coverage is 81.01266% with 15 lines in your changes missing coverage. Please review.
✅ Project coverage is 52.94%. Comparing base (227cbd7) to head (0d46828).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...esource/pythonvirtualenvironment/PveResource.scala | 66.66% | 11 Missing and 2 partials ⚠️ |
| ...resource/pythonvirtualenvironment/PveManager.scala | 93.10% | 1 Missing and 1 partial ⚠️ |

Additional details and impacted files
@@             Coverage Diff              @@
##               main    #5577      +/-   ##
============================================
+ Coverage     52.86%   52.94%   +0.07%     
- Complexity     2510     2518       +8     
============================================
  Files          1075     1075              
  Lines         41665    41741      +76     
  Branches       4495     4513      +18     
============================================
+ Hits          22027    22100      +73     
+ Misses        18338    18332       -6     
- Partials       1300     1309       +9     
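
The percentages in this report can be reproduced from the line counts. The sketch below is a sanity check under two assumptions: that coverage is hits divided by total lines, and that the patch contains 79 lines (a figure inferred from "81.01266% with 15 lines missing", not stated in the report). `CoverageCheck` is a hypothetical helper name.

```scala
object CoverageCheck {
  // Coverage as the percentage of lines that were hit.
  def percent(hits: Int, lines: Int): Double = hits.toDouble / lines * 100.0

  // In the diff above, total lines equal hits + misses + partials.
  def totalLines(hits: Int, misses: Int, partials: Int): Int = hits + misses + partials
}
```

With the head numbers, `CoverageCheck.totalLines(22100, 18332, 1309)` gives 41741, and `CoverageCheck.percent(22100, 41741)` is about 52.945, which the report shows as 52.94%. Under the 79-line assumption, `CoverageCheck.percent(79 - 15, 79)` is about 81.0127.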
| Flag | Coverage Δ | *Carryforward flag |
|---|---|---|
| access-control-service | 71.42% <100.00%> (+6.81%) ⬆️ | |
| agent-service | 34.36% <ø> (ø) | Carriedforward from 227cbd7 |
| amber | 53.71% <77.94%> (+0.08%) ⬆️ | |
| computing-unit-managing-service | 1.65% <ø> (ø) | |
| config-service | 56.71% <ø> (ø) | |
| file-service | 57.06% <ø> (ø) | |
| frontend | 47.38% <100.00%> (+0.03%) ⬆️ | |
| pyamber | 90.72% <ø> (ø) | Carriedforward from 227cbd7 |
| python | 90.75% <ø> (ø) | Carriedforward from 227cbd7 |
| workflow-compiling-service | 58.69% <ø> (ø) | |

*This pull request uses carry forward flags. Click here to find out more.


@SarahAsad23 SarahAsad23 marked this pull request as ready for review June 9, 2026 02:17

@kunwp1 kunwp1 left a comment

Contributor

The feature looks good to me! I left a few comments. One main decision is the naming of this module: do we still want to use Pve, or a different name such as "Environment" or "Virtual Environment"? This needs a discussion.

Comment thread sql/updates/24.sql
Comment thread sql/texera_ddl.sql Outdated
@SarahAsad23 SarahAsad23 requested a review from kunwp1 June 10, 2026 21:13

@kunwp1 kunwp1 left a comment

Contributor

  1. Update the PR description so that it is up to date. I still see the old table names.
  2. Add more test cases and try to bring the reported coverage close to 100%. In our Apache monthly meeting we set a goal of reaching roughly 90% coverage, and your PR is currently lowering the coverage.
  3. Update changelog.xml, because you are changing the DDL.
  4. Please state explicitly in the PR that you are only persisting the metadata of the virtual environments, so that users are not confused by the fact that their CRUD operations on a venv do not drive anything yet.

Comment thread frontend/src/app/dashboard/component/user/user-venv/user-venv.component.html Outdated
Comment thread frontend/src/app/dashboard/component/user/user-venv/user-venv.component.html Outdated
@SarahAsad23 SarahAsad23 requested a review from kunwp1 June 12, 2026 07:22
@github-actions

github-actions Bot commented Jun 12, 2026

Contributor

✅ No material benchmark regressions detected

🟢 14 better · 🔴 0 worse · ⚪ 1 noise (<±5%) · 0 without baseline

CI benchmark results are noisy; treat <±5% as noise unless repeated.

Dashboard · Run

| config | throughput (tuples/sec) | MB/s | latency (p50/p95/p99) | max Δ latest / 7d |
|---|---|---|---|---|
| 🟢 bs=10 sw=10 sl=64 | 461 | 0.282 | 21,923/26,787/26,787 us | 🟢 -21.7% / 🟢 -25.4% |
| 🟢 bs=100 sw=10 sl=64 | 1,193 | 0.728 | 83,504/106,451/106,451 us | 🟢 -33.6% / 🟢 +34.6% |
| 🟢 bs=1000 sw=10 sl=64 | 1,408 | 0.859 | 705,432/752,675/752,675 us | 🟢 +30.7% / 🟢 +36.9% |

Baseline details

Latest main 227cbd7 from 2026-06-12T19:07:43.071Z

| config | metric | PR | latest main | 7d avg | Δ latest | Δ 7d |
|---|---|---|---|---|---|---|
| bs=10 sw=10 sl=64 | throughput | 461 tuples/sec | 422.14 tuples/sec | 398.78 tuples/sec | +9.2% | +15.6% |
| bs=10 sw=10 sl=64 | MB/s | 0.282 MB/s | 0.258 MB/s | 0.243 MB/s | +9.4% | +15.9% |
| bs=10 sw=10 sl=64 | p50 | 21,923 us | 22,135 us | 24,396 us | -1.0% | -10.1% |
| bs=10 sw=10 sl=64 | p95 | 26,787 us | 34,198 us | 35,892 us | -21.7% | -25.4% |
| bs=10 sw=10 sl=64 | p99 | 26,787 us | 34,198 us | 35,892 us | -21.7% | -25.4% |
| bs=100 sw=10 sl=64 | throughput | 1,193 tuples/sec | 894.49 tuples/sec | 886.25 tuples/sec | +33.4% | +34.6% |
| bs=100 sw=10 sl=64 | MB/s | 0.728 MB/s | 0.546 MB/s | 0.541 MB/s | +33.3% | +34.6% |
| bs=100 sw=10 sl=64 | p50 | 83,504 us | 107,334 us | 112,613 us | -22.2% | -25.8% |
| bs=100 sw=10 sl=64 | p95 | 106,451 us | 160,327 us | 135,197 us | -33.6% | -21.3% |
| bs=100 sw=10 sl=64 | p99 | 106,451 us | 160,327 us | 135,197 us | -33.6% | -21.3% |
| bs=1000 sw=10 sl=64 | throughput | 1,408 tuples/sec | 1,077 tuples/sec | 1,029 tuples/sec | +30.7% | +36.9% |
| bs=1000 sw=10 sl=64 | MB/s | 0.859 MB/s | 0.657 MB/s | 0.628 MB/s | +30.7% | +36.8% |
| bs=1000 sw=10 sl=64 | p50 | 705,432 us | 928,059 us | 976,698 us | -24.0% | -27.8% |
| bs=1000 sw=10 sl=64 | p95 | 752,675 us | 950,455 us | 1,032,282 us | -20.8% | -27.1% |
| bs=1000 sw=10 sl=64 | p99 | 752,675 us | 950,455 us | 1,032,282 us | -20.8% | -27.1% |

Raw CSV
config_idx,batch_size,schema_width,string_len,num_batches,total_ms,total_tuples,total_bytes,tuples_per_sec,mb_per_sec,lat_p50_us,lat_p95_us,lat_p99_us
0,10,10,64,20,433.63,200,128000,461,0.282,21923.44,26787.11,26787.11
1,100,10,64,20,1676.25,2000,1280000,1193,0.728,83503.96,106451.24,106451.24
2,1000,10,64,20,14205.06,20000,12800000,1408,0.859,705432.35,752674.96,752674.96
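
The derived columns in the raw CSV can be recomputed from `total_ms`, `total_tuples`, and `total_bytes`. The sketch below is a cross-check, not the benchmark's own code: `BenchmarkCheck` and its methods are hypothetical names, and the 1024 * 1024 divisor is an assumption that happens to reproduce the reported `mb_per_sec` values.

```scala
object BenchmarkCheck {
  final case class Row(totalMs: Double, totalTuples: Long, totalBytes: Long)

  // Parse one data line of the raw CSV; columns 5, 6, 7 are
  // total_ms, total_tuples, total_bytes according to the header.
  def parse(line: String): Row = {
    val c = line.split(',')
    Row(c(5).toDouble, c(6).toLong, c(7).toLong)
  }

  // Throughput in tuples per second.
  def tuplesPerSec(r: Row): Double = r.totalTuples / (r.totalMs / 1000.0)

  // Throughput in MB/s, assuming 1 MB = 1024 * 1024 bytes.
  def mbPerSec(r: Row): Double = r.totalBytes / (r.totalMs / 1000.0) / (1024.0 * 1024.0)

  // Relative change of the PR value against a baseline, in percent.
  def deltaPercent(pr: Double, baseline: Double): Double = (pr - baseline) / baseline * 100.0
}
```

For the first row, `tuplesPerSec` gives roughly 461.2 and `mbPerSec` roughly 0.2815, in line with the reported 461 and 0.282. `deltaPercent(461, 422.14)` is about 9.2, matching the "Δ latest" throughput entry for bs=10.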

Comment thread sql/changelog.xml Outdated

@kunwp1 kunwp1 left a comment

Contributor

LGTM!

@kunwp1 kunwp1 added this pull request to the merge queue Jun 12, 2026
Merged via the queue into apache:main with commit fa5fcbb Jun 12, 2026
22 checks passed
@SarahAsad23 SarahAsad23 deleted the add-pve-panel branch June 12, 2026 21:16
aglinxinyuan added a commit that referenced this pull request Jun 13, 2026
…tion

Addresses PR #4206 review feedback
(#4206 (comment)): the
PhysicalOp flag was named after the operator (isLoopEnd) rather than the
behavior the scheduler actually checks -- 'keep this operator's output
storage across a region re-run.' (This rename was made earlier but lost
when the branch was rebased.)

* PhysicalOp: isLoopEnd -> reusesOutputStorageOnReExecution (+ doc
  comment); withIsLoopEnd -> withReusesOutputStorageOnReExecution.
* RegionExecutionCoordinator: the skip-recreate guard checks
  reusesOutputStorageOnReExecution; local val + comment reworded to the
  behavior.
* LoopEndOpDesc sets .withReusesOutputStorageOnReExecution(true).
* Loop specs + mixin updated to the new name.

Now any future operator that must preserve its output across a region
re-execution can set the flag without a LoopEnd-specific misnomer.
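
A minimal sketch of the renamed flag, assuming a heavily reduced stand-in for PhysicalOp (the real class has many more fields, and `PhysicalOpSketch` and `loopEnd` are hypothetical names):

```scala
// Reduced stand-in for PhysicalOp; only the renamed flag and its builder are modeled.
final case class PhysicalOpSketch(
    id: String,
    // Behavior-oriented name: keep this operator's output storage across a region re-run.
    reusesOutputStorageOnReExecution: Boolean = false
) {
  def withReusesOutputStorageOnReExecution(flag: Boolean): PhysicalOpSketch =
    copy(reusesOutputStorageOnReExecution = flag)
}

object PhysicalOpSketch {
  // Hypothetical factory mirroring how LoopEndOpDesc opts in to the behavior.
  def loopEnd(id: String): PhysicalOpSketch =
    PhysicalOpSketch(id).withReusesOutputStorageOnReExecution(true)
}
```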

Verified: WorkflowCore + WorkflowOperator compile and all 29 LoopStart/
LoopEnd op-desc specs pass. (The amber module's local compile is blocked
by an unrelated pre-existing JOOQ issue -- PveManager references the
virtual_environments table from #5577, absent from the un-migrated local
DB; CI compiles it against a fresh schema.)
aglinxinyuan added a commit that referenced this pull request Jun 13, 2026
…nCoordinator

Addresses the remaining half of PR #4206 review feedback
(#4206 (comment)): the
skip-create branch (reuse an existing output document on region re-run
instead of clobbering it) was untested. (The rename half landed in
3d4f15b.)

Extract the create-or-reuse decision out of the private
createOutputPortStorageObjects into a pure companion method
RegionExecutionCoordinator.provisionOutputDocument(uri,
reuseExistingStorage, documentExists, createDocument) with the storage
operations injected, so the four cases can be unit-tested without an
iceberg backend or a live region.

New RegionOutputProvisioningSpec pins:
* reuse + existing document  -> NOT recreated (createDocument not called),
  so accumulated loop output survives the re-run;
* reuse + no document yet     -> created (first iteration);
* no-reuse + existing         -> recreated/overwritten (fresh per run);
* no-reuse + none             -> created;
* no-reuse short-circuits documentExists (always recreate, never probe).
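
The cases above can be sketched as follows. The parameter names come from this commit message; the parameter types, the Boolean return value, and the enclosing object name `RegionOutputProvisioningSketch` are assumptions made for illustration and may differ from the real method.

```scala
import java.net.URI

object RegionOutputProvisioningSketch {
  // Returns true when a document was created or recreated,
  // false when an existing document was kept (return type is an assumption).
  def provisionOutputDocument(
      uri: URI,
      reuseExistingStorage: Boolean,
      documentExists: URI => Boolean,
      createDocument: URI => Unit
  ): Boolean =
    // && short-circuits, so documentExists is never probed when reuse is off.
    if (reuseExistingStorage && documentExists(uri)) false
    else { createDocument(uri); true }
}
```

Because both storage operations are passed in as functions, a test can supply counters in their place and check each case without a storage backend.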

Verified the production change compiles (the only remaining amber
compile errors are the pre-existing PveManager/virtual_environments JOOQ
issue from #5577, unrelated to this change and resolved by a migrated DB
in CI); the new spec is a pure ScalaTest unit with no iceberg/actor
dependency.
yangzhang75 pushed a commit to yangzhang75/texera that referenced this pull request Jun 22, 2026
… Left Panel (apache#5577)


3 participants