feat: Make Python Virtual Environment Persistent: Add Environments to Left Panel #5577
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #5577 +/- ##
============================================
+ Coverage 52.86% 52.94% +0.07%
- Complexity 2510 2518 +8
============================================
Files 1075 1075
Lines 41665 41741 +76
Branches 4495 4513 +18
============================================
+ Hits 22027 22100 +73
+ Misses 18338 18332 -6
- Partials 1300 1309 +9
*This pull request uses carry forward flags.
kunwp1
left a comment
The feature looks good to me! I left a few comments. One main decision is the naming of this module. Do we still want to use Pve or use a different name like "Environment" or "Virtual Environment"? Need a discussion.
…into add-pve-panel
kunwp1
left a comment
- Update the PR description so it is up to date. I still see old table names.
- Add more test cases and try to bring the coverage report close to 100%. In our Apache monthly meeting, we set a goal of reaching nearly 90% coverage, and your PR is currently lowering the coverage.
- Update changelog.xml because you're changing the DDL.
- Please explicitly mention in the PR that you're only persisting the metadata of the virtual environments, so that users are not confused when their CRUD operations on a venv don't drive anything.
…into add-pve-panel
✅ No material benchmark regressions detected
🟢 14 better · 🔴 0 worse · ⚪ 1 noise (<±5%) · 0 without baseline
Baseline details
Latest main
Raw CSV
config_idx,batch_size,schema_width,string_len,num_batches,total_ms,total_tuples,total_bytes,tuples_per_sec,mb_per_sec,lat_p50_us,lat_p95_us,lat_p99_us
0,10,10,64,20,433.63,200,128000,461,0.282,21923.44,26787.11,26787.11
1,100,10,64,20,1676.25,2000,1280000,1193,0.728,83503.96,106451.24,106451.24
2,1000,10,64,20,14205.06,20000,12800000,1408,0.859,705432.35,752674.96,752674.96
…tion
Addresses PR #4206 review feedback (#4206 (comment)): the PhysicalOp flag was named after the operator (isLoopEnd) rather than the behavior the scheduler actually checks -- 'keep this operator's output storage across a region re-run.' (This rename was made earlier but lost when the branch was rebased.)
* PhysicalOp: isLoopEnd -> reusesOutputStorageOnReExecution (+ doc comment); withIsLoopEnd -> withReusesOutputStorageOnReExecution.
* RegionExecutionCoordinator: the skip-recreate guard checks reusesOutputStorageOnReExecution; local val + comment reworded to the behavior.
* LoopEndOpDesc sets .withReusesOutputStorageOnReExecution(true).
* Loop specs + mixin updated to the new name.
Now any future operator that must preserve its output across a region re-execution can set the flag without a LoopEnd-specific misnomer.
Verified: WorkflowCore + WorkflowOperator compile and all 29 LoopStart/LoopEnd op-desc specs pass. (The amber module's local compile is blocked by an unrelated pre-existing JOOQ issue -- PveManager references the virtual_environments table from #5577, absent from the un-migrated local DB; CI compiles it against a fresh schema.)
…nCoordinator
Addresses the remaining half of PR #4206 review feedback (#4206 (comment)): the skip-create branch (reuse an existing output document on region re-run instead of clobbering it) was untested. (The rename half landed in 3d4f15b.)
Extract the create-or-reuse decision out of the private createOutputPortStorageObjects into a pure companion method RegionExecutionCoordinator.provisionOutputDocument(uri, reuseExistingStorage, documentExists, createDocument) with the storage operations injected, so the four cases can be unit-tested without an iceberg backend or a live region.
New RegionOutputProvisioningSpec pins:
* reuse + existing document -> NOT recreated (createDocument not called), so accumulated loop output survives the re-run;
* reuse + no document yet -> created (first iteration);
* no-reuse + existing -> recreated/overwritten (fresh per run);
* no-reuse + none -> created;
* no-reuse short-circuits documentExists (always recreate, never probe).
Verified the production change compiles (the only remaining amber compile errors are the pre-existing PveManager/virtual_environments JOOQ issue from #5577, unrelated to this change and resolved by a migrated DB in CI); the new spec is a pure ScalaTest unit with no iceberg/actor dependency.
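The create-or-reuse decision described in the commit message above can be sketched as follows. This is a rough Python paraphrase for illustration only, not the Scala method from the commit: the parameter names mirror the ones listed in the message, while the boolean return value and the recording helpers in the usage lines are assumptions.

```python
from typing import Callable


def provision_output_document(
    uri: str,
    reuse_existing_storage: bool,
    document_exists: Callable[[str], bool],
    create_document: Callable[[str], None],
) -> bool:
    """Return True if the document was (re)created, False if it was reused.

    The return value is an assumption added here to make the outcome observable.
    """
    # `and` short-circuits: when reuse is off, document_exists is never probed.
    if reuse_existing_storage and document_exists(uri):
        return False  # keep the accumulated output from earlier iterations
    create_document(uri)  # first iteration, or a fresh document per run
    return True


# Hypothetical usage with in-memory stand-ins for the storage operations.
created = []
probed = []


def exists(uri: str) -> bool:
    probed.append(uri)
    return True  # pretend the document is already there


reused = not provision_output_document("doc-1", True, exists, created.append)
recreated = provision_output_document("doc-1", False, exists, created.append)
```

Injecting `document_exists` and `create_document` as plain callables is what lets each branch be exercised without a storage backend, which matches the motivation given in the commit message.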
What changes were proposed in this PR?
This PR introduces persistent Python Virtual Environments (PVEs) by moving them out of the Computing Unit (CU) lifecycle and storing them in the database.
Previously, PVEs were managed through Computing Units and existed only within the CU they were created in. As a result, PVEs were lost when the corresponding CU was terminated. This PR adds a new `virtual_environments` table to persist PVE configurations and introduces a dedicated dashboard interface for managing them.
Users can now create, view, update, and delete their own Python virtual environments through a new "Environments" page in the dashboard sidebar. PVE definitions are stored as user-owned resources in the database and can be managed independently of Computing Units.
Note: This PR only introduces persistence for PVE metadata and configuration. Creating, updating, and deleting a PVE in this PR only affects the corresponding database records. The execution-time behavior of materializing and using these virtual environments inside a Computing Unit is not part of this change and will be introduced in a future PR.
K8s configurations for this feature will be added in a future PR.
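To make the metadata-only scope concrete, here is a minimal sketch of user-scoped CRUD against a `virtual_environments` table, using Python's built-in `sqlite3` with an in-memory database. Only the table name comes from this PR. Every column (`id`, `owner_uid`, `name`, `packages`) and every function name is a hypothetical stand-in, and the actual implementation in this PR is Scala with JOOQ, not this code.

```python
import sqlite3

# Only the table name is taken from the PR; all columns are assumptions.
SCHEMA = """
CREATE TABLE virtual_environments (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    owner_uid INTEGER NOT NULL,
    name TEXT NOT NULL,
    packages TEXT NOT NULL DEFAULT ''
)
"""


def open_store() -> sqlite3.Connection:
    """Create an in-memory database holding the hypothetical table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    return conn


def create_pve(conn, owner_uid, name, packages=""):
    """Insert a metadata row and return its id. No venv is built on disk."""
    cur = conn.execute(
        "INSERT INTO virtual_environments (owner_uid, name, packages) "
        "VALUES (?, ?, ?)",
        (owner_uid, name, packages),
    )
    conn.commit()
    return cur.lastrowid


def list_pves(conn, owner_uid):
    """Return only the rows owned by the given user."""
    cur = conn.execute(
        "SELECT id, name, packages FROM virtual_environments "
        "WHERE owner_uid = ? ORDER BY id",
        (owner_uid,),
    )
    return cur.fetchall()


def update_pve(conn, owner_uid, pve_id, packages):
    """Update a row; returns False if it is missing or owned by someone else."""
    cur = conn.execute(
        "UPDATE virtual_environments SET packages = ? "
        "WHERE id = ? AND owner_uid = ?",
        (packages, pve_id, owner_uid),
    )
    conn.commit()
    return cur.rowcount == 1


def delete_pve(conn, owner_uid, pve_id):
    """Delete a row; returns False if it is missing or owned by someone else."""
    cur = conn.execute(
        "DELETE FROM virtual_environments WHERE id = ? AND owner_uid = ?",
        (pve_id, owner_uid),
    )
    conn.commit()
    return cur.rowcount == 1
```

Each statement filters on the owner, which is one simple way to model "user-owned resources", and nothing here touches a Computing Unit or the filesystem, in line with the note that execution-time materialization is left to a future PR.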
Any related issues, documentation, discussions?
Related discussions and issues: #5360, #5361.
How was this PR tested?
Tested manually and tests added to PveResourceSpec.
Was this PR authored or co-authored using generative AI tooling?
Co-authored using: Claude Code