ROX-35269: clean up managed clusters before destroying infra-pr GKE clusters#1868
ROX-35269: clean up managed clusters before destroying infra-pr GKE clusters#1868stehessel wants to merge 1 commit into
Conversation
f14e05e to
76e7632
Compare
|
Warning Review limit reached
More reviews will be available in 21 minutes and 1 second. Learn how PR review limits work. Your organization has used up its prepaid credits, and credit purchases are no longer available. Enable the review add-on in the billing tab to keep reviews running — you're only billed for reviews past your plan's rate limits ($0.25/file). ⌛ How to resolve this issue?After more reviews become available, a review can be triggered using the To avoid repeated limits, reduce automatic review volume by pausing incremental auto-reviews earlier, using label-based review opt-in, excluding WIP or generated PR titles, or requesting reviews manually when the PR is ready. If your team needs uninterrupted high-volume reviews, an organization admin can enable usage-based credits. 🚦 How do rate limits work?CodeRabbit enforces per-developer PR review limits for each organization. Most developers receive the normal plan review availability. For paid Pro and Pro+ PR reviews, CodeRabbit uses adaptive limits for sustained high-volume activity. When a developer's recent PR review activity reaches the 95th percentile or higher among CodeRabbit users, additional reviews become available more gradually as earlier reviews age out of the rolling window. Please see our Fair Usage Limits Policy for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Central YAML (base), Organization UI (inherited) Review profile: CHILL Plan: Enterprise Run ID: 📒 Files selected for processing (5)
📝 WalkthroughWalkthroughThe PR adds a new ChangesManaged cluster cleanup on infra-server destroy
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes 🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
A single node development cluster (infra-pr-1868) was allocated in production infra for this PR. CI will attempt to deploy 🔌 You can connect to this cluster with: 🛠️ And pull infractl from the deployed dev infra-server with: 🔓 You must go to to export an . Your token from the prod infra instance will not work with dev environments. 🚲 You can then use the dev infra instance e.g.: Further Development☕ If you make changes, you can commit and push and CI will take care of updating the development cluster. 🚀 If you only modify configuration (chart/infra-server/configuration) or templates (chart/infra-server/{static,templates}), you can get a faster update with: LogsLogs for the development infra depending on your @redhat.com authuser: Or: |
a8be563 to
7f65e1d
Compare
There was a problem hiding this comment.
Actionable comments posted: 3
🧹 Nitpick comments (1)
chart/infra-server/templates/workflowtemplates/cleanup-infra-clusters.yaml (1)
20-20: 🔒 Security & Privacy | 🔵 TrivialPin the Argo CLI image to a fixed tag or digest.
quay.io/argoproj/argocli:latestmakes this destroy-time cleanup path change without a chart update; use a specific release that matches the Argo Workflows version you deploy, or add an explicit version value instead of relying onlatest.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@chart/infra-server/templates/workflowtemplates/cleanup-infra-clusters.yaml` at line 20, The cleanup workflow template is using an unpinned Argo CLI image, which makes the cleanup behavior drift unexpectedly. Update the image reference in the cleanup template to use a fixed tag or digest, ideally matching the deployed Argo Workflows version, and wire it through an explicit chart value if needed instead of relying on the argocli:latest reference.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@chart/infra-server/static/workflow-gke-default.yaml`:
- Around line 24-27: The default value for has-infra-server in
workflow-gke-default.yaml is still set to "true", which causes cleanup to run
for every gke-default workflow instead of only infra-server clusters. Update the
workflow default to "false" and ensure the infra-pr path is the only place that
passes "true", keeping the behavior aligned with the has-infra-server handling
in the workflow/flavor config.
- Around line 47-50: The cleanup-infra-clusters step is missing the required
kubeconfig artifact, so it can skip cleanup. Update the workflow step that uses
templateRef cleanup-infra-clusters to pass
workflow.outputs.artifacts.global-kubeconfig into its inputs/artifacts, using
the cleanup-infra-clusters and global-kubeconfig symbols so the template
receives the kubeconfig it expects.
In `@chart/infra-server/templates/workflowtemplates/cleanup-infra-clusters.yaml`:
- Around line 33-61: The cleanup script in the workflow template currently hides
Argo failures and can exit successfully even when managed workflows are still
active. Update get_active_workflows and the argo stop loop so command failures
are not swallowed, and make the timeout path in the cleanup step fail instead of
falling through with success. Keep the failure signal in this script so the
caller can use continueOn to proceed with destroy while still detecting cleanup
problems.
---
Nitpick comments:
In `@chart/infra-server/templates/workflowtemplates/cleanup-infra-clusters.yaml`:
- Line 20: The cleanup workflow template is using an unpinned Argo CLI image,
which makes the cleanup behavior drift unexpectedly. Update the image reference
in the cleanup template to use a fixed tag or digest, ideally matching the
deployed Argo Workflows version, and wire it through an explicit chart value if
needed instead of relying on the argocli:latest reference.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Central YAML (base), Organization UI (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: 74c32da0-d7d0-4bec-8bb0-f41d328e964e
📒 Files selected for processing (4)
Makefilechart/infra-server/static/flavors.yamlchart/infra-server/static/workflow-gke-default.yamlchart/infra-server/templates/workflowtemplates/cleanup-infra-clusters.yaml
…lusters When a GKE cluster running a test infra-server is destroyed, any test clusters it manages are orphaned because their Argo destroy phases never run. Add a pre-destroy cleanup step that connects to the target cluster, stops all active Argo workflows (triggering their onExit handlers), and waits for cloud resource cleanup to complete before tearing down the cluster. The cleanup is gated by a new has-infra-server workflow parameter (default false) so it only runs on clusters that host an infra-server instance. PR.yaml passes has-infra-server=true for infra-pr clusters. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
742a5bc to
ca22df4
Compare
When a GKE cluster running a test infra-server is destroyed, any test clusters it manages are orphaned because their Argo destroy phases never run. Add a pre-destroy cleanup step that connects to the target cluster, stops all active Argo workflows (triggering their onExit handlers), and waits for cloud resource cleanup to complete before tearing down the cluster.
The cleanup is gated by a new has-infra-server workflow parameter (default false) so it only runs on clusters that host an infra-server instance. PR.yaml passes has-infra-server=true for infra-pr clusters.