feat(datum-platform): add analyze-gcp-spend skill and cost-analyst agent#10
Draft
drewr wants to merge 3 commits into
Draft
feat(datum-platform): add analyze-gcp-spend skill and cost-analyst agent#10drewr wants to merge 3 commits into
drewr wants to merge 3 commits into
Conversation
Weekly GCP cost report covering datum-cloud staging and production. Queries BigQuery billing exports and live cluster state (node pools, PVCs, Cloud SQL), generates mermaid xychart-beta trend charts for the trailing 4 months, and files a PR to datum-cloud/engineering. Includes cadence logic to produce a full prior-month retrospective when run in the first 7 days of a month. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Cluster names datum-prod/datum-staging do not exist; correct to
infrastructure-control-plane-{prod,staging} in SKILL.md and queries.md
(node pool and PVC inventory commands).
Add a Preflight Checks section to SKILL.md that validates BigQuery
billing export access before proceeding. BigQuery is a hard requirement —
storage and Cloud SQL costs are invisible without it, causing the report
to understate spend by $2,000–$3,000/month. The skill now requires
stopping and surfacing the access error rather than publishing an
estimate-based report as authoritative.
Contributor
Author
|
Pushed two fixes based on a real run today: 1. Wrong cluster names — 2. Missing preflight checks — the skill had no guard against BigQuery being inaccessible. Without billing export access, storage and Cloud SQL are invisible, which caused a $5,800/month understatement (~54% off) on today's run. Added a Preflight Checks section that explicitly requires verifying BQ access before proceeding, and makes clear the skill should halt and surface the error rather than publish an estimate as authoritative. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
analyze-gcp-spendskill (3 files) covering BigQuery billing queries, live cluster state commands, and a full report template with mermaid trend chartscost-analystagent that the weekly scheduler invokesdatum-platformplugin to1.7.0What it does
Weekly GCP spend report for all datum-cloud infrastructure (staging + production). On the first 7 days of a month it produces a full prior-month retrospective; all other weeks produce an MTD snapshot. Queries BigQuery billing exports for service/SKU/storage breakdown, pulls live node pool and PVC inventory, computes a trailing-4-month trend, and generates
xychart-betamermaid charts. Output is a PR todatum-cloud/engineeringatreports/gcp-spend/YYYY-MM-DD-gcp-spend.md.Test plan
SKILL.mdcadence logic (day ≤ 7 vs MTD)queries.mdagainst actual billing export schemareport-format.mdrender correctly in GitHub previewdatum-cloud-prod,datum-cloud-staging) and cluster names match actual infracost-analystagent manually with a date in the first 7 days to validate full-month mode