feat(db): nightly pg_cron refresh of pygeoapi materialized views #727
Merged
Register a pg_cron job that refreshes the pygeoapi materialized views once a night in production, with the schedule traceable in version control via an alembic migration.

- alembic migration x2y3z4a5b6c7 creates the pg_cron extension, a public.refresh_pygeoapi_materialized_views() helper, and a nightly cron job (0 9 * * *, server timezone). Idempotent: it unschedules any same-named job before re-registering.
- pg_cron is production-only. The migration is a no-op unless ENABLE_PG_CRON is truthy, so alembic upgrade head still works on the dev/test/CI Postgres image (which does not preload pg_cron).
- services/materialized_views.py is the single source of truth for the view list, shared by the CLI refresh command and the migration.
- docker/db/Dockerfile (production image) installs pg_cron and preloads it with cron.database_name pointed at the app database.
- CD_staging.yml and CD_production.yml set ENABLE_PG_CRON=1 on the migration step.
- docs/pg_cron-nightly-refresh.md documents setup (self-hosted Docker and Cloud SQL), verification, and the non-concurrent REFRESH rationale.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
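The gating and idempotent registration described in this commit could look roughly like the sketch below. It is an illustration, not the code in x2y3z4a5b6c7: the accepted truthy spellings, the helper names pg_cron_enabled and upgrade_statements, and the exact SQL text are assumptions. The calls cron.schedule(job_name, schedule, command), cron.unschedule(job_id) and the cron.job table come from pg_cron itself (named jobs need pg_cron 1.3 or later).

```python
import os

# Assumption: the PR does not say which spellings count as "truthy".
TRUTHY = {"1", "true", "yes", "on"}

JOB_NAME = "refresh-pygeoapi-materialized-views"
SCHEDULE = "0 9 * * *"
COMMAND = "SELECT public.refresh_pygeoapi_materialized_views()"


def pg_cron_enabled(environ=None):
    """Return True when ENABLE_PG_CRON is set to a truthy value."""
    env = os.environ if environ is None else environ
    return env.get("ENABLE_PG_CRON", "").strip().lower() in TRUTHY


def upgrade_statements(environ=None):
    """SQL the migration would run; an empty list means the upgrade is a no-op."""
    if not pg_cron_enabled(environ):
        return []
    return [
        "CREATE EXTENSION IF NOT EXISTS pg_cron",
        # Remove any job with the same name first so re-running stays idempotent.
        f"SELECT cron.unschedule(jobid) FROM cron.job WHERE jobname = '{JOB_NAME}'",
        f"SELECT cron.schedule('{JOB_NAME}', '{SCHEDULE}', '{COMMAND}')",
    ]
```

In an alembic migration, each returned statement would typically be passed to op.execute() inside upgrade(). On dev/test/CI the list is empty, so alembic upgrade head never touches pg_cron.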
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 5ad3f25d7c
Staging refreshes the materialized views on each deploy (the existing "Refresh materialized views" CD step), so it does not need the nightly pg_cron job. Drop ENABLE_PG_CRON from CD_staging.yml; only production registers the cron job.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
- Migration helper now discovers ogc_* materialized views from the catalog at run time instead of importing the mutable view tuple. Keeps the versioned migration immutable and self-contained, and auto-includes views added by later migrations. Resolves the cross-environment drift concern from review.
- Production DB image derives cron.database_name from POSTGRES_DB via a start-postgres.sh entrypoint wrapper, so pg_cron tracks the same database the migration connects to even when POSTGRES_DB is overridden.
- services/materialized_views.py is now the CLI's curated default only; docs updated to match.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
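A helper that discovers views from the catalog at run time might be generated along these lines. This is a sketch under assumptions, not the migration's actual SQL: the template, the public-schema filter, the use of starts_with() (PostgreSQL 11+) and the Python function name build_refresh_function_sql are all illustrative. pg_matviews is the standard PostgreSQL catalog view listing materialized views.

```python
# Assumption: template text, schema filter and prefix handling are illustrative only.
FUNCTION_TEMPLATE = """
CREATE OR REPLACE FUNCTION {function_name}()
RETURNS void
LANGUAGE plpgsql
AS $$
DECLARE
    mv record;
BEGIN
    -- Look the views up in the catalog on every call, so views created by
    -- later migrations are picked up without editing this function.
    FOR mv IN
        SELECT schemaname, matviewname
        FROM pg_matviews
        WHERE schemaname = 'public'
          AND starts_with(matviewname, '{prefix}')
        ORDER BY matviewname
    LOOP
        -- %I quotes each identifier safely.
        EXECUTE format('REFRESH MATERIALIZED VIEW %I.%I',
                       mv.schemaname, mv.matviewname);
    END LOOP;
END;
$$;
"""


def build_refresh_function_sql(
    function_name="public.refresh_pygeoapi_materialized_views",
    prefix="ogc_",
):
    """Render the CREATE FUNCTION statement for the refresh helper."""
    return FUNCTION_TEMPLATE.format(function_name=function_name, prefix=prefix).strip()
```

Because the view list lives in the database rather than in an imported Python tuple, the migration text never changes after it has been applied, which is the immutability property this commit is after.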
Refresh every materialized view nightly, including transducer_daily_data.

- Migration helper drops the ogc_* filter and refreshes all public-schema materialized views discovered from the catalog at run time.
- Rename PYGEOAPI_MATERIALIZED_VIEWS -> MATERIALIZED_VIEWS and add transducer_daily_data to the CLI's default list.
- Update CLI refresh test expectations (8 views) and docs.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The refresh covers all materialized views, not just the ogc_* pygeoapi views, so rename the pygeoapi-specific identifiers to generic ones.

- CLI command refresh-pygeoapi-materialized-views -> refresh-materialized-views (function refresh_pygeoapi_materialized_views -> refresh_materialized_views).
- SQL helper public.refresh_pygeoapi_materialized_views() -> public.refresh_materialized_views(); rename it in d5e6f7a8b9c0 too, and have the pg_cron migration drop the legacy function on databases that already created it.
- Cron job name refresh-pygeoapi-materialized-views -> refresh-materialized-views.
- Update CD workflows (staging/testing/production), tests, and docs.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The docker/db image is used only for development, so it should not carry the production pg_cron setup. Revert it to the stock postgis image, remove the start-postgres.sh wrapper, and document pg_cron as Cloud SQL-only in production.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

The cron job was never deployed, so the migration does not need to drop a previously created refresh_pygeoapi_materialized_views helper. Remove the LEGACY_FUNCTION_NAME constant, the DROP in upgrade, and the related note.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
What
Adds a nightly pg_cron job that refreshes the pygeoapi materialized views in production. The schedule is registered through an alembic migration so it's traceable in version control.
Why
The pygeoapi materialized views (ogc_*) were only refreshed manually or on deploy. This automates a nightly refresh so the OGC API serves current data without operator intervention.
How
- Alembic migration x2y3z4a5b6c7 creates the pg_cron extension, a public.refresh_pygeoapi_materialized_views() helper, and a cron job refresh-pygeoapi-materialized-views on 0 9 * * * (server timezone ≈ 02:00–03:00 MT). Idempotent: it unschedules any same-named job before re-registering.
- pg_cron is production-only: the migration is a no-op unless ENABLE_PG_CRON is truthy. The dev/test/CI Postgres image doesn't preload pg_cron, so alembic upgrade head keeps working there untouched.
- services/materialized_views.py holds the view list, imported by both the migration and the oco refresh-pygeoapi-materialized-views CLI command.
- docker/db/Dockerfile installs postgresql-17-cron and starts Postgres with shared_preload_libraries=pg_cron and cron.database_name pointed at the app DB. Dev docker-compose.yml is untouched.
- CD_staging.yml and CD_production.yml set ENABLE_PG_CRON=1 on the migration step.
- docs/pg_cron-nightly-refresh.md covers self-hosted Docker + Cloud SQL setup, verification queries, and the non-concurrent REFRESH rationale.
Reviewer notes / deploy prerequisite
Prerequisites on the Cloud SQL instance before the migration can run CREATE EXTENSION pg_cron:
- cloudsql.enable_pg_cron=on
- cron.database_name=<CLOUD_SQL_DATABASE> (the app DB that holds the matviews, not postgres)
- CLOUD_SQL_USER needs privilege to create the extension
These are instance-level settings, not in the repo.
The cron helper uses plain (non-concurrent) REFRESH because REFRESH ... CONCURRENTLY can't run inside a PL/pgSQL transaction; the brief nightly lock is acceptable. The CLI still offers --concurrently for daytime manual refreshes.
🤖 Generated with Claude Code
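For reference, the two refresh forms differ only by one keyword in the statement the CLI would issue. The sketch below is a hypothetical illustration: refresh_statement, quote_ident and the view name ogc_example are not taken from the repo. Per the PostgreSQL documentation, a plain REFRESH blocks reads on the view while it runs, whereas CONCURRENTLY keeps the view readable but requires a unique index on it.

```python
def quote_ident(name):
    """Double-quote a SQL identifier, escaping embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def refresh_statement(view, concurrently=False, schema="public"):
    """Build the REFRESH statement for one materialized view."""
    keyword = "CONCURRENTLY " if concurrently else ""
    return (
        f"REFRESH MATERIALIZED VIEW {keyword}"
        f"{quote_ident(schema)}.{quote_ident(view)}"
    )


# Hypothetical view name, for illustration only.
print(refresh_statement("ogc_example"))
print(refresh_statement("ogc_example", concurrently=True))
```

A --concurrently flag would map to concurrently=True, which is the safer choice for daytime manual refreshes when readers are active.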