feat: Radionuclides backfill job (#558)#573
Conversation
Add chemistry backfill that maps legacy NMA_Radionuclides rows into Observation records keyed on nma_pk_chemistryresults (idempotent upsert). - Schema: add chemistry columns to Observation and Sample models/schemas - Migration: add columns + unique constraints, drop stale NMA_Radionuclides.thing_id - Lexicon: add pCi/L unit and Chemistry Observation note type - Backfill: resolve Samples via nma_pk_chemistrysample, create Parameters, AnalysisMethods, Notes; detect_flag from legacy Symbol qualifier - Tests: BDD step definitions for all 7 scenarios, feature file fixes - Orchestrator: wire radionuclides step into backfill pipeline Refs #558 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Wrap cleanup chain in try/except to prevent orphan data on failure (#566) - Scope AnalysisMethod cleanup to test-created IDs only (#567) Also addresses #565 (scalar_one hardening) and #568 (remove unused parameter_ids tracker) which were applied to the new files in the preceding feature commit. Refs #558 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add per-row savepoint isolation so one bad row doesn't abort the run - Skip rows with missing AnalysisDate instead of inserting epoch sentinel - Remove unused batch_size parameter from backfill signatures and CLI - Fix migration downgrade to backfill thing_id before setting NOT NULL - Track analysis_method_ids in test step defs for scoped cleanup Refs #558 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…leanup - Migration upgrade now checks information_schema before dropping thing_id column, guarding against environments where it was already removed - run_backfill.sh no longer references removed --batch-size parameter - Test observation cleanup scoped to scenario sample_ids instead of blanket nma_pk_chemistryresults IS NOT NULL query Refs #558, #570 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… tracking - Migration upgrade dynamically discovers FK names via sa.inspect() with guard for missing name entries - Per-row exception handler now logs full traceback via logger.exception() before appending summary to result.errors - Analysis method tracking scoped to scenario sample_ids - Removed incorrect issue reference from test comment Refs #558 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Implements a radionuclides chemistry backfill pipeline that converts legacy NMA_Radionuclides rows into the Ocotillo Observation schema (plus related Sample metadata), wires it into the backfill orchestrator, and adds BDD coverage + schema/migration/lexicon updates needed to support the new fields.
Changes:
- Added a radionuclides backfill implementation with per-row savepoints, idempotent upserts, and Notes creation.
- Added DB/schema support for chemistry audit keys + metadata (
nma_pk_chemistryresults,detect_flag,uncertainty,analysis_agency,nma_pk_chemistrysample,volume,volume_unit) via Alembic + models + API schemas. - Added/updated BDD steps and test cleanup logic; extended lexicon with
pCi/LandChemistry Observation.
Reviewed changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 7 comments.
Show a summary per file
| File | Description |
|---|---|
transfers/backfill/chemistry_backfill.py |
Implements the radionuclides backfill (Sample resolution, Observation upsert, volume + Notes handling). |
transfers/backfill/backfill.py |
Registers the radionuclides backfill in the orchestrator and improves logging/output. |
tests/features/steps/chemistry-backfill.py |
Adds Behave step definitions and helpers for radionuclides backfill scenarios. |
tests/features/nma-chemistry-radionuclides-refactor.feature |
Minor scenario adjustments for radionuclides backfill BDD feature. |
tests/features/environment.py |
Adds per-scenario cleanup for chemistry backfill fixtures. |
db/sample.py |
Adds chemistry audit/sample metadata columns to the ORM model. |
db/observation.py |
Adds chemistry audit + metadata columns to the ORM model. |
schemas/sample.py |
Exposes new Sample chemistry fields in the API response schema. |
schemas/observation.py |
Exposes new Observation chemistry fields in the API response schema. |
alembic/versions/545a5b77e5e8_add_chemistry_backfill_columns_to_.py |
Migration adding columns/constraints and dropping legacy thing_id from NMA_Radionuclides. |
core/lexicon.json |
Adds lexicon terms for pCi/L and Chemistry Observation. |
run_backfill.sh |
Updates backfill runner invocation to match the orchestrator CLI changes. |
…ow catch - Volume/volume_unit test assertions query by scenario sample_ids - existing_keys set lowercases DB values to match global_id_str normalization - Per-row catch broadened to Exception with logger.exception() for tracebacks - Analysis method tracking scoped to scenario samples - Migration FK drop guarded against missing name entries - Resolve stash merge conflicts in sample volume steps Refs #558 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
jirhiker
left a comment
There was a problem hiding this comment.
Looks good from the description. @ksmuczynski should review re database schema
|
@jirhiker @kbighorse Great, thanks. Will take a closer look as soon as the well inventory ingestion work is completed. |
Snapshot AnalysisMethod IDs before/after backfill to track only actually-created methods, preventing deletion of pre-existing rows. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Delete context._backfill_created in finally block to prevent ID accumulation across scenarios causing redundant deletes. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add ORDER BY id to legacy query so lowest-PK row wins when multiple radionuclide rows map to the same Sample (volume/volume_unit) - Skip overwrites with warning when a conflict is detected - Reject unknown CLI args in backfill entrypoint instead of silently ignoring them Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Skip pg_cron extension creation gracefully when unavailable instead of hard-failing, unblocking local dev and test environments. Refs #576 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Skip pg_cron extension creation gracefully when unavailable instead of hard-failing, unblocking local dev and test environments. Also skip search vector trigger sync for tables that don't exist yet (e.g. asset) to avoid NoSuchTableError during test setup. Refs #576 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ntegrationGroup/OcotilloAPI into 558-radionuclides-backfill
|
Stale PR- closing. Will consider reopening when refactoring chemistry data becomes a priority. |
Summary
NMA_Radionuclides→Observationrecords with chemistry metadataobservation(nma_pk_chemistryresults,detect_flag,uncertainty,analysis_agency) andsample(nma_pk_chemistrysample,volume,volume_unit)thing_idFK fromNMA_Radionuclides(relationships now go throughNMA_Chemistry_SampleInfo)pCi/Lunit andChemistry Observationnote type to lexiconRobustness
AnalysisDateare skipped (logged as errors) instead of using sentinel valuesinformation_schemabefore dropping columns/FKs)thing_id(nullable → backfill → NOT NULL)logger.exception()for per-row failuresTest safety
sample_ids(no global deletes)Closes #558
Refs #565, #566, #567, #568, #570, #571
Test plan
alembic upgrade headon a fresh DB — migration applies cleanlyalembic downgrade -1— migration rolls back cleanly🤖 Generated with Claude Code