Skip to content

Timestamp proposal#264

Merged
43 commits merged into
stagingfrom
timestamp_proposal
Dec 3, 2025
Merged

Timestamp proposal#264
43 commits merged into
stagingfrom
timestamp_proposal

Conversation

@kbighorse

@kbighorse kbighorse commented Nov 27, 2025

Copy link
Copy Markdown
Contributor

Timestamp Field Migration: AMPAPI → NMSampleLocations

Status: Phase 1 (Location AMPAPI Date Fields) - Implemented


Overview

Analysis of AMPAPI timestamp fields revealed that critical temporal data would be lost during migration. This guide documents the Location AMPAPI date fields implemented in Phase 1 using the nma_ prefix (New Mexico Aquifer Mapping).


Implemented Fields (Phase 1)

1. Location.nma_date_created (Date)

Source: AMPAPI Location.DateCreated

  • Population: 100% (39,783 records)
  • Purpose: Preserve when location record was created in AMPAPI database
  • Type: Date (day-precision, no timezone)
  • Nullable: True (null for new records, populated only during migration)
  • Schema Inclusion: LocationResponse only (NOT in CreateLocation or UpdateLocation - read-only)
  • Rationale: Keep AutoBaseMixin.created_at as true audit timestamp for new system; store AMPAPI creation date separately

2. Location.nma_site_date (Date)

Source: AMPAPI Location.SiteDate

  • Population: 9.14% (3,638 records)
  • Purpose: Date when site was first visited/inventoried in the field (AMPAPI migration data only)
  • Type: Date (day-precision, no timezone)
  • Nullable: True (null for new records, populated only during migration)
  • Schema Inclusion: LocationResponse only (NOT in CreateLocation or UpdateLocation - read-only)
  • Rationale: Named nma_site_date using AMPAPI/NMA prefix to clarify this is migration data, not an active field in the new system
  • Critical finding: Time gaps between field inventory and database entry range up to 54 years
    • Maximum gap: 19,751 days (Site SM-0227: inventoried 1954, entered 2008)
    • Mean gap: 683 days (1.9 years)
    • 3,638 historical inventory dates would be permanently lost without this field

Data Analysis Summary

Location Date Relationships (3,638 records with both dates)

Relationship Count Percentage
DateCreated later (normal) 3,125 85.9%
SiteDate later (anomaly) 459 12.6%
Dates equal 54 1.5%

Top 5 Largest Time Gaps

PointID DateCreated SiteDate Gap (years)
SM-0227 2008-05-28 1954-05-01 54.1
PP-082 2017-12-06 2003-12-11 14.0
EB-229 2014-04-03 2002-12-10 11.3
EB-257 2014-04-03 2003-01-07 11.2
EB-253 2014-04-03 2003-01-07 11.2

Design Decisions

Why Date instead of DateTime?

  • AMPAPI data contains only date components (all times are 00:00:00)
  • Avoids unnecessary timezone complexity
  • Semantically accurate for day-precision data
  • Consistent with equipment deployment date patterns

Why separate nma_date_created from created_at?

  • Preserves AutoBaseMixin.created_at semantics (when record created in new system)
  • Clear distinction between new system fields and AMPAPI migration data
  • Prevents polluting audit trail with historical dates
  • Maintains data integrity for both systems

Why nma_site_date?

  • The nma_ prefix clarifies these are AMPAPI/NMA-specific fields (not active fields in the new system)
  • Users cannot set this field when creating new locations
  • Maps clearly to source field name (SiteDate)
  • The new system uses Thing.first_visit_date for ongoing field visit tracking

Location.nma_site_date vs Thing.first_visit_date

These fields serve different purposes and live on different entities:

Field Entity Purpose Active/Migration Semantic Meaning
nma_site_date Location AMPAPI migration data Migration-only (read-only) When the geographic site was first inventoried (before Thing existed)
first_visit_date Thing Ongoing field tracking Active field (read-write) First NMBGMR visit to this specific Thing (well, spring, etc.)

Key Differences:

  1. Entity Level:

    • nma_site_date is on Location (just a geographic point)
    • first_visit_date is on Thing (a monitored physical object at that location)
  2. Temporal Context:

    • nma_site_date: Historical site reconnaissance (may predate well installation)
    • first_visit_date: First interaction with the specific Thing being monitored
  3. Data Population:

    • nma_site_date: Only 9% populated, from AMPAPI migration, read-only
    • first_visit_date: Actively maintained for new Things, user-editable
  4. Use Cases:

    • nma_site_date: Historical research, understanding data gaps from legacy system
    • first_visit_date: Current monitoring workflow, field visit scheduling

Example Scenario:

  • A site was inventoried in 1954 (nma_site_date = 1954-05-01)
  • NMBGMR first visited that location in 2003 (first_visit_date = 2003-12-10)
  • The location record was created in AMPAPI in 2008 (nma_date_created = 2008-05-28)
  • Timeline: 1954 → 2003 → 2008

Post-Migration Write Permissions

Location AMPAPI date fields are read-only (migration-only):

Field Can Users Set Post-Migration? Reason
Location.nma_date_created NO - Read-only AMPAPI audit data only
Location.nma_site_date NO - Read-only AMPAPI field inventory data only

Implementation Constraints:

  • nma_date_created and nma_site_date MUST NOT be in CreateLocation or UpdateLocation schemas (read-only)
  • Both fields are queryable and appear in LocationResponse schema
  • Location AMPAPI date fields are populated ONLY during AMPAPI migration
  • Fields use nma_ prefix (New Mexico Aquifer Mapping) to indicate AMPAPI provenance

Data Provenance:

  • Migrated locations: Have non-null nma_date_created, may have nma_site_date
  • New locations (post-migration): Both AMPAPI date fields are null

Implementation Notes

GeoJSON Response Structure

Issue: AMPAPI date fields need to be accessible in nested GeoJSON responses.

Root Cause:

  • Thing responses include current_location as a LocationGeoJSONResponse
  • GeoJSON Feature format nests data under properties
  • AMPAPI date fields were not being populated in the GeoJSON properties

Solution (schemas/location.py:110-112, 158-159):

class GeoJSONProperties(BaseModel):
    # ... existing fields
    # AMPAPI date fields (read-only, populated only during migration)
    nma_date_created: date | None = None
    nma_site_date: date | None = None

# In model_validator:
data_dict["properties"]["nma_date_created"] = data_dict.get("nma_date_created")
data_dict["properties"]["nma_site_date"] = data_dict.get("nma_site_date")

Impact:

  • AMPAPI date fields accessible at response["current_location"]["properties"]["nma_date_created"]
  • Maintains GeoJSON Feature specification compliance
  • Consistent with other location metadata (elevation, UTM coordinates, etc.)

Created At Field Removal

Issue: Transfer scripts were manually setting created_at from AMPAPI dates.

Root Cause:

  • Old logic combined DateCreated and SiteDate to determine created_at
  • This polluted the audit trail with historical timestamps
  • AutoBaseMixin.created_at should reflect when record is created in NMSampleLocations, not AMPAPI

Solution (transfers/util.py:247-268):

# REMOVED: Old created_at calculation logic
# - No longer parsing DateCreated/SiteDate to determine created_at
# - No longer converting MT to UTC
# - No longer passing created_at to Location constructor

# NEW: AMPAPI date fields stored separately (read-only post-migration)
nma_date_created = None
if row.DateCreated:
    nma_date_created = datetime.strptime(row.DateCreated, "%Y-%m-%d %H:%M:%S.%f").date()

nma_site_date = None
if row.SiteDate:
    nma_site_date = datetime.strptime(row.SiteDate, "%Y-%m-%d %H:%M:%S.%f").date()

location = Location(
    # created_at NOT SET - will be auto-populated by AutoBaseMixin on save
    nma_date_created=nma_date_created,
    nma_site_date=nma_site_date,
)

Impact:

  • created_at is None during transfer (set automatically when saved to database)
  • Clear separation between migration timestamps and system audit timestamps
  • AMPAPI dates preserved in separate read-only fields for historical reference

Fields NOT Being Migrated

Field Population Reason
WLReportDeliver 1.80% (715 records) Below 5% threshold, low user value
ChemistryReportDeliver 1.11% (441 records) Below 5% threshold, low user value

Impact Assessment

Without These Fields (Phase 1)

  • 3,638 inventory dates permanently lost (9.14% of locations)
  • 39,783 AMPAPI creation dates lost (100% of locations)
  • 54 years of historical context for some sites unrecoverable
  • ❌ Cannot distinguish field work dates from database entry dates
  • ❌ No way to trace when records were originally entered into AMPAPI

With These Fields (Phase 1)

  • ✅ Zero data loss from AMPAPI system for Location date fields
  • ✅ Clear semantic separation (field work vs database management vs system audit)
  • ✅ Support historical research and analysis
  • ✅ Maintain complete temporal provenance for locations
  • nma_ prefix clearly indicates AMPAPI provenance

@codecov-commenter

codecov-commenter commented Nov 27, 2025

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 95.10490% with 7 lines in your changes missing coverage. Please review.
✅ All tests successful. No failed tests found.

Files with missing lines Patch % Lines
transfers/util.py 46.15% 7 Missing ⚠️
Files with missing lines Coverage Δ
db/location.py 88.63% <100.00%> (-6.37%) ⬇️
schemas/location.py 96.15% <100.00%> (+3.56%) ⬆️
schemas/thing.py 98.15% <100.00%> (+0.48%) ⬆️
tests/test_location.py 100.00% <100.00%> (ø)
tests/test_transfer_legacy_dates.py 100.00% <100.00%> (ø)
transfers/util.py 31.35% <46.15%> (+4.68%) ⬆️

... and 21 files with indirect coverage changes

@kbighorse kbighorse requested a review from Copilot November 27, 2025 01:09

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR proposes the addition of three new date fields to preserve critical temporal data during migration from the legacy AMPAPI system to NMSampleLocations: Location.legacy_date_created, Location.inventoried_on, and Thing.well_completed_on. Analysis revealed that without these fields, 3,638 historical inventory dates (9.14% of locations) and 11,634 well completion dates (30% of wells) would be permanently lost, including historical context spanning up to 54 years.

Key Changes:

  • Enhanced CSV reading utility to check local data directory before cache/GCS
  • Added comprehensive test scenarios for well completion date field functionality
  • Added comprehensive test scenarios for location legacy date field functionality

Reviewed changes

Copilot reviewed 3 out of 4 changed files in this pull request and generated no comments.

File Description
transfers/util.py Modified CSV reading logic to prioritize local data directory, improving development workflow and data access patterns
tests/features/well-completion-date.feature Added BDD scenarios covering creation, retrieval, updates, and filtering of wells with completion dates
tests/features/location-legacy-dates.feature Added BDD scenarios for location legacy date fields, including edge cases like 54-year gaps between inventory and database entry

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread db/thing.py Outdated
Comment thread schemas/location.py Outdated
Comment on lines +157 to +161
# populate legacy date fields
data_dict["properties"]["legacy_date_created"] = data_dict.get(
"legacy_date_created"
)
data_dict["properties"]["legacy_site_date"] = data_dict.get("legacy_site_date")

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there are no features in feature files indicating that this data is required. should it be returned in the response? or just left in the database for audits/when folks are interested?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO: create a feature file for this

Comment thread schemas/thing.py Outdated
Comment thread schemas/thing.py Outdated
Comment thread schemas/thing.py Outdated

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this file go into the repo OcotilloBDD rather than here?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion: Move most features in OcotilloBDD into this repo going forward.

Comment thread transfers/well_transfer.py Outdated
Comment thread db/location.py Outdated
@jirhiker

jirhiker commented Dec 2, 2025

Copy link
Copy Markdown
Member

I appreciate the structure and comprehensiveness of this PR. I think it would benefit from some additional context regarding the history of NMAquifer and how its been used in the past.

Copilot AI review requested due to automatic review settings December 3, 2025 06:41

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 11 out of 12 changed files in this pull request and generated 19 comments.

Comment thread tests/features/steps/post_migration_legacy_data.py Outdated
Comment thread tests/features/post-migration-legacy-data-retrieval.feature Outdated
Comment thread tests/features/steps/post_migration_legacy_data.py Outdated
Comment thread tests/features/steps/post_migration_legacy_data.py
Comment thread tests/features/steps/post_migration_legacy_data.py Outdated
Comment thread tests/features/steps/post_migration_legacy_data.py Outdated
Comment thread tests/features/steps/post_migration_legacy_data.py Outdated
Comment thread tests/features/steps/post_migration_legacy_data.py Outdated
Comment thread tests/features/steps/post_migration_legacy_data.py Outdated
Comment thread tests/features/post-migration-legacy-data-retrieval.feature Outdated
Copilot AI review requested due to automatic review settings December 3, 2025 06:55

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

Copilot AI review requested due to automatic review settings December 3, 2025 09:12

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 9 out of 10 changed files in this pull request and generated 4 comments.

Comment thread tests/features/post-migration-legacy-data-retrieval.feature
Comment thread tests/features/post-migration-legacy-data-retrieval.feature Outdated
Comment thread tests/test_transfer_legacy_dates.py Outdated
Comment thread tests/features/steps/post_migration_legacy_data.py Outdated
Copilot AI review requested due to automatic review settings December 3, 2025 09:25

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 9 out of 10 changed files in this pull request and generated 2 comments.

Comment thread tests/features/post-migration-legacy-data-retrieval.feature
Comment thread tests/features/steps/post_migration_legacy_data.py Outdated
Copilot AI review requested due to automatic review settings December 3, 2025 09:46

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 10 out of 12 changed files in this pull request and generated 2 comments.

Comment thread transfers/util.py,cover Outdated
Comment thread tests/test_thing.py Outdated
Copilot AI review requested due to automatic review settings December 3, 2025 09:54

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 8 out of 9 changed files in this pull request and generated 4 comments.

Comment thread tests/features/post-migration-legacy-data-retrieval.feature
Comment thread tests/features/steps/post_migration_legacy_data.py Outdated
Comment thread transfers/util.py Outdated
Comment thread transfers/util.py
Comment thread transfers/util.py
AMP folks said that the earlier date between DateCreated and SiteDate is when
the site was inventoried, whereas the later is when the record was made in
the database. This was because they were used interchangeably.
"""

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the vexing logic this PR is avoiding now.

@@ -0,0 +1,354 @@
# ===============================================================================

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO: mark this as disposable post-migration

@jirhiker jirhiker closed this pull request by merging all changes into staging in 12c692c Dec 3, 2025
@kbighorse kbighorse deleted the timestamp_proposal branch December 8, 2025 20:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants