BDMS-221-225: Polymorphic DataProvenance model #245
The current schema lacks a way to store and track provenance (origin) data across the database. Created db/data_provenance.py with a polymorphic DataProvenance model for tracking foundational metadata across tables. Added mixin DataProvenanceMixin to db/base.py for reusable polymorphic relationships. Improved documentation and comments in db/base.py for mixins and helper functions.
Introduced DataProvenanceMixin to the `Thing` and `Location` models to enable reusable, efficient, polymorphic relationships to the DataProvenance table.
    collection_method: Mapped[str] = lexicon_term(
        nullable=True,
        comment="Indicates the method used to collect the data (e.g., 'GPS - Survey Grade').",
    )
    # TODO: Values from the following NMAquifer tables should be included as terms in the lexicon: 'LU_CoordinateAccuracy'.
    accuracy_value: Mapped[float] = mapped_column(
        nullable=True, comment="A numeric value representing the data's accuracy."
    )
    # TODO: Values from the following NMAquifer tables should be included as terms in the lexicon: 'LU_CoordinateAccuracy'.
    accuracy_unit: Mapped[str] = lexicon_term(
        nullable=True,
        comment="The unit for the `accuracy_value` (e.g., 'meters', 'feet').",
    )
These fields apply to only a small subset of fields in a subset of tables. Should they move to those tables rather than live here? They won't apply to many of the fields for which DataProvenance will be used.
Maybe, but I saw this as one of the advantages of the Provenance model: all the sparse, optional, and evolving metadata is organized in one central place. If we move these fields to the Location table, we'd have to add even more fields (coordinate_accuracy, coordinate_collection_method, coordinate_accuracy_value, coordinate_accuracy_unit, plus the same ones for elevation). I think storing this type of metadata is more efficient with the DataProvenance table, but I'll let @jirhiker weigh in, too.
It seems appropriate to store these fields here. We can reevaluate later if user requirements dictate.
The database tables are snake_case, so for consistency and ease of debugging, the `target_table` values should also use snake_case.
Refined the _thing_target and _location_target relationships to ensure DataProvenance.target_table uses snake_case ('thing', 'location') for the target table name.
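As a rough illustration of the naming rule above, the sketch below derives a snake_case `target_table` value from a model class name. The helper `to_snake_case` is hypothetical and not part of this PR; the actual change may simply use literal strings or each model's table name.

```python
import re


def to_snake_case(name: str) -> str:
    """Convert a CamelCase class name to snake_case (hypothetical helper)."""
    # Insert an underscore before a capitalized word that follows any character.
    partial = re.sub(r"(.)([A-Z][a-z]+)", r"\1_\2", name)
    # Insert an underscore between a lowercase letter or digit and a capital.
    return re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", partial).lower()


# Class names mentioned in this PR, mapped to the values that would be
# stored in DataProvenance.target_table under the snake_case convention.
target_tables = {name: to_snake_case(name) for name in ("Thing", "Location")}
```

Keeping `target_table` identical to the database table name means a value read from a provenance row can be matched against the schema without any case translation.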
… for class-level usage - Relocated DataProvenanceMixin from base.py to data_provenance.py for better modularity and provenance management. - Refactored mixin to use cls in @declared_attr for proper class-level relationship definition.
…nformation. - Added new `origin_source` and `collection_method` categories and terms. - Added 'meters' as a term associated with the `unit` category. - Added `OriginStatus` to `enums.py`.