Skip to content

NO TICKET: thing update and fix#179

Merged
9 commits merged into
stagingfrom
jab-thing-update-fix
Oct 10, 2025
Merged

NO TICKET: thing update and fix#179
9 commits merged into
stagingfrom
jab-thing-update-fix

Conversation

@jacob-a-brown

Copy link
Copy Markdown
Contributor

Why

This PR addresses the following problem / context:

  • a ThingResponse should always have an active_location attribute, no matter if that response is from the /thing router or contained in another response, like the ContactResponse
  • all thing fields should be tested
  • depths associated with a well should be validated against each other

How

Implementation summary - the following was changed / added / removed:

  • To ensure that every thing object has an active_location property I eagerly load location_associations for each thing, and for each location_association I eagerly load the location. By doing this I can get set a thing's active_location property as
@property
def active_location(self):
    """
    Returns the currently active Location by sorting the effective_start
    field. Thing eagerly loads location_association, which eagerly loads
    location, which will hopefully prevent N+1 query problems.
    """
    active_location = sorted(
        self.location_associations, key=lambda x: x.effective_start
    )
    return active_location[0].location if active_location else None
  • added missing fields to thing schemas and tests
  • added some validations for well_depth, hole_depth, and well_casing_depth and tested those validations

Notes

Any special considerations, workarounds, or follow-up work to note?

  • Because so many updates have been made the well and water level transfer scripts have been broken. After this PR has been amended, merged, and accepted I'll open a PR for both water level and thing transfer updates. Then I'll work on adding more metadata to the thing model
  • because I eagerly load as lazy="joined" I needed to invoke .unique() when using SQL Alchemy's select (not necessary for query)
  • because a thing's active_location is now part of the object functions they don't need to be retrieved in thing_helper functions

when a different object needs to call a thing, like a contact,
the `active_location` should be present in the thing response. this
is achieved by eagerly loading the `active_location`
@codecov-commenter

codecov-commenter commented Oct 8, 2025

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 99.04762% with 1 line in your changes missing coverage. Please review.
✅ All tests successful. No failed tests found.

Files with missing lines Patch % Lines
services/thing_helper.py 85.71% 1 Missing ⚠️
Files with missing lines Coverage Δ
api/search.py 97.77% <100.00%> (ø)
api/thing.py 98.46% <ø> (ø)
db/location.py 95.00% <100.00%> (ø)
db/thing.py 100.00% <100.00%> (ø)
schemas/thing.py 99.15% <100.00%> (ø)
services/geospatial_helper.py 74.46% <100.00%> (ø)
tests/conftest.py 98.59% <100.00%> (ø)
tests/test_geospatial.py 88.23% <100.00%> (ø)
tests/test_observation.py 95.65% <ø> (ø)
tests/test_thing.py 100.00% <100.00%> (ø)
... and 1 more

@TylerAdamMartinez TylerAdamMartinez left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, lgtm

Comment thread schemas/thing.py


class ValidateWell(BaseModel):
well_depth: float | None = None # in feet

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
well_depth: float | None = None # in feet
well_depth_in_feet: float | None = Field(None, alias="well_depth")

This change would improve clarity by using _in_feet to make units explicit, reducing ambiguity and preventing any future unit-mixing errors, while the alias preserves database compatibility. It also makes the API more 'self-documenting.'

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than add the unit to the field name I can add another field called well_depth_unit and have it automatically be set to ft. Awhile ago we made the decision to not include units in the names, so I'll proceed with this method to keep a consistent style.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fields can have descriptions, so I'll add that to document it for create/update

Comment thread schemas/thing.py Outdated
well_depth: float | None = None # in feet
hole_depth: float | None = None # in feet
well_construction_notes: str | None = None
well_casing_diameter: float | None = None # in inches

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same thing here. Could you add the same level of expressiveness by including _in_inches to make the units explicit and consistent with the other fields?

Comment thread db/thing.py
"""
Returns the currently active Location by sorting the effective_start
field. Thing eagerly loads location_association, which eagerly loads
location, which will hopefully prevent N+1 query problems.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤞

Comment thread db/thing.py
cascade="all, delete-orphan",
passive_deletes=True,
order_by="LocationThingAssociation.effective_start.desc()",
lazy="joined",

@ksmuczynski ksmuczynski Oct 8, 2025

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Won't this fetch the entire location history for a Thing every time you load it? I would think we would only want a list of historical locations if it's explicitly asked for, which is what the default lazy loading provides.

I understand lazy loading can cause N+1 query problems if you loop through many Things and access the history for each one, but is that really something that will be done very often?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That does load the entire history location for a Thing every time thing records are returned from the database. To access the active (current?) location, though, we need to have access to all location associations so that we can get the latest one (currently based off of effective_start).

If the user gets a list of things, whether from /thing, /thing/water-well, or /thing/spring, extra queries will need to be performed for every object in that list. Weaver, for example, fetches many thousands of wells at a time, so if we use lazy loading there'd be many thousands of extra queries made, which would impact the performance of the API.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I apologize if I'm misinterpreting this discussion, but is the current/active location something that we can cache on the server side so that the location history query doesn't have to happen every time a list of things is requested (with a reasonable TTL)? Along those same lines - the many thousands of things with active locations potentially most relevant to the /geospatial endpoint for requesting many geojsons on a map or in a shapefile all at once (the Weaver story you highlighted - eventually maps in Ocotillo). So perhaps a cache or precalculated results would help with that once we arrive at working on that type of user story?

@jacob-a-brown jacob-a-brown Oct 8, 2025

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By eagerly loading the location associations only one query is made; if done correctly it shouldn't impact the performance of the API. Where it can run into issues, I think, is if there is too much data being transferred. I don't think that's too much of an issue for the time being since the LocationThingAssociation and Location tables are pretty small, but it could become a problem in the future.

There are other eager loading techniques I can look into, like requesting specific columns (I think), but right now I think all of a Location's fields are in the response.

Would the caching technique be difficult to implement?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm checking with CoPilot if there are methods to keep the active/current location as an attribute, but avoid eager loading and N+1. If I can get that to work I'll push these updates

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like the two best options, if we want to take care of this via the API, are:

  1. use eager loading (new method)
  2. write explicit queries (previous method)

The issues with 1 is if too much data is loaded. That is, if a thing has hella associations. Wells only have one location, so this is a moot point for the time being. Rain gages may move, but do they move all that much? Would other types of things have many location associations?

The issue with 2 is that this needs to be implemented every place where a thing is needed, which is cumbersome and error-prone. (This was how I first identified the issue because a contact response has its associated things, and none of those had active locations loaded)

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds like you're on a good track then 👍. We don't need to make anything more complicated when we don't have a concrete use case or story showing otherwise.

Comment thread db/thing.py Outdated
)

@property
def active_location(self):

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do you feel about current_location instead of active_location?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm okay with either name. @jirhiker do you have a preference?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

current_location is fine. Its perhaps less ambiguous than active_location

Comment thread db/thing.py Outdated
Comment thread db/thing.py Outdated
location, which will hopefully prevent N+1 query problems.
"""
active_location = sorted(
self.location_associations, key=lambda x: x.effective_start

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would sorting by effective_end have any advantages? An effective_end of NULL indicates an active/current location. Just curious since that's how I was imagining querying for current location.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In updating the retrieval function i also made sure that the effective_end is None

return (
    active_location[0].location
    if active_location and active_location[0].effective_end is None
    else None
)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My concern would be someone would forget to set effective_end and since effective_start is required I thought that'd be a safer way to sort

@jacob-a-brown jacob-a-brown requested review from TylerAdamMartinez, chasetmartin and ksmuczynski and removed request for chasetmartin October 8, 2025 20:59
@jacob-a-brown

Copy link
Copy Markdown
Contributor Author

@jirhiker @ksmuczynski @chasetmartin @TylerAdamMartinez

In making these updates I realized that the /geospatial endpoint would return multiple records for a thing if it had multiple location associations. I amended that section of the code (/services/geospatial_helper.py) so that if there are multiple location associations only the current/active geometry is used. This wasn't in the original PR and should be reviewed if you are re-reviewing.

@jirhiker

jirhiker commented Oct 8, 2025

Copy link
Copy Markdown
Member

@jacob-a-brown @chasetmartin @ksmuczynski What use case or UI feature leads to this statement "a ThingResponse should always have an active_location attribute, no matter if that response is from the /thing router or contained in another response"

@jacob-a-brown

jacob-a-brown commented Oct 8, 2025

Copy link
Copy Markdown
Contributor Author

Upon reflection I wasn't really thinking about a UI feature. I wanted to get everything standardized so that every time the API needs a particular response or object (e.g. ThingResponse) it will always have information (for the same record), no matter where it is invoked. I thought that if something was missing and it needed to be used elsewhere that it would be difficult to track down, so when I ran into this issue I went ahead and made the updates. Looking at the contact table in Ocotillo, clicking on the thing takes you to the detail page, so active_location (soon to be current) is truly needed there.

@jacob-a-brown

jacob-a-brown commented Oct 8, 2025

Copy link
Copy Markdown
Contributor Author

In light of those needs (or lack thereof), is a reversion recommended? Or should I keep eager loading for standardizing the ThingResponse every time it is invoked?

@jirhiker

jirhiker commented Oct 9, 2025

Copy link
Copy Markdown
Member

I think your intuition is correct with regards to having a consistent ThingResponse. Can you do some performance testing to ensure that this does not have any negative effects

@jacob-a-brown

jacob-a-brown commented Oct 9, 2025

Copy link
Copy Markdown
Contributor Author

I did some testing locally and don't see any impacts, though sometimes doing these things locally can mask them (I'm still haunted by expand, but that was due to an N+1 issue [which I'm trying to avoid with eager loading])

@jirhiker jirhiker closed this pull request by merging all changes into staging in 75a2193 Oct 10, 2025
@TylerAdamMartinez TylerAdamMartinez deleted the jab-thing-update-fix branch February 5, 2026 18:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants