NO TICKET: thing update and fix#179
Conversation
when a different object needs to call a thing, like a contact, the `active_location` should be present in the thing response. this is achieved by eagerly loading the `active_location`
Codecov Report❌ Patch coverage is
|
|
|
||
|
|
||
| class ValidateWell(BaseModel): | ||
| well_depth: float | None = None # in feet |
There was a problem hiding this comment.
| well_depth: float | None = None # in feet | |
| well_depth_in_feet: float | None = Field(None, alias="well_depth") |
This change would improve clarity by using _in_feet to make units explicit, reducing ambiguity and preventing any future unit-mixing errors, while the alias preserves database compatibility. It also makes the API more 'self-documenting.'
There was a problem hiding this comment.
Rather than add the unit to the field name I can add another field called well_depth_unit and have it automatically be set to ft. Awhile ago we made the decision to not include units in the names, so I'll proceed with this method to keep a consistent style.
There was a problem hiding this comment.
Fields can have descriptions, so I'll add that to document it for create/update
| well_depth: float | None = None # in feet | ||
| hole_depth: float | None = None # in feet | ||
| well_construction_notes: str | None = None | ||
| well_casing_diameter: float | None = None # in inches |
There was a problem hiding this comment.
Same thing here. Could you add the same level of expressiveness by including _in_inches to make the units explicit and consistent with the other fields?
| """ | ||
| Returns the currently active Location by sorting the effective_start | ||
| field. Thing eagerly loads location_association, which eagerly loads | ||
| location, which will hopefully prevent N+1 query problems. |
| cascade="all, delete-orphan", | ||
| passive_deletes=True, | ||
| order_by="LocationThingAssociation.effective_start.desc()", | ||
| lazy="joined", |
There was a problem hiding this comment.
Won't this fetch the entire location history for a Thing every time you load it? I would think we would only want a list of historical locations if it's explicitly asked for, which is what the default lazy loading provides.
I understand lazy loading can cause N+1 query problems if you loop through many Things and access the history for each one, but is that really something that will be done very often?
There was a problem hiding this comment.
That does load the entire history location for a Thing every time thing records are returned from the database. To access the active (current?) location, though, we need to have access to all location associations so that we can get the latest one (currently based off of effective_start).
If the user gets a list of things, whether from /thing, /thing/water-well, or /thing/spring, extra queries will need to be performed for every object in that list. Weaver, for example, fetches many thousands of wells at a time, so if we use lazy loading there'd be many thousands of extra queries made, which would impact the performance of the API.
There was a problem hiding this comment.
I apologize if I'm misinterpreting this discussion, but is the current/active location something that we can cache on the server side so that the location history query doesn't have to happen every time a list of things is requested (with a reasonable TTL)? Along those same lines - the many thousands of things with active locations potentially most relevant to the /geospatial endpoint for requesting many geojsons on a map or in a shapefile all at once (the Weaver story you highlighted - eventually maps in Ocotillo). So perhaps a cache or precalculated results would help with that once we arrive at working on that type of user story?
There was a problem hiding this comment.
By eagerly loading the location associations only one query is made; if done correctly it shouldn't impact the performance of the API. Where it can run into issues, I think, is if there is too much data being transferred. I don't think that's too much of an issue for the time being since the LocationThingAssociation and Location tables are pretty small, but it could become a problem in the future.
There are other eager loading techniques I can look into, like requesting specific columns (I think), but right now I think all of a Location's fields are in the response.
Would the caching technique be difficult to implement?
There was a problem hiding this comment.
I'm checking with CoPilot if there are methods to keep the active/current location as an attribute, but avoid eager loading and N+1. If I can get that to work I'll push these updates
There was a problem hiding this comment.
Seems like the two best options, if we want to take care of this via the API, are:
- use eager loading (new method)
- write explicit queries (previous method)
The issues with 1 is if too much data is loaded. That is, if a thing has hella associations. Wells only have one location, so this is a moot point for the time being. Rain gages may move, but do they move all that much? Would other types of things have many location associations?
The issue with 2 is that this needs to be implemented every place where a thing is needed, which is cumbersome and error-prone. (This was how I first identified the issue because a contact response has its associated things, and none of those had active locations loaded)
There was a problem hiding this comment.
Sounds like you're on a good track then 👍. We don't need to make anything more complicated when we don't have a concrete use case or story showing otherwise.
| ) | ||
|
|
||
| @property | ||
| def active_location(self): |
There was a problem hiding this comment.
How do you feel about current_location instead of active_location?
There was a problem hiding this comment.
I'm okay with either name. @jirhiker do you have a preference?
There was a problem hiding this comment.
current_location is fine. Its perhaps less ambiguous than active_location
| location, which will hopefully prevent N+1 query problems. | ||
| """ | ||
| active_location = sorted( | ||
| self.location_associations, key=lambda x: x.effective_start |
There was a problem hiding this comment.
Would sorting by effective_end have any advantages? An effective_end of NULL indicates an active/current location. Just curious since that's how I was imagining querying for current location.
There was a problem hiding this comment.
In updating the retrieval function i also made sure that the effective_end is None
return (
active_location[0].location
if active_location and active_location[0].effective_end is None
else None
)There was a problem hiding this comment.
My concern would be someone would forget to set effective_end and since effective_start is required I thought that'd be a safer way to sort
|
@jirhiker @ksmuczynski @chasetmartin @TylerAdamMartinez In making these updates I realized that the |
|
@jacob-a-brown @chasetmartin @ksmuczynski What use case or UI feature leads to this statement "a ThingResponse should always have an active_location attribute, no matter if that response is from the /thing router or contained in another response" |
|
Upon reflection I wasn't really thinking about a UI feature. I wanted to get everything standardized so that every time the API needs a particular response or object (e.g. |
|
In light of those needs (or lack thereof), is a reversion recommended? Or should I keep eager loading for standardizing the |
|
I think your intuition is correct with regards to having a consistent ThingResponse. Can you do some performance testing to ensure that this does not have any negative effects |
|
I did some testing locally and don't see any impacts, though sometimes doing these things locally can mask them (I'm still haunted by |
Why
This PR addresses the following problem / context:
ThingResponseshould always have anactive_locationattribute, no matter if that response is from the/thingrouter or contained in another response, like theContactResponsethingfields should be testedHow
Implementation summary - the following was changed / added / removed:
thingobject has anactive_locationproperty I eagerly loadlocation_associationsfor eachthing, and for eachlocation_associationI eagerly load thelocation. By doing this I can get set athing'sactive_locationproperty asthingschemas and testswell_depth,hole_depth, andwell_casing_depthand tested those validationsNotes
Any special considerations, workarounds, or follow-up work to note?
thingmodellazy="joined"I needed to invoke.unique()when using SQL Alchemy'sselect(not necessary forquery)thing'sactive_locationis now part of the object functions they don't need to be retrieved inthing_helperfunctions