Merged
83 commits
24fadc5
Merge branch 'feature/jir' into dev/jab
jacob-a-brown Sep 20, 2024
ed7a74a
Added note to README.md for new sources to make tests
jacob-a-brown Sep 20, 2024
2d8530f
Convert MultiPolygon to Polygon for ST connector
jacob-a-brown Sep 20, 2024
08bb9ce
bumped version
jirhiker Sep 20, 2024
5300e9a
handle multipolygons
jirhiker Sep 23, 2024
7c3a4c8
handle multipolygons
jirhiker Sep 23, 2024
eb9f923
Convert MultiPolygon to Polygon for ST2
jacob-a-brown Sep 23, 2024
3f0dd87
Merge branch 'feature/jir' into dev/jab
jacob-a-brown Sep 23, 2024
bf1b4c3
Merge branch 'dev/jir' into dev/jab
jacob-a-brown Sep 23, 2024
ec7a288
Added comment about contains in the transformer
jacob-a-brown Sep 24, 2024
5d7f451
Only change poly from MultiPolygon to Polygon by checking type
jacob-a-brown Sep 24, 2024
fb602c9
User site.id instead of site.name in reporting to be consistent
jacob-a-brown Sep 24, 2024
f9b771b
Start enumeration at 1 to get the correct number of sites for limits
jacob-a-brown Sep 24, 2024
3383a22
Only count sites toward site_limit if they have associated records
jacob-a-brown Sep 24, 2024
4ffa71e
Pass waterlevels tests
jacob-a-brown Sep 24, 2024
202b99a
Docstring for BaseTransformer._transform
jacob-a-brown Sep 25, 2024
7efceae
Changed config for BoR tests as it is in Otero County
jacob-a-brown Sep 25, 2024
20473bd
Added log and warn to transformer to communicate information to user
jacob-a-brown Sep 25, 2024
afda0b4
Updated docstring and type hints for BaseSiteSource._transform_sites
jacob-a-brown Sep 25, 2024
2d02e4a
Renamed parent_record site_record for clarity
jacob-a-brown Sep 25, 2024
b05098e
Disallow None values - including them breaks statistics/summaries
jacob-a-brown Nov 1, 2024
d9c1a13
Revert to old NMBGMR API until /v1 is live
jacob-a-brown Nov 1, 2024
7aa2d80
Ignore output files
jacob-a-brown Nov 1, 2024
b6d01b4
Enable user to specify single timeseries option for a single file
jacob-a-brown Nov 1, 2024
1807cd6
Add source, id, and parameter unit to timeseries files
jacob-a-brown Nov 1, 2024
084462b
Only add headers for first line of timeseries file
jacob-a-brown Nov 1, 2024
4d101df
Skip check() and discover() until implemented
jacob-a-brown Nov 1, 2024
8a8aa10
Enable user to choose separate timeseries files, or single timeseries…
jacob-a-brown Nov 1, 2024
9f4dbb5
Enable parameter record id to be same as site id | mg/L CaCO3 unit co…
jacob-a-brown Nov 4, 2024
cf09bee
Persist logs & warnings | Log site indices for user to track | Skip s…
jacob-a-brown Nov 4, 2024
5f6cc3b
Clean ST2 waterlevel records - only return if not None
jacob-a-brown Nov 5, 2024
02d045f
Keep non-numeric results if not summarize, keep for timeseries data
jacob-a-brown Nov 5, 2024
8e7ab1c
Add horizontal_datum to summary records
jacob-a-brown Nov 5, 2024
f422d78
Add horizontal_datum to summary records
jacob-a-brown Nov 5, 2024
bb5e4ee
Log if no sites are found
jacob-a-brown Nov 5, 2024
2d84c61
Catch value errors when converting units
jacob-a-brown Nov 5, 2024
5c093ff
Log/warn failed to convert units message
jacob-a-brown Nov 5, 2024
25cf6fa
Restrict WQP data to Water samples only
jacob-a-brown Nov 5, 2024
d65765c
Restrict WQP sites to New Mexico only
jacob-a-brown Nov 5, 2024
c277aab
Fixed statecode typo
jacob-a-brown Nov 5, 2024
dedc224
catch CaCO3 units edgecase for WQP data
jacob-a-brown Nov 5, 2024
16c6473
Provide site id for skipped converted unit items
jacob-a-brown Nov 5, 2024
05f16cb
Fixed error where records were added to time series that shouldve bee…
jacob-a-brown Nov 5, 2024
7c4ce17
Write configuration to output.logs.txt
jacob-a-brown Nov 5, 2024
8d9036c
Added missing string from config logs for export
jacob-a-brown Nov 5, 2024
777cfd0
Format/CICD for push to GitHub for branch dev/jab
jacob-a-brown Nov 5, 2024
4cd7154
Formatting changes
jacob-a-brown Nov 5, 2024
b7d4a0c
Removed redundant print message
jacob-a-brown Nov 5, 2024
bbc06bf
Clarify warning that no records were found for a site
jacob-a-brown Nov 6, 2024
b0b9e20
Add Silica analyte
jacob-a-brown Nov 6, 2024
01990ff
Silica mapping for all but dwb
jacob-a-brown Nov 6, 2024
064aa3c
Merge branch 'dev/jab' of https://github.com/DataIntegrationGroup/Dat…
jacob-a-brown Nov 6, 2024
a250b91
Conversion factor=1 for CaCO3 if unit is mg/L as CaCO3, else warn user
jacob-a-brown Nov 7, 2024
7933b35
DWB analyte mappings
jacob-a-brown Nov 7, 2024
7015139
Formatting changes
jacob-a-brown Nov 7, 2024
3c00090
Update conversion factor for HCO3- when unit are mg/L as CaCO3
jacob-a-brown Nov 11, 2024
8e3cfd8
Clarified docstrings | Simplified gathering log
jacob-a-brown Nov 11, 2024
4fe499c
Make log message more informative
jacob-a-brown Nov 11, 2024
fbe2595
Update NMBGMR URLs for new API
jacob-a-brown Nov 11, 2024
41e7dc0
Handle DWB non-detects
jacob-a-brown Nov 11, 2024
082e583
Formatting changes
jacob-a-brown Nov 11, 2024
0dc1bde
Added silica to list of available analytes
jacob-a-brown Nov 14, 2024
a5f8d15
Merge pull request #6 from DataIntegrationGroup/dev/jab
jirhiker Nov 14, 2024
69b04fa
Report date of record whose units could not be converted
jacob-a-brown Nov 15, 2024
2be5227
Formatting changes
jacob-a-brown Nov 15, 2024
249ef69
updated logging
jirhiker Nov 19, 2024
ae76fd5
Merge branch 'dev/jir' into dev/jab
jacob-a-brown Nov 19, 2024
ead62e3
Start of OSE file retrieval implementation - work still to do
jacob-a-brown Nov 25, 2024
6aa327d
Account for Excel date number format for OSE Roswell data (Ft Sumner)
jacob-a-brown Nov 25, 2024
a180c1a
Formatting changes
jacob-a-brown Nov 25, 2024
b5d81af
Updated sources for README.md
jacob-a-brown Dec 2, 2024
756f91b
Removed EBID from sources in README.md as it's not yet available
jacob-a-brown Dec 2, 2024
d5497ca
Update timeseries flags to be more user friendly
jacob-a-brown Dec 3, 2024
d807a92
Disallow sites with lat or lon values of 0
jacob-a-brown Dec 5, 2024
49f331a
Deprecation of old logs/warnings
jacob-a-brown Dec 20, 2024
db30a14
Formatting changes
jacob-a-brown Dec 20, 2024
f08ac42
Updated urllib3 version requirement
jacob-a-brown Jan 7, 2025
723f5ee
Updated README.md
jacob-a-brown Jan 9, 2025
49d4c61
Updated to README
jacob-a-brown Jan 10, 2025
e719c75
Updated README to communicate table headers
jacob-a-brown Jan 10, 2025
71cddc7
Bump version to 0.2.0 because of some breaking changes
jacob-a-brown Jan 10, 2025
73b121a
Updated README
jacob-a-brown Jan 10, 2025
9442a19
Merge pull request #7 from DataIntegrationGroup/dev/jab
jirhiker Jan 10, 2025
2 changes: 1 addition & 1 deletion .github/workflows/cicd.yml
@@ -5,7 +5,7 @@ name: CI/CD

 on:
   push:
-    branches: [ "main", "feature/jir"]
+    branches: [ "main", "feature/jir", "dev/jab"]
   pull_request:
     branches: [ "main"]

4 changes: 2 additions & 2 deletions .github/workflows/format_code.yml
@@ -1,9 +1,9 @@
 name: Format code
 on:
   pull_request:
-    branches: [feature/jir, ]
+    branches: [feature/jir]
   push:
-    branches: [feature/jir,]
+    branches: [feature/jir, dev/jab]
 jobs:
   format:
     runs-on: ubuntu-latest
9 changes: 9 additions & 0 deletions .gitignore
@@ -169,3 +169,12 @@ cython_debug/
# and can be added to the global gitignore or merged into this file. For a more nuclear
# option (not recommended) you can uncomment the following to ignore the entire idea folder.
#.idea/

# outputs
output_timeseries
output.combined.csv
output.csv
output.sites.csv
output.timeseries.csv
output.logs.txt
output.warnings.txt
237 changes: 193 additions & 44 deletions README.md
@@ -5,19 +5,25 @@


![NMWDI](https://newmexicowaterdata.org/wp-content/uploads/2023/11/newmexicowaterdatalogoNov2023.png)
![NMBGMR](https://waterdata.nmt.edu/static/nmbgmr_logo_resized.png)
![NMBGMR](https://waterdata.nmt.edu/latest/static/nmbgmr_logo_resized.png)


This package provides a command line interface to the New Mexico Water Data Initiative's Data Integration Engine. This tool is used to integrate water data from multiple sources.

## Installation
```bash
pip install nmuwd
```

## Sources
Data comes from the following sources. We are continuously adding new sources as we learn of them and they become available. If you have data that you would like to be part of the Data Integration Engine please get in touch at newmexicowaterdata@nmt.edu.

- [Bureau of Reclamation](https://data.usbr.gov/)
- [USGS (NWIS)](https://waterdata.usgs.gov/nwis)
- [ST2 (NMWDI)](https://st2.newmexicowaterdata.org/FROST-Server/v1.1/)
- Pecos Valley Artesian Conservancy District
- Elephant Butte Irrigation District
- Bernalillo County
- New Mexico Environment Department Drinking Water Bureau
- [NM Water Data CKAN catalog](https://catalog.newmexicowaterdata.org/)
- OSE Roswell District Office
- ISC Seven Rivers
@@ -27,62 +33,53 @@ This package provides a command line interface to New Mexico Water Data Initiaiv
- EPA
- and over 400 state, federal, tribal, and local agencies

## Installation

```bash
pip install nmuwd
```

## Usage
### Water Levels
### Source Inclusion & Exclusion
The Data Integration Engine enables the user to obtain groundwater level and groundwater quality data from a variety of sources. Data from sources are included in the output unless specifically excluded. The following flags are available to exclude a specific data source:

Get water levels for a county. Return a summary csv
```bash
weave waterlevels --county eddy
```
Get water levels for a bounding box. Return a summary csv
```bash
weave waterlevels --bbox -106.5 32.5 -106.0 33.0
```
- `--no-amp` to exclude New Mexico Bureau of Geology and Mineral Resources Aquifer Mapping Program (AMP) data
- `--no-bor` to exclude Bureau of Reclamation data
- `--no-nwis` to exclude USGS NWIS data
- `--no-pvacd` to exclude Pecos Valley Artesian Conservancy District (PVACD) data
- `--no-isc-seven-rivers` to exclude Interstate Stream Commission (ISC) Seven Rivers data
- `--no-wqp` to exclude Water Quality Portal (WQP) data
- `--no-ckan` to exclude NM OSE Roswell data that is hosted on CKAN
- `--no-dwb` to exclude New Mexico Environment Department Drinking Water Bureau (DWB) data
- `--no-bernco` to exclude Bernalillo County (BernCo) data

### Water Levels

Get water levels for a county. Return timeseries of water levels for each site
```bash
weave waterlevels --county eddy --timeseries
```
To obtain groundwater levels, use

Exclude a specific data source
```bash
weave waterlevels --county eddy --no-amp
```

Exclude multiple data sources
```bash
weave waterlevels --county eddy --no-amp --no-nwis
weave waterlevels
```

Available data source flags:
- --no-amp
- --no-bor
- --no-ckan
- --no-dwb
- --no-isc-seven-rivers
- --no-nwis
- --no-pvacd
- --no-wqp
- --no-bernco
followed by the desired output type, source filters, date filters, geographic filters, and excluded data sources.

#### Available Data Sources
The following data sources are available for groundwater levels:

- amp
- bor
- ckan
- dwb
- isc-seven-rivers
- nwis
- pvacd
- bernco

### Water Quality
```bash
weave analytes TDS --county eddy
To obtain groundwater quality, use

```
```bash
weave analytes TDS --county eddy --no-bor
weave analytes {analyte}
```

Available analytes:
where `{analyte}` is the name of the analyte whose data is to be retrieved.

#### Available Analytes
The following analytes are currently available for retrieval:
- Arsenic
- Bicarbonate
- Calcium
@@ -92,7 +89,159 @@ Available analytes:
- Nitrate
- pH
- Potassium
- Silica
- Sodium
- Sulfate
- TDS
- Uranium
- Uranium

#### Available Data Sources
The following data sources are available for analytes, though not every source has measurements for every analyte:
- bor
- wqp
- isc-seven-rivers
- amp
- dwb

### Geographic Filters

The following flags can be used to geographically filter data:

```
--county {county name}
```

```
--bbox 'x1 y1, x2 y2'
```

### Date Filters

The following flags can be used to filter by dates:

```
--start-date YYYY-MM-DD
```

```
--end-date YYYY-MM-DD
```
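
As a rough illustration of how these filters combine, the following Python sketch assembles a `weave` command line from the flags documented above. The helper `build_weave_command` is hypothetical and not part of `nmuwd`; it assumes the flags are spelled `--county`, `--start-date`, `--end-date`, and `--no-{source}` as in this README, and that the order of options does not matter. It only builds and prints the command; it does not run it.

```python
import shlex
from datetime import date


def build_weave_command(subcommand, analyte=None, county=None,
                        start_date=None, end_date=None, exclude=()):
    """Assemble an argument list for the weave CLI (hypothetical helper)."""
    args = ["weave", subcommand]
    if analyte is not None:
        # `weave analytes {analyte}` takes the analyte name as an argument
        args.append(analyte)
    if county is not None:
        args += ["--county", county]
    for flag, value in (("--start-date", start_date), ("--end-date", end_date)):
        if value is not None:
            # Raises ValueError unless the value is a valid YYYY-MM-DD date
            date.fromisoformat(value)
            args += [flag, value]
    # Each excluded source becomes a --no-{source} flag, e.g. --no-bor
    args += [f"--no-{source}" for source in exclude]
    return args


command = build_weave_command(
    "analytes",
    analyte="TDS",
    county="eddy",
    start_date="2020-01-01",
    end_date="2020-12-31",
    exclude=["bor"],
)
print(shlex.join(command))
```

The resulting list can be passed to `subprocess.run` if the command should actually be executed.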

## Output
The data is saved to the current working directory. A log of the inputs and processes, called `die.log`, is also saved to the current working directory. If a subsequent process is run and the log from the previous process has not been moved or stored elsewhere, the log for the subsequent process will be appended to the existing log.

### Timeseries Data
The flag `--separated_timeseries` exports the timeseries for each location to its own file in the directory output_series (e.g. `AB-0002.csv`, `AB-0003.csv`). Locations with only one observation are gathered and exported to the file `output.combined.csv`.

The flag `--unified_timeseries` exports all timeseries for all locations in one file titled `output.timeseries.csv`. It also exports a file titled `output.sites.csv` that contains site information, such as latitude, longitude, and elevation.
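
The split described for `--separated_timeseries` can be pictured with the short Python sketch below. It is an illustration of the documented behavior only, not the package's implementation: the function `partition_sites` and the sample records are hypothetical, and records are assumed to be dictionaries carrying an `id` field.

```python
from collections import defaultdict


def partition_sites(records):
    """Split records into per-site series and single-observation records."""
    by_site = defaultdict(list)
    for record in records:
        by_site[record["id"]].append(record)
    # Sites with more than one observation would each get their own file
    separate = {sid: recs for sid, recs in by_site.items() if len(recs) > 1}
    # Sites with exactly one observation would be gathered into one table
    combined = [recs[0] for recs in by_site.values() if len(recs) == 1]
    return separate, combined


# Made-up sample records for illustration
records = [
    {"id": "AB-0002", "date_measured": "2024-01-15"},
    {"id": "AB-0002", "date_measured": "2024-06-15"},
    {"id": "AB-0003", "date_measured": "2024-03-01"},
]
separate, combined = partition_sites(records)
print(sorted(separate))              # site ids that have their own series
print([r["id"] for r in combined])   # site ids with a single observation
```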

#### Table Headers: Unified

The table headers for unified timeseries data are as follows:

**output.sites.csv**
- `source`: the organization/source for the site
- `id`: the id of the site. The id is used as the key to join the output.timeseries.csv table
- `name`: the colloquial name for the site if it exists
- `latitude`: latitude in decimal degrees
- `longitude`: the longitude in decimal degrees
- `elevation`: ground surface elevation of the site in feet
- `elevation_units`: the units of the ground surface elevation. Defaults to ft
- `horizontal_datum`: horizontal datum of the latitude and longitude. Defaults to WGS84
- `vertical_datum`: the vertical datum of the elevation
- `usgs_site_id`: USGS site id if it exists
- `alternate_site_id`: alternate site id if it exists
- `formation`: geologic formation in which the well terminates if it exists
- `aquifer`: aquifer from which the well draws water if it exists
- `well_depth`: depth of well if it exists

**output.timeseries.csv - waterlevels**
- `source`: the organization/sources for the site
- `id`: the id of the site. The id is used as the key to join the output.sites.csv table
- `depth_to_water_ft_below_ground_surface`: depth to water below ground surface in ft
- `date_measured`: date of measurement in YYYY-MM-DD format
- `time_measured`: time of measurement if it exists

**output.timeseries.csv - analytes**
- `source`: the organization/sources for the site
- `id`: the id of the site. The id is used as the key to join the output.sites.csv table
- `parameter`: the name of the analyte whose measurements are reported in the table. This corresponds to the requested analyte
- `parameter_value`: value of the measurement
- `parameter_units`: units of the measurement
- `date_measured`: date of measurement in YYYY-MM-DD format
- `time_measured`: time of measurement if it exists
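
Because `id` is the join key between the two unified tables, they can be recombined with a few lines of Python. The sketch below uses only the standard library and made-up sample rows (the site names, coordinates, and values are illustrative, and only a subset of the columns is shown); in practice the two `io.StringIO` objects would be replaced by the opened `output.sites.csv` and `output.timeseries.csv` files.

```python
import csv
import io

# Made-up sample data using a subset of the documented headers
SITES_CSV = """source,id,name,latitude,longitude
example_source,AB-0002,Example Well A,32.5,-104.2
example_source,AB-0003,Example Well B,32.6,-104.3
"""

TIMESERIES_CSV = """source,id,depth_to_water_ft_below_ground_surface,date_measured,time_measured
example_source,AB-0002,45.1,2024-01-15,
example_source,AB-0002,46.3,2024-06-15,
example_source,AB-0003,12.0,2024-03-01,
"""


def join_timeseries_to_sites(sites_file, timeseries_file):
    """Attach site information to every timeseries row, keyed on `id`."""
    sites = {row["id"]: row for row in csv.DictReader(sites_file)}
    joined = []
    for observation in csv.DictReader(timeseries_file):
        site = sites.get(observation["id"])
        if site is None:
            continue  # skip observations without a matching site row
        joined.append({**site, **observation})
    return joined


rows = join_timeseries_to_sites(io.StringIO(SITES_CSV), io.StringIO(TIMESERIES_CSV))
for row in rows:
    print(row["id"], row["latitude"], row["date_measured"],
          row["depth_to_water_ft_below_ground_surface"])
```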

#### Table Headers: Separated

The files for the individual sites contain the same headers as **output.timeseries.csv** from the unified time series tables.

**output.combined.csv - waterlevels**
- `source`: the organization/source for the site
- `id`: the id of the site. The id is used as the key to join the output.timeseries.csv table
- `name`: the colloquial name for the site if it exists
- `latitude`: latitude in decimal degrees
- `longitude`: the longitude in decimal degrees
- `elevation`: ground surface elevation of the site in feet
- `elevation_units`: the units of the ground surface elevation. Defaults to ft
- `horizontal_datum`: horizontal datum of the latitude and longitude. Defaults to WGS84
- `vertical_datum`: the vertical datum of the elevation
- `usgs_site_id`: USGS site id if it exists
- `alternate_site_id`: alternate site id if it exists
- `formation`: geologic formation in which the well terminates if it exists
- `aquifer`: aquifer from which the well draws water if it exists
- `well_depth`: depth of well if it exists
- `depth_to_water_ft_below_ground_surface`: depth to water below ground surface in ft
- `date_measured`: date of measurement in YYYY-MM-DD format
- `time_measured`: time of measurement if it exists

**output.combined.csv - analytes**
- `source`: the organization/source for the site
- `id`: the id of the site. The id is used as the key to join the output.timeseries.csv table
- `name`: the colloquial name for the site if it exists
- `latitude`: latitude in decimal degrees
- `longitude`: the longitude in decimal degrees
- `elevation`: ground surface elevation of the site in feet
- `elevation_units`: the units of the ground surface elevation. Defaults to ft
- `horizontal_datum`: horizontal datum of the latitude and longitude. Defaults to WGS84
- `vertical_datum`: the vertical datum of the elevation
- `usgs_site_id`: USGS site id if it exists
- `alternate_site_id`: alternate site id if it exists
- `formation`: geologic formation in which the well terminates if it exists
- `aquifer`: aquifer from which the well draws water if it exists
- `well_depth`: depth of well if it exists
- `parameter`: the name of the analyte whose measurements are reported in the table. This corresponds to the requested analyte
- `parameter_value`: value of the measurement
- `parameter_units`: units of the measurement
- `date_measured`: date of measurement in YYYY-MM-DD format
- `time_measured`: time of measurement if it exists

### Summary Data

If neither of the above flags is specified, a summary table called `output.csv` is exported. The summary table consists of location information as well as summary statistics for the parameter of interest for every location that has observations.

#### Table Headers: Summary

**output.csv - waterlevels and analytes**
- `source`: the organization/source for the site
- `id`: the id of the site. The id is used as the key to join the output.timeseries.csv table
- `location`: the colloquial name for the site if it exists
- `usgs_site_id`: USGS site id if it exists
- `alternate_site_id`: alternate site id if it exists
- `latitude`: latitude in decimal degrees
- `longitude`: the longitude in decimal degrees
- `horizontal_datum`: horizontal datum of the latitude and longitude. Defaults to WGS84
- `elevation`: ground surface elevation of the site in feet
- `elevation_units`: the units of the ground surface elevation. Defaults to ft
- `well_depth`: depth of well if it exists
- `well_depth_units`: units of well depth. Defaults to ft
- `parameter`: the name of the analyte whose measurements are reported in the table. This corresponds to the requested analyte
- `parameter_value`: value of the measurement
- `parameter_units`: units of the measurement
- `nrecords`: the number of records for the site
- `min`: the minimum record for the site
- `max`: the maximum record for the site
- `mean`: the mean value for the records at the site
- `most_recent_date`: date of most recent record
- `most_recent_time`: time of most recent record if it exists
- `most_recent_value`: the value of the most recent record
- `most_recent_units`: the units of the most recent record
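
To make the summary columns concrete, the following Python sketch derives `nrecords`, `min`, `max`, `mean`, and the most recent record from a list of observations for one site. It is a minimal illustration under stated assumptions, not the package's own code: the function `summarize` and the sample values are hypothetical, and it relies on `date_measured` being in YYYY-MM-DD format so that string sorting matches chronological order.

```python
from statistics import mean


def summarize(records):
    """Compute summary statistics for one site's records (illustrative)."""
    # ISO dates (YYYY-MM-DD) sort chronologically as plain strings
    ordered = sorted(records, key=lambda r: r["date_measured"])
    values = [r["parameter_value"] for r in ordered]
    latest = ordered[-1]
    return {
        "nrecords": len(values),
        "min": min(values),
        "max": max(values),
        "mean": mean(values),
        "most_recent_date": latest["date_measured"],
        "most_recent_value": latest["parameter_value"],
    }


# Made-up sample records for a single site
records = [
    {"date_measured": "2024-06-15", "parameter_value": 47.0},
    {"date_measured": "2024-01-15", "parameter_value": 45.0},
    {"date_measured": "2024-09-30", "parameter_value": 46.0},
]
summary = summarize(records)
print(summary)
```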