diff --git a/docs/getting-started/installation.md b/docs/getting-started/installation.md index c51dcb1..8f66379 100644 --- a/docs/getting-started/installation.md +++ b/docs/getting-started/installation.md @@ -30,7 +30,7 @@ If you plan to work with shapefiles, choropleths, or any geometry (the `geometry uv add "pypums[spatial]" ``` -This adds [`geopandas`](https://geopandas.org/) and its dependencies (`shapely`, `pyproj`, `fiona`), which enable PyPUMS to fetch TIGER/Line shapefiles and return `GeoDataFrame` objects. +This adds [`geopandas`](https://geopandas.org/) and [`pygris`](https://github.com/walkerke/pygris) (plus their dependencies), enabling `geometry=True` to return `GeoDataFrame` objects. pygris handles shapefile downloads with automatic local caching — files are only downloaded once. ## Get a Census API key @@ -124,7 +124,7 @@ Run this in your terminal to confirm PyPUMS is installed: python -c "import pypums; print(pypums.__version__)" ``` -You should see the version number printed (e.g. `0.2`). +You should see the version number printed (e.g. `0.3`). To verify the CLI is available: @@ -166,7 +166,7 @@ If this prints your key without raising an error, you are ready to go. If empty, follow the [configuration steps](#configure-your-api-key) above. -??? question "I get `ImportError: geopandas` when using `geometry=True`" +??? question "I get `ImportError: geopandas` or `ImportError: pygris` when using `geometry=True`" You need the spatial extras. Install them with: ```bash diff --git a/docs/getting-started/quickstart.md b/docs/getting-started/quickstart.md index 6cadcdf..26f94a8 100644 --- a/docs/getting-started/quickstart.md +++ b/docs/getting-started/quickstart.md @@ -91,7 +91,7 @@ print(type(la_poverty)) # 2. **variables** -- `B17001_002` is the count of people whose income is below the poverty level (from table B17001). 3. **state** -- Required for tract-level queries so the API knows which state to pull tracts from. 4. 
**county** -- `"037"` is the FIPS code for Los Angeles County. Use `pypums.datasets.fips.lookup_fips(state="California", county="Los Angeles County")` to look up codes. -5. **geometry** -- When `True`, PyPUMS fetches TIGER/Line cartographic boundary shapefiles and merges them with the data. The result is a `GeoDataFrame` with a `geometry` column. +5. **geometry** -- When `True`, PyPUMS downloads cartographic boundary shapefiles (via pygris, cached locally) and merges them with the data. The result is a `GeoDataFrame` with a `geometry` column. Now plot it with [Altair](https://altair-viz.github.io/): @@ -211,11 +211,13 @@ The resulting map shows poverty counts by Census tract across Los Angeles County } ``` -!!! info "What is a TIGER/Line shapefile?" +!!! info "How does `geometry=True` work?" The Census Bureau publishes free geographic boundary files called - TIGER/Line shapefiles. When you set `geometry=True`, PyPUMS automatically - downloads the correct shapefile for your geography level and year, then - joins it to your data on the `GEOID` column. + TIGER/Line shapefiles. When you set `geometry=True`, PyPUMS uses + [pygris](https://github.com/walkerke/pygris) to download the correct + shapefile for your geography level and year, then joins it to your data + on the `GEOID` column. Files are cached locally so subsequent calls are + fast. **What to try next:** [Spatial Data & Mapping guide](../guides/spatial.md) for dot-density maps, population-weighted interpolation, and custom resolutions. diff --git a/docs/guides/acs-data.md b/docs/guides/acs-data.md index ad3978a..052976c 100644 --- a/docs/guides/acs-data.md +++ b/docs/guides/acs-data.md @@ -410,8 +410,8 @@ age_with_total["share"] = ( ## Geometry support -Set `geometry=True` to return a GeoDataFrame with TIGER/Line -cartographic boundary shapes joined to your data. +Set `geometry=True` to return a GeoDataFrame with cartographic boundary +shapes joined to your data (downloaded via pygris, cached locally). 
```python ca_counties_geo = pypums.get_acs( @@ -759,4 +759,4 @@ See the [Finding Variables](variables.md) guide for full details. - [Finding Variables](variables.md) — Discovering variable codes with `load_variables()` - [Geography & FIPS](geography.md) — Understanding geography levels and FIPS code lookups - [Margins of Error](margins-of-error.md) — MOE propagation formulas and statistical testing -- [Spatial Data](spatial.md) — Attaching TIGER/Line geometry to query results +- [Spatial Data](spatial.md) — Attaching geometry to query results diff --git a/docs/guides/caching.md b/docs/guides/caching.md index 5d939cb..cb06081 100644 --- a/docs/guides/caching.md +++ b/docs/guides/caching.md @@ -114,9 +114,20 @@ structure: api/ # API response cache (get_acs, get_decennial, etc.) variables/ # Variable table cache (load_variables) pums_vars/ # PUMS variable dictionary cache - geography/ # Geometry / shapefile cache ``` +!!! note "Shapefile cache is managed by pygris" + When you use `geometry=True`, shapefiles are cached separately by + [pygris](https://github.com/walkerke/pygris) in its own directory: + + - **macOS:** `~/Library/Caches/pygris/` + - **Linux:** `~/.cache/pygris/` + - **Windows:** `C:\Users\{user}\AppData\Local\pygris\Cache\` + + To clear the shapefile cache, delete that directory manually. + `CensusCache.clear()` only clears the PyPUMS API response caches above, + not the pygris shapefile cache. 
+ You can inspect the cache directory at any time: ```bash diff --git a/docs/guides/decennial-data.md b/docs/guides/decennial-data.md index 4bd9403..117dec8 100644 --- a/docs/guides/decennial-data.md +++ b/docs/guides/decennial-data.md @@ -228,7 +228,7 @@ print(hispanic_pop.head()) ## Geometry support -Set `geometry=True` to return a GeoDataFrame with TIGER/Line shapes: +Set `geometry=True` to return a GeoDataFrame with cartographic boundary shapes: ```python county_geo = pypums.get_decennial( @@ -605,5 +605,5 @@ print(pop_vars[["name", "label"]].head(5)) - [API Reference](../reference/api.md) — Full `get_decennial()` function signature - [Geography & FIPS](geography.md) — Understanding geography levels and FIPS code lookups -- [Spatial Data](spatial.md) — Attaching TIGER/Line geometry to query results +- [Spatial Data](spatial.md) — Attaching geometry to query results - [ACS Data](acs-data.md) — For when you need margins of error and more recent annual data diff --git a/docs/guides/migration-flows.md b/docs/guides/migration-flows.md index e4a2986..99f87b4 100644 --- a/docs/guides/migration-flows.md +++ b/docs/guides/migration-flows.md @@ -21,7 +21,7 @@ df = get_flows( state=None, # state FIPS, abbreviation, or name county=None, # county FIPS code msa=None, # metropolitan statistical area code - geometry=False, # attach TIGER/Line shapes + geometry=False, # attach cartographic boundary shapes moe_level=90, # confidence level: 90, 95, or 99 cache_table=False, # cache API response on disk show_call=False, # print the API URL @@ -218,8 +218,8 @@ where z-scores are 1.645 (90%), 1.960 (95%), and 2.576 (99%). ## Geometry support -Set `geometry=True` to attach TIGER/Line cartographic boundary shapes to -the origin geography. This returns a GeoDataFrame you can plot directly: +Set `geometry=True` to attach cartographic boundary shapes to the origin +geography. 
This returns a GeoDataFrame you can plot directly: ```python ca_flows_geo = get_flows( diff --git a/docs/guides/population-estimates.md b/docs/guides/population-estimates.md index 576ec59..37580cc 100644 --- a/docs/guides/population-estimates.md +++ b/docs/guides/population-estimates.md @@ -439,7 +439,7 @@ print(housing.head(3)) ## Geometry support -Set `geometry=True` to get a GeoDataFrame with TIGER/Line shapes: +Set `geometry=True` to get a GeoDataFrame with cartographic boundary shapes: ```python pop_geo = pypums.get_estimates( diff --git a/docs/guides/spatial.md b/docs/guides/spatial.md index 7eb2355..51aa2f5 100644 --- a/docs/guides/spatial.md +++ b/docs/guides/spatial.md @@ -1,12 +1,14 @@ # Spatial Data & Mapping -PyPUMS can attach TIGER/Line cartographic boundary shapefiles to any Census +PyPUMS can attach Census cartographic boundary shapefiles to any Census query, returning a `GeoDataFrame` that is ready for mapping, spatial joins, and -geospatial analysis. +geospatial analysis. Shapefile downloads are handled by +[pygris](https://github.com/walkerke/pygris), with automatic local caching so +files are only downloaded once. !!! info "Requires the spatial extras" - Geometry features depend on **geopandas** and its dependencies (`shapely`, - `pyproj`, `fiona`). Install them with: + Geometry features depend on **geopandas** and **pygris** (plus their + dependencies). Install them with: ```bash uv add "pypums[spatial]" @@ -18,8 +20,9 @@ geospatial analysis. The fastest way to get spatial data is to pass `geometry=True` to any of the main data retrieval functions. PyPUMS will automatically download the -corresponding TIGER/Line shapefile, merge it with the tabular data on the -`GEOID` column, and return a `GeoDataFrame`. +corresponding cartographic boundary shapefile (via pygris), merge it with the +tabular data on the `GEOID` column, and return a `GeoDataFrame`. Downloaded +shapefiles are cached locally so subsequent calls are fast. 
=== "get_acs()" @@ -226,7 +229,7 @@ corresponding TIGER/Line shapefile, merge it with the tabular data on the ## Coordinate reference system All geometry returned by PyPUMS is in **NAD83 (EPSG:4269)**, which is the -native CRS of the Census Bureau's TIGER/Line files. You can verify this on any +native CRS of the Census Bureau's cartographic boundary files. You can verify this on any `GeoDataFrame`: ```python @@ -291,7 +294,7 @@ You will often need to reproject to a different CRS depending on your use case. ## Shapefile resolution -TIGER/Line cartographic boundary files come in three resolutions. The default +Cartographic boundary files come in three resolutions. The default is `500k`, which strikes a good balance between detail and file size. | Resolution | Description | Best for | @@ -308,7 +311,7 @@ always uses `500k`. ## Supported geographies -The following geography levels have matching TIGER/Line shapefiles: +The following geography levels have matching cartographic boundary shapefiles: | Geography | Requires `state`? | Notes | |------------------------|--------------------|----------------------------------------| @@ -325,8 +328,8 @@ The following geography levels have matching TIGER/Line shapefiles: !!! warning "Sub-state geographies require the `state` parameter" For `tract`, `block group`, `place`, and `puma`, the Census Bureau - publishes shapefiles per state. PyPUMS needs the `state` parameter to - know which file to download. + publishes shapefiles per state. PyPUMS will raise a `ValueError` if you + omit the `state` parameter for these geography levels. --- @@ -409,6 +412,7 @@ attach_geometry( state=None, # state FIPS or abbreviation year=2023, # data year resolution="500k", # "500k", "5m", or "20m" + cache=True, # cache shapefiles locally (via pygris) ) -> GeoDataFrame ``` @@ -960,8 +964,10 @@ memory. 
A few tips for working with large datasets: (1318, 6) ``` - - **Cache your queries** with `cache_table=True` so repeated runs do not - re-fetch the shapefile from the Census Bureau servers. + - **Shapefile caching is automatic.** PyPUMS uses pygris with caching + enabled, so shapefiles are downloaded once and reused from a local + cache directory (`~/.cache/pygris/` on Linux, + `~/Library/Caches/pygris/` on macOS). --- @@ -969,12 +975,16 @@ memory. A few tips for working with large datasets: **`ImportError: geopandas is required for spatial operations`** : Install the spatial extra: `uv add "pypums[spatial]"`. This pulls in - `geopandas`, `shapely`, and `pyproj`. + `geopandas`, `pygris`, `shapely`, and `pyproj`. + +**`ValueError: geography='tract' requires a state parameter`** +: Sub-state geographies (`tract`, `block group`, `place`, `puma`) need a + `state` argument. Pass a FIPS code, abbreviation, or full name. **Geometry column is all `None`** -: The Census TIGER/Line server may not have shapefiles for the geography - level and year you requested. Try a different year or a broader - geography (e.g., county instead of block group). +: The Census Bureau may not have shapefiles for the geography level and + year you requested. Try a different year or a broader geography (e.g., + county instead of block group). **CRS mismatch when combining DataFrames** : All PyPUMS geometry is returned in EPSG:4269 (NAD83). If you are diff --git a/docs/index.md b/docs/index.md index 2657d72..d7c6631 100644 --- a/docs/index.md +++ b/docs/index.md @@ -46,7 +46,7 @@ print(df.head()) --- - Add `geometry=True` to any query and get a GeoDataFrame with TIGER/Line boundaries. + Add `geometry=True` to any query and get a GeoDataFrame with cartographic boundary shapes (via pygris, cached locally). 
[:octicons-arrow-right-24: Spatial guide](guides/spatial.md) diff --git a/docs/migration/from-census-ftp.md b/docs/migration/from-census-ftp.md index e1ba055..a79b1e1 100644 --- a/docs/migration/from-census-ftp.md +++ b/docs/migration/from-census-ftp.md @@ -73,7 +73,7 @@ df = pypums.get_pums( | **Survey design** | Manual SDR calculation | `to_survey()` helper | | **Caching** | Manage files yourself | Built-in with TTL | | **Summary tables** | Not available (PUMS only) | `get_acs()` for pre-tabulated data | -| **Geometry** | Separate TIGER/Line download | `geometry=True` parameter | +| **Geometry** | Separate TIGER/Line download | `geometry=True` (pygris handles the download) | ## When You Still Need FTP diff --git a/docs/migration/from-old-pypums.md b/docs/migration/from-old-pypums.md index d668eff..65dcd13 100644 --- a/docs/migration/from-old-pypums.md +++ b/docs/migration/from-old-pypums.md @@ -79,7 +79,7 @@ pypums estimates state -s CA 1. **No more large file downloads** — The Census API returns exactly the data you need, not gigabyte-sized CSV files 2. **More data sources** — Access ACS summary tables, Decennial Census, population estimates, and migration flows in addition to PUMS microdata -3. **Geometry support** — Get GeoDataFrames with TIGER/Line boundaries in a single call +3. **Geometry support** — Get GeoDataFrames with cartographic boundary geometry in a single call (powered by pygris) 4. 
**Feature parity with tidycensus** — If you can do it in R with tidycensus, you can now do it in Python with PyPUMS ## The ACS Class Still Works (For Now) diff --git a/docs/migration/from-tidycensus.md b/docs/migration/from-tidycensus.md index b9609c7..8a8310b 100644 --- a/docs/migration/from-tidycensus.md +++ b/docs/migration/from-tidycensus.md @@ -236,7 +236,7 @@ If you've used Kyle Walker's [tidycensus](https://walker-data.com/tidycensus/) R | **Variable naming** | Can rename inline: `c(medincome = "B19013_001")` | Use standard variable codes; rename with pandas after | | **County parameter** | Accepts county names: `county = "Los Angeles"` | Uses FIPS codes: `county="037"`. Use `lookup_fips()` to find codes | | **Output type** | tibble / sf object | pandas DataFrame / GeoDataFrame | -| **Spatial CRS** | Varies by function | Always NAD83 (EPSG:4269) from TIGER/Line | +| **Spatial CRS** | Varies by function | Always NAD83 (EPSG:4269) via pygris | | **Plotting** | ggplot2 / tmap | Altair / geopandas | | **PUMS download** | Downloads CSV files from FTP | Queries Census API directly (faster for filtered requests) | | **Survey design** | Returns `tbl_svy` (srvyr package) | Returns `SurveyDesign` object with SDR methods | @@ -246,6 +246,6 @@ If you've used Kyle Walker's [tidycensus](https://walker-data.com/tidycensus/) R - Same Census API under the hood - Same variable codes (B19013_001, P1_001N, etc.) - Same geography names ("state", "county", "tract", etc.) 
-- Same TIGER/Line shapefiles for geometry +- Same Census cartographic boundary files for geometry (via pygris) - Same MOE formulas from the ACS Handbook - Same replicate weight methodology (SDR with 80 weights) diff --git a/docs/reference/changelog.md b/docs/reference/changelog.md index eabcfe8..1ebc43c 100644 --- a/docs/reference/changelog.md +++ b/docs/reference/changelog.md @@ -1,5 +1,51 @@ # Changelog +## 0.3.1 (2026) + +### Changed + +- **Shapefile downloads now use pygris** — The internal `_fetch_tiger_shapes()` + function now delegates to [pygris](https://github.com/walkerke/pygris) instead + of manually constructing Census Bureau URLs. The public API (`geometry=True`, + `attach_geometry()`) is unchanged. + +### Added + +- **Automatic shapefile caching** — Downloaded shapefiles are cached locally via + pygris (`~/Library/Caches/pygris/` on macOS, `~/.cache/pygris/` on Linux). + Repeated `geometry=True` calls no longer re-download files. +- **`cache` parameter** on `attach_geometry()` — Pass `cache=False` to force a + fresh download. +- **Clear error for missing `state`** — Sub-state geographies (`tract`, + `block group`, `place`, `puma`) now raise a `ValueError` with a helpful message + when `state` is omitted. + +### Fixed + +- **ZCTA and PUMA geometry for pre-2020 years** — Previously broken due to + hardcoded 2020 vintage suffixes. pygris handles vintage selection correctly. +- **Congressional district year mapping** — Previously used a hardcoded formula. + pygris handles this internally. + +### Improved + +- **Broader year range** — Geometry support extended from ~2014+ to ~1990+ for + most geography levels. +- **Faster county downloads** — County-level queries now pass `state` through to + pygris when provided, downloading a smaller state-specific file. + +### Dependencies + +- Added `pygris>=0.1.7,<1` to the `spatial` optional dependency group. + +--- + +## 0.3 (2026) + +Version bump. No user-facing changes from 0.2. 
+ +--- + ## 0.2 (2026) Major release with complete Census API feature parity with R's tidycensus. @@ -49,7 +95,7 @@ Major release with complete Census API feature parity with R's tidycensus. - `moe_product()` — MOE for derived products - `significance()` — Statistical significance testing -- **Spatial support** — TIGER/Line cartographic boundary integration +- **Spatial support** — Cartographic boundary integration - `attach_geometry()` — Merge shapefiles with Census data - `as_dot_density()` — Dot-density point conversion - `interpolate_pw()` — Population-weighted areal interpolation diff --git a/pyproject.toml b/pyproject.toml index c8f8b46..aac9e2f 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -30,7 +30,7 @@ Changelog = "https://github.com/chekos/pypums/releases" pypums = "pypums.cli:cli" [project.optional-dependencies] -spatial = ["geopandas>=0.12"] +spatial = ["geopandas>=0.12", "pygris>=0.1.7,<1"] test = ["pytest"] docs = [ "mkdocs>=1.6,<2", diff --git a/pypums/spatial.py b/pypums/spatial.py index 26a7cfa..10cc73c 100644 --- a/pypums/spatial.py +++ b/pypums/spatial.py @@ -1,93 +1,139 @@ """Spatial/geometry support for Census data. -Provides TIGER/Line shapefile fetching and merging, plus -dot-density conversion for thematic mapping. +Provides shapefile fetching (via `pygris `_) +and merging, plus dot-density conversion and areal interpolation for thematic +mapping. -Requires ``geopandas`` (optional dependency). +Requires the ``spatial`` optional dependency group (``geopandas`` + ``pygris``). """ from __future__ import annotations -from typing import TYPE_CHECKING +from typing import TYPE_CHECKING, Any if TYPE_CHECKING: + from collections.abc import Callable + import geopandas as gpd import pandas as pd -# TIGER/Line cartographic boundary base URL. -_TIGER_BASE = "https://www2.census.gov/geo/tiger" - -# Mapping of pypums geography names to TIGER/Line shapefile identifiers. 
-_GEO_TO_TIGER: dict[str, str] = { - "state": "cb_{year}_us_state_{resolution}", - "county": "cb_{year}_us_county_{resolution}", - "tract": "cb_{year}_{state_fips}_tract_{resolution}", - "block group": "cb_{year}_{state_fips}_bg_{resolution}", - "place": "cb_{year}_{state_fips}_place_{resolution}", - "congressional district": "cb_{year}_us_cd{congress}_{resolution}", - "zcta": "cb_{year}_us_zcta520_{resolution}", - "puma": "cb_{year}_{state_fips}_puma20_{resolution}", - "cbsa": "cb_{year}_us_cbsa_{resolution}", - "csa": "cb_{year}_us_csa_{resolution}", + +def _pygris_func(name: str) -> Callable[..., Any]: + """Lazily import a pygris function by name.""" + import pygris + + func = getattr(pygris, name, None) + if func is None: + raise ImportError( + f"pygris>=0.1.7 is required but does not expose '{name}'. " + "Upgrade with: pip install --upgrade pygris" + ) + return func + + +# geography -> (pygris_func, accepts_state, accepts_resolution, requires_state) +_GEO_TO_PYGRIS: dict[str, tuple[str, bool, bool, bool]] = { + "state": ("states", False, True, False), + "county": ("counties", True, True, False), + "tract": ("tracts", True, False, True), + "block group": ("block_groups", True, False, True), + "place": ("places", True, False, True), + "congressional district": ("congressional_districts", False, True, False), + "zcta": ("zctas", False, False, False), + "puma": ("pumas", True, False, True), + "cbsa": ("core_based_statistical_areas", False, True, False), + "csa": ("combined_statistical_areas", False, True, False), } +def _normalize_geoid(gdf: gpd.GeoDataFrame) -> gpd.GeoDataFrame: + """Ensure the GeoDataFrame has a ``GEOID`` column. + + pygris may return ``GEOID20`` or ``GEOID10`` depending on the geography + and vintage year. This normalizes to ``GEOID``. 
+ """ + if "GEOID" in gdf.columns: + return gdf + + for candidate in ("GEOID20", "GEOID10"): + if candidate in gdf.columns: + return gdf.rename(columns={candidate: "GEOID"}) + + msg = ( + "pygris returned a GeoDataFrame without a GEOID column. " + f"Available columns: {list(gdf.columns)}" + ) + raise ValueError(msg) + + def _fetch_tiger_shapes( geography: str, *, state: str | None = None, year: int = 2023, resolution: str = "500k", + cache: bool = True, ) -> gpd.GeoDataFrame: - """Download TIGER/Line cartographic boundary shapefiles. + """Download cartographic boundary shapefiles via pygris. Parameters ---------- geography Geography level name. state - State FIPS code (required for sub-state geographies). + State FIPS code, abbreviation, or name (required for sub-state + geographies). year Data year for the shapefiles. resolution - Resolution: ``"500k"``, ``"5m"``, or ``"20m"``. + Resolution: ``"500k"``, ``"5m"``, or ``"20m"``. Only applies to + geographies that support multiple resolutions (state, county, + congressional district, cbsa, csa). + cache + If True (default), cache downloaded shapefiles locally so + subsequent calls are fast. Returns ------- gpd.GeoDataFrame - Shapefile geometries with a ``GEOID`` column. + Shapefile geometries with a ``GEOID`` column in EPSG:4269. """ - import geopandas as _gpd - geo = geography.lower() - # Build the shapefile URL. - template = _GEO_TO_TIGER.get(geo) - if template is None: + entry = _GEO_TO_PYGRIS.get(geo) + if entry is None: + raise ValueError(f"No shapefile mapping for geography: {geography!r}") + + func_name, accepts_state, accepts_resolution, requires_state = entry + + if requires_state and state is None: raise ValueError( - f"No TIGER/Line shapefile mapping for geography: {geography!r}" + f"geography={geography!r} requires a state parameter. " + "Pass a state FIPS code, abbreviation, or name." ) - # Resolve state FIPS if needed. 
- state_fips = state or "us" - if state and not state.isdigit(): - from pypums.api.geography import _resolve_state_fips + func = _pygris_func(func_name) - state_fips = _resolve_state_fips(state) + kwargs: dict[str, Any] = { + "cb": True, + "year": year, + "cache": cache, + } + if accepts_resolution: + kwargs["resolution"] = resolution + if accepts_state and state is not None: + kwargs["state"] = state - # Congress number changes every 2 years starting from the 113th (2013). - congress = str(113 + (year - 2013) // 2) if year >= 2013 else "113" + gdf = func(**kwargs) - filename = template.format( - year=year, - state_fips=state_fips, - resolution=resolution, - congress=congress, - ) + # Normalize GEOID column name across vintages. + gdf = _normalize_geoid(gdf) - url = f"{_TIGER_BASE}/GENZ{year}/shp/{filename}.zip" + # Guarantee EPSG:4269 (NAD83) as documented. to_crs() raises on a missing + # CRS, so a GeoDataFrame without one is assumed to already be in NAD83. + gdf = gdf.set_crs(epsg=4269) if gdf.crs is None else gdf.to_crs(epsg=4269) - return _gpd.read_file(url) + return gdf def attach_geometry( @@ -97,8 +143,9 @@ def attach_geometry( state: str | None = None, year: int = 2023, resolution: str = "500k", + cache: bool = True, ) -> gpd.GeoDataFrame: - """Fetch TIGER/Line shapes and merge with Census tabular data. + """Fetch shapes via pygris and merge with Census tabular data. Parameters ---------- @@ -112,6 +159,8 @@ def attach_geometry( Data year. resolution Shapefile resolution. + cache + If True (default), cache downloaded shapefiles locally. 
Returns ------- @@ -125,6 +174,7 @@ def attach_geometry( state=state, year=year, resolution=resolution, + cache=cache, ) if "GEOID" not in df.columns: diff --git a/uv.lock b/uv.lock index 617e047..ad0e4ae 100644 --- a/uv.lock +++ b/uv.lock @@ -1095,6 +1095,20 @@ wheels = [ { url = "https://files.pythonhosted.org/packages/c7/21/705964c7812476f378728bdf590ca4b771ec72385c533964653c68e86bdc/pygments-2.19.2-py3-none-any.whl", hash = "sha256:86540386c03d588bb81d44bc3928634ff26449851e99741617ecb9037ee5ec0b", size = 1225217, upload-time = "2025-06-21T13:39:07.939Z" }, ] +[[package]] +name = "pygris" +version = "0.2.1" +source = { registry = "https://pypi.org/simple" } +dependencies = [ + { name = "geopandas" }, + { name = "platformdirs" }, + { name = "requests" }, +] +sdist = { url = "https://files.pythonhosted.org/packages/4b/c6/138d75978264144d11cfce5e03b1751c02d10ce5674a031d57f2573ac069/pygris-0.2.1.tar.gz", hash = "sha256:d0a6893b60a4f10c1fda1939228450a89b4c6f6c7a7afcbad441da24666d30d0", size = 52170, upload-time = "2025-12-27T15:06:35.281Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/64/8a/183cdcf7492b164d4e93abeb2c9af979c85e297aba5224395a023b793f4f/pygris-0.2.1-py3-none-any.whl", hash = "sha256:34645b3cc456f6b761326979f0bac98f780931e72956bb6c86e742895908beca", size = 58320, upload-time = "2025-12-27T15:06:36.444Z" }, +] + [[package]] name = "pymdown-extensions" version = "10.21" @@ -1310,6 +1324,7 @@ docs = [ ] spatial = [ { name = "geopandas" }, + { name = "pygris" }, ] test = [ { name = "pytest" }, @@ -1328,6 +1343,7 @@ requires-dist = [ { name = "mkdocstrings", extras = ["python"], marker = "extra == 'docs'" }, { name = "pandas", specifier = ">=1.1.0" }, { name = "pyarrow", specifier = ">=10.0.0" }, + { name = "pygris", marker = "extra == 'spatial'", specifier = ">=0.1.7,<1" }, { name = "pymdown-extensions", marker = "extra == 'docs'" }, { name = "pytest", marker = "extra == 'test'" }, { name = "rich", specifier = ">=11.0.0" },
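
Reviewer note: the keyword-argument dispatch that `_fetch_tiger_shapes()` performs in `pypums/spatial.py` can be exercised without pygris or geopandas installed. The sketch below is a dependency-free illustration of how the `_GEO_TO_PYGRIS` flags decide which arguments are forwarded. It makes a few assumptions: `build_pygris_kwargs` is a hypothetical helper name that does not exist in PyPUMS, the table is abbreviated to three geographies, and nothing is downloaded because the pygris function is only named, never called.

```python
from typing import Any, Optional

# Abbreviated copy of the mapping introduced in the diff:
# geography -> (pygris_func, accepts_state, accepts_resolution, requires_state)
GEO_TO_PYGRIS: dict[str, tuple[str, bool, bool, bool]] = {
    "state": ("states", False, True, False),
    "county": ("counties", True, True, False),
    "tract": ("tracts", True, False, True),
}


def build_pygris_kwargs(
    geography: str,
    *,
    state: Optional[str] = None,
    year: int = 2023,
    resolution: str = "500k",
    cache: bool = True,
) -> tuple[str, dict[str, Any]]:
    """Hypothetical helper: return the pygris function name and its kwargs."""
    entry = GEO_TO_PYGRIS.get(geography.lower())
    if entry is None:
        raise ValueError(f"No shapefile mapping for geography: {geography!r}")

    func_name, accepts_state, accepts_resolution, requires_state = entry

    # Sub-state geographies fail fast instead of attempting a download.
    if requires_state and state is None:
        raise ValueError(f"geography={geography!r} requires a state parameter.")

    # Always request cartographic boundary files (cb=True), as the diff does.
    kwargs: dict[str, Any] = {"cb": True, "year": year, "cache": cache}
    if accepts_resolution:
        kwargs["resolution"] = resolution
    if accepts_state and state is not None:
        kwargs["state"] = state
    return func_name, kwargs


print(build_pygris_kwargs("county", state="CA"))
print(build_pygris_kwargs("tract", state="06", resolution="20m"))
```

Under this mapping, a `resolution` passed for `tract` is silently dropped because that entry's `accepts_resolution` flag is `False`, while `county` forwards both `state` and `resolution`. Whether a silent drop or a warning is preferable there is a design choice for the PR author.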