
feat(spatial): Replace custom TIGER/Line downloads with pygris #308

Merged
chekos merged 4 commits into main from
feat/use-pygris
Mar 13, 2026

@chekos chekos commented Mar 13, 2026

Summary

  • Replace hand-rolled TIGER/Line URL construction in pypums/spatial.py with pygris — the Python port of R's tigris package
  • Adds automatic shapefile caching, better year/vintage handling, and fixes broken ZCTA/PUMA downloads for pre-2020 years
  • Public API (geometry=True, attach_geometry()) is completely unchanged

Changes

  • pypums/spatial.py — Replace _TIGER_BASE + _GEO_TO_TIGER URL templates with _GEO_TO_PYGRIS dispatch to pygris functions. Add _normalize_geoid() for vintage column variants. Add CRS assertion (EPSG:4269). Enable caching by default.
  • pyproject.toml — Add pygris>=0.1.7 to spatial optional deps
  • uv.lock — Updated
  • docs/guides/spatial.md — Updated references to mention pygris, automatic caching, and updated install/troubleshooting sections

Test plan

  • All 167 existing tests pass (mock target _fetch_tiger_shapes preserved)
  • Docs build cleanly with mkdocs build --strict
  • Ruff lint + format pass
  • Manual spot-check: get_acs(geography="county", variables="B01001_001", state="CA", geometry=True) returns a GeoDataFrame

Post-Deploy Monitoring & Validation

No additional operational monitoring required: this is a library dependency swap with no runtime infrastructure.

🤖 Generated with Claude Code

Use pygris for all shapefile downloads instead of manually constructing
Census Bureau URLs. This adds automatic caching, better year/vintage
handling, and fixes broken ZCTA/PUMA downloads for pre-2020 years.

- Replace _GEO_TO_TIGER URL templates with _GEO_TO_PYGRIS function dispatch
- Add GEOID column normalization for vintage variants (GEOID20, GEOID10)
- Add CRS assertion to guarantee EPSG:4269
- Enable pygris caching by default
- Add pygris>=0.1.7 to spatial optional deps
- Update docs to mention pygris and automatic caching

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

claude bot commented Mar 13, 2026

Claude finished @chekos's task in 1m 39s


Code Review: feat(spatial): Replace custom TIGER/Line downloads with pygris

Tasks

  • Gather context and read changed files
  • Review pypums/spatial.py
  • Review pyproject.toml and uv.lock
  • Review docs/guides/spatial.md
  • Post final review

Summary

This is a clean dependency swap — pygris handles URL construction, vintage logic, and caching instead of hand-rolled string templates. The public API is unchanged, the diff is minimal, and the logic is easy to follow. A few issues are worth addressing before merge.


Issues

1. AFFGEOID normalization may silently break merges (medium severity)

pypums/spatial.py:52

for candidate in ("GEOID20", "GEOID10", "AFFGEOID"):
    if candidate in gdf.columns:
        return gdf.rename(columns={candidate: "GEOID"})

AFFGEOID has a different format than GEOID — it is the full legal/statistical area code (e.g., 0400000US06 for California), not the short numeric code (e.g., 06). If AFFGEOID gets aliased to GEOID and then merged with Census tabular data that uses short numeric GEOIDs, every row will be unmatched and the merge will silently produce all-None geometry.

Consider removing AFFGEOID from the candidate list, or adding a format check/strip before renaming.
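
To make the mismatch concrete, here is a minimal pure-Python sketch (not the PyPUMS code). It assumes the AFFGEOID layout implied by the example above, where the short GEOID is whatever follows the `US` marker. `affgeoid_to_geoid` and `left_join_geometry` are hypothetical helpers introduced for illustration, and the table values are placeholders.

```python
def affgeoid_to_geoid(affgeoid: str) -> str:
    """Return the part after the 'US' marker, e.g. '0400000US06' -> '06'."""
    _prefix, sep, geoid = affgeoid.partition("US")
    if not sep or not geoid:
        raise ValueError(f"not an AFFGEOID-style identifier: {affgeoid!r}")
    return geoid


def left_join_geometry(rows, shapes):
    """Left-join tabular rows to shapes on GEOID; unmatched rows get None."""
    return [{**row, "geometry": shapes.get(row["GEOID"])} for row in rows]


# Placeholder tabular data keyed by short GEOIDs.
tabular = [{"GEOID": "06", "estimate": 1}, {"GEOID": "36", "estimate": 2}]

# Shapes keyed by AFFGEOID, as if AFFGEOID had simply been renamed to GEOID.
shapes_aliased = {"0400000US06": "polygon-CA", "0400000US36": "polygon-NY"}
broken = left_join_geometry(tabular, shapes_aliased)  # no key matches

# Stripping the prefix first restores the join.
shapes_fixed = {affgeoid_to_geoid(k): v for k, v in shapes_aliased.items()}
fixed = left_join_geometry(tabular, shapes_fixed)
```

The join raises no error in the broken case; every row simply comes back with `geometry` set to `None`, which is why the failure is easy to miss.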


2. resolution kwarg is passed to all pygris functions — not all accept it (potential TypeError)

pypums/spatial.py:98-103

kwargs: dict[str, Any] = {
    "cb": True,
    "resolution": resolution,
    "year": year,
    "cache": True,
}

resolution is passed unconditionally to every pygris function. Some geographies — notably zctas in certain pygris versions — do not accept a resolution parameter for cartographic boundary files (the Census Bureau only publishes one resolution for ZCTAs). Calling pygris.zctas(cb=True, resolution="500k", year=2023, cache=True) would raise a TypeError if pygris doesn't accept that kwarg.

It's worth verifying which pygris functions accept resolution, and filtering it out where it isn't supported.
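
One way to do that verification is to inspect each function's signature at runtime. The sketch below uses only the standard library. `fake_counties` and `fake_zctas` are stand-ins invented for this example, not real pygris functions, so the result says nothing about what pygris itself accepts.

```python
import inspect
from typing import Any, Callable


def accepted_kwargs(func: Callable[..., Any], kwargs: dict[str, Any]) -> dict[str, Any]:
    """Drop keyword arguments that `func` does not declare."""
    params = inspect.signature(func).parameters
    # A **kwargs catch-all means every keyword is accepted.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return dict(kwargs)
    return {k: v for k, v in kwargs.items() if k in params}


# Stand-in functions (hypothetical signatures, for illustration only).
def fake_counties(state=None, cb=False, resolution="500k", year=None, cache=False):
    return "counties"


def fake_zctas(cb=False, year=None, cache=False):
    return "zctas"


requested = {"cb": True, "resolution": "500k", "year": 2023, "cache": True}
county_kwargs = accepted_kwargs(fake_counties, requested)  # keeps resolution
zcta_kwargs = accepted_kwargs(fake_zctas, requested)       # drops resolution
```

A static per-geography flag, as adopted later in this PR, avoids the runtime introspection at the cost of keeping the table in sync by hand.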


3. _pygris_func raises an unhelpful AttributeError if a function is missing

pypums/spatial.py:21-25

def _pygris_func(name: str) -> Callable[..., Any]:
    """Lazily import a pygris function by name."""
    import pygris
    return getattr(pygris, name)

If a pygris install (e.g., an older version below 0.1.7) doesn't expose the expected function, this raises a bare AttributeError: module 'pygris' has no attribute 'combined_statistical_areas', which gives users no hint about what went wrong.

A simple guard would give a much better error:

func = getattr(pygris, name, None)
if func is None:
    raise ImportError(
        f"pygris>={required_version} is required but does not expose '{name}'. "
        "Upgrade with: pip install 'pypums[spatial]'"
    )
return func

4. No way to opt out of caching

pypums/spatial.py:102

"cache": True,

Caching is hardcoded as always-on. This is a good default but prevents users who need to force a fresh download (e.g., after a Census shapefile is updated, or when debugging) from bypassing the cache. attach_geometry() and _fetch_tiger_shapes() don't expose a cache parameter to callers.

This is a minor usability concern — consider adding cache: bool = True to _fetch_tiger_shapes and attach_geometry.


5. No upper bound on pygris version

pyproject.toml:33

spatial = ["geopandas>=0.12", "pygris>=0.1.7"]

pygris>=0.1.7 allows any future major version. If pygris introduces a breaking API change (e.g., renames core_based_statistical_areas or changes cb/resolution kwargs), the install will stay "compatible" but fail at runtime. Consider whether an upper bound like pygris>=0.1.7,<2 would be appropriate, or at minimum add a comment noting which pygris features are relied upon.


Minor Observations

  • county in _GEO_TO_PYGRIS has accepts_state=True. This is correct and an improvement over the old code, which always fetched all US counties; a state filter is now passed through to pygris when provided.

  • Congress number removal — the old hand-rolled congress = str(113 + (year - 2013) // 2) formula was a known fragility. Delegating this to pygris is the right call.

  • Docs are accurate — the spatial.md caching path note (~/.cache/pygris/ on Linux) reflects pygris defaults correctly.

  • CRS enforcement is good — the explicit to_crs(epsg=4269) normalization on lines 113–114 is defensive and matches the documented guarantee.
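
For reference, the removed congress formula quoted in the observations above maps years to Congress numbers as follows. This is only arithmetic on that expression. Whether each result matches the Congress number used in the boundary file the Census Bureau published for that year is not verified here, and that bookkeeping is what the PR hands off to pygris. `congress_for_year` is a name chosen for this sketch.

```python
def congress_for_year(year: int) -> str:
    """The removed formula, as quoted in the review."""
    return str(113 + (year - 2013) // 2)


# Each Congress number covers an odd year and the even year that follows it.
numbers = {year: congress_for_year(year) for year in range(2013, 2024)}
# 2013 -> "113", 2014 -> "113", 2015 -> "114", ..., 2023 -> "118"
```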


Overall this is a well-motivated change with a clean implementation. The AFFGEOID issue (point 1) is the most likely to cause a silent user-facing bug; the resolution kwarg issue (point 2) is the most likely to cause a hard error. Both are worth fixing before merging.

- Remove AFFGEOID from GEOID normalization candidates (different format
  would silently break merges)
- Only pass resolution kwarg to pygris functions that accept it (tracts,
  block_groups, places, zctas, pumas do not)
- Add helpful ImportError when pygris function is missing
- Add cache parameter to _fetch_tiger_shapes and attach_geometry so
  users can force fresh downloads
- Cap pygris version to <1 to guard against breaking API changes

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

claude bot commented Mar 13, 2026

Claude finished @chekos's task in 1m 10s


Tasks

  • Gather context
  • Read changed files
  • Review changes vs previous feedback
  • Post final review

Follow-up Review

All five issues from the initial review have been addressed. Here's a summary of what changed:

✅ Issue 1 — AFFGEOID normalization (fixed)

_normalize_geoid now only checks GEOID20 and GEOID10, no longer including AFFGEOID. The function also raises a descriptive ValueError when no candidate is found instead of returning a broken GeoDataFrame.
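
The described behavior can be sketched without geopandas by working on column names alone. `resolve_geoid_column` is a hypothetical helper, not the PR's `_normalize_geoid`, and the candidate order (plain `GEOID` first, then the vintage variants) is an assumption.

```python
from typing import Iterable

# Assumed lookup order: the canonical name first, then vintage-suffixed variants.
GEOID_CANDIDATES = ("GEOID", "GEOID20", "GEOID10")


def resolve_geoid_column(columns: Iterable[str]) -> str:
    """Return the first GEOID-like column name, or raise a descriptive error."""
    cols = list(columns)
    for candidate in GEOID_CANDIDATES:
        if candidate in cols:
            return candidate
    raise ValueError(
        f"No GEOID column found; expected one of {GEOID_CANDIDATES}, got {cols}"
    )


found = resolve_geoid_column(["STATEFP", "GEOID20", "geometry"])  # "GEOID20"
```

With a real GeoDataFrame, the result could feed a rename such as `gdf.rename(columns={found: "GEOID"})`. Failing loudly when nothing matches keeps a frame without a usable key from reaching the merge step.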

✅ Issue 2 — resolution kwarg passed unconditionally (fixed)

_GEO_TO_PYGRIS now carries a third boolean flag accepts_resolution. _fetch_tiger_shapes only includes resolution in kwargs when the flag is True, so zctas, pumas, tracts, block_groups, and places are no longer exposed to a potential TypeError.

✅ Issue 3 — Unhelpful AttributeError from _pygris_func (fixed)

getattr(pygris, name, None) with an explicit ImportError is much cleaner than the bare getattr. The message correctly points users to pip install 'pypums[spatial]'.

✅ Issue 4 — No way to opt out of caching (fixed)

cache: bool = True is now a parameter on both _fetch_tiger_shapes and attach_geometry, wired through cleanly.

✅ Issue 5 — Loose pygris version bound (fixed)

Changed to pygris>=0.1.7,<1 in pyproject.toml, giving a sensible major-version ceiling.


Remaining minor observations (non-blocking)

  • _pygris_func error message — The message says "pygris does not expose '{name}'. Upgrade with: pip install 'pypums[spatial]'" but doesn't mention the minimum version (0.1.7). Very minor, but something like "pygris>=0.1.7 is required…" would be slightly more actionable for users who have an older pygris installed.

  • tract/block group require state in practice: accepts_state=True for these is correct, but _fetch_tiger_shapes silently omits state when it's None. pygris will likely raise its own error for national-level tracts, but a proactive check (if accepts_state and state is None: raise ValueError(...)) could give users a clearer message. Not a blocker since the pygris error will surface anyway.


The implementation is clean, the public API is unchanged, and all significant review feedback was incorporated. This looks ready to merge.

- Include minimum version (>=0.1.7) in pygris ImportError message
- Add requires_state validation so tract/block group/place/puma raise
  a clear ValueError when state is missing, instead of a cryptic pygris
  error downstream

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

claude bot commented Mar 13, 2026

Claude finished @chekos's task in 1m 10s


Tasks

  • Gather context
  • Read changed files
  • Review changes vs previous feedback
  • Post final review

Follow-up Review (3rd pass)

All issues and nits from the previous two rounds of review are addressed. Here's a final accounting:

✅ All 5 original issues — resolved

Already confirmed in the second review.

✅ Remaining nits from second review — also resolved

  • _pygris_func error message — The message now reads "pygris>=0.1.7 is required but does not expose '{name}'", which correctly includes the minimum version. This is more actionable than what the second review saw.

  • tract/block group state requirement — The new requires_state flag in _GEO_TO_PYGRIS (4th tuple element) plus the explicit check in _fetch_tiger_shapes:109-113 gives users a clear, proactive error rather than waiting for pygris to raise a less helpful one. place and puma also have requires_state=True, which is correct.


Current state of the code

pypums/spatial.py is clean and well-structured:

  • _pygris_func — lazy import with a helpful ImportError pointing to the correct install command and minimum version
  • _GEO_TO_PYGRIS — 4-tuple (func_name, accepts_state, accepts_resolution, requires_state) is self-documenting and easy to extend
  • _normalize_geoid — covers GEOID, GEOID20, GEOID10; raises ValueError with column list on failure
  • _fetch_tiger_shapes — filters kwargs to only what each pygris function accepts; validates requires_state upfront; normalizes CRS to EPSG:4269
  • attach_geometry — passes cache through; validates GEOID presence before merge
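
The dispatch design summarized above can be sketched as follows. This is an illustration under stated assumptions, not the contents of pypums/spatial.py: `GeoSpec`, `GEO_TO_PYGRIS`, and `build_call` are names chosen here, the table holds only three example entries, and the function returns the call it would make instead of invoking pygris, so it runs without pygris installed.

```python
from typing import Any, NamedTuple, Optional


class GeoSpec(NamedTuple):
    """The 4-tuple described in the review, with named fields."""
    func_name: str
    accepts_state: bool
    accepts_resolution: bool
    requires_state: bool


# Illustrative entries. Flags for county and tract follow the review text;
# the zcta accepts_state value is an assumption made for this sketch.
GEO_TO_PYGRIS = {
    "county": GeoSpec("counties", True, True, False),
    "tract": GeoSpec("tracts", True, False, True),
    "zcta": GeoSpec("zctas", False, False, False),
}


def build_call(
    geography: str,
    year: int,
    state: Optional[str] = None,
    resolution: str = "500k",
    cache: bool = True,
) -> tuple[str, dict[str, Any]]:
    """Validate inputs and return (function name, kwargs) for the fetch."""
    spec = GEO_TO_PYGRIS[geography]
    # Fail early with a clear message instead of a downstream error.
    if spec.requires_state and state is None:
        raise ValueError(f"geography={geography!r} requires a state")
    kwargs: dict[str, Any] = {"cb": True, "year": year, "cache": cache}
    # Only include optional kwargs the target function is flagged to accept.
    if spec.accepts_resolution:
        kwargs["resolution"] = resolution
    if spec.accepts_state and state is not None:
        kwargs["state"] = state
    return spec.func_name, kwargs


name, kwargs = build_call("county", 2023, state="CA", cache=False)
```

Keeping the per-geography rules in one table means adding a geography is a one-line change, and the `cache` argument passes straight through to the eventual download call.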

pyproject.toml: pygris>=0.1.7,<1 is a sensible version bound.

docs/guides/spatial.md: References to "TIGER/Line" updated to "cartographic boundary" where appropriate; caching behavior, install requirements, and troubleshooting section all updated accurately.


This is ready to merge. The implementation is clean, the public API is unchanged, all review feedback was addressed, and the code is easy to understand and extend.

Comprehensive docs sweep across 13 files:
- Replace "TIGER/Line" with "cartographic boundary" or "via pygris"
  in all guides, migration docs, and getting-started pages
- Add 0.3 and 0.3.1 changelog entries with full details
- Update installation docs to mention pygris as spatial dependency
- Add cache=True to attach_geometry() signature in spatial guide
- Add ValueError troubleshooting entry for missing state param
- Update caching guide: remove stale geography/ dir, add pygris
  cache section with OS-specific paths
- Fix example version number (0.2 -> 0.3) in installation.md

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

claude bot commented Mar 13, 2026

Claude finished @chekos's task in 1m 44s


Tasks

  • Gather context and check what changed since last review
  • Review current state of pypums/spatial.py
  • Review other changed files
  • Post final review

Follow-up Review (4th pass — docs sync commit)

The new commit (b3bbcac) is a docs-only sweep with no changes to pypums/spatial.py, pyproject.toml, or uv.lock. The implementation is exactly as approved in the 3rd review pass.

What changed in this commit

A comprehensive pass over 13 documentation files:

  • All "TIGER/Line" references replaced — updated to "cartographic boundary" (for the Census Bureau files themselves) or "via pygris" (for the download mechanism) across all guides, migration docs, and getting-started pages.
  • docs/guides/spatial.md: attach_geometry() signature block now shows the cache=True parameter; troubleshooting section adds a ValueError entry for the missing-state-param error introduced in this PR.
  • docs/guides/caching.md — New note clarifying that CensusCache.clear() only clears PyPUMS API caches; the pygris shapefile cache (in ~/.cache/pygris/ on Linux, ~/Library/Caches/pygris/ on macOS) must be cleared separately.
  • docs/reference/changelog.md — New 0.3 and 0.3.1 entries documenting all changes from this PR (pygris dependency, automatic caching, cache parameter, requires_state validation, ZCTA/PUMA fix, congress district fix).

Accuracy check

All documentation claims are consistent with the implementation:

  • OS-specific pygris cache paths are correct (pygris uses platformdirs defaults)
  • The supported geography table in spatial.md accurately reflects _GEO_TO_PYGRIS (including which geographies have requires_state set)
  • attach_geometry() signature in the guide matches the actual function signature including cache=True

One minor observation (non-blocking)

pyproject.toml still has version = "0.3" but the changelog documents these changes under 0.3.1. This is presumably intentional — the version bump to 0.3.1 would be a follow-up commit after merge. Just worth confirming before tagging a release.


No new issues introduced. This is ready to merge.

@chekos chekos merged commit c8be244 into main Mar 13, 2026
13 checks passed