Merged
6 changes: 3 additions & 3 deletions docs/getting-started/installation.md
@@ -30,7 +30,7 @@ If you plan to work with shapefiles, choropleths, or any geometry (the `geometry
uv add "pypums[spatial]"
```

This adds [`geopandas`](https://geopandas.org/) and its dependencies (`shapely`, `pyproj`, `fiona`), which enable PyPUMS to fetch TIGER/Line shapefiles and return `GeoDataFrame` objects.
This adds [`geopandas`](https://geopandas.org/) and [`pygris`](https://github.com/walkerke/pygris) (plus their dependencies), enabling `geometry=True` to return `GeoDataFrame` objects. pygris handles shapefile downloads with automatic local caching — files are only downloaded once.

## Get a Census API key

@@ -124,7 +124,7 @@ Run this in your terminal to confirm PyPUMS is installed:
python -c "import pypums; print(pypums.__version__)"
```

You should see the version number printed (e.g. `0.2`).
You should see the version number printed (e.g. `0.3`).

To verify the CLI is available:

@@ -166,7 +166,7 @@ If this prints your key without raising an error, you are ready to go.

If empty, follow the [configuration steps](#configure-your-api-key) above.

??? question "I get `ImportError: geopandas` when using `geometry=True`"
??? question "I get `ImportError: geopandas` or `ImportError: pygris` when using `geometry=True`"
You need the spatial extras. Install them with:

```bash
12 changes: 7 additions & 5 deletions docs/getting-started/quickstart.md
@@ -91,7 +91,7 @@ print(type(la_poverty)) # <class 'geopandas.geodataframe.GeoDataFrame'>
2. **variables** -- `B17001_002` is the count of people whose income is below the poverty level (from table B17001).
3. **state** -- Required for tract-level queries so the API knows which state to pull tracts from.
4. **county** -- `"037"` is the FIPS code for Los Angeles County. Use `pypums.datasets.fips.lookup_fips(state="California", county="Los Angeles County")` to look up codes.
5. **geometry** -- When `True`, PyPUMS fetches TIGER/Line cartographic boundary shapefiles and merges them with the data. The result is a `GeoDataFrame` with a `geometry` column.
5. **geometry** -- When `True`, PyPUMS downloads cartographic boundary shapefiles (via pygris, cached locally) and merges them with the data. The result is a `GeoDataFrame` with a `geometry` column.

Now plot it with [Altair](https://altair-viz.github.io/):

@@ -211,11 +211,13 @@ The resulting map shows poverty counts by Census tract across Los Angeles County
}
```

!!! info "What is a TIGER/Line shapefile?"
!!! info "How does `geometry=True` work?"
The Census Bureau publishes free geographic boundary files called
TIGER/Line shapefiles. When you set `geometry=True`, PyPUMS automatically
downloads the correct shapefile for your geography level and year, then
joins it to your data on the `GEOID` column.
TIGER/Line shapefiles. When you set `geometry=True`, PyPUMS uses
[pygris](https://github.com/walkerke/pygris) to download the correct
shapefile for your geography level and year, then joins it to your data
on the `GEOID` column. Files are cached locally so subsequent calls are
fast.
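
The join step described above is, conceptually, a key lookup on `GEOID`. The following is a minimal sketch of that idea using plain dictionaries rather than geopandas; it is not the PyPUMS implementation. The helper names (`tract_geoid`, `join_on_geoid`), the tract code, the estimate, and the geometry string are all hypothetical placeholders. It relies only on the Census convention that a tract GEOID is the 2-digit state FIPS, 3-digit county FIPS, and 6-digit tract code concatenated.

```python
# Illustrative only: a GEOID-keyed join written with plain dicts.
# The real merge operates on DataFrames; every name and value here is hypothetical.


def tract_geoid(state_fips: str, county_fips: str, tract_code: str) -> str:
    """Build an 11-digit tract GEOID: 2-digit state + 3-digit county + 6-digit tract."""
    return state_fips.zfill(2) + county_fips.zfill(3) + tract_code.zfill(6)


def join_on_geoid(rows, shapes):
    """Attach a 'geometry' entry to each data row, matched by GEOID.

    Rows with no matching shape get geometry=None.
    """
    shapes_by_geoid = {shape["GEOID"]: shape["geometry"] for shape in shapes}
    return [
        {**row, "geometry": shapes_by_geoid.get(row["GEOID"])}
        for row in rows
    ]


geoid = tract_geoid("06", "037", "101110")  # California, Los Angeles County, placeholder tract
rows = [
    {"GEOID": geoid, "estimate": 100},          # placeholder estimate
    {"GEOID": "06037999999", "estimate": 50},   # placeholder row with no shape
]
shapes = [{"GEOID": geoid, "geometry": "POLYGON (...)"}]  # stand-in for a real geometry
joined = join_on_geoid(rows, shapes)
print(joined)
```

In this sketch a row whose `GEOID` has no matching shape ends up with `geometry` set to `None`, which is one way an all-`None` geometry column could arise when the boundary file does not cover the requested geography or year.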

**What to try next:** [Spatial Data & Mapping guide](../guides/spatial.md) for dot-density maps, population-weighted interpolation, and custom resolutions.

6 changes: 3 additions & 3 deletions docs/guides/acs-data.md
@@ -410,8 +410,8 @@ age_with_total["share"] = (

## Geometry support

Set `geometry=True` to return a GeoDataFrame with TIGER/Line
cartographic boundary shapes joined to your data.
Set `geometry=True` to return a GeoDataFrame with cartographic boundary
shapes joined to your data (downloaded via pygris, cached locally).

```python
ca_counties_geo = pypums.get_acs(
@@ -759,4 +759,4 @@ See the [Finding Variables](variables.md) guide for full details.
- [Finding Variables](variables.md) — Discovering variable codes with `load_variables()`
- [Geography & FIPS](geography.md) — Understanding geography levels and FIPS code lookups
- [Margins of Error](margins-of-error.md) — MOE propagation formulas and statistical testing
- [Spatial Data](spatial.md) — Attaching TIGER/Line geometry to query results
- [Spatial Data](spatial.md) — Attaching geometry to query results
13 changes: 12 additions & 1 deletion docs/guides/caching.md
@@ -114,9 +114,20 @@ structure:
api/ # API response cache (get_acs, get_decennial, etc.)
variables/ # Variable table cache (load_variables)
pums_vars/ # PUMS variable dictionary cache
geography/ # Geometry / shapefile cache
```

!!! note "Shapefile cache is managed by pygris"
When you use `geometry=True`, shapefiles are cached separately by
[pygris](https://github.com/walkerke/pygris) in its own directory:

- **macOS:** `~/Library/Caches/pygris/`
- **Linux:** `~/.cache/pygris/`
- **Windows:** `C:\Users\{user}\AppData\Local\pygris\Cache\`

To clear the shapefile cache, delete that directory manually.
`CensusCache.clear()` only clears the PyPUMS API response caches above,
not the pygris shapefile cache.
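
Since the shapefile cache has to be cleared by hand, a small helper can make that less error-prone. The sketch below uses only the standard library and simply mirrors the three directories listed in the note; it does not call pygris, and pygris itself may resolve its cache location differently (for example through environment variables). The function names `pygris_cache_dir` and `clear_pygris_cache` are hypothetical.

```python
import platform
import shutil
from pathlib import Path
from typing import Optional


def pygris_cache_dir(system: Optional[str] = None, home: Optional[Path] = None) -> Path:
    """Return the cache directory listed in the note above for the given platform.

    Assumption: these paths mirror the documentation; they are not read from pygris.
    """
    system = system or platform.system()
    home = home or Path.home()
    if system == "Darwin":
        return home / "Library" / "Caches" / "pygris"
    if system == "Windows":
        return home / "AppData" / "Local" / "pygris" / "Cache"
    # Linux and anything else falls back to the ~/.cache convention.
    return home / ".cache" / "pygris"


def clear_pygris_cache(cache_dir: Path) -> bool:
    """Delete the cache directory if it exists; return True if something was removed."""
    if cache_dir.is_dir():
        shutil.rmtree(cache_dir)
        return True
    return False


# Only prints the resolved location; nothing is deleted here.
print(pygris_cache_dir())
```

Calling `clear_pygris_cache(pygris_cache_dir())` would then remove the cached shapefiles, so the next `geometry=True` call downloads them again.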

You can inspect the cache directory at any time:

```bash
4 changes: 2 additions & 2 deletions docs/guides/decennial-data.md
@@ -228,7 +228,7 @@ print(hispanic_pop.head())

## Geometry support

Set `geometry=True` to return a GeoDataFrame with TIGER/Line shapes:
Set `geometry=True` to return a GeoDataFrame with cartographic boundary shapes:

```python
county_geo = pypums.get_decennial(
@@ -605,5 +605,5 @@ print(pop_vars[["name", "label"]].head(5))

- [API Reference](../reference/api.md) — Full `get_decennial()` function signature
- [Geography & FIPS](geography.md) — Understanding geography levels and FIPS code lookups
- [Spatial Data](spatial.md) — Attaching TIGER/Line geometry to query results
- [Spatial Data](spatial.md) — Attaching geometry to query results
- [ACS Data](acs-data.md) — For when you need margins of error and more recent annual data
6 changes: 3 additions & 3 deletions docs/guides/migration-flows.md
@@ -21,7 +21,7 @@ df = get_flows(
state=None, # state FIPS, abbreviation, or name
county=None, # county FIPS code
msa=None, # metropolitan statistical area code
geometry=False, # attach TIGER/Line shapes
geometry=False, # attach cartographic boundary shapes
moe_level=90, # confidence level: 90, 95, or 99
cache_table=False, # cache API response on disk
show_call=False, # print the API URL
@@ -218,8 +218,8 @@ where z-scores are 1.645 (90%), 1.960 (95%), and 2.576 (99%).
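
As a worked illustration of the z-scores quoted in this hunk's context line, the sketch below rescales a margin of error from the 90% level to another confidence level. It assumes the input MOE is published at the 90% level (the ACS convention) and that rescaling is a simple ratio of z-scores; the function name `rescale_moe` is hypothetical and is not part of the PyPUMS API.

```python
# z-scores for each supported confidence level, as quoted in the docs.
Z_SCORES = {90: 1.645, 95: 1.960, 99: 2.576}


def rescale_moe(moe_90: float, moe_level: int) -> float:
    """Convert a 90%-level margin of error to the requested confidence level."""
    if moe_level not in Z_SCORES:
        raise ValueError(f"moe_level must be one of {sorted(Z_SCORES)}, got {moe_level}")
    return moe_90 * Z_SCORES[moe_level] / Z_SCORES[90]


# A 90% MOE of 164.5 corresponds to one standard error of 100,
# so the 95% MOE is 1.960 standard errors.
print(rescale_moe(164.5, 95))
```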

## Geometry support

Set `geometry=True` to attach TIGER/Line cartographic boundary shapes to
the origin geography. This returns a GeoDataFrame you can plot directly:
Set `geometry=True` to attach cartographic boundary shapes to the origin
geography. This returns a GeoDataFrame you can plot directly:

```python
ca_flows_geo = get_flows(
2 changes: 1 addition & 1 deletion docs/guides/population-estimates.md
@@ -439,7 +439,7 @@ print(housing.head(3))

## Geometry support

Set `geometry=True` to get a GeoDataFrame with TIGER/Line shapes:
Set `geometry=True` to get a GeoDataFrame with cartographic boundary shapes:

```python
pop_geo = pypums.get_estimates(
44 changes: 27 additions & 17 deletions docs/guides/spatial.md
@@ -1,12 +1,14 @@
# Spatial Data & Mapping

PyPUMS can attach TIGER/Line cartographic boundary shapefiles to any Census
PyPUMS can attach Census cartographic boundary shapefiles to any Census
query, returning a `GeoDataFrame` that is ready for mapping, spatial joins, and
geospatial analysis.
geospatial analysis. Shapefile downloads are handled by
[pygris](https://github.com/walkerke/pygris), with automatic local caching so
files are only downloaded once.

!!! info "Requires the spatial extras"
Geometry features depend on **geopandas** and its dependencies (`shapely`,
`pyproj`, `fiona`). Install them with:
Geometry features depend on **geopandas** and **pygris** (plus their
dependencies). Install them with:

```bash
uv add "pypums[spatial]"
@@ -18,8 +20,9 @@ geospatial analysis.

The fastest way to get spatial data is to pass `geometry=True` to any of the
main data retrieval functions. PyPUMS will automatically download the
corresponding TIGER/Line shapefile, merge it with the tabular data on the
`GEOID` column, and return a `GeoDataFrame`.
corresponding cartographic boundary shapefile (via pygris), merge it with the
tabular data on the `GEOID` column, and return a `GeoDataFrame`. Downloaded
shapefiles are cached locally so subsequent calls are fast.

=== "get_acs()"

@@ -226,7 +229,7 @@ corresponding TIGER/Line shapefile, merge it with the tabular data on the
## Coordinate reference system

All geometry returned by PyPUMS is in **NAD83 (EPSG:4269)**, which is the
native CRS of the Census Bureau's TIGER/Line files. You can verify this on any
native CRS of the Census Bureau's cartographic boundary files. You can verify this on any
`GeoDataFrame`:

```python
@@ -291,7 +294,7 @@ You will often need to reproject to a different CRS depending on your use case.

## Shapefile resolution

TIGER/Line cartographic boundary files come in three resolutions. The default
Cartographic boundary files come in three resolutions. The default
is `500k`, which strikes a good balance between detail and file size.

| Resolution | Description | Best for |
@@ -308,7 +311,7 @@ always uses `500k`.

## Supported geographies

The following geography levels have matching TIGER/Line shapefiles:
The following geography levels have matching cartographic boundary shapefiles:

| Geography | Requires `state`? | Notes |
|------------------------|--------------------|----------------------------------------|
@@ -325,8 +328,8 @@ The following geography levels have matching TIGER/Line shapefiles:

!!! warning "Sub-state geographies require the `state` parameter"
For `tract`, `block group`, `place`, and `puma`, the Census Bureau
publishes shapefiles per state. PyPUMS needs the `state` parameter to
know which file to download.
publishes shapefiles per state. PyPUMS will raise a `ValueError` if you
omit the `state` parameter for these geography levels.
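
The check described in this warning can be pictured as a small guard that runs before any download. This is a minimal sketch under stated assumptions, not the code in this PR: the names `SUB_STATE_GEOGRAPHIES` and `require_state` are hypothetical, and the message text is modeled on the error shown in the Troubleshooting section.

```python
from typing import Optional

# Geography levels that the warning lists as needing a state.
SUB_STATE_GEOGRAPHIES = frozenset({"tract", "block group", "place", "puma"})


def require_state(geography: str, state: Optional[str]) -> None:
    """Raise ValueError when a sub-state geography is requested without a state."""
    if geography in SUB_STATE_GEOGRAPHIES and state is None:
        raise ValueError(f"geography={geography!r} requires a state parameter")


require_state("county", None)   # fine: county does not need a state
require_state("tract", "CA")    # fine: state supplied

try:
    require_state("tract", None)
except ValueError as err:
    print(err)
```

Failing early like this gives the caller a clear message instead of an opaque download error further down the call chain.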

---

@@ -409,6 +412,7 @@ attach_geometry(
state=None, # state FIPS or abbreviation
year=2023, # data year
resolution="500k", # "500k", "5m", or "20m"
cache=True, # cache shapefiles locally (via pygris)
) -> GeoDataFrame
```

@@ -960,21 +964,27 @@ memory. A few tips for working with large datasets:
(1318, 6)
```

- **Cache your queries** with `cache_table=True` so repeated runs do not
re-fetch the shapefile from the Census Bureau servers.
- **Shapefile caching is automatic.** PyPUMS uses pygris with caching
enabled, so shapefiles are downloaded once and reused from a local
cache directory (`~/.cache/pygris/` on Linux,
`~/Library/Caches/pygris/` on macOS).

---

## Troubleshooting

**`ImportError: geopandas is required for spatial operations`**
: Install the spatial extra: `uv add "pypums[spatial]"`. This pulls in
`geopandas`, `shapely`, and `pyproj`.
`geopandas`, `pygris`, `shapely`, and `pyproj`.

**`ValueError: geography='tract' requires a state parameter`**
: Sub-state geographies (`tract`, `block group`, `place`, `puma`) need a
`state` argument. Pass a FIPS code, abbreviation, or full name.

**Geometry column is all `None`**
: The Census TIGER/Line server may not have shapefiles for the geography
level and year you requested. Try a different year or a broader
geography (e.g., county instead of block group).
: The Census Bureau may not have shapefiles for the geography level and
year you requested. Try a different year or a broader geography (e.g.,
county instead of block group).

**CRS mismatch when combining DataFrames**
: All PyPUMS geometry is returned in EPSG:4269 (NAD83). If you are
2 changes: 1 addition & 1 deletion docs/index.md
@@ -46,7 +46,7 @@ print(df.head())

---

Add `geometry=True` to any query and get a GeoDataFrame with TIGER/Line boundaries.
Add `geometry=True` to any query and get a GeoDataFrame with cartographic boundary shapes (via pygris, cached locally).

[:octicons-arrow-right-24: Spatial guide](guides/spatial.md)

2 changes: 1 addition & 1 deletion docs/migration/from-census-ftp.md
@@ -73,7 +73,7 @@ df = pypums.get_pums(
| **Survey design** | Manual SDR calculation | `to_survey()` helper |
| **Caching** | Manage files yourself | Built-in with TTL |
| **Summary tables** | Not available (PUMS only) | `get_acs()` for pre-tabulated data |
| **Geometry** | Separate TIGER/Line download | `geometry=True` parameter |
| **Geometry** | Separate TIGER/Line download | `geometry=True` (pygris handles the download) |

## When You Still Need FTP

2 changes: 1 addition & 1 deletion docs/migration/from-old-pypums.md
@@ -79,7 +79,7 @@ pypums estimates state -s CA

1. **No more large file downloads** — The Census API returns exactly the data you need, not gigabyte-sized CSV files
2. **More data sources** — Access ACS summary tables, Decennial Census, population estimates, and migration flows in addition to PUMS microdata
3. **Geometry support** — Get GeoDataFrames with TIGER/Line boundaries in a single call
3. **Geometry support** — Get GeoDataFrames with cartographic boundary geometry in a single call (powered by pygris)
4. **Feature parity with tidycensus** — If you can do it in R with tidycensus, you can now do it in Python with PyPUMS

## The ACS Class Still Works (For Now)
4 changes: 2 additions & 2 deletions docs/migration/from-tidycensus.md
@@ -236,7 +236,7 @@ If you've used Kyle Walker's [tidycensus](https://walker-data.com/tidycensus/) R
| **Variable naming** | Can rename inline: `c(medincome = "B19013_001")` | Use standard variable codes; rename with pandas after |
| **County parameter** | Accepts county names: `county = "Los Angeles"` | Uses FIPS codes: `county="037"`. Use `lookup_fips()` to find codes |
| **Output type** | tibble / sf object | pandas DataFrame / GeoDataFrame |
| **Spatial CRS** | Varies by function | Always NAD83 (EPSG:4269) from TIGER/Line |
| **Spatial CRS** | Varies by function | Always NAD83 (EPSG:4269) via pygris |
| **Plotting** | ggplot2 / tmap | Altair / geopandas |
| **PUMS download** | Downloads CSV files from FTP | Queries Census API directly (faster for filtered requests) |
| **Survey design** | Returns `tbl_svy` (srvyr package) | Returns `SurveyDesign` object with SDR methods |
@@ -246,6 +246,6 @@ If you've used Kyle Walker's [tidycensus](https://walker-data.com/tidycensus/) R
- Same Census API under the hood
- Same variable codes (B19013_001, P1_001N, etc.)
- Same geography names ("state", "county", "tract", etc.)
- Same TIGER/Line shapefiles for geometry
- Same Census cartographic boundary files for geometry (via pygris)
- Same MOE formulas from the ACS Handbook
- Same replicate weight methodology (SDR with 80 weights)
48 changes: 47 additions & 1 deletion docs/reference/changelog.md
@@ -1,5 +1,51 @@
# Changelog

## 0.3.1 (2026)

### Changed

- **Shapefile downloads now use pygris** — The internal `_fetch_tiger_shapes()`
function now delegates to [pygris](https://github.com/walkerke/pygris) instead
of manually constructing Census Bureau URLs. The public API (`geometry=True`,
`attach_geometry()`) is unchanged.

### Added

- **Automatic shapefile caching** — Downloaded shapefiles are cached locally via
pygris (`~/Library/Caches/pygris/` on macOS, `~/.cache/pygris/` on Linux).
Repeated `geometry=True` calls no longer re-download files.
- **`cache` parameter** on `attach_geometry()` — Pass `cache=False` to force a
fresh download.
- **Clear error for missing `state`** — Sub-state geographies (`tract`,
`block group`, `place`, `puma`) now raise a `ValueError` with a helpful message
when `state` is omitted.

### Fixed

- **ZCTA and PUMA geometry for pre-2020 years** — Previously broken due to
hardcoded 2020 vintage suffixes. pygris handles vintage selection correctly.
- **Congressional district year mapping** — Previously used a hardcoded formula.
pygris handles this internally.

### Improved

- **Broader year range** — Geometry support extended from ~2014+ to ~1990+ for
most geography levels.
- **Faster county downloads** — County-level queries now pass `state` through to
pygris when provided, downloading a smaller state-specific file.

### Dependencies

- Added `pygris>=0.1.7,<1` to the `spatial` optional dependency group.

---

## 0.3 (2026)

Version bump. No user-facing changes from 0.2.

---

## 0.2 (2026)

Major release with complete Census API feature parity with R's tidycensus.
@@ -49,7 +95,7 @@ Major release with complete Census API feature parity with R's tidycensus.
- `moe_product()` — MOE for derived products
- `significance()` — Statistical significance testing

- **Spatial support** — TIGER/Line cartographic boundary integration
- **Spatial support** — Cartographic boundary integration
- `attach_geometry()` — Merge shapefiles with Census data
- `as_dot_density()` — Dot-density point conversion
- `interpolate_pw()` — Population-weighted areal interpolation
2 changes: 1 addition & 1 deletion pyproject.toml
@@ -30,7 +30,7 @@ Changelog = "https://github.com/chekos/pypums/releases"
pypums = "pypums.cli:cli"

[project.optional-dependencies]
spatial = ["geopandas>=0.12"]
spatial = ["geopandas>=0.12", "pygris>=0.1.7,<1"]
test = ["pytest"]
docs = [
"mkdocs>=1.6,<2",