Add R-parity Python functions and datasets #420

Open
JoaoCarabetta wants to merge 4 commits into ipeaGIT:master from JoaoCarabetta:python-r-parity

Conversation

@JoaoCarabetta
Collaborator

Summary

  • Add cep_to_state, remove_islands, read_capitals, read_favela, read_polling_places, read_quilombola_land
  • Bundle br_offcoast.parquet for island clipping and export new symbols from __init__.py

Test plan

  • Unit tests for each new function (pytest -m "not network")

Depends on #418

Made with Cursor

JoaoCarabetta and others added 3 commits May 21, 2026 12:55
Introduce cached parquet downloads, filtering, multi-format output (sf/arrow/duckdb relation), and shared read_geobr_v2/hybrid helpers to align Python with the R v2.0.0 data path.

Co-authored-by: Cursor <cursoragent@cursor.com>
Port cep_to_state, remove_islands, and read_* wrappers for capitals, favelas, polling places, and quilombola lands.

Co-authored-by: Cursor <cursoragent@cursor.com>
Upgrade deprecated GitHub Actions, use astral-sh/setup-uv cross-platform, and skip network-dependent list_geobr test while testing filters via read_geobr_v2.

Co-authored-by: Cursor <cursoragent@cursor.com>
@rafapereirabr rafapereirabr requested a review from camilagb May 21, 2026 17:19
AppVeyor is not required for Python (GitHub Actions Python-CMD-check covers all platforms). Path filters skip builds when only python-package or .github change.

Co-authored-by: Cursor <cursoragent@cursor.com>
Collaborator

@camilagb camilagb left a comment

One function that is still missing is grid_state_correspondence_table, which returns the data in https://github.com/ipeaGIT/geobr/blob/v1.9.1/python-package/geobr/data/grid_state_correspondence_table.csv. Maybe add it in a new file?

from geobr import __path__ as geobr_directory
import pandas as pd

grid_file_path = geobr_directory[0] + "/data/grid_state_correspondence_table.csv"
dtypes = {"name_state": str, "abbrev_state": str, "code_grid": str}

def get_grid_state_table() -> pd.DataFrame:
    grid_state_correspondence_table = pd.read_csv(
        grid_file_path, encoding="latin-1", dtype=dtypes
    )
    return grid_state_correspondence_table

@camilagb camilagb marked this pull request as draft May 28, 2026 11:47
@camilagb camilagb marked this pull request as ready for review May 28, 2026 11:47
Comment on lines +41 to +45
output: str = "sf",
show_progress: bool = True,
cache: bool = True,
verbose: bool = False,
year: int = 2010,

Suggested change
-    output: str = "sf",
-    show_progress: bool = True,
-    cache: bool = True,
-    verbose: bool = False,
-    year: int = 2010,
+    year: int,
+    output: str = "gpd",
+    show_progress: bool = True,
+    cache: bool = True,
+    verbose: bool = False

Parameters
----------
output : str
``"sf"`` for GeoDataFrame (default), ``"duckdb"``, or ``"arrow"``.

Suggested change
-    ``"sf"`` for GeoDataFrame (default), ``"duckdb"``, or ``"arrow"``.
+    ``"gpd"`` for GeoDataFrame (default), ``"duckdb"``, or ``"arrow"``.
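
If the default is renamed from "sf" to "gpd" across the read_* functions, one option is to validate the argument in a single place. The sketch below is only an illustration: normalize_output, VALID_OUTPUTS and DEPRECATED_ALIASES are hypothetical names that do not exist in geobr, and keeping "sf" as a deprecated alias is an assumption, not something this PR or #418 specifies.

```python
import warnings

# Output names taken from the suggested docstring; the helper itself is hypothetical.
VALID_OUTPUTS = ("gpd", "duckdb", "arrow")

# Assumption: the R-style name keeps working for a while as a deprecated alias.
DEPRECATED_ALIASES = {"sf": "gpd"}


def normalize_output(output: str) -> str:
    """Return the canonical output name, or raise ValueError if unknown."""
    if output in DEPRECATED_ALIASES:
        canonical = DEPRECATED_ALIASES[output]
        warnings.warn(
            f'output="{output}" is deprecated; use output="{canonical}"',
            DeprecationWarning,
            stacklevel=2,
        )
        return canonical
    if output not in VALID_OUTPUTS:
        raise ValueError(f"output must be one of {VALID_OUTPUTS}, got {output!r}")
    return output
```

Each reader could then call normalize_output(output) once at the top, so the docstrings and the accepted values stay in sync.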

Comment on lines +59 to +85
    if output == "sf":
        gdf = read_municipal_seat(year=year, verbose=verbose)
        gdf = gdf[gdf["code_muni"].isin(codes)]
        return gdf.sort_values("code_muni").reset_index(drop=True)

    from geobr.utils import read_geobr_v2

    results = []
    for code in codes:
        part = read_geobr_v2(
            geography="municipalseat",
            year=year,
            code=code,
            simplified=True,
            output=output,
            show_progress=show_progress,
            cache=cache,
            verbose=False,
        )
        results.append(part)
    if output == "sf":
        import geopandas as gpd

        return gpd.GeoDataFrame(
            pd.concat(results, ignore_index=True)
        ).sort_values("code_muni").reset_index(drop=True)
    return results

This change also requires updating the read_municipal_seat function so that it uses the filter_by_code helper proposed in #418.

Suggested change
-    if output == "sf":
-        gdf = read_municipal_seat(year=year, verbose=verbose)
-        gdf = gdf[gdf["code_muni"].isin(codes)]
-        return gdf.sort_values("code_muni").reset_index(drop=True)
-    from geobr.utils import read_geobr_v2
-    results = []
-    for code in codes:
-        part = read_geobr_v2(
-            geography="municipalseat",
-            year=year,
-            code=code,
-            simplified=True,
-            output=output,
-            show_progress=show_progress,
-            cache=cache,
-            verbose=False,
-        )
-        results.append(part)
-    if output == "sf":
-        import geopandas as gpd
-        return gpd.GeoDataFrame(
-            pd.concat(results, ignore_index=True)
-        ).sort_values("code_muni").reset_index(drop=True)
-    return results
+    return read_municipal_seat(
+        year=year,
+        code_muni=codes,
+        output=output,
+        show_progress=show_progress,
+        cache=cache,
+        verbose=verbose
+    )
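
To illustrate the idea behind the suggestion (filter once for all requested codes instead of issuing one read per code), here is a minimal sketch on plain dictionaries. It is not the filter_by_code helper from #418, whose signature is not shown in this thread; normalize_codes and filter_rows_by_code are hypothetical names, and a real implementation would presumably operate on a GeoDataFrame, Arrow table or DuckDB relation rather than a list of dicts.

```python
from typing import Iterable, Optional, Union

# Hypothetical type alias: "all", a single code, or several codes.
CodeArg = Union[str, int, Iterable[Union[str, int]]]


def normalize_codes(code_muni: CodeArg) -> Optional[set]:
    """Return None for "all", otherwise a set of integer codes."""
    if isinstance(code_muni, str) and code_muni == "all":
        return None
    if isinstance(code_muni, (str, int)):
        return {int(code_muni)}
    return {int(code) for code in code_muni}


def filter_rows_by_code(rows: list, code_muni: CodeArg = "all") -> list:
    """Keep rows whose code_muni was requested, sorted by code_muni."""
    wanted = normalize_codes(code_muni)
    if wanted is not None:
        # Single pass over the data, regardless of how many codes were requested.
        rows = [row for row in rows if int(row["code_muni"]) in wanted]
    return sorted(rows, key=lambda row: int(row["code_muni"]))


# Sample rows for illustration only.
seats = [
    {"code_muni": 3550308},
    {"code_muni": 3304557},
    {"code_muni": 5300108},
]
selected = filter_rows_by_code(seats, [5300108, "3304557"])
```

With the filtering centralized like this, a wrapper only needs to forward code_muni and output, which is what the suggested change does.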

def read_polling_places(
year: int,
code_muni: str = "all",
output: str = "sf",

Suggested change
-    output: str = "sf",
+    output: str = "gpd",

year: int,
code_muni: str = "all",
simplified: bool = True,
output: str = "sf",

Suggested change
-    output: str = "sf",
+    output: str = "gpd",

date: int,
code_state: str = "all",
simplified: bool = True,
output: str = "sf",

Suggested change
-    output: str = "sf",
+    output: str = "gpd",
