Add R-parity Python functions and datasets by JoaoCarabetta · Pull Request #420 · ipeaGIT/geobr

JoaoCarabetta · 2026-05-21T15:55:52Z

Summary

Add cep_to_state, remove_islands, read_capitals, read_favela, read_polling_places, read_quilombola_land
Bundle br_offcoast.parquet for island clipping and export new symbols from __init__.py
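
To make the idea behind a CEP-to-state lookup concrete, here is a minimal sketch. It is not the geobr implementation: the name `cep_prefix_to_state` and the `CEP_RANGES` table are assumptions made for this example, and the table deliberately covers only a few well-known postal ranges.

```python
import re
from typing import Optional

# Partial, illustrative table of 5-digit CEP prefix ranges (inclusive).
# Assumption: a real implementation would cover every state.
CEP_RANGES = [
    (1000, 19999, "SP"),
    (20000, 28999, "RJ"),
    (29000, 29999, "ES"),
    (30000, 39999, "MG"),
    (90000, 99999, "RS"),
]


def cep_prefix_to_state(cep: str) -> Optional[str]:
    """Return the state abbreviation for a CEP, or None if it is not in the table."""
    digits = re.sub(r"\D", "", cep)  # accept "01310-100" as well as "01310100"
    if len(digits) != 8:
        raise ValueError(f"CEP must have 8 digits, got {cep!r}")
    prefix = int(digits[:5])
    for low, high, state in CEP_RANGES:
        if low <= prefix <= high:
            return state
    return None
```

Returning None for prefixes outside the table keeps the gap visible instead of guessing a state.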

Test plan

Unit tests for each new function (pytest -m "not network")

Depends on #418

Made with Cursor

Introduce cached parquet downloads, filtering, multi-format output (sf/arrow/duckdb relation), and shared read_geobr_v2/hybrid helpers to align Python with the R v2.0.0 data path. Co-authored-by: Cursor <cursoragent@cursor.com>
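
The cached-download behaviour described in this commit message can be sketched with the standard library alone. This is an illustration under stated assumptions, not the code in the PR: `cached_download`, its `fetch` parameter, and the cache layout are hypothetical names chosen for the example.

```python
from pathlib import Path
from typing import Callable
from urllib.parse import urlparse
from urllib.request import urlretrieve


def _download(url: str, dest: Path) -> None:
    """Default fetcher: save the URL to dest using the standard library."""
    urlretrieve(url, str(dest))


def cached_download(
    url: str,
    cache_dir: Path,
    cache: bool = True,
    fetch: Callable[[str, Path], None] = _download,
) -> Path:
    """Return a local path for url, downloading only when needed."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Name the local file after the last component of the URL path.
    dest = cache_dir / Path(urlparse(url).path).name
    if cache and dest.exists():
        return dest  # reuse the file left by a previous call
    fetch(url, dest)
    return dest
```

Passing the fetcher as an argument lets tests exercise the cache logic without touching the network, which fits the `pytest -m "not network"` test plan.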

Port cep_to_state, remove_islands, and read_* wrappers for capitals, favelas, polling places, and quilombola lands. Co-authored-by: Cursor <cursoragent@cursor.com>

Upgrade deprecated GitHub Actions, use astral-sh/setup-uv cross-platform, and skip network-dependent list_geobr test while testing filters via read_geobr_v2. Co-authored-by: Cursor <cursoragent@cursor.com>

AppVeyor is not required for Python (GitHub Actions Python-CMD-check covers all platforms). Path filters skip builds when only python-package or .github change. Co-authored-by: Cursor <cursoragent@cursor.com>

camilagb

One function is still missing: grid_state_correspondence_table, which returns the data in https://github.com/ipeaGIT/geobr/blob/v1.9.1/python-package/geobr/data/grid_state_correspondence_table.csv. Maybe include it in a new file?

from pathlib import Path

import pandas as pd
from geobr import __path__ as geobr_directory

# Location of the CSV bundled inside the installed geobr package.
grid_file_path = Path(geobr_directory[0]) / "data" / "grid_state_correspondence_table.csv"
# Read all three columns as strings.
dtypes = {"name_state": str, "abbrev_state": str, "code_grid": str}


def get_grid_state_table() -> pd.DataFrame:
    """Return the grid/state correspondence table as a DataFrame."""
    return pd.read_csv(grid_file_path, encoding="latin-1", dtype=dtypes)
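
One plausible reason for passing an explicit `dtype` mapping is that pandas otherwise infers numeric types and drops leading zeros from code-like columns. The sketch below uses made-up rows (the real table's values may look different) to show the difference.

```python
import io

import pandas as pd

# Hypothetical row; not taken from the real correspondence table.
csv_text = "name_state,abbrev_state,code_grid\nExample State,EX,007\n"

# Default inference turns "007" into the integer 7.
inferred = pd.read_csv(io.StringIO(csv_text))

# Forcing str keeps the code exactly as written in the file.
as_text = pd.read_csv(
    io.StringIO(csv_text),
    dtype={"name_state": str, "abbrev_state": str, "code_grid": str},
)

print(inferred["code_grid"].iloc[0])  # 7
print(as_text["code_grid"].iloc[0])   # 007
```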

camilagb · 2026-05-28T11:58:04Z

+    output: str = "sf",
+    show_progress: bool = True,
+    cache: bool = True,
+    verbose: bool = False,
+    year: int = 2010,


Suggested change

-    output: str = "sf",
-    show_progress: bool = True,
-    cache: bool = True,
-    verbose: bool = False,
-    year: int = 2010,
+    year: int,
+    output: str = "gpd",
+    show_progress: bool = True,
+    cache: bool = True,
+    verbose: bool = False
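
A note on why `year` moves to the front in this suggestion: Python does not allow a positional parameter without a default to follow parameters that have defaults, so dropping `= 2010` forces the reordering. The sketch below uses a hypothetical `read_example` function with the suggested signature to show that `year` becomes the only required argument.

```python
import inspect


def read_example(
    year: int,
    output: str = "gpd",
    show_progress: bool = True,
    cache: bool = True,
    verbose: bool = False,
) -> dict:
    """Hypothetical reader that only echoes two of its arguments."""
    return {"year": year, "output": output}


# Collect the parameters that have no default value.
required = [
    name
    for name, param in inspect.signature(read_example).parameters.items()
    if param.default is inspect.Parameter.empty
]
print(required)  # ['year']
```

Calling `read_example()` with no arguments now raises a TypeError instead of silently falling back to 2010.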

camilagb · 2026-05-28T11:58:27Z

+    Parameters
+    ----------
+    output : str
+        ``"sf"`` for GeoDataFrame (default), ``"duckdb"``, or ``"arrow"``.


Suggested change

-        ``"sf"`` for GeoDataFrame (default), ``"duckdb"``, or ``"arrow"``.
+        ``"gpd"`` for GeoDataFrame (default), ``"duckdb"``, or ``"arrow"``.
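
If the three documented values become the full set of accepted outputs, a small validation helper could reject anything else early. This is a sketch only: `check_output` and `VALID_OUTPUTS` are hypothetical names, and whether `"sf"` should remain as a deprecated alias is not settled in this thread, so the example simply rejects it.

```python
VALID_OUTPUTS = ("gpd", "duckdb", "arrow")


def check_output(output: str) -> str:
    """Validate the requested output format (hypothetical helper)."""
    if output not in VALID_OUTPUTS:
        raise ValueError(
            f"output must be one of {VALID_OUTPUTS}, got {output!r}"
        )
    return output
```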

camilagb · 2026-05-28T12:07:39Z

+    if output == "sf":
+        gdf = read_municipal_seat(year=year, verbose=verbose)
+        gdf = gdf[gdf["code_muni"].isin(codes)]
+        return gdf.sort_values("code_muni").reset_index(drop=True)
+
+    from geobr.utils import read_geobr_v2
+
+    results = []
+    for code in codes:
+        part = read_geobr_v2(
+            geography="municipalseat",
+            year=year,
+            code=code,
+            simplified=True,
+            output=output,
+            show_progress=show_progress,
+            cache=cache,
+            verbose=False,
+        )
+        results.append(part)
+    if output == "sf":
+        import geopandas as gpd
+
+        return gpd.GeoDataFrame(
+            pd.concat(results, ignore_index=True)
+        ).sort_values("code_muni").reset_index(drop=True)
+    return results


This update also requires changing read_municipal_seat so that it uses the filter_by_code helper proposed in #418.

Suggested change

-    if output == "sf":
-        gdf = read_municipal_seat(year=year, verbose=verbose)
-        gdf = gdf[gdf["code_muni"].isin(codes)]
-        return gdf.sort_values("code_muni").reset_index(drop=True)
-
-    from geobr.utils import read_geobr_v2
-
-    results = []
-    for code in codes:
-        part = read_geobr_v2(
-            geography="municipalseat",
-            year=year,
-            code=code,
-            simplified=True,
-            output=output,
-            show_progress=show_progress,
-            cache=cache,
-            verbose=False,
-        )
-        results.append(part)
-    if output == "sf":
-        import geopandas as gpd
-
-        return gpd.GeoDataFrame(
-            pd.concat(results, ignore_index=True)
-        ).sort_values("code_muni").reset_index(drop=True)
-    return results
+    return read_municipal_seat(
+        year=year,
+        code_muni=codes,
+        output=output,
+        show_progress=show_progress,
+        cache=cache,
+        verbose=verbose
+    )
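
In the hunk under review, the second `if output == "sf":` branch can never run, because the first one already returns for that value. Delegating to a single reader that filters by code removes the duplication. The sketch below shows the filtering step on a plain DataFrame; `filter_by_codes` is a hypothetical stand-in and not the filter_by_code helper from #418, whose signature is not shown here.

```python
from typing import Iterable

import pandas as pd


def filter_by_codes(
    df: pd.DataFrame, codes: Iterable[int], column: str = "code_muni"
) -> pd.DataFrame:
    """Keep rows whose column value is in codes, sorted by that column."""
    mask = df[column].isin(list(codes))
    return df[mask].sort_values(column).reset_index(drop=True)


# Small example table standing in for the municipal seats data.
seats = pd.DataFrame(
    {
        "code_muni": [3550308, 1200401, 3304557],
        "name_muni": ["A", "B", "C"],  # placeholder names
    }
)
selected = filter_by_codes(seats, [3304557, 3550308])
```

A single `isin` pass replaces the per-code loop and returns one table rather than a list of partial results.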

camilagb · 2026-05-28T12:08:30Z

+def read_polling_places(
+    year: int,
+    code_muni: str = "all",
+    output: str = "sf",


Suggested change

-    output: str = "sf",
+    output: str = "gpd",

camilagb · 2026-05-28T12:11:00Z

+    year: int,
+    code_muni: str = "all",
+    simplified: bool = True,
+    output: str = "sf",


Suggested change

-    output: str = "sf",
+    output: str = "gpd",

camilagb · 2026-05-28T12:12:15Z

+    date: int,
+    code_state: str = "all",
+    simplified: bool = True,
+    output: str = "sf",


Suggested change

-    output: str = "sf",
+    output: str = "gpd",

JoaoCarabetta and others added 3 commits May 21, 2026 12:55

Add v2 parquet pipeline foundation for Python geobr.

3d58836

Add R-parity Python functions and datasets.

5143c61

Fix Python CI workflow and stabilize PR418 test suite.

45561c6

rafapereirabr requested a review from camilagb May 21, 2026 17:19

Scope AppVeyor to r-package only; skip Python changes.

55bffb1

camilagb reviewed May 27, 2026

camilagb marked this pull request as draft May 28, 2026 11:47

camilagb marked this pull request as ready for review May 28, 2026 11:47

camilagb reviewed May 28, 2026

JoaoCarabetta wants to merge 4 commits into ipeaGIT:master from JoaoCarabetta:python-r-parity
