Improve list_geobr catalog and lookup_muni fuzzy matching by JoaoCarabetta · Pull Request #421 · ipeaGIT/geobr

JoaoCarabetta · 2026-05-21T15:55:58Z

Summary

list_geobr() returns a DataFrame joined with live v2 metadata years
lookup_muni() adds year parameter and rapidfuzz-based fuzzy name matching
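
For readers unfamiliar with the fuzzy-matching idea named above, here is a minimal sketch, not the code in this PR. The helper names `normalize` and `match_muni`, the cutoff of 80, and the sample names are illustrative assumptions. It uses `rapidfuzz.process.extractOne` with the `fuzz.ratio` scorer when rapidfuzz is installed and falls back to the standard library's `difflib` otherwise.

```python
import unicodedata

try:
    from rapidfuzz import fuzz, process
    HAVE_RAPIDFUZZ = True
except ImportError:  # assumption: fall back to the standard library if rapidfuzz is missing
    import difflib
    HAVE_RAPIDFUZZ = False


def normalize(name: str) -> str:
    """Lowercase and strip accents so 'São Paulo' compares equal to 'sao paulo'."""
    decomposed = unicodedata.normalize("NFKD", name)
    return "".join(c for c in decomposed if not unicodedata.combining(c)).lower().strip()


def match_muni(query, names, cutoff=80.0):
    """Return the entry of `names` closest to `query`, or None if the best score is below `cutoff` (0-100)."""
    if not names:
        return None
    normalized = [normalize(n) for n in names]
    q = normalize(query)
    if HAVE_RAPIDFUZZ:
        # extractOne returns (choice, score, index), or None when nothing reaches the cutoff.
        hit = process.extractOne(q, normalized, scorer=fuzz.ratio, score_cutoff=cutoff)
        return names[hit[2]] if hit else None
    # difflib's ratio is in [0, 1]; rescale to the 0-100 range used above.
    scores = [100 * difflib.SequenceMatcher(None, q, n).ratio() for n in normalized]
    best = max(range(len(names)), key=scores.__getitem__)
    return names[best] if scores[best] >= cutoff else None


# Hypothetical sample data, not the geobr municipality table.
names = ["São Paulo", "São Pedro", "Rio de Janeiro"]
print(match_muni("Sao Paolo", names))  # São Paulo
print(match_muni("xyz", names))        # None
```

Normalizing accents and case before scoring keeps the cutoff meaningful: a query that differs only in diacritics scores 100 instead of being penalized as a typo.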

Test plan

test_list_geobr, test_lookup_muni, test_lookup_muni_v2

Depends on #418

Made with Cursor


camilagb · 2026-05-28T17:34:48Z

+        "alias": [
+            "country", "regions", "states", "mesoregions", "microregions",
+            "intermediateregions", "immediateregions", "municipalities",
+            "municipalseats", "weightingareas", "censustracts", "statsgrid",
+            "metroarea", "urbanareas", "amazonialegal", "biomes",
+            "conservationunits", "disasterriskareas", "indigenousland",
+            "semiarid", "healthfacilities", "healthregions", "neighborhoods",
+            "schools", "amc", "urbanconcentrations", "poparrengements",
+            "favelas", "pollingplaces", "quilombolalands",
+        ],


Following the R package

Suggested change

-        "alias": [
-            "country", "regions", "states", "mesoregions", "microregions",
-            "intermediateregions", "immediateregions", "municipalities",
-            "municipalseats", "weightingareas", "censustracts", "statsgrid",
-            "metroarea", "urbanareas", "amazonialegal", "biomes",
-            "conservationunits", "disasterriskareas", "indigenousland",
-            "semiarid", "healthfacilities", "healthregions", "neighborhoods",
-            "schools", "amc", "urbanconcentrations", "poparrengements",
-            "favelas", "pollingplaces", "quilombolalands",
-        ],
+        "alias": [
+            "country", "regions", "states", "mesoregions", "microregions",
+            "intermediateregions", "immediateregions", "municipalities",
+            "municipalseats", "weightingareas", "censustracts", "statsgrid",
+            "metroarea", "urbanareas", "amazonialegal", "biomes",
+            "conservationunits", "disasterriskareas", "indigenouslands",
+            "semiarid", "healthfacilities", "healthregions", "neighborhoods",
+            "schools", "amc", "urbanconcentrations", "poparrangements",
+            "favelas", "pollingplaces", "quilombolalands",
+        ],

camilagb · 2026-05-28T17:53:47Z

+    rows = []
+    for _, row in out.iterrows():
+        raw = row.get("years_available")
+        if raw is None or (isinstance(raw, float) and pd.isna(raw)):
+            years = []
+        else:
+            years = str(raw).split(", ")
+        if not years or years == [""]:
+            rows.append(row.to_dict())
+        else:
+            for y in years:
+                r = row.to_dict()
+                r["year"] = y.strip()
+                rows.append(r)
+    return pd.DataFrame(rows)


Suggested change

-    rows = []
-    for _, row in out.iterrows():
-        raw = row.get("years_available")
-        if raw is None or (isinstance(raw, float) and pd.isna(raw)):
-            years = []
-        else:
-            years = str(raw).split(", ")
-        if not years or years == [""]:
-            rows.append(row.to_dict())
-        else:
-            for y in years:
-                r = row.to_dict()
-                r["year"] = y.strip()
-                rows.append(r)
-    return pd.DataFrame(rows)
+    out["year"] = out["years_available"].fillna("").str.split(", ")
+    out_expandido = out.explode("year")
+    out_expandido["year"] = out_expandido["year"].str.strip()
+    out_expandido = out_expandido.drop(columns=["years_available"])
+    return out_expandido
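
The vectorized approach in this suggestion can be tried on a toy catalog. The sketch below is illustrative only: the function name `expand_years`, the column `function`, and the sample rows are assumptions, not the real `list_geobr()` output. It splits the comma-separated `years_available` string into a list and uses `DataFrame.explode` to produce one row per year.

```python
import pandas as pd


def expand_years(catalog: pd.DataFrame) -> pd.DataFrame:
    """Return one row per (dataset, year) from a comma-separated years column."""
    out = catalog.copy()
    # Missing values become "", which splits into [""] and yields a single row.
    out["year"] = out["years_available"].fillna("").str.split(", ")
    out = out.explode("year")
    out["year"] = out["year"].str.strip()
    return out.drop(columns=["years_available"]).reset_index(drop=True)


# Hypothetical catalog rows for illustration.
catalog = pd.DataFrame(
    {
        "function": ["read_state", "read_biomes", "read_schools"],
        "years_available": ["2010, 2020", "2019", None],
    }
)
expanded = expand_years(catalog)
print(expanded)
```

Two behavioral differences from the row-by-row loop are worth checking before adopting the suggestion: a dataset with no listed years ends up with an empty string in `year` rather than a missing value, and the `years_available` column is dropped from the result.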

camilagb · 2026-05-28T17:55:46Z

-    immediate regions. You should not select both code_muni and name_muni
+
+def lookup_muni(
+    year: int = 2010,


Suggested change

-    year: int = 2010,
+    year: int,

JoaoCarabetta and others added 3 commits May 21, 2026 12:55

Add v2 parquet pipeline foundation for Python geobr.

3d58836

Introduce cached parquet downloads, filtering, multi-format output (sf/arrow/duckdb relation), and shared read_geobr_v2/hybrid helpers to align Python with the R v2.0.0 data path. Co-authored-by: Cursor <cursoragent@cursor.com>

Improve list_geobr catalog and lookup_muni fuzzy matching.

34cb522

Join live v2 metadata in list_geobr and add year-aware fuzzy municipality lookup using rapidfuzz. Co-authored-by: Cursor <cursoragent@cursor.com>

Fix Python CI workflow and stabilize filter pipeline tests.

e7b7139

Cherry-pick CI workflow upgrade from python-v2-pipeline; keep PR4 list_geobr tests unchanged. Co-authored-by: Cursor <cursoragent@cursor.com>

rafapereirabr requested a review from camilagb May 21, 2026 17:19

Scope AppVeyor to r-package only; skip Python changes.

c5b895c

AppVeyor is not required for Python (GitHub Actions Python-CMD-check covers all platforms). Path filters skip builds when only python-package or .github change. Co-authored-by: Cursor <cursoragent@cursor.com>

camilagb reviewed May 28, 2026

JoaoCarabetta wants to merge 4 commits into ipeaGIT:master from JoaoCarabetta:python-catalog-lookup
