82 changes: 58 additions & 24 deletions .github/workflows/Python-CMD-check.yaml
@@ -2,13 +2,15 @@ on:
push:
paths:
- python-package/**
- .github/workflows/Python-CMD-check.yaml
branches:
- main
- master
- dev
pull_request:
paths:
- python-package/**
- .github/workflows/Python-CMD-check.yaml
branches:
- main
- master
@@ -20,48 +22,80 @@ jobs:
Python-CMD-check:
runs-on: ${{ matrix.os }}

name: ${{ matrix.os }} (${{ matrix.python-version }})
name: ${{ matrix.os }} (${{ matrix.python-version }}${{ matrix.extras != '' && format(', {0}', matrix.extras) || '' }})

strategy:
fail-fast: false
matrix:
os: [ubuntu-latest, macOS-latest, windows-latest]
python-version: ["3.9", "3.10", "3.11", "3.12", "3.13"]
# Python 3.8 support ends in 2024-10
# Python 3.12 support starts in 2023-10
# Check Python maintenance status at: https://www.python.org/downloads/

env:
GITHUB_PAT: ${{ secrets.GITHUB_TOKEN }}
extras: [""]
include:
# Optional DuckDB extra: ubuntu only to keep CI cost reasonable.
- os: ubuntu-latest
python-version: "3.11"
extras: all
- os: ubuntu-latest
python-version: "3.12"
extras: all

steps:
- name: Check out geobr
uses: actions/checkout@v3
uses: actions/checkout@v4

- name: Setup Python ${{ matrix.python-version }}
uses: actions/setup-python@v4
uses: actions/setup-python@v5
with:
python-version: ${{ matrix.python-version }}
cache: "pip"

- name: Install uv
run: |
python -m pip install --upgrade pip
curl -LsSf https://astral.sh/uv/install.sh | sh
uses: astral-sh/setup-uv@v5
with:
enable-cache: true

- name: Install dependencies with uv
run: uv sync
- name: Install dependencies
shell: bash
working-directory: python-package

- name: Run tests with uv
run: |
uv run pytest -n auto ./tests
uv sync
if [ -n "${{ matrix.extras }}" ]; then
uv pip install -e ".[${{ matrix.extras }}]"
else
uv pip install -e .
fi

- name: Run tests
shell: bash
working-directory: python-package
run: uv run pytest -n auto ./tests -m "not network"

Python-network-check:
runs-on: ubuntu-latest
name: network tests (ubuntu, duckdb)
timeout-minutes: 20

steps:
- name: Check out geobr
uses: actions/checkout@v4

- name: Upload check results
if: always()
uses: actions/upload-artifact@v3
- name: Setup Python 3.11
uses: actions/setup-python@v5
with:
name: test-results
path: python-package/test-results.txt
if-no-files-found: warn
python-version: "3.11"

- name: Install uv
uses: astral-sh/setup-uv@v5
with:
enable-cache: true

- name: Install dependencies
shell: bash
working-directory: python-package
run: |
uv sync
uv pip install -e ".[duckdb]"

- name: Run network tests
shell: bash
working-directory: python-package
run: uv run pytest ./tests -m network -v
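The `strategy.matrix` in this workflow combines `extras: [""]` with two `include` entries. Below is a simplified Python model of GitHub Actions' documented `include` rule. It is written for illustration, not taken from GitHub, and it ignores `exclude` as well as the rule that values added by an earlier `include` may be overwritten. Under that rule, an entry is merged into every existing combination whose original values it would not change; otherwise it becomes a new combination. Both entries change `extras` from `""` to `all`, so each becomes its own job. `job_name` mimics the job's `name:` expression.

```python
from itertools import product

# Values copied from the workflow's strategy.matrix block.
BASE = {
    "os": ["ubuntu-latest", "macOS-latest", "windows-latest"],
    "python-version": ["3.9", "3.10", "3.11", "3.12", "3.13"],
    "extras": [""],
}
INCLUDE = [
    {"os": "ubuntu-latest", "python-version": "3.11", "extras": "all"},
    {"os": "ubuntu-latest", "python-version": "3.12", "extras": "all"},
]


def expand_matrix(base, include):
    """Simplified model of GitHub's matrix expansion (no `exclude`)."""
    keys = list(base)
    combos = [dict(zip(keys, values)) for values in product(*base.values())]
    originals = [dict(c) for c in combos]  # values an include may not overwrite
    for extra in include:
        merged = False
        for combo, orig in zip(combos, originals):
            # Merge only if no original key would receive a different value.
            if all(orig.get(k, v) == v for k, v in extra.items()):
                combo.update(extra)
                merged = True
        if not merged:
            combos.append(dict(extra))
    return combos


def job_name(combo):
    # Mirrors: matrix.extras != '' && format(', {0}', matrix.extras) || ''
    suffix = f", {combo['extras']}" if combo.get("extras") else ""
    return f"{combo['os']} ({combo['python-version']}{suffix})"


jobs = expand_matrix(BASE, INCLUDE)
print(len(jobs))            # 17: 15 base combinations + 2 extras jobs
print(job_name(jobs[0]))    # ubuntu-latest (3.9)
print(job_name(jobs[-1]))   # ubuntu-latest (3.12, all)
```

Under this model `Python-CMD-check` yields 17 jobs, and the separate `Python-network-check` job adds one more.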
54 changes: 51 additions & 3 deletions README.md
@@ -37,8 +37,21 @@ here](https://github.com/r-spatial/sf#linux).

## Installation Python

[uv](https://docs.astral.sh/uv/) is the recommended installer. From your project
directory (run `uv init` first if you don't have a `pyproject.toml` yet):

``` bash
uv add geobr

# DuckDB SQL API and spatial analysis
uv add "geobr[duckdb]"
```

Alternatively, with pip:

``` bash
pip install geobr
pip install "geobr[duckdb]"
```

*Windows users:*
@@ -49,7 +62,7 @@ conda activate geo_env
conda config --env --add channels conda-forge
conda config --env --set channel_priority strict
conda install python=3 geopandas
pip install geobr
uv add geobr
```

# Basic Usage
@@ -93,8 +106,40 @@ mun = read_municipality(code_muni="RJ", year=2010)
mun = read_municipality(code_muni="all", year=2018)
```

More examples
[here](https://github.com/ipeaGIT/geobr/tree/master/python-package/examples)
Since v0.3.0, the Python package uses a hybrid GeoParquet pipeline (with
GeoPackage fallback). For DuckDB workflows, use `query()` to load and
analyze snapshots directly in SQL.

## Python, DuckDB SQL and spatial analysis

Install the optional DuckDB extra, then run SQL across geobr snapshots.
Missing views are downloaded automatically on first use.

``` python
from geobr import query, to_geopandas

# Filter a snapshot (auto-downloads states_2020 on first use)
query("""
SELECT name_state, abbrev_state
FROM states_2020
WHERE abbrev_state = 'RJ'
""").df()

# Spatial join across datasets
query("""
SELECT count(*) AS schools_in_amazon
FROM schools_2020 s
JOIN biomes_2019 b ON ST_Within(s.geometry, b.geometry)
WHERE b.name_biome ILIKE '%Amaz%'
""").df()

# Round-trip to GeoPandas for plotting
gdf = to_geopandas("states_2020")
```

More examples in
[python-package/examples](https://github.com/ipeaGIT/geobr/tree/master/python-package/examples),
including [duckdb_demo.ipynb](https://github.com/ipeaGIT/geobr/blob/master/python-package/examples/duckdb_demo.ipynb).

# Available datasets:

@@ -142,6 +187,8 @@ CRS(4674).**
|----|----|
| `list_geobr` | List all datasets available in the geobr package |
| `lookup_muni` | Look up municipality codes by their name, or the other way around |
| `query` | Run SQL on geobr snapshots with DuckDB (Python, v0.3.0+) |
| `to_geopandas` | Convert a DuckDB view or relation to GeoPandas (Python, v0.3.0+) |
| `remove_islands` | Removes distant oceanic islands from Brazil |
| `grid_state_correspondence_table` | Loads a correspondence table indicating what quadrants of IBGE’s statistical grid intersect with each state |
| `cep_to_state` | Determine the state of a given CEP postal code |
@@ -178,6 +225,7 @@ contributions to the community, including for example:
- Option to download geometries with simplified borders for fast
rendering
- Option to download geometries as geoarrow objects out of memory
- DuckDB SQL API for cross-dataset spatial analysis in Python (v0.3.0+)
- Stable version published on CRAN for R users, and on PyPI for Python
users

76 changes: 76 additions & 0 deletions appveyor.yml
@@ -0,0 +1,76 @@
# AppVeyor: Windows R CMD check for r-package only.
# Python CI runs in GitHub Actions (.github/workflows/Python-CMD-check.yaml).
# R also runs on Windows via GitHub Actions (.github/workflows/R-CMD-check.yaml).

only_commits:
files:
- r-package/**

skip_commits:
files:
- python-package/**
- .github/**
- mcp-server/**

environment:
global:
R_REMOTES_STANDALONE: true
PKGDIR: r-package
matrix:
- R_VERSION: release
R_ARCH: x64

init:
ps: |
$ErrorActionPreference = "Stop"
Get-Date

install:
ps: |
$ErrorActionPreference = "Stop"
if (-not (Test-Path r-appveyor-scripts)) {
New-Item -ItemType Directory -Force -Path r-appveyor-scripts | Out-Null
}
if (-not (Test-Path r-appveyor-scripts/appveyor-tool.ps1)) {
Invoke-WebRequest -UseBasicParsing `
-Uri "https://raw.githubusercontent.com/krlmlr/r-appveyor/master/scripts/appveyor-tool.ps1" `
-OutFile "r-appveyor-scripts/appveyor-tool.ps1"
}
Import-Module .\r-appveyor-scripts\appveyor-tool.ps1
Bootstrap

build_script:
ps: |
$ErrorActionPreference = "Stop"
Push-Location $env:PKGDIR
try {
travis-tool.sh install_deps
} finally {
Pop-Location
}

test_script:
ps: |
$ErrorActionPreference = "Stop"
Push-Location $env:PKGDIR
try {
travis-tool.sh run_tests
} finally {
Pop-Location
}

on_failure:
- 7z a failure.zip r-package\*.Rcheck\*
- appveyor PushArtifact failure.zip

artifacts:
- path: r-package\*.Rcheck\**\*.log
name: Logs
- path: r-package\*.Rcheck\**\*.out
name: Logs
- path: r-package\*.Rcheck\**\*.fail
name: Logs
- path: r-package\*.Rcheck\**\*.Rout
name: Logs
- path: r-package\*_*.zip
name: Bits
30 changes: 30 additions & 0 deletions python-package/CHANGELOG.md
@@ -2,6 +2,36 @@

-------------------------------------------------------

# 0.3.0 (unreleased)

## Foundation (Phase 0)
* Core dependencies include `pyarrow` and `rapidfuzz` (Arrow output and fuzzy `lookup_muni`)
* Optional extra: `geobr[duckdb]` (alias `geobr[all]`)
* Parquet v2.0.0 download pipeline (`download_metadata_v2`, `download_parquet`, disk cache)
* Shared helpers: `_filter`, `_output`, `_cache`, `read_geobr_v2`, `read_geobr_hybrid`
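These entries name a disk cache and a hybrid reader (`read_geobr_hybrid`) but do not say how they interact. The following is a minimal sketch of the general pattern. It assumes GeoParquet is tried first and GeoPackage is the fallback. `cached_download`, `read_hybrid`, the hash-based file names and the cache directory are all hypothetical and are not geobr's actual implementation. The loaders are passed in as functions so the sketch runs without network access or geospatial libraries.

```python
import hashlib
import tempfile
from pathlib import Path, PurePosixPath
from typing import Any, Callable

# Hypothetical default cache directory; geobr's real cache location may differ.
DEFAULT_CACHE = Path(tempfile.gettempdir()) / "geobr_sketch_cache"


def cached_download(url: str, fetch: Callable[[str], bytes],
                    cache_dir: Path = DEFAULT_CACHE) -> Path:
    """Return a local copy of `url`, calling `fetch` only on a cache miss."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Hash the URL so each URL maps to a stable, filesystem-safe file name.
    digest = hashlib.sha256(url.encode()).hexdigest()
    path = cache_dir / (digest + PurePosixPath(url).suffix)
    if not path.exists():
        path.write_bytes(fetch(url))  # nothing is written if fetch raises
    return path


def read_hybrid(parquet_url: str, gpkg_url: str,
                fetch: Callable[[str], bytes],
                read_parquet: Callable[[Path], Any],
                read_gpkg: Callable[[Path], Any],
                cache_dir: Path = DEFAULT_CACHE) -> Any:
    """Try the GeoParquet source first; fall back to GeoPackage on any error."""
    try:
        return read_parquet(cached_download(parquet_url, fetch, cache_dir))
    except Exception:
        # A failed download or decode on the Parquet path triggers the fallback.
        return read_gpkg(cached_download(gpkg_url, fetch, cache_dir))
```

In practice the two readers could be, for example, `geopandas.read_parquet` and `geopandas.read_file`. Catching every exception keeps the fallback simple, but it can also hide real bugs in the Parquet path.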

### Phase 1 — Agent 1
* `read_capitals`, `read_favela`, `read_polling_places`, `read_quilombola_land`
* `cep_to_state`, `remove_islands`

### Phase 1 — Agent 2
* `code_muni` filtering: `read_schools`, `read_health_facilities`, `read_neighborhood`, `read_disaster_risk_area`, `read_statistical_grid`
* `keep_areas_operacionais` on `read_municipality`
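The sketch below shows one possible `code_muni` filter. It assumes the same conventions that `read_municipality` documents: `"all"`, a two-letter state abbreviation, a 2-digit IBGE state code, or a 7-digit municipality code. It relies on the fact that every 7-digit IBGE municipality code starts with its state's 2-digit code. `filter_by_code_muni`, the abbreviated `STATE_CODES` table and the toy records are hypothetical and serve only as illustration.

```python
from typing import Iterable, Union

# Illustrative subset of IBGE state codes (the full table has 27 entries).
STATE_CODES = {"AM": 13, "MG": 31, "RJ": 33, "SP": 35}


def filter_by_code_muni(rows: Iterable[dict], code_muni: Union[int, str] = "all") -> list[dict]:
    """Hypothetical filter over records that carry a 7-digit `code_muni` field."""
    rows = list(rows)
    if code_muni == "all":
        return rows
    # A two-letter abbreviation is translated to its 2-digit IBGE state code.
    if isinstance(code_muni, str) and code_muni.upper() in STATE_CODES:
        code_muni = STATE_CODES[code_muni.upper()]
    code = str(code_muni)
    if len(code) == 2:
        # Every municipality code starts with the code of its state.
        return [r for r in rows if str(r["code_muni"]).startswith(code)]
    if len(code) == 7:
        return [r for r in rows if str(r["code_muni"]) == code]
    raise ValueError(f"expected 'all', a state, or a 7-digit code, got {code_muni!r}")


schools = [
    {"code_muni": 3304557, "name": "school A"},  # Rio de Janeiro (RJ)
    {"code_muni": 3550308, "name": "school B"},  # São Paulo (SP)
    {"code_muni": 1302603, "name": "school C"},  # Manaus (AM)
]
print([r["name"] for r in filter_by_code_muni(schools, "RJ")])  # ['school A']
```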

### Phase 1 — Agent 3
* `code_state` filtering: `read_indigenous_land`, `read_metro_area`, `read_pop_arrangements`, `read_urban_concentrations`, `read_conservation_units`
* Default year 2010 for pop arrangements / urban concentrations

### Phase 1 — Agent 4
* `lookup_muni(year=...)`, fuzzy name match via rapidfuzz
* `list_geobr(wide=)` returns DataFrame
* `read_health_region(geometry_level=, code_state=)`
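According to this entry, the fuzzy name match in `lookup_muni` uses rapidfuzz. To keep the sketch dependency-free, it substitutes the standard library's `difflib.get_close_matches`, whose similarity score differs from rapidfuzz's scorers. `lookup_muni_fuzzy`, its `cutoff` default and the accent-stripping step are assumptions for illustration, not geobr's implementation.

```python
import difflib
import unicodedata


def normalize(name: str) -> str:
    """Strip accents and case so 'São Paulo' and 'sao paulo' compare equal."""
    decomposed = unicodedata.normalize("NFKD", name)
    return "".join(c for c in decomposed if not unicodedata.combining(c)).casefold().strip()


def lookup_muni_fuzzy(name: str, table: dict[int, str], cutoff: float = 0.8) -> list[int]:
    """Hypothetical lookup: codes of municipalities whose names best match `name`."""
    # Group codes by normalized name (municipalities in different states can share a name).
    by_name: dict[str, list[int]] = {}
    for code, muni in table.items():
        by_name.setdefault(normalize(muni), []).append(code)
    # get_close_matches ranks candidates by difflib's SequenceMatcher ratio.
    matches = difflib.get_close_matches(normalize(name), list(by_name), n=3, cutoff=cutoff)
    return [code for match in matches for code in by_name[match]]


munis = {3304557: "Rio de Janeiro", 3550308: "São Paulo", 1302603: "Manaus"}
print(lookup_muni_fuzzy("sao paolo", munis))  # [3550308]
```

The misspelled "sao paolo" still matches because its `SequenceMatcher` ratio against "sao paulo" is about 0.89, which is above the assumed 0.8 cutoff.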

### Phase 1 — Agent 5
* `output="duckdb"` and `output="arrow"` via `convert_output`

-------------------------------------------------------

# 0.1.10
* Enforces correct data types to certain variables (issue #260)
* Changes package manager to poetry