taeyun16/bgpframe

bgpframe

Rust 2024 Edition · Python 3.12+ · PyO3 0.27 · Polars 1.36+

English documentation. Korean version: docs/README.ko.md

A Rust-based library for parsing BGP MRT data and processing the result as Parquet. You can build it with maturin, use it directly from Python, and write concise prefix containment queries with Polars expressions.

Key Highlights

  • Fast parsing: converts MRT to Parquet using bgpkit-parser and a Rust implementation.
  • Memory reuse: reduces allocation and copy overhead by swapping batch buffers at flush time.
  • Rust scan filter: parquet_filter_updates filters large Parquet files and writes the matches directly, without loading them into Python.
  • Python-friendly API: prefix/IP containment + BGP-specific filters (announce, origin, as_path).
  • Typed API: includes .pyi stubs (_core.pyi, polars_utils.pyi, __init__.pyi).

Installation

Install from PyPI:

pip install bgpframe

If you use the Polars helper expressions:

pip install "bgpframe[polars]"

Using uv:

uv add bgpframe
# or
uv add "bgpframe[polars]"

Requirements

  • Rust stable toolchain
  • Python 3.12+
  • uv
  • maturin

Development Build

uv venv
source .venv/bin/activate
uv pip install maturin
maturin develop
python -c "import bgpframe; print(bgpframe.hello())"

Run without activating the virtual environment:

uv run -- maturin develop
uv run python -c "import bgpframe; print(bgpframe.hello())"

Examples

1) MRT -> Parquet

import bgpframe

bgpframe.mrt_to_parquet(
    "https://data.ris.ripe.net/rrc00/latest-update.gz",
    "rrc00_latest.parquet",
    limit=200_000,      # optional
    batch_size=100_000, # optional
)

2) Prefix containment query (IPv4/IPv6)

import bgpframe
import polars as pl

df = pl.read_parquet("rrc00_latest.parquet")
res = df.filter(bgpframe.contains_prefix_expr("8.8.8.8"))
print(res.head())
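
To make the containment semantics concrete, here is a minimal standard-library sketch of the check a containment filter performs. It assumes that a row matches when its announced prefix covers the queried address. The helper name prefix_contains is hypothetical and is not part of bgpframe, which evaluates this on numeric columns rather than strings.

```python
import ipaddress


def prefix_contains(prefix: str, target: str) -> bool:
    """Return True if `target` (an IP or a prefix) lies inside `prefix`.

    Hypothetical helper for illustration only; not a bgpframe function.
    """
    net = ipaddress.ip_network(prefix, strict=False)
    # A bare address such as "8.8.8.8" becomes a /32 (or /128 for IPv6).
    tgt = ipaddress.ip_network(target, strict=False)
    # IPv4 and IPv6 never contain each other, so compare versions first.
    return net.version == tgt.version and tgt.subnet_of(net)


in_v4 = prefix_contains("8.8.8.0/24", "8.8.8.8")
in_v6 = prefix_contains("2001:4860::/32", "2001:4860:4860::8888")
cross_family = prefix_contains("2001:4860::/32", "8.8.8.8")
```

The version check short-circuits before subnet_of, which would otherwise raise TypeError for mixed address families.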

3) Filter large Parquet and write output

import bgpframe

matched = bgpframe.parquet_contains_ip(
    "rrc00_latest.parquet",
    "2001:4860:4860::8888",
    output="rrc00_latest_match_google_dns_v6.parquet",
    limit=100_000,  # optional
)
print("matched rows:", matched)

4) Combined BGP convenience filters

import bgpframe
import polars as pl

df = pl.read_parquet("rrc00_latest.parquet")

# announce + origin AS 15169 + AS_PATH includes 3356 + path length 2..5
res = bgpframe.filter_bgp_updates(
    df,
    elem_type="announce",
    origin_asn=15169,
    as_path_contains=3356,
    min_as_path_len=2,
    max_as_path_len=5,
)

# exact prefix match (host bits are normalized with strict=False behavior)
exact = df.filter(bgpframe.prefix_exact_expr("2001:4860:4860::8888/32"))
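
The "strict=False behavior" mentioned in the comment above refers to how Python's ipaddress module masks off host bits instead of rejecting the input. The sketch below shows that normalization with the standard library only. The helper name normalize_prefix is hypothetical, and how bgpframe performs the equivalent step internally is not shown here.

```python
import ipaddress


def normalize_prefix(prefix: str) -> str:
    """Mask host bits so the prefix is in canonical network form.

    Hypothetical helper for illustration only; not a bgpframe function.
    """
    return str(ipaddress.ip_network(prefix, strict=False))


normalized_v6 = normalize_prefix("2001:4860:4860::8888/32")
normalized_v4 = normalize_prefix("8.8.8.8/24")

# With strict=True the same input is rejected because host bits are set.
try:
    ipaddress.ip_network("2001:4860:4860::8888/32", strict=True)
    strict_rejected = False
except ValueError:
    strict_rejected = True
```

Under this normalization, "2001:4860:4860::8888/32" and "2001:4860::/32" denote the same prefix.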

5) Rust high-speed scan filter (file -> file)

import bgpframe

matched = bgpframe.parquet_filter_updates(
    "rrc00_latest.parquet",
    output="rrc00_latest_updates_filtered.parquet",
    contains_ip="8.8.8.8",
    elem_type="announce",
    origin_asn=15169,
    as_path_contains=3356,
    min_as_path_len=2,
    max_as_path_len=8,
    limit=50_000,
)
print("matched rows:", matched)

The same code is available at example/parquet_filter_updates.py.

Recommended Query Patterns for BGP Data

  • Split event types: announce_expr(), withdraw_expr()
  • Analyze route origin: origin_asn_expr(asn)
  • Track transit/upstream ASN: as_path_contains_expr(asn)
  • Find policy/risk signals: as_path_len_between_expr(min_len=..., max_len=...)
  • Exact prefix comparisons: prefix_exact_expr("x.x.x.x/len")
  • Apply combined filters once: filter_bgp_updates(...)
  • Direct Parquet processing: parquet_filter_updates(...)
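
As a rough illustration of how the patterns above combine, the following pure-Python sketch applies the same conditions to plain dictionaries that use the column names from the schema. The function matches is hypothetical and only models the assumed semantics: every supplied condition must hold, and the path length bounds are treated as inclusive. The library itself evaluates these filters in Polars or Rust, not row by row in Python.

```python
ANNOUNCE, WITHDRAW = 1, 0  # elem_type encoding from the schema summary


def matches(row, *, elem_type=None, origin_asn=None, as_path_contains=None,
            min_as_path_len=None, max_as_path_len=None):
    """Hypothetical predicate: True when every supplied condition holds."""
    if elem_type is not None and row["elem_type"] != elem_type:
        return False
    if origin_asn is not None and row["origin_asn"] != origin_asn:
        return False
    if as_path_contains is not None and as_path_contains not in row["as_path"]:
        return False
    # Length bounds are assumed to be inclusive on both ends.
    if min_as_path_len is not None and row["as_path_len"] < min_as_path_len:
        return False
    if max_as_path_len is not None and row["as_path_len"] > max_as_path_len:
        return False
    return True


# Made-up sample rows for illustration only.
rows = [
    {"elem_type": ANNOUNCE, "origin_asn": 15169, "as_path": [3356, 15169], "as_path_len": 2},
    {"elem_type": ANNOUNCE, "origin_asn": 15169, "as_path": [174, 15169], "as_path_len": 2},
    {"elem_type": WITHDRAW, "origin_asn": None, "as_path": [], "as_path_len": 0},
]

selected = [
    r for r in rows
    if matches(r, elem_type=ANNOUNCE, origin_asn=15169, as_path_contains=3356,
               min_as_path_len=2, max_as_path_len=5)
]
```

Only the first sample row passes all five conditions; the second lacks AS 3356 in its path and the third is a withdrawal.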

Testing / Quality Gates

Results below are from local runs on 2026-03-01 (Asia/Seoul).

  • Rust unit tests: 7 passed
  • Rust doc tests: 0 failed
  • Python regression tests (unittest): 6 passed
  • Python doctest: 4 passed
  • Coverage (Python): 93%
  • Type check (pyrefly): 0 errors

Run commands:

# One-time workaround if cargo test hits the macOS + Homebrew Python framework link issue
mkdir -p /tmp/Python3.framework/Versions/3.9
ln -sf /opt/homebrew/Frameworks/Python.framework/Versions/Current/Python /tmp/Python3.framework/Versions/3.9/Python3

# Rust tests
DYLD_FRAMEWORK_PATH=/tmp cargo test --lib
DYLD_FRAMEWORK_PATH=/tmp cargo test --doc

# Python tests + doctest
uv run python -m unittest -v tests.test_regression
uv run python -m doctest -v src/bgpframe/polars_utils.py

# Coverage
uv run coverage erase
uv run coverage run -m unittest tests.test_regression
uv run coverage run -a -m doctest src/bgpframe/polars_utils.py
uv run coverage report

# Type check
uv run pyrefly check

CI/CD and PyPI Publishing

  • CI workflow: .github/workflows/ci.yml
    • Trigger: push to main, pull requests
    • Runs: Rust tests, Python regression tests, doctest, type checks
  • Automated release workflow: .github/workflows/release-please.yml
    • Trigger: push to main (or manual dispatch)
    • Creates/updates a Release PR, updates versions (pyproject.toml, Cargo.toml), and publishes a GitHub Release
  • Release workflow: .github/workflows/publish-pypi.yml
    • Trigger: GitHub Release (published/released) or manual dispatch
    • Builds: wheels (ubuntu/macos/windows) + sdist
    • Publishes: PyPI via Trusted Publishing (OIDC)

Required setup for PyPI release

  1. Configure a PyPI Trusted Publisher for this project.
  2. In PyPI Trusted Publisher settings, use:
    • Owner: taeyun16
    • Repository: bgpframe
    • Workflow filename: publish-pypi.yml
    • Environment name: pypi
  3. In GitHub, create an environment named pypi (Settings -> Environments).
  4. Create a GitHub Release (for example tag v0.1.0) to trigger publish.

With Trusted Publishing, you do not need a long-lived PYPI_API_TOKEN secret.

Automatic release flow

  1. Merge commits into main (use Conventional Commit prefixes like feat:, fix:, docs:).
  2. release-please opens/updates a Release PR with version/changelog changes.
  3. Merge the Release PR.
  4. release-please creates a GitHub Release.
  5. publish-pypi.yml runs and uploads artifacts to PyPI.

Schema Summary

The schema is normalized to numeric/list columns and minimizes string fields.

| Column         | Type | Description                          |
|----------------|------|--------------------------------------|
| timestamp      | i64  | Unix timestamp in seconds            |
| elem_type      | u32  | announce=1, withdraw=0               |
| peer_ip_ver    | u32  | 4 or 6                               |
| peer_ip_v4     | u32? | Present only for IPv4 peers          |
| peer_ip_v6_hi  | u64? | Upper 64 bits of IPv6                |
| peer_ip_v6_lo  | u64? | Lower 64 bits of IPv6                |
| peer_asn       | u32  | Peer ASN                             |
| prefix_ver     | u32  | 4 or 6                               |
| prefix_v4      | u32? | IPv4 prefix                          |
| prefix_v6_hi   | u64? | Upper 64 bits of IPv6                |
| prefix_v6_lo   | u64? | Lower 64 bits of IPv6                |
| prefix_len     | u32  | Prefix length                        |
| prefix_end_v4  | u32? | IPv4 range end (query acceleration)  |
| next_hop_ver   | u32? | 4 or 6                               |
| next_hop_v4    | u32? | IPv4 next hop                        |
| next_hop_v6_hi | u64? | Upper 64 bits of IPv6                |
| next_hop_v6_lo | u64? | Lower 64 bits of IPv6                |
| as_path        | list | Flattened AS_PATH                    |
| as_path_len    | u32  | AS_PATH length                       |
| has_as_set     | bool | Contains AS_SET/CONFED_SET           |
| origin_asn     | u32? | Present only for a single origin ASN |
| local_pref     | u32? | local-pref                           |
| med            | u32? | MED                                  |

A trailing ? on a type marks a nullable column.
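
The numeric layout above can be illustrated with a small standard-library sketch that encodes a textual prefix into the prefix_* columns. The function encode_prefix is hypothetical and not part of bgpframe. In particular, it assumes that prefix_end_v4 holds the last address of the IPv4 range (inclusive), which the schema does not state explicitly.

```python
import ipaddress

U64_MASK = 0xFFFFFFFFFFFFFFFF


def encode_prefix(prefix: str) -> dict:
    """Hypothetical encoder mapping a prefix string to schema-style columns."""
    net = ipaddress.ip_network(prefix, strict=False)
    row = {
        "prefix_ver": net.version,
        "prefix_len": net.prefixlen,
        "prefix_v4": None,
        "prefix_v6_hi": None,
        "prefix_v6_lo": None,
        "prefix_end_v4": None,
    }
    if net.version == 4:
        row["prefix_v4"] = int(net.network_address)
        # Assumption: the range end is the last address in the prefix.
        row["prefix_end_v4"] = int(net.broadcast_address)
    else:
        value = int(net.network_address)  # 128-bit integer
        row["prefix_v6_hi"] = value >> 64
        row["prefix_v6_lo"] = value & U64_MASK
    return row


v4_row = encode_prefix("8.8.8.0/24")
v6_row = encode_prefix("2001:4860::/32")

# With a precomputed range end, IPv4 containment is two integer comparisons.
ip = int(ipaddress.ip_address("8.8.8.8"))
v4_hit = v4_row["prefix_v4"] <= ip <= v4_row["prefix_end_v4"]
```

Storing the range end alongside the start is what lets an IPv4 containment query avoid recomputing a mask per row, which is presumably the "query acceleration" the table refers to.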

About

High-performance Rust + Python toolkit for parsing BGP MRT data into Parquet and running fast prefix/AS-path/origin ASN filters.
