Async Python client for the OpenAlex scholarly works API. Typed, rate-limited, disk-cached, polite-pool-first.
import asyncio

from oalex import Client


async def main() -> None:
    async with Client(email="you@example.com") as oa:
        works = await oa.search("attention is all you need", per_page=5)
        for w in works:
            print(w.id, w.title, w.citation_count)
        vaswani = await oa.fetch_work("W2626778328")
        if vaswani is not None:
            print(vaswani.title, "→", len(vaswani.referenced_works), "references")


asyncio.run(main())

OpenAlex is a comprehensive open scholarly graph (>240M works, >1B citation edges). The HTTP API is clean, but the polite-pool rules, inverted-index abstract reconstruction, and per-page caps add friction to every greenfield client. oalex packages those concerns once.
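OpenAlex ships abstracts as abstract_inverted_index, a mapping from each word to the positions where it appears; that is the reconstruction friction mentioned above. A minimal sketch of turning the index back into text, assuming it arrives as a plain dict; reconstruct_abstract is a hypothetical name, not part of oalex's documented API, and oalex's own handling may differ:

```python
from __future__ import annotations


def reconstruct_abstract(inverted_index: dict[str, list[int]] | None) -> str | None:
    """Rebuild plain text from an OpenAlex abstract_inverted_index (word -> positions)."""
    if not inverted_index:
        return None  # works without an abstract carry null (or an empty index) here
    # Flatten {word: [pos, ...]} into (pos, word) pairs, one pair per occurrence
    slots = [(pos, word) for word, positions in inverted_index.items() for pos in positions]
    # Put the words back in reading order and join them with single spaces
    slots.sort()
    return " ".join(word for _, word in slots)


print(reconstruct_abstract({"the": [0, 3], "cat": [1], "saw": [2], "dog": [4]}))
# the cat saw the dog
```

Sorting (position, word) pairs is sufficient under the assumption that each position holds exactly one word; the original spacing and line breaks are not recoverable from the index.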
- Typed: Work and Author are frozen dataclasses; Work.raw exposes the full payload for fields the typed surface doesn't cover.
- Async: httpx under the hood; async with Client(...) for clean teardown.
- Polite-pool aware: email is required at construction, and every request carries mailto= so you land in the faster polite pool.
- Rate-limited: defaults to a 0.1s minimum interval between requests (~10 req/sec, the documented polite-pool tolerance).
- Disk-cached: responses are cached under ~/.cache/oalex/ with a 24-hour TTL by default, configurable via constructor kwargs.
- Retried: 1s/2s/4s exponential backoff on 429 and 5xx, honoring Retry-After.
- Citation graph: fetch_referenced and fetch_related resolve OpenAlex's referenced_works / related_works arrays into full Work records, with one request per neighbor.
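The rate limit described above amounts to enforcing a minimum gap between request starts; at the 0.1s default that caps throughput at 10 requests per second. A minimal asyncio sketch, assuming every request made by one Client passes through a single shared limiter; MinIntervalLimiter and demo are hypothetical names, and oalex's internals may be structured differently:

```python
from __future__ import annotations

import asyncio
import time


class MinIntervalLimiter:
    """Ensure consecutive request starts are at least `min_interval` seconds apart."""

    def __init__(self, min_interval: float = 0.1) -> None:
        self.min_interval = min_interval
        self._lock = asyncio.Lock()
        self._last_start = float("-inf")  # no request has started yet

    async def wait(self) -> None:
        # The lock queues concurrent callers so each one sees its predecessor's start time
        async with self._lock:
            delay = self._last_start + self.min_interval - time.monotonic()
            if delay > 0:
                await asyncio.sleep(delay)
            self._last_start = time.monotonic()


async def demo() -> list[float]:
    limiter = MinIntervalLimiter(0.05)  # created inside the running event loop
    starts: list[float] = []

    async def fake_request() -> None:
        await limiter.wait()
        starts.append(time.monotonic())  # stand-in for the actual HTTP call

    await asyncio.gather(*(fake_request() for _ in range(4)))
    return starts
```

Because callers queue on the lock, a burst of concurrent calls (such as a citation-graph fan-out) is smoothed into a steady stream instead of being rejected.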
pip install oalex

Requires Python 3.10+. The only runtime dependency is httpx.
works = await client.search(
    "graph neural networks",
    per_page=25,
    year_min=2020,
    year_max=2024,
)

per_page is capped at 200 (OpenAlex's per-page limit). For larger result sets, paginate yourself with OpenAlex's cursor parameter via Client._fetch, or roll your own wrapper.
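OpenAlex's cursor paging works by sending cursor=* on the first request and then echoing back meta.next_cursor until it comes back null. A minimal sketch, assuming a hypothetical fetch_page coroutine that performs the GET and returns parsed JSON; it stands in for whatever you wire up via Client._fetch or your own httpx code, whose exact signature isn't documented here:

```python
from __future__ import annotations

from typing import Any, AsyncIterator, Awaitable, Callable

# Hypothetical hook: a coroutine that GETs https://api.openalex.org/works with these
# query params (plus your mailto=) and returns the parsed JSON body.
FetchPage = Callable[[dict[str, str]], Awaitable[dict[str, Any]]]


async def iter_works(
    fetch_page: FetchPage, search: str, per_page: int = 200
) -> AsyncIterator[dict[str, Any]]:
    """Yield every matching work by following OpenAlex cursor paging."""
    cursor: str | None = "*"  # "*" asks OpenAlex to start a cursor-paged result set
    while cursor is not None:
        page = await fetch_page(
            {"search": search, "per-page": str(min(per_page, 200)), "cursor": cursor}
        )
        results = page.get("results") or []
        for work in results:
            yield work
        # meta.next_cursor is null after the last page; also stop early on an empty page
        cursor = (page.get("meta") or {}).get("next_cursor") if results else None
```

With httpx, fetch_page could be as small as awaiting AsyncClient.get("https://api.openalex.org/works", params=...) and returning response.json(); keeping it injectable lets the paging loop be tested without network access.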
# By OpenAlex ID (equivalent forms)
work = await client.fetch_work("W2626778328")
work = await client.fetch_work("openalex:W2626778328")
# By DOI (equivalent forms)
work = await client.fetch_doi("10.1038/nature12373")
work = await client.fetch_work("doi:10.1038/nature12373")

fetch_work returns None on a 404 (the ID doesn't exist in OpenAlex's graph) and raises OalexUnavailable on transient failures, so you can distinguish "not found" from "outage."
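The equivalent identifier forms above plausibly collapse onto OpenAlex's single-entity endpoints, /works/{OpenAlex ID} and /works/doi:{DOI}. A sketch of that normalization under that assumption; work_path is a hypothetical helper, not oalex's API, and it only recognizes the forms shown in the examples:

```python
from __future__ import annotations

import re

OPENALEX_WORK_ID = re.compile(r"^W\d+$")  # e.g. W2626778328


def work_path(identifier: str) -> str:
    """Map an ID or DOI in any of the forms shown above to an OpenAlex /works/ path."""
    ident = identifier.strip()
    if ident.lower().startswith("openalex:"):
        ident = ident.split(":", 1)[1]  # drop the namespace prefix
    if OPENALEX_WORK_ID.match(ident):
        return f"/works/{ident}"
    if ident.lower().startswith("doi:"):
        return f"/works/doi:{ident[4:]}"
    if ident.startswith("10."):
        # Bare DOI, as passed to fetch_doi
        return f"/works/doi:{ident}"
    raise ValueError(f"unrecognized work identifier: {identifier!r}")
```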
# Papers this paper cites
refs = await client.fetch_referenced("W2626778328", limit=10)
# Papers OpenAlex considers similar (topic-vector overlap, not citations)
related = await client.fetch_related("W2626778328", limit=10)

Individual neighbor failures are skipped silently, so one bad reference doesn't poison the batch; transient errors on the parent fetch still propagate as OalexUnavailable.
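The skip-on-failure fan-out described above maps naturally onto asyncio.gather(..., return_exceptions=True), which hands back each neighbor's exception as a value instead of aborting the whole batch. A minimal sketch; fan_out and fetch_one are hypothetical names, and applying limit before filtering (so failures shrink the result rather than pulling in extra neighbors) is an assumption about oalex's behavior:

```python
from __future__ import annotations

import asyncio
from typing import Awaitable, Callable, Optional, TypeVar

T = TypeVar("T")


async def fan_out(
    ids: list[str],
    fetch_one: Callable[[str], Awaitable[Optional[T]]],
    limit: int,
) -> list[T]:
    """Fetch up to `limit` neighbors concurrently; drop any that fail or come back empty."""
    results = await asyncio.gather(
        *(fetch_one(i) for i in ids[:limit]), return_exceptions=True
    )
    # return_exceptions=True turns per-neighbor errors into list entries, so one bad
    # reference is filtered out here instead of cancelling its siblings
    return [r for r in results if r is not None and not isinstance(r, BaseException)]
```

asyncio.gather preserves input order, so the surviving records come back in the same order as the parent's referenced_works or related_works array.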
Work.raw is a read-only mapping of the full OpenAlex response. Reach into it for fields the typed surface doesn't expose:
concepts = work.raw.get("concepts", [])
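# host_venue is deprecated upstream; the venue now lives under primary_location.source
source = (work.raw.get("primary_location") or {}).get("source") or {}
venue_id = source.get("id")

Constructor options and their defaults:

import httpx
from oalex import Client

my_httpx_client = httpx.AsyncClient()  # only needed if you bring your own client

Client(
    email="you@example.com",  # required: polite-pool contact
    cache_dir="/var/cache/oalex",  # default: ~/.cache/oalex
    ttl_seconds=24 * 60 * 60,  # default: 24h
    min_interval_seconds=0.1,  # default: 0.1s (polite-pool tolerance)
    timeout=30.0,  # default: 30s per request
    client=my_httpx_client,  # optional: bring your own AsyncClient
)

When you pass your own httpx.AsyncClient, oalex won't close it on exit; that's your responsibility (for example, await my_httpx_client.aclose()).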
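The cache_dir and ttl_seconds options suggest a file-per-response cache that expires by age. A minimal sketch, assuming entries are keyed by request URL, stored as JSON, and aged by file modification time; DiskCache and this on-disk layout are hypothetical and not oalex's actual format:

```python
from __future__ import annotations

import hashlib
import json
import os
import tempfile
import time
from pathlib import Path
from typing import Any


class DiskCache:
    """One JSON file per cache key, expired by file age (hypothetical layout)."""

    def __init__(self, cache_dir: str | Path, ttl_seconds: float = 24 * 60 * 60) -> None:
        self.cache_dir = Path(cache_dir).expanduser()  # allows "~/.cache/oalex"
        self.ttl_seconds = ttl_seconds
        self.cache_dir.mkdir(parents=True, exist_ok=True)

    def _path(self, key: str) -> Path:
        # Hash the key (e.g. a full request URL) into a filesystem-safe name
        return self.cache_dir / (hashlib.sha256(key.encode("utf-8")).hexdigest() + ".json")

    def get(self, key: str) -> Any | None:
        path = self._path(key)
        try:
            age = time.time() - path.stat().st_mtime
        except FileNotFoundError:
            return None  # never cached
        if age > self.ttl_seconds:
            return None  # stale: the caller refetches and overwrites
        return json.loads(path.read_text(encoding="utf-8"))

    def put(self, key: str, value: Any) -> None:
        # Write to a unique temp file, then atomically rename it into place,
        # so a concurrent reader never sees a half-written entry
        fd, tmp_name = tempfile.mkstemp(dir=self.cache_dir, suffix=".tmp")
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(value, fh)
        os.replace(tmp_name, self._path(key))
```

Hashing keeps URLs (with slashes, colons, and query strings) out of filenames, and the temp-file-plus-os.replace write means a crash mid-write leaves at most an orphaned .tmp file rather than a corrupt entry.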
- OalexError: base class for all oalex-defined exceptions.
- OalexUnavailable: transient failure (network error, 429/5xx after retries, malformed JSON). Always safe to retry.
404s from fetch_work / fetch_doi are NOT raised — they return None so you can distinguish "OpenAlex doesn't know this ID" from "OpenAlex is having a bad day."
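The retry policy behind OalexUnavailable (1s/2s/4s backoff on 429 and 5xx, honoring Retry-After) can be sketched as a pure delay calculation. This sketch assumes that a Retry-After header, when present, replaces the scheduled delay, and it accepts both forms HTTP allows for that header (delta-seconds or an HTTP-date); retry_delay, RETRYABLE, and BACKOFF are hypothetical names:

```python
from __future__ import annotations

import email.utils
import time
from typing import Optional

RETRYABLE = {429} | set(range(500, 600))  # 404 is deliberately absent: it maps to None
BACKOFF = (1.0, 2.0, 4.0)  # seconds before retries 1, 2, 3


def retry_delay(attempt: int, retry_after: Optional[str], now: Optional[float] = None) -> float:
    """Seconds to wait before retry number `attempt` (0-based)."""
    fallback = BACKOFF[min(attempt, len(BACKOFF) - 1)]
    if not retry_after:
        return fallback
    value = retry_after.strip()
    if value.isdigit():
        return float(value)  # delta-seconds form, e.g. "7"
    # HTTP-date form, e.g. "Wed, 21 Oct 2015 07:28:00 GMT"
    try:
        when = email.utils.parsedate_to_datetime(value).timestamp()
    except (TypeError, ValueError):  # unparseable header: fall back to the schedule
        return fallback
    current = time.time() if now is None else now
    return max(0.0, when - current)
```

A caller would presumably sleep for retry_delay(attempt, response.headers.get("Retry-After")) between attempts and raise OalexUnavailable once the scheduled retries are exhausted.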
git clone https://github.com/Burton-David/oalex
cd oalex
python -m venv .venv && source .venv/bin/activate
pip install -e ".[dev]"
pytest
ruff check oalex tests
mypy oalex

Extracted from research-mcp's OpenAlex source adapter. The polite-pool design, retry policy, and disk-cache invariants were settled there first.
MIT — see LICENSE.