The open dataset and toolkit for global job market data. 3.3M+ live jobs from 400 000+ companies, scraped directly from the ATS platforms where companies actually post. No LinkedIn, no reposts, no recruiters.
```python
from jobhive import search

df = search(query="ml engineer", location="Paris", remote=True)
```

No API key, no auth, no rate limits. The dataset refreshes every 24 hours.
Most job aggregators scrape LinkedIn and Indeed — both full of duplicates, ghost listings, and reposts. jobhive goes one layer down: directly to the ATS platforms (Greenhouse, Lever, Ashby, Workday, BambooHR…) where companies actually post.
- Single source of truth — every row comes from the company's own ATS, so titles, locations, and salaries are accurate.
- No duplicates — one ATS posting = one row; see the spot-check after this list.
- Structured salary when the ATS exposes it (Ashby, Greenhouse Pay Transparency, Lever salaryRange, etc.).
- MIT licensed, fully open — fork the dataset, fork the scrapers.
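The dedup guarantee is easy to verify yourself. A minimal sketch, assuming `search` returns a pandas DataFrame with the schema columns documented below (the quickstart implies it does):

```python
from jobhive import search

# One ATS posting = one row, so global_id should never repeat
# within a slice. Query and ATS here are arbitrary examples.
df = search(query="backend engineer", ats="greenhouse")
print(df["global_id"].duplicated().sum())  # expect 0
```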
| Metric | Value |
|---|---|
| Live jobs | 3 376 000+ |
| Companies | 406 000+ |
| ATS platforms | 31 |
Top 10 by job count:
| ATS | Jobs |
|---|---|
| Bundesagentur (DE public-sector) | 931 049 |
| Workday | 653 041 |
| EURES (EU/EEA public-sector) | 626 783 |
| SmartRecruiters | 213 372 |
| SuccessFactors | 180 499 |
| Greenhouse | 110 071 |
| Oracle HCM | 107 464 |
| iCIMS | 92 211 |
| Lever | 60 342 |
| Phenom | 56 483 |
Counts come from the live manifest at
https://storage.stapply.ai/jobhive/v1/manifest.json — verify any time
with `jobhive list-ats`.
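The manifest is plain JSON, so you can also check it programmatically. A short sketch using `requests`; the manifest's internal layout isn't documented here, so dump it and inspect:

```python
import json

import requests

# Fetch the same live manifest the CLI reads its counts from.
url = "https://storage.stapply.ai/jobhive/v1/manifest.json"
manifest = requests.get(url, timeout=30).json()
print(json.dumps(manifest, indent=2)[:1000])  # peek at the structure
```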
```bash
pip install jobhive-py
```

Distributed as `jobhive-py` on PyPI; the import name is still `jobhive`.

Optional extras:

```bash
pip install "jobhive-py[parquet]"   # faster downloads via Apache Parquet
pip install "jobhive-py[scrapers]"  # build your own pipeline
pip install "jobhive-py[all]"
```

```python
from jobhive import search

# Free-text title + location + remote filter
df = search(query="rust", location="Berlin", remote=True, salary_min=80_000)

# Restrict to one ATS slice (smaller download)
df = search(query="data engineer", ats="ashby")

# Pandas all the way down
df.groupby("company").size().sort_values(ascending=False).head(20)
```

Every row carries:
```text
global_id, url, title, company, ats_type, ats_id,
location, country_iso, region, is_remote, lat, lon,
salary_min, salary_max, salary_currency, salary_period, salary_summary,
employment_type, commitment, experience, department, team,
description, posted_at, fetched_at, language,
requisition_id, apply_url, raw
```
Full per-field semantics (types, defaults, derivation rules, examples)
live in JOB_SCHEMA.md. `global_id` is the
cross-ATS unique key in the form `{ats_type}:{ats_id}`. Optional fields
are `None` when the source ATS doesn't expose them; `raw` keeps any
provider-specific fields the canonical schema doesn't represent.
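Because optional fields come back as `None`, filter before you aggregate. A small sketch of the pattern (query parameters are arbitrary):

```python
from jobhive import search

df = search(query="ml engineer", location="Paris", remote=True)

# Keep only rows where the ATS exposed structured salary, then
# summarize per currency so mixed currencies aren't averaged together.
with_salary = df.dropna(subset=["salary_min", "salary_currency"])
print(with_salary.groupby("salary_currency")["salary_min"].median())
```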
```python
from jobhive.scrapers import GreenhouseScraper, LeverScraper, AshbyScraper

jobs = GreenhouseScraper("anthropic").fetch()  # → list[Job]
jobs = LeverScraper("palantir").fetch()
jobs = AshbyScraper("openai").fetch()
```

Or pick by name:

```python
from jobhive.scrapers import get_scraper

scraper = get_scraper("ashby", "openai")
```

Multi-tenant ATS (pass the company's slug on that ATS):
Greenhouse, Lever, Ashby, SmartRecruiters, Workable,
Rippling, Personio, Gem, JoinCom, iCIMS, JazzHR, Breezy,
Teamtailor, Pinpoint, BambooHR, Cornerstone, Recruitee,
Recruiterbox, Eightfold, Avature, Phenom, Workday, Oracle,
SuccessFactors, Taleo, Mercor.
Custom big-tech APIs (single-tenant, slug ignored): Amazon,
Apple, Google, TikTok, Uber.
National public-sector aggregators: Bundesagentur (DE),
Arbetsformedlingen (SE), Eures (EU/EEA-wide).
Hybrid job boards: WelcomeToTheJungle.
Browser-required (run via Browserbase
remote sessions): Meta, Tesla. Set `JOBHIVE_USE_BROWSERBASE=1`
together with `BROWSERBASE_API_KEY` and `BROWSERBASE_PROJECT_ID` to
enable; without those env vars the scrapers log a warning and skip.
Tesla also needs a Browserbase project that bypasses Akamai (default
sessions are currently 403'd).
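The flags can also be set from Python before fetching; a minimal sketch with placeholder credentials:

```python
import os

# Browserbase-backed scrapers (Meta, Tesla) read these at run time;
# without them they log a warning and skip. Values are placeholders.
os.environ["JOBHIVE_USE_BROWSERBASE"] = "1"
os.environ["BROWSERBASE_API_KEY"] = "<your-browserbase-api-key>"
os.environ["BROWSERBASE_PROJECT_ID"] = "<your-browserbase-project-id>"
```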
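Whatever the category, scrapers share the same interface, so batching tenants is a short loop. A sketch reusing the example slugs from above:

```python
from jobhive.scrapers import get_scraper

# Fetch several tenants across ATSes through the shared interface.
targets = [("greenhouse", "anthropic"), ("lever", "palantir"), ("ashby", "openai")]
for ats, slug in targets:
    jobs = get_scraper(ats, slug).fetch()
    print(f"{ats}/{slug}: {len(jobs)} jobs")
```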
```bash
jobhive search "platform engineer" --location Paris --limit 20
jobhive scrape ashby openai
jobhive list-ats
```

The goal is the largest open-source live job dataset on the internet. That's a forever project, and there's a clear path to make it bigger:
- Add a new ATS scraper — every ATS we don't cover yet is a few
  thousand companies missing from the dataset. The scraper API is
  intentionally tiny: subclass `BaseScraper`, set `ats`, implement
  `fetch()`; see the sketch after this list. Any file under
  `src/jobhive/scrapers/` is a 50-line reference, and the `Job` model
  in `src/jobhive/models.py` defines the schema you populate.
- Improve coverage on an existing ATS — many scrapers extract
  description / salary / employment-type only when the ATS surfaces
  them. If you find a tenant where a field is structurally available
  but we're missing it, a one-line PR is welcome.
- Add new tenants — every supported ATS has a CSV under
  `ats-companies/`. New rows = new companies in the dataset. One-line
  PRs are welcome.
- Report broken scrapers — open an issue with the slug and the failure
  mode. ATS APIs drift; flagging a regression early keeps the dataset
  accurate for everyone.
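To make the shape concrete, here is a skeletal scraper. It is a sketch only: the `BaseScraper` import path, the `self.slug` attribute, the `Job` constructor arguments, and the `exampleats` endpoint are all assumptions; copy an existing file under `src/jobhive/scrapers/` for the real conventions.

```python
import requests

from jobhive.models import Job
from jobhive.scrapers import BaseScraper  # import path assumed; mirror an existing scraper


class ExampleATSScraper(BaseScraper):
    """Sketch for a made-up ATS; the endpoint and JSON field names are hypothetical."""

    ats = "exampleats"

    def fetch(self) -> list[Job]:
        # self.slug is assumed to hold the tenant's slug on this ATS.
        resp = requests.get(
            f"https://api.exampleats.example/v1/{self.slug}/postings", timeout=30
        )
        resp.raise_for_status()
        return [
            Job(  # constructor fields follow JOB_SCHEMA.md; check models.py
                ats_type=self.ats,
                ats_id=str(p["id"]),
                title=p["title"],
                company=self.slug,
                url=p["absolute_url"],
                raw=p,  # keep anything the canonical schema doesn't cover
            )
            for p in resp.json()["postings"]
        ]
```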
```bash
git clone https://github.com/stapply-ai/ats-scrapers
cd ats-scrapers
uv pip install -e ".[dev,scrapers]"
pytest
ruff check .
```

PRs welcome on main. CI is green for all six combinations of {3.11, 3.12, 3.13} ×
{ubuntu, macos}; please keep it that way.
MIT.
Built with Reverse API Engineer.
