job-scanner

A small, hackable script that scans a curated list of company job boards once a week and sends new matching roles to Slack, Discord, or stdout. Built for a personal job search where you'd rather watch a focused corridor of companies than wade through LinkedIn.

No accounts, no API keys, no paid services. Stdlib Python only.

What this does

Every time it runs, it:

Reads companies.json — the list of companies you actually care about
Hits each company's job board API (Greenhouse, Lever, or Ashby — the three big ATS providers cover most tech companies)
Filters every role by title and location (defaults are tuned for product management; edit the regexes for engineering, design, etc.)
Skips anything you've already seen (SQLite dedup)
Sends the new ones to your notification channel
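The five steps above can be sketched in a few lines. This is a minimal illustration, not the code in scan.py: `scan_once`, `fake_fetch`, and the in-memory `seen` set are hypothetical stand-ins (the real scanner keeps its dedup state in SQLite).

```python
import re

def scan_once(companies, fetch_jobs, include_re, location_exclude_re, seen):
    """Return net-new matching jobs and record them in `seen` (a set)."""
    new_jobs = []
    for company in companies:                                  # 1. the shortlist
        for job in fetch_jobs(company):                        # 2. one board per company
            if not include_re.search(job["title"]):            # 3. title filter
                continue
            if location_exclude_re.search(job["location"]):    # 3. location filter
                continue
            key = (company["name"], job["id"])
            if key in seen:                                    # 4. dedup
                continue
            seen.add(key)
            new_jobs.append({**job, "company": company["name"]})
    return new_jobs                                            # 5. hand these to a notifier


def fake_fetch(company):
    # Stand-in for a real job board call; returns canned postings.
    return [
        {"id": "1", "title": "Staff Product Manager, Knowledge", "location": "US Remote"},
        {"id": "2", "title": "Account Executive", "location": "US Remote"},
        {"id": "3", "title": "Senior Product Manager", "location": "EMEA only"},
    ]


companies = [{"name": "Pinecone", "tier": "tier1", "ats": "ashby", "slug": "pinecone"}]
include_re = re.compile(r"\bproduct manager\b", re.IGNORECASE)
location_exclude_re = re.compile(r"\bemea only\b", re.IGNORECASE)
seen = set()

first_run = scan_once(companies, fake_fetch, include_re, location_exclude_re, seen)
second_run = scan_once(companies, fake_fetch, include_re, location_exclude_re, seen)
print([job["title"] for job in first_run])  # ['Staff Product Manager, Knowledge']
print(second_run)                           # []
```

The second run returns nothing because every match from the first run is already in `seen`.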

A typical message looks like:

🎯 Job scan — 3 net-new

Tier 1
• Staff Product Manager, Knowledge — Pinecone · US Remote
• Senior PM, Developer Experience — Modal · Remote

Tier 2
• Group Product Manager, Platform — PostHog · Remote
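A message in that layout could be produced by something like the following. This is a sketch under assumptions: `format_message` and `tier_label` are hypothetical names, and the mapping from a `"tier1"` label to a `Tier 1` heading is a guess at how the labels are displayed.

```python
from collections import defaultdict

def tier_label(tier):
    # Assumed convention: "tier1" -> "Tier 1"; any other label is shown as-is.
    if tier.startswith("tier") and tier[4:].isdigit():
        return f"Tier {tier[4:]}"
    return tier

def format_message(jobs):
    """Group jobs by tier and render one bullet per job."""
    by_tier = defaultdict(list)
    for job in jobs:
        by_tier[job["tier"]].append(job)
    lines = [f"🎯 Job scan — {len(jobs)} net-new"]
    for tier in sorted(by_tier):
        lines.append("")
        lines.append(tier_label(tier))
        for job in by_tier[tier]:
            lines.append(f"• {job['title']} — {job['company']} · {job['location']}")
    return "\n".join(lines)


sample_jobs = [
    {"tier": "tier1", "title": "Staff Product Manager, Knowledge", "company": "Pinecone", "location": "US Remote"},
    {"tier": "tier2", "title": "Group Product Manager, Platform", "company": "PostHog", "location": "Remote"},
    {"tier": "tier1", "title": "Senior PM, Developer Experience", "company": "Modal", "location": "Remote"},
]
message = format_message(sample_jobs)
print(message)
```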

It runs in roughly 30 seconds against ~30 companies. Once a week is usually plenty: most companies don't post new roles daily, so a weekly scan rarely misses a posting.

Why this instead of a job board

Job boards optimize for breadth: you see every PM role posted anywhere in the last 24 hours. That's mostly noise.

A curated list optimizes for fit. If you've already decided you want to work at, say, AI infrastructure companies, you only care what those 30 companies are hiring for. That's a much smaller set of postings, and one that's much easier to act on.

This script takes the second approach. Edit the company list to be your shortlist, and the title regex to match your role.

Quick start (5 minutes)

git clone <this repo>
cd job-scanner

# Run it once with the default company list, output to terminal
python3 scan.py --dry-run

# You should see a message with current open roles at the example companies.

That's it. No dependencies to install — only the Python standard library.

To send to Slack instead of the terminal:

cp .env.example .env
# Edit .env, paste in a Slack incoming webhook URL
# (https://api.slack.com/messaging/webhooks — takes 2 minutes)

# Load the env vars and run (set -a exports every variable the file defines)
set -a; . ./.env; set +a
python3 scan.py

Customizing for your search

Two files do the work, and there are three places to edit.

1. `companies.json` — your shortlist

Each entry needs an ats (one of greenhouse, lever, ashby) and a slug (whatever the company's job board uses). Tier is just a label — group however you want.

{
  "name": "Pinecone",
  "tier": "tier1",
  "ats": "ashby",
  "slug": "pinecone"
}

How to find the slug: most company careers pages link to boards.greenhouse.io/<slug> or jobs.lever.co/<slug> or jobs.ashbyhq.com/<slug>. The slug is the last segment of that URL. If you can't find it, run python3 probe_ats.py after editing the CANDIDATES dict at the top of that file with the company name and a few candidate slugs — it'll tell you which one works.
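If you have a careers link handy, the ats and slug can be read straight off it. A small sketch, assuming only the three hostnames mentioned above; `ats_and_slug` is a hypothetical helper, not part of probe_ats.py.

```python
from urllib.parse import urlparse

# Hostnames taken from the paragraph above.
ATS_HOSTS = {
    "boards.greenhouse.io": "greenhouse",
    "jobs.lever.co": "lever",
    "jobs.ashbyhq.com": "ashby",
}

def ats_and_slug(url):
    """Return (ats, slug) for a job board URL, or None if the host is unknown."""
    parsed = urlparse(url)
    ats = ATS_HOSTS.get(parsed.netloc.lower())
    segments = [s for s in parsed.path.split("/") if s]
    if ats is None or not segments:
        return None
    # The slug is the first path segment, so a link to an individual
    # posting (host/<slug>/<job id>) still resolves to the board slug.
    return ats, segments[0]


print(ats_and_slug("https://jobs.ashbyhq.com/pinecone"))  # ('ashby', 'pinecone')
print(ats_and_slug("https://example.com/careers"))        # None
```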

If a company is on none of those three ATS providers (some big companies use proprietary boards or Workday), set "no_ats": true and the scanner will skip it gracefully.
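To make the rules above concrete, here is one way the entries might be validated before scanning. It assumes companies.json is a JSON array of entries like the one shown (the top-level shape is not spelled out here); `scannable_companies` and "ExampleCorp" are made up for the example.

```python
import json

VALID_ATS = {"greenhouse", "lever", "ashby"}

def scannable_companies(entries):
    """Drop no_ats entries and fail loudly on entries missing ats or slug."""
    keep = []
    for entry in entries:
        if entry.get("no_ats"):
            continue  # tracked by hand, skipped by the scanner
        if entry.get("ats") not in VALID_ATS or not entry.get("slug"):
            raise ValueError(f"{entry.get('name', '?')}: needs a valid 'ats' and a 'slug'")
        keep.append(entry)
    return keep


# Inline stand-in for the contents of companies.json.
entries = json.loads("""
[
  {"name": "Pinecone", "tier": "tier1", "ats": "ashby", "slug": "pinecone"},
  {"name": "ExampleCorp", "tier": "tier2", "no_ats": true}
]
""")
print([c["name"] for c in scannable_companies(entries)])  # ['Pinecone']
```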

2. `matchers.py` — what counts as a role you want

The defaults are PM-flavored. Two lists do the filtering:

INCLUDE_PATTERNS — title must match at least one
EXCLUDE_PATTERNS — title must match zero

Both are plain regex. Edit them.

For an engineering search, replace INCLUDE_PATTERNS with something like:

INCLUDE_PATTERNS = [
    r"\bstaff (software )?engineer\b",
    r"\bsenior (software )?engineer\b",
    r"\bprincipal engineer\b",
    r"\bbackend engineer\b",
]

For design:

INCLUDE_PATTERNS = [
    r"\bsenior product designer\b",
    r"\bstaff product designer\b",
    r"\bprincipal designer\b",
    r"\bdesign lead\b",
]

Adjust EXCLUDE_PATTERNS to drop the role family you don't want even if the title accidentally matches (e.g. for engineering, exclude "engineering manager" if you want IC roles only).
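The include/exclude rule itself is small. A minimal sketch, assuming titles are lowercased before matching (the patterns above are all lowercase); `title_ok` is a hypothetical name and may not match the function in matchers.py.

```python
import re

INCLUDE_PATTERNS = [
    r"\bstaff (software )?engineer\b",
    r"\bsenior (software )?engineer\b",
]
EXCLUDE_PATTERNS = [
    r"\bengineering manager\b",
]

def title_ok(title, include=INCLUDE_PATTERNS, exclude=EXCLUDE_PATTERNS):
    """True if the title matches at least one include and no exclude."""
    lowered = title.lower()
    if not any(re.search(p, lowered) for p in include):
        return False
    return not any(re.search(p, lowered) for p in exclude)


print(title_ok("Senior Software Engineer, Infrastructure"))  # True
print(title_ok("Senior Engineer / Engineering Manager"))     # False (exclude wins)
print(title_ok("Staff Engineering Lead"))                    # False (\b stops "engineer" matching "engineering")
```

The last case is worth noticing when tuning: `\b` after `engineer` means the pattern will not match inside a longer word.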

3. `matchers.py` again — location filter

LOCATION_EXCLUDES is the kill list. Default is "kill non-US-eligible remote" (Europe-only, EMEA-only, etc.). Flip it for the opposite — kill anything that says "US only" if you're EU-based:

LOCATION_EXCLUDES = [
    r"\bus only\b",
    r"\bunited states only\b",
    r"\bnorth america only\b",
]

Or remove location filtering entirely by setting LOCATION_EXCLUDES = [] and changing location_ok to just return True.
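For reference, a location check along these lines might look like the following. The signature is an assumption; only the names LOCATION_EXCLUDES and location_ok come from this README.

```python
import re

LOCATION_EXCLUDES = [
    r"\bus only\b",
    r"\bunited states only\b",
    r"\bnorth america only\b",
]

def location_ok(location, excludes=LOCATION_EXCLUDES):
    """True unless the location text matches any pattern on the kill list."""
    lowered = (location or "").lower()  # treat a missing location as empty text
    return not any(re.search(p, lowered) for p in excludes)


print(location_ok("Remote (US only)"))               # False
print(location_ok("Remote, Europe"))                 # True
print(location_ok("Remote (US only)", excludes=[]))  # True
```

With an empty kill list, every location passes in this sketch.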

Setting up notifications

Pick one. The webhooks are opt-in via env var; stdout is the fallback.

Slack incoming webhook (recommended)

Easiest. No bot, no OAuth.

Go to https://api.slack.com/messaging/webhooks
Create an incoming webhook for the channel you want messages in
Copy the URL into .env as SLACK_WEBHOOK_URL

The scanner will use it automatically.

Discord webhook

Server Settings → Integrations → Webhooks → New Webhook → Copy URL. Set as DISCORD_WEBHOOK_URL in .env.

Stdout (no env vars set)

If neither webhook URL is set, the scanner prints to stdout. Useful for testing or for cron jobs that pipe the output to email.
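The selection logic described in this section can be sketched as below. Assumptions to check against notifiers.py: Slack takes precedence when both variables are set, and `build_request` / `notify` are hypothetical names. The payload keys (`text` for Slack incoming webhooks, `content` for Discord webhooks) are the ones those services document.

```python
import json
import os
import urllib.request

def build_request(message, env=os.environ):
    """Return a POST request for the configured webhook, or None for stdout."""
    slack = env.get("SLACK_WEBHOOK_URL")
    discord = env.get("DISCORD_WEBHOOK_URL")
    if slack:                      # assumed precedence: Slack first
        url, payload = slack, {"text": message}
    elif discord:
        url, payload = discord, {"content": message}
    else:
        return None
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Precaution: an explicit User-Agent instead of urllib's default.
            "User-Agent": "job-scanner",
        },
        method="POST",
    )

def notify(message, env=os.environ):
    request = build_request(message, env)
    if request is None:
        print(message)             # stdout fallback
        return
    with urllib.request.urlopen(request, timeout=10) as response:
        response.read()


# Nothing is sent here; this only shows what would be posted.
demo = build_request("3 net-new", {"DISCORD_WEBHOOK_URL": "https://discord.example/webhook"})
print(demo.get_method(), json.loads(demo.data))
```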

Scheduling

The scanner is just a script. Schedule it however you'd schedule any script.

macOS / Linux: cron

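# Sunday 6pm, local time. cron does not load .env, so source it first
# (drop the "set -a ... set +a" part if you only want stdout in scan.log).
0 18 * * 0 cd /path/to/job-scanner && set -a && . ./.env && set +a && /usr/bin/python3 scan.py >> scan.log 2>&1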

macOS: launchd

If you want it to survive sleep/wake cycles cleanly, launchd is more reliable than cron on Mac. There's a sample com.example.job-scanner.plist you can adapt — see Apple's docs or just paste a launchd plist into ChatGPT and tell it the script path.

GitHub Actions

If you don't have a server to run it on, GitHub Actions can run it on a schedule for free. Create .github/workflows/scan.yml:

name: scan
on:
  schedule:
    - cron: "0 18 * * 0"  # Sundays 6pm UTC
  workflow_dispatch:
jobs:
  run:
    runs-on: ubuntu-latest
    permissions:
      contents: write  # needed so the auto-commit step can push seen_jobs.db
    steps:
      - uses: actions/checkout@v4
      - run: python3 scan.py
        env:
          SLACK_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK_URL }}
      - uses: stefanzweifel/git-auto-commit-action@v5
        with:
          commit_message: "scan: update seen_jobs.db"
          file_pattern: state/seen_jobs.db

The auto-commit step pushes the dedup state back to the repo so the next run doesn't re-alert you on the same jobs. The workflow needs write permission on repository contents for that push to succeed. Add SLACK_WEBHOOK_URL (or DISCORD_WEBHOOK_URL) as a repo secret.

CLI reference

python3 scan.py              # full run; sends to notifier, marks jobs seen
python3 scan.py --dry-run    # show matches; no notification, no state change
python3 scan.py --reset      # wipe state/seen_jobs.db (treat all current jobs as new next run)
python3 probe_ats.py         # verify slugs for a list of candidate companies
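If you add flags of your own, the two existing ones map naturally onto argparse. A sketch of how that parsing could be set up; it is not copied from scan.py, and `build_parser` is a hypothetical name.

```python
import argparse

def build_parser():
    parser = argparse.ArgumentParser(
        prog="scan.py",
        description="Scan a shortlist of company job boards for new matching roles.",
    )
    parser.add_argument(
        "--dry-run",
        action="store_true",
        help="show matches; no notification, no state change",
    )
    parser.add_argument(
        "--reset",
        action="store_true",
        help="wipe state/seen_jobs.db so all current jobs count as new next run",
    )
    return parser


args = build_parser().parse_args(["--dry-run"])
print(args.dry_run, args.reset)  # True False
```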

How it works (architecture)

companies.json  →  sources.py  →  matchers.py  →  state.py  →  notifiers.py
   (your list)     (ATS APIs)     (regex filter)   (dedup)      (Slack/Discord/stdout)

sources.py — one function per ATS provider, all returning the same shape. Adding a new provider is one function.
matchers.py — pure regex over title and location. No ML, no LLM. Easy to debug, easy to tune.
state.py — SQLite at state/seen_jobs.db. Keyed on (company, ats_job_id). Tracks first-seen and last-seen.
notifiers.py — picks Slack/Discord/stdout based on which env var is set.
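"All returning the same shape" might look like this in practice. Everything here is an assumption to verify against each provider's documentation and against sources.py: the endpoint URLs and response field names reflect the public job board APIs as I understand them, and the `normalize_*` names and the output keys are made up for the example.

```python
# Assumed public endpoints ({slug} is the value from companies.json).
ENDPOINTS = {
    "greenhouse": "https://boards-api.greenhouse.io/v1/boards/{slug}/jobs",
    "lever": "https://api.lever.co/v0/postings/{slug}?mode=json",
    "ashby": "https://api.ashbyhq.com/posting-api/job-board/{slug}",
}

def normalize_greenhouse(payload):
    # Assumed shape: {"jobs": [{"id", "title", "location": {"name"}, "absolute_url"}]}
    return [
        {
            "id": str(job["id"]),
            "title": job["title"],
            "location": (job.get("location") or {}).get("name", ""),
            "url": job.get("absolute_url", ""),
        }
        for job in payload.get("jobs", [])
    ]

def normalize_lever(payload):
    # Assumed shape: [{"id", "text", "categories": {"location"}, "hostedUrl"}]
    return [
        {
            "id": str(job["id"]),
            "title": job["text"],
            "location": (job.get("categories") or {}).get("location", ""),
            "url": job.get("hostedUrl", ""),
        }
        for job in payload
    ]

def normalize_ashby(payload):
    # Assumed shape: {"jobs": [{"id", "title", "location", "jobUrl"}]}
    return [
        {
            "id": str(job["id"]),
            "title": job["title"],
            "location": job.get("location", ""),
            "url": job.get("jobUrl", ""),
        }
        for job in payload.get("jobs", [])
    ]


sample = {"jobs": [{"id": 42, "title": "Staff PM", "location": {"name": "Remote"}, "absolute_url": "https://example.com/42"}]}
print(normalize_greenhouse(sample))
```

Because every adapter returns the same four keys, the matching and dedup steps never need to know which provider a job came from.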

About 350 lines of Python total. Read it, change it, you understand it.
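As an illustration of the dedup described for state.py, here is a minimal SQLite sketch keyed on (company, ats_job_id) with first-seen and last-seen timestamps. The table name, column types, and the `open_state` / `mark_seen` helpers are assumptions, not the actual schema.

```python
import sqlite3
import time

SCHEMA = """
CREATE TABLE IF NOT EXISTS seen_jobs (
    company     TEXT NOT NULL,
    ats_job_id  TEXT NOT NULL,
    first_seen  REAL NOT NULL,
    last_seen   REAL NOT NULL,
    PRIMARY KEY (company, ats_job_id)
)
"""

def open_state(path=":memory:"):
    # The real file lives at state/seen_jobs.db; in-memory keeps the demo clean.
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn

def mark_seen(conn, company, ats_job_id, now=None):
    """Record a sighting; return True only the first time a job is seen."""
    now = time.time() if now is None else now
    row = conn.execute(
        "SELECT 1 FROM seen_jobs WHERE company = ? AND ats_job_id = ?",
        (company, ats_job_id),
    ).fetchone()
    if row is None:
        conn.execute(
            "INSERT INTO seen_jobs (company, ats_job_id, first_seen, last_seen) VALUES (?, ?, ?, ?)",
            (company, ats_job_id, now, now),
        )
    else:
        conn.execute(
            "UPDATE seen_jobs SET last_seen = ? WHERE company = ? AND ats_job_id = ?",
            (now, company, ats_job_id),
        )
    conn.commit()
    return row is None


conn = open_state()
print(mark_seen(conn, "Pinecone", "job-1", now=100.0))  # True
print(mark_seen(conn, "Pinecone", "job-1", now=200.0))  # False
```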

FAQ

The script says 0 net-new every week. What now?

Either you've already seen everything (your shortlist is small and stable), or your title regex is too tight, or the companies on your list aren't hiring. Try:

python3 scan.py --reset      # wipe dedup, treat everything as new
python3 scan.py --dry-run    # see what's currently active

If --dry-run shows zero matches but the companies have open roles, your filters are too restrictive. Start with INCLUDE_PATTERNS, then check EXCLUDE_PATTERNS and LOCATION_EXCLUDES.

A company I want isn't on Greenhouse/Lever/Ashby. Now what?

Some companies use Workday, Lever-Connect, Greenhouse Job Board v2, or a fully custom careers page. This scanner doesn't scrape HTML. Options:

Set "no_ats": true in companies.json and check that company manually
Add a new adapter to sources.py for whatever ATS they use (some are JSON-API-friendly, others aren't)

How do I know what tier to put a company in?

Tiers are just labels for grouping in the output message. Use them however helps you scan. "Tier 1 = dream companies, Tier 2 = strong fit, Tier 3 = backup" is one approach. "By industry" is another. Make them yours.

Will this get me rate-limited?

Greenhouse, Lever, and Ashby all expose public job boards via free APIs that handle modest polling fine. Once a week against ~30 companies is nothing. If you bump it to every hour or expand to hundreds of companies, you might hit limits — back off and stagger.
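"Back off and stagger" could be as simple as the wrapper below. It is a sketch: `fetch_with_backoff` is a hypothetical helper, the retryable status codes and delays are illustrative choices, and `fetch` is whatever function you use to call a board.

```python
import random
import time
import urllib.error

RETRYABLE = (429, 500, 502, 503, 504)

def fetch_with_backoff(fetch, url, retries=4, base_delay=1.0, sleep=time.sleep):
    """Call fetch(url), retrying retryable HTTP errors with exponential backoff."""
    for attempt in range(retries):
        try:
            return fetch(url)
        except urllib.error.HTTPError as err:
            last_attempt = attempt == retries - 1
            if err.code not in RETRYABLE or last_attempt:
                raise
            # 1s, 2s, 4s, ... plus a little jitter so retries don't line up.
            sleep(base_delay * 2 ** attempt + random.uniform(0, 0.5))

def fetch_all(fetch, urls, pause=0.5, sleep=time.sleep):
    """Stagger requests by pausing briefly between boards."""
    results = []
    for index, url in enumerate(urls):
        if index:
            sleep(pause)
        results.append(fetch_with_backoff(fetch, url, sleep=sleep))
    return results
```

Passing `sleep` in as a parameter keeps the helper easy to test without actually waiting.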

What if I want LLM-based fit scoring?

Out of scope for this version. The simplest path is to add an optional pass between matchers.py and state.py that calls an LLM to score each match. Pull request welcome.
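One possible shape for that optional pass is a function that takes any scorer as a callable, so the LLM call stays pluggable. Everything below is hypothetical: `score_pass`, the `fit_score` key, the threshold, and the keyword scorer that stands in for a model call.

```python
def score_pass(matches, score_fn, threshold=0.5):
    """Keep matches whose score meets the threshold, best first."""
    kept = []
    for job in matches:
        score = score_fn(job)
        if score >= threshold:
            kept.append({**job, "fit_score": score})
    return sorted(kept, key=lambda job: job["fit_score"], reverse=True)


def keyword_score(job):
    # Stand-in for an LLM call: fraction of wanted keywords in the title.
    wanted = ("platform", "developer")
    title = job["title"].lower()
    return sum(word in title for word in wanted) / len(wanted)


matches = [
    {"title": "Senior PM, Developer Experience"},
    {"title": "Group Product Manager, Developer Platform"},
    {"title": "Product Manager, Growth"},
]
for job in score_pass(matches, keyword_score):
    print(job["fit_score"], job["title"])
```

Running it before the dedup step would mean low-scoring jobs are never marked seen; running it after would mean they are. Which you want depends on whether you expect to change the scorer later.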

Limitations

Public ATS only. Companies on private/proprietary boards (Workday, custom careers pages) require a different approach.
Title-and-location filter only. The scanner has no idea whether you'd be a good fit — it just tells you what's posted. Reading and filtering the actual JD is still on you.
No LLM scoring. The defaults are regex-based. That's a feature for transparency and speed, but if you want "rank by fit," this isn't that tool yet.
Companies move ATSes. Helicone got acquired by Mintlify; Replicate by Cloudflare. Slugs change. If you start seeing 0 jobs from a company that should have plenty, run probe_ats.py to find their new slug.

License

MIT. Fork, hack, ship your own.

If you build something interesting on top of this, I'd love to hear about it.
