A small, hackable script that scans a curated list of company job boards once a week and sends new matching roles to Slack, Discord, or stdout. Built for a personal job search where you'd rather watch a focused corridor of companies than wade through LinkedIn.
No accounts, no API keys, no paid services. Stdlib Python only.
Every time it runs, it:
- Reads companies.json, the list of companies you actually care about
- Hits each company's job board API (Greenhouse, Lever, or Ashby; these three big ATS providers cover most tech companies)
- Filters every role by title and location (defaults are tuned for product management; edit the regexes for engineering, design, etc.)
- Skips anything you've already seen (SQLite dedup)
- Sends the new ones to your notification channel
A typical message looks like:
🎯 Job scan - 3 net-new
Tier 1
• Staff Product Manager, Knowledge - Pinecone · US Remote
• Senior PM, Developer Experience - Modal · Remote
Tier 2
• Group Product Manager, Platform - PostHog · Remote
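A minimal sketch of how a digest like the one above could be assembled, grouping jobs by tier. The function name `format_message`, the `TIER_LABELS` mapping, and the job dictionary keys are assumptions made for illustration; the real notifier may build its message differently.

```python
from collections import defaultdict

# Hypothetical mapping from the "tier" value in companies.json to a heading.
TIER_LABELS = {"tier1": "Tier 1", "tier2": "Tier 2", "tier3": "Tier 3"}


def format_message(jobs):
    """Render new jobs as a plain-text digest grouped by tier."""
    by_tier = defaultdict(list)
    for job in jobs:
        by_tier[job["tier"]].append(job)

    lines = [f"Job scan - {len(jobs)} net-new"]
    for tier in sorted(by_tier):
        # Fall back to the raw tier value for labels not in the mapping.
        lines.append(TIER_LABELS.get(tier, tier))
        for job in by_tier[tier]:
            lines.append(f"• {job['title']} - {job['company']} · {job['location']}")
    return "\n".join(lines)
```

Sorting the tier keys keeps the output order stable from run to run, which makes consecutive weekly messages easier to compare.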
It runs in roughly 30 seconds against ~30 companies. Once a week is plenty: most companies don't post new roles daily, and you won't miss anything.
Job boards optimize for breadth: you see every PM role posted in the last 24 hours across everywhere. That's mostly noise.
A curated list optimizes for fit. If you've already decided you want to work at, say, AI infrastructure companies, you only care what those 30 companies are hiring for. That's a much smaller signal that's much easier to act on.
This script is the second kind. Edit the company list to be your shortlist, and the title regex to be your role.
git clone <this repo>
cd job-scanner
# Run it once with the default company list, output to terminal
python3 scan.py --dry-run
# You should see a message with current open roles at the example companies.
That's it. No dependencies to install, only the Python standard library.
To send to Slack instead of the terminal:
cp .env.example .env
# Edit .env, paste in a Slack incoming webhook URL
# (https://api.slack.com/messaging/webhooks, takes 2 minutes)
# Load the env var and run
export $(cat .env | xargs)
python3 scan.py
Three files do the work. Edit them.
Each entry needs an ats (one of greenhouse, lever, ashby) and a slug (whatever the company's job board uses). Tier is just a label; group however you want.
{
  "name": "Pinecone",
  "tier": "tier1",
  "ats": "ashby",
  "slug": "pinecone"
}
How to find the slug: most company careers pages link to boards.greenhouse.io/<slug> or jobs.lever.co/<slug> or jobs.ashbyhq.com/<slug>. The slug is the last segment of that URL. If you can't find it, edit the CANDIDATES dict at the top of probe_ats.py with the company name and a few candidate slugs, then run python3 probe_ats.py. It'll tell you which one works.
If a company is on none of those three ATS providers (some big companies use proprietary boards or Workday), set "no_ats": true and the scanner will skip it gracefully.
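As a rough illustration of how these entries might be consumed, the sketch below loads the list, skips `no_ats` entries, and maps each remaining entry to the public board URL patterns listed above. It assumes `companies.json` is a top-level JSON array; `scannable_companies` and `BOARD_URLS` are hypothetical names, not the scanner's actual code.

```python
import json

# Public board URL patterns mentioned above, keyed by the "ats" field.
BOARD_URLS = {
    "greenhouse": "https://boards.greenhouse.io/{slug}",
    "lever": "https://jobs.lever.co/{slug}",
    "ashby": "https://jobs.ashbyhq.com/{slug}",
}


def scannable_companies(raw_json):
    """Yield (entry, board_url) for every entry the scanner could poll."""
    for entry in json.loads(raw_json):
        if entry.get("no_ats"):
            continue  # proprietary board or Workday: check manually
        ats = entry.get("ats")
        if ats not in BOARD_URLS or not entry.get("slug"):
            raise ValueError(f"{entry.get('name')}: needs a valid ats and slug")
        yield entry, BOARD_URLS[ats].format(slug=entry["slug"])
```

Failing loudly on a missing or misspelled `ats` value is a design choice for the sketch: a typo in the list surfaces on the first run instead of silently producing zero jobs for that company.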
The defaults are PM-flavored. Two lists do the filtering:
- INCLUDE_PATTERNS: title must match at least one
- EXCLUDE_PATTERNS: title must match zero
Both are plain regex. Edit them.
For an engineering search, replace INCLUDE_PATTERNS with something like:
INCLUDE_PATTERNS = [
    r"\bstaff (software )?engineer\b",
    r"\bsenior (software )?engineer\b",
    r"\bprincipal engineer\b",
    r"\bbackend engineer\b",
]
For design:
INCLUDE_PATTERNS = [
    r"\bsenior product designer\b",
    r"\bstaff product designer\b",
    r"\bprincipal designer\b",
    r"\bdesign lead\b",
]
Adjust EXCLUDE_PATTERNS to drop the role family you don't want even if the title accidentally matches (e.g. for engineering, exclude "engineering manager" if you want IC roles only).
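The include/exclude rule can be sketched in a few lines. The function name `title_ok` is hypothetical (chosen by analogy with `location_ok`), and case-insensitive matching is an assumption based on the patterns being written in lowercase.

```python
import re

INCLUDE_PATTERNS = [
    r"\bstaff (software )?engineer\b",
    r"\bsenior (software )?engineer\b",
]
EXCLUDE_PATTERNS = [
    r"\bengineering manager\b",
]


def title_ok(title):
    """True if the title matches at least one include and no exclude."""
    included = any(re.search(p, title, re.IGNORECASE) for p in INCLUDE_PATTERNS)
    excluded = any(re.search(p, title, re.IGNORECASE) for p in EXCLUDE_PATTERNS)
    return included and not excluded


print(title_ok("Senior Software Engineer, Payments"))
print(title_ok("Senior Engineer / Engineering Manager"))
```

The second title matches an include pattern but is still rejected, because excludes always win over includes.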
LOCATION_EXCLUDES is the kill list. Default is "kill non-US-eligible remote" (Europe-only, EMEA-only, etc.). If you're EU-based, flip it to kill anything that says "US only":
LOCATION_EXCLUDES = [
    r"\bus only\b",
    r"\bunited states only\b",
    r"\bnorth america only\b",
]
Or remove location filtering entirely by setting LOCATION_EXCLUDES = [] and changing location_ok to just return True.
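For reference, a location check along these lines might look like the sketch below. The body of `location_ok` here is an assumption about how such a function could work, not a copy of the scanner's implementation; in particular, letting an empty location through is a choice made for the sketch.

```python
import re

LOCATION_EXCLUDES = [
    r"\bus only\b",
    r"\bunited states only\b",
    r"\bnorth america only\b",
]


def location_ok(location):
    """True unless the location text matches any exclude pattern."""
    if not location:
        return True  # assumption: a missing location is not disqualifying
    return not any(
        re.search(p, location, re.IGNORECASE) for p in LOCATION_EXCLUDES
    )
```

The `\b` word boundaries matter: `\bus only\b` rejects "Remote (US only)" but leaves "Campus only" alone, because the "us" inside "Campus" does not start at a word boundary.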
Pick one. They're all opt-in via env var.
Easiest. No bot, no OAuth.
- Go to https://api.slack.com/messaging/webhooks
- Create an incoming webhook for the channel you want messages in
- Copy the URL into .env as SLACK_WEBHOOK_URL
The scanner will use it automatically.
Server Settings → Integrations → Webhooks → New Webhook → Copy URL. Set as DISCORD_WEBHOOK_URL in .env.
If neither webhook URL is set, the scanner prints to stdout. Useful for testing or for cron jobs that pipe the output to email.
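A sketch of env-var-based channel selection using only the standard library. The function names are hypothetical, and checking Slack before Discord when both are set is an assumption. The payload keys are the documented ones: Slack incoming webhooks read `text`, Discord webhooks read `content`.

```python
import json
import os
import urllib.request


def pick_notifier(env=os.environ):
    """Return (kind, url) for the first configured channel, else stdout."""
    if env.get("SLACK_WEBHOOK_URL"):
        return "slack", env["SLACK_WEBHOOK_URL"]
    if env.get("DISCORD_WEBHOOK_URL"):
        return "discord", env["DISCORD_WEBHOOK_URL"]
    return "stdout", None


def build_payload(kind, message):
    """Encode the message with the JSON key each webhook expects."""
    key = "text" if kind == "slack" else "content"
    return json.dumps({key: message}).encode("utf-8")


def notify(message, env=os.environ):
    kind, url = pick_notifier(env)
    if kind == "stdout":
        print(message)
        return kind
    request = urllib.request.Request(
        url,
        data=build_payload(kind, message),
        headers={
            "Content-Type": "application/json",
            # Some endpoints reject urllib's default user agent.
            "User-Agent": "job-scanner-sketch/0.1",
        },
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        response.read()
    return kind
```

Passing `env` as a parameter keeps the selection logic testable without touching the real environment.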
The scanner is just a script. Schedule it however you'd schedule any script.
# Sunday 6pm
0 18 * * 0 cd /path/to/job-scanner && /usr/bin/python3 scan.py >> scan.log 2>&1
If you want it to survive sleep/wake cycles cleanly, launchd is more reliable than cron on Mac. There's a sample com.example.job-scanner.plist you can adapt; see Apple's docs or just paste a launchd plist into ChatGPT and tell it the script path.
If you don't have a server to run it on, GitHub Actions can run it on a schedule for free. Create .github/workflows/scan.yml:
name: scan
on:
  schedule:
    - cron: "0 18 * * 0"  # Sundays 6pm UTC
  workflow_dispatch:
permissions:
  contents: write  # lets the auto-commit step push seen_jobs.db
jobs:
  run:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: python3 scan.py
        env:
          SLACK_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK_URL }}
      - uses: stefanzweifel/git-auto-commit-action@v5
        with:
          commit_message: "scan: update seen_jobs.db"
          file_pattern: state/seen_jobs.db
The auto-commit step persists the dedup state back to the repo so the next run doesn't re-alert you on the same jobs; the permissions block gives the workflow token the write access that push needs. Add SLACK_WEBHOOK_URL (or DISCORD_WEBHOOK_URL) as a repo secret.
python3 scan.py # full run; sends to notifier, marks jobs seen
python3 scan.py --dry-run # show matches; no notification, no state change
python3 scan.py --reset # wipe state/seen_jobs.db (treat all current jobs as new next run)
python3 probe_ats.py # verify slugs for a list of candidate companies
companies.json → sources.py → matchers.py    → state.py → notifiers.py
(your list)      (ATS APIs)   (regex filter)   (dedup)    (Slack/Discord/stdout)
- sources.py: one function per ATS provider, all returning the same shape. Adding a new provider is one function.
- matchers.py: pure regex over title and location. No ML, no LLM. Easy to debug, easy to tune.
- state.py: SQLite at state/seen_jobs.db. Keyed on (company, ats_job_id). Tracks first-seen and last-seen.
- notifiers.py: picks Slack/Discord/stdout based on which env var is set.
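To make the dedup step concrete, here is a minimal sketch of a SQLite store keyed on (company, ats_job_id) that tracks first-seen and last-seen timestamps. The table name, column names, and function names are assumptions for illustration; the schema in state.py may differ.

```python
import sqlite3
from datetime import datetime, timezone


def open_state(path=":memory:"):
    """Open (or create) the dedup database; defaults to in-memory."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS seen_jobs (
               company    TEXT NOT NULL,
               ats_job_id TEXT NOT NULL,
               first_seen TEXT NOT NULL,
               last_seen  TEXT NOT NULL,
               PRIMARY KEY (company, ats_job_id)
           )"""
    )
    return conn


def mark_seen(conn, company, ats_job_id):
    """Record a sighting; return True only the first time a job appears."""
    now = datetime.now(timezone.utc).isoformat()
    job_id = str(ats_job_id)
    updated = conn.execute(
        "UPDATE seen_jobs SET last_seen = ? WHERE company = ? AND ats_job_id = ?",
        (now, company, job_id),
    )
    if updated.rowcount:
        conn.commit()
        return False  # already known: only last_seen moved forward
    conn.execute(
        "INSERT INTO seen_jobs VALUES (?, ?, ?, ?)",
        (company, job_id, now, now),
    )
    conn.commit()
    return True
```

Keying on the pair rather than the job ID alone avoids collisions when two companies happen to reuse the same ID.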
About 350 lines of Python total. Read it, change it, you understand it.
The script said 0 net-new every week. What now?
Either you've already seen everything (your shortlist is small and stable), or your title regex is too tight, or the companies on your list aren't hiring. Try:
python3 scan.py --reset # wipe dedup, treat everything as new
python3 scan.py --dry-run # see what's currently active
If --dry-run shows zero matches but the companies have open roles, your INCLUDE_PATTERNS is too restrictive.
A company I want isn't on Greenhouse/Lever/Ashby. Now what?
Some companies use Workday, Lever-Connect, Greenhouse Job Board v2, or a fully custom careers page. This scanner doesn't scrape HTML. Options:
- Set "no_ats": true in companies.json and check that company manually
- Add a new adapter to sources.py for whatever ATS they use (some are JSON-API-friendly, others aren't)
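As a rough idea of what a new adapter involves, the sketch below normalizes a made-up provider's response into one common job shape. Everything here is hypothetical: the provider, its URL, its response keys (`postings`, `name`, `office`, `link`), and the output field names, which may not match what sources.py actually returns.

```python
import json
import urllib.request


def parse_example_ats(payload, company):
    """Normalize a hypothetical ATS response into a common job shape."""
    jobs = []
    for posting in payload.get("postings", []):
        jobs.append(
            {
                "company": company,
                "ats_job_id": str(posting["id"]),
                "title": posting["name"].strip(),
                "location": posting.get("office") or "",
                "url": posting.get("link", ""),
            }
        )
    return jobs


def fetch_example_ats(slug, company):
    """Fetch and normalize postings from the made-up endpoint."""
    url = f"https://example.invalid/api/boards/{slug}/jobs"  # hypothetical URL
    with urllib.request.urlopen(url, timeout=15) as response:
        return parse_example_ats(json.load(response), company)
```

Keeping the parsing separate from the network call means the mapping can be checked against a saved response without hitting the provider.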
How do I know what tier to put a company in?
Tiers are just labels for grouping in the output message. Use them however helps you scan. "Tier 1 = dream companies, Tier 2 = strong fit, Tier 3 = backup" is one approach. "By industry" is another. Make them yours.
Will this get me rate-limited?
Greenhouse, Lever, and Ashby all expose public job boards via free APIs that handle modest polling fine. Once a week against ~30 companies is nothing. If you bump it to every hour or expand to hundreds of companies, you might hit limits; back off and stagger.
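If you do end up polling more aggressively, one common way to back off is exponential delay with jitter. The sketch below is an illustration only, not part of the scanner; `backoff_delays` and `fetch_with_backoff` are hypothetical helpers, and the retry counts and delays are arbitrary defaults.

```python
import random
import time


def backoff_delays(retries, base=1.0, cap=60.0, rng=random.random):
    """Yield exponentially growing delays, each scaled by random jitter."""
    for attempt in range(retries):
        yield rng() * min(cap, base * 2 ** attempt)


def fetch_with_backoff(fetch, retries=4, sleep=time.sleep):
    """Call fetch(); on a network error, wait and retry up to `retries` times."""
    delays = list(backoff_delays(retries))
    for attempt in range(retries + 1):
        try:
            return fetch()
        except OSError:  # urllib.error.URLError and HTTPError subclass OSError
            if attempt == retries:
                raise  # out of retries: surface the last error
            sleep(delays[attempt])
```

To stagger, a short `time.sleep` between companies in the main loop spreads requests out so that no single provider sees a burst.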
What if I want LLM-based fit scoring?
Out of scope for this version. The simplest path is to add an optional pass between matchers.py and state.py that calls an LLM to score each match. Pull request welcome.
- Public ATS only. Companies on private/proprietary boards (Workday, custom careers pages) require a different approach.
- Title-and-location filter only. The scanner has no idea whether you'd be a good fit β it just tells you what's posted. Reading and filtering the actual JD is still on you.
- No LLM scoring. The defaults are regex-based. That's a feature for transparency and speed, but if you want "rank by fit," this isn't that tool yet.
- Companies move ATSes. Helicone got acquired by Mintlify; Replicate by Cloudflare. Slugs change. If you start seeing 0 jobs from a company that should have plenty, run probe_ats.py to find their new slug.
MIT. Fork, hack, ship your own.
If you build something interesting on top of this, I'd love to hear about it.