Wikipedia publishes hundreds of new articles every day. Finding the ones relevant to tech — without endlessly refreshing a feed full of noise — is the problem AutoWiki solves.
AutoWiki is a zero-config Node.js bot that polls Wikipedia's recentchanges API every 5 minutes and surfaces newly created articles that match five tech domains: AI & Machine Learning, Programming & Software, Hardware & Systems, Web & Cloud, and Data Science. Articles with no tech signal are silently skipped. Matching ones are logged to stdout with a stable permanent link that survives Wikipedia page renames.
git clone https://github.com/johnlester-0369/AutoWiki.git
cd AutoWiki
npm install
npm start

That's it: no API keys, no environment variables, no configuration files. AutoWiki connects to Wikipedia's public API immediately and begins polling.
What you'll see:
⏱ Checking Wikipedia...
🆕 New Tech Article Detected:
Title: Transformer architecture in NLP
Categories: AI & Machine Learning, Programming & Software
URL: https://en.wikipedia.org/?curid=82767441
Time: 2026-03-24T05:12:00Z
--------------------------
⏱ Checking Wikipedia...
Articles that match no tech keywords produce no output — the bot runs silently until something relevant appears.
Tech domain classification runs against the article title at the keyword level. A single article can match multiple domains simultaneously.
| Domain | Example Keywords |
|---|---|
| AI & Machine Learning | artificial intelligence, llm, neural network, transformer, nlp |
| Programming & Software | algorithm, open source, devops, git, computer science |
| Hardware & Systems | gpu, semiconductor, quantum computing, robotics, fpga |
| Web & Cloud | kubernetes, blockchain, cybersecurity, docker, microservices |
| Data Science | data science, big data, pandas, tensorflow, pytorch |
The full keyword lists live in src/config/categories.js. Adding a new domain or keyword requires editing that file only — no detection logic changes.
index.js
└── cron (*/5 * * * *)
    └── automator.checkWikipedia()
        ├── wikiService.fetchNewPages() → Wikipedia recentchanges API
        │     rctype=new, rcnamespace=0 (main articles only)
        ├── [deduplicate] page.timestamp ≤ lastTimestamp → skip
        ├── urlHelper.toWikipediaUrl(pageid) → https://en.wikipedia.org/?curid=<id>
        ├── categoryDetector.detectCategory(title) → [...matched domain names]
        └── [filter] matchedCategories.length === 0 → skip silently
Fetch: wikiService.fetchNewPages() asks the Wikipedia API for up to 10 newly created main-namespace articles. Because rcnamespace=0 is applied at the API layer, Talk, User, and Template pages never consume any of the 10 result slots.
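The request described above can be sketched as follows. This is a minimal illustration, not the project's wikiService.js: the parameter names follow the public MediaWiki Action API, while `buildRecentChangesUrl`, `parseNewPages`, and the choice of `rcprop` fields are assumptions made for the example. The real service sends the request with axios; the sketch only builds the URL and parses a response body, so it runs offline.

```javascript
// Sketch only: how wikiService.js assembles its request is an assumption.
const WIKI_API = "https://en.wikipedia.org/w/api.php";

// Build a recentchanges query for newly created main-namespace pages.
function buildRecentChangesUrl(limit = 10) {
  const params = new URLSearchParams({
    action: "query",
    list: "recentchanges",
    rctype: "new",                 // page creations only
    rcnamespace: "0",              // main (article) namespace only
    rclimit: String(limit),        // at most `limit` results per poll
    rcprop: "title|ids|timestamp", // assumed field selection
    format: "json",
  });
  return `${WIKI_API}?${params.toString()}`;
}

// Keep only the fields the rest of the pipeline needs.
// A missing or malformed body yields an empty array.
function parseNewPages(body) {
  const changes = body?.query?.recentchanges ?? [];
  return changes.map(({ title, pageid, timestamp }) => ({ title, pageid, timestamp }));
}
```

Filtering by namespace in the query itself, rather than after the response arrives, is what keeps non-article pages from using up the result limit.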
Deduplicate: lastTimestamp is a module-scoped watermark. Any article with a timestamp at or before the watermark was already processed in a prior tick and is skipped without re-evaluating keywords. After every run, the watermark advances to the newest article's timestamp.
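A minimal sketch of the watermark idea, assuming timestamps arrive as ISO 8601 UTC strings in one fixed format (as in the sample output), so plain string comparison orders them correctly. The function name `dedupe` and its return shape are hypothetical; automator.js keeps the watermark in a module-level variable rather than passing it around.

```javascript
// Hypothetical helper: returns the unseen pages plus the advanced watermark.
// `lastTimestamp` is null before the first run.
function dedupe(pages, lastTimestamp) {
  // Keep only pages strictly newer than the watermark.
  const fresh = pages.filter(
    (page) => lastTimestamp === null || page.timestamp > lastTimestamp
  );

  // Advance the watermark to the newest timestamp seen in this batch.
  const watermark = pages.reduce(
    (newest, page) =>
      newest === null || page.timestamp > newest ? page.timestamp : newest,
    lastTimestamp
  );

  return { fresh, watermark };
}

// Example: the second tick sees one old page and one new page.
const tick1 = dedupe([{ title: "A", timestamp: "2026-03-24T05:10:00Z" }], null);
const tick2 = dedupe(
  [
    { title: "A", timestamp: "2026-03-24T05:10:00Z" },
    { title: "B", timestamp: "2026-03-24T05:12:00Z" },
  ],
  tick1.watermark
);
// tick2.fresh contains only "B"; tick2.watermark is "2026-03-24T05:12:00Z"
```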
Classify: categoryDetector.detectCategory(title) lowercases the title and checks it against every keyword list in TECH_CATEGORIES. All matching domain names are returned. The check is a plain String.prototype.includes() call, which is substring matching by design, so "tech" catches "technical", "technology", and "technician" without requiring exhaustive enumeration.
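The classification step can be sketched as below. The keyword lists are a small subset taken from the table earlier in this README, not the full contents of src/config/categories.js, and the function body is an assumption about how such a detector could be written rather than a copy of categoryDetector.js.

```javascript
// Illustrative subset; the real lists live in src/config/categories.js.
const TECH_CATEGORIES = {
  "AI & Machine Learning": ["artificial intelligence", "llm", "neural network", "transformer", "nlp"],
  "Programming & Software": ["algorithm", "open source", "devops", "git", "computer science"],
};

// Return the names of every domain with at least one keyword in the title.
function detectCategory(title, categories = TECH_CATEGORIES) {
  const lowered = title.toLowerCase();
  return Object.entries(categories)
    .filter(([, keywords]) => keywords.some((keyword) => lowered.includes(keyword)))
    .map(([domain]) => domain);
}

const aiMatch = detectCategory("Transformer architecture in NLP"); // matches "AI & Machine Learning"
const substringMatch = detectCategory("Digital audio");            // matches via "git" inside "digital"
const noMatch = detectCategory("History of pottery");              // empty array
```

The second call shows the trade-off of substring matching: a short keyword such as "git" also matches inside unrelated words, so short keywords are worth choosing with care.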
Report: articles with at least one matched category are logged to stdout. The permanent link uses ?curid=<pageid> rather than a title-based URL, because curid links keep working after a Wikipedia page is renamed.
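A sketch of the reporting step, assuming the log layout shown in the sample output above. `toWikipediaUrl` mirrors the helper named in the architecture diagram; `formatReport` is a hypothetical name introduced for this example, and the real automator.js may build its output differently.

```javascript
// Build a rename-proof link from the numeric page ID.
function toWikipediaUrl(pageid) {
  return `https://en.wikipedia.org/?curid=${pageid}`;
}

// Hypothetical formatter reproducing the sample log layout.
function formatReport({ title, pageid, timestamp }, categories) {
  return [
    "🆕 New Tech Article Detected:",
    `Title: ${title}`,
    `Categories: ${categories.join(", ")}`,
    `URL: ${toWikipediaUrl(pageid)}`,
    `Time: ${timestamp}`,
    "--------------------------",
  ].join("\n");
}

const report = formatReport(
  { title: "Transformer architecture in NLP", pageid: 82767441, timestamp: "2026-03-24T05:12:00Z" },
  ["AI & Machine Learning", "Programming & Software"]
);
console.log(report);
```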
Polling interval: a 5-minute interval catches new articles promptly without placing unnecessary load on Wikipedia's public API.
Tech domains and their keyword lists live entirely in src/config/categories.js:
export const TECH_CATEGORIES = {
"AI & Machine Learning": ["artificial intelligence", "llm", "neural network", ...],
"Programming & Software": ["algorithm", "open source", "devops", ...],
"Hardware & Systems": ["gpu", "semiconductor", "robotics", ...],
"Web & Cloud": ["kubernetes", "blockchain", "cybersecurity", ...],
"Data Science": ["pandas", "tensorflow", "big data", ...]
};

To add a keyword, append it to the relevant array. No other file needs to change.
To add a new domain — add a new key with its keyword array. The detection loop in categoryDetector.js iterates Object.entries(TECH_CATEGORIES) dynamically, so the new domain is picked up automatically.
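As an illustration of adding a domain, the sketch below adds a made-up "Game Development" entry. Both the domain and its keywords are hypothetical and not part of the project, and the loop is a stand-in for the one in categoryDetector.js. In the real file the object is exported with `export const`; the export is left off here so the snippet runs on its own.

```javascript
const TECH_CATEGORIES = {
  "Data Science": ["pandas", "tensorflow", "big data"],
  // Hypothetical new domain: adding this key is the only change required.
  "Game Development": ["game engine", "shader"],
};

// Stand-in for the detection loop: it iterates whatever keys exist,
// so it never needs to know the domain names in advance.
function detectCategory(title) {
  const lowered = title.toLowerCase();
  const matched = [];
  for (const [domain, keywords] of Object.entries(TECH_CATEGORIES)) {
    if (keywords.some((keyword) => lowered.includes(keyword))) {
      matched.push(domain);
    }
  }
  return matched;
}

const matched = detectCategory("Shader compilation pipeline"); // picks up the new domain
```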
To change the polling interval — edit the cron expression in index.js line 6. The expression */5 * * * * means "every 5 minutes." crontab.guru is a useful reference for building alternative schedules.
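The scheduling pattern can be sketched as below. `startPolling` is a hypothetical wrapper, not a function in index.js, and the scheduler is passed in as an argument so the sketch runs without node-cron installed. In the project, that argument would be node-cron's `cron.schedule(expression, task)`.

```javascript
// Hypothetical wrapper: register a recurring task, then run it once right away.
// `schedule` is any function with the shape (expression, task) => void.
function startPolling(schedule, task, expression = "*/5 * * * *") {
  schedule(expression, task); // recurring runs on the cron expression
  task();                     // first run fires immediately
}

// Demonstration with a stand-in scheduler that only records what it was given.
const registered = [];
let runs = 0;
startPolling(
  (expression, task) => registered.push({ expression, task }),
  () => { runs += 1; },
  "*/15 * * * *" // every 15 minutes instead of every 5
);
```

With node-cron, the first argument could be written as `(expression, task) => cron.schedule(expression, task)`.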
AutoWiki/
├── index.js                     # Entry point: schedules cron and fires the first run immediately
├── package.json
├── vitest.config.js             # Test runner config, scoped to tests/
│
├── src/
│   ├── config/
│   │   └── categories.js        # Keyword lists per tech domain (data-only, edit here to add domains)
│   ├── core/
│   │   └── automator.js         # Core loop: fetch → deduplicate → classify → report
│   ├── services/
│   │   └── wikiService.js       # Wikipedia API client (axios)
│   └── utils/
│       ├── categoryDetector.js  # Keyword scanner, returns all matched domain names
│       └── urlHelper.js         # Converts pageid → stable curid URL
│
└── tests/
    ├── automator.test.js
    ├── categoryDetector.test.js
    ├── urlHelper.test.js
    └── wikiService.test.js
Watch mode (re-runs on file save — use during development):
npm test

Single run (CI / pre-commit):

npm run test:run

All external dependencies (axios, wikiService, categoryDetector) are mocked via Vitest's vi.mock(). Tests run fully offline, with no network connection required.
| Test file | Coverage |
|---|---|
| automator.test.js | Deduplication watermark, category-miss skipping, timestamp advancement, console output |
| categoryDetector.test.js | Keyword matching, multi-category detection, case-insensitivity, substring matching behavior |
| urlHelper.test.js | curid URL construction, large pageids, uniqueness |
| wikiService.test.js | Successful API response parsing, rcnamespace=0 param, error fallback to empty array |
- Node.js v18 or later (ESM module support, i.e. import/export, is required)
- npm v9 or later
| Package | Version | Purpose |
|---|---|---|
| axios | ^1.13.6 | HTTP client for Wikipedia API requests |
| node-cron | ^4.2.1 | Cron scheduler for 5-minute polling interval |
| vitest | ^4.1.1 | Test runner (dev only) |
ISC