AutoWiki

Wikipedia publishes hundreds of new articles every day. Finding the ones relevant to tech — without endlessly refreshing a feed full of noise — is the problem AutoWiki solves.

License: ISC | Node.js | Tested with Vitest

AutoWiki is a zero-config Node.js bot that polls Wikipedia's recentchanges API every 5 minutes and surfaces newly created articles that match at least one of five tech domains: AI & Machine Learning, Programming & Software, Hardware & Systems, Web & Cloud, and Data Science. Articles with no tech signal are silently skipped. Matching ones are logged to stdout with a stable permanent link that survives Wikipedia page renames.


Get Running in 60 Seconds

git clone https://github.com/johnlester-0369/AutoWiki.git
cd AutoWiki
npm install
npm start

That's it — no API keys, no environment variables, no configuration files. AutoWiki connects to Wikipedia's public API immediately and begins polling.

What you'll see:

⏱ Checking Wikipedia...
🆕 New Tech Article Detected:
  Title:      Transformer architecture in NLP
  Categories: AI & Machine Learning, Programming & Software
  URL:        https://en.wikipedia.org/?curid=82767441
  Time:       2026-03-24T05:12:00Z
--------------------------
⏱ Checking Wikipedia...

Articles that match no tech keywords produce no output — the bot runs silently until something relevant appears.


What AutoWiki Detects

Classification is keyword-based and runs against the article title only. A single article can match multiple domains simultaneously.

| Domain | Example Keywords |
|---|---|
| AI & Machine Learning | artificial intelligence, llm, neural network, transformer, nlp |
| Programming & Software | algorithm, open source, devops, git, computer science |
| Hardware & Systems | gpu, semiconductor, quantum computing, robotics, fpga |
| Web & Cloud | kubernetes, blockchain, cybersecurity, docker, microservices |
| Data Science | data science, big data, pandas, tensorflow, pytorch |

The full keyword lists live in src/config/categories.js. Adding a new domain or keyword requires editing that file only — no detection logic changes.


How It Works

index.js
  └── cron (*/5 * * * *)
        └── automator.checkWikipedia()
              ├── wikiService.fetchNewPages()       → Wikipedia recentchanges API
              │                                        rctype=new, rcnamespace=0 (main articles only)
              ├── [deduplicate] page.timestamp ≤ lastTimestamp → skip
              ├── urlHelper.toWikipediaUrl(pageid)  → https://en.wikipedia.org/?curid=<id>
              ├── categoryDetector.detectCategory(title) → [...matched domain names]
              └── [filter] matchedCategories.length === 0 → skip silently

Fetch: wikiService.fetchNewPages() calls the Wikipedia API for up to 10 newly created main-namespace articles. The rcnamespace=0 parameter is applied at the API layer, so Talk, User, and Template pages never consume slots from the 10-result quota.
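
The request described above can be sketched as follows. This is a minimal illustration, not the project's actual wikiService.js: the names buildRecentChangesParams and httpGet, the rcprop value, and the exact response handling are assumptions. The parameter names (list=recentchanges, rctype, rcnamespace, rclimit) come from the MediaWiki Action API, and httpGet stands in for axios.get so the sketch can run without a network connection.

```javascript
// Hypothetical sketch; the real wikiService.js may be structured differently.
const API_URL = "https://en.wikipedia.org/w/api.php";

// Query parameters for "newly created pages in the main namespace".
function buildRecentChangesParams(limit = 10) {
  return {
    action: "query",
    list: "recentchanges",
    rctype: "new",        // page creations only
    rcnamespace: 0,       // main (article) namespace only
    rclimit: limit,       // at most `limit` results per poll
    rcprop: "title|ids|timestamp", // assumed: fields needed downstream
    format: "json",
  };
}

// `httpGet` is an injected function with the same shape as axios.get(url, config).
async function fetchNewPages(httpGet) {
  try {
    const response = await httpGet(API_URL, { params: buildRecentChangesParams() });
    return response.data?.query?.recentchanges ?? [];
  } catch (error) {
    // Fall back to an empty list so one failed poll does not stop the bot.
    return [];
  }
}
```

Passing axios.get as httpGet would issue the real request. Returning an empty array on failure mirrors the "error fallback to empty array" behavior listed for wikiService.test.js.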

Deduplicate: lastTimestamp is a module-scoped watermark. Any article with a timestamp at or before the watermark was already processed in a prior tick and is skipped without re-evaluating keywords. The watermark advances to the newest article's timestamp after every run.
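
The watermark check can be illustrated with a small pure function. This is a sketch under assumptions: the function name dedupe and its return shape are hypothetical, and the real automator.js keeps lastTimestamp in module scope rather than passing it in and out.

```javascript
// Hypothetical pure version of the watermark logic described above.
function dedupe(pages, lastTimestamp) {
  const cutoff = lastTimestamp === null ? -Infinity : Date.parse(lastTimestamp);

  // Keep only pages strictly newer than the watermark.
  const unseen = pages.filter((page) => Date.parse(page.timestamp) > cutoff);

  // Advance the watermark to the newest timestamp seen in this batch.
  let newest = lastTimestamp;
  let newestMs = cutoff;
  for (const page of pages) {
    const ms = Date.parse(page.timestamp);
    if (ms > newestMs) {
      newestMs = ms;
      newest = page.timestamp;
    }
  }
  return { unseen, lastTimestamp: newest };
}

// Two consecutive polls whose results overlap on article "B".
const tick1 = dedupe(
  [
    { title: "A", timestamp: "2026-03-24T05:10:00Z" },
    { title: "B", timestamp: "2026-03-24T05:12:00Z" },
  ],
  null
);
const tick2 = dedupe(
  [
    { title: "B", timestamp: "2026-03-24T05:12:00Z" },
    { title: "C", timestamp: "2026-03-24T05:14:00Z" },
  ],
  tick1.lastTimestamp
);
console.log(tick2.unseen.map((page) => page.title)); // only "C" is new
```

One consequence of an "at or before" comparison is that an article sharing the watermark's exact timestamp, but first returned in a later poll, would also be skipped.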

Classify: categoryDetector.detectCategory(title) lowercases the title and checks it against every keyword list in TECH_CATEGORIES, returning all matching domain names. The check is a simple String.includes() call. Substring matching is by design, so "tech" catches "technical", "technology", and "technician" without requiring exhaustive enumeration.
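
A minimal sketch of this classification step, assuming the shape of TECH_CATEGORIES shown in this README. The keyword lists here are abbreviated to the example keywords listed earlier, so results will differ from the full lists in src/config/categories.js, and the function body is an illustration rather than the project's actual code.

```javascript
// Abbreviated keyword lists taken from the examples in this README.
const TECH_CATEGORIES = {
  "AI & Machine Learning": ["artificial intelligence", "llm", "neural network", "transformer", "nlp"],
  "Programming & Software": ["algorithm", "open source", "devops", "git", "computer science"],
};

// Returns the name of every domain with at least one keyword found in the title.
function detectCategory(title) {
  const lowered = title.toLowerCase();
  return Object.entries(TECH_CATEGORIES)
    .filter(([, keywords]) => keywords.some((keyword) => lowered.includes(keyword)))
    .map(([name]) => name);
}

console.log(detectCategory("Transformer architecture in NLP")); // [ 'AI & Machine Learning' ]
console.log(detectCategory("Digital Revolution"));              // [ 'Programming & Software' ]
console.log(detectCategory("List of rivers of Peru"));          // []
```

The second call shows the trade-off of substring matching: "digital" contains "git", so short keywords can produce false positives as well as the intended broad matches.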

Report: Articles with at least one matched category are logged to stdout. The permanent link uses ?curid=<pageid> rather than a title-based URL because curid links survive Wikipedia page renames, which makes them a stable reference.
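
The reporting step might look like the sketch below. toWikipediaUrl follows the URL pattern documented above; formatReport is a hypothetical helper whose layout is copied from the sample output earlier in this README, and the real automator.js may build its log lines differently.

```javascript
// Builds a link keyed on the page ID, which does not change when a page is renamed.
function toWikipediaUrl(pageid) {
  return `https://en.wikipedia.org/?curid=${pageid}`;
}

// Hypothetical formatter reproducing the sample output shown earlier.
function formatReport(page, categories) {
  return [
    "🆕 New Tech Article Detected:",
    `  Title:      ${page.title}`,
    `  Categories: ${categories.join(", ")}`,
    `  URL:        ${toWikipediaUrl(page.pageid)}`,
    `  Time:       ${page.timestamp}`,
    "--------------------------",
  ].join("\n");
}

console.log(
  formatReport(
    { title: "Transformer architecture in NLP", pageid: 82767441, timestamp: "2026-03-24T05:12:00Z" },
    ["AI & Machine Learning", "Programming & Software"]
  )
);
```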

Polling interval: 5 minutes matches Wikipedia's recentchanges update frequency closely enough to catch new articles promptly without placing unnecessary load on the public API.


Configuration

Tech domains and their keyword lists live entirely in src/config/categories.js:

export const TECH_CATEGORIES = {
    "AI & Machine Learning": ["artificial intelligence", "llm", "neural network", ...],
    "Programming & Software": ["algorithm", "open source", "devops", ...],
    "Hardware & Systems":     ["gpu", "semiconductor", "robotics", ...],
    "Web & Cloud":            ["kubernetes", "blockchain", "cybersecurity", ...],
    "Data Science":           ["pandas", "tensorflow", "big data", ...]
};

To add a keyword — append it to the relevant array. No other file needs to change.

To add a new domain — add a new key with its keyword array. The detection loop in categoryDetector.js iterates Object.entries(TECH_CATEGORIES) dynamically, so the new domain is picked up automatically.

To change the polling interval — edit the cron expression in index.js line 6. The expression */5 * * * * means "every 5 minutes." crontab.guru is a useful reference for building alternative schedules.
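
As a rough illustration of how such expressions are formed, the hypothetical helper below builds an "every N minutes" schedule. It is not part of AutoWiki; the project hard-codes the expression in index.js.

```javascript
// Hypothetical helper: builds a five-field cron expression that fires every n minutes.
function everyMinutes(n) {
  if (!Number.isInteger(n) || n < 1 || n > 59) {
    throw new RangeError("interval must be an integer from 1 to 59");
  }
  return `*/${n} * * * *`;
}

console.log(everyMinutes(5));  // */5 * * * *
console.log(everyMinutes(15)); // */15 * * * *

// In index.js the resulting string would replace the existing expression,
// for example as the first argument to node-cron's cron.schedule(expression, task).
```

Step values restart at the top of each hour, so an interval that does not divide 60 evenly (for example */7) produces a shorter gap at the hour boundary.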


Project Structure

AutoWiki/
├── index.js                        # Entry point — schedules cron + fires first run immediately
├── package.json
├── vitest.config.js                # Test runner config — scoped to tests/
│
├── src/
│   ├── config/
│   │   └── categories.js           # Keyword lists per tech domain (data-only, edit here to add domains)
│   ├── core/
│   │   └── automator.js            # Core loop: fetch → deduplicate → classify → report
│   ├── services/
│   │   └── wikiService.js          # Wikipedia API client (axios)
│   └── utils/
│       ├── categoryDetector.js     # Keyword scanner — returns all matched domain names
│       └── urlHelper.js            # Converts pageid → stable curid URL
│
└── tests/
    ├── automator.test.js
    ├── categoryDetector.test.js
    ├── urlHelper.test.js
    └── wikiService.test.js

Testing

Watch mode (re-runs on file save — use during development):

npm test

Single run (CI / pre-commit):

npm run test:run

Collaborating modules (axios, wikiService, categoryDetector) are mocked via Vitest's vi.mock(), so tests run fully offline with no network connection required.

| Test file | Coverage |
|---|---|
| automator.test.js | Deduplication watermark, category-miss skipping, timestamp advancement, console output |
| categoryDetector.test.js | Keyword matching, multi-category detection, case-insensitivity, substring matching behavior |
| urlHelper.test.js | curid URL construction, large pageids, uniqueness |
| wikiService.test.js | Successful API response parsing, rcnamespace=0 param, error fallback to empty array |

Prerequisites

  • Node.js v18 or later (ES module support, import/export, is required)
  • npm v9 or later

Dependencies

| Package | Version | Purpose |
|---|---|---|
| axios | ^1.13.6 | HTTP client for Wikipedia API requests |
| node-cron | ^4.2.1 | Cron scheduler for 5-minute polling interval |
| vitest | ^4.1.1 | Test runner (dev only) |

License

ISC
