A standalone, lightweight web scraping API to extract markdown content and categorized links from any URL.
Inspired by deepcrawl.
Deployable anywhere: Cloud Run, Vercel Serverless, Cloudflare Workers, or bare Node.js.
# Install dependencies
npm install
# Set your API key
export ONCRAWL_API_KEY=your-secret-key
# Start the server
npm run devThe server starts on http://localhost:3000.
curl http://localhost:3000/Extract markdown content and metadata from a URL.
GET (returns markdown text):
curl -H "Authorization: Bearer your-api-key" \
"http://localhost:3000/read?url=https://example.com"POST (returns JSON with options):
curl -X POST -H "Authorization: Bearer your-api-key" \
-H "Content-Type: application/json" \
-d '{"url":"https://example.com","metadata":true}' \
http://localhost:3000/readPOST body options:
| Field | Type | Default | Description |
|---|---|---|---|
url |
string | required | URL to scrape |
markdown |
boolean | true | Include markdown content |
cleanedHtml |
boolean | false | Include cleaned HTML |
rawHtml |
boolean | false | Include raw HTML |
metadata |
boolean | true | Include page metadata |
Extract and categorize links from a URL.
GET (returns internal links):
curl -H "Authorization: Bearer your-api-key" \
"http://localhost:3000/links?url=https://example.com"POST (returns JSON with options):
curl -X POST -H "Authorization: Bearer your-api-key" \
-H "Content-Type: application/json" \
-d '{"url":"https://example.com","includeExternal":true,"includeMedia":true}' \
http://localhost:3000/linksPOST body options:
| Field | Type | Default | Description |
|---|---|---|---|
url |
string | required | URL to extract links from |
includeExternal |
boolean | false | Include external links |
includeMedia |
boolean | false | Include media (images, videos, documents) |
metadata |
boolean | true | Include page metadata |
| Variable | Description | Required |
|---|---|---|
ONCRAWL_API_KEY |
API key for authentication | Yes |
PORT |
Server port (default: 3000) | No |
The fastest way to deploy is using the Deploy to Vercel button. It will automatically prompt you for the required ONCRAWL_API_KEY.
Oncrawl is built with Hono and is automatically detected as a Hono project by Vercel.
vercelgcloud run deploy oncrawl \
--source . \
--set-env-vars ONCRAWL_API_KEY=your-keyMIT