The Memory of Your Internet — Archive Everything You Browse.
A self-hosted personal web archiving system that automatically captures and preserves web pages you visit in Chrome — HTML, CSS, JavaScript, images, and all. When the original page goes offline, you can still browse your archived copy with styles and layout intact.
```
Chrome + Tampermonkey ──HTTP POST──▶ Go Server ──▶ PostgreSQL (metadata)
  (auto-capture on                       │       + File System (assets)
   page load)                            │
                                         ▼
                                      Web UI ──▶ Browse / Search / Replay
```
- A Tampermonkey userscript runs in your browser, automatically capturing the full DOM and resources once the page finishes loading. If significant DOM changes occur afterward, it submits one additional update.
- The Go server receives the snapshot, downloads any cross-origin resources the browser couldn't fetch, deduplicates everything by content hash, and stores it locally.
- A built-in Web UI lets you list, search, and replay any archived page — fully offline, no external dependencies.
- High-fidelity replay — CSSOM serialization, computed styles inlining, and anti-refresh protection reproduce pages as close to the original as possible
- Full-page capture — HTML, CSS, JS, images, fonts; resource URLs are rewritten to local paths
- Cross-origin resource recovery — server-side extraction and download of resources blocked by CORS
- Content-hash deduplication — identical resources shared across pages are stored only once (SHA-256)
- Version history — same URL archived multiple times, distinguished by timestamp
- Timeline view — browse all snapshots of a URL on a visual timeline (like web.archive.org), with prev/next navigation between snapshots
- Smart dedup — session-level + server-level dedup prevents redundant captures; content-hash comparison skips unchanged pages
- Dynamic content support — captures the live DOM state; MutationObserver triggers one auto-update if significant changes occur after initial capture
- SPA-aware — detects SPA navigation, resets capture state per route
- Anti-refresh protection — archived pages are frozen: timers, WebSockets, and navigation APIs are neutralized
- Web UI — responsive interface to browse, full-text search (page content, URL, and title), filter by date range and domain, and replay archived pages
- RESTful API — programmatic access to all archiving and query operations
- PostgreSQL 14+
- Chrome or Firefox + Tampermonkey extension (v5.3+)
The fastest way to get started. Docker Compose will set up both the server and PostgreSQL automatically.
```shell
# Clone the repository
git clone https://github.com/icodeface/wayback-archiver.git
cd wayback-archiver

# Start all services
docker compose up -d

# View logs
docker compose logs -f wayback
```

The server will be available at http://localhost:8080. Skip to step 4 (Install the Userscript).
For detailed Docker configuration and deployment options, see docs/DOCKER.md.
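Once the containers are up, a quick way to confirm the server is healthy is to hit the version endpoint listed in the API section below (exact response fields may vary by release):

```shell
# Check that the server is reachable
curl http://localhost:8080/api/version

# List archived pages (empty on a fresh install)
curl http://localhost:8080/api/pages
```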
Download the latest release from the Releases page:
- macOS: `wayback-server-darwin-amd64.tar.gz` (Intel) or `wayback-server-darwin-arm64.tar.gz` (Apple Silicon)
- Linux: `wayback-server-linux-amd64.tar.gz` or `wayback-server-linux-arm64.tar.gz`
- Windows: `wayback-server-windows-amd64.zip`
- Userscript: `wayback-userscript.js`
Extract the archive:
```shell
# macOS/Linux
tar -xzf wayback-server-*.tar.gz

# Windows: extract the .zip file
```

Building from source? See docs/BUILD.md for manual compilation instructions.
```shell
# PostgreSQL uses your current system username as the database user by default.
# If your system username is alice, the command below is equivalent to: createdb -U alice wayback
createdb wayback

# Run the schema (init_db.sql is included in the release archive)
psql wayback < init_db.sql

# Optional: create a .env file for custom configuration
# (see the Configuration section below for available options)
./wayback-server
```

The server starts at http://localhost:8080 by default.
If you need a proxy for downloading external resources:
```shell
export http_proxy=http://127.0.0.1:7897
export https_proxy=http://127.0.0.1:7897
./wayback-server
```

- Download `wayback-userscript.js` from the Releases page
- Open the Tampermonkey dashboard in your browser
- Click "Create a new script"
- Paste the contents of `wayback-userscript.js`
- Save and enable
Chrome users: Right-click the Tampermonkey icon → Manage extension, then enable the "Allow user scripts" toggle. Firefox does not require this step.
That's it. Pages are automatically archived as soon as they load. Open http://localhost:8080 to browse your archive.
Puppeteer Integration: For automated archiving, see docs/PUPPETEER.md.
Environment variables (or .env file in the project root):
The server automatically loads .env from the working directory if it exists. You can also set environment variables directly.
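For example, a minimal `.env` that changes the port and moves storage to an external drive might look like this (paths and values are illustrative; all variables are listed in the table below):

```shell
# .env — example local configuration (illustrative values)
SERVER_PORT=9090
DATA_DIR=/mnt/archive/wayback-data
LOG_DIR=/mnt/archive/wayback-data/logs
DB_NAME=wayback
```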
| Variable | Default | Description |
|---|---|---|
| `DB_HOST` | `localhost` | PostgreSQL host |
| `DB_PORT` | `5432` | PostgreSQL port |
| `DB_USER` | `postgres` | Database user (PostgreSQL defaults to your system username, so it is usually best to leave this unset) |
| `DB_PASSWORD` | (empty) | Database password |
| `DB_NAME` | `wayback` | Database name |
| `DB_SSLMODE` | `disable` | SSL mode |
| `SERVER_HOST` | `127.0.0.1` | Server bind address (`0.0.0.0` = all interfaces, `127.0.0.1` = localhost only) |
| `SERVER_PORT` | `8080` | HTTP server port |
| `ALLOWED_ORIGINS` | `http://localhost:8080,http://127.0.0.1:8080,null` | CORS allowed origins (comma-separated). For remote deployment, add your domain: `https://your-domain.com,null` |
| `DATA_DIR` | `./data` | Storage directory for HTML and resources |
| `LOG_DIR` | `./data/logs` | Log file directory |
| `AUTH_PASSWORD` | (empty) | HTTP Basic Auth password (disabled when empty; username: `wayback`). REQUIRED for remote deployment |
| `COMPRESSION_LEVEL` | `-1` | Compression level: `1` (fastest) to `9` (best), `-1` (default/balanced). Response compression is always enabled and auto-negotiated via `Accept-Encoding` |
Server-side (responses): always enabled, auto-negotiated

- Clients that send `Accept-Encoding: gzip` get compressed responses
- Clients that don't support gzip get uncompressed responses
- No configuration needed; it works automatically

Client-side (uploads): configurable in `browser/src/config.ts`

- For local deployment (default): keep `ENABLE_COMPRESSION: false`
  - Localhost transfer is already fast
  - No CPU overhead from compression
- For remote deployment: set `ENABLE_COMPRESSION: true`
  - 95%+ reduction for uploads (large HTML snapshots)
  - Rebuild the userscript: `cd browser && npm run build`
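You can observe the response-side negotiation from the command line against a running local server; the first request should report a gzip `Content-Encoding` header when the body is large enough to be worth compressing:

```shell
# Request gzip explicitly; --compressed also makes curl decode the body
curl --compressed -s -D - -o /dev/null http://localhost:8080/ | grep -i content-encoding

# Without Accept-Encoding, the response comes back uncompressed
curl -s -D - -o /dev/null http://localhost:8080/ | grep -i content-encoding
```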
For deploying to a remote server, see REMOTE_DEPLOYMENT.md for detailed instructions.
Quick setup:
```shell
# .env configuration
ALLOWED_ORIGINS=https://your-domain.com,null
AUTH_PASSWORD=your_secure_password
SERVER_HOST=0.0.0.0
```

```
# Browser config.ts
SERVER_URL: 'https://your-domain.com/api/archive'
AUTH_PASSWORD: 'your_secure_password'
ENABLE_COMPRESSION: true  # Enable upload compression for remote deployment
```

Security Notes:

- Always use HTTPS for remote deployment
- Set a strong `AUTH_PASSWORD`
- Limit `ALLOWED_ORIGINS` to trusted domains only
- Both CORS and Basic Auth are required for security (defense in depth)

Performance Notes:

- Enable `ENABLE_COMPRESSION` in the browser config for remote deployment
- Reduces upload bandwidth by 95%+ (especially for large HTML snapshots)
- Response compression is automatic (no configuration needed)
- Minimal CPU overhead, significant network savings
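With `AUTH_PASSWORD` set, every request needs HTTP Basic Auth; the username is fixed to `wayback` per the configuration table. A quick check from the command line (the hostname and password below are placeholders):

```shell
# Unauthenticated requests should be rejected (expect a 401 status)
curl -s -o /dev/null -w "%{http_code}\n" https://your-domain.com/api/pages

# Authenticated request (the username is always "wayback")
curl -u wayback:your_secure_password https://your-domain.com/api/pages
```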
| Method | Endpoint | Description |
|---|---|---|
| `GET` | `/api/version` | Server version and build info |
| `POST` | `/api/archive` | Create a page archive |
| `PUT` | `/api/archive/:id` | Update an existing archive snapshot |
| `GET` | `/api/pages` | List all archived pages |
| `GET` | `/api/pages/:id` | Get page details |
| `GET` | `/api/pages/:id/content` | Get page content as Markdown (for AI/LLM consumption) |
| `GET` | `/api/search?q=keyword` | Search pages by URL or title |
| `GET` | `/api/pages/timeline?url=URL` | Get all snapshots of a URL (timeline view) |
| `GET` | `/api/logs` | List available log files |
| `GET` | `/api/logs/latest` | Get latest log file content (supports `?tail=N&grep=keyword`) |
| `GET` | `/api/logs/:filename` | Get log file content (supports `?tail=N&grep=keyword`) |
| `GET` | `/view/:id` | Replay an archived page |
| `GET` | `/timeline?url=URL` | Visual timeline page for a URL |
| `GET` | `/logs` | Server logs viewer |
POST `/api/archive` returns `{ status, page_id, action }`, where `action` is `created` or `unchanged` (content identical; only `last_visited` is updated).
PUT `/api/archive/:id` accepts the same body as POST. It replaces the snapshot content: the old HTML and resource associations are removed and resources are re-processed. Returns `{ status, page_id, action }`, where `action` is `updated` or `unchanged`.
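A few illustrative calls against a local instance (add `-u wayback:<password>` if `AUTH_PASSWORD` is set; the page id `42` is a placeholder, and response shapes beyond the documented `{ status, page_id, action }` are not guaranteed here):

```shell
# Full-text search across archived pages
curl "http://localhost:8080/api/search?q=golang"

# All snapshots of one URL, as used by the timeline view
curl "http://localhost:8080/api/pages/timeline?url=https://example.com/"

# Last 100 log lines containing "ERROR"
curl "http://localhost:8080/api/logs/latest?tail=100&grep=ERROR"

# Page content as Markdown, e.g. for an LLM pipeline
curl http://localhost:8080/api/pages/42/content
```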
```
wayback-archiver/
├── Makefile                  # Build, test, cross-compile
├── bin/                      # Build output (server binary + userscript)
├── browser/                  # Tampermonkey userscript (TypeScript)
│   ├── src/
│   │   ├── main.ts           # Entry point & orchestration
│   │   ├── config.ts         # Constants
│   │   ├── types.ts          # TypeScript interfaces
│   │   ├── page-filter.ts    # URL filtering logic
│   │   ├── page-freezer.ts   # Freeze page runtime state
│   │   ├── dom-collector.ts  # DOM serialization
│   │   └── archiver.ts       # Server communication
│   ├── dist/                 # Built userscript
│   └── build.js              # Bundle script
│
├── server/                   # Go backend
│   ├── cmd/server/main.go    # Entry point
│   ├── internal/
│   │   ├── api/              # HTTP handlers (modular)
│   │   ├── config/           # Environment-based config
│   │   ├── database/         # PostgreSQL operations
│   │   ├── logging/          # File-based logging with rotation
│   │   ├── models/           # Data models
│   │   └── storage/          # File storage & dedup
│   └── web/                  # Web UI static files
│
├── .env.example              # Configuration template
└── tests/                    # Test suites
    ├── browser/              # Browser-side tests
    └── server/               # Server-side & E2E tests
```
```
data/
├── html/                     # HTML snapshots, organized by date
│   └── 2026/03/09/
│       └── <timestamp>_<hash>.html
├── logs/                     # Server logs, rotated by size (10MB) and date (7-day retention)
│   ├── wayback-2026-03-12.001.log
│   └── wayback-2026-03-12.002.log
└── resources/                # Deduplicated static resources
    └── ab/cd/
        └── <sha256>.css
```
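Judging by the tree above, resources appear to be sharded into two directory levels derived from the leading hex characters of the SHA-256 content hash. A sketch of that mapping in shell (this is an assumption about the layout, not the server's actual code):

```shell
# Compute where a deduplicated resource would land under data/resources/
# (assumes the two path levels are the first four hex characters of the hash)
hash=$(printf 'body { color: red; }' | sha256sum | cut -d' ' -f1)
echo "data/resources/${hash:0:2}/${hash:2:2}/${hash}.css"
```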
See docs/BUILD.md for build instructions, cross-compilation, and testing.
This project includes an Agent skill for AI-assisted querying and exploration of your archived pages. Use it to search, analyze, and interact with your archive through natural language.
- Some cross-origin resources may still fail due to server-side 403/404 responses
- Dynamically injected scripts (loaded via JS at runtime) may not be captured
- Tracking pixels and analytics URLs with dynamic parameters are not preserved (they don't affect page rendering)
- Very large media files (video, large images) will consume significant storage
MIT


