Wayback Archiver

The Memory of Your Internet — Archive Everything You Browse.

English | 中文

A self-hosted personal web archiving system that automatically captures and preserves web pages you visit in Chrome — HTML, CSS, JavaScript, images, and all. When the original page goes offline, you can still browse your archived copy with styles and layout intact.

How It Works

Chrome + Tampermonkey ──HTTP POST──▶ Go Server ──▶ PostgreSQL (metadata)
  (auto-capture on                    │               + File System (assets)
   page load)                         │
                                      ▼
                                   Web UI ──▶ Browse / Search / Replay

A Tampermonkey userscript runs in your browser, automatically capturing the full DOM and resources once the page finishes loading. If significant DOM changes occur afterward, it submits one additional update.
The Go server receives the snapshot, downloads any cross-origin resources the browser couldn't fetch, deduplicates everything by content hash, and stores it locally.
A built-in Web UI lets you list, search, and replay any archived page — fully offline, no external dependencies.

Features

High-fidelity replay — CSSOM serialization, computed styles inlining, and anti-refresh protection reproduce pages as close to the original as possible
Full-page capture — HTML, CSS, JS, images, fonts; resource URLs are rewritten to local paths
Cross-origin resource recovery — server-side extraction and download of resources blocked by CORS
Content-hash deduplication — identical resources shared across pages are stored only once (SHA-256)
Version history — same URL archived multiple times, distinguished by timestamp
Timeline view — browse all snapshots of a URL on a visual timeline (like web.archive.org), with prev/next navigation between snapshots
Smart dedup — session-level + server-level dedup prevents redundant captures; content-hash comparison skips unchanged pages
Dynamic content support — captures the live DOM state; MutationObserver triggers one auto-update if significant changes occur after initial capture
SPA-aware — detects SPA navigation, resets capture state per route
Anti-refresh protection — archived pages are frozen: timers, WebSockets, and navigation APIs are neutralized
Web UI — responsive interface to browse, full-text search (page content, URL, and title), filter by date range and domain, and replay archived pages
RESTful API — programmatic access to all archiving and query operations

Prerequisites

PostgreSQL 14+
Chrome or Firefox + Tampermonkey extension (v5.3+)

Quick Start

Option A: Docker (Recommended)

The fastest way to get started. Docker Compose will set up both the server and PostgreSQL automatically.

# Clone the repository
git clone https://github.com/icodeface/wayback-archiver.git
cd wayback-archiver

# Start all services
docker compose up -d

# View logs
docker compose logs -f wayback

The server will be available at http://localhost:8080. Skip to step 4 (Install the Userscript).

For detailed Docker configuration and deployment options, see docs/DOCKER.md.

Option B: Pre-built Binaries

1. Download Pre-built Binaries

Download the latest release from the Releases page:

macOS: wayback-server-darwin-amd64.tar.gz (Intel) or wayback-server-darwin-arm64.tar.gz (Apple Silicon)
Linux: wayback-server-linux-amd64.tar.gz or wayback-server-linux-arm64.tar.gz
Windows: wayback-server-windows-amd64.zip
Userscript: wayback-userscript.js

Extract the archive:

# macOS/Linux
tar -xzf wayback-server-*.tar.gz

# Windows: extract the .zip file

Building from source? See docs/BUILD.md for manual compilation instructions.

2. Database Setup

# PostgreSQL 默认使用当前系统用户名作为数据库用户
# 如果你的系统用户名是 alice，以下命令等同于 createdb -U alice wayback
createdb wayback

# Run the schema (init_db.sql is included in the release archive)
psql wayback < init_db.sql

3. Start the Server

# Optional: create .env file for custom configuration
# See Configuration section below for available options

./wayback-server

The server starts at http://localhost:8080 by default.

If you need a proxy for downloading external resources:

export http_proxy=http://127.0.0.1:7897
export https_proxy=http://127.0.0.1:7897
./wayback-server

4. Install the Userscript

Download wayback-userscript.js from the Releases page
Open Tampermonkey dashboard in your browser
Click "Create a new script"
Paste the contents of wayback-userscript.js
Save and enable

Chrome users: Right-click the Tampermonkey icon → Manage extension, then enable the "Allow user scripts" toggle. Firefox does not require this step.

5. Start Browsing

That's it. Pages are automatically archived as soon as they load. Open http://localhost:8080 to browse your archive.

Puppeteer Integration: For automated archiving, see docs/PUPPETEER.md.

Configuration

Environment variables (or .env file in the project root):

The server automatically loads .env from the working directory if it exists. You can also set environment variables directly.

Variable	Default	Description
`DB_HOST`	`localhost`	PostgreSQL host
`DB_PORT`	`5432`	PostgreSQL port
`DB_USER`	`postgres`	Database user (PostgreSQL 默认使用系统用户名，建议不设置此变量)
`DB_PASSWORD`	(empty)	Database password
`DB_NAME`	`wayback`	Database name
`DB_SSLMODE`	`disable`	SSL mode
`SERVER_HOST`	`127.0.0.1`	Server bind address (`0.0.0.0` = all interfaces, `127.0.0.1` = localhost only)
`SERVER_PORT`	`8080`	HTTP server port
`ALLOWED_ORIGINS`	`http://localhost:8080,http://127.0.0.1:8080,null`	CORS allowed origins (comma-separated). For remote deployment, add your domain: `https://your-domain.com,null`
`DATA_DIR`	`./data`	Storage directory for HTML and resources
`LOG_DIR`	`./data/logs`	Log file directory
`AUTH_PASSWORD`	(empty)	HTTP Basic Auth password (disabled when empty, username: `wayback`). REQUIRED for remote deployment
`COMPRESSION_LEVEL`	`-1`	Compression level: 1 (fastest) to 9 (best), -1 (default/balanced). Response compression always enabled, auto-negotiated via Accept-Encoding

Compression Settings

Server-side (responses): Always enabled, auto-negotiated

Clients that send Accept-Encoding: gzip get compressed responses
Clients that don't support gzip get uncompressed responses
No configuration needed - works automatically

Client-side (uploads): Configurable in browser/src/config.ts

For local deployment (default): Keep ENABLE_COMPRESSION: false
- Localhost transfer is already fast
- No CPU overhead from compression
For remote deployment: Set ENABLE_COMPRESSION: true
- 95%+ reduction for uploads (large HTML snapshots)
- Rebuild userscript: cd browser && npm run build

Remote Deployment

For deploying to a remote server, see REMOTE_DEPLOYMENT.md for detailed instructions.

Quick setup:

# .env configuration
ALLOWED_ORIGINS=https://your-domain.com,null
AUTH_PASSWORD=your_secure_password
SERVER_HOST=0.0.0.0

# Browser config.ts
SERVER_URL: 'https://your-domain.com/api/archive'
AUTH_PASSWORD: 'your_secure_password'
ENABLE_COMPRESSION: true  # Enable upload compression for remote deployment

Security Notes:

Always use HTTPS for remote deployment
Set a strong AUTH_PASSWORD
Limit ALLOWED_ORIGINS to trusted domains only
Both CORS and Basic Auth are required for security (defense in depth)

Performance Notes:

Enable ENABLE_COMPRESSION in browser config for remote deployment
Reduces upload bandwidth by 95%+ (especially for large HTML snapshots)
Response compression is automatic (no configuration needed)
Minimal CPU overhead, significant network savings

API

Method	Endpoint	Description
`GET`	`/api/version`	Server version and build info
`POST`	`/api/archive`	Create a page archive
`PUT`	`/api/archive/:id`	Update an existing archive snapshot
`GET`	`/api/pages`	List all archived pages
`GET`	`/api/pages/:id`	Get page details
`GET`	`/api/pages/:id/content`	Get page content as Markdown (for AI/LLM consumption)
`GET`	`/api/search?q=keyword`	Search pages by URL or title
`GET`	`/api/pages/timeline?url=URL`	Get all snapshots of a URL (timeline view)
`GET`	`/api/logs`	List available log files
`GET`	`/api/logs/latest`	Get latest log file content (supports `?tail=N&grep=keyword`)
`GET`	`/api/logs/:filename`	Get log file content (supports `?tail=N&grep=keyword`)
`GET`	`/view/:id`	Replay an archived page
`GET`	`/timeline?url=URL`	Visual timeline page for a URL
`GET`	`/logs`	Server logs viewer

POST /api/archive

Returns { status, page_id, action } where action is created or unchanged (content identical, only last_visited updated).

PUT /api/archive/:id

Accepts the same body as POST. Replaces the snapshot content — old HTML and resource associations are removed, resources are re-processed. Returns { status, page_id, action } where action is updated or unchanged.

Project Structure

wayback-archiver/
├── Makefile                  # Build, test, cross-compile
├── bin/                      # Build output (server binary + userscript)
├── browser/                  # Tampermonkey userscript (TypeScript)
│   ├── src/
│   │   ├── main.ts           # Entry point & orchestration
│   │   ├── config.ts         # Constants
│   │   ├── types.ts          # TypeScript interfaces
│   │   ├── page-filter.ts    # URL filtering logic
│   │   ├── page-freezer.ts   # Freeze page runtime state
│   │   ├── dom-collector.ts  # DOM serialization
│   │   └── archiver.ts       # Server communication
│   ├── dist/                 # Built userscript
│   └── build.js              # Bundle script
│
├── server/                   # Go backend
│   ├── cmd/server/main.go    # Entry point
│   ├── internal/
│   │   ├── api/              # HTTP handlers (modular)
│   │   ├── config/           # Environment-based config
│   │   ├── database/         # PostgreSQL operations
│   │   ├── logging/          # File-based logging with rotation
│   │   ├── models/           # Data models
│   │   └── storage/          # File storage & dedup
│   └── web/                  # Web UI static files
│
├── .env.example              # Configuration template
└── tests/                    # Test suites
    ├── browser/              # Browser-side tests
    └── server/               # Server-side & E2E tests

Storage Layout

data/
├── html/                     # HTML snapshots, organized by date
│   └── 2026/03/09/
│       └── <timestamp>_<hash>.html
├── logs/                     # Server logs, rotated by size (10MB) and date (7-day retention)
│   ├── wayback-2026-03-12.001.log
│   └── wayback-2026-03-12.002.log
└── resources/                # Deduplicated static resources
    └── ab/cd/
        └── <sha256>.css

Building from Source

See docs/BUILD.md for build instructions, cross-compilation, and testing.

Agent Integration

This project includes an Agent skill for AI-assisted querying and exploration of your archived pages. Use it to search, analyze, and interact with your archive through natural language.

Known Limitations

Some cross-origin resources may still fail due to server-side 403/404 responses
Dynamically injected scripts (loaded via JS at runtime) may not be captured
Tracking pixels and analytics URLs with dynamic parameters are not preserved (they don't affect page rendering)
Very large media files (video, large images) will consume significant storage

License

MIT

Name		Name	Last commit message	Last commit date
Latest commit History 93 Commits
.github/workflows		.github/workflows
browser		browser
docs		docs
screenshot		screenshot
server		server
tests		tests
.dockerignore		.dockerignore
.env.example		.env.example
.gitignore		.gitignore
Dockerfile		Dockerfile
LICENSE		LICENSE
Makefile		Makefile
README-zh.md		README-zh.md
README.md		README.md
docker-compose.yml		docker-compose.yml
package-lock.json		package-lock.json
package.json		package.json
skill.md		skill.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Wayback Archiver

How It Works

Features

Prerequisites

Quick Start

Option A: Docker (Recommended)

Option B: Pre-built Binaries

1. Download Pre-built Binaries

2. Database Setup

3. Start the Server

4. Install the Userscript

5. Start Browsing

Configuration

Compression Settings

Remote Deployment

API

POST /api/archive

PUT /api/archive/:id

Project Structure

Storage Layout

Building from Source

Agent Integration

Known Limitations

License

About

Uh oh!

Releases 9

Packages

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Wayback Archiver

How It Works

Features

Prerequisites

Quick Start

Option A: Docker (Recommended)

Option B: Pre-built Binaries

1. Download Pre-built Binaries

2. Database Setup

3. Start the Server

4. Install the Userscript

5. Start Browsing

Configuration

Compression Settings

Remote Deployment

API

POST /api/archive

PUT /api/archive/:id

Project Structure

Storage Layout

Building from Source

Agent Integration

Known Limitations

License

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 9

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Packages