diff --git a/README.md b/README.md
index 36d7270..5abcd8e 100644
--- a/README.md
+++ b/README.md
@@ -1,492 +1,494 @@
-

Krawl

- -

- - -

-
- -

- A modern, customizable web honeypot server designed to detect and track malicious activity from attackers and web crawlers through deceptive web pages, fake credentials, and canary tokens. -

- -
- - License - - - Release - -
- -
- - GitHub Container Registry - - - Kubernetes - - - Helm Chart - -
-
- -## Table of Contents -- [Demo](#demo) -- [What is Krawl?](#what-is-krawl) -- [Krawl Dashboard](#krawl-dashboard) -- [Deployment Modes](#deployment-modes) -- [Quickstart](#quickstart) - - [Docker Run](#docker-run) - - [Docker Compose](#docker-compose) - - [Kubernetes](#kubernetes) - - [Uvicorn (Python)](#uvicorn-python) -- [Configuration](#configuration) - - [config.yaml](#configuration-via-configyaml) - - [Environment Variables](#configuration-via-environmental-variables) -- [Ban Malicious IPs](#use-krawl-to-ban-malicious-ips) -- [IP Reputation](#ip-reputation) -- [Forward Server Header](#forward-server-header) -- [Additional Documentation](#additional-documentation) -- [Deception using AI](#ai-generated-deception-pages) -- [Contributing](#contributing) - -## Demo -Tip: crawl the `robots.txt` paths for additional fun -### Krawl URL: [http://demo.krawlme.com](http://demo.krawlme.com) -### View the dashboard [http://demo.krawlme.com/das_dashboard](http://demo.krawlme.com/das_dashboard) - -## What is Krawl? - -**Krawl** is a cloud‑native deception server designed to detect, delay, and analyze malicious attackers, web crawlers and automated scanners. - -It creates realistic fake web applications filled with low‑hanging fruit such as admin panels, configuration files, and exposed fake credentials to attract and identify suspicious activity. - -![dashboard](img/deception-page.png) - -By wasting attacker resources, Krawl helps clearly distinguish malicious behavior from legitimate crawlers. 
- -It features: - -- **[AI Generated Deception Pages](docs/ai_generation.md)**: **Let attackers help generate your fake vulnerable attack surface** -- **Spider Trap Pages**: Infinite random links to waste crawler resources based on the [spidertrap project](https://github.com/adhdproject/spidertrap) -- **Fake Login Pages**: WordPress, phpMyAdmin, admin panels -- **Honeypot Paths**: Advertised in robots.txt to catch scanners -- **Fake Credentials**: Realistic-looking usernames, passwords, API keys -- **[Canary Token](docs/canary-token.md) Integration**: External alert triggering -- **Random server headers**: Confuse attacks based on server header and version -- **Real-time Dashboard**: Monitor suspicious activity -- **Customizable Wordlists**: Easy JSON-based configuration -- **Random Error Injection**: Mimic real server behavior - -You can easily expose Krawl alongside your other services to shield them from web crawlers and malicious users using a reverse proxy. For more details, see the [Reverse Proxy documentation](docs/reverse-proxy.md). - -![use case](img/use-case.png) - -## Krawl Dashboard - -Krawl provides a comprehensive dashboard, accessible at a **random secret path** generated at startup or at a **custom path** configured via `KRAWL_DASHBOARD_SECRET_PATH`. This keeps the dashboard hidden from attackers scanning your honeypot. - -The dashboard is organized in six tabs: - -- **Overview**: high-level view of attack activity: an interactive map of IP origins, recent suspicious requests, and top IPs, User-Agents, and paths. - -![geoip](img/geoip_dashboard.png) - -- **Attacks**: detailed breakdown of captured credentials, honeypot triggers, and detected attack types (SQLi, XSS, path traversal, etc.) with charts and tables. - -![attack_types](img/attack_types.png) - -- **IP Insight**: in-depth forensic view of a selected IP: geolocation, ISP/ASN info, reputation flags, behavioral timeline, attack type distribution, and full access history. 
- -![ipinsight](img/ip_insight_dashboard.png) - -Additionally, after authenticating with the dashboard password, two protected tabs become available: - -- **Tracked IPs**: maintain a watchlist of IP addresses you want to monitor over time. -- **IP Banlist**: manage IP bans, view detected attackers, and export the banlist in raw or IPTables format. -- **Deception**: manage AI generated pages, export them or import new ones. - -For more details, see the [Dashboard documentation](docs/dashboard.md). - -## Deployment Modes - -Krawl supports two deployment modes, controlled by the `mode` setting in `config.yaml` or the `KRAWL_MODE` environment variable. - -| | Standalone | Scalable | -|---|---|---| -| **Database** | SQLite (WAL mode) | PostgreSQL | -| **Cache** | In-memory Python dict | Redis (multi-tier TTL) | -| **Replicas** | 1 (single instance) | 1+ (horizontal scaling) | -| **External deps** | None | PostgreSQL + Redis | -| **Best for** | Dev, homelabs, <500k requests | Production, HA, >500k requests | - -**Standalone** — ideal for development environments or homelabs with low request counts. Zero additional configuration needed, just run Krawl and it works. -- Single container deployment — no external dependencies -- Lower RAM and resource usage - -**Scalable** — designed for production environments or high-traffic honeypots. The Helm chart defaults to this mode. -- Faster, more responsive dashboard thanks to Redis multi-tier caching -- Lower disk I/O with Redis acting as a hot-path cache in front of PostgreSQL -- Horizontal scaling — increase the number of Krawl replicas behind a load balancer - -For detailed configuration, Docker Compose examples, Kubernetes/Helm setup, and step-by-step migration instructions, see the [Deployment Modes documentation](docs/deployment-modes.md). 
- -## Quickstart - -### Docker Run - -Run Krawl in standalone mode with the latest image: - -```bash -docker run -d \ - -p 5000:5000 \ - -e KRAWL_DASHBOARD_SECRET_PATH="/my-secret-dashboard" \ - -e KRAWL_DASHBOARD_PASSWORD="my-secret-password" \ - -v krawl-data:/app/data \ - --name krawl \ - ghcr.io/blessedrebus/krawl:latest -``` - -Access the server at `http://localhost:5000` - -### Docker Compose - -Create a `docker-compose.yaml` with one of the two deployment modes. - -**Standalone** — just Krawl server with Sqlite storage: - -```yaml -services: - krawl: - image: ghcr.io/blessedrebus/krawl:latest - container_name: krawl-server - ports: - - "5000:5000" - environment: - - CONFIG_LOCATION=config.yaml - # - KRAWL_DASHBOARD_PASSWORD=my-secret-password - volumes: - - ./config.yaml:/app/config.yaml:ro - - krawl-data:/app/data - restart: unless-stopped - -volumes: - krawl-data: -``` - -**Scalable** — with PostgreSQL and Redis: - -> [!CAUTION] -> The example below uses **default passwords** (`krawl`/`krawl`). 
**Change them before deploying to production.** - -```yaml -services: - postgres: - image: postgres:16-alpine - environment: - POSTGRES_DB: krawl - POSTGRES_USER: krawl - POSTGRES_PASSWORD: krawl - volumes: - - postgres_data:/var/lib/postgresql/data - restart: unless-stopped - healthcheck: - test: ["CMD-SHELL", "pg_isready -U krawl -d krawl"] - interval: 10s - timeout: 5s - retries: 5 - - redis: - image: redis:7-alpine - volumes: - - redis_data:/data - restart: unless-stopped - healthcheck: - test: ["CMD", "redis-cli", "ping"] - interval: 10s - timeout: 5s - retries: 5 - - krawl: - image: ghcr.io/blessedrebus/krawl:latest - container_name: krawl-server - ports: - - "5000:5000" - environment: - - CONFIG_LOCATION=config.yaml - - KRAWL_MODE=scalable - - KRAWL_POSTGRES_HOST=postgres - - KRAWL_POSTGRES_PORT=5432 - - KRAWL_POSTGRES_USER=krawl - - KRAWL_POSTGRES_PASSWORD=krawl - - KRAWL_POSTGRES_DATABASE=krawl - - KRAWL_REDIS_HOST=redis - - KRAWL_REDIS_PORT=6379 - # - KRAWL_DASHBOARD_PASSWORD=my-secret-password - volumes: - - ./config.yaml:/app/config.yaml:ro - restart: unless-stopped - depends_on: - postgres: - condition: service_healthy - redis: - condition: service_healthy - -volumes: - postgres_data: - redis_data: -``` - -To deploy, just run -```bash -docker compose up -d -``` - -Production-ready compose files are also available in the [`docker/`](docker/) directory. For **development** (builds from source with hot-reload), use the compose files at the project root. - -For more details on both modes, see [Deployment Modes](docs/deployment-modes.md). - -### Kubernetes -**Krawl is also available natively on Kubernetes**. Installation can be done either [via manifest](kubernetes/README.md) or [using the Helm chart](helm/README.md). 
- -The Helm chart **defaults to scalable mode** with bundled PostgreSQL and Redis: - -```bash -helm install krawl oci://ghcr.io/blessedrebus/krawl-chart --version 2.1.0 \ - -n krawl-system --create-namespace \ - --set postgres.password=your-password \ - --set redis.password=your-redis-password \ - --set dashboardPassword=your-dashboard-password \ - --set config.dashboard.secret_path=/my-secret-dashboard -``` - -Minimal example values files are provided for both modes: -- [`values-minimal.yaml`](helm/values-minimal.yaml) — Scalable (default) -- [`values-standalone.yaml`](helm/values-standalone.yaml) — Standalone - -See [Deployment Modes](docs/deployment-modes.md) and [Chart documentation](helm/README.md) for full configuration and migration instructions. - -### Uvicorn (Python) - -Run Krawl directly with Python 3.13+ and uvicorn for local development or testing: - -```bash -pip install -r requirements.txt -uvicorn app:app --host 0.0.0.0 --port 5000 --app-dir src -``` - -Access the server at `http://localhost:5000` - - -## Configuration -Krawl uses a **configuration hierarchy** in which **environment variables take precedence over the configuration file**. This approach is recommended for Docker deployments and quick out-of-the-box customization. - -### Configuration via config.yaml -You can use the [config.yaml](config.yaml) file for advanced configurations, such as Docker Compose or Helm chart deployments. 
- -### Configuration via Environmental Variables - -| Environment Variable | Description | Default | -|----------------------|-------------|---------| -| `CONFIG_LOCATION` | Path to yaml config file | `config.yaml` | -| `KRAWL_PORT` | Server listening port | `5000` | -| `KRAWL_DELAY` | Response delay in milliseconds | `100` | -| `KRAWL_SERVER_HEADER` | HTTP Server header for deception | `""` | -| `KRAWL_LINKS_LENGTH_RANGE` | Link length range as `min,max` | `5,15` | -| `KRAWL_LINKS_PER_PAGE_RANGE` | Links per page as `min,max` | `10,15` | -| `KRAWL_CHAR_SPACE` | Characters used for link generation | `abcdefgh...` | -| `KRAWL_MAX_COUNTER` | Initial counter value | `10` | -| `KRAWL_CANARY_TOKEN_URL` | External canary token URL | None | -| `KRAWL_CANARY_TOKEN_TRIES` | Requests before showing canary token | `10` | -| `KRAWL_DASHBOARD_SECRET_PATH` | Custom dashboard path | Auto-generated | -| `KRAWL_DASHBOARD_PASSWORD` | Password for protected dashboard panels | Auto-generated | -| `KRAWL_DASHBOARD_CACHE_WARMUP` | Pre-compute dashboard data every 5 minutes for instant page loads | `true` | -| `KRAWL_DASHBOARD_WARMUP_PAGES` | Number of pages to pre-warm per table panel | `10` | -| `KRAWL_DASHBOARD_WARMUP_AGGREGATION` | Pre-compute full top_paths/top_ua aggregations for zero-query serving | `false` | -| `KRAWL_DASHBOARD_TOP_N_MIN_COUNT` | Minimum access count for top paths/user agents panels (set to 1 to disable) | `5` | -| `KRAWL_PROBABILITY_ERROR_CODES` | Error response probability (0-100%) | `0` | -| `KRAWL_DATABASE_PATH` | Database file location | `data/krawl.db` | -| `KRAWL_DATABASE_PERSIST_SUSPICIOUS_ONLY` | Only persist suspicious requests to the access log | `false` | -| `KRAWL_BACKUPS_PATH` | Path where database dump are saved | `backups` | -| `KRAWL_BACKUPS_CRON` | cron expression to control backup job schedule | `*/30 * * * *` | -| `KRAWL_BACKUPS_ENABLED` | Boolean to enable db dump job | `true` | -| `KRAWL_DATABASE_RETENTION_DAYS` | Days to retain data in 
database | `30` | -| `KRAWL_TARPIT_ENABLED` | Trap AI agents with slow responses and random text | `false` | -| `KRAWL_TARPIT_DELAY_SECONDS` | Extra delay in seconds added per response when tarpit is active | `5` | -| `KRAWL_HTTP_RISKY_METHODS_THRESHOLD` | Threshold for risky HTTP methods detection | `0.1` | -| `KRAWL_VIOLATED_ROBOTS_THRESHOLD` | Threshold for robots.txt violations | `0.1` | -| `KRAWL_UNEVEN_REQUEST_TIMING_THRESHOLD` | Coefficient of variation threshold for timing | `0.5` | -| `KRAWL_UNEVEN_REQUEST_TIMING_TIME_WINDOW_SECONDS` | Time window for request timing analysis in seconds | `300` | -| `KRAWL_USER_AGENTS_USED_THRESHOLD` | Threshold for detecting multiple user agents | `2` | -| `KRAWL_ATTACK_URLS_THRESHOLD` | Threshold for attack URL detection | `1` | -| `KRAWL_INFINITE_PAGES_FOR_MALICIOUS` | Serve infinite pages to malicious IPs | `true` | -| `KRAWL_MAX_PAGES_LIMIT` | Maximum page limit for crawlers | `250` | -| `KRAWL_BAN_DURATION_SECONDS` | Ban duration in seconds for rate-limited IPs | `600` | -| `KRAWL_AI_ENABLED` | Enable AI-generated deception pages | `false` | -| `KRAWL_AI_PROVIDER` | AI provider (`"openrouter"` or `"openai"`) | `"openrouter"` | -| `KRAWL_AI_OPENAI_BASE_URL` | Optional OpenAI Base URL for custom API endpoints | `"https://api.openai.com/v1"` | -| `KRAWL_AI_API_KEY` | API key for AI provider | `None` | -| `KRAWL_AI_MODEL` | AI model to use for page generation | `"nvidia/nemotron-3-super-120b-a12b:free"` | -| `KRAWL_AI_TIMEOUT` | Request timeout in seconds for AI API calls | `60` | -| `KRAWL_AI_MAX_DAILY_REQUESTS` | Max number of AI-generated pages per day (0 = unlimited) | `0` | -| `KRAWL_AI_PROMPT` | Custom prompt template for AI page generation | Default prompt | -| **Scalable mode** | | | -| `KRAWL_MODE` | Deployment mode (`standalone` or `scalable`) | `standalone` | -| `KRAWL_POSTGRES_HOST` | PostgreSQL hostname | `localhost` | -| `KRAWL_POSTGRES_PORT` | PostgreSQL port | `5432` | -| `KRAWL_POSTGRES_USER` | PostgreSQL 
username | `krawl` | -| `KRAWL_POSTGRES_PASSWORD` | PostgreSQL password | `krawl` | -| `KRAWL_POSTGRES_DATABASE` | PostgreSQL database name | `krawl` | -| `KRAWL_REDIS_HOST` | Redis hostname | `localhost` | -| `KRAWL_REDIS_PORT` | Redis port | `6379` | -| `KRAWL_REDIS_DB` | Redis database number | `0` | -| `KRAWL_REDIS_PASSWORD` | Redis password | None | -| `KRAWL_REDIS_CACHE_TTL` | TTL in seconds for dashboard warmup data | `600` | -| `KRAWL_REDIS_HOT_TTL` | TTL in seconds for hot-path data (ban info, IP categories) | `30` | -| `KRAWL_REDIS_TABLE_TTL` | TTL in seconds for paginated dashboard tables | `120` | - -For example - -```bash -# Set canary token -export CONFIG_LOCATION="config.yaml" -export KRAWL_CANARY_TOKEN_URL="http://your-canary-token-url" - -# Set number of pages range (min,max format) -export KRAWL_LINKS_PER_PAGE_RANGE="5,25" - -# Set analyzer thresholds -export KRAWL_HTTP_RISKY_METHODS_THRESHOLD="0.2" -export KRAWL_VIOLATED_ROBOTS_THRESHOLD="0.15" - -# Set custom dashboard path and password -export KRAWL_DASHBOARD_SECRET_PATH="/my-secret-dashboard" -export KRAWL_DASHBOARD_PASSWORD="my-secret-password" -``` - -Example of a Docker run with env variables (standalone mode): - -```bash -docker run -d \ - -p 5000:5000 \ - -e KRAWL_MODE=standalone \ - -e KRAWL_PORT=5000 \ - -e KRAWL_DELAY=100 \ - -e KRAWL_DASHBOARD_PASSWORD="my-secret-password" \ - -e KRAWL_CANARY_TOKEN_URL="http://your-canary-token-url" \ - --name krawl \ - ghcr.io/blessedrebus/krawl:latest -``` - -## Use Krawl to Ban Malicious IPs -Krawl uses a reputation-based system to classify attacker IP addresses and provides two ways to export IP lists for firewall integration. 
- -The `/api/export-ips` endpoint queries the database directly and supports filtering by IP category (`attacker`, `bad_crawler`, `regular_user`, `good_crawler`) and output format (`raw`, `iptables`, `nftables`): - -```bash -curl "https://your-krawl-instance//api/export-ips?categories=attacker&fwtype=raw" -``` - -This enables automatic blocking of malicious traffic across various platforms: -* [OPNsense and pfSense](https://ipv64.net/v64_blocklist_integration_guide) -* [RouterOS](https://rentry.co/krawl-routeros) -* [IPtables](plugins/iptables/README.md) and [Nftables](plugins/nftables/README.md) -* [Fail2Ban](plugins/fail2ban/README.md) - -For full API parameters, examples, and adding custom firewall formats, see the [Firewall Exporters documentation](docs/firewall-exporters.md). - -## IP Reputation -Krawl [uses tasks that analyze recent traffic to build and continuously update an IP reputation](src/tasks/analyze_ips.py) score. It runs periodically and evaluates each active IP address based on multiple behavioral indicators to classify it as an attacker, crawler, or regular user. Thresholds are fully customizable. - -![ip reputation](img/ip-reputation.png) - -The analysis includes: -- **Risky HTTP methods usage** (e.g. POST, PUT, DELETE ratios) -- **Robots.txt violations** -- **Request timing anomalies** (bursty or irregular patterns) -- **User-Agent consistency** -- **Attack URL detection** (e.g. SQL injection, XSS patterns) - -Each signal contributes to a weighted scoring model that assigns a reputation category: -- `attacker` -- `bad_crawler` -- `good_crawler` -- `regular_user` -- `unknown` (for insufficient data) - -The resulting scores and metrics are stored in the database and used by Krawl to drive dashboards, reputation tracking, and automated mitigation actions such as IP banning or firewall integration. - -## AI-Generated Deception Pages - -Krawl can automatically generate realistic deception pages using AI models from **OpenRouter** or **OpenAI** APIs. 
This feature creates unique, plausible honeypot pages on-the-fly to deceive attackers without manual page creation. - -**Key Features:** -- **Dynamic Generation**: Creates unique HTML pages for any request path -- **Smart Caching**: Caches generated pages to avoid redundant API calls -- **Daily Rate Limiting**: Control API costs with configurable request limits -- **Multiple Providers**: Support for OpenRouter (free options) and OpenAI -- **Graceful Fallback**: Falls back to standard honeypot when disabled or limit reached -- **Cached Serving**: Previously generated pages served even when AI is disabled - -**Quick Setup:** - -```yaml -ai: - enabled: true - provider: "openrouter" - openai_base_url: "your-custom-base-url" - api_key: "your-api-key" - model: "nvidia/nemotron-3-super-120b-a12b:free" - timeout: 60 - max_daily_requests: 10 -``` - -For detailed configuration and usage, see the [AI Generation documentation](docs/ai_generation.md). - -## Forward server header -If Krawl is deployed behind a proxy such as NGINX the **server header** should be forwarded using the following configuration in your proxy: - -```bash -location / { - proxy_pass https://your-krawl-instance; - proxy_pass_header Server; -} -``` - -## Additional Documentation - -| Topic | Description | -|-------|-------------| -| [AI Generation](docs/ai_generation.md) | Configure AI-generated deception pages using OpenRouter or OpenAI | -| [Deception Pages](docs/deception_pages.md) | Manage, import, and export deception pages; bulk operations and date-based filtering | -| [Deployment Modes](docs/deployment-modes.md) | Standalone (SQLite) vs Scalable (PostgreSQL + Redis) mode, configuration, and data migration | -| [Honeypot](docs/honeypot.md) | Full overview of honeypot pages: fake logins, directory listings, credential files, SQLi/XSS/XXE/command injection traps, and more | -| [Dashboard](docs/dashboard.md) | Access and explore the real-time monitoring dashboard | -| [API](docs/api.md) | External APIs 
used by Krawl for IP data, reputation, and geolocation | -| [Reverse Proxy](docs/reverse-proxy.md) | How to deploy Krawl behind NGINX or use decoy subdomains | -| [Database Backups](docs/backups.md) | Enable and configure the automatic database dump job | -| [Canary Token](docs/canary-token.md) | Set up external alert triggers via canarytokens.org | -| [Wordlist](docs/wordlist.md) | Customize fake usernames, passwords, and directory listings | -| [Architecture](docs/architecture.md) | Technical overview of the codebase, request pipeline, database schema, and background tasks | -| [Firewall Exporters](docs/firewall-exporters.md) | Export IP banlists in raw, iptables, or nftables format via REST API | - -## Contributing - -Contributions welcome! Please: -1. Fork the repository -2. Create a feature branch -3. Make your changes -4. Submit a pull request (explain the changes!) - - -## Disclaimer -> [!CAUTION] -> This is a deception/honeypot system. Deploy in isolated environments and monitor carefully for security events. Use responsibly and in compliance with applicable laws and regulations. - -## Star History -Star History Chart +

Krawl

+ +

+ + +

+
+ +

+ A modern, customizable web honeypot server designed to detect and track malicious activity from attackers and web crawlers through deceptive web pages, fake credentials, and canary tokens. +

+ +
+ + License + + + Release + +
+ +
+ + GitHub Container Registry + + + Kubernetes + + + Helm Chart + +
+
+ +## Table of Contents +- [Demo](#demo) +- [What is Krawl?](#what-is-krawl) +- [Krawl Dashboard](#krawl-dashboard) +- [Deployment Modes](#deployment-modes) +- [Quickstart](#quickstart) + - [Docker Run](#docker-run) + - [Docker Compose](#docker-compose) + - [Kubernetes](#kubernetes) + - [Uvicorn (Python)](#uvicorn-python) +- [Configuration](#configuration) + - [config.yaml](#configuration-via-configyaml) + - [Environment Variables](#configuration-via-environmental-variables) +- [Ban Malicious IPs](#use-krawl-to-ban-malicious-ips) +- [IP Reputation](#ip-reputation) +- [Forward Server Header](#forward-server-header) +- [Additional Documentation](#additional-documentation) +- [Deception using AI](#ai-generated-deception-pages) +- [Contributing](#contributing) + +## Demo +Tip: crawl the `robots.txt` paths for additional fun +### Krawl URL: [http://demo.krawlme.com](http://demo.krawlme.com) +### View the dashboard [http://demo.krawlme.com/das_dashboard](http://demo.krawlme.com/das_dashboard) + +## What is Krawl? + +**Krawl** is a cloud‑native deception server designed to detect, delay, and analyze malicious attackers, web crawlers and automated scanners. + +It creates realistic fake web applications filled with low‑hanging fruit such as admin panels, configuration files, and exposed fake credentials to attract and identify suspicious activity. + +![dashboard](img/deception-page.png) + +By wasting attacker resources, Krawl helps clearly distinguish malicious behavior from legitimate crawlers. 
+ +It features: + +- **[AI Generated Deception Pages](docs/ai_generation.md)**: **Let attackers help generate your fake vulnerable attack surface** +- **Spider Trap Pages**: Infinite random links to waste crawler resources based on the [spidertrap project](https://github.com/adhdproject/spidertrap) +- **Fake Login Pages**: WordPress, phpMyAdmin, admin panels +- **Honeypot Paths**: Advertised in robots.txt to catch scanners +- **Fake Credentials**: Realistic-looking usernames, passwords, API keys +- **[Canary Token](docs/canary-token.md) Integration**: External alert triggering +- **Random server headers**: Confuse attacks based on server header and version +- **Real-time Dashboard**: Monitor suspicious activity +- **Customizable Wordlists**: Easy JSON-based configuration +- **Random Error Injection**: Mimic real server behavior + +You can easily expose Krawl alongside your other services to shield them from web crawlers and malicious users using a reverse proxy. For more details, see the [Reverse Proxy documentation](docs/reverse-proxy.md). + +![use case](img/use-case.png) + +## Krawl Dashboard + +Krawl provides a comprehensive dashboard, accessible at a **random secret path** generated at startup or at a **custom path** configured via `KRAWL_DASHBOARD_SECRET_PATH`. This keeps the dashboard hidden from attackers scanning your honeypot. + +The dashboard is organized in six tabs: + +- **Overview**: high-level view of attack activity: an interactive map of IP origins, recent suspicious requests, and top IPs, User-Agents, and paths. + +![geoip](img/geoip_dashboard.png) + +- **Attacks**: detailed breakdown of captured credentials, honeypot triggers, and detected attack types (SQLi, XSS, path traversal, etc.) with charts and tables. + +![attack_types](img/attack_types.png) + +- **IP Insight**: in-depth forensic view of a selected IP: geolocation, ISP/ASN info, reputation flags, behavioral timeline, attack type distribution, and full access history. 
+ +![ipinsight](img/ip_insight_dashboard.png) + +Additionally, after authenticating with the dashboard password, three protected tabs become available: + +- **Tracked IPs**: maintain a watchlist of IP addresses you want to monitor over time. +- **IP Banlist**: manage IP bans, view detected attackers, and export the banlist in raw or iptables format. +- **Deception**: manage AI-generated pages, export them or import new ones. + +For more details, see the [Dashboard documentation](docs/dashboard.md). + +## Deployment Modes + +Krawl supports two deployment modes, controlled by the `mode` setting in `config.yaml` or the `KRAWL_MODE` environment variable. + +| | Standalone | Scalable | +|---|---|---| +| **Database** | SQLite (WAL mode) | PostgreSQL | +| **Cache** | In-memory Python dict | Redis (multi-tier TTL) | +| **Replicas** | 1 (single instance) | 1+ (horizontal scaling) | +| **External deps** | None | PostgreSQL + Redis | +| **Best for** | Dev, homelabs, <500k requests | Production, HA, >500k requests | + +**Standalone** — ideal for development environments or homelabs with low request counts. Zero additional configuration needed, just run Krawl and it works. +- Single container deployment — no external dependencies +- Lower RAM and resource usage + +**Scalable** — designed for production environments or high-traffic honeypots. The Helm chart defaults to this mode. +- Faster, more responsive dashboard thanks to Redis multi-tier caching +- Lower disk I/O with Redis acting as a hot-path cache in front of PostgreSQL +- Horizontal scaling — increase the number of Krawl replicas behind a load balancer + +For detailed configuration, Docker Compose examples, Kubernetes/Helm setup, and step-by-step migration instructions, see the [Deployment Modes documentation](docs/deployment-modes.md). 
+ +## Quickstart + +### Docker Run + +Run Krawl in standalone mode with the latest image: + +```bash +docker run -d \ + -p 5000:5000 \ + -e KRAWL_DASHBOARD_SECRET_PATH="/my-secret-dashboard" \ + -e KRAWL_DASHBOARD_PASSWORD="my-secret-password" \ + -v krawl-data:/app/data \ + --name krawl \ + ghcr.io/blessedrebus/krawl:latest +``` + +Access the server at `http://localhost:5000` + +### Docker Compose + +Create a `docker-compose.yaml` with one of the two deployment modes. + +**Standalone** — just Krawl server with Sqlite storage: + +```yaml +services: + krawl: + image: ghcr.io/blessedrebus/krawl:latest + container_name: krawl-server + ports: + - "5000:5000" + environment: + - CONFIG_LOCATION=config.yaml + # - KRAWL_DASHBOARD_PASSWORD=my-secret-password + volumes: + - ./config.yaml:/app/config.yaml:ro + - krawl-data:/app/data + restart: unless-stopped + +volumes: + krawl-data: +``` + +**Scalable** — with PostgreSQL and Redis: + +> [!CAUTION] +> The example below uses **default passwords** (`krawl`/`krawl`). 
**Change them before deploying to production.** + +```yaml +services: + postgres: + image: postgres:16-alpine + environment: + POSTGRES_DB: krawl + POSTGRES_USER: krawl + POSTGRES_PASSWORD: krawl + volumes: + - postgres_data:/var/lib/postgresql/data + restart: unless-stopped + healthcheck: + test: ["CMD-SHELL", "pg_isready -U krawl -d krawl"] + interval: 10s + timeout: 5s + retries: 5 + + redis: + image: redis:7-alpine + volumes: + - redis_data:/data + restart: unless-stopped + healthcheck: + test: ["CMD", "redis-cli", "ping"] + interval: 10s + timeout: 5s + retries: 5 + + krawl: + image: ghcr.io/blessedrebus/krawl:latest + container_name: krawl-server + ports: + - "5000:5000" + environment: + - CONFIG_LOCATION=config.yaml + - KRAWL_MODE=scalable + - KRAWL_POSTGRES_HOST=postgres + - KRAWL_POSTGRES_PORT=5432 + - KRAWL_POSTGRES_USER=krawl + - KRAWL_POSTGRES_PASSWORD=krawl + - KRAWL_POSTGRES_DATABASE=krawl + - KRAWL_REDIS_HOST=redis + - KRAWL_REDIS_PORT=6379 + # - KRAWL_DASHBOARD_PASSWORD=my-secret-password + volumes: + - ./config.yaml:/app/config.yaml:ro + restart: unless-stopped + depends_on: + postgres: + condition: service_healthy + redis: + condition: service_healthy + +volumes: + postgres_data: + redis_data: +``` + +To deploy, just run +```bash +docker compose up -d +``` + +Production-ready compose files are also available in the [`docker/`](docker/) directory. For **development** (builds from source with hot-reload), use the compose files at the project root. + +For more details on both modes, see [Deployment Modes](docs/deployment-modes.md). + +### Kubernetes +**Krawl is also available natively on Kubernetes**. Installation can be done either [via manifest](kubernetes/README.md) or [using the Helm chart](helm/README.md). 
+ +The Helm chart **defaults to scalable mode** with bundled PostgreSQL and Redis: + +```bash +helm install krawl oci://ghcr.io/blessedrebus/krawl-chart --version 2.1.0 \ + -n krawl-system --create-namespace \ + --set postgres.password=your-password \ + --set redis.password=your-redis-password \ + --set dashboardPassword=your-dashboard-password \ + --set config.dashboard.secret_path=/my-secret-dashboard +``` + +Minimal example values files are provided for both modes: +- [`values-minimal.yaml`](helm/values-minimal.yaml) — Scalable (default) +- [`values-standalone.yaml`](helm/values-standalone.yaml) — Standalone + +See [Deployment Modes](docs/deployment-modes.md) and [Chart documentation](helm/README.md) for full configuration and migration instructions. + +### Uvicorn (Python) + +Run Krawl directly with Python 3.13+ and uvicorn for local development or testing: + +```bash +pip install -r requirements.txt +uvicorn app:app --host 0.0.0.0 --port 5000 --app-dir src +``` + +Access the server at `http://localhost:5000` + + +## Configuration +Krawl uses a **configuration hierarchy** in which **environment variables take precedence over the configuration file**. This approach is recommended for Docker deployments and quick out-of-the-box customization. + +### Configuration via config.yaml +You can use the [config.yaml](config.yaml) file for advanced configurations, such as Docker Compose or Helm chart deployments. 
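The precedence rule described above (environment variables override the configuration file) can be sketched in Python. This is only an illustration of the idea, not Krawl's actual loader: the `resolve_setting` helper, the flat `file_config` dictionary, and the `port` key are hypothetical names chosen for this example, while `KRAWL_PORT` and its default of `5000` are taken from the environment variable table.

```python
import os


def resolve_setting(env_name, file_config, file_key, default):
    """Return the env var if set, else the config-file value, else the default."""
    if env_name in os.environ:
        return os.environ[env_name]
    return file_config.get(file_key, default)


# Hypothetical values, as they might look after parsing config.yaml.
file_config = {"port": 8080}

# No environment variable set: the value from the file is used.
os.environ.pop("KRAWL_PORT", None)
print(resolve_setting("KRAWL_PORT", file_config, "port", 5000))

# Environment variable set: it overrides the file value.
# Note that values read from the environment are always strings.
os.environ["KRAWL_PORT"] = "5000"
print(resolve_setting("KRAWL_PORT", file_config, "port", 5000))
```

In practice this means a value exported in the shell or passed with `docker run -e` wins over the same setting in `config.yaml`, which is why the environment is convenient for quick per-deployment overrides.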
+ +### Configuration via Environmental Variables + +| Environment Variable | Description | Default | +|----------------------|-------------|---------| +| `CONFIG_LOCATION` | Path to yaml config file | `config.yaml` | +| `KRAWL_PORT` | Server listening port | `5000` | +| `KRAWL_DELAY` | Response delay in milliseconds | `100` | +| `KRAWL_SERVER_HEADER` | HTTP Server header for deception | `""` | +| `KRAWL_LINKS_LENGTH_RANGE` | Link length range as `min,max` | `5,15` | +| `KRAWL_LINKS_PER_PAGE_RANGE` | Links per page as `min,max` | `10,15` | +| `KRAWL_CHAR_SPACE` | Characters used for link generation | `abcdefgh...` | +| `KRAWL_MAX_COUNTER` | Initial counter value | `10` | +| `KRAWL_CANARY_TOKEN_URL` | External canary token URL | None | +| `KRAWL_CANARY_TOKEN_TRIES` | Requests before showing canary token | `10` | +| `KRAWL_DASHBOARD_SECRET_PATH` | Custom dashboard path | Auto-generated | +| `KRAWL_DASHBOARD_PASSWORD` | Password for protected dashboard panels | Auto-generated | +| `KRAWL_DASHBOARD_CACHE_WARMUP` | Pre-compute dashboard data every 5 minutes for instant page loads | `true` | +| `KRAWL_DASHBOARD_WARMUP_PAGES` | Number of pages to pre-warm per table panel | `10` | +| `KRAWL_DASHBOARD_WARMUP_AGGREGATION` | Pre-compute full top_paths/top_ua aggregations for zero-query serving | `false` | +| `KRAWL_DASHBOARD_TOP_N_MIN_COUNT` | Minimum access count for top paths/user agents panels (set to 1 to disable) | `5` | +| `KRAWL_PROBABILITY_ERROR_CODES` | Error response probability (0-100%) | `0` | +| `KRAWL_DATABASE_PATH` | Database file location | `data/krawl.db` | +| `KRAWL_DATABASE_PERSIST_SUSPICIOUS_ONLY` | Only persist suspicious requests to the access log | `false` | +| `KRAWL_BACKUPS_PATH` | Path where database dump are saved | `backups` | +| `KRAWL_BACKUPS_CRON` | cron expression to control backup job schedule | `*/30 * * * *` | +| `KRAWL_BACKUPS_ENABLED` | Boolean to enable db dump job | `true` | +| `KRAWL_DATABASE_RETENTION_DAYS` | Days to retain data in 
database | `30` | +| `KRAWL_TARPIT_ENABLED` | Trap AI agents with slow responses and random text | `false` | +| `KRAWL_TARPIT_DELAY_SECONDS` | Extra delay in seconds added per response when tarpit is active | `5` | +| `KRAWL_HTTP_RISKY_METHODS_THRESHOLD` | Threshold for risky HTTP methods detection | `0.1` | +| `KRAWL_VIOLATED_ROBOTS_THRESHOLD` | Threshold for robots.txt violations | `0.1` | +| `KRAWL_UNEVEN_REQUEST_TIMING_THRESHOLD` | Coefficient of variation threshold for timing | `0.5` | +| `KRAWL_UNEVEN_REQUEST_TIMING_TIME_WINDOW_SECONDS` | Time window for request timing analysis in seconds | `300` | +| `KRAWL_USER_AGENTS_USED_THRESHOLD` | Threshold for detecting multiple user agents | `2` | +| `KRAWL_ATTACK_URLS_THRESHOLD` | Threshold for attack URL detection | `1` | +| `KRAWL_INFINITE_PAGES_FOR_MALICIOUS` | Serve infinite pages to malicious IPs | `true` | +| `KRAWL_MAX_PAGES_LIMIT` | Maximum page limit for crawlers | `250` | +| `KRAWL_BAN_DURATION_SECONDS` | Ban duration in seconds for rate-limited IPs | `600` | +| `KRAWL_AI_ENABLED` | Enable AI-generated deception pages | `false` | +| `KRAWL_AI_PROVIDER` | AI provider (`"openrouter"` or `"openai"`) | `"openrouter"` | +| `KRAWL_AI_OPENAI_BASE_URL` | Optional OpenAI Base URL for custom API endpoints | `"https://api.openai.com/v1"` | +| `KRAWL_AI_API_KEY` | API key for AI provider | `None` | +| `KRAWL_AI_MODEL` | AI model to use for page generation | `"nvidia/nemotron-3-super-120b-a12b:free"` | +| `KRAWL_AI_TIMEOUT` | Request timeout in seconds for AI API calls | `60` | +| `KRAWL_AI_MAX_DAILY_REQUESTS` | Max number of AI-generated pages per day (0 = unlimited) | `0` | +| `KRAWL_AI_PROMPT` | Custom prompt template for AI page generation | Default prompt | +| `KRAWL_CUSTOM_TEMPLATE_PATH` | Path inside the container to a custom HTML template. Template must include `{counter}` and `{content}` placeholders. 
| `/templates/custom_page.html` | +| **Scalable mode** | | | +| `KRAWL_MODE` | Deployment mode (`standalone` or `scalable`) | `standalone` | +| `KRAWL_POSTGRES_HOST` | PostgreSQL hostname | `localhost` | +| `KRAWL_POSTGRES_PORT` | PostgreSQL port | `5432` | +| `KRAWL_POSTGRES_USER` | PostgreSQL username | `krawl` | +| `KRAWL_POSTGRES_PASSWORD` | PostgreSQL password | `krawl` | +| `KRAWL_POSTGRES_DATABASE` | PostgreSQL database name | `krawl` | +| `KRAWL_REDIS_HOST` | Redis hostname | `localhost` | +| `KRAWL_REDIS_PORT` | Redis port | `6379` | +| `KRAWL_REDIS_DB` | Redis database number | `0` | +| `KRAWL_REDIS_PASSWORD` | Redis password | None | +| `KRAWL_REDIS_CACHE_TTL` | TTL in seconds for dashboard warmup data | `600` | +| `KRAWL_REDIS_HOT_TTL` | TTL in seconds for hot-path data (ban info, IP categories) | `30` | +| `KRAWL_REDIS_TABLE_TTL` | TTL in seconds for paginated dashboard tables | `120` | + +For example + +```bash +# Set canary token +export CONFIG_LOCATION="config.yaml" +export KRAWL_CANARY_TOKEN_URL="http://your-canary-token-url" + +# Set number of pages range (min,max format) +export KRAWL_LINKS_PER_PAGE_RANGE="5,25" + +# Set analyzer thresholds +export KRAWL_HTTP_RISKY_METHODS_THRESHOLD="0.2" +export KRAWL_VIOLATED_ROBOTS_THRESHOLD="0.15" + +# Set custom dashboard path and password +export KRAWL_DASHBOARD_SECRET_PATH="/my-secret-dashboard" +export KRAWL_DASHBOARD_PASSWORD="my-secret-password" +``` + +Example of a Docker run with env variables (standalone mode): + +```bash +docker run -d \ + -p 5000:5000 \ + -e KRAWL_MODE=standalone \ + -e KRAWL_PORT=5000 \ + -e KRAWL_DELAY=100 \ + -e KRAWL_DASHBOARD_PASSWORD="my-secret-password" \ + -e KRAWL_CUSTOM_TEMPLATE_PATH="/templates/custom_page.html" \ + -e KRAWL_CANARY_TOKEN_URL="http://your-canary-token-url" \ + --name krawl \ + ghcr.io/blessedrebus/krawl:latest +``` + +## Use Krawl to Ban Malicious IPs +Krawl uses a reputation-based system to classify attacker IP addresses and provides two ways to export 
IP lists for firewall integration. + +The `/api/export-ips` endpoint queries the database directly and supports filtering by IP category (`attacker`, `bad_crawler`, `regular_user`, `good_crawler`) and output format (`raw`, `iptables`, `nftables`): + +```bash +curl "https://your-krawl-instance//api/export-ips?categories=attacker&fwtype=raw" +``` + +This enables automatic blocking of malicious traffic across various platforms: +* [OPNsense and pfSense](https://ipv64.net/v64_blocklist_integration_guide) +* [RouterOS](https://rentry.co/krawl-routeros) +* [IPtables](plugins/iptables/README.md) and [Nftables](plugins/nftables/README.md) +* [Fail2Ban](plugins/fail2ban/README.md) + +For full API parameters, examples, and adding custom firewall formats, see the [Firewall Exporters documentation](docs/firewall-exporters.md). + +## IP Reputation +Krawl [uses tasks that analyze recent traffic to build and continuously update an IP reputation](src/tasks/analyze_ips.py) score. It runs periodically and evaluates each active IP address based on multiple behavioral indicators to classify it as an attacker, crawler, or regular user. Thresholds are fully customizable. + +![ip reputation](img/ip-reputation.png) + +The analysis includes: +- **Risky HTTP methods usage** (e.g. POST, PUT, DELETE ratios) +- **Robots.txt violations** +- **Request timing anomalies** (bursty or irregular patterns) +- **User-Agent consistency** +- **Attack URL detection** (e.g. SQL injection, XSS patterns) + +Each signal contributes to a weighted scoring model that assigns a reputation category: +- `attacker` +- `bad_crawler` +- `good_crawler` +- `regular_user` +- `unknown` (for insufficient data) + +The resulting scores and metrics are stored in the database and used by Krawl to drive dashboards, reputation tracking, and automated mitigation actions such as IP banning or firewall integration. 
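
As an illustration of the request timing signal, the sketch below computes the coefficient of variation (standard deviation divided by mean) of the gaps between consecutive requests and compares it with the documented default of `KRAWL_UNEVEN_REQUEST_TIMING_THRESHOLD` (`0.5`). It is a simplified sketch, not the code in `src/tasks/analyze_ips.py`: the function name is hypothetical, the time window is ignored, and treating a value above the threshold as "uneven" is an assumption.

```python
from statistics import mean, pstdev

# Documented default of KRAWL_UNEVEN_REQUEST_TIMING_THRESHOLD.
THRESHOLD = 0.5


def timing_coefficient_of_variation(timestamps):
    """Return stdev/mean of the gaps between consecutive request times.

    Returns None when there are too few requests to say anything.
    """
    ordered = sorted(timestamps)
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    if len(gaps) < 2 or mean(gaps) == 0:
        return None
    return pstdev(gaps) / mean(gaps)


# Request times in seconds for two hypothetical clients.
steady = [0, 10, 20, 30, 40]  # evenly spaced requests
bursty = [0, 1, 2, 30, 31]    # a burst, a long pause, another burst

for name, times in (("steady", steady), ("bursty", bursty)):
    cv = timing_coefficient_of_variation(times)
    # Assumption: a CV above the threshold counts as uneven timing.
    print(name, round(cv, 2), "uneven" if cv > THRESHOLD else "even")
```

Evenly spaced requests give a coefficient of 0, while bursts separated by long pauses push it well above the default threshold.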
+ +## AI-Generated Deception Pages + +Krawl can automatically generate realistic deception pages using AI models from **OpenRouter** or **OpenAI** APIs. This feature creates unique, plausible honeypot pages on-the-fly to deceive attackers without manual page creation. + +**Key Features:** +- **Dynamic Generation**: Creates unique HTML pages for any request path +- **Smart Caching**: Caches generated pages to avoid redundant API calls +- **Daily Rate Limiting**: Control API costs with configurable request limits +- **Multiple Providers**: Support for OpenRouter (free options) and OpenAI +- **Graceful Fallback**: Falls back to standard honeypot when disabled or limit reached +- **Cached Serving**: Previously generated pages served even when AI is disabled + +**Quick Setup:** + +```yaml +ai: + enabled: true + provider: "openrouter" + openai_base_url: "your-custom-base-url" + api_key: "your-api-key" + model: "nvidia/nemotron-3-super-120b-a12b:free" + timeout: 60 + max_daily_requests: 10 +``` + +For detailed configuration and usage, see the [AI Generation documentation](docs/ai_generation.md). 
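
The interaction between caching, the daily limit, and the fallback listed under Key Features can be sketched as follows. This is a hypothetical model rather than Krawl's implementation: the class name, the `generate` and `fallback` callables, and the in-memory cache are all assumptions, and the daily counter reset is omitted. The only documented detail it relies on is that a `max_daily_requests` of `0` means unlimited.

```python
class DeceptionPageSource:
    """Hypothetical model of cache-first AI page serving with a daily cap."""

    def __init__(self, generate, fallback, max_daily_requests=0, enabled=True):
        self.generate = generate      # callable standing in for the AI API call
        self.fallback = fallback      # callable returning a standard honeypot page
        self.max_daily_requests = max_daily_requests  # 0 means unlimited
        self.enabled = enabled
        self.cache = {}               # path -> previously generated page
        self.requests_today = 0       # reset logic omitted in this sketch

    def page_for(self, path):
        # Cached pages are served even when AI is disabled or the cap is hit.
        if path in self.cache:
            return self.cache[path]
        limit_reached = 0 < self.max_daily_requests <= self.requests_today
        if not self.enabled or limit_reached:
            return self.fallback(path)
        self.requests_today += 1
        self.cache[path] = self.generate(path)
        return self.cache[path]


source = DeceptionPageSource(
    generate=lambda path: f"AI page for {path}",
    fallback=lambda path: "standard honeypot page",
    max_daily_requests=1,
)
print(source.page_for("/admin"))   # generated and cached
print(source.page_for("/backup"))  # daily limit reached, falls back
source.enabled = False
print(source.page_for("/admin"))   # still served from the cache
```

Checking the cache before the enabled flag and the limit is what allows previously generated pages to keep being served after AI generation is switched off.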
+ +## Forward Server Header +If Krawl is deployed behind a proxy such as NGINX, forward the **Server header** to clients using the following configuration in your proxy: + +```nginx +location / { + proxy_pass https://your-krawl-instance; + proxy_pass_header Server; +} +``` + +## Additional Documentation + +| Topic | Description | +|-------|-------------| +| [AI Generation](docs/ai_generation.md) | Configure AI-generated deception pages using OpenRouter or OpenAI | +| [Deception Pages](docs/deception_pages.md) | Manage, import, and export deception pages; bulk operations and date-based filtering | +| [Deployment Modes](docs/deployment-modes.md) | Standalone (SQLite) vs Scalable (PostgreSQL + Redis) mode, configuration, and data migration | +| [Honeypot](docs/honeypot.md) | Full overview of honeypot pages: fake logins, directory listings, credential files, SQLi/XSS/XXE/command injection traps, and more | +| [Dashboard](docs/dashboard.md) | Access and explore the real-time monitoring dashboard | +| [API](docs/api.md) | External APIs used by Krawl for IP data, reputation, and geolocation | +| [Reverse Proxy](docs/reverse-proxy.md) | How to deploy Krawl behind NGINX or use decoy subdomains | +| [Database Backups](docs/backups.md) | Enable and configure the automatic database dump job | +| [Canary Token](docs/canary-token.md) | Set up external alert triggers via canarytokens.org | +| [Wordlist](docs/wordlist.md) | Customize fake usernames, passwords, and directory listings | +| [Architecture](docs/architecture.md) | Technical overview of the codebase, request pipeline, database schema, and background tasks | +| [Firewall Exporters](docs/firewall-exporters.md) | Export IP banlists in raw, iptables, or nftables format via REST API | + +## Contributing + +Contributions are welcome! Please: +1. Fork the repository +2. Create a feature branch +3. Make your changes +4. Submit a pull request (explain the changes!)
+ + +## Disclaimer +> [!CAUTION] +> This is a deception/honeypot system. Deploy in isolated environments and monitor carefully for security events. Use responsibly and in compliance with applicable laws and regulations. + +## Star History +Star History Chart diff --git a/config.yaml b/config.yaml index 632272f..17d37fc 100644 --- a/config.yaml +++ b/config.yaml @@ -109,6 +109,14 @@ tarpit: enabled: false # Opt-in: trap AI agents with slow responses and random text delay_seconds: 5 # Extra delay (seconds) added to each response when tarpit is active +# Custom page template settings +# Operators can provide a custom HTML template file by setting +# `custom_template_path`. If this key is present and not null, Krawl +# will attempt to load the template from the provided path. The template +# may include `{counter}` and `{content}` placeholders but they are +# optional (no strict validation is performed). +# custom_template_path: "/path/to/custom_page.html" + ### AI-Generated Deception pages. Here it can be used either OpenRouter or OpenAI as provider, but also self-hosted LLMs with an API-compatible endpoint ### by setting the provider to "openai" and pointing the openai_base_url to the local LLM server URL. Note that the krawl-llm is the service name of the llama.cpp server in the docker-compose.yaml. 
ai: diff --git a/docker-compose.yaml b/docker-compose.yaml index f945f06..a3c3e51 100644 --- a/docker-compose.yaml +++ b/docker-compose.yaml @@ -104,6 +104,8 @@ services: - KRAWL_REDIS_HOST=redis - KRAWL_REDIS_PORT=6379 - KRAWL_DASHBOARD_WARMUP_AGGREGATION=true + # Optional: path inside the container where a custom HTML template can be mounted + - KRAWL_CUSTOM_TEMPLATE_PATH=/templates/custom_page.html # Uncomment to set a custom dashboard password (auto-generated if not set) # - KRAWL_DASHBOARD_PASSWORD=your-secret-password # Set this to change timezone @@ -111,6 +113,8 @@ services: volumes: - ./wordlists.json:/app/wordlists.json:ro - ./config.yaml:/app/config.yaml:ro + # Optional: mount your custom template file to /templates/custom_page.html + # - ./custom_page.html:/templates/custom_page.html:ro - ./logs:/app/logs - ./backups:/app/backups restart: unless-stopped diff --git a/helm/Chart.yaml b/helm/Chart.yaml index 7434d66..776edbf 100644 --- a/helm/Chart.yaml +++ b/helm/Chart.yaml @@ -2,8 +2,8 @@ apiVersion: v2 name: krawl-chart description: A Helm chart for Krawl honeypot server type: application -version: 2.1.2 -appVersion: 2.1.2 +version: 2.1.3 +appVersion: 2.1.3 keywords: - honeypot - security diff --git a/helm/templates/custom-template-configmap.yaml b/helm/templates/custom-template-configmap.yaml new file mode 100644 index 0000000..d54f638 --- /dev/null +++ b/helm/templates/custom-template-configmap.yaml @@ -0,0 +1,15 @@ +{{- if .Values.customTemplate.enabled }} +apiVersion: v1 +kind: ConfigMap +metadata: + name: {{ include "krawl.fullname" . }}-custom-template + labels: + {{- include "krawl.labels" . 
| nindent 4 }} +data: + custom_page.html: | +{{- if .Values.customTemplate.content }} +{{ .Values.customTemplate.content | nindent 4 }} +{{- else }} + +{{- end }} +{{- end }} diff --git a/helm/templates/deployment.yaml b/helm/templates/deployment.yaml index d942971..43c30ee 100644 --- a/helm/templates/deployment.yaml +++ b/helm/templates/deployment.yaml @@ -65,6 +65,10 @@ spec: name: {{ include "krawl.fullname" . }}-ai key: ai-api-key {{- end }} + {{- if .Values.customTemplate.enabled }} + - name: KRAWL_CUSTOM_TEMPLATE_PATH + value: "/etc/krawl/templates/custom_page.html" + {{- end }} {{- if eq .Values.mode "scalable" }} - name: KRAWL_MODE value: "scalable" @@ -110,6 +114,11 @@ spec: mountPath: /app/wordlists.json subPath: wordlists.json readOnly: true + {{- if .Values.customTemplate.enabled }} + - name: custom-template + mountPath: /etc/krawl/templates + readOnly: true + {{- end }} {{- if and .Values.database.persistence.enabled (ne .Values.mode "scalable") }} - name: database mountPath: /app/data @@ -125,6 +134,11 @@ spec: - name: wordlists configMap: name: {{ include "krawl.fullname" . }}-wordlists + {{- if .Values.customTemplate.enabled }} + - name: custom-template + configMap: + name: {{ include "krawl.fullname" . }}-custom-template + {{- end }} {{- if and .Values.database.persistence.enabled (ne .Values.mode "scalable") }} - name: database {{- if .Values.database.persistence.existingClaim }} diff --git a/helm/values.yaml b/helm/values.yaml index 46a4f02..42b4a29 100644 --- a/helm/values.yaml +++ b/helm/values.yaml @@ -143,6 +143,32 @@ config: enabled: false # Opt-in: trap AI agents with slow responses and random text delay_seconds: 5 # Extra delay (seconds) added to each response when tarpit is active +# Custom template settings (optional) +# Helm will mount the custom template at `/etc/krawl/templates/custom_page.html` +customTemplate: + enabled: false + # Inline content for the custom template. 
If empty, create the ConfigMap without data + # Example HTML below — operators can replace this with their own template. + content: | + + + + + Custom Krawl Page + + + +

Welcome

+
Refresh cycle: {counter}
+ + + + # PostgreSQL settings (only used when mode=scalable) postgres: # Deploy PostgreSQL as part of this chart (set to false if using an external instance) diff --git a/kubernetes/krawl-all-in-one-deploy.yaml b/kubernetes/krawl-all-in-one-deploy.yaml index 118a417..ac06df2 100644 --- a/kubernetes/krawl-all-in-one-deploy.yaml +++ b/kubernetes/krawl-all-in-one-deploy.yaml @@ -118,6 +118,32 @@ data: enabled: false delay_seconds: 5 --- +# Custom template ConfigMap (optional) +apiVersion: v1 +kind: ConfigMap +metadata: + name: krawl-custom-template + namespace: krawl-system + labels: + app.kubernetes.io/name: krawl + app.kubernetes.io/instance: krawl + app.kubernetes.io/version: "2.1.0" +data: + custom_page.html: | + + + + + Krawl Custom + + + +

Krawl

+
{counter}
+
{content}
+ + +--- # Source: krawl-chart/templates/wordlists-configmap.yaml apiVersion: v1 kind: ConfigMap @@ -331,6 +357,8 @@ spec: value: "30" - name: KRAWL_REDIS_TABLE_TTL value: "120" + - name: KRAWL_CUSTOM_TEMPLATE_PATH + value: "/etc/krawl/templates/custom_page.html" volumeMounts: - name: config mountPath: /app/config.yaml @@ -340,6 +368,9 @@ spec: mountPath: /app/wordlists.json subPath: wordlists.json readOnly: true + - name: custom-template + mountPath: /etc/krawl/templates + readOnly: true resources: limits: cpu: 500m @@ -354,6 +385,9 @@ spec: - name: wordlists configMap: name: krawl-wordlists + - name: custom-template + configMap: + name: krawl-custom-template --- # Source: krawl-chart/templates/postgres.yaml apiVersion: apps/v1 diff --git a/kubernetes/manifests/configmap.yaml b/kubernetes/manifests/configmap.yaml deleted file mode 100644 index f04f5e1..0000000 --- a/kubernetes/manifests/configmap.yaml +++ /dev/null @@ -1,57 +0,0 @@ -# Source: krawl-chart/templates/configmap.yaml -apiVersion: v1 -kind: ConfigMap -metadata: - name: krawl-config - namespace: krawl-system - labels: - app.kubernetes.io/name: krawl - app.kubernetes.io/instance: krawl - app.kubernetes.io/version: "2.1.0" -data: - config.yaml: | - # Krawl Honeypot Configuration - server: - port: 5000 - delay: 100 - links: - min_length: 5 - max_length: 15 - min_per_page: 10 - max_per_page: 15 - char_space: "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789" - max_counter: 10 - canary: - token_url: null - token_tries: 10 - dashboard: - secret_path: null - cache_warmup: true - warmup_pages: 10 - warmup_aggregation: true - top_n_min_count: 5 - backups: - path: "backups" - cron: "*/30 * * * *" - enabled: false - logging: - level: "INFO" - database: - path: "data/krawl.db" - retention_days: 30 - persist_suspicious_only: false - behavior: - probability_error_codes: 0 - analyzer: - http_risky_methods_threshold: 0.1 - violated_robots_threshold: 0.1 - uneven_request_timing_threshold: 0.5 - 
uneven_request_timing_time_window_seconds: 300 - user_agents_used_threshold: 2 - attack_urls_threshold: 1 - crawl: - infinite_pages_for_malicious: true - max_pages_limit: 250 - ban_duration_seconds: 600 - ai: - enabled: false diff --git a/kubernetes/manifests/deployment.yaml b/kubernetes/manifests/deployment.yaml deleted file mode 100644 index 763fce0..0000000 --- a/kubernetes/manifests/deployment.yaml +++ /dev/null @@ -1,91 +0,0 @@ -# Source: krawl-chart/templates/deployment.yaml -apiVersion: apps/v1 -kind: Deployment -metadata: - name: krawl - namespace: krawl-system - labels: - app.kubernetes.io/name: krawl - app.kubernetes.io/instance: krawl - app.kubernetes.io/version: "2.1.0" -spec: - replicas: 1 - strategy: - type: RollingUpdate - selector: - matchLabels: - app.kubernetes.io/name: krawl - app.kubernetes.io/instance: krawl - template: - metadata: - labels: - app.kubernetes.io/name: krawl - app.kubernetes.io/instance: krawl - spec: - containers: - - name: krawl - image: "ghcr.io/blessedrebus/krawl:1.2.0" - imagePullPolicy: Always - ports: - - name: http - containerPort: 5000 - protocol: TCP - env: - - name: CONFIG_LOCATION - value: "config.yaml" - # Uncomment to use dashboard password from secret - # - name: KRAWL_DASHBOARD_PASSWORD - # valueFrom: - # secretKeyRef: - # name: krawl-dashboard - # key: dashboard-password - - name: KRAWL_MODE - value: "scalable" - - name: KRAWL_POSTGRES_HOST - value: "postgres" - - name: KRAWL_POSTGRES_PORT - value: "5432" - - name: KRAWL_POSTGRES_USER - value: "krawl" - - name: KRAWL_POSTGRES_PASSWORD - valueFrom: - secretKeyRef: - name: krawl-postgres - key: postgres-password - - name: KRAWL_POSTGRES_DATABASE - value: "krawl" - - name: KRAWL_REDIS_HOST - value: "redis" - - name: KRAWL_REDIS_PORT - value: "6379" - - name: KRAWL_REDIS_DB - value: "0" - - name: KRAWL_REDIS_CACHE_TTL - value: "600" - - name: KRAWL_REDIS_HOT_TTL - value: "30" - - name: KRAWL_REDIS_TABLE_TTL - value: "120" - volumeMounts: - - name: config - mountPath: 
/app/config.yaml - subPath: config.yaml - readOnly: true - - name: wordlists - mountPath: /app/wordlists.json - subPath: wordlists.json - readOnly: true - resources: - limits: - cpu: 500m - memory: 256Mi - requests: - cpu: 100m - memory: 64Mi - volumes: - - name: config - configMap: - name: krawl-config - - name: wordlists - configMap: - name: krawl-wordlists diff --git a/kubernetes/manifests/ingress.yaml b/kubernetes/manifests/ingress.yaml deleted file mode 100644 index fda9bd7..0000000 --- a/kubernetes/manifests/ingress.yaml +++ /dev/null @@ -1,23 +0,0 @@ -# Source: krawl-chart/templates/ingress.yaml -apiVersion: networking.k8s.io/v1 -kind: Ingress -metadata: - name: krawl - namespace: krawl-system - labels: - app.kubernetes.io/name: krawl - app.kubernetes.io/instance: krawl - app.kubernetes.io/version: "2.1.0" -spec: - ingressClassName: traefik - rules: - - host: "krawl.example.com" - http: - paths: - - path: / - pathType: Prefix - backend: - service: - name: krawl - port: - number: 5000 diff --git a/kubernetes/manifests/kustomization.yaml b/kubernetes/manifests/kustomization.yaml deleted file mode 100644 index 35f0777..0000000 --- a/kubernetes/manifests/kustomization.yaml +++ /dev/null @@ -1,21 +0,0 @@ -apiVersion: kustomize.config.k8s.io/v1beta1 -kind: Kustomization - -resources: - - namespace.yaml - - configmap.yaml - - wordlists-configmap.yaml - - secret-scalable.yaml - - pvc.yaml - - postgres-pvc.yaml - - postgres-service.yaml - - postgres-statefulset.yaml - - redis-pvc.yaml - - redis-service.yaml - - redis-statefulset.yaml - - deployment.yaml - - service.yaml - - network-policy.yaml - - ingress.yaml - -namespace: krawl-system diff --git a/kubernetes/manifests/namespace.yaml b/kubernetes/manifests/namespace.yaml deleted file mode 100644 index 1cdb578..0000000 --- a/kubernetes/manifests/namespace.yaml +++ /dev/null @@ -1,4 +0,0 @@ -apiVersion: v1 -kind: Namespace -metadata: - name: krawl-system diff --git a/kubernetes/manifests/network-policy.yaml 
b/kubernetes/manifests/network-policy.yaml deleted file mode 100644 index 4ce1400..0000000 --- a/kubernetes/manifests/network-policy.yaml +++ /dev/null @@ -1,35 +0,0 @@ -# Source: krawl-chart/templates/network-policy.yaml -apiVersion: networking.k8s.io/v1 -kind: NetworkPolicy -metadata: - name: krawl - namespace: krawl-system - labels: - app.kubernetes.io/name: krawl - app.kubernetes.io/instance: krawl - app.kubernetes.io/version: "2.1.0" -spec: - podSelector: - matchLabels: - app.kubernetes.io/name: krawl - app.kubernetes.io/instance: krawl - policyTypes: - - Ingress - - Egress - ingress: - - from: - - podSelector: {} - - namespaceSelector: {} - - ipBlock: - cidr: 0.0.0.0/0 - ports: - - port: 5000 - protocol: TCP - egress: - - ports: - - protocol: TCP - - protocol: UDP - to: - - namespaceSelector: {} - - ipBlock: - cidr: 0.0.0.0/0 diff --git a/kubernetes/manifests/postgres-pvc.yaml b/kubernetes/manifests/postgres-pvc.yaml deleted file mode 100644 index b1b42e6..0000000 --- a/kubernetes/manifests/postgres-pvc.yaml +++ /dev/null @@ -1,17 +0,0 @@ -# Source: krawl-chart/templates/postgres.yaml -apiVersion: v1 -kind: PersistentVolumeClaim -metadata: - name: krawl-postgres - namespace: krawl-system - labels: - app.kubernetes.io/name: krawl - app.kubernetes.io/instance: krawl - app.kubernetes.io/version: "2.1.0" - app.kubernetes.io/component: postgres -spec: - accessModes: - - ReadWriteOnce - resources: - requests: - storage: 5Gi diff --git a/kubernetes/manifests/postgres-service.yaml b/kubernetes/manifests/postgres-service.yaml deleted file mode 100644 index 1e82a59..0000000 --- a/kubernetes/manifests/postgres-service.yaml +++ /dev/null @@ -1,21 +0,0 @@ -# Source: krawl-chart/templates/postgres.yaml -apiVersion: v1 -kind: Service -metadata: - name: postgres - namespace: krawl-system - labels: - app.kubernetes.io/name: krawl - app.kubernetes.io/instance: krawl - app.kubernetes.io/version: "2.1.0" - app.kubernetes.io/component: postgres -spec: - type: ClusterIP - ports: - 
- port: 5432 - targetPort: postgresql - protocol: TCP - name: postgresql - selector: - app.kubernetes.io/name: krawl-postgres - app.kubernetes.io/instance: krawl diff --git a/kubernetes/manifests/postgres-statefulset.yaml b/kubernetes/manifests/postgres-statefulset.yaml deleted file mode 100644 index b0383ca..0000000 --- a/kubernetes/manifests/postgres-statefulset.yaml +++ /dev/null @@ -1,80 +0,0 @@ -# Source: krawl-chart/templates/postgres.yaml -apiVersion: apps/v1 -kind: StatefulSet -metadata: - name: krawl-postgres - namespace: krawl-system - labels: - app.kubernetes.io/name: krawl - app.kubernetes.io/instance: krawl - app.kubernetes.io/version: "2.1.0" - app.kubernetes.io/component: postgres -spec: - serviceName: postgres - replicas: 1 - selector: - matchLabels: - app.kubernetes.io/name: krawl-postgres - app.kubernetes.io/instance: krawl - template: - metadata: - labels: - app.kubernetes.io/name: krawl-postgres - app.kubernetes.io/instance: krawl - spec: - containers: - - name: postgres - image: "postgres:16-alpine" - imagePullPolicy: IfNotPresent - ports: - - name: postgresql - containerPort: 5432 - protocol: TCP - env: - - name: POSTGRES_DB - value: "krawl" - - name: POSTGRES_USER - value: "krawl" - - name: POSTGRES_PASSWORD - valueFrom: - secretKeyRef: - name: krawl-postgres - key: postgres-password - - name: PGDATA - value: /var/lib/postgresql/data/pgdata - volumeMounts: - - name: data - mountPath: /var/lib/postgresql/data - resources: - limits: - cpu: 500m - memory: 512Mi - requests: - cpu: 100m - memory: 256Mi - livenessProbe: - exec: - command: - - pg_isready - - -U - - "krawl" - - -d - - "krawl" - initialDelaySeconds: 30 - periodSeconds: 10 - timeoutSeconds: 5 - readinessProbe: - exec: - command: - - pg_isready - - -U - - "krawl" - - -d - - "krawl" - initialDelaySeconds: 10 - periodSeconds: 5 - timeoutSeconds: 3 - volumes: - - name: data - persistentVolumeClaim: - claimName: krawl-postgres diff --git a/kubernetes/manifests/pvc.yaml 
b/kubernetes/manifests/pvc.yaml deleted file mode 100644 index 5e4e850..0000000 --- a/kubernetes/manifests/pvc.yaml +++ /dev/null @@ -1,16 +0,0 @@ -# Source: krawl-chart/templates/pvc.yaml -apiVersion: v1 -kind: PersistentVolumeClaim -metadata: - name: krawl-db - namespace: krawl-system - labels: - app.kubernetes.io/name: krawl - app.kubernetes.io/instance: krawl - app.kubernetes.io/version: "2.1.0" -spec: - accessModes: - - ReadWriteOnce - resources: - requests: - storage: 1Gi diff --git a/kubernetes/manifests/redis-pvc.yaml b/kubernetes/manifests/redis-pvc.yaml deleted file mode 100644 index 4c1e691..0000000 --- a/kubernetes/manifests/redis-pvc.yaml +++ /dev/null @@ -1,17 +0,0 @@ -# Source: krawl-chart/templates/redis.yaml -apiVersion: v1 -kind: PersistentVolumeClaim -metadata: - name: krawl-redis - namespace: krawl-system - labels: - app.kubernetes.io/name: krawl - app.kubernetes.io/instance: krawl - app.kubernetes.io/version: "2.1.0" - app.kubernetes.io/component: redis -spec: - accessModes: - - ReadWriteOnce - resources: - requests: - storage: 1Gi diff --git a/kubernetes/manifests/redis-service.yaml b/kubernetes/manifests/redis-service.yaml deleted file mode 100644 index 38864b6..0000000 --- a/kubernetes/manifests/redis-service.yaml +++ /dev/null @@ -1,21 +0,0 @@ -# Source: krawl-chart/templates/redis.yaml -apiVersion: v1 -kind: Service -metadata: - name: redis - namespace: krawl-system - labels: - app.kubernetes.io/name: krawl - app.kubernetes.io/instance: krawl - app.kubernetes.io/version: "2.1.0" - app.kubernetes.io/component: redis -spec: - type: ClusterIP - ports: - - port: 6379 - targetPort: redis - protocol: TCP - name: redis - selector: - app.kubernetes.io/name: krawl-redis - app.kubernetes.io/instance: krawl diff --git a/kubernetes/manifests/redis-statefulset.yaml b/kubernetes/manifests/redis-statefulset.yaml deleted file mode 100644 index 14dd272..0000000 --- a/kubernetes/manifests/redis-statefulset.yaml +++ /dev/null @@ -1,56 +0,0 @@ -# Source: 
krawl-chart/templates/redis.yaml -apiVersion: apps/v1 -kind: StatefulSet -metadata: - name: krawl-redis - namespace: krawl-system - labels: - app.kubernetes.io/name: krawl - app.kubernetes.io/instance: krawl - app.kubernetes.io/version: "2.1.0" - app.kubernetes.io/component: redis -spec: - serviceName: redis - replicas: 1 - selector: - matchLabels: - app.kubernetes.io/name: krawl-redis - app.kubernetes.io/instance: krawl - template: - metadata: - labels: - app.kubernetes.io/name: krawl-redis - app.kubernetes.io/instance: krawl - spec: - containers: - - name: redis - image: "redis:7-alpine" - imagePullPolicy: IfNotPresent - ports: - - name: redis - containerPort: 6379 - protocol: TCP - volumeMounts: - - name: data - mountPath: /data - resources: - limits: - cpu: 250m - memory: 128Mi - requests: - cpu: 50m - memory: 64Mi - livenessProbe: - tcpSocket: - port: redis - initialDelaySeconds: 10 - periodSeconds: 10 - readinessProbe: - tcpSocket: - port: redis - initialDelaySeconds: 5 - periodSeconds: 5 - volumes: - - name: data - persistentVolumeClaim: - claimName: krawl-redis diff --git a/kubernetes/manifests/secret-scalable.yaml b/kubernetes/manifests/secret-scalable.yaml deleted file mode 100644 index 271b420..0000000 --- a/kubernetes/manifests/secret-scalable.yaml +++ /dev/null @@ -1,13 +0,0 @@ -# Source: krawl-chart/templates/secret-scalable.yaml -apiVersion: v1 -kind: Secret -metadata: - name: krawl-postgres - namespace: krawl-system - labels: - app.kubernetes.io/name: krawl - app.kubernetes.io/instance: krawl - app.kubernetes.io/version: "2.1.0" -type: Opaque -stringData: - postgres-password: "krawl" diff --git a/kubernetes/manifests/secret.yaml b/kubernetes/manifests/secret.yaml deleted file mode 100644 index 0099f7d..0000000 --- a/kubernetes/manifests/secret.yaml +++ /dev/null @@ -1,15 +0,0 @@ -# Source: krawl-chart/templates/secret.yaml -# Uncomment and set your dashboard password below. 
-# If not created, the password will be auto-generated and printed in the pod logs. -# -# apiVersion: v1 -# kind: Secret -# metadata: -# name: krawl-dashboard -# namespace: krawl-system -# labels: -# app.kubernetes.io/name: krawl -# app.kubernetes.io/instance: krawl -# type: Opaque -# stringData: -# dashboard-password: "your-secret-password" diff --git a/kubernetes/manifests/service.yaml b/kubernetes/manifests/service.yaml deleted file mode 100644 index a4a5a85..0000000 --- a/kubernetes/manifests/service.yaml +++ /dev/null @@ -1,25 +0,0 @@ -# Source: krawl-chart/templates/service.yaml -apiVersion: v1 -kind: Service -metadata: - name: krawl - namespace: krawl-system - labels: - app.kubernetes.io/name: krawl - app.kubernetes.io/instance: krawl - app.kubernetes.io/version: "2.1.0" -spec: - type: LoadBalancer - externalTrafficPolicy: Local - sessionAffinity: ClientIP - sessionAffinityConfig: - clientIP: - timeoutSeconds: 10800 - ports: - - port: 5000 - targetPort: http - protocol: TCP - name: http - selector: - app.kubernetes.io/name: krawl - app.kubernetes.io/instance: krawl diff --git a/kubernetes/manifests/wordlists-configmap.yaml b/kubernetes/manifests/wordlists-configmap.yaml deleted file mode 100644 index 9e87491..0000000 --- a/kubernetes/manifests/wordlists-configmap.yaml +++ /dev/null @@ -1,13 +0,0 @@ -# Source: krawl-chart/templates/wordlists-configmap.yaml -apiVersion: v1 -kind: ConfigMap -metadata: - name: krawl-wordlists - namespace: krawl-system - labels: - app.kubernetes.io/name: krawl - app.kubernetes.io/instance: krawl - app.kubernetes.io/version: "2.1.0" -data: - wordlists.json: | - {"api_keys":{"prefixes":["sk_live_","sk_test_","api_","key_","token_","access_","secret_","prod_",""]},"applications":{"names":["WebApp","API Gateway","Dashboard","Admin Panel","CMS","Portal","Manager","Console","Control 
Panel","Backend"]},"attack_patterns":{"command_injection":"(cmd=|exec=|command=|execute=|system=|ping=|host=|\u0026\u0026|\\|\\||;|\\$\\{|\\$\\(|`|\\bid\\b|\\bwhoami\\b|\\buname\\b|\\becho\\b|\\bwget\\b|\\bcurl\\b|\\bnc\\b|\\bnetcat\\b|\\bbash\\b|\\bkill\\b|\\bchmod\\b|\\bchown\\b|\\brm\\b|(?:;|\u0026\u0026|\\|\\||`|\\$\\(|\\$\\{)\\s*(?:ls|pwd|cat|sh|ps|cp|mv)\\b|/bin/bash|/bin/sh|cmd\\.exe|/bin/|/usr/bin/|/sbin/)","common_probes":"(/admin|/wp-admin|/phpMyAdmin|/phpmyadmin|/feedback|\\.env|/credentials\\.txt|/passwords\\.txt|\\.git|/backup\\.sql|/db_backup\\.sql)","login_attempt":"(/wp-login\\.php|/wp-login|/admin/login|/admin/signin|/user/login|/users/login|/account/login|/portal/login|/secure/login|/login\\.php|/login\\.asp|/login\\.aspx|/signin|/sign-in|/sign_in|/auth/login|/api/auth|/api/login|/api/signin|/api/token|/oauth/login|/sso/login|/xmlrpc\\.php|/session/new|action=login)","ldap_injection":"(\\*\\)|\\(\\||\\(\u0026)","lfi_rfi":"(file://|php://|expect://|data://|zip://|phar://|/etc/passwd|/etc/shadow|/proc/self|c:\\\\windows)","path_traversal":"(\\.\\.| 
%2e%2e|%252e|/etc/passwd|/etc/shadow|\\.\\.\\\\/|\\.\\./|/windows/system32|c:\\\\windows|/proc/self|\\.\\.\\.%2f|\\.\\.\\.%5c|etc/passwd|etc/shadow)","sql_injection":"('|\"|`|--|#|/\\*|\\*/|\\bunion\\b|\\bunion\\s+select\\b|\\bor\\b.*=.*|\\band\\b.*=.*|'.*or.*'.*=.*'|\\bsleep\\b|\\bwaitfor\\b|\\bdelay\\b|\\bbenchmark\\b|;.*select|;.*drop|;.*insert|;.*update|;.*delete|\\bexec\\b|\\bexecute\\b|\\bxp_cmdshell\\b|information_schema|table_schema|table_name)","xss_attempt":"(\u003cscript|\u003c/script|javascript:|onerror=|onload=|onclick=|onmouseover=|onfocus=|onblur=|\u003ciframe|\u003cimg|\u003csvg|\u003cembed|\u003cobject|\u003cbody|\u003cinput|eval\\(|alert\\(|prompt\\(|confirm\\(|document\\.|window\\.|\u003cstyle|expression\\(|vbscript:|data:text/html)","xxe_injection":"(\u003c!ENTITY|\u003c!DOCTYPE|SYSTEM\\s+[\"']|PUBLIC\\s+[\"']|\u0026\\w+;|file://|php://filter|expect://)"},"command_outputs":{"cat_config":"\u003c?php\n// Configuration file\n$db_host = 'localhost';\n$db_user = 'webapp';\n$db_pass = 'fake_password';\n?\u003e\n","download_size_max":10000,"download_size_min":100,"generic":["sh: 1: syntax error: unexpected end of file","Command executed successfully","","/bin/sh: {num}: not found","bash: command not found"],"gid_max":2000,"gid_min":1000,"id":["uid={uid}(www-data) gid={gid}(www-data) groups={gid}(www-data)","uid={uid}(nginx) gid={gid}(nginx) groups={gid}(nginx)","uid={uid}(apache) gid={gid}(apache) groups={gid}(apache)"],"ls":[["index.php","config.php","uploads","assets","README.md",".htaccess","admin"],["app.js","package.json","node_modules","public","views","routes"],["index.html","css","js","images","data","api"]],"network_commands":["bash: wget: command not found","curl: (6) Could not resolve host: example.com","Connection timeout","bash: nc: command not found","Downloaded {size} bytes"],"pwd":["/var/www/html","/home/webapp/public_html","/usr/share/nginx/html","/opt/app/public"],"uid_max":2000,"uid_min":1000,"uname":["Linux webserver 
5.4.0-42-generic #46-Ubuntu SMP Fri Jul 10 00:24:02 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux","Linux app-server 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 23:41:39 UTC 2020 x86_64 GNU/Linux","Linux prod-server 5.15.0-56-generic #62-Ubuntu SMP Tue Nov 22 19:54:14 UTC 2022 x86_64 GNU/Linux"],"whoami":["www-data","nginx","apache","webapp","nobody"]},"credential_fields":{"password_fields":["password","pass","passwd","pwd","passphrase"],"username_fields":["username","user","login","email","log","userid","account"]},"databases":{"hosts":["localhost","db.internal","mysql.local","postgres.internal","127.0.0.1","db-server-01","database.prod","sql.company.com"],"names":["production","prod_db","main_db","app_database","users_db","customer_data","analytics","staging_db","dev_database","wordpress","ecommerce","crm_db","inventory"]},"directory_listing":{"directories":["uploads/","backups/","logs/","temp/","cache/","private/","config/","admin/","database/","backup/","old/","archive/",".git/","keys/","credentials/"],"fake_directories":[{"name":"config","perms":"drwxr-xr-x","size":"4096"},{"name":"backup","perms":"drwxr-xr-x","size":"4096"},{"name":"logs","perms":"drwxrwxr-x","size":"4096"},{"name":"data","perms":"drwxr-xr-x","size":"4096"}],"fake_files":[{"name":"settings.conf","perms":"-rw-r--r--","size_max":8192,"size_min":1024},{"name":"database.sql","perms":"-rw-r--r--","size_max":102400,"size_min":10240},{"name":".htaccess","perms":"-rw-r--r--","size_max":1024,"size_min":256},{"name":"README.md","perms":"-rw-r--r--","size_max":2048,"size_min":512}],"files":["admin.txt","test.exe","backup.sql","database.sql","db_backup.sql","dump.sql","config.php","credentials.txt","passwords.txt","users.csv",".env","id_rsa","id_rsa.pub","private_key.pem","api_keys.json","secrets.yaml","admin_notes.txt","settings.ini","database.yml","wp-config.php",".htaccess","server.key","cert.pem","shadow.bak","passwd.old"]},"emails":{"domains":["example.com","company.com","localhost.com","test.com","domain
.com","corporate.com","internal.net","enterprise.com","business.org"]},"error_codes":[400,401,403,404,500,502,503],"fake_passwd":{"gid_max":2000,"gid_min":1000,"shells":["/bin/bash","/bin/sh","/usr/bin/zsh"],"system_users":["root:x:0:0:root:/root:/bin/bash","daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin","bin:x:2:2:bin:/bin:/usr/sbin/nologin","sys:x:3:3:sys:/dev:/usr/sbin/nologin","sync:x:4:65534:sync:/bin:/bin/sync","www-data:x:33:33:www-data:/var/www:/usr/sbin/nologin","backup:x:34:34:backup:/var/backups:/usr/sbin/nologin","mysql:x:108:113:MySQL Server,,,:/nonexistent:/bin/false","sshd:x:109:65534::/run/sshd:/usr/sbin/nologin"],"uid_max":2000,"uid_min":1000},"fake_shadow":{"hash_length":86,"hash_prefix":"$6$rounds=656000$","salt_length":16,"system_entries":["root:$6$rounds=656000$fake_salt_here$fake_hash_data:19000:0:99999:7:::","daemon:*:19000:0:99999:7:::","bin:*:19000:0:99999:7:::","sys:*:19000:0:99999:7:::","www-data:*:19000:0:99999:7:::"]},"passwords":{"prefixes":["P@ssw0rd","Passw0rd","Admin","Secret","Welcome","System","Database","Secure","Master","Root"],"simple":["test","demo","temp","change","password","admin","letmein","welcome","default","sample"]},"proxy_headers":["CF-Connecting-IP","X-Forwarded-For","X-Real-IP"],"server_errors":{"apache":{"os":["Ubuntu","Debian","CentOS"],"template":"\u003c!DOCTYPE HTML PUBLIC \"-//IETF//DTD HTML 2.0//EN\"\u003e\n\u003chtml\u003e\u003chead\u003e\n\u003ctitle\u003e{code} {message}\u003c/title\u003e\n\u003c/head\u003e\u003cbody\u003e\n\u003ch1\u003e{message}\u003c/h1\u003e\n\u003cp\u003eThe requested URL was not found on this server.\u003c/p\u003e\n\u003chr\u003e\n\u003caddress\u003eApache/{version} ({os}) Server at {host} Port 80\u003c/address\u003e\n\u003c/body\u003e\u003c/html\u003e\n","versions":["2.4.41","2.4.52","2.4.54","2.4.57"]},"iis":{"template":"\u003c!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Strict//EN\" \"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd\"\u003e\n\u003chtml 
xmlns=\"http://www.w3.org/1999/xhtml\"\u003e\n\u003chead\u003e\n\u003cmeta http-equiv=\"Content-Type\" content=\"text/html; charset=iso-8859-1\"/\u003e\n\u003ctitle\u003e{code} - {message}\u003c/title\u003e\n\u003c/head\u003e\n\u003cbody\u003e\n\u003cdiv id=\"header\"\u003e\u003ch1\u003eServer Error\u003c/h1\u003e\u003c/div\u003e\n\u003cdiv id=\"content\"\u003e\n \u003ch2\u003e{code} - {message}\u003c/h2\u003e\n \u003ch3\u003eThe page cannot be displayed because an internal server error has occurred.\u003c/h3\u003e\n\u003c/div\u003e\n\u003c/body\u003e\n\u003c/html\u003e\n","versions":["10.0","8.5","8.0"]},"nginx":{"template":"\u003c!DOCTYPE html\u003e\n\u003chtml\u003e\n\u003chead\u003e\n\u003ctitle\u003e{code} {message}\u003c/title\u003e\n\u003cstyle\u003e\nbody {{\n width: 35em;\n margin: 0 auto;\n font-family: Tahoma, Verdana, Arial, sans-serif;\n}}\n\u003c/style\u003e\n\u003c/head\u003e\n\u003cbody\u003e\n\u003ch1\u003eAn error occurred.\u003c/h1\u003e\n\u003cp\u003eSorry, the page you are looking for is currently unavailable.\u003cbr/\u003e\nPlease try again later.\u003c/p\u003e\n\u003cp\u003eIf you are the system administrator of this resource then you should check the error log for details.\u003c/p\u003e\n\u003cp\u003e\u003cem\u003eFaithfully yours, nginx/{version}.\u003c/em\u003e\u003c/p\u003e\n\u003c/body\u003e\n\u003c/html\u003e\n","versions":["1.18.0","1.20.1","1.22.0","1.24.0"]}},"scoring_weights":{"attacker":{"risky_http_methods":6,"robots_violations":4,"uneven_request_timing":3,"different_user_agents":2,"attack_url":15},"good_crawler":{"risky_http_methods":1,"robots_violations":0,"uneven_request_timing":0,"different_user_agents":0,"attack_url":0},"bad_crawler":{"risky_http_methods":2,"robots_violations":6,"uneven_request_timing":0,"different_user_agents":6,"attack_url":4},"regular_user":{"risky_http_methods":0,"robots_violations":0,"uneven_request_timing":8,"different_user_agents":3,"attack_url":0}},"server_headers":["Apache/2.2.22 
(Ubuntu)","nginx/1.18.0","Microsoft-IIS/10.0","LiteSpeed","Caddy","Gunicorn/20.0.4","uvicorn/0.13.4","Express","Flask/1.1.2","Django/3.1"],"sql_errors":{"mongodb":{"collection_errors":["ns not found"],"query_errors":["Failed to parse","unknown operator"]},"mssql":{"column_errors":["Invalid column name '{column}'"],"object_errors":["Invalid object name '{table}'"],"syntax_errors":["Incorrect syntax near","Unclosed quotation mark"]},"mysql":{"column_errors":["Unknown column '{column}' in 'field list'","Unknown column '{column}' in 'where clause'"],"syntax_errors":["You have an error in your SQL syntax","check the manual that corresponds to your MySQL server version"],"table_errors":["Table '{table}' doesn't exist","Unknown table '{table}'"]},"oracle":{"syntax_errors":["ORA-00933: SQL command not properly ended","ORA-00904: invalid identifier"],"table_errors":["ORA-00942: table or view does not exist"]},"postgresql":{"column_errors":["ERROR: column \"{column}\" does not exist"],"relation_errors":["ERROR: relation \"{table}\" does not exist"],"syntax_errors":["ERROR: syntax error at or near","ERROR: unterminated quoted string"]},"sqlite":{"column_errors":["no such column: {column}"],"syntax_errors":["near \"{token}\": syntax error"],"table_errors":["no such table: 
{table}"]}},"suspicious_patterns":["sqlmap","nessus","burp","zap","metasploit","nuclei","gobuster","dirbuster"],"usernames":{"prefixes":["admin","user","developer","root","system","db","api","service","deploy","test","prod","backup","monitor","jenkins","webapp"],"suffixes":["","_prod","_dev","_test","123","2024","_backup","_admin","01","02","_user","_service","_api"]},"users":{"roles":["Administrator","Developer","Manager","User","Guest","Moderator","Editor","Viewer","Analyst","Support"]},"xxe_responses":{"default_content":"root:x:0:0:root:/root:/bin/bash\nwww-data:x:33:33:www-data:/var/www:/usr/sbin/nologin","entity_processed":{"entity_values":["admin_credentials","database_connection","api_secret_key","internal_server_ip","encrypted_password"],"template":"\u003c?xml version=\"1.0\"?\u003e\n\u003cresponse\u003e\n \u003cstatus\u003esuccess\u003c/status\u003e\n \u003cmessage\u003eEntity processed successfully\u003c/message\u003e\n \u003centity_value\u003e{entity_value}\u003c/entity_value\u003e\n\u003c/response\u003e\n"},"error":{"messages":["External entity not allowed","XML parsing error","Invalid entity reference"],"template":"\u003c?xml version=\"1.0\"?\u003e\n\u003cresponse\u003e\n \u003cstatus\u003eerror\u003c/status\u003e\n \u003cmessage\u003e{message}\u003c/message\u003e\n\u003c/response\u003e\n"},"file_access":{"template":"\u003c?xml version=\"1.0\"?\u003e\n\u003cresponse\u003e\n \u003cstatus\u003esuccess\u003c/status\u003e\n \u003cdata\u003e{content}\u003c/data\u003e\n\u003c/response\u003e\n"}}} diff --git a/src/config.py b/src/config.py index be00ef7..84528e0 100644 --- a/src/config.py +++ b/src/config.py @@ -97,6 +97,10 @@ class Config: ai_reasoning_enabled: bool = True ai_reasoning_effort: str = "medium" + # Custom page template settings + # If `custom_template_path` is set (non-null), the custom template will be used. 
+    custom_template_path: Optional[str] = None
+
     # Deception pages import settings
     deception_import_pages: bool = True
@@ -193,6 +197,12 @@ def from_yaml(cls) -> "Config":
         logging_cfg = data.get("logging", {})
         ai = data.get("ai", {})
         deception = data.get("deception", {})
+        # Support legacy nested `page_template` or top-level `custom_template_path`.
+        page_template = data.get("page_template", {})
+        custom_template_path = data.get("custom_template_path", None)
+        # The top-level key takes precedence; fall back to the nested
+        # page_template.custom_template_path only when the top-level key is unset
+        if not custom_template_path and isinstance(page_template, dict):
+            custom_template_path = page_template.get("custom_template_path", None)

         # Handle dashboard_secret_path - auto-generate if null/not set
         dashboard_path = dashboard.get("secret_path")
@@ -315,6 +325,7 @@ def from_yaml(cls) -> "Config":
             ai_timeout=ai.get("timeout", 60),
             ai_max_daily_requests=ai.get("max_daily_requests", 0),
             deception_import_pages=deception.get("import_pages", True),
+            custom_template_path=custom_template_path,
         )
diff --git a/src/templates/html/main_page.html b/src/templates/html/main_page.html
index 578e6f1..aeb36ec 100644
--- a/src/templates/html/main_page.html
+++ b/src/templates/html/main_page.html
@@ -5,7 +5,7 @@
 Krawl me!
@@ -100,7 +100,7 @@

Krawl me!

{counter}
diff --git a/src/templates/html_templates.py b/src/templates/html_templates.py
index 50d94dc..a180723 100644
--- a/src/templates/html_templates.py
+++ b/src/templates/html_templates.py
@@ -5,7 +5,14 @@
 Templates are loaded from the html/ subdirectory.
 """

-from .template_loader import load_template
+import os
+from pathlib import Path
+from config import get_config
+from logger import get_app_logger
+from .template_loader import load_template, load_template_from_path
+
+
+_logged_custom_template_path: str | None = None


 def login_form() -> str:
@@ -64,4 +71,26 @@ def input_form() -> str:
 def main_page(counter: int, content: str) -> str:
     """Generate main Krawl page with links and canary token"""
-    return load_template("main_page", counter=counter, content=content)
+    # Prefer explicit environment variable, then config.yaml setting
+    custom_path = os.environ.get("KRAWL_CUSTOM_TEMPLATE_PATH")
+    if not custom_path:
+        try:
+            cfg = get_config()
+            if cfg.custom_template_path:
+                custom_path = cfg.custom_template_path
+        except Exception:
+            custom_path = None
+
+    if custom_path:
+        global _logged_custom_template_path
+        if _logged_custom_template_path != custom_path:
+            get_app_logger().info(f"Using custom template path: {custom_path}")
+            _logged_custom_template_path = custom_path
+        try:
+            return load_template_from_path(custom_path, counter=counter, content=content)
+        except Exception:
+            # On any failure, fall back to bundled template
+            pass
+
+    bundled_template = Path(__file__).parent / "html" / "main_page.html"
+    return load_template_from_path(str(bundled_template), counter=counter, content=content)
diff --git a/src/templates/template_loader.py b/src/templates/template_loader.py
index fe53bf5..18ad430 100644
--- a/src/templates/template_loader.py
+++ b/src/templates/template_loader.py
@@ -61,10 +61,39 @@ def load_template(name: str, **kwargs) -> str:
     # Apply substitutions if kwargs provided
     if kwargs:
-        template = template.format(**kwargs)
+        try:
+            template = template.format(**kwargs)
+        except Exception:
+            # If formatting fails, return template unchanged (do not validate placeholders)
+            pass

     return template


+def load_template_from_path(file_path: str, **kwargs) -> str:
+    """
+    Load a template from an absolute or relative file path and perform
+    non-strict placeholder substitution. Replaces occurrences of
+    `{key}` with the provided value for each kwarg. If the file does
+    not exist or cannot be read, raises FileNotFoundError.
+
+    This function deliberately does not validate that placeholders
+    like `{counter}` or `{content}` are present; it performs simple
+    replacements and returns the file contents even if placeholders
+    are missing.
+    """
+    p = Path(file_path)
+    if not p.exists():
+        raise FileNotFoundError(f"Template file not found: {file_path}")
+
+    text = p.read_text(encoding="utf-8")
+
+    # Perform safe replacements without raising KeyError
+    for k, v in kwargs.items():
+        text = text.replace(f"{{{k}}}", str(v))
+
+    return text
+
+
 def clear_cache() -> None:
     """Clear the template cache. Useful for testing or development."""
     _template_cache.clear()
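
The replace-based substitution added in `template_loader.py` behaves differently from `str.format`, and a small standalone sketch may make the difference easier to review. The helper name `render_non_strict`, the sample markup, and the temporary file are assumptions made for illustration; only the loop body mirrors the diff.

```python
import tempfile
from pathlib import Path


def render_non_strict(file_path: str, **kwargs) -> str:
    """Hypothetical stand-in that mirrors the replace-based loop in the diff."""
    path = Path(file_path)
    if not path.exists():
        raise FileNotFoundError(f"Template file not found: {file_path}")
    text = path.read_text(encoding="utf-8")
    # f"{{{key}}}" evaluates to "{" + key + "}", e.g. "{counter}"
    for key, value in kwargs.items():
        text = text.replace(f"{{{key}}}", str(value))
    return text


def demo() -> dict:
    # Sample markup (an assumption): doubled braces as str.format escapes,
    # one known placeholder, and one placeholder nobody supplies.
    source = "<style>body {{ margin: 0 }}</style><p>{counter}</p><p>{unknown}</p>"
    with tempfile.TemporaryDirectory() as tmp:
        template = Path(tmp) / "custom.html"
        template.write_text(source, encoding="utf-8")
        replaced = render_non_strict(str(template), counter=3, content="unused")
    # For comparison: strict formatting of the same text
    try:
        formatted = source.format(counter=3, content="unused")
    except KeyError as exc:
        formatted = f"KeyError: {exc}"
    return {"replaced": replaced, "formatted": formatted}


if __name__ == "__main__":
    for name, value in demo().items():
        print(f"{name}: {value}")
```

In this sketch the replace-based path tolerates the unknown placeholder and the unused `content` argument, but it also leaves `{{` and `}}` untouched, whereas `str.format` would collapse them to single braces. If a template was written with doubled braces for `str.format`, its CSS may render with literal `{{ }}` once it goes through the replace-based loader, which seems worth checking for the bundled `main_page.html`.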