A high-performance, containerized Web Application Firewall (WAF) that combines traditional rule-based filtering with deep learning-based anomaly detection. It is designed for low-latency inference, robust logging, and continuous self-improvement.
```shell
cp env.example .env
# Adjust credentials in .env if needed (default: waf_user/waf_password)

docker compose up -d
```
Note: The initial build compiles the Go core and builds the Python inference images. This may take a few minutes.
The WAF Gateway protects the following vulnerable test applications:
| Application | Protected URL | Direct URL (Bypass WAF) | Description |
|---|---|---|---|
| Juice Shop | http://localhost:8081 | http://localhost:3000 | Modern web app vulnerabilities (OWASP Top 10) |
| DVWA | http://localhost:8082 | http://localhost:9001 | PHP/MySQL vulnerable app |
| WebGoat | http://localhost:8083 | http://localhost:9002 | Java/Spring vulnerable app |
| Service | URL | Credentials (Default) |
|---|---|---|
| Grafana | http://localhost:9091 | admin / admin |
| Prometheus | http://localhost:9094 | N/A |
| ClickHouse | http://localhost:8123 | waf_user / waf_password |
| WAF Health | http://localhost:8080/health | N/A |
The system follows a microservices architecture optimized for speed and observability.
- Technology: Nginx + OpenResty (LuaJIT).
- Role: Reverse proxy and enforcement point.
- Key Features:
- GeoIP Enrichment: Uses `resty.maxminddb` to add location data (Country, City, Lat/Lon) to every request.
- Fail-Open Design: If `waf-core` is unreachable or times out, the gateway allows the traffic to prevent service disruption.
- Metadata Extraction: Extracts headers, method, path, and body to send to the Core.
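Fail-open is the key design choice here: the gateway would rather let a request through than cause an outage. A minimal Python sketch of the pattern (the real gateway implements this in Lua; the core URL, timeout value, and the "403 means block" convention below are assumptions for illustration):

```python
import urllib.error
import urllib.request

# Illustrative settings, not the project's actual configuration.
WAF_CORE_URL = "http://waf-core:8080/check"
TIMEOUT_SECONDS = 0.2

def decide(payload: bytes) -> str:
    """Ask the core for a verdict; default to 'allow' on any failure."""
    try:
        req = urllib.request.Request(WAF_CORE_URL, data=payload, method="POST")
        with urllib.request.urlopen(req, timeout=TIMEOUT_SECONDS):
            return "allow"  # a 2xx answer from the core means the request passed
    except urllib.error.HTTPError as err:
        # Convention assumed here: the core answers HTTP 403 for "block".
        return "block" if err.code == 403 else "allow"
    except (urllib.error.URLError, OSError):
        # Fail open: an unreachable or slow core must not take the apps down.
        return "allow"
```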
- Technology: Go (Golang).
- Role: Central decision engine.
- Key Features:
- In-Memory Caching: Caches decisions for identical requests to reduce inference load.
- Session Tracking: Automatically extracts session IDs (`connect.sid`, `phpsessid`, `jsessionid`, etc.) and JWTs (`Authorization: Bearer` header, `token` cookie) for user-level analytics.
- Async Logging: Pushes logs to ClickHouse asynchronously to minimize latency.
- Prometheus Exporter: Exposes detailed runtime metrics.
- Technology: Python, FastAPI, PyTorch, Transformers.
- Role: Anomaly detection.
- Model: Uses a Transformer model (e.g., GPT-2, BERT) loaded via `peft` (Parameter-Efficient Fine-Tuning).
- Logic: Computes a "perplexity" or anomaly score for the request; high scores indicate malicious payloads.
- Lifecycle: Loads the model specified in `models/registry.json` at startup. Requires a restart to reload models.
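The exact mapping from a model's perplexity to a 0.0-1.0 anomaly score is not spelled out here; one plausible sketch is a log-scaled normalization between a benign baseline and a ceiling (both constants below are illustrative assumptions, not the project's formula):

```python
import math

def anomaly_score(perplexity: float, baseline: float = 20.0,
                  ceiling: float = 2000.0) -> float:
    """Map perplexity to [0.0, 1.0]: baseline or below -> 0.0, ceiling or above -> 1.0.

    Log-scaling reflects that perplexity grows multiplicatively: a payload
    10x above baseline is halfway to one 100x above baseline.
    """
    if perplexity <= baseline:
        return 0.0
    score = math.log(perplexity / baseline) / math.log(ceiling / baseline)
    return min(score, 1.0)
```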
- ClickHouse: Stores high-volume request logs (`waf_logs` table) and aggregated stats (`region_stats`, `session_stats`).
- Auto-Retrainer:
- Periodically fetches benign samples from ClickHouse.
- Fine-tunes the base model using LoRA.
- Merges the adapter into the base model and updates the registry.
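The "updates the registry" step amounts to rewriting `models/registry.json` to point at the newly merged model. A minimal sketch, assuming a simple `{"active_model": ...}` schema (the real registry schema may differ):

```python
import json
from pathlib import Path

def promote_model(registry_path: Path, model_dir: str) -> None:
    """Point the registry at a newly merged model directory."""
    registry = {}
    if registry_path.exists():
        registry = json.loads(registry_path.read_text())
    registry["active_model"] = model_dir  # hypothetical key name
    registry_path.write_text(json.dumps(registry, indent=2))
```

Because the inference service only reads the registry at startup, a promotion must be followed by `docker compose restart inference` to take effect.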
The `waf-core` service exposes the following metrics at `:8080/metrics`:
- `waf_requests_total{decision, reason}`: Counter of allowed/blocked requests.
- `waf_request_duration_seconds`: Histogram of total processing time.
- `waf_model_inference_duration_seconds`: Histogram of time spent waiting for the Python inference service.
- `waf_anomaly_score`: Histogram of model scores (0.0 - 1.0).
- `waf_ingestion_items_total`: Counter of logs ingested into ClickHouse.
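Dashboards typically derive a block rate from `waf_requests_total` by summing over its label sets; a small illustration with made-up counter values (the `reason` labels below are hypothetical):

```python
# Hypothetical samples of waf_requests_total, keyed by (decision, reason).
samples = {
    ("allow", "cache_hit"): 9500,
    ("allow", "model"): 300,
    ("block", "model"): 180,
    ("block", "rule"): 20,
}

total = sum(samples.values())
blocked = sum(v for (decision, _), v in samples.items() if decision == "block")
block_rate = blocked / total  # fraction of all requests that were blocked
```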
Pre-configured dashboards provide visibility into:
- Traffic Overview: RPS, Block Rate, Latency.
- Threat Intelligence: Top Attacking IPs, Countries, and User Agents.
- Model Performance: Score distribution and inference timing.
To manually trigger the learning loop (fetch data -> train -> promote):
```shell
docker compose exec auto-retrainer python auto_retrainer.py --train --promote --once

# After completion, restart inference to apply changes:
docker compose restart inference
```
To ingest historical Nginx access logs for initial training:
```shell
docker compose run --rm \
  -v /absolute/path/to/your/access.log:/data/access.log \
  ingestor \
  python batch_ingestor.py /data/access.log
```
```
.
├── waf-gateway/        # Nginx config, Lua scripts (access.lua)
├── waf-core/           # Go app (server.go, stats.go, clickhouse_ingestor.go)
├── inference/          # Python app (app.py, model loading)
├── training/           # Auto-retrainer logic (train_lora.py, auto_retrainer.py)
├── models/             # Shared volume for model artifacts and registry.json
├── monitoring/         # Configs for Prometheus and Grafana
├── ingestor/           # Batch ingestion scripts
└── docker-compose.yml
```
Key environment variables in `.env`:
- `WAF_THRESHOLD`: Score threshold (0.0 - 1.0). Requests scoring above this are blocked. Default: `0.6`.
- `INFER_CONCURRENCY`: Max concurrent requests to the inference engine.
- `CLICKHOUSE_*`: Database credentials.
- `MAXMIND_KEY`: License key for GeoIP updates.
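For illustration, this is how a service might read these settings (a Python sketch only; the core is Go, and the `INFER_CONCURRENCY` default of 4 is an assumption, not documented):

```python
import os

def load_config(env=os.environ) -> dict:
    """Read WAF settings from the environment, with documented defaults."""
    return {
        # Documented default threshold of 0.6.
        "threshold": float(env.get("WAF_THRESHOLD", "0.6")),
        # Default of 4 is assumed for illustration.
        "infer_concurrency": int(env.get("INFER_CONCURRENCY", "4")),
    }
```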
MIT