# AI WAF Deployment

A high-performance, containerized Web Application Firewall (WAF) that combines traditional rule-based filtering with deep learning-based anomaly detection. It is designed for low-latency inference, robust logging, and continuous self-improvement.


## 🚀 Quick Start

### 1. Configure Environment

```bash
cp env.example .env
# Adjust credentials in .env if needed (default: waf_user/waf_password)
```

### 2. Start Services

```bash
docker compose up -d
```

Note: The initial build compiles the Go core and builds the Python inference images. This may take a few minutes.

### 3. Access Applications

The WAF Gateway protects the following vulnerable test applications:

| Application | Protected URL | Direct URL (Bypass WAF) | Description |
|-------------|---------------|-------------------------|-------------|
| Juice Shop | http://localhost:8081 | http://localhost:3000 | Modern web app vulnerabilities (OWASP Top 10) |
| DVWA | http://localhost:8082 | http://localhost:9001 | PHP/MySQL vulnerable app |
| WebGoat | http://localhost:8083 | http://localhost:9002 | Java/Spring vulnerable app |

### 4. Monitoring & Management

| Service | URL | Credentials (Default) |
|---------|-----|-----------------------|
| Grafana | http://localhost:9091 | admin / admin |
| Prometheus | http://localhost:9094 | N/A |
| ClickHouse | http://localhost:8123 | waf_user / waf_password |
| WAF Health | http://localhost:8080/health | N/A |

## 🏗️ System Architecture

The system follows a microservices architecture optimized for speed and observability.

### 1. WAF Gateway (`waf-gateway`)

- Technology: OpenResty (Nginx + LuaJIT).
- Role: Reverse proxy and enforcement point.
- Key Features:
  - GeoIP Enrichment: Uses `resty.maxminddb` to add location data (Country, City, Lat/Lon) to every request.
  - Fail-Open Design: If `waf-core` is unreachable or times out, the gateway allows the traffic to prevent service disruption.
  - Metadata Extraction: Extracts the headers, method, path, and body and forwards them to the Core.
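
The gateway itself is Lua running inside OpenResty; as a language-neutral sketch, the fail-open contract looks roughly like this in Python (the `/check` endpoint name and JSON field names are assumptions, not the project's actual API):

```python
import json
import urllib.error
import urllib.request

CORE_URL = "http://waf-core:8080/check"  # assumed endpoint; the real path may differ

def check_request(method, path, headers, body, timeout=0.1):
    """Forward request metadata to waf-core and fail open on any error."""
    payload = json.dumps(
        {"method": method, "path": path, "headers": headers, "body": body}
    ).encode()
    req = urllib.request.Request(
        CORE_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return json.load(resp).get("decision", "allow")
    except (urllib.error.URLError, TimeoutError, ValueError):
        # Fail-open: if waf-core is unreachable or times out, allow the
        # traffic so the protected application stays available.
        return "allow"
```

The short timeout bounds the worst-case latency the WAF can add to a request; anything slower than that is treated the same as an outage.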

### 2. WAF Core (`waf-core`)

- Technology: Go (Golang).
- Role: Central decision engine.
- Key Features:
  - In-Memory Caching: Caches decisions for identical requests to reduce inference load.
  - Session Tracking: Automatically extracts session IDs (`connect.sid`, `phpsessid`, `jsessionid`, etc.) and JWTs (`Authorization: Bearer`, `token` cookie) for user-level analytics.
  - Async Logging: Pushes logs to ClickHouse asynchronously to minimize latency.
  - Prometheus Exporter: Exposes detailed runtime metrics.
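
The core is written in Go; a minimal Python sketch of the session-tracking step (cookie names taken from the list above, everything else illustrative):

```python
SESSION_COOKIES = ("connect.sid", "phpsessid", "jsessionid", "token")

def extract_session_id(headers):
    """Find a session ID or JWT in the request headers (illustrative sketch)."""
    cookies = {}
    for pair in headers.get("Cookie", "").split(";"):
        name, sep, value = pair.strip().partition("=")
        if sep:
            cookies[name.lower()] = value
    # Cookie-based sessions first, in the order the names are listed above.
    for name in SESSION_COOKIES:
        if name in cookies:
            return cookies[name]
    # Fall back to a bearer token carrying a JWT.
    auth = headers.get("Authorization", "")
    if auth.startswith("Bearer "):
        return auth.removeprefix("Bearer ")
    return None
```

Keying analytics on this identifier is what lets the `session_stats` table aggregate per-user behaviour rather than per-IP.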

### 3. Inference Engine (`inference`)

- Technology: Python, FastAPI, PyTorch, Transformers.
- Role: Anomaly detection.
- Model: A Transformer model (e.g., GPT-2, BERT) loaded via `peft` (Parameter-Efficient Fine-Tuning).
- Logic: Computes a perplexity-based anomaly score for each request; high scores indicate likely-malicious payloads.
- Lifecycle: Loads the model specified in `models/registry.json` at startup; a restart is required to pick up new models.
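
How a perplexity becomes a 0.0-1.0 score is model-specific; one plausible mapping looks like this (the logistic midpoint and steepness are illustrative assumptions, not the project's actual calibration):

```python
import math

def perplexity(token_nlls):
    """exp(mean negative log-likelihood) over the request's tokens."""
    return math.exp(sum(token_nlls) / len(token_nlls))

def anomaly_score(token_nlls, midpoint=50.0, steepness=0.1):
    """Squash perplexity into the 0.0-1.0 range with a logistic curve."""
    ppl = perplexity(token_nlls)
    return 1.0 / (1.0 + math.exp(-steepness * (ppl - midpoint)))
```

A request resembling the benign traffic the model was tuned on yields low per-token NLLs (low perplexity, score near 0); an injection payload full of improbable tokens drives the score toward 1, where it can be compared against `WAF_THRESHOLD`.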

### 4. Data & Training (`clickhouse`, `auto-retrainer`)

- ClickHouse: Stores high-volume request logs (`waf_logs` table) and aggregated stats (`region_stats`, `session_stats`).
- Auto-Retrainer:
  - Periodically fetches benign samples from ClickHouse.
  - Fine-tunes the base model using LoRA.
  - Merges the adapter into the base model and updates the registry.
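
The promotion step amounts to repointing `models/registry.json` at the freshly merged weights; a sketch of an atomic update (the `active`/`version` schema here is an assumption about the registry's contents):

```python
import json
import os
import tempfile

def promote_model(registry_path, model_dir, version):
    """Atomically point the registry at a newly merged model directory."""
    registry = {}
    if os.path.exists(registry_path):
        with open(registry_path) as f:
            registry = json.load(f)
    registry.update({"active": model_dir, "version": version})
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(registry_path) or ".")
    with os.fdopen(fd, "w") as f:
        json.dump(registry, f, indent=2)
    # os.replace is an atomic rename on POSIX, so a concurrent reader
    # never sees a half-written registry file.
    os.replace(tmp, registry_path)
    return registry
```

Because the inference service only reads the registry at startup, a `docker compose restart inference` is still needed after promotion.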

## 📊 Observability

### Prometheus Metrics

The `waf-core` service exposes the following metrics at `:8080/metrics`:

- `waf_requests_total{decision, reason}`: Counter of allowed/blocked requests.
- `waf_request_duration_seconds`: Histogram of total processing time.
- `waf_model_inference_duration_seconds`: Histogram of time spent waiting for the Python inference service.
- `waf_anomaly_score`: Histogram of model scores (0.0 - 1.0).
- `waf_ingestion_items_total`: Counter of logs ingested into ClickHouse.
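
The scrape endpoint serves plain Prometheus exposition text; a small sketch of pulling one counter out of it (the sample values are hypothetical):

```python
import re

SAMPLE = """\
# TYPE waf_requests_total counter
waf_requests_total{decision="allow",reason="cache"} 1042
waf_requests_total{decision="block",reason="anomaly"} 17
"""  # hypothetical scrape of :8080/metrics

def parse_counter(text, name):
    """Map each label set of a counter to its current value."""
    pattern = re.compile(re.escape(name) + r"\{(.*)\}\s+([0-9.eE+-]+)$")
    samples = {}
    for line in text.splitlines():
        m = pattern.match(line)
        if m:
            samples[m.group(1)] = float(m.group(2))
    return samples
```

In practice Prometheus does this parsing for you; the sketch just shows what the Grafana panels are aggregating underneath.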

### Grafana Dashboards

Pre-configured dashboards provide visibility into:

- Traffic Overview: RPS, Block Rate, Latency.
- Threat Intelligence: Top Attacking IPs, Countries, and User Agents.
- Model Performance: Score distribution and inference timing.

## 🛠️ Manual Operations

### Trigger Retraining

To manually trigger the learning loop (fetch data -> train -> promote):

```bash
docker compose exec auto-retrainer python auto_retrainer.py --train --promote --once
# After completion, restart inference to apply changes:
docker compose restart inference
```

### Batch Log Ingestion

To ingest historical Nginx access logs for initial training:

```bash
docker compose run --rm \
  -v /absolute/path/to/your/access.log:/data/access.log \
  ingestor \
  python batch_ingestor.py /data/access.log
```
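
The actual field handling lives in `batch_ingestor.py`; as a rough sketch, parsing one line of the standard Nginx "combined" format looks like this (the sample line in the test is made up):

```python
import re

# Nginx "combined" log format; adjust if your access_log uses a custom format.
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-)'
)

def parse_line(line):
    """Return the fields of one access-log line, or None if it doesn't match."""
    m = LOG_RE.match(line)
    return m.groupdict() if m else None
```

Lines that fail to parse should be counted and skipped rather than aborting the batch, since real log files usually contain a few malformed entries.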

## 📂 Project Structure

```
.
├── waf-gateway/      # Nginx config, Lua scripts (access.lua)
├── waf-core/         # Go app (server.go, stats.go, clickhouse_ingestor.go)
├── inference/        # Python app (app.py, model loading)
├── training/         # Auto-retrainer logic (train_lora.py, auto_retrainer.py)
├── models/           # Shared volume for model artifacts and registry.json
├── monitoring/       # Configs for Prometheus and Grafana
├── ingestor/         # Batch ingestion scripts
└── docker-compose.yml
```

## ⚙️ Configuration

Key environment variables in `.env`:

- `WAF_THRESHOLD`: Score threshold (0.0 - 1.0). Requests scoring above this are blocked. Default: `0.6`.
- `INFER_CONCURRENCY`: Max concurrent requests to the inference engine.
- `CLICKHOUSE_*`: Database credentials.
- `MAXMIND_KEY`: License key for GeoIP updates.
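
A sketch of how a service might consume these settings (the `INFER_CONCURRENCY` default of 8 is an assumed value; only the `WAF_THRESHOLD` default of 0.6 comes from the docs):

```python
import os

def load_config(env=None):
    """Read WAF settings from the environment, falling back to defaults."""
    env = os.environ if env is None else env
    threshold = float(env.get("WAF_THRESHOLD", "0.6"))
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("WAF_THRESHOLD must be within 0.0-1.0")
    return {
        "threshold": threshold,
        "infer_concurrency": int(env.get("INFER_CONCURRENCY", "8")),  # assumed default
    }

def decide(score, threshold):
    """Requests scoring above the threshold are blocked."""
    return "block" if score > threshold else "allow"
```

Validating the threshold at startup turns a typo in `.env` into an immediate, visible failure instead of a WAF that silently blocks everything or nothing.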

## 📜 License

MIT
