A two-stage ML classifier for network attack detection, trained on the NSL-KDD dataset (or a synthetic fallback if you don't have the dataset locally).
| Stage | Task | Model |
|---|---|---|
| 1 | Binary: attack vs. benign | Random Forest |
| 2 | Multi-class: attack category (DoS / Probing / R2L / U2R) | Gradient Boosting |
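The two-stage design can be sketched as follows. This is a minimal illustration, not the project's actual `train.py`: the feature matrix and labels here are randomly generated stand-ins for NSL-KDD records, and the `classify` helper is hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier

# Stand-in data: rows are flow records, columns are NSL-KDD-style numeric features.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 10))
is_attack = (X[:, 0] + X[:, 1] > 0).astype(int)   # Stage-1 labels: 0 = benign, 1 = attack
category = rng.integers(0, 4, size=400)           # Stage-2 labels: 0..3 = DoS/Probe/R2L/U2R

# Stage 1: binary attack-vs-benign detector (Random Forest).
stage1 = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, is_attack)

# Stage 2: attack-category classifier (Gradient Boosting), trained only on attack rows.
mask = is_attack == 1
stage2 = GradientBoostingClassifier(random_state=0).fit(X[mask], category[mask])

def classify(x):
    """Return 'benign' or the predicted attack category for one record."""
    if stage1.predict(x.reshape(1, -1))[0] == 0:
        return "benign"
    return ["DoS", "Probe", "R2L", "U2R"][stage2.predict(x.reshape(1, -1))[0]]
```

The key design point: Stage 2 is only fit on (and only consulted for) records Stage 1 flags as attacks, so it never has to learn the benign class.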
```
06_network_ids/
├── data/
│   └── README.md            # how to grab NSL-KDD
├── models/
│   ├── train.py
│   └── *.pkl                # generated
├── evaluation/
│   ├── evaluate.py
│   └── adversarial_eval.py
├── results/
├── requirements.txt
└── README.md
```
```
cd 06_network_ids
python -m venv .venv
source .venv/bin/activate         # Linux / macOS
# .venv\Scripts\Activate.ps1      # Windows PowerShell
pip install -r requirements.txt
```

Download the dataset:
```
# Linux / macOS
wget https://iscxdownloads.cs.unb.ca/iscxdownloads/NSL-KDD/NSL-KDD.zip
unzip NSL-KDD.zip -d data/

# Windows (PowerShell)
Invoke-WebRequest https://iscxdownloads.cs.unb.ca/iscxdownloads/NSL-KDD/NSL-KDD.zip -OutFile NSL-KDD.zip
Expand-Archive NSL-KDD.zip data\
```

You should now have `data/KDDTrain+.txt`.
If data/KDDTrain+.txt is not present, the training script automatically
falls back to a small synthetic NSL-KDD-shaped dataset so the project still
runs end-to-end:
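The auto-detection logic can be sketched like this. It is a simplified illustration, not the script's actual code: `load_nsl_kdd` is a hypothetical loader for the real file, and the synthetic shapes and label rule are invented for the example.

```python
import os
import numpy as np

def load_training_data(path="data/KDDTrain+.txt", seed=0):
    """Load NSL-KDD if present; otherwise return a synthetic NSL-KDD-shaped set."""
    if os.path.exists(path):
        return load_nsl_kdd(path)  # hypothetical real-data loader (not shown)
    print("warning: NSL-KDD not found, falling back to synthetic data")
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(1000, 10))                    # fake numeric features
    y = (X[:, 0] + rng.normal(scale=0.5, size=1000) > 0).astype(int)  # noisy labels
    return X, y
```

Because the fallback is deterministic given the seed, repeated smoke-test runs produce comparable numbers.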
```
python models/train.py                  # auto-detects whether NSL-KDD is present
python evaluation/evaluate.py
python evaluation/adversarial_eval.py
```

You can also force the synthetic mode explicitly:

```
python models/train.py --synthetic
```

Sample output from `adversarial_eval.py` (synthetic data):
```
Epsilon    Detection rate    Bypass rate
--------------------------------------------
0.00       89.92%            10.08%
0.50       83.13%            16.87%
1.00       76.95%            23.05%
2.00       68.52%            31.48%
```
Detection rate falls roughly linearly as feature noise grows — useful as a baseline for evasion-robustness work.
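The evaluation idea — perturb attack samples with epsilon-scaled feature noise and measure how many the detector still flags — can be sketched as below. This is a self-contained toy, assuming a simple noise-based evasion model rather than the repository's actual `adversarial_eval.py`; data and labels are synthetic.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Toy detector trained on synthetic data (1 = attack).
rng = np.random.default_rng(0)
X = rng.normal(size=(600, 8))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

attacks = X[y == 1]
rates = []
for eps in [0.0, 0.5, 1.0, 2.0]:
    # Epsilon scales Gaussian noise added to every feature of each attack sample.
    perturbed = attacks + eps * rng.normal(size=attacks.shape)
    detected = clf.predict(perturbed).mean()   # fraction still classified as attack
    rates.append(detected)
    print(f"eps={eps:4.2f}  detection={detected:6.2%}  bypass={1 - detected:6.2%}")
```

At `eps=0` the detector sees its own training points, so detection is near-perfect; as epsilon grows, perturbed samples drift across the decision boundary and the bypass rate climbs.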
The training script uses SMOTE for class balancing when `imblearn` (the import name of the `imbalanced-learn` package) is installed. If it's not available, the script skips SMOTE and prints a warning. That is fine for a smoke test; install `imbalanced-learn` for the full pipeline.