An AI-powered cloud threat detection system that fuses multi-source security logs, detects anomalies using machine learning, maps threats to MITRE ATT&CK TTPs, enriches alerts with NVD CVEs, and uses Claude as an automated SOC analyst for plain-English alert triage.
```
AWS CloudTrail ──┐
BETH (K8s)     ──┼──► Feature Engineering ──► Isolation Forest ──► MITRE ATT&CK Mapping ──► Claude SOC Analyst
Linux Auth     ──┘
```
- Multi-source log fusion — AWS CloudTrail, BETH K8s syscalls, Linux SSH auth logs
- Unsupervised anomaly detection using Isolation Forest (AUROC 0.8935)
- MITRE ATT&CK TTP mapping with 4 rule-based detectors
- NVD CVE enrichment via NIST REST API for flagged events
- Claude-powered SOC analyst returning structured JSON: explanation, attack technique, recommended action
- Threshold tuning grid search to optimize Precision / Recall / F1
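The detection core above can be sketched end to end. The feature matrix below is synthetic stand-in data; the Isolation Forest hyperparameters match those reported in the results table:

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(42)
# Synthetic stand-in for the engineered features: mostly benign events plus a few outliers.
X = np.vstack([rng.normal(0, 1, size=(2000, 6)),   # benign
               rng.normal(6, 1, size=(10, 6))])    # anomalous
y = np.r_[np.zeros(2000), np.ones(10)]             # 1 = malicious

# Hyperparameters from the results table: n_estimators=200, contamination=0.005.
clf = IsolationForest(n_estimators=200, contamination=0.005, random_state=42)
clf.fit(X)

# score_samples is higher for inliers, so negate it to get an anomaly score.
scores = -clf.score_samples(X)
print(f"AUROC: {roc_auc_score(y, scores):.4f}")
```

On the real fused logs the notebook reports AUROC 0.8935; the synthetic data here is trivially separable, so treat the sketch as a shape of the pipeline, not a benchmark.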

| Source | Dataset | Description |
|---|---|---|
| AWS CloudTrail | flaws.cloud logs | Real-world misconfigured AWS environment logs |
| K8s / Syscalls | BETH Dataset | Labelled Linux kernel syscall logs with an `evil` label column |
| Linux Auth | LogHub Linux_2k.log | Real SSH authentication logs with brute-force patterns |

| Model | AUROC | Notes |
|---|---|---|
| Isolation Forest | 0.8935 | Primary detector, n_estimators=200, contamination=0.005 |
| LSTM Autoencoder | ~0.49 | Experimental only, not used in final pipeline |
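The threshold tuning mentioned in the feature list can be sketched as a sweep over anomaly-score percentiles. The scoring data here is synthetic and the percentile grid is an illustrative choice, not the notebook's exact search space:

```python
import numpy as np
from sklearn.metrics import precision_recall_fscore_support

def tune_threshold(scores, y_true, percentiles=np.arange(90.0, 100.0, 0.5)):
    """Pick the anomaly-score threshold that maximizes F1 on labelled data."""
    best = {"f1": -1.0}
    for p in percentiles:
        thr = np.percentile(scores, p)
        y_pred = (scores >= thr).astype(int)
        prec, rec, f1, _ = precision_recall_fscore_support(
            y_true, y_pred, average="binary", zero_division=0
        )
        if f1 > best["f1"]:
            best = {"percentile": p, "threshold": thr,
                    "precision": prec, "recall": rec, "f1": f1}
    return best

# Toy scores: attacks (label 1) tend to score higher than benign events.
rng = np.random.default_rng(0)
scores = np.r_[rng.normal(0, 1, 990), rng.normal(4, 1, 10)]
y_true = np.r_[np.zeros(990), np.ones(10)].astype(int)
print(tune_threshold(scores, y_true))
```

Sweeping percentiles rather than raw score values keeps the grid meaningful regardless of the score distribution's scale.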
```
CloudGuard_v1.ipynb   # Main notebook — run top to bottom in Colab
requirements.txt      # Python dependencies
LICENSE               # MIT License
README.md
```
| Cells | What they do |
|---|---|
| 0 | Install dependencies |
| 1-5 | Download datasets (CloudTrail, BETH, Linux auth) |
| 6-8 | Parse each source into unified schema |
| 9-11 | Fuse sources, engineer features, train/test split |
| 12-20 | Label fixing and data validation |
| 21-23 | Train Isolation Forest, evaluate, threshold tuning |
| 24-28 | LSTM Autoencoder (experimental) |
| 29 | Finalize models |
| 30 | NVD CVE enrichment function |
| 31 | MITRE ATT&CK TTP mapping rules |
| 32-34 | Claude SOC analyst setup and end-to-end test |
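Cell 31's rule-based TTP mapping can be sketched as predicates over the unified event schema. The field names and the four rules below are illustrative, not the notebook's exact detectors:

```python
# Illustrative rules mapping parsed event fields to MITRE ATT&CK technique IDs.
RULES = [
    (lambda e: e.get("event_name") in {"DeleteBucket", "DeleteDBInstance"},
     "T1485", "Data Destruction"),
    (lambda e: e.get("event_name") == "ConsoleLogin" and e.get("error_code") == "Failed authentication",
     "T1110", "Brute Force"),
    (lambda e: e.get("event_name") in {"CreateAccessKey", "CreateUser"},
     "T1136", "Create Account"),
    (lambda e: e.get("event_name", "").startswith("Describe") and e.get("anomaly_score", 0) > 0.9,
     "T1580", "Cloud Infrastructure Discovery"),
]

def map_ttps(event: dict) -> list[tuple[str, str]]:
    """Return every (technique_id, technique_name) pair whose rule matches."""
    return [(tid, name) for pred, tid, name in RULES if pred(event)]

print(map_ttps({"event_name": "DeleteBucket"}))  # [('T1485', 'Data Destruction')]
```

Keeping the rules as data (predicate, ID, name) makes it easy to add a fifth detector without touching the matching logic.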
This project runs on Google Colab with a T4 GPU (Runtime > Change runtime type > T4 GPU).
Click the key icon in the Colab left sidebar and add:
| Secret Name | Where to get it |
|---|---|
| KAGGLE_USERNAME | Your Kaggle username from kaggle.com/settings |
| KAGGLE_KEY | kaggle.com/settings > API > Create New Token |
| ANTHROPIC_API_KEY | console.anthropic.com > API Keys |
Upload CloudGuard_v1.ipynb to colab.research.google.com and run cells top to bottom.
```bash
pip install anthropic scikit-learn tensorflow keras pandas numpy
pip install fastapi uvicorn requests python-dotenv kaggle mitreattack-python
```

```
TTP detected: T1485 - Data Destruction
CVEs found: 3

=== Claude SOC Analysis ===
Explanation : DeleteBucket was called at 3:14 AM from an IP not previously
              seen in this account, targeting the production backup bucket.
              This is a strong indicator of compromised credentials.
Attack Tech : T1485 - Data Destruction (MITRE ATT&CK)
Action      : Immediately revoke the admin IAM credentials, enable S3
              versioning and MFA delete on all buckets, and review
              CloudTrail for prior reconnaissance from the flagged IP.
```
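The analysis above comes from a single Claude messages call that asks for JSON only. A sketch, in which the model id, prompt wording, and fenced-JSON handling are assumptions:

```python
import json

SYSTEM = ("You are a SOC analyst. Given one flagged cloud event, reply with JSON only: "
          '{"explanation": str, "attack_technique": str, "recommended_action": str}')

def parse_verdict(text: str) -> dict:
    """Accept either bare JSON or a fenced ```json ... ``` reply."""
    text = text.strip()
    if text.startswith("```"):
        text = text.strip("`").removeprefix("json").strip()
    return json.loads(text)

def triage(event: dict, client) -> dict:
    """client is an anthropic.Anthropic instance; returns the structured verdict."""
    msg = client.messages.create(
        model="claude-sonnet-4-5",   # placeholder model id
        max_tokens=512,
        system=SYSTEM,
        messages=[{"role": "user", "content": json.dumps(event)}],
    )
    return parse_verdict(msg.content[0].text)
```

Tolerating a fenced reply in `parse_verdict` is defensive: models sometimes wrap JSON in a code block even when told not to.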
- Python 3.12
- scikit-learn (Isolation Forest)
- TensorFlow / Keras (LSTM Autoencoder)
- Anthropic Claude API
- mitreattack-python (MITRE ATT&CK)
- NIST NVD REST API (CVE enrichment)
- pandas, numpy
- Google Colab T4 GPU
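The NVD enrichment listed above can be sketched against the public CVE API 2.0. Mapping a flagged event to a `keywordSearch` term is an assumption about the notebook's approach:

```python
import requests

NVD_URL = "https://services.nvd.nist.gov/rest/json/cves/2.0"

def summarize(payload: dict, limit: int = 3) -> list[dict]:
    """Reduce an NVD API 2.0 response to short id/description records."""
    out = []
    for item in payload.get("vulnerabilities", [])[:limit]:
        cve = item["cve"]
        out.append({"id": cve["id"],
                    "description": cve["descriptions"][0]["value"][:120]})
    return out

def fetch_cves(keyword: str, limit: int = 3) -> list[dict]:
    """Query the public NVD CVE API by keyword (rate-limited without an API key)."""
    resp = requests.get(NVD_URL,
                        params={"keywordSearch": keyword, "resultsPerPage": limit},
                        timeout=30)
    resp.raise_for_status()
    return summarize(resp.json(), limit)

# Example: fetch_cves("Amazon S3") would return up to 3 bucket-related CVEs.
```

Splitting the HTTP call from `summarize` keeps the parsing testable offline and makes it easy to add caching around the network request.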
No credentials are hardcoded in this notebook. All secrets are loaded at runtime via `google.colab.userdata`. Never commit API keys to source control.
This project is licensed under the MIT License - see the LICENSE file for details.
Copyright (c) 2026 Mathanprasath K
You are free to use, modify, and distribute this project, but you must include the original copyright notice and give credit to the author.