Aigis-Zero is an open-source EDR system built entirely from scratch in Rust. It monitors Linux endpoints for suspicious activity in real time, streams telemetry through a central fleet server, normalises and routes events via Apache Kafka, and runs detection through a YARA rule engine, MITRE ATT&CK mapping, and ML-based detection. It surfaces alerts to the SOC team through a live React dashboard.
Most production EDR agents are written in C, C++, or Go. This project makes a deliberate choice of Rust for every backend component and the agent itself.
C/C++ EDR agents are prone to memory corruption, a serious liability for a root-level service, and Go-based agents can suffer from garbage collection pauses and higher memory use under burst load. Rust avoids both problems.
We get low-overhead performance comparable to C/C++ without the buffer overflows or use-after-free bugs that safe Rust rules out. We also get Go-style concurrency without a heavy runtime or unpredictable GC pauses that could drop event streams. Built on Tokio's async runtime, the agent is designed to handle thousands of concurrent event streams with a small memory footprint.
graph TD
%% Endpoint Section
subgraph EP ["Endpoint (Linux)"]
OQ["osqueryd (eBPF Mode)"]
subgraph AZ ["aigis-zero Agent"]
EB[("Event Buffer (SQLite WAL)")]
CH["Command Handler"]
ISO["Isolation Module (nftables)"]
HL["Heartbeat Loop"]
end
end
%% Fleet Server Section
subgraph FS ["Fleet Server (Rust/Tonic)"]
NE["Node Enrollment"]
KH["Kafka Handler"]
HT["Health Tracker"]
end
%% Databases
subgraph DBs ["PostgreSQL Databases"]
DB_Nodes[("edr_nodes (Registry)")]
DB_Health[("node_health (Heartbeats)")]
DB_Logs[("edr_logs (Event Logs)")]
DB_Alerts[("edr_alerts (Alerts)")]
end
%% Kafka
subgraph KF ["Apache Kafka"]
K_Raw["aigis.events.raw"]
K_Typed["aigis.events.typed"]
K_Alerts["aigis.alerts"]
K_Health["aigis.health"]
end
%% Kafka Pipeline
subgraph KP ["Kafka Pipeline"]
Router["Event Router & Normaliser"]
end
%% Rule Engine
subgraph RE ["Rule Engine (YARA-X)"]
Scanner["Rule Scanner & MITRE Mapper"]
end
%% API Backend & Frontend
subgraph Operator ["Operator Console"]
API["API Backend (Axum / WebSockets)"]
UI["Frontend (React / Vite)"]
end
%% Communication Flows
OQ -->|"Thrift IPC"| AZ
EB -->|"gRPC Uplink"| KH
HL -->|"gRPC Heartbeat"| HT
CH <-->|"gRPC Bidirectional Stream"| FS
CH -->|"nftables rules"| ISO
%% Fleet Server DB writes
NE --> DB_Nodes
HT --> DB_Health
%% Kafka handling
KH --> K_Raw
K_Raw --> Router
Router --> K_Typed
Router --> DB_Logs
%% Rule scanning
K_Typed --> Scanner
Scanner --> K_Alerts
Scanner --> DB_Alerts
%% API and Dashboard
K_Alerts --> API
K_Health --> API
API <-->|"WebSockets (Live events)"| UI
UI -->|"IsolateCommand"| API
API -->|"Forward Commands"| FS
The codebase is organized as a single Cargo workspace containing 18 crates that separate the core services, the shared SDK, and the React frontend.
- sdk: Shared Protobuf definitions (agent.proto, events.proto, fleet.proto) and common domain models. All other crates import from here. Strictly no business logic allowed.
- agent: A single compiled binary (aigis-zero) composed of 7 sub-crates:
  - agent-bin: Bootstrap entry point, CLI config loader, and service lifecycle manager.
  - agent-core: Tokio-backed orchestrator, backpressure-aware event loop, and exponential backoff retry loop (50ms base, ~12.8s max).
  - osquery-client: Thrift IPC client interfacing with osqueryd via Unix sockets.
  - event-buffer: SQLite-backed write-ahead log ensuring at-least-once telemetry delivery during network partitions.
  - fleet-client: gRPC client managing Tonic bidirectional streams and heartbeats.
  - isolation: Host quarantine control managing nftables drop rules with a fleet IP exemption.
  - agent-tracing: Structured JSON telemetry logging using the tracing ecosystem.
- fleet-server: Central fleet controller split into 8 crates:
  - fleet-server-bin: Entry point, env loading (dotenvy), DB schema migrations, and Tonic server initialization.
  - grpc-listener: Implementation of the gRPC FleetService interface.
  - node-enrollment: Enrollment handler verifying nodes, issuing 24h JWTs, and recording registration state.
  - health-tracker: Records node heartbeat timelines; isolates heartbeat state so agents can't overwrite quarantine flags.
  - fleet-manager: Pure domain logic governing agent state machines and transitions.
  - kafka-handler: Stream producer piping raw telemetry from agents directly into Kafka.
  - postgres-interface: Data-access layer using sqlx with compile-time checked SQL and pessimistic locks (SELECT FOR UPDATE) for safe upserts.
  - fleet-tracing: Shared logging initialization for the fleet server.
- kafka-pipeline: Dedicated pipeline consumer pulling from aigis.events.raw, mapping events to typed topics by class, and saving normalized data into edr_logs (using LZ4 compression and 5ms batching).
- rule-engine: Event scanner checking normalized streams against YARA-X rules, indexing detections with MITRE ATT&CK codes, and publishing alerts to aigis.alerts. (Currently in active development.)
- api-backend: Axum-based web gateway serving REST endpoints and managing live WebSocket connections for dashboard operators.
- frontend: React operator console built with TypeScript and Vite. Currently supports authentication, node tracking, and live alert feeds via mock data.
- infra: Docker Compose manifests for KRaft/Zookeeper Kafka stacks and Postgres setups, along with Kubernetes deployment specs.
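The retry schedule mentioned for agent-core (50ms base, roughly 12.8s maximum) can be illustrated with a small sketch. It assumes the delay doubles on every attempt, which reaches 12.8s after eight doublings. The function name `backoff_delay` is hypothetical, and the real loop may add jitter or use a different growth factor.

```rust
use std::time::Duration;

/// Hypothetical helper: delay before retry number `attempt` (0-based).
/// Assumes plain doubling from a 50 ms base, capped at 12.8 s.
fn backoff_delay(attempt: u32) -> Duration {
    const BASE_MS: u64 = 50;
    const MAX_MS: u64 = 12_800;
    // Clamp the exponent so the shift and multiplication cannot overflow.
    let factor = 1u64 << attempt.min(8);
    Duration::from_millis((BASE_MS * factor).min(MAX_MS))
}

fn main() {
    for attempt in 0..10 {
        println!("attempt {attempt}: {:?}", backoff_delay(attempt));
    }
}
```

Under these assumptions the delay stops growing at the ninth attempt, so a long outage costs at most one reconnect try every 12.8 seconds.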
Agent
- Scheduled osquery polling with query intervals loaded from the fleet server via ConfigUpdateCommand
- SQLite write-ahead event buffer that survives network outages and agent restarts; configurable max-size with oldest-first eviction under pressure
- Bidirectional gRPC stream to fleet-server with exponential backoff reconnection
- Heartbeat loop reporting node status and buffered event count
- Fleet-commanded network isolation via nftables: drop-all policy with a single outbound carve-out for the fleet-server IP
- Structured JSON logging with per-component log level control
- Musl static binary for production deployment — zero glibc dependency, runs on any Linux kernel >= 4.18
- Cross-compiled release artifacts for x86_64 and aarch64 via GitHub Actions
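The oldest-first eviction behaviour of the event buffer can be sketched with an in-memory queue. This is only a stand-in for the SQLite-backed buffer: the type `EventBuffer` and its methods are hypothetical, and a real at-least-once buffer would presumably delete rows only after the fleet server acknowledges them rather than on drain.

```rust
use std::collections::VecDeque;

/// In-memory stand-in for the SQLite-backed buffer (illustration only).
struct EventBuffer {
    max_events: usize,
    events: VecDeque<String>,
    dropped: u64,
}

impl EventBuffer {
    fn new(max_events: usize) -> Self {
        Self { max_events, events: VecDeque::new(), dropped: 0 }
    }

    /// Append an event, evicting the oldest entries once the cap is exceeded.
    fn push(&mut self, event: String) {
        self.events.push_back(event);
        while self.events.len() > self.max_events {
            self.events.pop_front();
            self.dropped += 1;
        }
    }

    /// Remove up to `batch` of the oldest events for upload.
    fn drain_batch(&mut self, batch: usize) -> Vec<String> {
        let n = batch.min(self.events.len());
        self.events.drain(..n).collect()
    }
}

fn main() {
    let mut buf = EventBuffer::new(3);
    for i in 1..=5 {
        buf.push(format!("event-{i}"));
    }
    let batch = buf.drain_batch(2);
    println!("dropped={} batch={:?}", buf.dropped, batch);
}
```

With a cap of 3 and five pushes, the two oldest events are evicted and the next drain starts from the third event.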
Fleet Server
- gRPC enrollment with 24-hour JWT token issuance
- Compile-time SQL verification via sqlx: schema mismatches fail at build time, not at runtime
- Strict operator_status/agent_status separation: heartbeats cannot overwrite operator-assigned quarantine states
- Time-series heartbeat tracking per node
- Kafka event forwarding with LZ4 compression
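The operator_status/agent_status separation can be pictured as two fields with two disjoint update paths. The sketch below is a guess at the shape of that rule, not the fleet-manager code: the enum variants and method names are hypothetical.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum AgentStatus {
    Online,
    Offline,
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum OperatorStatus {
    Active,
    Quarantined,
}

#[derive(Debug, Clone, Copy, PartialEq)]
struct NodeState {
    agent_status: AgentStatus,
    operator_status: OperatorStatus,
}

impl NodeState {
    /// Heartbeats may only change the agent-reported field.
    fn apply_heartbeat(self, reported: AgentStatus) -> Self {
        Self { agent_status: reported, ..self }
    }

    /// Operator commands are the only path that changes quarantine state.
    fn apply_operator(self, decided: OperatorStatus) -> Self {
        Self { operator_status: decided, ..self }
    }
}

fn main() {
    let node = NodeState {
        agent_status: AgentStatus::Offline,
        operator_status: OperatorStatus::Active,
    };
    let node = node.apply_operator(OperatorStatus::Quarantined);
    // A later heartbeat updates liveness but leaves the quarantine in place.
    let node = node.apply_heartbeat(AgentStatus::Online);
    println!("{node:?}");
}
```

Keeping the two fields behind separate methods means a compromised or buggy agent has no code path that clears a quarantine.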
Kafka Pipeline
- Type-aware event routing to dedicated topics per event class (process, file, network, auth)
- Consumer group management with graceful shutdown via CancellationToken
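Type-aware routing boils down to mapping each raw event to one of the four classes and then to a destination topic. The sketch below assumes events are classified by osquery table name and that per-class topics follow an `aigis.events.<class>` pattern. Both are assumptions made for illustration, and the pipeline's real classification rule and topic names may differ.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum EventClass {
    Process,
    File,
    Network,
    Auth,
}

/// Hypothetical mapping from an osquery table name to an event class.
fn classify(table: &str) -> Option<EventClass> {
    match table {
        "process_events" => Some(EventClass::Process),
        "file_events" => Some(EventClass::File),
        "socket_events" => Some(EventClass::Network),
        "user_events" => Some(EventClass::Auth),
        _ => None,
    }
}

/// Assumed topic naming scheme; the real topic names may differ.
fn topic_for(class: EventClass) -> String {
    let suffix = match class {
        EventClass::Process => "process",
        EventClass::File => "file",
        EventClass::Network => "network",
        EventClass::Auth => "auth",
    };
    format!("aigis.events.{suffix}")
}

fn main() {
    for table in ["process_events", "socket_events", "kernel_info"] {
        match classify(table) {
            Some(class) => println!("{table} -> {}", topic_for(class)),
            None => println!("{table} -> unrouted"),
        }
    }
}
```

Returning `Option` forces the caller to decide what happens to events that match no class, instead of silently sending them to a default topic.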
Rule Engine
- YARA-X based scanning: pure Rust, no libyara C dependency
- MITRE ATT&CK technique and tactic mapping on alert records
- Structured Alert with threat score, severity, source, and triggering event reference
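A structured alert with the fields listed above might look roughly like the following. The real Alert is a Protobuf message in the sdk crate, so the Rust field names, the 0 to 100 score scale, and the severity thresholds here are all assumptions chosen for illustration.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Severity {
    Low,
    Medium,
    High,
    Critical,
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum AlertSource {
    YaraRule,
    MlModel,
}

/// Illustrative shape only; field names are assumptions, not the proto schema.
#[derive(Debug, Clone, PartialEq)]
struct Alert {
    threat_score: u8, // assumed 0..=100 scale
    severity: Severity,
    source: AlertSource,
    event_id: String, // reference to the triggering event
    mitre_technique: String,
    mitre_tactic: String,
}

/// Hypothetical thresholds for deriving severity from the score.
fn severity_for(score: u8) -> Severity {
    match score {
        0..=39 => Severity::Low,
        40..=69 => Severity::Medium,
        70..=89 => Severity::High,
        _ => Severity::Critical,
    }
}

fn main() {
    let score = 85;
    let alert = Alert {
        threat_score: score,
        severity: severity_for(score),
        source: AlertSource::YaraRule,
        event_id: "evt-0001".to_string(),
        mitre_technique: "T1059".to_string(), // Command and Scripting Interpreter
        mitre_tactic: "Execution".to_string(),
    };
    println!("{alert:?}");
    println!("ml alerts would use {:?}", AlertSource::MlModel);
}
```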
Infrastructure
- Three isolated PostgreSQL databases for node registry, event logs, and alerts
- Kafka with 12-partition event topics and 4-partition alert/health topics
- Kafka UI on port 8090 for local debugging
- Dev-mode KRaft Kafka (no Zookeeper) for faster local iteration
- Kubernetes manifests for fleet-server and supporting services
Active development is on the agent/bug-fixes-01 branch. This is the branch with the most commits and the most complete implementation across all components.
| Component | Status |
|---|---|
| SDK (protobuf definitions, shared models) | Complete |
| Agent binary (osquery polling, gRPC, buffer) | Complete |
| Agent network isolation (nftables) | Complete |
| Agent enrollment and JWT auth | Complete |
| Agent heartbeat | Complete |
| Agent config hot-reload | Scaffold — fleet command delivery works; client-side application in progress |
| Fleet server (enrollment, health tracking, Kafka forwarding) | Complete |
| Kafka pipeline (event router) | Complete |
| Kafka pipeline (normalisation and DB persistence) | In progress |
| Rule engine (YARA-X scanning, alert production) | Stubbed — binary compiles; rule loading and alerting in progress |
| API backend (REST and WebSocket) | Stubbed — binary compiles; route implementation in progress |
| Frontend (login, node list, live events tab) | Functional with mock data; WebSocket integration in progress |
| eBPF collector (aya) | Excluded from default workspace; under development on a separate branch |
| mTLS (agent to fleet) | Scaffold — cert paths in config; TLS handshake not yet wired |
| Kubernetes production deployment | Manifests present; not production-validated |
The zero-warning policy is enforced: cargo clippy --all-targets -- -D warnings and cargo fmt --check must pass before any merge.
| Tool | Minimum Version | Notes |
|---|---|---|
| Rust (stable) | 1.91 | Install via rustup |
| Docker and Docker Compose | Any recent | Required for the infra stack |
| Node.js | 18 | Required for frontend development |
| Linux kernel | 4.18 | Agent endpoint only; 5.10+ recommended |
| Architecture | x86_64 or aarch64 | Agent only |
| osquery | 5.23.0 | Agent endpoint only; installed by install.sh |
git clone -b agent/bug-fixes-01 https://github.com/swar09/project-edr.git
cd project-edr
cp .env.example .env
# Set POSTGRES_PASSWORD and any other required values in .env
cd infra
docker compose up -d
docker compose ps
kafka-init creates the required topics automatically on first start. Kafka UI is available at http://localhost:8090.
| Topic | Partitions | Purpose |
|---|---|---|
| aigis.events.raw | 12 | Raw agent telemetry |
| aigis.events.norm | 12 | Normalized events |
| aigis.alerts | 4 | Detection alerts |
| aigis.health | 4 | Node health |
| Database | Host Port | Purpose |
|---|---|---|
| edr_nodes | 5433 | Node registry, enrollment, health |
| edr_logs | 5432 | Normalized event log |
| edr_alerts | 5434 | Detection alerts |
For lightweight local development (KRaft Kafka, no Zookeeper):
docker compose -f infra/docker-compose.dev.yml up -d
sqlx performs compile-time query verification and requires DATABASE_URL to point to a live, migrated database.
export DATABASE_URL=postgres://edr:<password>@localhost:5433/edr_nodes
cargo build --workspace
cargo build --release --workspace
To build against cached sqlx metadata without a live database:
export SQLX_OFFLINE=true
cargo build --workspace
CI checks:
cargo fmt --all -- --check
cargo clippy --workspace --all-targets -- -D warnings
cargo test --workspace
The agent runs on Linux endpoints and requires root.
VERSION=agent-v0.1.0
ARCH=$(uname -m)
curl -fsSL \
"https://github.com/swar09/project-edr/releases/download/${VERSION}/aigis-zero-agent-linux-${ARCH}.tar.gz" \
-o aigis-zero-agent.tar.gz
tar -xzf aigis-zero-agent.tar.gz
cd aigis-zero-agent
sudo bash install.sh
The installer handles osquery installation, directory setup, systemd unit registration, kernel tunables, and ulimits in a single run. See agent/INSTALLATION_GUIDE.md for the full step-by-step breakdown.
Verify kernel prerequisites on the target endpoint:
uname -r # >= 4.18 required, >= 5.10 recommended
grep -E "CONFIG_BPF=y|CONFIG_BPF_SYSCALL=y" /boot/config-$(uname -r) 2>/dev/null || \
zcat /proc/config.gz 2>/dev/null | grep -E "CONFIG_BPF=y|CONFIG_BPF_SYSCALL=y"
ls /sys/kernel/btf/vmlinux && echo "BTF present"
Disable auditd (required: auditd and osquery compete for the audit netlink socket, which allows only one consumer):
sudo systemctl stop auditd 2>/dev/null || true
sudo systemctl mask auditd 2>/dev/null || true
sudo systemctl mask --now systemd-journald-audit.socket
Install build dependencies:
# Debian/Ubuntu
sudo apt-get update
sudo apt-get install -y \
build-essential pkg-config libssl-dev \
libsystemd-dev libaudit-dev libcap-dev \
util-linux musl-tools
# RHEL/Rocky/Fedora
sudo dnf install -y \
gcc pkg-config openssl-devel \
audit-libs-devel systemd-devel \
util-linux-devel libcap-devel
Install Rust:
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
source "$HOME/.cargo/env"
rustc --version # should be stable >= 1.91
Build the agent:
# Native build (linked against system glibc)
cargo build --release --bin edr-agent
# Musl static build (recommended for production)
rustup target add x86_64-unknown-linux-musl
cargo build --release --target x86_64-unknown-linux-musl --bin edr-agent
# aarch64 musl (requires cross)
cargo install cross --git https://github.com/cross-rs/cross
cross build --release --target aarch64-unknown-linux-musl --bin edr-agent
Install osquery 5.23.0:
curl -fsSL https://pkg.osquery.io/linux/osquery-5.23.0_1.linux_x86_64.tar.gz \
-o osquery-5.23.0_1.linux_x86_64.tar.gz
sudo tar -xzf osquery-5.23.0_1.linux_x86_64.tar.gz -C /
sudo tee /etc/systemd/system/osqueryd.service << 'EOF'
[Unit]
Description=The osquery Daemon
After=network.target syslog.target
[Service]
Type=simple
TimeoutStartSec=0
ExecStartPre=/bin/mkdir -p /run/osquery
ExecStart=/usr/bin/osqueryd \
--flagfile=/etc/osquery/osquery.flags \
--config_path=/etc/osquery/osquery.conf
Restart=on-failure
KillMode=control-group
[Install]
WantedBy=multi-user.target
EOF
Install the agent binary, directories, configs, and systemd units:
sudo install -o root -g root -m 0755 \
target/x86_64-unknown-linux-musl/release/edr-agent \
/usr/sbin/aigis-zero
sudo mkdir -p /etc/aigis-zero /var/lib/aigis-zero /var/log/aigis-zero
sudo chmod 700 /etc/aigis-zero /var/lib/aigis-zero
sudo chmod 755 /var/log/aigis-zero
sudo install -o root -g root -m 640 agent/agent.toml /etc/aigis-zero/config.toml
# Edit config to set the fleet server host and port
sudo nano /etc/aigis-zero/config.toml
sudo install -o root -g root -m 644 \
agent/sysctl/60-aigis-zero.conf /etc/sysctl.d/
sudo sysctl --system
sudo install -o root -g root -m 644 \
agent/limits/99-aigis-zero.conf /etc/security/limits.d/
sudo mkdir -p /etc/osquery /var/osquery /var/log/osquery /run/osquery
sudo chmod 755 /etc/osquery && sudo chmod 750 /var/osquery && sudo chmod 755 /var/log/osquery /run/osquery
sudo install -o root -g root -m 644 agent/osquery/osquery.conf /etc/osquery/osquery.conf
sudo install -o root -g root -m 644 agent/osquery/osquery.flags /etc/osquery/osquery.flags
sudo touch /etc/osquery/extensions.load && sudo chmod 644 /etc/osquery/extensions.load
sudo install -o root -g root -m 644 \
agent/systemd/aigis-zero.service /etc/systemd/system/
sudo mkdir -p /etc/systemd/system/osqueryd.service.d
sudo install -o root -g root -m 644 \
agent/systemd/osqueryd.service.d/aigis-zero.conf \
/etc/systemd/system/osqueryd.service.d/aigis-zero.conf
sudo systemctl daemon-reload
sudo systemctl enable osqueryd aigis-zero
sudo systemctl start osqueryd
sudo systemctl start aigis-zero
sudo systemctl status osqueryd
sudo systemctl status aigis-zero
Agent configuration reference (/etc/aigis-zero/config.toml):
[agent]
log_level = "info" # trace | debug | info | warn | error
log_format = "json" # json | human
log_dir = "/var/log/aigis-zero"
data_dir = "/var/lib/aigis-zero"
event_buffer_db = "/var/lib/aigis-zero/events.db"
event_buffer_max = 500000 # max buffered events before oldest-drop
event_drain_batch = 100
event_drain_interval_secs = 5
[osquery]
socket_path = "/var/osquery/osquery.em"
conf_path = "/etc/osquery/osquery.conf"
flags_path = "/etc/osquery/osquery.flags"
connect_timeout_secs = 30
query_timeout_secs = 60
[fleet]
host = "<fleet-server-ip>"
port = 50051
heartbeat_interval_secs = 60
reconnect_interval_secs = 10
max_reconnect_attempts = 0 # 0 = retry forever
[isolation]
enabled = false # toggled by fleet-server IsolateCommand
Service management:
# Both services are fully independent — stopping one does not affect the other
systemctl status osqueryd
systemctl status aigis-zero
journalctl -u osqueryd -f
journalctl -u aigis-zero -f
systemctl stop osqueryd # aigis-zero continues buffering normally
systemctl stop aigis-zero # osqueryd continues collecting normally
Uninstall:
# Method A: using the installer script
sudo bash uninstall.sh
sudo bash uninstall.sh --remove-osquery --purge-logs # full purge
# Method B: manual
sudo systemctl stop aigis-zero osqueryd
sudo systemctl disable aigis-zero osqueryd
sudo rm -f /usr/sbin/aigis-zero
sudo rm -rf /etc/aigis-zero /var/lib/aigis-zero
sudo rm -f /etc/systemd/system/aigis-zero.service
sudo rm -f /etc/systemd/system/osqueryd.service.d/aigis-zero.conf
sudo rm -f /etc/sysctl.d/60-aigis-zero.conf
sudo rm -f /etc/security/limits.d/99-aigis-zero.conf
sudo rm -f /etc/osquery/osquery.conf /etc/osquery/osquery.flags /etc/osquery/extensions.load
sudo rm -rf /var/osquery /run/osquery
sudo systemctl daemon-reload
Troubleshooting:
| Symptom | Likely cause | Resolution |
|---|---|---|
| osqueryd: perf_event_open failed | eBPF not enabled or kernel too old | Verify uname -r >= 4.18 and CONFIG_BPF_SYSCALL=y |
| file_events table returns empty | inotify watch limit too low | sudo sysctl -w fs.inotify.max_user_watches=524288 |
| aigis-zero: connection refused on osquery socket | osqueryd still starting | Wait for Extension manager started in journalctl -u osqueryd |
| Permission denied on /var/osquery | Directory ownership incorrect | sudo chown -R root:root /etc/osquery /var/osquery && sudo chmod 750 /var/osquery |
| cargo build fails, DATABASE_URL not set | sqlx compile-time check | Export DATABASE_URL pointing to the nodes DB or set SQLX_OFFLINE=true |
# Fleet server
export DATABASE_URL=postgres://edr:<password>@localhost:5433/edr_nodes
export KAFKA_BROKERS=localhost:29092
cargo run -p fleet-server-bin
# Kafka pipeline
export KAFKA_BROKERS=localhost:29092
cargo run -p kafka-pipeline
# Rule engine (stub)
cargo run -p rule-engine
# API backend (stub)
cargo run -p api-backend
cd frontend
npm install
npm run dev # development server at http://localhost:5173
npm run build # production build to frontend/dist/
mTLS (agent to fleet-server). The config scaffolding and cert paths exist in agent.toml and the fleet-server settings. The next step is wiring the TLS handshake in the Tonic channel builder on the agent side and configuring tonic server TLS on the fleet side. The design target is enrollment-issued certificates: each agent gets a short-lived cert signed by the fleet CA during RegisterAgent.
eBPF collector. The agent/crates/ebpf-collector crate is excluded from the default workspace build because it requires a kernel with BTF and the aya build toolchain. When this ships, the agent will collect process, network, and filesystem events directly from the kernel via eBPF programs, removing the dependency on osquery's audit-based collection and lifting the single-consumer constraint on the audit netlink socket.
Rule engine full implementation. YARA-X rule loading from the filesystem, consumer group wiring, alert production to aigis.alerts, and PostgreSQL persistence. The binary compiles; the business logic is the active workstream.
API backend routes. Full REST surface: node listing, node detail, alert query with filtering by severity and MITRE technique, and node isolation command forwarding to the fleet-server. WebSocket handler for live event streaming.
Frontend WebSocket integration. The dashboard shell is in place. Connecting it to the api-backend WebSocket for live node status and alert feed is the active frontend workstream.
Kafka normalisation and DB persistence. The event router is live. The next stage is the normalisation processor: consuming from typed topics, deserialising event payloads, and writing structured rows to edr_logs.
ML-based anomaly detection. The Alert proto already carries a source field for ml_model. A future workstream will add a statistical baseline model for process execution frequency and network behaviour, producing anomaly alerts alongside YARA rule hits.
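One simple form a statistical baseline could take is a z-score test: compare a new observation against the mean and standard deviation of recent history. The sketch below is an illustration of that idea only, since the project has not committed to a specific model. The function names and the threshold of 3 standard deviations are assumptions.

```rust
/// Mean and population standard deviation of a sample, or None if empty.
fn mean_std(samples: &[f64]) -> Option<(f64, f64)> {
    if samples.is_empty() {
        return None;
    }
    let n = samples.len() as f64;
    let mean = samples.iter().sum::<f64>() / n;
    let variance = samples.iter().map(|x| (x - mean).powi(2)).sum::<f64>() / n;
    Some((mean, variance.sqrt()))
}

/// Flags `observed` when it lies more than `threshold` standard deviations
/// from the baseline mean. Empty or constant baselines never flag.
fn is_anomalous(baseline: &[f64], observed: f64, threshold: f64) -> bool {
    match mean_std(baseline) {
        Some((mean, std)) if std > 0.0 => ((observed - mean) / std).abs() > threshold,
        _ => false,
    }
}

fn main() {
    // Hypothetical process executions per minute on one node.
    let baseline = [10.0, 12.0, 11.0, 9.0, 10.0, 12.0, 11.0, 9.0];
    for observed in [11.0, 40.0] {
        println!("{observed} anomalous: {}", is_anomalous(&baseline, observed, 3.0));
    }
}
```

For this baseline the mean is 10.5 and the standard deviation is about 1.12, so 11 executions per minute is unremarkable while 40 sits far outside the threshold.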
Enrollment secret validation. The enrollment_secret field exists in agent.toml and the RegisterRequest proto. Fleet-server-side validation is not yet implemented.
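When server-side validation is added, the comparison should avoid leaking how many leading bytes of the secret matched. A minimal sketch of a constant-time comparison follows. The function `secrets_match` is hypothetical, it still reveals a length mismatch, and a production implementation would more likely rely on a vetted crate such as subtle, because the compiler is free to optimise hand-written loops.

```rust
/// Compares two byte strings without returning early on the first mismatch.
/// Hypothetical helper; length differences are still observable.
fn secrets_match(expected: &[u8], provided: &[u8]) -> bool {
    if expected.len() != provided.len() {
        return false;
    }
    let mut diff = 0u8;
    for (a, b) in expected.iter().zip(provided) {
        // Accumulate differences so every byte is always inspected.
        diff |= a ^ b;
    }
    diff == 0
}

fn main() {
    let expected = b"example-enrollment-secret";
    println!("{}", secrets_match(expected, b"example-enrollment-secret"));
    println!("{}", secrets_match(expected, b"example-enrollment-secreT"));
}
```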
Multi-tenancy. The current data model is single-tenant. Organisation-scoped node isolation and role-based operator access are planned.
Windows agent. The current agent is Linux-only. Windows support via ETW (Event Tracing for Windows) is on the long-term roadmap with no scheduled timeline.
- osquery documentation
- aya — eBPF for Rust
- Tonic — gRPC for Rust
- YARA-X
- MITRE ATT&CK Framework
- Apache Kafka documentation
- sqlx — async Rust SQL
- Axum — async web framework
- rdkafka — Rust Kafka client
- nftables documentation
- Tokio async runtime
This project is licensed under the MIT License.
Contributions are welcome! Please refer to the CONTRIBUTING.md guide for details on our code quality standards, branching conventions, and development workflow.
Swar (@swar09), Author & Maintainer
Crafted in Rust. Full-stack ownership, zero compromise.
