Skip to content

rmednitzer/core-graph

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

313 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Core-Graph

A converged graph-vector knowledge platform built on PostgreSQL with Apache AGE and pgvector. Designed for EU-sovereign deployment with security, compliance, and operational assurance as structural properties.

Ask DeepWiki CI License

What it does

core-graph is a canonical convergence point for heterogeneous security and infrastructure data. Satellite systems publish structured entities through NATS JetStream into a PostgreSQL hub where they are stored as a graph (Apache AGE), enriched with vector embeddings (pgvector), and exposed through multiple interfaces.

Data domains (eight ontology layers)

Layer Description Standards
Threat intelligence TTPs, indicators, campaigns, threat actors, malware, vulnerabilities STIX 2.1, MITRE ATT&CK
Security events Normalised alerts and detections from Wazuh, EDR, IDS/IPS OCSF 1.1
OSINT Feed aggregation, entity extraction, deduplication STIX 2.1
Audit and compliance Evidence chains, control mapping (NIS2, BSI Grundschutz) OSCAL
AI memory Agent conversation context, reasoning traces, semantic embeddings MCP-aligned
Forensic timelines Bitemporal facts, chain of custody, immutable evidence CASE/UCO, STIX 2.1
Infrastructure and assets CMDB, network inventory, monitoring alerts Netbox/Prometheus aligned
Identity and access management IAM vertices with TLP:AMBER floor, Keycloak sync Keycloak/Cerbos aligned

Interfaces

  • MCP server -- primary AI agent interface (tool-based graph queries, semantic search)
  • REST API -- FastAPI-based CRUD and query endpoints for human consumers
  • TAXII 2.1 -- federated threat intelligence sharing with partner organisations

Status

Alpha: local development stack operational, schema stable, ingest pipeline functional, Helm chart and ArgoCD manifests ready.

Prerequisites

  • Python 3.13+
  • Docker and Docker Compose (for the local dev stack)
  • PostgreSQL 18+ with Apache AGE and pgvector (provided by the dev stack)
  • NATS Server 2.10+ (provided by the dev stack)

Quick start

git clone https://github.com/rmednitzer/core-graph.git
cd core-graph

# Install Python dependencies
pip install -e ".[dev,test]"

# Start the full dev stack (includes API on :8000)
make up

# Run migrations and load reference data
make migrate
make seed

The dev stack (make up) starts all services including the REST API on :8000. To run services locally instead (e.g. for hot-reload development), stop the stack first and start only infrastructure, then run the API outside Docker:

make down
docker compose -f deploy/docker/docker-compose.yml up -d postgres nats valkey spicedb cerbos minio

make serve          # REST API on :8000 (uvicorn --reload)
make mcp            # MCP server
make graph-writer   # Ingest graph writer

Deployment

Docker Compose (development)

make up       # start
make down     # stop
make reset    # drop + recreate database, re-run migrations and seeds

Helm chart (Kubernetes)

The Helm chart in deploy/k8s/helm/ bundles the API, graph writer, PostgreSQL, NATS JetStream, and Valkey. Each dependency can be disabled to point at external services.

# Lab (bundled dependencies, 2 API replicas)
helm install cg deploy/k8s/helm/

# Production (HA replicas, autoscaling, resource limits)
helm install cg deploy/k8s/helm/ -f deploy/k8s/helm/values-prod.yaml

# External PostgreSQL
helm install cg deploy/k8s/helm/ \
  --set postgres.enabled=false \
  --set postgres.external.host=my-pg.example.com \
  --set postgres.external.password=secret

See deploy/k8s/helm/values.yaml for the full configuration reference.

ArgoCD

Pre-built Application manifests are provided for both environments:

# Lab -- auto-sync, self-heal, CreateNamespace=true
kubectl apply -f deploy/k8s/helm/argocd/application-lab.yaml

# Production -- manual sync, change-control compliant
kubectl apply -f deploy/k8s/helm/argocd/application-prod.yaml

Air-gapped install (Zarf)

Zarf packages the Helm chart and all container images into a single signed tarball for disconnected clusters.

# Build (internet-connected machine)
zarf package create --confirm

# Deploy (air-gapped cluster)
zarf package deploy zarf-package-core-graph-amd64-0.1.0.tar.zst --confirm

# Deploy with production profile
zarf package deploy zarf-package-core-graph-amd64-0.1.0.tar.zst \
  --components="core-graph,prod-profile" --confirm

Architecture

  Satellites             NATS JetStream          Ingest Pipeline
  ----------             --------------          ---------------

  Wazuh (SIEM)    ──┐
  OpenCTI (TIP)   ──┤                        ┌─────────────────┐
  MISP (IOC DB)   ──┼──►  NATS JetStream  ──►│  NER + Entity   │
  OSINT Feeds     ──┤     (at-least-once)     │  Resolution +   │
  Netbox (CMDB)   ──┤                         │  Graph Writer   │
  Prometheus      ──┤                         └────────┬────────┘
  Keycloak (IdP)  ──┘                                  │
                                                       ▼
                                          ┌────────────────────────┐
                                          │    PostgreSQL 18+      │
                                          │  ┌────────┐ ┌────────┐ │
                                          │  │  AGE   │ │pgvector│ │
                                          │  │(graph) │ │(embed.)│ │
                                          │  └────────┘ └────────┘ │
                                          │  RLS · pgAudit · cron  │
                                          │  Bitemporal model      │
                                          └────────────┬───────────┘
                                                       │
                    ┌──────────────────────────────────┼──────────┐
                    │             API Layer            │          │
                    │   Cerbos (ABAC) + SpiceDB (ReBAC)          │
                    ├──────────┬───────────┬─────────────────────┤
                    │ MCP Server│ REST API │ TAXII 2.1           │
                    │(AI agents)│ (humans) │ (sharing)           │
                    └──────────┴───────────┴─────────────────────┘

  Evidence chain: audit_log ──► hash chain ──► MinIO WORM ──► cosign ──► Rekor

Key design decisions

  • PostgreSQL is the core. No Neo4j, no ArangoDB. Apache AGE for graph (openCypher), pgvector for embeddings (HNSW).
  • NATS JetStream as the message bus. At-least-once delivery, dead-letter queue with retry and archive.
  • Three-layer authorization: Cerbos (ABAC) evaluates TLP clearance and role policies, SpiceDB (ReBAC) evaluates compartment membership, PostgreSQL RLS enforces at the engine level. Even buggy application code cannot leak data.
  • Bitemporal model: four timestamps per fact (t_valid, t_invalid, t_recorded, t_superseded). Facts are invalidated, never deleted.
  • Evidence integrity: append-only audit log, SHA-256 hash chains, Merkle roots with RFC 3161 timestamps, MinIO WORM storage, cosign signing, Rekor transparency log.
  • EU-sovereign: all infrastructure runs on EU providers (Hetzner, self-hosted registries). No US cloud dependencies in production.

Development

Make targets

Target Description
make up Start Docker Compose dev stack
make down Stop dev stack
make reset Drop, recreate, migrate, and seed database
make migrate Run numbered SQL migrations
make seed Load reference data (MITRE ATT&CK, STIX, roles)
make serve REST API on :8000 (uvicorn --reload)
make mcp Run MCP server
make graph-writer Run graph writer worker
make psql Connect to dev database interactively
make test Run all tests (pytest + RLS enforcement)
make integration-test Run integration tests only
make lint Lint Python (ruff) and YAML policies
make bench Run performance benchmarks (NER, traversal, throughput)
make verify-chain Verify audit log hash chain
make verify-merkle Verify Merkle root chain
make stamp-merkle Request RFC 3161 timestamps for Merkle roots
make helm-validate Lint and template Helm charts
make deploy-lint Validate all deployment artifacts

Running tests

# All tests (unit + RLS enforcement)
make test

# Integration tests (requires running Docker stack)
make integration-test

# Specific test file
pytest tests/skills/test_asset_skills.py -v

# Linting
make lint

Repository layout

core-graph/
├── api/                 API layer
│   ├── rest/            FastAPI REST endpoints + middleware (OIDC, metrics, logging)
│   ├── mcp/             MCP server, skill registry, query templates
│   │   ├── skills/      Skill implementations (asset, compliance, identity, threat)
│   │   └── tools/       MCP tools (cypher query, entity resolve, vector search, ...)
│   ├── taxii/           TAXII 2.1 server for threat intel sharing
│   ├── authz/           SpiceDB (ReBAC) and Cerbos (ABAC) client modules
│   ├── utils/           AGE query guard, Cypher safety validation
│   └── db.py            Shared connection pool (psycopg-pool)
├── ingest/              Ingest pipeline
│   ├── connectors/      Satellite adapters (Wazuh, OpenCTI, MISP, OSINT, Netbox,
│   │                    Prometheus, Keycloak) -- all extend AdapterBase
│   ├── ner/             Named entity recognition (tier 1: regex + STIX patterns)
│   ├── resolver/        Entity resolution and deduplication
│   ├── dlq/             Dead-letter queue processor
│   ├── enrichment.py    ingest.* → enriched.* normalisation (+ enrichment_worker.py)
│   └── graph_writer.py  Batch graph writer with bitemporal versioning
├── schema/
│   ├── migrations/      Numbered SQL files (001_ through 027_), idempotent
│   └── seed/            Reference data (MITRE ATT&CK, STIX vocabularies, roles)
├── policies/            Cerbos YAML policies (threat entities, evidence, incidents, IAM)
├── evidence/            Evidence integrity
│   ├── chain/           Merkle root computation and hash chain verification
│   └── signing/         cosign signing, MinIO WORM storage, RFC 3161 timestamps
├── deploy/
│   ├── docker/          Docker Compose dev stack + hardened PostgreSQL config
│   ├── k8s/             Helm chart, ArgoCD manifests
│   ├── nats/            NATS server config (dev + prod)
│   └── grafana/         Dashboards and provisioning
├── tests/               Schema, RLS, ingest, integration, skills, TAXII tests
├── scripts/             Bootstrap, validation, benchmarks, MinIO init
├── docs/                Architecture, compliance, ontology, operations, runbooks
└── zarf.yaml            Air-gapped deployment package definition

Documentation

Detailed documentation lives in docs/:

Area Documents
Architecture Overview, Authorization model, RLS + AGE integration, IAM layer, Data residency
ADRs 0001 Establish ADR practice, 0002 Hybrid retrieval, 0003 Edge TLP denormalisation, 0004 Salience formula, 0005 Memory supersession, 0006 Code-base validation 2026-05, 0007 Modernization audit 2026-05, 0008 Authorization layering
Ontology Schema design, STIX mapping, OCSF normalization
Compliance NIS2 controls, BSI IT-Grundschutz
Operations Backup and restore, PostgreSQL hardening, Break-glass procedure, PG major upgrade, Database migration runbook
Runbooks Audit chain broken, Ingest pipeline stalled, DLQ overflow, RLS misconfiguration
Skills MCP skill registry

Conventions

  • Commits: Conventional Commits with scopes: feat:, fix:, docs:, schema:, policy:, deploy:, test:, skill:
  • Migrations: Numbered SQL files (001_, 002_, ...). No ORM.
  • Security: Parameterised SQL, AGE query templates (no string concatenation), RLS enforcement, Cerbos/SpiceDB authorization on every request.
  • Format: SI units, ISO 8601 dates (YYYY-MM-DD), 24h time, UTC unless explicitly local.

Contributing

See CONTRIBUTING.md for development workflow, code style, and PR guidelines.

Security

See SECURITY.md for vulnerability reporting and security design overview.

Licence

Apache-2.0. See LICENSE.

The core path (PostgreSQL + AGE + pgvector + NATS + Cerbos + cosign) is entirely Apache 2.0 / MIT / BSD / PostgreSQL Licence. Satellite components carry their own licences (GPL, AGPL) and operate as external services, not embedded in redistributable code.

About

Converged graph + vector knowledge platform on PostgreSQL (Apache AGE + pgvector) for security, threat intel, and AI memory

Topics

Resources

License

Code of conduct

Contributing

Security policy

Stars

Watchers

Forks

Contributors

Languages