Config-aware · Macro-resolved · Function-pointer-callable · SQLite-native
KGraph indexes what the compiler sees — not what the parser guesses.
Kernel code is not regular code. It lives behind #ifdef CONFIG_*, inside SYSCALL_DEFINE* macros,
behind file_operations function-pointer tables that tree-sitter can't follow.
Existing tools (codegraph, semcode) parse syntax — KGraph parses compilation truth.
| What others miss | What KGraph captures |
|---|---|
| All `#ifdef` branches (most are dead under your config) | Only the code your defconfig actually compiles |
| `EXPORT_SYMBOL` / `SYSCALL_DEFINE*` as opaque text | Macro-expanded symbols with real names and positions |
| `f_op->read_iter()` → "can't resolve" | `ops_bind` edge: `.read_iter = ext4_file_read_iter` → concrete function |
| Same-named static helpers across TUs → name collisions | Per-TU disambiguation via clang symbol resolution |
Result: LLM agents find root-cause paths in 3 tool calls (~1.5k tokens), where grep-based workflows burn 10k+ tokens and scatter across irrelevant branches.
Once your kernel has a `compile_commands.json`, the whole setup is three commands:

```bash
# 1. Install kgraph (bundles Python 3.10 + scip-clang, Linux x86-64 only)
npm install -g @ajksunkang-aios/kgraph

# 2. Wire kgraph's MCP server into your AI agents (auto-detects what's installed)
kgraph install

# 3. Build the code graph for this kernel (run inside the kernel source dir)
cd /path/to/linux
kgraph init .
```

That's it. Restart your agent and ask it structural questions about the kernel; it calls KGraph's MCP tools instead of grepping.
> What functions call ext4_file_read_iter?
> What implements ->read_iter across the kernel?
> Show me the body of generic_file_read_iter.
```
    npm install           kgraph install           kgraph init .
┌──────────────────┐   ┌──────────────────┐   ┌─────────────────────┐
│ @ajksunkang-aios │ → │ configure agents │ → │ scip-clang → SQLite │
│ /kgraph          │   │ (claude/cursor/  │   │ .kgraph/kgraph.db   │
│ (Python bundled) │   │ codex/opencode/  │   │ ready for queries   │
│                  │   │ hermes)          │   │                     │
└──────────────────┘   └──────────────────┘   └─────────────────────┘
```
Prerequisite: a kernel tree with a `compile_commands.json` built using clang (`make CC=clang LLVM=1`). See Detailed Setup below for how to produce it, Docker included.
KGraph is compiler-aware — it indexes what the compiler actually sees, so it needs a clang compilation database. Docker gives you a clean, reproducible build environment:
```bash
docker run --platform linux/amd64 -it --rm \
  -v "$(pwd):/workspace" -w /workspace ubuntu:latest

# Inside the container:
# lld is needed because LLVM=1 links with ld.lld; python3 runs gen_compile_commands.py
apt-get update && apt-get install -y clang llvm lld python3 make bc flex bison libelf-dev libssl-dev
make CC=clang LLVM=1 x86_64_defconfig
make CC=clang LLVM=1 -j$(nproc)
./scripts/clang-tools/gen_compile_commands.py
```

This produces `compile_commands.json` (~5–50 MB), listing exactly the `.c` files your defconfig compiles, with the exact compiler flags. This config-awareness is what makes KGraph different from syntax-only tools.
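To see the file-level part of that config-awareness for yourself, here is a minimal sketch (not part of KGraph) that asks whether a given source file appears in a `compile_commands.json`. It relies only on the standard JSON compilation-database format, where each entry has `directory`, `file`, and `command` or `arguments`. It covers file selection only; `#ifdef`-level pruning inside each file comes from clang preprocessing every TU with its recorded flags.

```python
import json
import os


def load_compdb(path):
    """Parse a compile_commands.json file into a list of entry dicts."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def compiled_sources(entries):
    """Return normalized absolute paths of every translation unit in the database."""
    sources = set()
    for entry in entries:
        # "file" may be relative to "directory"; os.path.join keeps it
        # unchanged when it is already absolute.
        path = os.path.join(entry["directory"], entry["file"])
        sources.add(os.path.normpath(path))
    return sources


def is_compiled(entries, kernel_root, rel_path):
    """True if kernel_root/rel_path is built under the config that produced entries."""
    target = os.path.normpath(os.path.join(kernel_root, rel_path))
    return target in compiled_sources(entries)
```

Usage, from the kernel source dir: `is_compiled(load_compdb("compile_commands.json"), os.getcwd(), "fs/ext4/file.c")`. A file disabled by your config simply has no entry, so it returns `False`.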
Generate index.scip with scip-clang
kgraph init runs this for you, but you can also run it directly. scip-clang is a
Linux x86-64 binary — run it in the same Docker/Linux environment:
```bash
# Inside the container, with scip-clang available:
./scip-tools/scip-clang --compdb-path ./compile_commands.json
# → produces index.scip (hundreds of MB for a full defconfig)
```

```bash
# Linux x86-64 (bundles Python 3.10 + scip-clang; no prerequisites needed)
npm install -g @ajksunkang-aios/kgraph
```

The npm package includes a vendored Python 3.10 runtime and all dependencies; there is no need to install Python, pip, or protobuf separately.
```bash
kgraph install
```

`kgraph install` runs a `detect()` pass that reads each agent's config file or directory, identifies which AI agents are present on your system, and auto-configures the detected ones. Supported agents and where they're configured:
| Agent | Config file | Format |
|---|---|---|
| Claude Code | `~/.claude.json` + `~/.claude/settings.json` | JSON `mcpServers` + permissions |
| Cursor | `~/.cursor/mcp.json` | JSON `mcpServers` |
| Codex CLI | `~/.codex/config.toml` | TOML `[mcp_servers.kgraph]` |
| opencode | `~/.config/opencode/opencode.json` | JSONC `mcp.kgraph` |
| Hermes Agent | `~/.hermes/config.yaml` | YAML `mcp_servers` + toolsets |
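For the JSON `mcpServers` targets (Cursor's `~/.cursor/mcp.json`, for example), configuring an agent boils down to merging one server entry into the existing file without clobbering the user's other servers. The sketch below illustrates that idea; it is not KGraph's installer code. The entry name `kgraph` and the launch command `kgraph serve --mcp` are assumptions taken from the CLI reference, so compare against `mcp/examples/` before relying on them.

```python
import json
import os


def add_mcp_server(config, name="kgraph", command="kgraph", args=("serve", "--mcp")):
    """Return a copy of config with mcpServers[name] added or replaced."""
    updated = dict(config)
    servers = dict(updated.get("mcpServers", {}))  # keep any other servers intact
    servers[name] = {"command": command, "args": list(args)}
    updated["mcpServers"] = servers
    return updated


def remove_mcp_server(config, name="kgraph"):
    """Return a copy of config without mcpServers[name]; other servers are kept."""
    updated = dict(config)
    updated["mcpServers"] = {
        k: v for k, v in updated.get("mcpServers", {}).items() if k != name
    }
    return updated


def install_json_target(path):
    """Read (or start) a JSON config file, add the server entry, and write it back."""
    path = os.path.expanduser(path)
    config = {}
    if os.path.exists(path):
        with open(path, encoding="utf-8") as f:
            config = json.load(f)
    parent = os.path.dirname(path)
    if parent:
        os.makedirs(parent, exist_ok=True)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(add_mcp_server(config), f, indent=2)
```

Making the merge a pure function keeps install idempotent (running it twice yields the same file) and lets uninstall be its exact inverse. The standard `json` module would reject opencode's JSONC comments, so that target needs a comment-tolerant parser.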
```bash
kgraph detect                           # show what's detected, write nothing
kgraph install                          # auto-detect & configure installed agents
kgraph install --target claude,cursor   # configure specific agents
kgraph install --location local         # per-project config (./.mcp.json etc.)
kgraph uninstall                        # remove kgraph config from all agents
```

Prefer manual setup? See `mcp/examples/` for ready-to-edit config snippets for every agent.
```bash
cd /path/to/linux   # the kernel source dir (where compile_commands.json lives)
kgraph init .
```

`kgraph init` does the following automatically:

- venv: sets up a Python 3.10+ virtual environment with `protobuf>=7.35` (upb).
- Index: runs `scip-clang --compdb-path compile_commands.json` → `index.scip` (skipped if `index.scip` already exists).
- Ingest: parses the SCIP protobuf, derives the call graph + `ops_bind` edges, and writes everything into `./.kgraph/kgraph.db`.
- Enrich: maps MAINTAINERS → subsystem labels.
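To give a feel for the Enrich step, here is a minimal sketch of mapping a source path to MAINTAINERS sections via their `F:` file patterns. It is an illustration, not KGraph's enrichment code, and it simplifies the real matching rules: a pattern ending in `/` covers everything below that directory, and any other pattern is a shell glob whose wildcards do not cross `/`. Exclusions (`X:`) and regex (`N:`) entries are ignored.

```python
import fnmatch
import re

FIELD = re.compile(r"^([A-Z]):\s*(.*)$")


def parse_maintainers(text):
    """Split MAINTAINERS-style text into (section title, [F: patterns]) pairs."""
    sections, title, patterns = [], None, []
    for line in text.splitlines():
        m = FIELD.match(line)
        if m:
            if title is not None and m.group(1) == "F":
                patterns.append(m.group(2).strip())
        elif line.strip():
            # Any other non-blank line starts a new section title.
            if title is not None:
                sections.append((title, patterns))
            title, patterns = line.strip(), []
    if title is not None:
        sections.append((title, patterns))
    return sections


def pattern_matches(pattern, path):
    """'dir/' covers everything below dir; otherwise a per-component shell glob."""
    if pattern.endswith("/"):
        return path.startswith(pattern)
    pat_parts, path_parts = pattern.split("/"), path.split("/")
    return len(pat_parts) == len(path_parts) and all(
        fnmatch.fnmatchcase(seg, pat) for seg, pat in zip(path_parts, pat_parts)
    )


def subsystems_for(sections, path):
    """Section titles whose F: patterns match path, most specific (longest pattern) first."""
    hits = []
    for title, patterns in sections:
        best = max((len(p) for p in patterns if pattern_matches(p, path)), default=0)
        if best:
            hits.append((best, title))
    return [title for _, title in sorted(hits, key=lambda hit: -hit[0])]
```

With a two-section excerpt (an ext4 section with `F: fs/ext4/` and a VFS section with `F: fs/*`), `fs/ext4/file.c` resolves to the ext4 section only, while `fs/read_write.c` resolves to VFS.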
Everything stays inside the kernel tree (index.scip, .kgraph/kgraph.db) — the graph
is per-project, so each kernel you index gets its own database.
```bash
kgraph init . --skip-build          # index.scip already exists, just ingest
kgraph init . --subsystem fs/ext4   # scope to a subsystem (faster)
kgraph init . --force               # rebuild from scratch
```

Manual venv setup (if `kgraph init` can't find python3.10+):

```bash
# Find a python3.10+ on your system, then:
python3.10 -m venv /path/to/KGraph/.venv
source /path/to/KGraph/.venv/bin/activate
pip install "protobuf>=7.35.0,<8"
python -c "import google.protobuf; print(google.protobuf.__version__)"   # → a 7.x version, 7.35.0 or newer
```

Restart your agent so the MCP server loads. It now has KGraph's tools; ask structural questions and it queries the graph instead of grepping:
> What functions call ext4_file_read_iter? → find_callers
> What does generic_file_read_iter call? → find_callees
> Show me the body of ext4_file_read_iter. → get_function_body
> What implements ->read_iter across the kernel? → find_ops_impls
> Where is vfs_read referenced? → find_references
KGraph exposes 13 tools — a minimal viable set covering the most common agent code-indexing needs. Every tool is config-aware and compiler-resolved.
| Tool | Purpose | Key params |
|---|---|---|
| `search_symbols(query)` | Fuzzy full-text search by name (FTS5) | `kind`, `limit` |
| `get_symbol(name)` | Exact-name lookup → definition + signature | `kind`, `limit` |
| `get_function_body(name)` | Read the actual source body from disk (with line numbers) | `kind`, `context` |

| Tool | Purpose | Key params |
|---|---|---|
| `find_callers(name)` | Who calls this function (includes `ops_bind`) | `depth`, `limit` |
| `find_callees(name)` | What this function calls (includes `ops_bind`) | `depth`, `limit` |
| `call_path(source, target)` | Call path between two functions | `max_len` |
| `get_callchain(name)` | Call chain from a function up to a root (syscall/entry), incl. `ops_bind` | `max_depth` |
| `find_references(name)` | Every use site of a symbol, with enclosing function | `limit` |

| Tool | Purpose | Key params |
|---|---|---|
| `find_type_definition(name)` | Go-to-type-definition (`type_of` edges) | (none) |
| `get_struct_layout(name)` | Struct fields (`contains` edges) | (none) |
| `get_neighborhood(name)` | N-hop subgraph: a token-efficient context pack | `depth`, `edge_types`, `summary` |

| Tool | Purpose | Key params |
|---|---|---|
| `find_ops_impls(field_name)` | ★ Function-pointer field → all implementations | `struct_type` |
| `index_status()` | Index metadata + statistics | (none) |
★ `find_ops_impls` is the killer tool. It resolves indirect calls through kernel function-pointer tables (VFS ops, driver ops, net proto ops) that grep and syntax-based tools cannot follow. One call to `find_ops_impls("read_iter")` returns every filesystem and driver `read_iter` implementation across the kernel:

```
ext4_file_operations  → ext4_file_read_iter  @ fs/ext4/file.c
shmem_file_operations → shmem_file_read_iter @ mm/shmem.c
socket_file_ops       → sock_read_iter       @ net/socket.c
... (16 found)
```
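Once `ops_bind` rows exist, answering `find_ops_impls` is a plain indexed lookup. The sketch below assumes a hypothetical `ops_bind(table_var, struct_type, field, impl, file)` table; KGraph's real schema in `sqlite_store.py` may differ. The rows are hand-written to mirror the sample output above, not read from a real index.

```python
import sqlite3

# Hypothetical schema: one row per designated initializer in an ops table,
# e.g. ".read_iter = ext4_file_read_iter" inside ext4_file_operations.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ops_bind (table_var TEXT, struct_type TEXT, field TEXT, impl TEXT, file TEXT)"
)
conn.executemany("INSERT INTO ops_bind VALUES (?, ?, ?, ?, ?)", [
    ("ext4_file_operations", "file_operations", "read_iter", "ext4_file_read_iter", "fs/ext4/file.c"),
    ("shmem_file_operations", "file_operations", "read_iter", "shmem_file_read_iter", "mm/shmem.c"),
    ("socket_file_ops", "file_operations", "read_iter", "sock_read_iter", "net/socket.c"),
    ("ext4_file_operations", "file_operations", "write_iter", "ext4_file_write_iter", "fs/ext4/file.c"),
])


def find_ops_impls(conn, field_name, struct_type=None):
    """All (table, implementation, file) bindings for a function-pointer field."""
    sql = "SELECT table_var, impl, file FROM ops_bind WHERE field = ?"
    params = [field_name]
    if struct_type is not None:  # optional narrowing, e.g. only struct file_operations
        sql += " AND struct_type = ?"
        params.append(struct_type)
    sql += " ORDER BY table_var"
    return conn.execute(sql, params).fetchall()


for table_var, impl, file in find_ops_impls(conn, "read_iter", "file_operations"):
    print(f"{table_var} -> {impl} @ {file}")
```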
`find_callers`/`find_callees` accept `depth` and `limit`, and `get_neighborhood` returns compact `name + file:line` entries by default (`summary=true`), so agents stay within their token budget instead of pulling in full subgraphs.
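To make the `depth`/`limit` budget concrete, here is a sketch of a bounded `find_callers` written as a recursive CTE in SQLite, the query style the Serve step below describes. The `edges(src, dst, kind)` table, its rows, and the convention of storing a resolved `f_op->read_iter()` call as kind `ops_bind` are all assumptions for illustration.

```python
import sqlite3

# Hypothetical edges table with illustrative rows modeled on the read(2) path.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE edges (src TEXT, dst TEXT, kind TEXT)")
conn.executemany("INSERT INTO edges VALUES (?, ?, ?)", [
    ("__x64_sys_read", "ksys_read", "call"),
    ("ksys_read", "vfs_read", "call"),
    ("vfs_read", "new_sync_read", "call"),
    # Indirect f_op->read_iter() call resolved through an ops table.
    ("new_sync_read", "ext4_file_read_iter", "ops_bind"),
    ("ext4_file_read_iter", "generic_file_read_iter", "call"),
])

CALLERS_SQL = """
WITH RECURSIVE callers(name, depth) AS (
    SELECT src, 1 FROM edges
    WHERE dst = :target AND kind IN ('call', 'ops_bind')
    UNION
    SELECT e.src, c.depth + 1
    FROM edges AS e JOIN callers AS c ON e.dst = c.name
    WHERE c.depth < :max_depth AND e.kind IN ('call', 'ops_bind')
)
SELECT name, MIN(depth) AS hops
FROM callers
GROUP BY name
ORDER BY hops, name
LIMIT :limit
"""


def find_callers(conn, name, depth=1, limit=50):
    """Transitive callers up to `depth` hops, nearest first, capped at `limit` rows."""
    params = {"target": name, "max_depth": depth, "limit": limit}
    return conn.execute(CALLERS_SQL, params).fetchall()
```

The `c.depth < :max_depth` guard bounds the recursion even if the graph has cycles, and `LIMIT` caps the rows returned, which is what keeps an agent's answer small.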
```
┌───────────────────────────────────────────────────────────────┐
│                        Your Code Agent                        │
│ "What implements ->read_iter?" → calls KGraph tools directly  │
└───────────────────────────────┬───────────────────────────────┘
                                │
                                ▼
┌───────────────────────────────────────────────────────────────┐
│                 KGraph MCP Server (13 tools)                  │
│ search · get_symbol · get_function_body · callers · callees   │
│ call_path · callchain · references · type_definition          │
│ struct_layout · neighborhood · ops_impls · index_status       │
│                               │                               │
│                               ▼                               │
│          SQLite knowledge graph (.kgraph/kgraph.db)           │
│     symbols · occurrences · edges · ops_bind · subsystem      │
└───────────────────────────────────────────────────────────────┘
```
- Build: `make CC=clang LLVM=1`, followed by `scripts/clang-tools/gen_compile_commands.py`, produces `compile_commands.json` (what the compiler actually compiles).
- Index: `scip-clang` emits `index.scip` with full semantic symbol information per compilation unit.
- Ingest: Python (protobuf 7.x / upb) parses SCIP into `IngestBatch` objects, derives call edges from `enclosing_range`, derives `ops_bind` edges from function-pointer table initializations, and writes into SQLite via the `GraphStore` interface.
- Enrich: `KernelProfile` maps MAINTAINERS → subsystem labels and tags config-gated symbols.
- Serve: the MCP server exposes graph queries via recursive CTEs on SQLite.
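The "derive call edges from `enclosing_range`" step can be illustrated with a small sketch. In SCIP, a definition occurrence may carry an `enclosing_range` covering the whole definition; a reference to a function that falls inside some function's enclosing range becomes a caller → callee edge, attributed to the innermost such definition. The `Occurrence` class below is a simplified stand-in for the protobuf message (plain 4-tuple ranges, a boolean instead of `symbol_roles`, short names instead of SCIP symbol strings), and it works on one document at a time.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Range = Tuple[int, int, int, int]  # (start_line, start_char, end_line, end_char)


@dataclass(frozen=True)
class Occurrence:
    symbol: str
    range: Range
    is_definition: bool = False
    enclosing_range: Optional[Range] = None  # on definitions: the whole body


def contains(outer: Range, inner: Range) -> bool:
    """True if inner lies within outer (comparing (line, char) positions)."""
    return (outer[0], outer[1]) <= (inner[0], inner[1]) and (inner[2], inner[3]) <= (outer[2], outer[3])


def derive_call_edges(occurrences, function_symbols):
    """Sorted (caller, callee) pairs for function references inside function bodies."""
    defs = [o for o in occurrences
            if o.is_definition and o.enclosing_range and o.symbol in function_symbols]
    edges = set()
    for ref in occurrences:
        if ref.is_definition or ref.symbol not in function_symbols:
            continue
        enclosing = [d for d in defs if contains(d.enclosing_range, ref.range)]
        if not enclosing:
            # File-scope reference, e.g. ".read_iter = ext4_file_read_iter" in an
            # ops table: not a call edge; this is where ops_bind derivation applies.
            continue
        # Innermost enclosing definition = the one whose span starts last.
        caller = max(enclosing, key=lambda d: (d.enclosing_range[0], d.enclosing_range[1]))
        edges.add((caller.symbol, ref.symbol))
    return sorted(edges)
```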
The IngestBatch → GraphStore boundary keeps the parser fully decoupled from storage,
so swapping SQLite for another backend (Neo4j, a custom embedded DB) means implementing
one new GraphStore — the parser, MCP tools, and agent integration don't change.
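A minimal sketch of that boundary, assuming hypothetical shapes for `IngestBatch` and `GraphStore` (the real contract lives in `src/parser/models.py` and `src/storage/graph_store.py` and is likely richer). The ingest loop depends only on the abstract interface, so a toy in-memory backend can replace SQLite without touching it.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class IngestBatch:
    """Hypothetical parser output: plain records, no storage details."""
    symbols: List[Tuple[str, str, str]] = field(default_factory=list)  # (name, kind, file)
    edges: List[Tuple[str, str, str]] = field(default_factory=list)    # (src, dst, kind)


class GraphStore(ABC):
    """Hypothetical storage interface: the only thing the parser side sees."""

    @abstractmethod
    def write_batch(self, batch: IngestBatch) -> None: ...

    @abstractmethod
    def callers(self, name: str) -> List[str]: ...


class InMemoryStore(GraphStore):
    """Toy backend: a new store only has to implement the interface."""

    def __init__(self):
        self.symbols = {}
        self.edges = set()

    def write_batch(self, batch):
        for name, kind, file in batch.symbols:
            self.symbols[name] = (kind, file)
        self.edges.update(batch.edges)

    def callers(self, name):
        return sorted(src for src, dst, _kind in self.edges if dst == name)


def ingest(batches, store: GraphStore):
    """Parser-side loop: unaware of which backend it is writing to."""
    for batch in batches:
        store.write_batch(batch)
```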
GraphView has two sides, both in the graphview/ directory:
Health dashboard: every day, automatically and on free GitHub Actions runners, KGraph checks that it can still build a correct index of Linux mainline. The Linux Build & Index Probe clones torvalds/linux, builds, indexes, and ingests it, runs a synthetic retrieval canary, and emits `metrics.json`. The last 7 runs render as a dependency-free static 7-day list (buildable ✓/✗, canary M/N, symbol/edge counts, timing), published to GitHub Pages by `deploy-graphview.yml`.
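As a rough illustration of how such a 7-day list can be derived from one JSON object per run, here is a sketch in Python (the real page is JavaScript, in `app.js`). The keys `date`, `buildable`, `canary_pass`, `canary_total`, `symbols`, `edges`, and `seconds` are hypothetical stand-ins, not the actual `metrics.jsonl` schema.

```python
import json


def last_runs(jsonl_text, n=7):
    """Parse one JSON object per non-blank line and keep the newest n (file order)."""
    rows = [json.loads(line) for line in jsonl_text.splitlines() if line.strip()]
    return rows[-n:]


def render_row(run):
    """One line of the health list; all keys are hypothetical."""
    mark = "✓" if run["buildable"] else "✗"
    return (f'{run["date"]}  build {mark}  canary {run["canary_pass"]}/{run["canary_total"]}'
            f'  symbols {run["symbols"]}  edges {run["edges"]}  {run["seconds"]}s')
```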
Interactive explorer (`kgraph view`): a local, read-only HTTP server (stdlib only, zero new dependencies) that serves the code graph from your own `kgraph.db`. Search, center a symbol, and explore its callers, callees, and neighborhood; resolve an ops table (`read_iter` → every implementation); or trace a call chain up to a syscall root. The same server serves both the page and the API, so requests are same-origin and no CORS setup is needed.
```bash
kgraph view   # or: python view/server.py --db <kgraph.db> --root <linux>
# → http://localhost:8000/graph.html   (health: http://localhost:8000/)
```

The explorer's three views: Neighborhood (Cytoscape.js graph), Ops table, Call chain. (A read-only Pages demo of pre-baked subgraphs is a planned follow-up; the local explorer is live.)
```bash
# Agent integration
kgraph install                    # auto-detect & configure installed agents
kgraph install --target <ids>     # configure specific agents (claude,cursor,codex,opencode,hermes)
kgraph install --location <loc>   # global (default) or local (per-project)
kgraph detect                     # show detected agents, write nothing
kgraph uninstall                  # remove kgraph config from agents

# Index lifecycle (run in the kernel source dir)
kgraph init <path>                # index + ingest (--skip-build, --subsystem, --force)
kgraph ingest <path>              # re-ingest from an existing index.scip
kgraph serve --mcp                # start the MCP server (usually auto-launched by the agent)
kgraph view                       # local interactive graph explorer (HTTP + browser UI)
kgraph status <path>              # show index statistics and health
```

| | codegraph | semcode | KGraph |
|---|---|---|---|
| Parsing backend | tree-sitter | tree-sitter | scip-clang |
| Semantic depth | syntax-level | syntax-level | compiler-level |
| Config awareness | no (all branches) | no (all branches) | yes (only compiled code) |
| Macro resolution | heuristic | heuristic | clang preprocessor |
| Function pointer calls | heuristic name-match | heuristic name-match | `ops_bind` derived edges |
| Type resolution | name-based | name-based | clang-precise |
| Kernel domain knowledge | none | git/lore/vectors | MAINTAINERS/Kconfig/syscall |
| Storage | SQLite | LanceDB | SQLite |
| Target scope | 20+ languages, general | C/Rust, kernel | C, kernel-only, deep |

- codegraph = breadth (many languages, fast install)
- semcode = kernel engineering (git/lore/vectors, syntax-level)
- KGraph = compilation truth (config-aware, macro-resolved, function-pointer-callable)
| Kernel | Build system | Status |
|---|---|---|
| Linux | Kbuild (`CC=clang LLVM=1`) | MVP |
| Android | Soong + repo `manifest.xml` | Planned |
| Zephyr | CMake + west `manifest.yml` | Planned |
| FreeBSD | Make + `src.conf` | Planned |
Adding a new kernel profile means writing a KernelProfile subclass —
build pipeline + domain enrichment — without touching the ingest or query core.
See DESIGN.md §6 for the profile architecture.
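A rough sketch of the shape such a subclass could take, split into the two responsibilities named above (build pipeline, domain enrichment). The method names `build_commands` and `enrich` are hypothetical; see DESIGN.md §6 for the actual interface. The `-d`/`-o` flags are those of `gen_compile_commands.py` in recent kernels; check `--help` on your tree.

```python
import os
from abc import ABC, abstractmethod
from typing import Dict, List


class KernelProfile(ABC):
    """Hypothetical profile shape: how to build a kernel, and how to label its symbols."""
    name = "abstract"

    @abstractmethod
    def build_commands(self, src_dir: str) -> List[List[str]]:
        """Commands that leave a compile_commands.json in src_dir."""

    @abstractmethod
    def enrich(self, symbol_files: Dict[str, str]) -> Dict[str, str]:
        """Map symbol name -> domain label (subsystem, component, ...)."""


class LinuxProfile(KernelProfile):
    name = "linux"

    def build_commands(self, src_dir):
        make = ["make", "-C", src_dir, "CC=clang", "LLVM=1"]
        gen = os.path.join(src_dir, "scripts", "clang-tools", "gen_compile_commands.py")
        return [
            make + ["x86_64_defconfig"],
            make + [f"-j{os.cpu_count() or 1}"],
            # -d: directory to scan for .cmd files; -o: output path.
            ["python3", gen, "-d", src_dir, "-o", os.path.join(src_dir, "compile_commands.json")],
        ]

    def enrich(self, symbol_files):
        # Toy labeling by top-level directory; KGraph's Enrich step maps MAINTAINERS instead.
        return {sym: path.split("/", 1)[0] for sym, path in symbol_files.items()}
```

A Zephyr or FreeBSD profile would swap in its own `build_commands` (CMake/west, Make + `src.conf`) and labeling, while the ingest and query code stays untouched.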
```
KGraph/
├── README.md / README.zh-CN.md       # this file (EN / 中文)
├── npm-shim.js                       # npm bin entry point (thin launcher)
├── docs/
│   ├── DESIGN.md / DESIGN.zh-CN.md   # full architecture & rationale
│   ├── TESTING.md                    # test design & coverage
│   ├── NPM-PACKAGING-DESIGN.md       # npm packaging design
│   └── scip-parser-design.md         # SCIP parser design notes
├── thirdparty/
│   └── scip.proto                    # canonical SCIP protobuf schema
├── scripts/
│   └── scip_pb2.py                   # generated protobuf bindings (7.x / upb)
├── src/
│   ├── parser/                       # SCIP protobuf → IngestBatch
│   │   ├── models.py                 # data model (the parser↔storage contract)
│   │   ├── scip_parser.py            # parse + enclosing match + ops_bind derivation
│   │   └── symbol_name.py            # SCIP symbol-string parser
│   ├── storage/                      # graph persistence
│   │   ├── graph_store.py            # GraphStore interface (extension point)
│   │   └── sqlite_store.py           # SQLite backend (WAL · FTS5 · recursive CTE)
│   └── installer/                    # agent auto-config
│       ├── orchestrator.py           # detect() / install() / uninstall()
│       ├── cli.py                    # `kgraph install` CLI
│       └── targets/                  # claude · cursor · codex · opencode · hermes
├── mcp/
│   ├── server.py                     # MCP server (13 tools)
│   ├── source_reader.py              # reads function bodies from disk
│   └── examples/                     # per-agent manual config snippets
├── view/
│   └── server.py                     # `kgraph view`: local explorer (HTTP API + static)
├── bench/
│   └── health_check.py               # synthetic retrieval canary + metrics collector
├── graphview/                        # health dashboard + interactive explorer
│   ├── index.html · app.js           # health 7-day list
│   ├── graph.html · graph.js         # explorer (neighborhood · ops table · call chain)
│   └── data/metrics.jsonl            # one row per CI run (auto-committed)
└── tests/
    ├── conftest.py                   # shared fixtures & synthetic SCIP benchmark
    ├── unit/                         # unit tests (pure functions, parametrized)
    ├── integration/                  # integration tests (synthetic data, no kernel needed)
    │   ├── test_scip_pipeline.py     # index.scip → parser → store (41 tests)
    │   └── test_mcp_server.py        # MCP tools → kgraph.db (31 tests)
    └── real/                         # real-kernel case tests (manual scripts)
        └── ingest_real.py            # full-kernel ingestion
```
If you're developing KGraph (not just using it as an end user):

```bash
git clone https://github.com/ajksunkang-aios/KGraph.git
cd KGraph

# Create a venv with any python3.10+ and install dependencies
python3.10 -m venv .venv
source .venv/bin/activate
pip install -r requirements-dev.txt   # runtime (requirements.txt) + pytest
python -c "import google.protobuf; print(google.protobuf.__version__)"   # → 7.35.x

# Regenerate scip_pb2.py only if you change thirdparty/scip.proto
protoc --proto_path=thirdparty --python_out=scripts thirdparty/scip.proto

# Run tests (all synthetic, no real kernel needed)
pytest tests/ -v                # all tests
pytest tests/integration/ -v    # integration tests only
pytest tests/unit/ -v           # unit tests only

# Run real-kernel ingestion (requires index.scip from a kernel tree)
KGRAPH_ROOT=/path/to/linux python tests/real/ingest_real.py
```

See `docs/TESTING.md` for the full test design and coverage details.
```bash
kgraph uninstall                # remove kgraph MCP config from all agents
rm -rf /path/to/linux/.kgraph   # remove the graph database from a project
```

License: MIT
Made for kernel developers and AI agents who need to see what the compiler sees.