
EPIC: RuVix Hypervisor Core — Coherence-Native Microhypervisor #328

Vision

RuVix is a Rust-first, bare-metal microhypervisor that treats coherence domains, not VMs, as the primary unit of execution. It uses dynamic mincut for placement, isolation, and migration; proof-gated mutation for security; and witness-native control loops for full auditability. It runs directly on hardware, with no KVM and no Linux dependency.

Why This Matters

  • No existing hypervisor uses graph partitioning (mincut) as a first-class scheduling primitive
  • Agent workloads need finer-grained isolation than VMs provide (sub-10us partition switch)
  • Reconstructable memory reduces how much state must stay resident in RAM: state is compressed, externalized, or rebuilt on demand
  • Witness-native operation means every state change is cryptographically linked — deterministic infrastructure observability

SOTA Positioning

System | What It Does | What RuVix Does Differently
KVM | Linux virtualization API | RuVix owns hardware directly, no Linux
Firecracker | Minimalist KVM microVM | RuVix replaces VM abstraction with coherence domains
seL4 | Formally verified microkernel | RuVix borrows capability discipline, adds graph-driven control
Theseus OS | Intralingual Rust OS | RuVix adds hypervisor-level isolation + coherence engine
Hyperlight | WASM micro-VM (Microsoft) | Validates no_std wasmtime; RuVix adds coherence + witness

Existing Foundation

The RuVector project already has 22 sub-crates, ~101K lines of Rust, and 760 passing tests. The crates most relevant to RuVix are:

  • ruvix-cap — seL4-inspired capability system
  • ruvix-proof — 3-tier proof engine
  • ruvix-sched — coherence-aware scheduler
  • ruvix-aarch64 — AArch64 boot/MMU stubs
  • ruvector-mincut — dynamic mincut (the crown jewel)
  • ruvector-sparsifier — graph sparsification
  • ruvector-solver — sublinear sparse solvers

Design Constraints (Anti-Scope-Collapse)

# | Constraint | Rule
DC-1 | Coherence engine is optional | Kernel MUST boot and run without graph/mincut/solver
DC-2 | Mincut never blocks scheduling | 50us hard budget per epoch; stale-cut fallback
DC-3 | Three-layer proof system | P1 capability (<1us) + P2 policy (<100us) + P3 deep (deferred)
DC-4 | Scheduler starts simple | v1: priority = deadline + cut_pressure only
DC-5 | Three systems, cleanly separated | Kernel alone, +coherence, +agents; each layer is optional
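
A minimal sketch of the DC-2 rule (time-boxed mincut with stale-cut fallback), written against `std` so it runs on a host. Everything named here (`Cut`, `EpochCut`, `cut_for_epoch`, the incremental `step` closure) is hypothetical and not taken from ruvector-mincut; a no_std kernel would presumably read a hardware counter instead of `std::time::Instant`.

```rust
use std::time::{Duration, Instant};

/// Hypothetical result of a mincut pass: which side each partition falls on.
#[derive(Clone, Debug, PartialEq)]
pub struct Cut {
    pub epoch: u64,
    pub side: Vec<bool>,
}

/// What the scheduler gets back for one epoch.
#[derive(Debug, PartialEq)]
pub enum EpochCut {
    /// The mincut pass finished inside the budget.
    Fresh(Cut),
    /// The budget expired; the previous epoch's cut is reused.
    Stale(Cut),
}

/// DC-2 budget from the table above: 50 microseconds per epoch.
pub const MINCUT_BUDGET: Duration = Duration::from_micros(50);

/// Drives an incremental mincut computation until it yields a cut or the
/// budget runs out. `step` performs one bounded unit of work and returns
/// `Some(cut)` once the computation has converged.
pub fn cut_for_epoch(
    stale: &Cut,
    budget: Duration,
    mut step: impl FnMut() -> Option<Cut>,
) -> EpochCut {
    let start = Instant::now();
    while start.elapsed() < budget {
        if let Some(cut) = step() {
            return EpochCut::Fresh(cut);
        }
    }
    // Never block the scheduler: fall back to the last known cut.
    EpochCut::Stale(stale.clone())
}
```

The budget is only checked between steps, so the guarantee is as tight as the longest single `step`. A caller would pass `MINCUT_BUDGET` and schedule from whichever variant comes back.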

Architecture

Layer 4: Persistent State
         witness log | compressed dormant memory | RVF checkpoints
Layer 3: Execution Adapters
         bare partition | WASM partition | service adapter
Layer 2: Coherence Engine (OPTIONAL — DC-1)
         graph state | mincut | pressure scoring | migration
Layer 1: RuVix Core (Rust, no_std)
         partitions | capabilities | scheduler | witnesses
Layer 0: Machine Entry (assembly, <500 LoC)
         reset vector | trap handlers | context switch

First-Class Kernel Objects

  1. Partition — coherence domain container (NOT a VM)
  2. Capability — unforgeable authority token
  3. Witness — 64-byte hash-chained audit record
  4. MemoryRegion — typed, tiered, owned (hot/warm/dormant/cold)
  5. CommEdge — inter-partition communication channel
  6. DeviceLease — time-bounded, revocable device access
  7. CoherenceScore — locality/coupling metric
  8. CutPressure — graph-derived isolation signal
  9. RecoveryCheckpoint — state snapshot for rollback
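
To make the Witness object concrete, here is one possible 64-byte, hash-chained layout. The field names and sizes are assumptions for illustration only; the actual schema is the subject of ADR-134. The digest is a placeholder built from FNV-1a lanes so the sketch needs no external crates. It is NOT cryptographic, and a real witness log would use a proper hash function.

```rust
use std::mem::size_of;

/// Hypothetical 64-byte witness record. Field choice is an assumption.
#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Witness {
    pub seq: u64,              // position in the log
    pub timestamp_ns: u64,     // when the action happened
    pub partition: u32,        // acting partition
    pub action: u32,           // privileged action code
    pub payload: [u8; 8],      // action-specific data
    pub prev_digest: [u8; 32], // digest of the previous record
}

// Compile-time check that the layout is exactly 64 bytes.
const _: () = assert!(size_of::<Witness>() == 64);

impl Witness {
    /// Serializes the record field by field in little-endian order.
    pub fn to_bytes(&self) -> [u8; 64] {
        let mut out = [0u8; 64];
        out[0..8].copy_from_slice(&self.seq.to_le_bytes());
        out[8..16].copy_from_slice(&self.timestamp_ns.to_le_bytes());
        out[16..20].copy_from_slice(&self.partition.to_le_bytes());
        out[20..24].copy_from_slice(&self.action.to_le_bytes());
        out[24..32].copy_from_slice(&self.payload);
        out[32..64].copy_from_slice(&self.prev_digest);
        out
    }
}

/// PLACEHOLDER digest: four 64-bit FNV-1a lanes. Not cryptographic.
pub fn digest(w: &Witness) -> [u8; 32] {
    let bytes = w.to_bytes();
    let mut out = [0u8; 32];
    for lane in 0..4usize {
        let mut h: u64 = 0xcbf2_9ce4_8422_2325 ^ lane as u64;
        for &b in bytes.iter() {
            h ^= b as u64;
            h = h.wrapping_mul(0x0000_0100_0000_01b3);
        }
        out[lane * 8..lane * 8 + 8].copy_from_slice(&h.to_le_bytes());
    }
    out
}

/// Appends a record linked to the digest of the current log tail.
pub fn append(log: &mut Vec<Witness>, partition: u32, action: u32, payload: [u8; 8], timestamp_ns: u64) {
    let prev_digest = log.last().map(digest).unwrap_or([0u8; 32]);
    let seq = log.len() as u64;
    log.push(Witness { seq, timestamp_ns, partition, action, payload, prev_digest });
}

/// Walks the log and checks sequence numbers and every chain link.
pub fn verify_chain(log: &[Witness]) -> bool {
    let mut expected = [0u8; 32];
    for (i, w) in log.iter().enumerate() {
        if w.seq != i as u64 || w.prev_digest != expected {
            return false;
        }
        expected = digest(w);
    }
    true
}
```

Because each record commits to its predecessor, editing any earlier record breaks the link stored in the record that follows it, which `verify_chain` detects.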

Implementation Phases

Phase 1: Foundation (M0-M1) — "Can it boot and isolate?"

  • M0: Bare-metal Rust boot on QEMU AArch64 virt. Reset -> EL2 -> serial -> MMU -> first witness
  • M1: Partition + capability object model. Create, destroy, switch. Simple deadline scheduler.
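
As a rough illustration of the M1 capability model, the sketch below shows rights attenuation in the seL4 style: a derived capability can only carry a subset of its parent's rights, and only a holder with a grant right may derive. The types `Rights` and `Capability` and their methods are hypothetical, not the ruvix-cap API.

```rust
/// Hypothetical rights bitmask.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Rights(u8);

impl Rights {
    pub const READ: Rights = Rights(0b001);
    pub const WRITE: Rights = Rights(0b010);
    pub const GRANT: Rights = Rights(0b100);

    pub const fn union(self, other: Rights) -> Rights {
        Rights(self.0 | other.0)
    }

    /// True if every bit in `other` is also set in `self`.
    pub const fn contains(self, other: Rights) -> bool {
        self.0 & other.0 == other.0
    }
}

/// Hypothetical capability: authority over one kernel object.
/// Fields are private so code outside this module cannot forge one.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Capability {
    object: u32,
    rights: Rights,
}

impl Capability {
    /// Assumed to be callable by the kernel only, at object creation.
    pub fn root(object: u32, rights: Rights) -> Self {
        Capability { object, rights }
    }

    /// Derives a weaker capability. Fails if the holder lacks GRANT or
    /// asks for rights it does not itself hold (no amplification).
    pub fn derive(&self, rights: Rights) -> Option<Capability> {
        if self.rights.contains(Rights::GRANT) && self.rights.contains(rights) {
            Some(Capability { object: self.object, rights })
        } else {
            None
        }
    }

    /// The check a privileged operation would run before acting.
    pub fn permits(&self, object: u32, needed: Rights) -> bool {
        self.object == object && self.rights.contains(needed)
    }
}
```

In a real kernel, partitions would more likely hold opaque handles into a kernel-owned capability table rather than the structs themselves; the private fields here only approximate that within one Rust module.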

Phase 2: Differentiation (M2-M3) — "Can it prove and witness?"

  • M2: Witness logging (64-byte chained records) + P1/P2 proof verifier
  • M3: 2-signal scheduler (deadline + cut_pressure). Flow + Reflex modes. Zero-copy IPC.
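
The 2-signal scheduler can be sketched as a single scoring function. The issue only states `priority = deadline + cut_pressure`, so how the two signals are scaled and combined is an assumption here: deadline is turned into an urgency value (closer deadline, higher urgency) and cut pressure is added on top, with the highest score running first. `Task`, `HORIZON_US`, `priority`, and `pick_next` are hypothetical names.

```rust
/// Hypothetical schedulable unit.
pub struct Task {
    pub id: u32,
    /// Absolute deadline in microseconds.
    pub deadline_us: u64,
    /// `None` when the coherence engine is absent (DC-1).
    pub cut_pressure: Option<u32>,
}

/// Assumed scheduling horizon: deadlines further out than this all count
/// as "not urgent". The value is illustrative.
pub const HORIZON_US: u64 = 1_000_000;

/// Higher score runs first. With no cut pressure this reduces to
/// earliest-deadline-first within the horizon.
pub fn priority(task: &Task, now_us: u64) -> u64 {
    let slack = task.deadline_us.saturating_sub(now_us);
    let urgency = HORIZON_US.saturating_sub(slack);
    urgency + task.cut_pressure.unwrap_or(0) as u64
}

/// Picks the task with the highest combined score, if any.
pub fn pick_next(tasks: &[Task], now_us: u64) -> Option<&Task> {
    tasks.iter().max_by_key(|t| priority(t, now_us))
}
```

Modeling `cut_pressure` as an `Option` is one way to honor DC-1: a kernel built without the coherence layer still schedules, purely on deadlines.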

Phase 3: Innovation (M4-M5) — "Can it think about coherence?"

  • M4: Dynamic mincut integration (DC-2 budget). Live coherence graph. Migration triggers.
  • M5: Memory tier management (4 tiers). Reconstruction from dormant state.
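
A toy model of M5's "reconstruction from dormant state": a region is compressed when demoted and rebuilt when touched again. Run-length encoding stands in for whatever compression or externalization the real design uses (ADR-136), and `MemoryRegion`, `Backing`, `demote_to_dormant`, and `touch` are hypothetical names.

```rust
/// The four tiers named in the kernel object list.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Tier { Hot, Warm, Dormant, Cold }

enum Backing {
    Resident(Vec<u8>),
    /// (run length, byte value) pairs; a stand-in for real compression.
    Compressed(Vec<(u8, u8)>),
}

pub struct MemoryRegion { tier: Tier, backing: Backing }

fn rle_encode(data: &[u8]) -> Vec<(u8, u8)> {
    let mut out: Vec<(u8, u8)> = Vec::new();
    for &b in data {
        if let Some(last) = out.last_mut() {
            if last.1 == b && last.0 < u8::MAX {
                last.0 += 1;
                continue;
            }
        }
        out.push((1, b));
    }
    out
}

fn rle_decode(runs: &[(u8, u8)]) -> Vec<u8> {
    let mut out = Vec::new();
    for &(count, value) in runs {
        out.extend(std::iter::repeat(value).take(count as usize));
    }
    out
}

impl MemoryRegion {
    pub fn new_hot(data: Vec<u8>) -> Self {
        MemoryRegion { tier: Tier::Hot, backing: Backing::Resident(data) }
    }

    pub fn tier(&self) -> Tier { self.tier }

    /// Bytes currently held for this region.
    pub fn footprint(&self) -> usize {
        match &self.backing {
            Backing::Resident(d) => d.len(),
            Backing::Compressed(r) => r.len() * 2,
        }
    }

    /// Compresses the contents and marks the region dormant.
    pub fn demote_to_dormant(&mut self) {
        let runs = match &self.backing {
            Backing::Resident(d) => Some(rle_encode(d)),
            Backing::Compressed(_) => None,
        };
        if let Some(r) = runs { self.backing = Backing::Compressed(r); }
        self.tier = Tier::Dormant;
    }

    /// Reconstructs the contents on access and promotes the region to hot.
    pub fn touch(&mut self) -> &[u8] {
        let data = match &self.backing {
            Backing::Compressed(r) => Some(rle_decode(r)),
            Backing::Resident(_) => None,
        };
        if let Some(d) = data { self.backing = Backing::Resident(d); }
        self.tier = Tier::Hot;
        match &self.backing {
            Backing::Resident(d) => d,
            Backing::Compressed(_) => unreachable!("region was just reconstructed"),
        }
    }
}
```

The warm and cold tiers are declared but not exercised; the sketch only covers the hot to dormant to hot round trip.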

Phase 4: Expansion (M6-M7) — "Can agents run on it?"

  • M6: WASM agent runtime adapter. Agent lifecycle.
  • M7: Seed/Appliance hardware bring-up. All 6 success criteria end-to-end.

Success Criteria (v1)

# | Criterion | Target
1 | Cold boot to first witness | < 250ms
2 | Hot partition switch | < 10 microseconds
3 | Remote memory traffic reduction | >= 20% vs naive
4 | Tail latency reduction | >= 20% under mixed pressure
5 | Witness completeness | Full trail for every privileged action
6 | Fault recovery | Without global reboot

4-6 Week Acceptance Test

On track if, at the 4-6 week mark, the kernel boots on QEMU, runs two partitions with capability isolation, emits witness records, and switches partitions in under 10us. All of this comes before mincut, before WASM, and before anything fancy.

ADR Chain

ADR | Topic
ADR-132 | RuVix Hypervisor Core (this EPIC)
ADR-133 | Partition Object Model
ADR-134 | Witness Schema and Log Format
ADR-135 | Proof Verifier Design
ADR-136 | Memory Hierarchy and Reconstruction
ADR-137 | Bare-Metal Boot Sequence
ADR-138 | Seed Hardware Bring-Up
ADR-139 | Appliance Deployment Model
ADR-140 | Agent Runtime Adapter

Key Risk: Mincut in no_std

ruvector-mincut depends on petgraph, rayon, and dashmap, which are incompatible with no_std. Extracting a kernel-compatible subset is the highest-risk item.
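
To give a sense of what a kernel-compatible subset could look like, here is an allocation-free global minimum cut over a fixed-capacity adjacency matrix. This is a sketch under stated assumptions. It is the classic static Stoer-Wagner algorithm, not the dynamic mincut that ruvector-mincut implements, and `FixedGraph`, `MAX_NODES`, and `min_cut` are hypothetical names. The code uses no heap and no std-only items, so the same logic should carry over to a `#![no_std]` crate.

```rust
/// Illustrative capacity; a kernel would size this to its partition limit.
pub const MAX_NODES: usize = 16;

/// Undirected weighted graph stored in a fixed-size matrix (no heap).
#[derive(Clone)]
pub struct FixedGraph {
    n: usize,
    w: [[u32; MAX_NODES]; MAX_NODES],
}

impl FixedGraph {
    pub fn new(n: usize) -> Self {
        assert!(n >= 2 && n <= MAX_NODES);
        FixedGraph { n, w: [[0; MAX_NODES]; MAX_NODES] }
    }

    pub fn add_edge(&mut self, a: usize, b: usize, weight: u32) {
        assert!(a < self.n && b < self.n && a != b);
        self.w[a][b] += weight;
        self.w[b][a] += weight;
    }
}

/// Weight of the global minimum cut (Stoer-Wagner, O(n^3)).
/// Edge weights are assumed small enough that sums fit in u32.
pub fn min_cut(g: &FixedGraph) -> u32 {
    let n = g.n;
    let mut w = g.w; // working copy; merged vertices accumulate here
    let mut merged = [false; MAX_NODES];
    let mut best = u32::MAX;

    for phase in 0..n - 1 {
        let mut in_a = [false; MAX_NODES];
        let mut conn = [0u32; MAX_NODES]; // weight from each vertex into set A
        let mut prev = usize::MAX;
        let mut last = usize::MAX;

        // Grow A one vertex at a time, always taking the vertex most
        // tightly connected to A, until all active vertices are in A.
        for _ in 0..n - phase {
            let mut sel = usize::MAX;
            for v in 0..n {
                if !merged[v] && !in_a[v] && (sel == usize::MAX || conn[v] > conn[sel]) {
                    sel = v;
                }
            }
            in_a[sel] = true;
            prev = last;
            last = sel;
            for v in 0..n {
                if !merged[v] && !in_a[v] {
                    conn[v] += w[sel][v];
                }
            }
        }

        // Cut of the phase: the last vertex added versus everything else.
        best = best.min(conn[last]);

        // Merge `last` into `prev` for the next phase.
        for v in 0..n {
            w[prev][v] += w[last][v];
            w[v][prev] = w[prev][v];
        }
        w[prev][prev] = 0;
        merged[last] = true;
    }
    best
}
```

For two triangles with edge weight 5 joined by a single edge of weight 1, `min_cut` returns 1: the bridge is the cheapest place to separate the graph. A static pass like this recomputes from scratch each time, which is why the 50us budget in DC-2 matters.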

What Makes This Novel

  1. Kernel-level graph control loop — no OS does this
  2. Proof-gated infrastructure — mutation requires proof token
  3. Witness-native OS — every state change cryptographically linked
  4. Reconstructable memory — making RAM less necessary
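
Proof-gated mutation (item 2) can be expressed in Rust's type system: a mutating function takes a token that only the verifier can construct. The sketch below assumes the P1/P2/P3 split from DC-3; the names (`ProofToken`, `verify`, `mutate`, `Request`, `Region`) and the boolean stand-ins for the actual checks are hypothetical. The real verifier design belongs to ADR-135.

```rust
/// The three proof tiers from DC-3, ordered from cheapest to deepest.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub enum ProofTier { P1Capability, P2Policy, P3Deep }

/// Evidence that a request passed verification. The fields are private and
/// the type is neither Clone nor Copy, so a token is single-use and cannot
/// be built outside this module.
pub struct ProofToken { tier: ProofTier, target: u32 }

impl ProofToken {
    pub fn tier(&self) -> ProofTier { self.tier }
}

/// Hypothetical request; the two booleans stand in for real checks.
pub struct Request {
    pub target: u32,
    pub holds_capability: bool,
    pub policy_ok: bool,
}

#[derive(Debug, PartialEq, Eq)]
pub enum Denied { NoCapability, PolicyViolation }

/// Runs the inline tiers. P3 is deferred, so a request asking for P3 is
/// admitted on P2 evidence and the token records P2 as what was checked.
pub fn verify(req: &Request, required: ProofTier) -> Result<ProofToken, Denied> {
    if !req.holds_capability {
        return Err(Denied::NoCapability);
    }
    if required >= ProofTier::P2Policy && !req.policy_ok {
        return Err(Denied::PolicyViolation);
    }
    Ok(ProofToken { tier: required.min(ProofTier::P2Policy), target: req.target })
}

/// Hypothetical mutable kernel state.
pub struct Region { pub id: u32, pub value: u64 }

/// Mutation consumes the token and refuses tokens issued for another target.
pub fn mutate(region: &mut Region, token: ProofToken, new_value: u64) -> bool {
    if token.target != region.id {
        return false;
    }
    region.value = new_value;
    true
}
```

Since `mutate` takes the token by value, reusing one token for a second mutation is a compile error rather than a runtime check.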

Generated from RuVix research swarm (5 agents, 6,147 lines of analysis)
