Production AI Reliability & Operational Evidence | Observability · Evals · Replay
Platform artifacts for production AI and agent-runtime systems: workflow orchestration, tool-call permission records, agent-step telemetry, release manifests, inference / decision event schemas, eval traces, replay / reconstruction packets, and rollout gates.
PhD research: operational evidence for production AI.
- Production AI Reliability (release manifests, eval-to-release gates, incident evidence)
- Agentic Runtime Evidence (tool-call permission records, workflow / agent-step evidence)
- Observability & Evals Infrastructure (eval / telemetry linkage, traces, quality loops)
- Operational Evidence & Replay (event identity, lineage, reconstruction packets)
- Policy / Permission Controls (rules / policy lifecycle, authorization evidence)
- Release Gates & Incident Reconstruction (canary, shadow, rollback, postmortem packets)
- Platform Engineering (distributed services, event streaming, Kubernetes, GitOps, multi-cloud)
Strongest current public proof:
- operational-evidence-plane - public reference implementation for production AI / agent-runtime operational evidence: release manifest, agent-step event, tool-call permission packet, operational trace / eval result, reconstruction packet, deterministic code-review demo, Bedrock translation. Apache-2.0. Citable archive: Zenodo DOI 10.5281/zenodo.20051037.
- decision-trace-reconstructor - reconstructs agent / automated-decision traces and reports missing or opaque decision facts, with adapters for LangSmith, OpenTelemetry, Bedrock, OpenAI Agents, Anthropic, MCP, and other sources.
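To make "missing or opaque decision facts" concrete, here is a minimal Python sketch of trace reconstruction over parent-linked agent-step events. It is not decision-trace-reconstructor's API: `StepEvent`, `reconstruct`, `REQUIRED_FACTS`, and the `<redacted>` marker are hypothetical, and the sketch assumes a single linear trace with exactly one root.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical fact names each agent step is expected to carry (not a real schema).
REQUIRED_FACTS = ("actor", "tool", "permission", "input_hash", "output_hash")
# Values treated as "present but opaque" (hypothetical convention).
OPAQUE_MARKERS = (None, "", "<redacted>")


@dataclass
class StepEvent:
    event_id: str
    parent_id: Optional[str]  # None marks the root step
    facts: dict = field(default_factory=dict)


def reconstruct(events):
    """Walk parent -> child links from the root; return (chain, gap report, orphans)."""
    by_parent = {}
    for ev in events:
        by_parent.setdefault(ev.parent_id, []).append(ev)

    roots = by_parent.get(None, [])
    if len(roots) != 1:
        raise ValueError(f"expected exactly one root event, found {len(roots)}")

    chain, report, seen = [], {}, set()
    current = roots[0]
    while current is not None:
        if current.event_id in seen:
            raise ValueError(f"cycle or duplicate event_id: {current.event_id}")
        seen.add(current.event_id)
        chain.append(current.event_id)

        # Missing: fact key absent. Opaque: key present but value carries no information.
        missing = [f for f in REQUIRED_FACTS if f not in current.facts]
        opaque = [f for f in REQUIRED_FACTS
                  if f in current.facts and current.facts[f] in OPAQUE_MARKERS]
        if missing or opaque:
            report[current.event_id] = {"missing": missing, "opaque": opaque}

        children = by_parent.get(current.event_id, [])
        if len(children) > 1:
            raise ValueError(f"branching at {current.event_id}; sketch assumes a linear trace")
        current = children[0] if children else None

    # Events never reached from the root (e.g. a dangling parent_id) are lineage gaps.
    orphans = sorted(ev.event_id for ev in events if ev.event_id not in seen)
    return chain, report, orphans


events = [
    StepEvent("e1", None, {"actor": "agent", "tool": "search", "permission": "allow",
                           "input_hash": "h1", "output_hash": "h2"}),
    StepEvent("e2", "e1", {"actor": "agent", "tool": "write_file",
                           "permission": "<redacted>", "input_hash": "h3"}),
    StepEvent("e9", "unknown-parent", {}),
]
chain, report, orphans = reconstruct(events)
print(chain)    # ['e1', 'e2']
print(report)   # {'e2': {'missing': ['output_hash'], 'opaque': ['permission']}}
print(orphans)  # ['e9']
```

Separating "missing" (key absent) from "opaque" (key present, no usable value) follows the distinction in the description above; how the actual tool defines and reports those categories may differ.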
Foundational operational-evidence artifacts:
- decision-event-schema - JSON Schema for decision / action events and reconstruction-oriented evidence identity.
- evidence-collector-sdk - collects and structures operational telemetry into decision evidence records.
- evidence-sufficiency-calc - computes whether available operational proof is sufficient for a decision context.
- governance-drift-toolkit - monitors degradation of governance evidence in delayed-label environments.
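A minimal Python sketch of the evidence-identity and sufficiency ideas above, assuming that identity is a SHA-256 hash over canonical JSON and that "sufficient" means covering a per-context set of required evidence kinds. `REQUIREMENTS`, `evidence_id`, `sufficiency`, the evidence kind names, and the default threshold of 1.0 are all hypothetical and do not reflect the actual interfaces of decision-event-schema or evidence-sufficiency-calc.

```python
import hashlib
import json

# Hypothetical mapping: decision context -> evidence kinds required to support it.
REQUIREMENTS = {
    "tool_call": {"permission_record", "agent_step_event"},
    "release": {"release_manifest", "eval_result", "rollout_gate"},
}


def evidence_id(record: dict) -> str:
    """Deterministic identity: SHA-256 over canonical JSON (sorted keys, compact separators)."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def sufficiency(context: str, records: list, threshold: float = 1.0) -> dict:
    """Report which required evidence kinds are present and whether coverage meets the threshold."""
    required = REQUIREMENTS[context]
    relevant = [r for r in records if r.get("kind") in required]
    present = {r["kind"] for r in relevant}
    coverage = len(present) / len(required)
    return {
        "coverage": coverage,
        "missing": sorted(required - present),
        "sufficient": coverage >= threshold,
        "evidence_ids": sorted(evidence_id(r) for r in relevant),
    }


records = [
    {"kind": "release_manifest", "model": "m-1", "commit": "abc123"},
    {"kind": "eval_result", "suite": "regression", "passed": True},
    {"kind": "chat_log", "note": "not required for a release decision"},
]
result = sufficiency("release", records)
print(result["missing"])     # ['rollout_gate']
print(result["sufficient"])  # False
```

Canonicalizing before hashing means two collectors that emit the same fields in a different key order produce the same identity; a production schema would likely also need rules for floats, timestamps, and Unicode normalization.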
Supporting policy-as-code project:
- RuleHub - Policy-as-Code ecosystem for AI / ML guardrails, policy enforcement, and reproducible evidence.