Founder of AgentiCraft — infrastructure layer for production multi-agent systems.
I specialize in multi-agent systems architecture, LLM infrastructure, and distributed systems reliability. My work sits at the intersection of formal methods and production engineering — building systems that are provably correct, not just empirically okay.
Fault-Dependent Resilience in Multi-Agent LLM Systems
Extends classical network reliability theory to stochastic agent quality. The core result is an if-and-only-if characterization of when topology choice actually matters: under crash-stop faults all mesh topologies are equivalent (a mathematical identity), while Byzantine faults break that equivalence in ways determined by the coordination protocol, not the graph structure.
Validated across ~34,000 LLM experiments spanning 13 coordination topologies, two fault regimes, two task domains, and two model generations. Preparing for submission to a top-tier ML systems venue.
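
For readers new to the area, here is a minimal sketch of the classical quantity this work builds on: all-terminal reliability, the probability that a graph stays connected when each edge works independently with probability `p`. It is the textbook edge-failure version computed by brute-force enumeration, not the generalized quality-weighted polynomial from the research. The function names are illustrative and are not taken from any library listed here.

```python
from itertools import combinations


def is_connected(nodes, edges):
    """Return True if `edges` connect every node in `nodes`."""
    nodes = list(nodes)
    adj = {n: set() for n in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    # Depth-first search from an arbitrary start node.
    seen, stack = {nodes[0]}, [nodes[0]]
    while stack:
        for nxt in adj[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return len(seen) == len(nodes)


def all_terminal_reliability(nodes, edges, p):
    """Sum p^k (1-p)^(m-k) over every surviving edge set that keeps the graph connected."""
    m = len(edges)
    total = 0.0
    for k in range(m + 1):
        for alive in combinations(edges, k):
            if is_connected(nodes, alive):
                total += p**k * (1 - p) ** (m - k)
    return total


# Triangle: the closed form is 3p^2 - 2p^3, which is about 0.972 at p = 0.9.
triangle = [("a", "b"), ("b", "c"), ("a", "c")]
print(all_terminal_reliability("abc", triangle, 0.9))
```

Enumeration is exponential in the number of edges, so this only suits toy graphs; it is meant to make the definition concrete rather than to be efficient.
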
Standalone libraries from this research:
| Library | Description |
|---|---|
| stochastic-circuit-breaker | CUSUM-based circuit breaker for LLM agents and stochastic systems. 4-state FSM with statistically principled degradation detection and provably minimax-optimal detection delay. |
| reliability-polynomials | Generalized reliability polynomials where coefficients encode quality, not just connectivity. Fault-dependent crossover analysis, three theorems. |
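
To give a flavor of the change-detection idea behind the circuit breaker, here is a minimal one-sided CUSUM sketch that raises an alarm when a stream of quality scores drops in mean. It assumes Gaussian scores with known in-control mean `mu0`, degraded mean `mu1`, and common standard deviation `sigma`. The class name, parameters, and threshold are hypothetical and are not the API of stochastic-circuit-breaker; the 4-state FSM is not modeled here.

```python
from dataclasses import dataclass


@dataclass
class CusumDetector:
    """One-sided CUSUM for a drop in mean quality (hypothetical helper)."""

    mu0: float        # assumed in-control mean quality score
    mu1: float        # assumed degraded mean quality score (mu1 < mu0)
    sigma: float      # assumed common standard deviation
    threshold: float  # alarm threshold on the CUSUM statistic
    stat: float = 0.0

    def update(self, x: float) -> bool:
        """Feed one observation; return True once the statistic reaches the threshold."""
        # Gaussian log-likelihood ratio of "degraded" versus "in control".
        llr = (self.mu1 - self.mu0) / self.sigma**2 * (x - (self.mu0 + self.mu1) / 2)
        # Accumulate evidence, resetting to zero whenever it goes negative.
        self.stat = max(0.0, self.stat + llr)
        return self.stat >= self.threshold


def first_alarm(detector, scores):
    """Return the index of the first alarm, or None if no alarm fires."""
    for i, x in enumerate(scores):
        if detector.update(x):
            return i
    return None


# Five healthy scores followed by five degraded ones.
scores = [0.9] * 5 + [0.6] * 5
detector = CusumDetector(mu0=0.9, mu1=0.6, sigma=0.1, threshold=5.0)
print(first_alarm(detector, scores))  # prints 6: the second degraded sample
```

The threshold trades false alarms against detection delay: raising it makes spurious trips rarer but means more degraded responses pass through before the breaker would open.
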
Multi-Agent Systems — mesh coordination architecture, fault-dependent topology selection, Byzantine fault tolerance for LLM systems, stochastic service mesh, MCP/A2A protocol integration
Formal Methods — session type theory for deadlock-freedom guarantees, runtime property verification, CSP process algebra, refinement checking
LLM Infrastructure — provider-agnostic inference abstraction, statistical circuit breakers with CUSUM-optimal change detection, quality-weighted reliability theory
Distributed Systems — consensus protocols, fault injection and fault modeling, observability, Kubernetes-native deployment
Languages: Python (expert), C++, TypeScript, SQL, Bash
AI/ML: PyTorch, RAG, fine-tuning (LoRA, QLoRA), LLM evaluation, OpenTelemetry
Infrastructure: Kubernetes, Docker, Helm, CI/CD, service mesh, PostgreSQL, Redis, Qdrant
Cloud: AWS, GCP, Azure, Nebius AI Cloud
- B.Sc. Industrial Engineering & Management (Data Science concentration) — Tel Aviv University
- Advanced Data Science & AI Program — Nebius Academy (Y-DATA), Tel Aviv University
- Previously: AI & Infrastructure Engineer at Visual Arena (Gothenburg, Sweden)

