I build production AI products with a focus on:
- LLM evaluation — rubrics, LLM-as-judge, safety gates, regression reports
- AI memory — RAG, structured context, second-brain systems
- AI workflows — agents, automation, tools, calendars, reports
- Product engineering — full-stack interfaces that make AI usable
- Delivery leadership — 6 years shipping software with cross-functional teams

LLM output QA platform for testing AI responses before they reach users.
Includes: LLM-as-judge, claim grounding, safety gates, regression-style reports, human review.
Stack: Next.js · TypeScript · OpenAI · Zod · Vercel
Code: ai-evaluation-tool

AI second brain / life analytics system for memory, tasks, goals, emotions, and personal workflows.
Focus: AI memory, daily signals, structured reflection, personal operating system.
Stack: Next.js · TypeScript · AI memory architecture
Code: shadow-ai-second-brain

Experiments around retrieval, structured memory, and project context for AI agents.
Focus: RAG, memory comparison, document/project context, retrieval quality.
Code: rag-memory-playground

Workspace concept for managing AI agents, projects, task queues, statuses, logs, and memory.
Focus: multi-agent workflows, orchestration, project memory, AI operating systems.
Code: Agent-Studio-App

AI: LLM evaluation · RAG · AI memory · prompt systems · agents · workflow automation
Engineering: TypeScript · Next.js · React · Node.js · REST APIs · Azure · Vercel
Delivery: stakeholder communication · QA workflows · technical planning · production delivery
6 years in production software delivery — from implementation work to senior team lead.
I like building AI systems that survive real users, messy workflows, edge cases, and business constraints.

- Portfolio: shatalov.dev
- GitHub: @Qalipso


