I build AI systems that actually work in production — not just in notebooks.
My focus is the full stack between a research paper and a deployed model: fine-tuning, agent orchestration, MLOps pipelines, and the infrastructure that keeps everything running reliably. I care about what happens after the demo, which means proper logging, testing, CI/CD, and experiment tracking from day one.
Most of my work is open source. If something I built saves you a week of debugging, that's the point.
LLM Fine-Tuning — domain-specific models trained on high-quality, custom-built datasets. Not generic instruction tuning. Purpose-built models that reason differently because they were trained differently. My current flagship is FinReasoner — a Qwen 2.5 14B model fine-tuned on 12,500 gold-standard SEC filing analyses across 580 companies and 5 years of data. The dataset took about 30,000 LLM calls and a multi-agent generator-auditor pipeline to build. The model does causal financial reasoning, not summarization.
AI Agents — single agents, multi-agent systems, agentic RAG, voice agents, browser agents. Built with LangGraph, CrewAI, and AutoGen depending on what the task needs. Every project has a live demo on Hugging Face Spaces so you can try it before cloning anything.
MLOps & LLMOps — experiment tracking with MLflow, pipeline orchestration with Apache Airflow, containerization with Docker and Docker Compose, CI/CD with GitHub Actions and GitLab CI. Every project is dockerized and pushed to Docker Hub. Every ML experiment is tracked. Nothing runs only on my machine.
🤖 AI Agents — ai-engineering-hub
Five production-grade agent projects, all dockerized, all with live demos on Hugging Face Spaces.
| Project | What it does | Demo |
|---|---|---|
| Agentic RAG Assistant | Self-correcting Q&A with hallucination detection and hybrid search | ▶ Try it |
| Voice AI Assistant | Speak → AI response → downloadable audio. Multi-language. | ▶ Live app |
| AI Podcast Generator | News URLs → produced podcast episode with MP3 | ▶ Try it |
| YouTubeScriptMaster | Any YouTube video → structured script, insights, markdown export | ▶ Try it |
| AI Presentation Generator | Plain text → full PowerPoint deck with AI-generated images | ▶ Try it |
Stack across these projects: LangGraph · CrewAI · AutoGen · Groq · OpenAI · Anthropic · Streamlit · Docker · Loguru · Pytest
🧠 Fine-Tuning — FinReasoner
A domain-specialized financial reasoning model. Not a chatbot, not a summarizer — a model trained to do the hard part of equity research: connecting a metric change to a root cause to a forward implication, grounded in evidence, without hallucinating.
What makes it different from prompt-engineering a general model: the dataset was purpose-built through a multi-agent generator-auditor pipeline that took tens of thousands of LLM calls. Every training record links a numerical change to a textual cause to a market outcome. The model doesn't learn to sound like a financial analyst; it learns to reason like one.
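A generator-auditor loop of this kind can be sketched roughly as follows. This is an illustration under assumptions, not the actual FinReasoner pipeline: `Record`, `build_gold_set`, and the `generate_record` / `audit_record` callables are hypothetical stand-ins for the LLM-backed agents. The only detail taken from this README is the score cutoff of 80.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List

GOLD_THRESHOLD = 80  # score cutoff quoted in the training-data row of the table


@dataclass
class Record:
    """One candidate training record (field names are hypothetical)."""
    metric_change: str  # the numerical change, e.g. "gross margin -3.1 pts"
    cause: str          # the textual cause cited from the filing
    outcome: str        # the market outcome linked to it
    score: int = 0      # filled in by the auditor


def build_gold_set(
    filings: Iterable[str],
    generate_record: Callable[[str], Record],
    audit_record: Callable[[Record], int],
    threshold: int = GOLD_THRESHOLD,
) -> List[Record]:
    """Generate one record per filing, audit it, keep only high scorers."""
    gold: List[Record] = []
    for filing in filings:
        record = generate_record(filing)      # generator agent (assumed)
        record.score = audit_record(record)   # auditor agent (assumed)
        if record.score >= threshold:
            gold.append(record)
    return gold


if __name__ == "__main__":
    # Stub agents so the sketch runs without any LLM calls.
    def fake_generate(filing: str) -> Record:
        cause = "" if "thin" in filing else "input costs rose"
        return Record("gross margin -3.1 pts", cause, "guidance cut")

    def fake_audit(record: Record) -> int:
        # A record with no stated cause fails the audit.
        return 90 if record.cause else 40

    kept = build_gold_set(["10-K full", "10-Q thin"], fake_generate, fake_audit)
    print(len(kept), "of 2 records kept")
```

Passing the two agents in as callables keeps the filtering logic testable on its own, with the model calls swapped for stubs.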
| Dimension | Detail |
|---|---|
| Base model | Qwen 2.5 14B Instruct |
| Method | rsLoRA → SFT → DPO alignment |
| Training data | 12,500 gold-standard records (score ≥ 80) |
| Universe | 580 companies · 5 years · 10-K, 10-Q, 10-Q/A |
| Infrastructure | Lambda Labs A100 80GB |
| Framework | Unsloth · TRL · PEFT · BitsAndBytes (4-bit NF4) |
| Published | fcyber/FinReasoner-qwen2.5-14b-instruct |
Three model checkpoints published: Phase 1 SFT · Phase 3 DPO · Final production
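For readers unfamiliar with the "rs" in rsLoRA: rank-stabilized LoRA changes only the factor that scales the adapter update, from `alpha / r` to `alpha / sqrt(r)`, so the update does not shrink as quickly at higher ranks. The sketch below shows that difference in plain Python. The `alpha` and rank values are illustrative assumptions, not FinReasoner's actual hyperparameters, and the helper names are made up for this example.

```python
import math


def lora_scale(alpha: float, rank: int) -> float:
    """Standard LoRA scaling factor applied to the adapter update."""
    return alpha / rank


def rslora_scale(alpha: float, rank: int) -> float:
    """Rank-stabilized LoRA scaling factor."""
    return alpha / math.sqrt(rank)


if __name__ == "__main__":
    alpha = 16  # illustrative value only
    for rank in (8, 64, 256):
        print(
            f"rank={rank:>3}  "
            f"lora={lora_scale(alpha, rank):.4f}  "
            f"rslora={rslora_scale(alpha, rank):.4f}"
        )
```

In PEFT this behaviour is switched on with `use_rslora=True` on `LoraConfig`; the rest of the adapter setup stays the same.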
⚙️ MLOps — mlops-hub
MLOps projects built the way production systems actually need to be built — not just model training scripts, but full pipelines with experiment tracking, orchestration, testing, and CI/CD.
Every project in this hub:
- Tracks all experiments in MLflow — parameters, metrics, artifacts, model registry
- Orchestrates multi-step pipelines with Apache Airflow — scheduled runs, dependency management, failure handling
- Uses Redis for caching, task queuing, and real-time feature serving where applicable
- Is fully containerized with Docker and Docker Compose, pushed to Docker Hub
- Has structured logging via Loguru and a test suite with Pytest
- Ships with a CI/CD pipeline — GitHub Actions or GitLab CI depending on the project
More projects are on the way. The first is live, with the link in the repo.
A few habits that show up across every project:
Logging with Loguru, not print statements. Structured logs from day one. When something breaks in production at 2am, you want context, not a traceback you can't reproduce.
Pytest on everything. Unit tests on the functions that matter, integration tests on the pipelines. Not 100% coverage for its own sake — tests on the things that actually break.
Docker Compose for local development. If it doesn't run in a container, it doesn't ship. Every project has a docker-compose.yml that spins up the full stack including dependencies. No "works on my machine" situations.
MLflow for every experiment. Parameters, metrics, artifacts, model versions — all tracked. Going back to a run from three weeks ago should take 10 seconds, not half an hour of git archaeology.
CI/CD from the start. Tests run on every push. Docker images build and push automatically on merge to main. Nothing gets deployed manually.
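As an illustration of the testing habit above, here is what "tests on the things that actually break" can look like for a small numeric helper. `pct_change` is a hypothetical function written for this example, not code from any of the repos. The tests are plain functions that Pytest would collect by name. With Pytest available, the last one would more idiomatically use `pytest.raises`; it is written with `try`/`except` here so the sketch runs with the standard library alone.

```python
def pct_change(old: float, new: float) -> float:
    """Percentage change from old to new, relative to the size of old."""
    if old == 0:
        raise ValueError("percentage change is undefined for a zero base")
    return (new - old) / abs(old) * 100.0


def test_pct_change_handles_decline():
    assert pct_change(200.0, 150.0) == -25.0


def test_pct_change_handles_negative_base():
    # A loss narrowing from -50 to -25 is an improvement, so the sign is positive.
    assert pct_change(-50.0, -25.0) == 50.0


def test_pct_change_rejects_zero_base():
    try:
        pct_change(0.0, 10.0)
    except ValueError:
        return
    raise AssertionError("expected ValueError for a zero base")
```

The edge cases (negative base, zero base) are the ones that tend to fail silently in real pipelines, which is why they get tests before the happy path gets a second one.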
I write about what I build — the real implementation details, the problems that took days to debug, the design decisions that look obvious in hindsight.
- Why Integrating Apple's Ecosystem into a Local RAG Project Is Harder Than It Looks
- More pieces in progress on FinReasoner, the PPTX agent, and the MLOps stack
