Building production-grade AI for the next billion users — not just the next billion dollars.
I build AI systems that run on hardware most researchers discard.
My work sits at the intersection of edge AI, agentic systems, and low-resource NLP — with a focus on making powerful AI accessible on 4 GB VRAM consumer hardware, in Indian languages, without cloud dependency.
- 🧠 Building CHAARI 2.0 — a privacy-first bilingual agentic AI OS companion (Hinglish, 4 GB VRAM, cryptographic mesh, arXiv in prep)
- ⚡ Researching NMOS — 70B+ inference on 4 GB VRAM via anticipatory behavioral signal loading
- 🎓 MSc Information Technology @ Lovely Professional University (May 2026)
- 💼 ex-Intern @ LG Electronics (predictive maintenance, >95% recall)
- 🏐 State & National Volleyball player — teamwork on and off the court
Comprehensive Hinglish AI Agentic Runtime Interface
Production-grade two-node agentic AI companion running entirely on RTX 2050 (4 GB VRAM). Built solo. No research lab. No cloud budget.
| Component | Detail |
|---|---|
| Scale | 39+ Python modules · 8,000+ lines of code · 369+ automated tests |
| Model | Fine-tuned Qwen 2.5 4.2B on custom Hinglish dataset · 30–40 tok/s on 4 GB VRAM |
| Safety | 7-layer Constitutional AI-inspired pipeline (code-based, not prompt-based) |
| Security | RSA-2048 two-node TCP mesh · nonce replay protection · 3-step handshake |
| RAG | RAPTOR 3-level hierarchical RAG · 1.14 GB vector index · sub-second retrieval |
| Voice | Full-duplex STT + TTS · sub-800 ms conversation latency · sub-100 ms tool calls |
| Vision | OCR + LLaVA 7B for screen/image understanding |
| Research | arXiv paper in preparation (cs.CL / cs.AI) |
Anticipatory Inference for LLMs Using User Interaction Signals
The Zero-Lag Hypothesis: Perceived Latency ≈ max(0, T_load − T_typing)
Running 70B+ parameter models on 4 GB VRAM by using human behavioral signals to mask the physical memory wall.
| Module | Role | Status |
|---|---|---|
| Scout (SmolLM2-135M) | Real-time shard affinity prediction | ✅ 90% accuracy |
| River | Async double-buffered prefetcher | ✅ Zero GPU stall |
| Memory | Paged-KV controller with H2O folding | ✅ Active |
| Engine | Speculative decoding orchestrator (K=15) | ✅ ~16 tok/s on 70B |
| Failure Memory | HNSW vector DB for misprediction learning | 🔄 Next phase |
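The Zero-Lag Hypothesis above reduces to a one-line calculation: if a shard finishes loading before the user finishes typing, the load is invisible; only the overhang past typing time is felt. A minimal sketch (the timings are hypothetical, purely for illustration):

```python
def perceived_latency(t_load: float, t_typing: float) -> float:
    """Perceived Latency = max(0, T_load - T_typing).

    Shard loading that completes while the user is still typing
    is masked entirely; only the overhang past T_typing is felt.
    """
    return max(0.0, t_load - t_typing)

# Hypothetical timings, in seconds:
print(perceived_latency(t_load=2.5, t_typing=4.0))  # 0.0 -> load fully masked
print(perceived_latency(t_load=6.0, t_typing=4.0))  # 2.0 -> 2 s overhang felt
```

This is why the Scout's shard affinity prediction matters: the earlier the prefetcher starts, the larger the typing window it can hide `T_load` behind.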
| Project | Stack | Highlights |
|---|---|---|
| Autonomous Financial Research Agent | LangGraph · MCP · FinBERT | Multi-step reasoning workflow with neurosymbolic guardrails |
| HinglishSearch RAG | Endee VectorDB · Docker · CHAARI 2.0 | Semantic search for Hinglish documents · sub-second retrieval |
| Industrial Predictive Maintenance | PyTorch · LSTM · Isolation Forest | >95% recall · deployed at LG Electronics |
- Core Languages
- AI / ML
- LLMs & GenAI
- RAG & Vector DBs
- Infrastructure
- 🏅 Train/Build Small Language Models — Google DeepMind (Advanced)
- 🏅 Enterprise AI Agents & Fundamentals — Google Cloud
- 🏅 Gemini Enterprise Applications — Google Cloud
- 🏅 Quantitative Research — JPMorgan Chase & Co. (Forage)
- 🏅 Data Analytics — Deloitte Australia (Forage)
"Building AI for the next billion users, not just the next billion dollars."
— Built in Rudrapur. Running everywhere.