Building production-grade AI for the next billion users — not just the next billion dollars.
I build AI systems that run on hardware most researchers discard.
My work sits at the intersection of edge AI, agentic systems, and low-resource NLP — with a focus on making powerful AI accessible on 4 GB VRAM consumer hardware, in Indian languages, without cloud dependency.
- 🧠 Building CHAARI 2.0 — a privacy-first bilingual agentic AI OS companion (Hinglish, 4 GB VRAM, cryptographic mesh, arXiv in prep)
- ⚡ Researching NMOS — 70B+ inference on 4 GB VRAM via anticipatory behavioral signal loading
- 🎓 MSc Information Technology @ Lovely Professional University (May 2026)
- 💼 ex-Intern @ LG Electronics (predictive maintenance, >95% recall)
- 🏐 State & National Volleyball player — teamwork on and off the court
Comprehensive Hinglish AI Agentic Runtime Interface
Production-grade two-node agentic AI companion running entirely on RTX 2050 (4 GB VRAM). Built solo. No research lab. No cloud budget.
| Component | Detail |
|---|---|
| Scale | 39+ Python modules · 8,000+ lines of code · 369+ automated tests |
| Model | Fine-tuned Qwen 2.5 4.2B on custom Hinglish dataset · 30–40 tok/s on 4 GB VRAM |
| Safety | 7-layer Constitutional AI-inspired pipeline (code-based, not prompt-based) |
| Security | RSA-2048 two-node TCP mesh · nonce replay protection · 3-step handshake |
| RAG | RAPTOR 3-level hierarchical RAG · 1.14 GB vector index · sub-second retrieval |
| Voice | Full-duplex STT + TTS · sub-800 ms conversation latency · sub-100 ms tool calls |
| Vision | OCR + LLaVA 7B for screen/image understanding |
| Research | arXiv paper in preparation (cs.CL / cs.AI) |
Anticipatory Inference for LLMs Using User Interaction Signals
The Zero-Lag Hypothesis: Perceived Latency ≈ max(0, T_load − T_typing)
Running 70B+ parameter models on 4 GB VRAM by using human behavioral signals to mask the physical memory wall.
| Module | Role | Status |
|---|---|---|
| Scout (SmolLM2-135M) | Real-time shard affinity prediction | ✅ 90% accuracy |
| River | Async double-buffered prefetcher | ✅ Zero GPU stall |
| Memory | Paged-KV controller with H2O folding | ✅ Active |
| Engine | Speculative decoding orchestrator (K=15) | ✅ ~16 tok/s on 70B |
| Failure Memory | HNSW vector DB for misprediction learning | 🔄 Next phase |
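The Zero-Lag Hypothesis above reduces to a one-line calculation: if a shard finishes loading before the user finishes typing, the load is invisible; only the overhang past typing time is felt. A minimal sketch (the timings are hypothetical, purely for illustration):

```python
def perceived_latency(t_load: float, t_typing: float) -> float:
    """Perceived Latency = max(0, T_load - T_typing).

    Shard loading that completes while the user is still typing
    is masked entirely; only the overhang past T_typing is felt.
    """
    return max(0.0, t_load - t_typing)

# Hypothetical timings, in seconds:
print(perceived_latency(t_load=2.5, t_typing=4.0))  # 0.0 -> load fully masked
print(perceived_latency(t_load=6.0, t_typing=4.0))  # 2.0 -> 2 s overhang felt
```

This is why the Scout's shard affinity prediction matters: the earlier the prefetcher starts, the larger the typing window it can hide `T_load` behind.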
| Project | Stack | Highlights |
|---|---|---|
| Autonomous Financial Research Agent | LangGraph · MCP · FinBERT | Multi-step reasoning workflow with neurosymbolic guardrails |
| HinglishSearch RAG | Endee VectorDB · Docker · CHAARI 2.0 | Semantic search for Hinglish documents · sub-second retrieval |
| Industrial Predictive Maintenance | PyTorch · LSTM · Isolation Forest | >95% recall · deployed at LG Electronics |
- Core Languages
- AI / ML
- LLMs & GenAI
- RAG & Vector DBs
- Infrastructure
- 🏅 Train/Build Small Language Models — Google DeepMind (Advanced)
- 🏅 Enterprise AI Agents & Fundamentals — Google Cloud
- 🏅 Gemini Enterprise Applications — Google Cloud
- 🏅 Quantitative Research — JPMorgan Chase & Co. (Forage)
- 🏅 Data Analytics — Deloitte Australia (Forage)
"Building AI for the next billion users, not just the next billion dollars."
— Built in Rudrapur. Running everywhere.