ML Engineer · Speech AI Researcher · RIT MS/AI '26

Building end-to-end ML systems: ASR pipelines, RAG architectures, credit risk models, LLM fine-tuning.

Currently @ RIT CLASP Lab · ACL 2026 under review · Open to relocation

Research Assistant · RIT CLASP Lab · Rochester, NY · Aug 2025 – Present

Data Scientist · actyv.ai · Bengaluru, India · Jul 2023 – Sep 2023

Data Science Intern · actyv.ai · Bengaluru, India · Jan 2023 – Jun 2023
- 🎙️ Built a Whisper + Pyannote speech diarization pipeline: 200 h of audio processed, 32% WER reduction (ACL 2026, under review)
- 💳 Shipped an XGBoost ensemble credit-scoring model: 0.91 ROC-AUC, handling 4,000+ loan applications/day, 60% faster underwriting
- 📄 Built a multimodal document intelligence system: 0.89 weighted F1, 1,000+ daily API requests

| | Project | Stack | Impact |
|---|---|---|---|
| 🏆 | PhysioPrompt | BioRadio · Random Forest · Claude API | Best Demo — AWARE-AI Hackathon 2026 |
| ⚙️ | AdAudit | Azure AI · LangGraph · FAISS · GPT-4o | 92% rule-matching precision · 45% ↓ manual review |
| 🎙️ | Speech Pipeline | Whisper · Pyannote · DistilBERT · Docker | 32% WER ↓ · ACL 2026 (under review) |
| 🧠 | LLM Fine-Tuning | LoRA/QLoRA · HuggingFace · Ollama | 60% GPU memory ↓ · offline domain Q&A |
| 🧘 | YogiSync | MediaPipe · Random Forest · Gemini | 0.88 macro F1 · BrickHack 11 |

| Layer | Tools |
|---|---|
| Languages & Data | Python · SQL · Pandas · NumPy |
| ML / DL | PyTorch · Scikit-learn · TensorFlow · Whisper · DistilBERT · LoRA/QLoRA · XGBoost |
| GenAI & Retrieval | LangChain · LangGraph · FAISS · ChromaDB · RAG |
| MLOps & Cloud | Docker · Kubernetes · MLflow · FastAPI · AWS · Azure · Airflow · GitHub Actions |

| Metric | Value |
|---|---|
| WER reduction — speech pipeline | 32% |
| ROC-AUC — credit risk model | 0.91 |
| Rule-matching precision — AdAudit | 92% |
| GPU memory saved — LLM fine-tuning | 60% |
| F1 — dialogue-act classification | 0.87 |
| Loan applications processed / day | 4,000+ |

📬 shubh.sehgal.rit@gmail.com · linkedin.com/in/shubhsehgal2506 · Rochester, NY
