  • Karol Bagh, Delhi

Shubhmeep/README.md

hey, I'm

Shubh Sehgal

ML Engineer · Speech AI Researcher · RIT MS/AI '26

Python · PyTorch · AWS · Docker · LangChain · Azure · HuggingFace

Building end-to-end ML systems — ASR pipelines, RAG architectures, credit risk models, LLM fine-tuning.

Currently @ RIT CLASP Lab  ·  ACL 2026 under review  ·  Open to relocation


experience

Research Assistant   RIT CLASP Lab · Rochester, NY     Aug 2025 – Present
Data Scientist       actyv.ai · Bengaluru, India        Jul 2023 – Sep 2023
Data Science Intern  actyv.ai · Bengaluru, India        Jan 2023 – Jun 2023
  • 🎙️ Built a Whisper + Pyannote speech diarization pipeline — 200h audio processed, 32% WER reduction — ACL 2026 under review
  • 💳 Shipped XGBoost ensemble credit scoring model — 0.91 ROC-AUC, handling 4,000+ loan applications/day, 60% faster underwriting
  • 📄 Multimodal document intelligence system — 0.89 weighted F1, 1,000+ daily API requests

shipped

| Project | Stack | Impact |
| --- | --- | --- |
| 🏆 PhysioPrompt | BioRadio · Random Forest · Claude API | Best Demo, AWARE-AI Hackathon 2026 |
| ⚙️ AdAudit | Azure AI · LangGraph · FAISS · GPT-4o | 92% rule-matching precision · 45% ↓ manual review |
| 🎙️ Speech Pipeline | Whisper · Pyannote · DistilBERT · Docker | 32% WER ↓ · ACL 2026 (under review) |
| 🧠 LLM Fine-Tuning | LoRA/QLoRA · HuggingFace · Ollama | 60% GPU memory ↓ · offline domain Q&A |
| 🧘 YogiSync | MediaPipe · Random Forest · Gemini | 0.88 macro F1 · BrickHack 11 |

stack

| Layer | Tools |
| --- | --- |
| Languages & Data | Python · SQL · Pandas · NumPy |
| ML / DL | PyTorch · Scikit-learn · TensorFlow · Whisper · DistilBERT · LoRA/QLoRA · XGBoost |
| GenAI & Retrieval | LangChain · LangGraph · FAISS · ChromaDB · RAG |
| MLOps & Cloud | Docker · Kubernetes · MLflow · FastAPI · AWS · Azure · Airflow · GitHub Actions |

numbers that matter

| Metric | Value |
| --- | --- |
| WER reduction (speech pipeline) | 32% |
| ROC-AUC (credit risk model) | 0.91 |
| Rule-matching precision (AdAudit) | 92% |
| GPU memory saved (LLM fine-tuning) | 60% |
| F1 (dialogue-act classification) | 0.87 |
| Loan applications processed / day | 4,000+ |
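
For readers unfamiliar with the WER figure above, here is a minimal sketch of how word error rate and a relative reduction are commonly computed. It is not the project's evaluation code: the function names are hypothetical, the sample numbers are made up for illustration, and the README does not say whether the 32% is a relative or an absolute reduction.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level edit distance / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # hypothesis prefix is empty: i deletions
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline: float, improved: float) -> float:
    """Fraction by which `improved` is lower than `baseline`."""
    return (baseline - improved) / baseline


if __name__ == "__main__":
    # Illustrative values only, not results from the speech pipeline.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
    print(relative_reduction(baseline=0.20, improved=0.15))
```

In practice a library such as `jiwer` is often used instead of a hand-rolled edit distance, and transcripts are usually normalized (casing, punctuation) before scoring.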

Pinned

  1. Earthquake-prediction-ML-pipeline Public

    This project implements end-to-end machine learning pipelines, encompassing feature engineering, model training, and inference deployment. Leveraging AWS services, Airflow, Great Expectations, Hops…

    Python 1

  2. AWS-Airflow-DataIngestion-Pipeline Public

    This project entails the development and deployment of a robust data ingestion pipeline leveraging Apache Airflow, orchestrated on an AWS EC2 instance. The pipeline is designed to efficiently extra…

    Python 6 1

  3. YogiSync Public

    A Smart Yoga App that uses AI to provide real-time feedback and guidance for anyone looking to learn yoga or perfect their form.

    HTML

  4. Agentic_AI Public

    This repo is my practice playground for Agentic AI—building and experimenting with LLM agents using LangChain + LangGraph. It includes small, focused examples of stateful workflows, tool calling, a…

    Jupyter Notebook