Data Analyst · ML Engineer · Business Intelligence
Bengaluru, Karnataka, India
I build data systems that drive decisions: predictive ML models, analytics pipelines, and BI dashboards on large-scale, real-world datasets.
What sets me apart: I've analyzed real production data on factory floors at Elecon Engineering and Jinal Engineering, surfaced process inefficiencies that caused 12–15% excess cycle time, and presented the findings to engineering teams. That operational grounding gives me the domain context to separate insight from noise.
Recent highlights:
- XGBoost classifier on 278K+ IPL records (77% accuracy, AUC-ROC 0.83), deployed as a live Streamlit win-probability dashboard with a full ETL → feature engineering → tuning → deployment pipeline
- CDC mortality risk predictor on 433K+ Americans (AUC-ROC 0.8256, 85% recall), delivered as a Streamlit app with a Groq-powered AI clinical assistant (LLaMA 3.3 70B)
- End-to-end SaaS analytics pipeline: identified 3 churn drivers, computed MRR, CAC, LTV & churn rate via SQL cohort analysis, delivered a multi-view Tableau dashboard
Currently: Data Science & Generative AI, Great Lakes Institute of Management (ML · Deep Learning · NLP · LLMs)
Open to: Data Analyst · ML Engineer · BI Analyst · Data Scientist (Bengaluru or Remote)
Languages
ML & AI
Visualization & BI
Databases & Tools
Python · XGBoost · Scikit-Learn · Streamlit · Pandas
- Trained an XGBoost model on 278,000+ ball-by-ball records across 10 IPL seasons
- Engineered 6 predictive features: dynamic run rate, wickets-in-hand, required run rate, pressure index, match phase, and venue factor
- Achieved 77% accuracy and an AUC-ROC of 0.83, beating a logistic regression baseline by 11 points
- Deployed a real-time Streamlit dashboard for live win-probability simulation
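To make the run-rate features above concrete, here is a minimal sketch of how current run rate, required run rate, and wickets in hand could be derived from the state of a T20 chase. The `ChaseState` class, the `chase_features` function, and the output keys are hypothetical names chosen for illustration, and reading "dynamic run rate" as the current run rate is an assumption; the project's actual feature code may differ.

```python
from dataclasses import dataclass

BALLS_PER_INNINGS = 120   # T20 innings: 20 overs of 6 balls
WICKETS_PER_INNINGS = 10


@dataclass
class ChaseState:
    """Hypothetical snapshot of a second-innings chase."""
    target: int        # runs the chasing side needs to win
    runs: int          # runs scored so far
    balls_bowled: int  # legal deliveries faced so far
    wickets_lost: int


def chase_features(state: ChaseState) -> dict:
    """Derive simple per-ball features from the chase state."""
    balls_left = BALLS_PER_INNINGS - state.balls_bowled
    runs_needed = max(state.target - state.runs, 0)

    # Runs per over so far; 0.0 before the first ball to avoid dividing by zero.
    current_rr = 6 * state.runs / state.balls_bowled if state.balls_bowled else 0.0

    # Runs per over still required; infinite if runs remain but no balls do.
    if balls_left > 0:
        required_rr = 6 * runs_needed / balls_left
    else:
        required_rr = float("inf") if runs_needed else 0.0

    return {
        "current_run_rate": current_rr,
        "required_run_rate": required_rr,
        "wickets_in_hand": WICKETS_PER_INNINGS - state.wickets_lost,
        "balls_left": balls_left,
    }


features = chase_features(ChaseState(target=180, runs=90, balls_bowled=60, wickets_lost=3))
print(features)
```

Computing the features per ball, rather than per over, matches the ball-by-ball granularity of the dataset described above.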
Python · SQL · Tableau · Pandas · Seaborn
- Built an enterprise-scale SaaS dataset (5,000+ records) modeled on real CRM, billing & usage-log structures
- Surfaced 3 primary churn drivers through EDA, revealing revenue-risk signals previously undetected
- Computed MRR, Churn Rate, CAC & LTV using SQL cohort analysis
- Designed a multi-view Tableau dashboard for Sales, Product & Customer Success teams
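As a rough illustration of the metrics listed above, the sketch below computes MRR, churn rate, CAC, and LTV for a single month in plain Python. The project itself used SQL cohort analysis; the `saas_metrics` function, its arguments, and the sample numbers are hypothetical. It assumes the common simplified definitions (churn rate = customers lost / customers at start of month, CAC = acquisition spend / new customers, LTV = average revenue per customer / churn rate), which may not match the exact formulas used in the project.

```python
def saas_metrics(subscriptions, churned_ids, acquisition_spend, new_customers):
    """Compute one month's headline SaaS metrics.

    subscriptions: dict of customer_id -> monthly fee for customers
                   active at the start of the month (assumed non-empty).
    churned_ids:   set of customer_ids that cancelled during the month.
    """
    customers = len(subscriptions)
    mrr = sum(subscriptions.values())         # monthly recurring revenue
    churn_rate = len(churned_ids) / customers
    arpu = mrr / customers                    # average revenue per customer
    # Simplified LTV: undefined (infinite) when nobody churned.
    ltv = arpu / churn_rate if churn_rate else float("inf")
    cac = acquisition_spend / new_customers   # customer acquisition cost
    return {"mrr": mrr, "churn_rate": churn_rate, "cac": cac, "ltv": ltv}


# Made-up sample data, for illustration only.
metrics = saas_metrics(
    subscriptions={"a": 100, "b": 200, "c": 300, "d": 400},
    churned_ids={"d"},
    acquisition_spend=600,
    new_customers=3,
)
print(metrics)
```

In a cohort analysis the same calculation would be repeated per signup cohort and month, so that churn can be compared across cohorts rather than averaged over the whole customer base.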
Python · XGBoost · Streamlit · Groq / LLaMA 3.3 70B · Plotly · SHAP · Scikit-Learn
- Built a binary classification system trained on 433,000+ Americans from the CDC BRFSS 2023 survey to identify individuals at HIGH mortality risk using only self-reported behavioral & demographic data, with no lab tests required
- Engineered 4 domain-driven features (`comorbidity_count`, `age_risk_tier`, `health_burden`, `ses_score`) following strict zero data-leakage discipline; caught and fixed a 98.88% phantom-accuracy bug caused by target column leakage
- Achieved an AUC-ROC of 0.8256 and 85% recall at threshold 0.40 with a tuned XGBoost model (RandomizedSearchCV, 50 iterations × 3 folds); outperformed Logistic Regression, Random Forest, LightGBM & Neural Network baselines
- Deployed a full-stack Streamlit app featuring local auth (SHA-256), risk gauge dial, SHAP explainability charts, radar/bar factor charts, and a Groq-powered AI clinical assistant (LLaMA 3.3 70B) for plain-language prediction explanations
- Applied CRISP-DM across 12 documented steps with 7-strategy logical imputation on 433K rows and SHAP TreeExplainer for model transparency
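The recall figure above is reported at a decision threshold of 0.40 rather than the default 0.50. The sketch below shows, on made-up labels and probabilities, how lowering the threshold trades precision for recall. The `recall_precision_at` helper and the sample data are hypothetical and are not taken from the project.

```python
def recall_precision_at(y_true, y_prob, threshold):
    """Return (recall, precision) when predicting positive at prob >= threshold."""
    tp = fp = fn = 0
    for label, prob in zip(y_true, y_prob):
        predicted_positive = prob >= threshold
        if predicted_positive and label == 1:
            tp += 1
        elif predicted_positive:
            fp += 1
        elif label == 1:
            fn += 1
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return recall, precision


# Made-up example: three high-risk cases (1) and two low-risk cases (0).
y_true = [1, 1, 1, 0, 0]
y_prob = [0.90, 0.45, 0.30, 0.42, 0.10]

for threshold in (0.50, 0.40):
    recall, precision = recall_precision_at(y_true, y_prob, threshold)
    print(f"threshold={threshold:.2f} recall={recall:.2f} precision={precision:.2f}")
```

For a risk-screening use case, missing a high-risk individual is usually costlier than a false alarm, which is one plausible reason to favor recall with a lower threshold.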
| Role | Company | Period |
|---|---|---|
| Data & Process Analyst Intern | Elecon Engineering Co. Ltd | Dec 2023 – Mar 2024 |
| Manufacturing Data Intern | Jinal Engineering | May 2023 – Jun 2023 |
- B.E. Mechanical Engineering, GCET, Anand, Gujarat (2021–2024)
- Diploma in Mechanical Engineering, N.G. Patel Polytechnic (2018–2021)
- Data Science & Generative AI Program, Great Lakes Institute of Management (Oct 2025 – Present)
- Machine Learning · Deep Learning · NLP · Generative AI & LLMs · Statistical Modeling
Open to Data Analyst, ML E