omnaik21/README.md

Hi, I'm Om Naik 👋

Data Analyst · ML Engineer · Business Intelligence
Bengaluru, Karnataka, India


🧑‍💻 About Me

I build data systems that drive decisions: predictive ML models, analytics pipelines, and BI dashboards on large-scale, real-world datasets.

What sets me apart: I've analyzed real production data on factory floors at Elecon Engineering and Jinal Engineering, surfaced process inefficiencies that caused 12–15% excess cycle time, and presented the findings to engineering teams. That operational grounding gives me the domain context to separate insight from noise.

Recent highlights:

  • 🏏 XGBoost classifier on 278K+ IPL records → 77% accuracy, AUC-ROC 0.83; deployed as a live Streamlit win-probability dashboard with a full ETL → feature engineering → tuning → deployment pipeline
  • 🫀 CDC mortality risk predictor on 433K+ Americans → AUC-ROC 0.8256, 85% recall; Streamlit app with a Groq-powered AI clinical assistant (LLaMA 3.3 70B)
  • 📊 End-to-end SaaS analytics pipeline → identified 3 churn drivers; computed MRR, CAC, LTV, and churn rate via SQL cohort analysis; delivered a multi-view Tableau dashboard

🎓 Currently: Data Science & Generative AI, Great Lakes Institute of Management (ML · Deep Learning · NLP · LLMs)

💼 Open to: Data Analyst · ML Engineer · BI Analyst · Data Scientist (Bengaluru or Remote)


🛠️ Tech Stack

Languages

Python · SQL

ML & AI

Scikit-Learn · XGBoost · Pandas · NumPy · SHAP · Groq

Visualization & BI

Tableau · Power BI · Streamlit · Plotly

Databases & Tools

MySQL · Git · Jupyter · VS Code


🚀 Featured Projects

Python · XGBoost · Scikit-Learn · Streamlit · Pandas

  • Trained an XGBoost model on 278,000+ ball-by-ball records across 10 IPL seasons
  • Engineered 6 predictive features: dynamic run rate, wickets-in-hand, required run rate, pressure index, match phase, and venue factor
  • Achieved 77% accuracy and an AUC-ROC of 0.83, beating a logistic regression baseline by 11 points
  • Deployed a real-time Streamlit dashboard for live win-probability simulation
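
As a rough illustration of the run-rate features listed above, here is a minimal sketch of how three of them (current run rate, required run rate, wickets in hand) could be derived from a chase snapshot. The `ChaseState` class, the `chase_features` function, and the 120-ball innings length are assumptions made for this example, not the project's actual code. The pressure index, match phase, and venue factor are left out because the README does not define them.

```python
import math
from dataclasses import dataclass

# Assumption: a full 20-over innings (120 legal balls), no rain reduction.
BALLS_PER_INNINGS = 120


@dataclass
class ChaseState:
    """Hypothetical snapshot of the second innings."""
    target: int        # runs the chasing side needs to win
    runs_scored: int   # runs scored so far in the chase
    balls_bowled: int  # legal deliveries bowled so far
    wickets_lost: int  # wickets fallen so far


def chase_features(state: ChaseState) -> dict:
    """Derive simple run-rate features from a chase snapshot."""
    balls_left = BALLS_PER_INNINGS - state.balls_bowled
    runs_needed = max(state.target - state.runs_scored, 0)

    # Run rates are expressed in runs per over (6 balls).
    if state.balls_bowled:
        current_rr = state.runs_scored * 6 / state.balls_bowled
    else:
        current_rr = 0.0

    if balls_left > 0:
        required_rr = runs_needed * 6 / balls_left
    else:
        # No balls left: the chase is either complete or impossible.
        required_rr = 0.0 if runs_needed == 0 else math.inf

    return {
        "current_run_rate": current_rr,
        "required_run_rate": required_rr,
        "wickets_in_hand": 10 - state.wickets_lost,
        "balls_left": balls_left,
    }


if __name__ == "__main__":
    # Chasing 181, at 90/3 after 10 overs.
    print(chase_features(ChaseState(181, 90, 60, 3)))
```

Features like these are recomputed after every ball, which is what lets a ball-by-ball model update its win probability as the match state changes.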

Python · SQL · Tableau · Pandas · Seaborn

  • Built a SaaS dataset (5,000+ records) modeled on real CRM, billing, and usage-log structures
  • Surfaced 3 primary churn drivers through EDA, revealing previously undetected revenue-risk signals
  • Computed MRR, Churn Rate, CAC & LTV using SQL cohort analysis
  • Designed a multi-view Tableau dashboard for Sales, Product & Customer Success teams
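
The project computes these metrics in SQL. For readers unfamiliar with them, the sketch below shows one common set of definitions in Python. The function names and the exact formulas (for example, LTV as average revenue per user divided by monthly churn) are assumptions for illustration and may differ from the definitions used in the project's queries.

```python
def mrr(active_monthly_fees):
    """Monthly recurring revenue: sum of fees from active subscriptions."""
    return sum(active_monthly_fees)


def churn_rate(customers_at_start, customers_churned):
    """Share of customers at the start of the period who cancelled during it."""
    if customers_at_start == 0:
        return 0.0
    return customers_churned / customers_at_start


def cac(sales_and_marketing_spend, new_customers):
    """Customer acquisition cost: spend divided by customers acquired."""
    if new_customers == 0:
        raise ValueError("CAC is undefined when no customers were acquired")
    return sales_and_marketing_spend / new_customers


def ltv(avg_revenue_per_user, monthly_churn):
    """Simple lifetime value: ARPU divided by the monthly churn rate."""
    if monthly_churn <= 0:
        raise ValueError("this LTV formula needs a positive churn rate")
    return avg_revenue_per_user / monthly_churn


if __name__ == "__main__":
    # Made-up numbers, purely for illustration.
    print(mrr([50, 100, 150]))
    print(churn_rate(200, 10))
    print(cac(10_000, 40))
    print(ltv(100, 0.05))
```

In a cohort analysis, the same ratios are computed per signup cohort (for example, per signup month) rather than over the whole customer base, so that retention can be compared across cohorts.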

Python · XGBoost · Streamlit · Groq / LLaMA 3.3 70B · Plotly · SHAP · Scikit-Learn

  • Built a binary classification system trained on 433,000+ Americans from the CDC BRFSS 2023 survey to identify individuals at high mortality risk using only self-reported behavioral and demographic data, with no lab tests required
  • Engineered 4 domain-driven features (comorbidity_count, age_risk_tier, health_burden, ses_score) under strict zero-data-leakage discipline; caught and fixed a 98.88% phantom-accuracy bug caused by target column leakage
  • Achieved an AUC-ROC of 0.8256 and 85% recall at a 0.40 threshold with a tuned XGBoost model (RandomizedSearchCV, 50 iterations × 3 folds); outperformed Logistic Regression, Random Forest, LightGBM, and Neural Network baselines
  • Deployed a full-stack Streamlit app featuring local auth (SHA-256), risk gauge dial, SHAP explainability charts, radar/bar factor charts, and a Groq-powered AI clinical assistant (LLaMA 3.3 70B) for plain-language prediction explanations
  • Applied CRISP-DM across 12 documented steps with 7-strategy logical imputation on 433K rows and SHAP TreeExplainer for model transparency
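
The recall figure above is reported at a 0.40 decision threshold rather than the default 0.50. The sketch below, using made-up labels and probabilities (not the project's data), illustrates why lowering the threshold raises recall: more true positives clear the cutoff, usually at the cost of more false positives. The `recall_at_threshold` helper is a hypothetical name introduced for this example.

```python
def recall_at_threshold(y_true, y_prob, threshold):
    """Recall when predicting positive for probabilities >= threshold."""
    true_pos = sum(1 for t, p in zip(y_true, y_prob) if t == 1 and p >= threshold)
    false_neg = sum(1 for t, p in zip(y_true, y_prob) if t == 1 and p < threshold)
    total_pos = true_pos + false_neg
    return true_pos / total_pos if total_pos else 0.0


# Toy data for illustration only: 4 positives, 4 negatives.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_prob = [0.90, 0.55, 0.45, 0.20, 0.60, 0.41, 0.30, 0.10]

if __name__ == "__main__":
    for threshold in (0.50, 0.40):
        recall = recall_at_threshold(y_true, y_prob, threshold)
        print(f"threshold={threshold:.2f} recall={recall:.2f}")
```

For a screening-style risk model, missing a high-risk individual is typically costlier than a false alarm, which is one plausible reason to prefer a threshold below 0.50.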

💼 Experience

| Role | Company | Period |
| --- | --- | --- |
| Data & Process Analyst Intern | Elecon Engineering Co. Ltd | Dec 2023 – Mar 2024 |
| Manufacturing Data Intern | Jinal Engineering | May 2023 – Jun 2023 |

🎓 Education & Certifications

  • 🏫 B.E. Mechanical Engineering, GCET, Anand, Gujarat (2021–2024)
  • 🏫 Diploma in Mechanical Engineering, N.G. Patel Polytechnic (2018–2021)
  • 📜 Data Science & Generative AI Program, Great Lakes Institute of Management (Oct 2025 – Present)
    • Machine Learning · Deep Learning · NLP · Generative AI & LLMs · Statistical Modeling


Popular repositories

  1. ipl-win-probability Public

    "Ball-by-ball IPL win probability predictor using XGBoost | Python + Tableau | 1169 matches"

    Jupyter Notebook 1

  2. -saas-churn-analytics Public

    End-to-end SaaS churn analytics project using SQL, Python & Tableau

    Jupyter Notebook 3

  3. Defect_analyzer_ai Public

    A web-based Streamlit app that lets users upload images of building structures and analyze defects using a Gemini model

    Python

  4. ragdemo.ai Public

    Retrieval-Augmented Generation (RAG) is a technique that enhances the capabilities of LLMs by combining information retrieval with text generation. Instead of relying on pre-trained knowledge, RAG fe…

    Python

  5. ML_model.ai Public

    AI-powered Streamlit app that learns from the given data, predicts unseen data using classical ML algorithms, and generates suggestions and improvements for the models using Gemini AI al…

    Python

  6. cdc-mortality-risk-predictor Public

    Mortality risk prediction using CDC BRFSS 2023: XGBoost | ROC-AUC 0.8256 | Recall 85% | Streamlit app with Groq AI assistant

    Jupyter Notebook