MelihOrel/README.md

Hi, I'm Melih 👋

My favorite moment in any data project is the one where a "clean" result turns out not to be so clean after all. I'm a Data Scientist building end-to-end, production-quality machine learning pipelines on real-world data — and in every project, I spend real time getting to know the data before I write a single line of modeling code.

🔭 What I'm currently working on

I'm actively building a portfolio around real, messy, far-from-ideal datasets. The goal isn't just to train a model — it's to surface and fix the real bugs that show up during data collection, cleaning, and validation. Every repo follows the same principles:

  • End-to-end, production-grade pipelines with no placeholders
  • A validation-first methodology — profiling the data before touching the model
  • Honest, transparent reporting of results (including data limitations)

🧠 Featured projects

  • LAPD Crime Analysis — Spatiotemporal crime hotspot detection using KDE and Getis-Ord Gi*, paired with a Fairlearn-based fairness audit of a LightGBM case-clearance model. The model looked "fair" mainly because it was uniformly pessimistic across every demographic group — that was the real finding.
  • Semantic Search / RAG Engine (from scratch) — A RAG system built without LangChain or LlamaIndex: BM25, dense, and hybrid retrieval (via Qdrant), with a full evaluation suite (Recall@k, MRR, nDCG).
  • Statistical Detection of Global Warming — Testing the statistical significance of warming trends using Mann-Kendall, Theil-Sen estimators, and AR(1)-corrected OLS.
  • METABRIC Survival — XGBoost + SHAP pipeline for 5-year breast cancer survival prediction, handling medical data imbalance with ADASYN.
  • Explainable Airline Sentiment — An NLP pipeline that doesn't just predict, but explains why — using XAI to surface the reasoning behind each decision.
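The LAPD finding above, a model that looks "fair" because it is equally pessimistic for everyone, can be illustrated with a toy sketch. This is not the project's code: the labels, predictions, group names, and the helper functions `group_rates` and `parity_gap` are all hypothetical, and the real audit uses Fairlearn rather than these hand-rolled metrics. It only shows why a zero parity gap can coexist with poor recall in every group.

```python
from collections import defaultdict


def group_rates(y_true, y_pred, groups):
    """Return {group: (selection_rate, recall)} for binary 0/1 labels."""
    stats = defaultdict(lambda: {"n": 0, "pred_pos": 0, "pos": 0, "tp": 0})
    for t, p, g in zip(y_true, y_pred, groups):
        s = stats[g]
        s["n"] += 1          # rows in this group
        s["pred_pos"] += p   # predicted positives
        s["pos"] += t        # actual positives
        s["tp"] += t * p     # true positives
    return {
        g: (
            s["pred_pos"] / s["n"],
            s["tp"] / s["pos"] if s["pos"] else 0.0,
        )
        for g, s in stats.items()
    }


def parity_gap(rates):
    """Largest difference in selection rate between any two groups."""
    selection = [sel for sel, _ in rates.values()]
    return max(selection) - min(selection)


# Hypothetical toy data: two groups, half of each group's cases are positive,
# but the model predicts "positive" only once per group.
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 0, 1, 0, 0, 0]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]

rates = group_rates(y_true, y_pred, groups)
gap = parity_gap(rates)
print(rates)  # both groups: selection rate 0.25, recall 0.5
print(gap)    # 0.0
```

A parity gap of zero here says nothing about quality: the model misses half of the positives in both groups, so a parity metric needs to be read alongside per-group recall.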

🛠️ Tech I work with

Python · scikit-learn · XGBoost / LightGBM · SHAP · Fairlearn · Qdrant · LangChain / LangGraph · pandas · PySAL · statsmodels · TensorFlow

📫 Get in touch

I share completed projects on LinkedIn, explaining the technical depth in plain language.


⭐️ If a repo catches your eye, feel free to dig in — every one of them is built on real data, real bugs, and real fixes.

Pinned

  1. Fruit-classification-with-opencv-and-tensorflow (Public)

    Jupyter Notebook 3

  2. explainable-airline-sentiment (Public)

    ✈️ An NLP-driven sentiment analysis pipeline that classifies airline customer feedback while leveraging Explainable AI (XAI). It doesn't just predict sentiment—it reveals exactly why a model made i…

    Python 1

  3. Semantic-Search-RAG-Engine-Built-From-Scratch (Public)

    🔍 A Semantic Search and RAG engine built entirely from scratch—no LangChain or LlamaIndex. It features custom chunking experiments, dense/sparse/hybrid retrieval (Qdrant + BM25), and rigorous IR ev…

    Python 1

  4. Statistical-Detection-Quantification-of-Global-Warming (Public)

    🌡️ A rigorous statistical pipeline for quantifying global warming trends. This project moves beyond simple visualization, employing robust time-series decomposition, anomaly detection, and advanced…

    Python 1

  5. Autonomous-Customer-Profiling-Financial-Clustering-Agent (Public)

    🤖 An autonomous AI agent designed for end-to-end financial data clustering and customer profiling. Powered by LLMs and unsupervised machine learning, it automatically analyzes financial datasets, s…

    Python 1

  6. metabric-survival (Public)

    🧬 An explainable machine learning pipeline predicting 5-year breast cancer survival using the METABRIC genomic dataset. It leverages XGBoost, ADASYN for medical data imbalance, and SHAP (Explainabl…

    Python 1