From raw data to trained models: every algorithm, every experiment, documented.
Maintained and updated by: Bhavya Kansal | Visit: bhavyakansal.dev | Patiala, Punjab, India
- About This Repository
- Who Is This For?
- Tech Stack
- Notebook Index
- Getting Started
- Repository Roadmap
- Datasets
- Acknowledgements
- Contributing
- Legal & License
- Contact
This is not just a notebook dump. It is a structured, continuously updated ML knowledge base maintained by Bhavya Kansal, an AI/ML engineer and developer, who built this repository to provide a structured understanding of machine learning from beginner to advanced level.
Every notebook in this repository:
- Is written from scratch with clean, readable code
- Covers theory + implementation, not just copy-paste code
- Is beginner-friendly, designed so anyone can open it and understand it
- Reflects real internship and coursework experiments, not toy examples
This repository is actively maintained and updated regularly with new algorithms, projects, and experiments as the learning journey progresses.
| Audience | How This Helps |
|---|---|
| Beginners | Learn ML concepts step-by-step with clean, documented code |
| Students | Reference implementations for assignments and understanding |
| Practitioners | Quick refresher notebooks for standard algorithms |
| Developers | Baseline scikit-learn patterns to build production models from |
| Tool | Purpose |
|---|---|
| Python | Core programming language |
| NumPy | Numerical computing & array ops |
| Pandas | Data manipulation & analysis |
| Matplotlib | Data visualization |
| Seaborn | Statistical data visualization |
| scikit-learn | Machine Learning algorithms |
| Jupyter | Interactive notebooks |
All notebooks are self-contained and can be opened directly on GitHub or run locally. Click any notebook name to open it.
The foundation of every ML pipeline: cleaning, transforming, and preparing raw data.
| # | Notebook | Concepts Covered |
|---|---|---|
| 1 | Data Preprocessing | Missing values, data cleaning, pipelines |
| 2 | Scikit-Learn Data Preprocessing | sklearn preprocessing transformers |
| 3 | Encoding | Label encoding, One-Hot encoding |
| 4 | Feature Scaling | StandardScaler, MinMaxScaler, normalization |
| 5 | Function Transformation | Log, square-root, Box-Cox transformations |
| 6 | Feature Elimination | Removing irrelevant/redundant features |
| 7 | Outlier Detection | IQR, Z-score, visualizing outliers |
| 8 | Outlier Analysis (Custom) | Custom outlier detection experiments |
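The IQR rule listed under Outlier Detection can be sketched roughly as follows. This is a minimal illustration rather than the notebook's own code: the function name `iqr_outlier_mask`, the multiplier `k=1.5`, and the sample values are assumptions chosen for the example.

```python
import numpy as np


def iqr_outlier_mask(values, k=1.5):
    """Return a boolean mask that is True where a value lies outside the IQR fences."""
    values = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(values, [25, 75])  # first and third quartiles
    iqr = q3 - q1                             # interquartile range
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return (values < lower) | (values > upper)


# Hypothetical sample: six ordinary readings and one extreme value.
data = np.array([10, 12, 11, 13, 12, 11, 95])
mask = iqr_outlier_mask(data)
print("Outliers:", data[mask])
print("Cleaned:", data[~mask])
```

A larger `k` widens the fences and flags fewer points; 1.5 is only the conventional default.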
Exploring data with NumPy, Pandas, and the classic Iris dataset.
| # | Notebook | Concepts Covered |
|---|---|---|
| 1 | NumPy Basics | Arrays, operations, broadcasting |
| 2 | Pandas Project | DataFrames, groupby, EDA workflow |
| 3 | NumPy with Iris Dataset | NumPy analysis on real dataset |
| 4 | Iris Data Exploration | EDA, pairplots, correlation heatmaps |
| 5 | ML Fundamentals | Core ML workflow introduction |
| 6 | Train-Test Split | Proper data splitting strategies |
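As a small sketch of the splitting strategy covered in the Train-Test Split notebook, the snippet below performs a stratified split on the built-in Iris dataset. The 80/20 ratio and the `random_state` value are assumptions for illustration, not settings taken from the notebook.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Load features and labels as NumPy arrays (150 samples, 3 balanced classes).
X, y = load_iris(return_X_y=True)

# stratify=y keeps the class proportions the same in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

print("Train shape:", X_train.shape)
print("Test shape:", X_test.shape)
print("Test class counts:", np.bincount(y_test))
```

Stratifying matters most when classes are imbalanced; on Iris it simply guarantees each class appears equally often in the test set.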
Predicting continuous values, from simple lines to complex polynomial curves.
| # | Notebook | Concepts Covered |
|---|---|---|
| 1 | Simple Linear Regression | OLS, slope/intercept, R² score |
| 2 | Multiple Linear Regression | Multi-feature regression, multicollinearity |
| 3 | Polynomial Regression | Degree tuning, overfitting demo |
| 4 | KNN Regression | K-Neighbors for continuous output |
| 5 | Ridge Regularisation | L2 penalty, reducing overfitting |
| 6 | Lasso Regularisation | L1 penalty, automatic feature selection |
| 7 | Polynomial Logistic | Logistic with polynomial features |
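To illustrate the difference between the L2 and L1 penalties listed above, here is a hedged sketch on synthetic data where only the first two of five features carry signal. The data, the `alpha` values, and the seed are all assumptions made for this example.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

# Synthetic data: y depends on features 0 and 1 only; features 2-4 are noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=200)

ridge = Ridge(alpha=1.0).fit(X, y)  # L2 penalty: shrinks coefficients toward zero
lasso = Lasso(alpha=0.1).fit(X, y)  # L1 penalty: can set coefficients exactly to zero

print("Ridge coefficients:", np.round(ridge.coef_, 3))
print("Lasso coefficients:", np.round(lasso.coef_, 3))
print("Features kept by Lasso:", np.flatnonzero(lasso.coef_))
```

Ridge keeps small nonzero weights on the irrelevant features, while Lasso drops them, which is why it is described as performing automatic feature selection.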
Teaching machines to sort, label, and decide.
| # | Notebook | Concepts Covered |
|---|---|---|
| 1 | Logistic Regression (Part 1) | Binary classification, sigmoid, threshold |
| 2 | Logistic Regression (Part 2) | Advanced logistic, multi-solver comparison |
| 3 | KNN Classification | K selection, Euclidean distance, boundaries |
| 4 | Decision Tree Classification | Gini, entropy, tree visualization |
| 5 | Multiclass Classification | OvR, OvO strategies |
| 6 | Naive Bayes | Gaussian NB, conditional probability |
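The sigmoid-and-threshold idea behind binary logistic regression can be sketched in a few lines of NumPy. The scores and the 0.5 threshold below are illustrative assumptions; in a fitted model the scores would come from the learned weights.

```python
import numpy as np


def sigmoid(z):
    """Map real-valued scores to probabilities in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


# Hypothetical linear scores (w . x + b) for three samples.
scores = np.array([-2.0, 0.0, 1.5])
probs = sigmoid(scores)

# Classify as positive when the probability reaches the threshold.
threshold = 0.5
labels = (probs >= threshold).astype(int)

print("Probabilities:", np.round(probs, 3))
print("Predicted labels:", labels)
```

Raising the threshold trades recall for precision, which is one reason the evaluation notebooks look beyond plain accuracy.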
Margin maximization: one of the most powerful classical ML algorithms.
| # | Notebook | Concepts Covered |
|---|---|---|
| 1 | Linear SVM | Hard/soft margin, C parameter |
| 2 | Polynomial SVM | Kernel trick with polynomial kernel |
| 3 | SVM Regression | SVR, epsilon-tube, continuous prediction |
| 4 | Polynomial Regression SVM | Combining polynomial features with SVR |
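A quick way to see what the kernel trick buys is to compare a linear and a polynomial SVM on data that no straight line can separate. The sketch below uses scikit-learn's `make_circles`; the dataset, the kernel settings (`degree=2`, `coef0=1.0`), and the seeds are assumptions chosen for the demonstration.

```python
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Two concentric rings: not linearly separable in the original 2-D space.
X, y = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# C controls the soft-margin trade-off for both models.
linear_svm = SVC(kernel="linear", C=1.0).fit(X_train, y_train)
poly_svm = SVC(kernel="poly", degree=2, coef0=1.0, C=1.0).fit(X_train, y_train)

linear_acc = linear_svm.score(X_test, y_test)
poly_acc = poly_svm.score(X_test, y_test)
print(f"Linear kernel accuracy:     {linear_acc:.2f}")
print(f"Polynomial kernel accuracy: {poly_acc:.2f}")
```

The degree-2 kernel implicitly adds squared terms, so a circular boundary becomes a linear one in the expanded feature space.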
Decision boundaries built like trees: interpretable and powerful.
| # | Notebook | Concepts Covered |
|---|---|---|
| 1 | Decision Tree Regression | MSE-based splits, tree depth control |
| 2 | Pre & Post Pruning | ccp_alpha, max_depth, preventing overfit |
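The two pruning styles named above can be compared by counting leaves. This is a rough sketch on a noisy sine curve; the data, `max_depth=3`, and `ccp_alpha=0.01` are assumed values for illustration, not tuned settings from the notebook.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Noisy sine wave: an unrestricted tree will memorise the noise.
rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 6, size=(200, 1)), axis=0)
y = np.sin(X).ravel() + rng.normal(scale=0.2, size=200)

# No pruning: the tree grows until every leaf is pure.
full_tree = DecisionTreeRegressor(random_state=0).fit(X, y)

# Pre-pruning: stop growth early by capping the depth.
pre_pruned = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X, y)

# Post-pruning: grow fully, then collapse weak branches via cost-complexity pruning.
post_pruned = DecisionTreeRegressor(ccp_alpha=0.01, random_state=0).fit(X, y)

for name, tree in [("full", full_tree), ("pre-pruned", pre_pruned), ("post-pruned", post_pruned)]:
    print(f"{name:12s} leaves = {tree.get_n_leaves()}")
```

In practice `ccp_alpha` is usually chosen by cross-validation over the values returned by `cost_complexity_pruning_path`.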
Because building a model is only half the job.
| # | Notebook | Concepts Covered |
|---|---|---|
| 1 | Confusion Matrix | TP/FP/FN/TN, precision, recall, F1 |
| 2 | Cross Validation | K-Fold, StratifiedKFold, LOOCV |
| 3 | Dataset Imbalance | SMOTE, class_weight, oversampling |
| 4 | Hyperparameter Tuning | GridSearchCV, RandomizedSearchCV |
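Cross-validation and hyperparameter tuning are often combined. The sketch below runs `GridSearchCV` over `n_neighbors` for a KNN classifier using stratified 5-fold splits on Iris; the candidate grid, fold count, and seed are assumptions made for this example.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Stratified folds keep the class balance identical in every split.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Try each candidate K and score it by mean cross-validated accuracy.
search = GridSearchCV(
    estimator=KNeighborsClassifier(),
    param_grid={"n_neighbors": [1, 3, 5, 7, 9]},
    cv=cv,
    scoring="accuracy",
)
search.fit(X, y)

print("Best parameters:", search.best_params_)
print("Best mean CV accuracy:", round(search.best_score_, 3))
```

`RandomizedSearchCV` follows the same pattern but samples a fixed number of candidates, which scales better to large grids.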
Make sure you have Python 3.x installed. Then install the required libraries:

    pip install numpy pandas matplotlib seaborn scikit-learn jupyter

To run the notebooks locally:

    # 1. Clone this repository
    git clone https://github.com/BhavyaKansal20/MachineLearning.git
    # 2. Navigate into the folder
    cd MachineLearning
    # 3. Launch Jupyter Notebook
    jupyter notebook

Then open any .ipynb file from the Jupyter interface in your browser.
To run a notebook in Google Colab, open it on GitHub and change the URL domain from github.com to colab.research.google.com/github.
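The URL rewrite described above can be expressed as a small helper. This is only a sketch: the function name `to_colab_url` and the example notebook path (`blob/main/example.ipynb`) are hypothetical and do not refer to an actual file in this repository.

```python
from urllib.parse import urlparse


def to_colab_url(github_url: str) -> str:
    """Convert a github.com notebook URL into its Google Colab equivalent."""
    parsed = urlparse(github_url)
    if parsed.netloc != "github.com":
        raise ValueError(f"Expected a github.com URL, got: {github_url}")
    # Colab mirrors the GitHub path under the /github prefix.
    return f"https://colab.research.google.com/github{parsed.path}"


# Hypothetical notebook path, used only to show the transformation.
example = "https://github.com/BhavyaKansal20/MachineLearning/blob/main/example.ipynb"
colab_url = to_colab_url(example)
print(colab_url)
```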
This repository is actively growing. Upcoming additions:
- Unsupervised Learning (K-Means, DBSCAN, Hierarchical Clustering)
- Dimensionality Reduction (PCA, t-SNE, LDA)
- Ensemble Methods (Random Forest, Gradient Boosting, XGBoost)
- Neural Networks (ANN from scratch with NumPy)
- NLP Basics (TF-IDF, Bag of Words)
- End-to-End ML Projects with real-world datasets
- Model deployment notebooks (Flask + Render)
⭐ Star the repo, and watch it to get notified when new notebooks are added!
Datasets used in these notebooks are maintained in a separate dedicated repository to keep this repo clean and lightweight.
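Dataset Repository: Datasets
Some notebooks use built-in scikit-learn datasets (Iris, Boston, etc.), which require no external download. Note that the Boston housing loader (`load_boston`) was removed in scikit-learn 1.2, so notebooks that rely on it need an older scikit-learn release or a substitute dataset.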
Contributions, improvements, and suggestions are warmly welcome!
How to contribute:
- Fork this repository
- Create a new branch: `git checkout -b feature/your-topic`
- Add your notebook or improvement
- Commit your changes: `git commit -m "Add: XGBoost notebook"`
- Push to your branch: `git push origin feature/your-topic`
- Open a Pull Request with a clear description

Please read CONTRIBUTING.md and CODE_OF_CONDUCT.md before submitting.
MIT License
Copyright (c) 2026 Bhavya Kansal
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.
See the full LICENSE file.
All notebooks and code in this repository are intended strictly for educational and learning purposes. The implementations are for conceptual clarity and skill development, not production deployment without thorough validation.
Datasets used across these notebooks may be sourced from:
- Custom datasets created by Bhavya Kansal
- Scikit-learn built-in datasets (BSD License)
- UCI Machine Learning Repository (license varies per dataset)
- Publicly available open-source data
Refer to individual notebooks for specific dataset sources and their respective licenses. All datasets can be checked in the dataset repository: Datasets
For responsible disclosure of any security concerns, please refer to the SECURITY.md file.
Bhavya Kansal | AI/ML Developer | Researcher & Collaborator | Jai Shri Ram ❤️
Patiala, Punjab, India
