HariomSinghalPuri/Machine_Learning_Materials


🤖 ML_Materials

⚠️ Note: This project is under active development. New modules and content are added regularly.

A comprehensive collection of Machine Learning materials, tutorials, and practical implementations designed for learning and mastering ML fundamentals and advanced techniques.

🎯 Overview

This repository serves as a complete learning resource for Machine Learning enthusiasts, covering everything from Python basics to advanced ML implementations. The materials are organized into comprehensive modules that build upon each other to provide a structured learning path.

What you'll learn:

  • Python programming fundamentals for ML
  • Data manipulation with NumPy and Pandas
  • Data visualization with Matplotlib and Seaborn
  • Data preprocessing techniques
  • Feature engineering and text processing
  • Real-world ML use cases and implementations

Repository Structure

ML_materials/
├── README.md
├── requirements.txt
├── CSV.ipynb
├── JSON.ipynb
├── Learn_Python.ipynb
├── Rev_Arrays.ipynb
├── Rev_Pandas.ipynb
│
├── Module_1_Fundamentals/                    # ✅ Complete
│   ├── README.md
│   ├── 1_Learn_Python.ipynb
│   ├── 2_Numpy_ML.ipynb
│   ├── 3_Matplotlib_ML.ipynb
│   ├── 5_Pandas_Series_ML.ipynb
│   ├── 6_Pandas_DataFrame_ML.ipynb
│   ├── 7_Seaborn_ML.ipynb
│   └── datasets/
│       ├── batsman_runs_ipl.csv
│       ├── bollywood.csv
│       ├── data.csv
│       ├── data_for_Histograms.csv
│       ├── data_for_LinePlot.csv
│       ├── data_for_ScatterPlot.csv
│       ├── data_for_Timeseries.csv
│       ├── data_subplots.csv
│       ├── diabetes.csv
│       ├── fig1.png
│       ├── fig2.png
│       ├── ipl-matches.csv
│       ├── kohli_ipl.csv
│       ├── movies.csv
│       ├── Part_of_CSV_01.csv
│       ├── Part_of_CSV_01_with_no_index.csv
│       └── subs.csv
│
├── Module_2_Preprocessing/                   # ✅ Complete
│   ├── 1_Importing_Datasets_through_Kaggle_API.ipynb
│   ├── 2_Handling_Missing_Values.ipynb
│   ├── 3_Data_Standardization.ipynb
│   ├── 4_Label_Encoding.ipynb
│   ├── 5_Train_Test_Split.ipynb
│   ├── 6_Handling_imbalanced_Dataset.ipynb
│   ├── 7_Feature_extraction_of_Text_data_using_Tf_idf_Vectorizer.ipynb
│   ├── 8_Numerical_Dataset_Pre_Processing_Use_Case.ipynb
│   ├── 9_Text_Data_Pre_Processing_Use_Case.ipynb
│   ├── ML_Use_Case_1_Rock_vs_Mine_Prediction.ipynb
│   ├── ML_Use_Case_2_Diabetes_Prediction.ipynb
│   ├── ML_Use_Case_3_Spam_Mail_Prediction_using_Machine_Learning.ipynb
│   └── Dataset_Links.txt
│
├── Module_3_Mathematical_Foundations/        # 🚧 In Progress
│   ├── README.md
│   ├── 1_Linear_Algebra_Part_1.ipynb
│   ├── 2_Linear_Algebra_Part_2.ipynb
│   ├── 3_Calculus_Part_1.ipynb
│   ├── 4_Calculus_Part_2.ipynb
│   ├── 5_Calculus_Part_3.ipynb
│   ├── 6_Probability.ipynb
│   └── 7_Statistics.ipynb
│
└── ... (more modules in progress)

📚 Module 1: ML Fundamentals

Status: ✅ Complete
Focus: Building strong foundations in Python and data analysis libraries

Core Learning Materials

| Notebook | Status | Description | Key Topics |
|---|---|---|---|
| 1_Learn_Python.ipynb | ✅ | Python programming essentials | Syntax, data structures, control flow |
| 2_Numpy_ML.ipynb | ✅ | NumPy for numerical computing | Arrays, vectorization, mathematical operations |
| 3_Matplotlib_ML.ipynb | ✅ | Data visualization basics | Plots, charts, customization |
| 5_Pandas_Series_ML.ipynb | ✅ | Working with Pandas Series | Data manipulation, indexing |
| 6_Pandas_DataFrame_ML.ipynb | ✅ | DataFrame operations | Data analysis, filtering, grouping |
| 7_Seaborn_ML.ipynb | ✅ | Advanced statistical visualizations | Statistical plots, styling |

📊 Practice Datasets

Real-world datasets for hands-on practice:

  • Sports Analytics: batsman_runs_ipl.csv, kohli_ipl.csv, ipl-matches.csv
  • Entertainment: bollywood.csv, movies.csv
  • Healthcare: diabetes.csv
  • Visualization Datasets: Various CSV files for different plot types
  • Sample Images: fig1.png, fig2.png for image processing examples

🔧 Module 2: Data Preprocessing & ML Use Cases

Focus: Advanced preprocessing techniques and practical ML implementations

🛠️ Data Preprocessing Techniques

| Notebook | Status | Technique | Application |
|---|---|---|---|
| 1_Importing_Datasets_through_Kaggle_API.ipynb | ✅ | Data acquisition | Kaggle API integration |
| 2_Handling_Missing_Values.ipynb | ✅ | Data cleaning | Imputation strategies |
| 3_Data_Standardization.ipynb | ✅ | Feature scaling | Normalization, standardization |
| 4_Label_Encoding.ipynb | ✅ | Categorical encoding | One-hot, label encoding |
| 5_Train_Test_Split.ipynb | ✅ | Data splitting | Validation strategies |
| 6_Handling_imbalanced_Dataset.ipynb | ✅ | Class balancing | SMOTE, undersampling |
| 7_Feature_extraction_of_Text_data_using_Tf_idf_Vectorizer.ipynb | ✅ | Text processing | TF-IDF, feature extraction |
| 8_Numerical_Dataset_Pre_Processing_Use_Case.ipynb | ✅ | End-to-end pipeline | Complete numerical data workflow |
| 9_Text_Data_Pre_Processing_Use_Case.ipynb | ✅ | Text preprocessing pipeline | Complete text data workflow |
| Dataset_Links.txt | ✅ | Resource management | Dataset source references |
| ML Use Case 1. Rock_vs_Mine_Prediction.ipynb | ✅ | Binary classification | Sonar object detection |
| ML Use case 2. Diabetes_Prediction.ipynb | ✅ | Medical prediction | Healthcare classification |
| ML Use Case 3. Spam_Mail_Prediction_using_Machine_Learning.ipynb | ✅ | Text classification | Email filtering system |
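
As a rough illustration of the imputation and standardization steps listed in the table, here is a minimal standard-library sketch. The function names and the glucose values are hypothetical, and mean imputation is only one of several possible strategies; the notebooks themselves may rely on pandas or scikit-learn instead.

```python
from statistics import mean, pstdev


def impute_mean(column):
    """Replace None entries with the mean of the observed values."""
    observed = [x for x in column if x is not None]
    fill = mean(observed)
    return [fill if x is None else x for x in column]


def standardize(column):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mu = mean(column)
    sigma = pstdev(column)
    if sigma == 0:
        # A constant column carries no spread; map it to zeros.
        return [0.0 for _ in column]
    return [(x - mu) / sigma for x in column]


# Hypothetical feature column with two missing readings.
glucose = [85.0, None, 110.0, 95.0, None, 130.0]
filled = impute_mean(glucose)   # missing entries become the observed mean
scaled = standardize(filled)    # z-scores of the imputed column
```

In a real workflow the mean and standard deviation would be computed on the training split only and then reused for the test split, so that no information leaks from test data into preprocessing.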

Comprehensive Preprocessing Workflows

| Workflow | Status | Focus | Application |
|---|---|---|---|
| 8_Numerical_Dataset_Pre_Processing_Use_Case.ipynb | ✅ | Complete numerical pipeline | Feature selection, scaling, outlier handling |
| 9_Text_Data_Pre_Processing_Use_Case.ipynb | ✅ | End-to-end text processing | Tokenization, cleaning, vectorization |
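
The tokenization, cleaning, and vectorization steps of the text workflow can be sketched without any third-party library. The version below uses the textbook definitions tf = count / document length and idf = ln(N / df); the function names and sample mails are made up for illustration. scikit-learn's TfidfVectorizer applies a smoothed IDF and normalizes each row by default, so its numbers will differ from these.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def tf_idf(documents):
    """Return one {term: weight} dict per document (textbook TF-IDF)."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical mini-corpus.
mails = ["Win a free prize now", "Meeting moved to noon", "Free lunch at noon"]
vectors = tf_idf(mails)
```

Terms that occur in fewer documents ("win") receive a higher weight than terms shared across documents ("free"), which is the property a spam classifier benefits from.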

🎯 Real-World Use Cases

| Project | Status | Domain | Technique | Accuracy | Focus |
|---|---|---|---|---|---|
| 🪨 Rock vs Mine Prediction | ✅ | Defense/Marine | Logistic Regression | | Sonar signal classification |
| 🩺 Diabetes Prediction | ✅ | Healthcare | Multiple algorithms | | Medical diagnosis support |
| 📧 Spam Mail Detection | ✅ | Cybersecurity | NLP + Classification | | Email security |

📚 Resource Files

| File | Status | Purpose | Content |
|---|---|---|---|
| Dataset_Links.txt | ✅ | Reference guide | Curated dataset sources and URLs |

Module 2 Learning Outcomes

By completing this module, you will:

  • Master essential data preprocessing techniques
  • Handle real-world data challenges (missing values, imbalanced datasets)
  • Implement feature engineering for both numerical and text data
  • Build complete ML pipelines from data acquisition to model evaluation
  • Apply ML to solve practical problems in healthcare, cybersecurity, and defense
  • Understand the importance of proper data splitting and validation
  • Work with external data sources through APIs

📈 Technical Skills Covered

Data Preprocessing:

  • Missing value imputation strategies
  • Feature scaling and standardization
  • Categorical variable encoding
  • Handling imbalanced datasets with SMOTE
  • Text preprocessing and TF-IDF vectorization

Machine Learning Applications:

  • Binary classification problems
  • Multi-class classification
  • Text classification and NLP
  • Medical prediction systems
  • Security applications

Best Practices:

  • Proper train-test splitting
  • Cross-validation techniques
  • Feature selection methods
  • Model evaluation metrics
  • End-to-end pipeline development
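
To make the splitting and cross-validation practices above concrete, here is a small index-based sketch using only the standard library. The helper names are hypothetical; in practice scikit-learn's splitting utilities would typically be used instead.

```python
import random


def train_test_split_indices(n_samples, test_fraction=0.2, seed=0):
    """Shuffle indices 0..n_samples-1 and split them into train and test lists."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = max(1, round(n_samples * test_fraction))
    return indices[n_test:], indices[:n_test]


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train, validation) index lists; each sample is validated exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i, validation in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, validation


train_idx, test_idx = train_test_split_indices(10, test_fraction=0.2)
splits = list(k_fold_indices(10, k=5))
```

Fixing the seed keeps the split reproducible, which matters when comparing models trained on the same data.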

🧮 Module 3: Mathematical Foundations for Machine Learning

Status: 🚧 In Progress
Focus: Essential mathematical concepts underlying machine learning algorithms

Linear Algebra Fundamentals

| Notebook | Status | Focus Area | Key Concepts |
|---|---|---|---|
| 1_Linear_Algebra_Part_1.ipynb | ✅ | Core tensor operations | Scalars, vectors, matrices, tensor operations |
| 2_Linear_Algebra_Part_2.ipynb | ✅ | Advanced matrix operations | Eigendecomposition, SVD, PCA |

📊 Linear Algebra Part 1 - Core Concepts

Data Structures for Algebra:

  • Scalars (Rank 0 Tensors) in Python, PyTorch, TensorFlow
  • Vectors (Rank 1 Tensors) with NumPy operations
  • Vector norms (L1, L2, Max, Squared L2)
  • Matrices (Rank 2 Tensors) and higher-rank tensors
  • Orthogonal vectors and matrices
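
The norms named above, and the orthogonality test via the dot product, are short enough to write out by hand. This is a plain-Python sketch with hypothetical helper names; the notebook works with NumPy, PyTorch, and TensorFlow tensors rather than lists.

```python
import math


def l1_norm(v):
    """Sum of absolute values."""
    return sum(abs(x) for x in v)


def l2_norm(v):
    """Euclidean length."""
    return math.sqrt(sum(x * x for x in v))


def squared_l2_norm(v):
    """Squared Euclidean length; avoids the square root."""
    return sum(x * x for x in v)


def max_norm(v):
    """Largest absolute component."""
    return max(abs(x) for x in v)


def dot(u, v):
    """Dot product; zero means the vectors are orthogonal."""
    return sum(a * b for a, b in zip(u, v))


v = [3.0, -4.0]
w = [4.0, 3.0]
norms = (l1_norm(v), l2_norm(v), squared_l2_norm(v), max_norm(v))
orthogonal = dot(v, w) == 0
```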

Common Tensor Operations:

  • Tensor transposition and arithmetic
  • Reduction operations and dot products
  • Solving linear systems
  • Matrix properties and operations

Linear Algebra Part 2 - Advanced Operations

Eigendecomposition:

  • Affine transformations and matrix applications
  • Eigenvectors and eigenvalues in multiple dimensions
  • Matrix determinants and eigendecomposition
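
One way to see eigenvalues, eigenvectors, and the determinant together is power iteration on a small symmetric matrix. This sketch is an illustration chosen here, not necessarily the method used in the notebook, and it assumes the dominant eigenvalue is unique in magnitude and the start vector is not orthogonal to its eigenvector.

```python
import math


def mat_vec(A, v):
    """Multiply matrix A (list of rows) by vector v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def power_iteration(A, iterations=100):
    """Approximate the dominant eigenvalue and a unit eigenvector of A."""
    v = [1.0] + [0.0] * (len(A) - 1)
    for _ in range(iterations):
        w = mat_vec(A, v)
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient v^T A v (v has unit length).
    eigenvalue = sum(x * y for x, y in zip(v, mat_vec(A, v)))
    return eigenvalue, v


A = [[2.0, 1.0], [1.0, 2.0]]
lam, vec = power_iteration(A)

# For a 2x2 matrix the eigenvalues sum to the trace and multiply to the determinant.
trace_A = A[0][0] + A[1][1]
det_A = A[0][0] * A[1][1] - A[0][1] * A[1][0]
other_eigenvalue = trace_A - lam
```

For this matrix the eigenvalues are 3 and 1, so their product equals the determinant, 3.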

Matrix Operations for ML:

  • Singular Value Decomposition (SVD)
  • Image compression applications
  • Moore-Penrose pseudoinverse
  • Principal Component Analysis (PCA)

📈 Calculus for Machine Learning

| Notebook | Status | Focus Area | Key Concepts |
|---|---|---|---|
| 3_Calculus_Part_1.ipynb | ✅ | Limits & derivatives | Differentiation, automatic differentiation |
| 4_Calculus_Part_2.ipynb | ✅ | Advanced calculus | Partial derivatives, gradients, integrals |
| 5_Calculus_Part_3.ipynb | ✅ | Symbolic computation | SymPy library applications |

🔢 Calculus Part 1 - Fundamentals

Limits & Derivatives:

  • Calculus of infinitesimals
  • Computing derivatives through differentiation
  • Automatic differentiation with PyTorch and TensorFlow
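
The link between limits and derivatives can be illustrated numerically: the slope of a secant line approaches the derivative as the step shrinks. The function f below is a hypothetical example, and the finite-difference helper is only an approximation; automatic differentiation in PyTorch or TensorFlow computes derivatives by applying the chain rule to the operations themselves rather than by shrinking a step.

```python
def f(x):
    """Hypothetical example function with derivative f'(x) = 2x + 2."""
    return x ** 2 + 2 * x + 2


def derivative(func, x, h=1e-6):
    """Central-difference approximation of func'(x)."""
    return (func(x + h) - func(x - h)) / (2 * h)


# Forward-difference slopes at x = 2 for a shrinking step size.
forward_slopes = [(f(2.0 + h) - f(2.0)) / h for h in (1.0, 0.1, 0.01)]

# Central difference at x = 2; the analytic value is 2*2 + 2 = 6.
slope_at_2 = derivative(f, 2.0)
```

The forward slopes move toward 6 as the step shrinks, which is the limit definition of the derivative in action.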

⚡ Calculus Part 2 - ML Applications

Gradients for Machine Learning:

  • Partial derivatives of multivariate functions
  • Gradients of cost functions w.r.t. model parameters
  • Practical examples with cylinder volume calculations
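
For the cylinder volume example, V = πr²h, the partial derivatives are ∂V/∂r = 2πrh and ∂V/∂h = πr². The sketch below checks these analytic expressions against a numerical approximation; the helper names and the chosen radius and height are assumptions for illustration.

```python
import math


def volume(r, h):
    """Volume of a cylinder with radius r and height h."""
    return math.pi * r ** 2 * h


def gradient(r, h):
    """Analytic partial derivatives (dV/dr, dV/dh)."""
    return 2 * math.pi * r * h, math.pi * r ** 2


def numeric_partial(func, args, index, eps=1e-6):
    """Central-difference partial derivative with respect to args[index]."""
    up = list(args)
    down = list(args)
    up[index] += eps
    down[index] -= eps
    return (func(*up) - func(*down)) / (2 * eps)


r, h = 3.0, 5.0
dV_dr, dV_dh = gradient(r, h)
approx_dr = numeric_partial(volume, (r, h), 0)
approx_dh = numeric_partial(volume, (r, h), 1)
```

The same pattern, one partial derivative per parameter, is what the gradient of a cost function with respect to model parameters consists of.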

Integrals:

  • Area under ROC curves
  • Integration applications in ML evaluation
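
The area under a ROC curve is an integral that can be approximated with the trapezoidal rule. The following sketch builds ROC points from hypothetical labels and scores and integrates them; it assumes all scores are distinct and that both classes are present.

```python
def roc_points(labels, scores):
    """Return (FPR, TPR) points from (0, 0) to (1, 1), lowering the threshold one sample at a time."""
    positives = sum(labels)
    negatives = len(labels) - positives
    ranked = sorted(zip(scores, labels), reverse=True)
    tp = fp = 0
    points = [(0.0, 0.0)]
    for _score, label in ranked:
        if label == 1:
            tp += 1
        else:
            fp += 1
        points.append((fp / negatives, tp / positives))
    return points


def auc_trapezoid(points):
    """Integrate TPR over FPR with the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical classifier output: 1 = positive class.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
points = roc_points(labels, scores)
auc = auc_trapezoid(points)
```

Here 8 of the 9 positive-negative pairs are ranked correctly, and the integral gives the same value, 8/9.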

🔧 Calculus Part 3 - Symbolic Math

SymPy Applications:

  • Symbolic mathematical computations
  • Advanced calculus operations
  • Mathematical modeling tools

🎲 Probability & Statistics

| Notebook | Status | Focus Area | Key Concepts |
|---|---|---|---|
| 6_Probability.ipynb | ✅ | Probability theory & information | Distributions, entropy, information theory |
| 7_Statistics.ipynb | ✅ | Statistical analysis | Frequentist & Bayesian statistics |

🎯 Probability & Information Theory

Introduction to Probability:

  • Events, sample spaces, and probability combinations
  • Combinatorics and Law of Large Numbers
  • Expected value and measures of central tendency
  • Statistical measures: mean, median, mode, quantiles
  • Dispersion measures and correlation analysis

ML Distributions:

  • Uniform, Gaussian, and Central Limit Theorem
  • Log-normal, exponential, and Laplace distributions
  • Binomial, multinomial, and Poisson distributions
  • Mixture distributions and sampling techniques
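
The Central Limit Theorem mentioned above can be observed with a short simulation: means of samples drawn from a uniform distribution cluster around 0.5 with a spread close to σ/√n, where σ² = 1/12 for Uniform(0, 1). The sample sizes and seed below are arbitrary choices for illustration.

```python
import random
from statistics import mean, pstdev


def sample_means(n_per_sample, n_samples, seed=0):
    """Draw n_samples samples of size n_per_sample from Uniform(0, 1) and return their means."""
    rng = random.Random(seed)
    return [
        mean([rng.random() for _ in range(n_per_sample)])
        for _ in range(n_samples)
    ]


means = sample_means(n_per_sample=30, n_samples=2000)
center = mean(means)                              # should be near 0.5
spread = pstdev(means)                            # should be near predicted_spread
predicted_spread = (1 / 12) ** 0.5 / 30 ** 0.5    # sigma / sqrt(n)
```

Plotting a histogram of `means` would show the familiar bell shape even though the underlying distribution is flat.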

Information Theory:

  • Shannon and differential entropy
  • Kullback-Leibler divergence
  • Cross-entropy applications
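
For discrete distributions, the three quantities above are related by cross-entropy(p, q) = entropy(p) + KL(p ‖ q). A minimal sketch in bits (log base 2), with hypothetical distributions and assuming q is nonzero wherever p is nonzero:

```python
import math


def entropy(p):
    """Shannon entropy of a discrete distribution, in bits."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)


def cross_entropy(p, q):
    """Expected code length when events follow p but the code is built for q."""
    return -sum(pi * math.log2(qi) for pi, qi in zip(p, q) if pi > 0)


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q), in bits."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


p = [0.5, 0.5]   # fair coin
q = [0.9, 0.1]   # hypothetical model of the coin
h_p = entropy(p)
ce = cross_entropy(p, q)
kl = kl_divergence(p, q)
```

Because the entropy of the true distribution is fixed, minimizing cross-entropy loss during training is equivalent to minimizing the KL divergence between the data distribution and the model.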

📊 Statistical Analysis

Frequentist Statistics:

  • Central tendency and dispersion measures
  • Gaussian distribution and Central Limit Theorem
  • Statistical testing: z-scores, p-values, t-tests
  • ANOVA and correlation analysis
  • Multiple comparison corrections

Regression Analysis:

  • Linear least squares fitting
  • Ordinary least squares
  • Logistic regression fundamentals
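
For a single feature, ordinary least squares has a closed form: slope = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and intercept = ȳ − slope · x̄. The sketch below applies it to made-up data that roughly follows y = 2x + 1; the function name and data are assumptions for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical noisy samples.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 6.8, 9.0]
slope, intercept = fit_line(xs, ys)
residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
```

With an intercept in the model, the residuals of a least squares fit always sum to zero, which is a quick sanity check on any implementation.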

Bayesian Statistics:

  • Bayes' theorem applications
  • Bayesian inference in ML
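
Bayes' theorem, P(A | B) = P(B | A) P(A) / P(B), is easiest to appreciate with a diagnostic-test example. The prevalence, sensitivity, and false positive rate below are invented numbers, not taken from the notebook.

```python
def posterior(prior, sensitivity, false_positive_rate):
    """P(condition | positive test) via Bayes' theorem."""
    # Total probability of a positive test (the evidence term).
    evidence = sensitivity * prior + false_positive_rate * (1 - prior)
    return sensitivity * prior / evidence


# Hypothetical test: 1% prevalence, 95% sensitivity, 5% false positive rate.
p_condition = posterior(prior=0.01, sensitivity=0.95, false_positive_rate=0.05)
```

Even with a fairly accurate test, the posterior stays well below one half here because the condition is rare, which is why the prior matters in Bayesian inference.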

🎓 Module 3 Learning Outcomes

By completing this module, you will:

  • Master Linear Algebra: Understand tensors, matrix operations, and eigendecomposition
  • Apply Calculus: Use derivatives and gradients for optimization problems
  • Probability Mastery: Work with distributions and information theory
  • Statistical Analysis: Perform hypothesis testing and regression analysis
  • Mathematical ML: Connect mathematical concepts to machine learning applications
  • Tool Proficiency: Use NumPy, PyTorch, TensorFlow, and SymPy for mathematical computing

🔬 Technical Skills Covered

Linear Algebra:

  • Tensor operations and manipulations
  • Matrix decomposition techniques (SVD, eigendecomposition)
  • Principal Component Analysis (PCA)
  • Solving linear systems

Calculus:

  • Automatic differentiation
  • Gradient computation for optimization
  • Partial derivatives for multivariate functions
  • Symbolic mathematical computation

Probability & Statistics:

  • Statistical distributions and sampling
  • Hypothesis testing and confidence intervals
  • Bayesian inference
  • Information theory metrics
  • Regression analysis techniques

Programming Libraries:

  • NumPy: Numerical computations and linear algebra
  • PyTorch: Automatic differentiation and tensor operations
  • TensorFlow: Machine learning mathematical operations
  • SymPy: Symbolic mathematics and calculus

🎨 Key Features

  • 📖 Comprehensive Documentation: Each notebook includes detailed explanations
  • 🔄 Progressive Learning: Concepts build upon previous knowledge
  • 🛠️ Practical Examples: Real-world datasets and use cases
  • 📊 Visualization Focus: Strong emphasis on data visualization
  • 🔬 Hands-on Practice: Interactive exercises and challenges
  • 🎯 Industry-Relevant: Current ML practices and techniques

🗺️ Roadmap

🎯 Planned Features (Coming Soon)

  • Module 4: Deep Learning Fundamentals
  • Module 5: MLOps and Model Deployment
  • Interactive web-based tutorials
  • Video explanations for complex concepts
  • Additional real-world projects

📅 Current Focus

  • Enhancing existing notebooks with more examples
  • Adding comprehensive documentation
  • Creating supplementary exercises
  • Improving code quality and best practices

Recent Updates

  • ✅ Added comprehensive data preprocessing notebooks
  • ✅ Implemented three real-world ML use cases
  • 🚧 Working on advanced feature engineering techniques
  • 🔄 Continuously improving documentation

Happy Learning! 🚀

This repository is continuously updated with new materials and improvements. Check back regularly for the latest content!

Last Updated: August 2025
