Immanuel2004/ML-project

End-to-End ML Project

Student Exam Performance Prediction - End-to-End Machine Learning Project

Overview

This project implements a complete end-to-end machine learning pipeline to predict a student's mathematics score based on multiple demographic and academic indicators. The goal is to identify the most influential factors affecting student performance and to provide a deployable predictive system accessible through a web interface.

The codebase is organized into modular, reusable components that follow common ML engineering practices, which keeps it maintainable and makes it easier to extend toward production use.

Objectives

  1. Build a robust regression model that predicts math scores from student characteristics and prior academic data.
  2. Establish a modular and reusable machine learning pipeline for training and inference.
  3. Deploy the model using Flask, enabling real-time predictions via a web-based form.
  4. Ensure maintainable code design with proper logging, exception handling, and configuration management.

Methodology

The project workflow consists of the following stages:

1. Data Ingestion

  • Load the Student Performance dataset (CSV format).
  • Handle missing or inconsistent data if present.
  • Split the dataset into training and test subsets.
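A minimal, dependency-free sketch of this stage, assuming the data is a CSV file with a header row. The function name `ingest`, the 80/20 split, the seed, and the column names are illustrative assumptions rather than the project's actual code; a pandas/scikit-learn version would typically use `pandas.read_csv` and `sklearn.model_selection.train_test_split` instead.

```python
import csv
import io
import random
from typing import Dict, List, TextIO, Tuple

Row = Dict[str, str]


def ingest(source: TextIO, test_size: float = 0.2, seed: int = 42) -> Tuple[List[Row], List[Row]]:
    """Read CSV rows, drop incomplete or duplicate rows, and return (train, test)."""
    seen = set()
    clean: List[Row] = []
    for row in csv.DictReader(source):
        # Short rows yield None values and long rows a list under the None key;
        # treat both, and blank strings, as missing data and skip the row.
        if any(not isinstance(v, str) or not v.strip() for v in row.values()):
            continue
        key = tuple(row.items())
        if key in seen:  # drop exact duplicate records
            continue
        seen.add(key)
        clean.append(row)
    rng = random.Random(seed)  # fixed seed makes the split reproducible
    rng.shuffle(clean)
    n_test = round(len(clean) * test_size)
    return clean[n_test:], clean[:n_test]


# Usage with an in-memory CSV (illustrative values only).
data = (
    "gender,reading_score,math_score\n"
    + "".join(f"female,{70 + i},{60 + i}\n" for i in range(10))
    + "male,,55\n"       # missing reading_score -> dropped
    + "female,70,60\n"   # duplicate of the first row -> dropped
)
train, test = ingest(io.StringIO(data))
print(len(train), len(test))  # 8 2
```

Fixing the shuffle seed is what makes the split repeatable across runs, which the evaluation stage relies on when checking reproducibility.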

2. Data Transformation

  • Encode categorical variables (e.g., gender, ethnicity, parental education).
  • Standardize numerical features (rescale each to zero mean and unit variance).
  • Build and persist the preprocessing pipeline using pickle for future inference.
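The README does not show the transformation code. With scikit-learn this would typically be a `ColumnTransformer` combining `OneHotEncoder` and `StandardScaler`. The dependency-free sketch below, with a hypothetical `SimplePreprocessor` class and illustrative column names, shows what such a fitted object does and how `pickle` round-trips it for later inference.

```python
import pickle
from statistics import mean, pstdev
from typing import Dict, List, Sequence, Tuple


class SimplePreprocessor:
    """Hypothetical stand-in for a fitted ColumnTransformer:
    one-hot encodes categorical columns and standardizes numeric ones."""

    def __init__(self, categorical: Sequence[str], numeric: Sequence[str]):
        self.categorical = list(categorical)
        self.numeric = list(numeric)
        self.categories: Dict[str, List[str]] = {}
        self.stats: Dict[str, Tuple[float, float]] = {}

    def fit(self, rows: List[dict]) -> "SimplePreprocessor":
        for col in self.categorical:
            # A sorted vocabulary gives a stable output column order.
            self.categories[col] = sorted({r[col] for r in rows})
        for col in self.numeric:
            values = [float(r[col]) for r in rows]
            std = pstdev(values) or 1.0  # guard against constant columns
            self.stats[col] = (mean(values), std)
        return self

    def transform(self, rows: List[dict]) -> List[List[float]]:
        out = []
        for r in rows:
            vec: List[float] = []
            for col in self.categorical:
                # Unseen categories encode as all zeros.
                vec.extend(1.0 if r[col] == c else 0.0 for c in self.categories[col])
            for col in self.numeric:
                mu, sd = self.stats[col]
                vec.append((float(r[col]) - mu) / sd)
            out.append(vec)
        return out


rows = [{"gender": "female", "reading_score": "70"}, {"gender": "male", "reading_score": "90"}]
pre = SimplePreprocessor(["gender"], ["reading_score"]).fit(rows)
blob = pickle.dumps(pre)       # the project would write this to a .pkl file instead
restored = pickle.loads(blob)  # what the inference code would load
print(restored.transform([{"gender": "male", "reading_score": "80"}]))  # [[0.0, 1.0, 0.0]]
```

Persisting the fitted preprocessor, not just the model, guarantees that inference applies exactly the category vocabulary and scaling statistics learned from the training data. Only unpickle files you trust, since loading a pickle can execute arbitrary code.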

3. Model Training

  • Train multiple regression algorithms such as Linear Regression, Random Forest, and Gradient Boosting.
  • Evaluate models using Mean Absolute Error (MAE), Mean Squared Error (MSE), and R² score.
  • Select the best-performing model and persist it for deployment.
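The selection step can be sketched as a loop that fits each candidate and keeps the one with the highest R² score. The project lists Linear Regression, Random Forest, and Gradient Boosting; to keep the sketch runnable without dependencies, it uses two hypothetical stand-ins (a mean baseline and a one-feature least-squares line). It scores on a separate validation split, an assumption not stated in the README, so that the test set stays unseen for the evaluation stage.

```python
from statistics import mean
from typing import Dict, List, Protocol, Tuple


class Regressor(Protocol):
    def fit(self, X: List[float], y: List[float]) -> "Regressor": ...
    def predict(self, X: List[float]) -> List[float]: ...


class MeanBaseline:
    """Always predicts the training mean."""
    def fit(self, X, y):
        self.value = mean(y)
        return self

    def predict(self, X):
        return [self.value for _ in X]


class OneFeatureOLS:
    """Least-squares line y = a + b * x on a single feature."""
    def fit(self, X, y):
        mx, my = mean(X), mean(y)
        sxx = sum((x - mx) ** 2 for x in X)
        sxy = sum((x - mx) * (t - my) for x, t in zip(X, y))
        self.b = sxy / sxx if sxx else 0.0
        self.a = my - self.b * mx
        return self

    def predict(self, X):
        return [self.a + self.b * x for x in X]


def r2_score(y_true: List[float], y_pred: List[float]) -> float:
    mu = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mu) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R² is undefined when all targets are equal")
    return 1.0 - ss_res / ss_tot


def select_best(models: Dict[str, Regressor], X_train, y_train, X_val, y_val) -> Tuple[str, Regressor, Dict[str, float]]:
    # Fit every candidate on the training split and score it on the validation split.
    scores = {name: r2_score(y_val, m.fit(X_train, y_train).predict(X_val)) for name, m in models.items()}
    best = max(scores, key=scores.get)
    return best, models[best], scores


# Illustrative toy data: one feature, targets exactly linear in it.
name, model, scores = select_best(
    {"mean": MeanBaseline(), "ols": OneFeatureOLS()},
    [1, 2, 3, 4, 5], [52, 54, 56, 58, 60],
    [6, 7], [62, 64],
)
print(name)  # ols
```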

4. Model Evaluation

  • Evaluate model performance on unseen test data.
  • Analyze prediction errors and bias.
  • Validate the pipeline's reproducibility and stability.
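The metrics named above, plus the mean signed error as a simple bias check, can be computed as follows. This is a dependency-free sketch; `regression_report` is a hypothetical helper name, and scikit-learn provides the first three metrics as `mean_absolute_error`, `mean_squared_error`, and `r2_score`.

```python
import math
from statistics import mean
from typing import Dict, Sequence


def regression_report(y_true: Sequence[float], y_pred: Sequence[float]) -> Dict[str, float]:
    """MAE, MSE, RMSE, R² and mean signed error for one set of predictions."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mse = mean(e * e for e in errors)
    mu = mean(y_true)
    ss_tot = sum((t - mu) ** 2 for t in y_true)
    return {
        "MAE": mean(abs(e) for e in errors),
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        # R² is undefined (NaN here) when every true value is identical.
        "R2": 1.0 - sum(e * e for e in errors) / ss_tot if ss_tot else float("nan"),
        # Mean signed error: > 0 means the model over-predicts on average.
        "bias": mean(errors),
    }


report = regression_report([70, 80, 90], [72, 78, 93])
print(report)
```

MAE and RMSE are in the same units as the score itself, which makes them easier to interpret than MSE; a bias far from zero points to systematic over- or under-prediction that R² alone does not reveal.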

5. Deployment

  • Integrate the trained model and preprocessor into a Flask application.
  • Accept user input through an HTML interface for real-time prediction.
  • Render predicted math scores dynamically on the web page.
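Before prediction, the raw form strings have to be validated and turned into a feature record. The sketch below assumes field names from the commonly used Student Performance dataset (the README only names gender, ethnicity, and parental education) and assumes scores lie between 0 and 100; `form_to_record` is a hypothetical helper. In a Flask view, `request.form` is a dict-like mapping that could be passed to such a function before the record goes through the unpickled preprocessor and model.

```python
from typing import Dict, Mapping, Union

# Hypothetical field names, modelled on the public Student Performance dataset.
CATEGORICAL_FIELDS = ("gender", "race_ethnicity", "parental_level_of_education",
                      "lunch", "test_preparation_course")
NUMERIC_FIELDS = ("reading_score", "writing_score")


def form_to_record(form: Mapping[str, str]) -> Dict[str, Union[str, float]]:
    """Validate raw form strings and convert them into one feature record."""
    missing = [f for f in CATEGORICAL_FIELDS + NUMERIC_FIELDS if not form.get(f, "").strip()]
    if missing:
        raise ValueError(f"missing fields: {', '.join(missing)}")
    record: Dict[str, Union[str, float]] = {f: form[f].strip() for f in CATEGORICAL_FIELDS}
    for f in NUMERIC_FIELDS:
        try:
            value = float(form[f])
        except ValueError:
            raise ValueError(f"{f} must be a number") from None
        if not 0 <= value <= 100:  # assumed score range; also rejects nan and inf
            raise ValueError(f"{f} must be between 0 and 100")
        record[f] = value
    return record


sample_form = {
    "gender": "female",
    "race_ethnicity": "group B",
    "parental_level_of_education": "bachelor's degree",
    "lunch": "standard",
    "test_preparation_course": "none",
    "reading_score": "72",
    "writing_score": "74",
}
record = form_to_record(sample_form)
print(record["reading_score"], record["gender"])  # 72.0 female
```

Rejecting bad input here lets the web page show a clear message instead of surfacing an error from deep inside the model.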

Key Features

  • End-to-End ML Pipeline: From raw data ingestion to live model inference.
  • Modular Design: Each stage (data processing, model training, prediction) is encapsulated in separate classes and scripts.
  • Exception Handling and Logging: Centralized error management using custom exception and logging modules.
  • Web Deployment: Interactive prediction via Flask-based web interface.
  • Deployment-Ready Structure: The modular layout is intended to make integration with CI/CD and cloud deployment workflows straightforward.
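The README does not show its custom exception and logging modules. The following is a minimal sketch of the pattern, assuming a hypothetical `PipelineError` class that records the file and line where the wrapped error occurred; it logs to stderr, whereas a project like this one might also attach a `logging.FileHandler`.

```python
import logging
import traceback

# Timestamped log lines on stderr.
logging.basicConfig(
    level=logging.INFO,
    format="[%(asctime)s] %(levelname)s %(name)s: %(message)s",
)
logger = logging.getLogger("student_performance")  # hypothetical logger name


class PipelineError(Exception):
    """Hypothetical custom exception that records where the wrapped error was raised."""

    def __init__(self, message: str, cause: BaseException):
        # The innermost traceback frame is where the original error occurred.
        frames = traceback.extract_tb(cause.__traceback__)
        self.where = f"{frames[-1].filename}:{frames[-1].lineno}" if frames else "unknown location"
        super().__init__(f"{message} at {self.where}: {cause!r}")


def parse_score(text: str) -> float:
    """Example pipeline step that wraps and logs a low-level failure."""
    try:
        return float(text)
    except ValueError as exc:
        error = PipelineError("could not parse score", exc)
        logger.error(error)
        raise error from exc


try:
    parse_score("abc")
except PipelineError as err:
    print(err.where)
```

Chaining with `raise ... from exc` keeps the original traceback attached, so the log line gives a quick location while the full stack remains available for debugging.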

Technologies Used

  • Programming Language: Python 3.10+
  • Machine Learning: scikit-learn, pandas, numpy
  • Model Deployment: Flask
  • Data Serialization: pickle
  • Web Frontend: HTML5, Bootstrap 5
  • Development Tools: Git, VS Code, Jupyter Notebook

Installation and Setup

Prerequisites

Ensure the following are installed on your system:

  • Python 3.10 or higher
  • pip package manager
  • Git

Setup Steps

  1. Clone the repository and change into the project directory:

    git clone https://github.com/<your-username>/student-performance-prediction.git
    cd "End-to-End ML Project"
  2. Create and activate a virtual environment:

    python3 -m venv venv
    source venv/bin/activate      # Mac/Linux
    venv\Scripts\activate         # Windows
  3. Install dependencies:

    pip install -r requirements.txt
  4. Run the Flask application (on Windows, use python instead of python3):

    python3 app.py
  5. Open the web application in your browser:

    http://127.0.0.1:5000/