This project implements a complete end-to-end machine learning pipeline to predict a student's mathematics score based on multiple demographic and academic indicators. The goal is to identify the most influential factors affecting student performance and to provide a deployable predictive system accessible through a web interface.
The solution is structured for production-grade scalability, maintainability, and modularity, following standard ML engineering best practices.
- Build a robust regression model that predicts math scores from student characteristics and prior academic data.
- Establish a modular and reusable machine learning pipeline for training and inference.
- Deploy the model using Flask, enabling real-time predictions via a web-based form.
- Ensure maintainable code design with proper logging, exception handling, and configuration management.
The project workflow consists of the following stages:
- Load the Student Performance dataset (CSV format).
- Handle missing or inconsistent data if present.
- Split the dataset into training and test subsets.
- Encode categorical variables (e.g., gender, ethnicity, parental education).
- Scale numerical features using standardization techniques.
- Build and persist the preprocessing pipeline using `pickle` for future inference.
- Train multiple regression algorithms such as Linear Regression, Random Forest, and Gradient Boosting.
- Evaluate models using Mean Absolute Error (MAE), Mean Squared Error (MSE), and R² score.
- Select the best-performing model and persist it for deployment.
- Evaluate model performance on unseen test data.
- Analyze prediction errors and bias.
- Validate the pipeline's reproducibility and stability.
- Integrate the trained model and preprocessor into a Flask application.
- Accept user input through an HTML interface for real-time prediction.
- Render predicted math scores dynamically on the web page.
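The preprocessing, training, and evaluation stages above can be sketched roughly as follows. This is a minimal illustration and not the project's actual code: the data is synthetic, the column names (`gender`, `parental_education`, `reading_score`, `writing_score`, `math_score`) are assumptions, and the preprocessor and model are bundled into one scikit-learn `Pipeline` for brevity rather than persisted separately.

```python
import pickle

import numpy as np
import pandas as pd
from sklearn.base import clone
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in for the Student Performance CSV (assumed column names).
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "gender": rng.choice(["female", "male"], n),
    "parental_education": rng.choice(["high school", "bachelor", "master"], n),
    "reading_score": rng.integers(30, 100, n),
    "writing_score": rng.integers(30, 100, n),
})
df["math_score"] = (
    0.5 * df["reading_score"] + 0.4 * df["writing_score"] + rng.normal(0, 5, n)
).round()

categorical = ["gender", "parental_education"]
numerical = ["reading_score", "writing_score"]

# Split into training and test subsets.
X = df.drop(columns=["math_score"])
y = df["math_score"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Encode categorical columns and standardize numerical ones.
preprocessor = ColumnTransformer([
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
    ("num", StandardScaler(), numerical),
])

candidates = {
    "linear_regression": LinearRegression(),
    "random_forest": RandomForestRegressor(n_estimators=50, random_state=42),
    "gradient_boosting": GradientBoostingRegressor(random_state=42),
}

# Train each candidate and score it on the held-out test set.
results = {}
for name, estimator in candidates.items():
    model = Pipeline([
        ("preprocess", clone(preprocessor)),
        ("regressor", estimator),
    ])
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    results[name] = {
        "mae": mean_absolute_error(y_test, pred),
        "mse": mean_squared_error(y_test, pred),
        "r2": r2_score(y_test, pred),
        "model": model,
    }

# Select the candidate with the highest R² and serialize it with pickle.
best_name = max(results, key=lambda k: results[k]["r2"])
best_model = results[best_name]["model"]
blob = pickle.dumps(best_model)   # a real project would write this to a file
restored = pickle.loads(blob)
```

Selecting on R² alone is only one possible choice; MAE or MSE could be used as the selection criterion instead, depending on which kind of error matters most.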
- End-to-End ML Pipeline: From raw data ingestion to live model inference.
- Modular Design: Each stage (data processing, model training, prediction) is encapsulated in separate classes and scripts.
- Exception Handling and Logging: Centralized error management using custom exception and logging modules.
- Web Deployment: Interactive prediction via Flask-based web interface.
- Scalable Architecture: Supports integration with CI/CD and cloud deployment workflows.
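As an illustration of the centralized exception handling and logging mentioned above, here is a minimal sketch using only the standard library. The names `PipelineError`, `get_logger`, and `safe_divide` are hypothetical and are not taken from the project's modules.

```python
import logging
import sys


class PipelineError(Exception):
    """Hypothetical custom exception that records where the original error occurred."""

    def __init__(self, message, cause=None):
        location = ""
        if cause is not None and cause.__traceback__ is not None:
            tb = cause.__traceback__
            # Walk to the innermost frame, where the error was actually raised.
            while tb.tb_next is not None:
                tb = tb.tb_next
            location = f" [{tb.tb_frame.f_code.co_filename}:{tb.tb_lineno}]"
        detail = f"{message}{location}: {cause!r}" if cause is not None else message
        super().__init__(detail)
        self.cause = cause


def get_logger(name="student_performance"):
    """Return a logger with a single stream handler (configured only once)."""
    logger = logging.getLogger(name)
    if not logger.handlers:
        handler = logging.StreamHandler(sys.stderr)
        handler.setFormatter(
            logging.Formatter("[%(asctime)s] %(levelname)s %(name)s: %(message)s")
        )
        logger.addHandler(handler)
        logger.setLevel(logging.INFO)
    return logger


logger = get_logger()


def safe_divide(a, b):
    """Toy pipeline step used only to demonstrate the error-handling pattern."""
    try:
        return a / b
    except ZeroDivisionError as exc:
        logger.error("Division failed for a=%s, b=%s", a, b)
        raise PipelineError("safe_divide failed", exc) from exc
```

Each pipeline stage could wrap its body in the same `try`/`except` pattern, so that failures are logged once and surface as a single exception type with the originating file and line attached.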
- Programming Language: Python 3.10+
- Machine Learning: scikit-learn, pandas, numpy
- Model Deployment: Flask
- Data Serialization: pickle
- Web Frontend: HTML5, Bootstrap 5
- Development Tools: Git, VS Code, Jupyter Notebook
Ensure the following are installed on your system:
- Python 3.10 or higher
- pip package manager
- Git
- Clone the repository:
    git clone https://github.com/<your-username>/student-performance-prediction.git
    cd "End-to-End ML Project"
- Create and activate a virtual environment:
    python3 -m venv venv
    source venv/bin/activate # Mac/Linux
    venv\Scripts\activate # Windows
- Install dependencies:
    pip install -r requirements.txt
- Run the Flask application:
    python3 app.py
- Open the web application in your browser:
    http://127.0.0.1:5000/
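For orientation, a Flask app serving predictions from an HTML form might look roughly like the sketch below. It is not the project's `app.py`: the `/predict` route, the form field names, and the `predict_math_score` placeholder (which stands in for the pickled preprocessor and model and uses a made-up formula) are all assumptions.

```python
from flask import Flask, render_template_string, request

app = Flask(__name__)

# Inline template for brevity; a real project would likely use a templates/ folder.
FORM = """
<form method="post" action="/predict">
  <input name="reading_score" type="number" required>
  <input name="writing_score" type="number" required>
  <button type="submit">Predict</button>
</form>
{% if prediction is not none %}
<p>Predicted math score: {{ prediction }}</p>
{% endif %}
"""


def predict_math_score(reading_score, writing_score):
    """Placeholder for the pickled preprocessor + model; the formula is made up."""
    return round(0.5 * reading_score + 0.4 * writing_score, 1)


@app.route("/", methods=["GET"])
def index():
    # Show the empty form.
    return render_template_string(FORM, prediction=None)


@app.route("/predict", methods=["POST"])
def predict():
    # Read the submitted form fields and render the prediction on the same page.
    reading = float(request.form["reading_score"])
    writing = float(request.form["writing_score"])
    prediction = predict_math_score(reading, writing)
    return render_template_string(FORM, prediction=prediction)
```

The sketch omits the server start-up call so it can be imported safely; saved as `app.py`, it could be served with `flask --app app run`, or by adding `app.run()` under an `if __name__ == "__main__":` guard.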