fsafva13-coder/FUTURE_ML_01

📈 Sales & Demand Forecasting for Businesses

Future Interns: Machine Learning Internship | Task 1 · CIN: FIT/MAR26/ML6085 · Repository: FUTURE_ML_01


🔍 Overview

This project builds a sales forecasting system from historical (simulated) business data using machine learning. The goal is to predict future monthly revenue and to present the results in a form that supports business planning, rather than only reporting model accuracy.

Real businesses use forecasts like this to plan inventory, manage cash flow, prepare staffing, and avoid costly overstocking. This project simulates exactly that use case.


🎯 What This Project Does

  • Generates and prepares 4 years of realistic monthly sales data (2020–2023)
  • Engineers time-based features including lag variables, rolling statistics, and cyclical month encoding
  • Trains and compares 3 forecasting models
  • Evaluates each model using MAE, RMSE, and MAPE
  • Produces a 6-month forward forecast (Jan–Jun 2024) with confidence bands
  • Delivers clear, business-ready visualizations a non-technical stakeholder can act on
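The lag, rolling, and cyclical features listed above could be built along the following lines. This is a minimal sketch, not the notebook's actual code: the function name `add_time_features`, the column name `revenue`, the specific lags (1, 3, 6), the 3-month window, and the toy series are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def add_time_features(df: pd.DataFrame, target: str = "revenue") -> pd.DataFrame:
    """Add lag, rolling, cyclical-month, and trend features to a monthly series."""
    out = df.copy()

    # Lag features: revenue from 1, 3, and 6 months earlier (assumed lags).
    for lag in (1, 3, 6):
        out[f"lag_{lag}"] = out[target].shift(lag)

    # Rolling statistics are computed on past values only (shift by 1 first),
    # so the current month's revenue never leaks into its own features.
    past = out[target].shift(1)
    out["roll_mean_3"] = past.rolling(window=3).mean()
    out["roll_std_3"] = past.rolling(window=3).std()

    # Cyclical encoding puts December next to January on a circle.
    month = out.index.month.to_numpy()
    out["month_sin"] = np.sin(2 * np.pi * month / 12)
    out["month_cos"] = np.cos(2 * np.pi * month / 12)

    # Trend index: months elapsed since the start of the series.
    out["trend"] = np.arange(len(out))

    # Rows without a full lag history are dropped.
    return out.dropna()


# Hypothetical toy series covering the same 2020-2023 span as the project data.
idx = pd.date_range("2020-01-01", periods=48, freq="MS")
sales = pd.DataFrame({"revenue": np.linspace(18_000, 30_000, 48)}, index=idx)
features = add_time_features(sales)
```

With a 6-month maximum lag, the first six months have no complete feature row, so 42 of the 48 months remain for training and testing.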

📊 Results

| Model | MAE | RMSE | MAPE |
|---|---|---|---|
| Linear Regression | $234 | $292 | 0.82% ✅ |
| Gradient Boosting | $779 | $933 | 2.73% |
| Random Forest | $1,157 | $1,298 | 4.02% |

Best Model: Linear Regression — 0.82% MAPE (< 1% error on unseen data)
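For reference, the three metrics in the table can be computed as below. This is a sketch using the standard definitions; the function names and the three sample values are illustrative assumptions, not data from the notebook.

```python
import numpy as np


def mae(y_true, y_pred):
    """Mean absolute error, in the same units as the target (dollars here)."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean(np.abs(y_true - y_pred)))


def rmse(y_true, y_pred):
    """Root mean squared error; penalizes large misses more than MAE does."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent. Assumes no zero actuals."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean(np.abs((y_true - y_pred) / y_true)) * 100)


# Hypothetical actual and predicted monthly revenue.
actual = [20_000, 25_000, 30_000]
predicted = [20_200, 24_750, 30_300]

scores = {
    "MAE": mae(actual, predicted),
    "RMSE": rmse(actual, predicted),
    "MAPE": mape(actual, predicted),
}
```

RMSE is never smaller than MAE for the same predictions, which matches the ordering in every row of the table.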

6-Month Forecast (Jan–Jun 2024)

| Month | Predicted Revenue |
|---|---|
| January 2024 | $20,981 |
| February 2024 | $19,886 |
| March 2024 | $25,414 |
| April 2024 | $27,690 |
| May 2024 | $32,056 |
| June 2024 | $33,273 |
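The README does not say how the confidence bands are computed. One common approximation, shown here purely as an assumption, is to treat the test RMSE of the best model ($292) as the typical error and draw a band of ±1.96 × RMSE around each forecast. The function name `band` and the constant-width band are illustrative choices; in an iterative forecast the real uncertainty usually widens with each step ahead.

```python
# Forecast values copied from the table above.
forecast = {
    "January 2024": 20_981,
    "February 2024": 19_886,
    "March 2024": 25_414,
    "April 2024": 27_690,
    "May 2024": 32_056,
    "June 2024": 33_273,
}

RMSE = 292   # Linear Regression test RMSE from the results table
Z_95 = 1.96  # assumption: roughly normal errors, ~95% coverage


def band(value, rmse=RMSE, z=Z_95):
    """Return an approximate (lower, upper) band around one forecast value."""
    half_width = z * rmse
    return value - half_width, value + half_width


bands = {month: band(value) for month, value in forecast.items()}

for month, (low, high) in bands.items():
    print(f"{month}: ${low:,.0f} to ${high:,.0f}")
```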

📈 Projected H1 2024 Revenue: ~$159,300, with monthly revenue rising from $20,981 in January to $33,273 in June after a small dip in February.
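The H1 total can be checked directly from the six forecast values. The short sketch below also computes the month-over-month change, which makes the February dip visible; the variable names are illustrative.

```python
# Forecast values copied from the 6-month forecast table.
forecast = [
    ("January 2024", 20_981),
    ("February 2024", 19_886),
    ("March 2024", 25_414),
    ("April 2024", 27_690),
    ("May 2024", 32_056),
    ("June 2024", 33_273),
]

total = sum(value for _, value in forecast)
average = total / len(forecast)

# Change versus the previous month, starting with February.
changes = [
    (forecast[i][0], forecast[i][1] - forecast[i - 1][1])
    for i in range(1, len(forecast))
]

print(f"H1 total: ${total:,}  (monthly average: ${average:,.0f})")
for month, delta in changes:
    print(f"{month}: {delta:+,}")
```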


🗂️ Project Structure

FUTURE_ML_01/
│
├── FUTURE_ML_01.ipynb        # Main notebook (full pipeline)
├── sales_forecast.png        # 6-month forecast chart
├── eda_overview.png          # EDA: trends, seasonality, YoY comparison
├── model_evaluation.png      # Model comparison + actual vs predicted
├── feature_importance.png    # Feature importance (Gradient Boosting)
└── README.md

🛠️ Tools & Libraries

| Tool | Purpose |
|---|---|
| Python 3 | Core language |
| Pandas & NumPy | Data manipulation & feature engineering |
| Scikit-learn | ML models & evaluation |
| Matplotlib & Seaborn | Visualizations |
| Jupyter Notebook | Development environment |

✨ Key Features Implemented

  • ✅ Data cleaning & realistic time-series simulation
  • ✅ Time-based feature engineering (lag features, rolling mean/std, cyclical month encoding, trend index)
  • ✅ Forecasting using regression & ensemble methods
  • ✅ Proper time-series train/test split (no data leakage)
  • ✅ Model evaluation: MAE, RMSE, MAPE
  • ✅ Iterative future forecasting with rolling predictions
  • ✅ Business-friendly charts with annotated forecast values

💡 Business Takeaways

  1. Revenue is growing: average monthly sales trend from ~$18K (2020) toward a forecast ~$30K+ by mid-2024 (the May and June 2024 forecasts exceed $30K)
  2. December is peak season — holiday spike adds ~$300/day; stock up in November
  3. Q1 is the slowest period — ideal time for targeted promotions and clearance
  4. 3–6 month lag features are the strongest predictors — past sales are the best signal for future demand
  5. Retraining monthly with fresh data should help keep forecast error (MAPE) below 2%
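One way to check the retraining takeaway is a walk-forward evaluation: refit on everything available before each month, predict that month, and track the MAPE against the 2% target. The README does not describe such a procedure, so the sketch below is an assumption; `walk_forward_mape`, the placeholder `last_value_model`, and the toy series are all hypothetical.

```python
def walk_forward_mape(series, n_eval, predict_next):
    """Refit and predict one month at a time over the last n_eval months."""
    errors = []
    for t in range(len(series) - n_eval, len(series)):
        history = series[:t]          # only data available before month t
        pred = predict_next(history)  # "retrain" on history, predict month t
        errors.append(abs((series[t] - pred) / series[t]))
    return 100 * sum(errors) / len(errors)


def last_value_model(history):
    """Placeholder model: predict that next month equals the last month."""
    return history[-1]


# Hypothetical series: 48 months growing by $250 per month from $18,000.
series = [18_000 + 250 * t for t in range(48)]

mape = walk_forward_mape(series, n_eval=6, predict_next=last_value_model)
needs_attention = mape >= 2.0  # 2% threshold taken from the takeaway above

print(f"Walk-forward MAPE: {mape:.2f}%  (needs attention: {needs_attention})")
```

Any model can be plugged in through `predict_next`, as long as it takes the history so far and returns a single prediction for the next month.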

👩‍💻 About

Intern: Fathima Safva
Program: Future Interns Machine Learning Fellowship
Duration: 10/03/2026 – 10/04/2026
LinkedIn: Future Interns
