Future Interns — Machine Learning Internship | Task 1
CIN: FIT/MAR26/ML6085 · Repository: FUTURE_ML_01
This project builds a sales forecasting system using historical business data and machine learning. The goal is to predict future monthly revenue and present the results in a form that supports business planning, rather than reporting model accuracy alone.
Real businesses use forecasts like this to plan inventory, manage cash flow, prepare staffing, and avoid costly overstocking. This project simulates exactly that use case.
- Generates and prepares 4 years of realistic monthly sales data (2020–2023)
- Engineers time-based features including lag variables, rolling statistics, and cyclical month encoding
- Trains and compares 3 forecasting models
- Evaluates each model using MAE, RMSE, and MAPE
- Produces a 6-month forward forecast (Jan–Jun 2024) with confidence bands
- Delivers clear, business-ready visualizations a non-technical stakeholder can act on
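The feature engineering step listed above could look roughly like the sketch below. It uses plain Python so it runs without dependencies; the notebook itself presumably does this with Pandas. The function name `build_features`, the lag set (1, 2, 3), and the 3-month rolling window are illustrative assumptions, not values taken from the notebook.

```python
import math

def build_features(sales, start_month=1, lags=(1, 2, 3), window=3):
    """Turn a chronological list of monthly sales into feature rows.

    sales: monthly revenue values, oldest first.
    start_month: calendar month (1-12) of the first value.
    Returns one dict per month that has enough history behind it.
    """
    rows = []
    first = max(max(lags), window)  # earliest index with full history
    for t in range(first, len(sales)):
        month = (start_month - 1 + t) % 12 + 1
        past = sales[t - window:t]  # strictly before month t, so no leakage
        mean = sum(past) / window
        std = math.sqrt(sum((x - mean) ** 2 for x in past) / window)
        row = {f"lag_{k}": sales[t - k] for k in lags}
        row.update({
            "roll_mean": mean,
            "roll_std": std,  # population std of the window
            # Cyclical encoding places December next to January.
            "month_sin": math.sin(2 * math.pi * month / 12),
            "month_cos": math.cos(2 * math.pi * month / 12),
            "trend": t,  # simple linear trend index
            "target": sales[t],
        })
        rows.append(row)
    return rows

# Toy data for illustration only (not the project's sales figures).
sales = [100, 110, 120, 130, 140, 150]
rows = build_features(sales)
print(rows[0])
```

Rolling statistics are computed only from months before the target month, which is what keeps the features free of leakage.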
| Model | MAE | RMSE | MAPE |
|---|---|---|---|
| Linear Regression | $234 | $292 | 0.82% ✅ |
| Gradient Boosting | $779 | $933 | 2.73% |
| Random Forest | $1,157 | $1,298 | 4.02% |
Best Model: Linear Regression — 0.82% MAPE (< 1% error on unseen data)
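For reference, the three metrics in the table can be computed as in the minimal sketch below. The sample values are made up for illustration and are not the notebook's test data; Scikit-learn provides equivalent functions, which the notebook may use instead.

```python
import math

def mae(actual, predicted):
    """Mean absolute error, in the same units as the data (dollars here)."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean squared error; penalizes large misses more than MAE."""
    return math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    )

def mape(actual, predicted):
    """Mean absolute percentage error, in percent. Actual values must be nonzero."""
    return 100 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

# Illustrative values only.
actual = [20000, 25000, 30000]
predicted = [20200, 24700, 30100]

print(f"MAE:  ${mae(actual, predicted):,.0f}")
print(f"RMSE: ${rmse(actual, predicted):,.0f}")
print(f"MAPE: {mape(actual, predicted):.2f}%")
```

MAPE is scale-free, which makes it the easiest of the three to communicate to non-technical stakeholders.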
| Month | Predicted Revenue |
|---|---|
| January 2024 | $20,981 |
| February 2024 | $19,886 |
| March 2024 | $25,414 |
| April 2024 | $27,690 |
| May 2024 | $32,056 |
| June 2024 | $33,273 |
📈 Projected H1 2024 Revenue: ~$159,300, with revenue rising each month from March through June after a small dip in February.
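The H1 total can be checked directly from the forecast table. The short sketch below sums the six monthly values and also derives month-over-month changes; the variable names are illustrative.

```python
# Forecast values copied from the table above.
forecast = {
    "January 2024": 20981,
    "February 2024": 19886,
    "March 2024": 25414,
    "April 2024": 27690,
    "May 2024": 32056,
    "June 2024": 33273,
}

total = sum(forecast.values())  # 159300
values = list(forecast.values())

# Percentage change from each month to the next.
mom_change = [round(100 * (b - a) / a, 1) for a, b in zip(values, values[1:])]

print(f"H1 2024 total: ${total:,}")
for month, change in zip(list(forecast)[1:], mom_change):
    print(f"{month}: {change:+.1f}% vs previous month")
```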
FUTURE_ML_01/
│
├── FUTURE_ML_01.ipynb # Main notebook (full pipeline)
├── sales_forecast.png # 6-month forecast chart
├── eda_overview.png # EDA: trends, seasonality, YoY comparison
├── model_evaluation.png # Model comparison + actual vs predicted
├── feature_importance.png # Feature importance (Gradient Boosting)
└── README.md
| Tool | Purpose |
|---|---|
| Python 3 | Core language |
| Pandas & NumPy | Data manipulation & feature engineering |
| Scikit-learn | ML models & evaluation |
| Matplotlib & Seaborn | Visualizations |
| Jupyter Notebook | Development environment |
- ✅ Data cleaning & realistic time-series simulation
- ✅ Time-based feature engineering (lag features, rolling mean/std, cyclical month encoding, trend index)
- ✅ Forecasting using regression & ensemble methods
- ✅ Proper time-series train/test split (no data leakage)
- ✅ Model evaluation: MAE, RMSE, MAPE
- ✅ Iterative future forecasting with rolling predictions
- ✅ Business-friendly charts with annotated forecast values
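Two of the items above, the leakage-free split and the iterative forecast, can be sketched as follows. This is a simplified illustration, not the notebook's code: `chronological_split`, `recursive_forecast`, and the stand-in model `mean_of_lags` are hypothetical names, and a fitted Scikit-learn model's prediction would take the place of `mean_of_lags` in practice.

```python
def chronological_split(series, n_test):
    """Hold out the last n_test points. A time series is never shuffled."""
    if not 0 < n_test < len(series):
        raise ValueError("n_test must be between 1 and len(series) - 1")
    return series[:-n_test], series[-n_test:]

def recursive_forecast(history, predict_next, horizon, n_lags):
    """Forecast `horizon` steps ahead, feeding each prediction back in as a lag."""
    window = list(history[-n_lags:])
    predictions = []
    for _ in range(horizon):
        y_hat = predict_next(window)
        predictions.append(y_hat)
        window = window[1:] + [y_hat]  # drop the oldest lag, append the new value
    return predictions

def mean_of_lags(lags):
    """Stand-in for a trained model: predicts the average of the lag values."""
    return sum(lags) / len(lags)

# Toy series for illustration only.
series = [10.0, 12.0, 14.0, 16.0, 18.0, 20.0]
train, test = chronological_split(series, n_test=2)
preds = recursive_forecast(train, mean_of_lags, horizon=len(test), n_lags=2)
print(train, test, preds)
```

Because later predictions are built on earlier predictions, errors can compound over the horizon, which is one reason forecast uncertainty typically widens further out.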
- Revenue is growing: average monthly sales trend from ~$18K in 2020 to a forecast $30K+ by May–June 2024
- December is peak season: the holiday spike adds ~$300/day (roughly $9,300 over the month), so stock up in November
- Q1 is the slowest period — ideal time for targeted promotions and clearance
- 3–6 month lag features are the strongest predictors — past sales are the best signal for future demand
- Retraining monthly with fresh data should help keep forecast error (MAPE) below 2%
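One way to test the monthly-retraining idea is walk-forward evaluation: refit on all data available up to each month, predict the next month, and track the error. The sketch below is an assumption-laden illustration: it uses a plain least-squares trend line as a stand-in model, and `fit_trend` and `walk_forward_mape` are hypothetical names, not functions from the notebook.

```python
def fit_trend(y):
    """Least-squares line through (0, y[0]), (1, y[1]), ...

    Returns (slope, intercept). Needs at least two points.
    """
    n = len(y)
    x_mean = (n - 1) / 2
    y_mean = sum(y) / n
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    sxy = sum((x - x_mean) * (v - y_mean) for x, v in enumerate(y))
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean

def walk_forward_mape(series, min_train):
    """Refit before every month, predict one step ahead, return MAPE in percent."""
    errors = []
    for t in range(min_train, len(series)):
        slope, intercept = fit_trend(series[:t])  # only data before month t
        pred = intercept + slope * t
        errors.append(abs((series[t] - pred) / series[t]))
    return 100 * sum(errors) / len(errors)

# Perfectly linear toy data, so the trend model should be almost exact.
series = [100.0, 110.0, 120.0, 130.0, 140.0, 150.0]
print(f"Walk-forward MAPE: {walk_forward_mape(series, min_train=3):.4f}%")
```

Running the same loop with the project's chosen model on the real series would show whether the error actually stays under the 2% target.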
Intern: Fathima Safva · Program: Future Interns Machine Learning Fellowship · Duration: 10/03/2026 – 10/04/2026 · LinkedIn: Future Interns