🛒 A machine learning project that predicts weekly sales for Walmart stores using historical sales data, store features, and external factors like holidays, temperature, and fuel prices — helping retail businesses optimize inventory and planning.
The dataset covers 45 Walmart stores across different regions. Accurate sales forecasting helps:
- ❌ Avoid overstock — wasted inventory
- ❌ Avoid understock — lost sales
- ✅ Optimize supply chain decisions
- ✅ Plan staffing and promotions
Goal: Predict weekly department-level sales for each store.
| File | Description | Size |
|---|---|---|
| `train.csv` | Historical weekly sales per store/dept | Training data |
| `test.csv` | Stores/depts to predict | Test data |
| `features.csv` | External features per store per week | Store metadata |
| `stores.csv` | Store type and size | 45 stores |
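As a rough sketch of how the files above might be combined into one modeling table: the snippet assumes `features.csv` is keyed by `Store` and `Date` and `stores.csv` by `Store`, as the descriptions suggest. The function names `merge_walmart_frames` and `load_training_table` are hypothetical and not taken from the notebook.

```python
from pathlib import Path

import pandas as pd


def merge_walmart_frames(sales, features, stores):
    """Attach weekly external features and static store metadata to sales rows."""
    # Assumption: any non-key column present in both frames (for example a
    # repeated IsHoliday flag) is redundant, so the sales copy is kept.
    overlap = [c for c in features.columns
               if c in sales.columns and c not in ("Store", "Date")]
    features = features.drop(columns=overlap)
    merged = sales.merge(features, on=["Store", "Date"], how="left")
    return merged.merge(stores, on="Store", how="left")


def load_training_table(data_dir="."):
    """Read the three CSVs from data_dir and return the merged table."""
    data_dir = Path(data_dir)
    sales = pd.read_csv(data_dir / "train.csv", parse_dates=["Date"])
    features = pd.read_csv(data_dir / "features.csv", parse_dates=["Date"])
    stores = pd.read_csv(data_dir / "stores.csv")
    return merge_walmart_frames(sales, features, stores)
```

A left join keeps every sales row even if a feature or store record is missing, which makes gaps visible as NaN values rather than silently dropping weeks.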
| Feature | Description |
|---|---|
| `Store` | Store number (1-45) |
| `Dept` | Department number |
| `Date` | Week of sales |
| `Weekly_Sales` | Target: sales in USD |
| `IsHoliday` | Holiday week flag |
| `Temperature` | Regional temperature |
| `Fuel_Price` | Fuel cost in the region |
| `CPI` | Consumer Price Index |
| `Unemployment` | Unemployment rate |
| `MarkDown1-5` | Promotional markdown data |
| `Type` | Store type (A/B/C) |
| `Size` | Store size in sq ft |
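The columns above can be turned into model inputs in many ways. The following is a minimal sketch, not the notebook's actual code: it derives calendar features from `Date`, encodes `IsHoliday` and `Type`, and adds lagged sales per store and department. The helper name `add_basic_features`, the output column names, and the choice of lags are all assumptions made for illustration.

```python
import pandas as pd


def add_basic_features(df, lags=(1, 2)):
    """Return a copy of df with calendar, encoded, and lagged-sales columns."""
    out = df.copy()
    out["Date"] = pd.to_datetime(out["Date"])
    # Lags are only meaningful when each store/dept series is in time order.
    out = out.sort_values(["Store", "Dept", "Date"]).reset_index(drop=True)

    # Calendar features derived from the week date.
    out["Year"] = out["Date"].dt.year
    out["Month"] = out["Date"].dt.month
    out["Week"] = out["Date"].dt.isocalendar().week.astype(int)

    # Holiday flag as 0/1 and store type as one-hot columns.
    out["IsHoliday"] = out["IsHoliday"].astype(int)
    out = pd.get_dummies(out, columns=["Type"], prefix="Type", dtype=int)

    # Sales from `lag` weeks earlier within the same store/dept series.
    grouped = out.groupby(["Store", "Dept"])["Weekly_Sales"]
    for lag in lags:
        out[f"Sales_Lag{lag}"] = grouped.shift(lag)
    return out
```

The first rows of each series have no history, so their lag columns are NaN. They would need to be dropped or imputed before fitting models that do not accept missing values.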
Data Loading (train + features + stores CSV)
↓
Exploratory Data Analysis
├── Sales trends over time
├── Holiday vs non-holiday impact
└── Store type & size analysis
↓
Feature Engineering
├── Date features (week, month, year)
├── Holiday flag encoding
├── Store type encoding
└── Lag features for time series
↓
Model Training
├── Linear Regression (baseline)
├── Random Forest Regressor
└── Gradient Boosting / XGBoost
↓
Evaluation
├── MAE — Mean Absolute Error
├── RMSE — Root Mean Squared Error
└── WMAE — Weighted MAE (holidays weighted 5x)
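The three evaluation metrics can be sketched with NumPy as follows. For WMAE, the snippet assumes a weight of 5 for holiday weeks and 1 otherwise, normalized by the sum of the weights. The normalization is an assumption, since the pipeline only states that holidays are weighted 5x. Function names are illustrative.

```python
import numpy as np


def mae(y_true, y_pred):
    """Mean absolute error."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean(np.abs(y_true - y_pred)))


def rmse(y_true, y_pred):
    """Root mean squared error."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def wmae(y_true, y_pred, is_holiday, holiday_weight=5.0):
    """Weighted MAE: holiday rows count holiday_weight times, others once."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    weights = np.where(np.asarray(is_holiday, bool), holiday_weight, 1.0)
    # Assumption: normalize by the total weight so the result stays in USD.
    return float(np.sum(weights * np.abs(y_true - y_pred)) / np.sum(weights))
```

With no holiday weeks in the evaluation set, `wmae` reduces to plain `mae`, which is a quick sanity check.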
- 🎄 Holiday weeks have significantly higher sales
- 🏪 Type A stores (largest) generate highest revenue
- 📅 End of year (Nov-Dec) shows consistent sales spikes
- 📉 Markdowns correlate with increased sales activity
- 🌡️ Temperature & fuel price have minor but measurable impact
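A finding such as the holiday effect can be checked with a simple comparison of means. The sketch below is one possible check, not the analysis used in the notebook. It assumes a frame with `Weekly_Sales` and `IsHoliday` columns, and the helper name `holiday_lift` is hypothetical.

```python
import pandas as pd


def holiday_lift(df):
    """Compare average weekly sales in holiday and non-holiday weeks."""
    is_holiday = df["IsHoliday"].astype(bool)
    holiday_mean = df.loc[is_holiday, "Weekly_Sales"].mean()
    regular_mean = df.loc[~is_holiday, "Weekly_Sales"].mean()
    return {
        "holiday_mean": float(holiday_mean),
        "regular_mean": float(regular_mean),
        # Ratio above 1 means holiday weeks sell more on average.
        "ratio": float(holiday_mean / regular_mean),
    }
```

A difference in means does not by itself show that the gap is statistically significant, since holiday weeks also cluster at the end of the year.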
| Layer | Technology |
|---|---|
| Language | Python 3.x |
| Data Processing | Pandas, NumPy |
| Visualization | Matplotlib, Seaborn |
| ML Models | Scikit-learn |
| Notebook | Jupyter Notebook |
# Clone the repo
git clone https://github.com/tashfeen786/SalesForecasting.git
cd SalesForecasting
# Install dependencies
pip install pandas numpy matplotlib seaborn scikit-learn jupyter
# Run the notebook
jupyter notebook Task_07_SalesForecast.ipynb
SalesForecasting/
│
├── Task_07_SalesForecast.ipynb # Main analysis & modeling notebook
├── Task_07_SalesForecasting.pdf # PDF export
├── train.csv # Training data
├── test.csv # Test data
├── features.csv # External store features
├── stores.csv # Store metadata
└── README.md
- ARIMA / SARIMA — time series specific models
- XGBoost / LightGBM — better accuracy
- Feature importance analysis with SHAP
- Streamlit dashboard for interactive forecasting
- Cross-validation with time-series split
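For the time-series cross-validation item, one possible approach uses scikit-learn's `TimeSeriesSplit`. Each week has many store/department rows, so this sketch splits on unique weeks and maps them back to row positions, which keeps every validation week strictly after the training weeks. The helper name `weekly_time_series_folds` is hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import TimeSeriesSplit


def weekly_time_series_folds(dates, n_splits=3):
    """Yield (train_rows, val_rows) index arrays with validation after training."""
    dates = pd.Series(pd.to_datetime(dates)).reset_index(drop=True)
    unique_weeks = np.sort(dates.unique())
    splitter = TimeSeriesSplit(n_splits=n_splits)
    # Split the ordered list of weeks, not the rows, so a week never
    # appears in both the training and the validation part of a fold.
    for train_w, val_w in splitter.split(unique_weeks.reshape(-1, 1)):
        train_mask = dates.isin(unique_weeks[train_w]).to_numpy()
        val_mask = dates.isin(unique_weeks[val_w]).to_numpy()
        yield np.flatnonzero(train_mask), np.flatnonzero(val_mask)
```

The returned positions can be passed to `DataFrame.iloc` on a frame whose rows are in the same order as `dates`.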
Tashfeen Aziz — AI/ML Engineer & Python Developer
⭐ If you found this project helpful, please give it a star!