📈 Sales Forecasting — Walmart Store Sales Prediction


🛒 A machine learning project that predicts weekly sales for Walmart stores using historical sales data, store features, and external factors like holidays, temperature, and fuel prices — helping retail businesses optimize inventory and planning.


🎯 Problem Statement

The dataset covers 45 Walmart stores across different regions. Accurate sales forecasting helps:

  • ❌ Avoid overstock — wasted inventory
  • ❌ Avoid understock — lost sales
  • ✅ Optimize supply chain decisions
  • ✅ Plan staffing and promotions

Goal: Predict weekly department-level sales for each store.


📂 Dataset — Walmart Store Sales (Kaggle)

| File | Description | Notes |
| --- | --- | --- |
| train.csv | Historical weekly sales per store/dept | Training data |
| test.csv | Stores/depts to predict | Test data |
| features.csv | External features per store per week | Store metadata |
| stores.csv | Store type and size | 45 stores |
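
The files above are meant to be combined into one modeling table. The following is a minimal sketch of that join, not the notebook's actual code. It assumes features.csv is keyed by `Store` and `Date`, stores.csv by `Store`, and that `IsHoliday` appears in both train.csv and features.csv. The function name `merge_sales_frames` and the tiny in-memory frames (with made-up values) are hypothetical stand-ins for the real CSVs.

```python
import pandas as pd


def merge_sales_frames(train: pd.DataFrame,
                       features: pd.DataFrame,
                       stores: pd.DataFrame) -> pd.DataFrame:
    """Left-join weekly features and store metadata onto the sales rows."""
    # Assumption: IsHoliday exists in both train and features. Drop one
    # copy so the merge does not create IsHoliday_x / IsHoliday_y columns.
    features = features.drop(columns=["IsHoliday"], errors="ignore")
    # validate= raises if a (Store, Date) pair is duplicated in features,
    # which would silently multiply sales rows.
    merged = train.merge(features, on=["Store", "Date"], how="left",
                         validate="many_to_one")
    merged = merged.merge(stores, on="Store", how="left",
                          validate="many_to_one")
    return merged


# Made-up rows standing in for train.csv, features.csv and stores.csv.
week = pd.to_datetime(["2010-02-05"] * 3)
train = pd.DataFrame({
    "Store": [1, 1, 2],
    "Dept": [1, 2, 1],
    "Date": week,
    "Weekly_Sales": [100.0, 200.0, 300.0],
    "IsHoliday": [False, False, False],
})
features = pd.DataFrame({
    "Store": [1, 2],
    "Date": week[:2],
    "Temperature": [40.0, 45.0],
    "Fuel_Price": [2.5, 2.6],
    "IsHoliday": [False, False],
})
stores = pd.DataFrame({"Store": [1, 2], "Type": ["A", "B"], "Size": [1000, 2000]})

merged = merge_sales_frames(train, features, stores)
```

With the real files, the three frames would come from `pd.read_csv(..., parse_dates=["Date"])` for train.csv and features.csv, and a plain `pd.read_csv` for stores.csv.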

Key Features

| Feature | Description |
| --- | --- |
| Store | Store number (1-45) |
| Dept | Department number |
| Date | Week of sales |
| Weekly_Sales | Target: weekly sales in USD |
| IsHoliday | Holiday week flag |
| Temperature | Regional temperature |
| Fuel_Price | Fuel cost in the region |
| CPI | Consumer Price Index |
| Unemployment | Unemployment rate |
| MarkDown1-5 | Promotional markdown data |
| Type | Store type (A/B/C) |
| Size | Store size in sq ft |
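
Several of these columns need transforming before a regressor can use them. The sketch below shows one plausible way to derive date parts, encode `IsHoliday` and `Type`, and add a lagged sales column per store and department. The helper name `add_basic_features`, the ordinal A/B/C mapping, the ISO week numbering, and the one-week lag are illustrative assumptions, not choices confirmed by the notebook. The sample rows are made up.

```python
import pandas as pd


def add_basic_features(df: pd.DataFrame, lag_weeks: int = 1) -> pd.DataFrame:
    """Return a copy of df with date parts, encoded flags and a sales lag."""
    # Sort so that shift() within each group moves backwards in time.
    out = df.sort_values(["Store", "Dept", "Date"]).copy()

    # Calendar features from the weekly Date column (ISO week numbers).
    out["Year"] = out["Date"].dt.year
    out["Month"] = out["Date"].dt.month
    out["Week"] = out["Date"].dt.isocalendar().week.astype(int)

    # Boolean holiday flag -> 0/1; store type -> ordinal code (assumed order).
    out["IsHoliday"] = out["IsHoliday"].astype(int)
    out["Type"] = out["Type"].map({"A": 0, "B": 1, "C": 2})

    # Sales from lag_weeks earlier in the same store/department series.
    # The first lag_weeks rows of each series have no history and become NaN.
    out[f"Sales_lag_{lag_weeks}"] = (
        out.groupby(["Store", "Dept"])["Weekly_Sales"].shift(lag_weeks)
    )
    return out


# Made-up three-week series for a single store and department.
sample = pd.DataFrame({
    "Store": [1, 1, 1],
    "Dept": [1, 1, 1],
    "Date": pd.to_datetime(["2010-02-05", "2010-02-12", "2010-02-19"]),
    "Weekly_Sales": [100.0, 120.0, 90.0],
    "IsHoliday": [False, True, False],
    "Type": ["A", "A", "A"],
})
feat = add_basic_features(sample)
```

Rows whose lag is NaN have to be dropped or imputed before training. Lag features also need care at prediction time, since test.csv has no `Weekly_Sales` to shift.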

🔄 Pipeline

Data Loading (train + features + stores CSV)
        ↓
Exploratory Data Analysis
├── Sales trends over time
├── Holiday vs non-holiday impact
└── Store type & size analysis
        ↓
Feature Engineering
├── Date features (week, month, year)
├── Holiday flag encoding
├── Store type encoding
└── Lag features for time series
        ↓
Model Training
├── Linear Regression (baseline)
├── Random Forest Regressor
└── Gradient Boosting / XGBoost
        ↓
Evaluation
├── MAE — Mean Absolute Error
├── RMSE — Root Mean Squared Error
└── WMAE — Weighted MAE (holidays weighted 5x)
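
The WMAE step can be written as the weighted mean of absolute errors, with weight 5 on holiday weeks and weight 1 otherwise, as stated in the pipeline. The function below is a minimal sketch under that definition. The name `wmae` and its parameters are illustrative, and the example values are made up.

```python
import numpy as np


def wmae(y_true, y_pred, is_holiday, holiday_weight=5.0):
    """Weighted MAE: sum(w * |y - y_hat|) / sum(w), w = holiday_weight or 1."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    # Holiday weeks count holiday_weight times as much as ordinary weeks.
    weights = np.where(np.asarray(is_holiday, dtype=bool), holiday_weight, 1.0)
    return float(np.sum(weights * np.abs(y_true - y_pred)) / np.sum(weights))


# Made-up example: the largest error falls on the holiday week.
y_true = [100.0, 200.0, 300.0]
y_pred = [110.0, 190.0, 330.0]
is_holiday = [False, False, True]

score = wmae(y_true, y_pred, is_holiday)
plain_mae = wmae(y_true, y_pred, is_holiday, holiday_weight=1.0)
```

Here the absolute errors are 10, 10 and 30 with weights 1, 1 and 5, so the weighted score is 170 / 7 (about 24.3), while the unweighted MAE is 50 / 3 (about 16.7). A model that misses holiday weeks is penalized more heavily than plain MAE suggests.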

🧠 Key Insights from Data

  • 🎄 Holiday weeks have significantly higher sales
  • 🏪 Type A stores (largest) generate highest revenue
  • 📅 End of year (Nov-Dec) shows consistent sales spikes
  • 📉 Markdowns correlate with increased sales activity
  • 🌡️ Temperature & fuel price have minor but measurable impact

🛠️ Tech Stack

| Layer | Technology |
| --- | --- |
| Language | Python 3.x |
| Data Processing | Pandas, NumPy |
| Visualization | Matplotlib, Seaborn |
| ML Models | Scikit-learn |
| Notebook | Jupyter Notebook |

🚀 Getting Started

# Clone the repo
git clone https://github.com/tashfeen786/SalesForecasting.git
cd SalesForecasting

# Install dependencies
pip install pandas numpy matplotlib seaborn scikit-learn jupyter

# Run the notebook
jupyter notebook Task_07_SalesForecast.ipynb

🏗️ Project Structure

SalesForecasting/
│
├── Task_07_SalesForecast.ipynb    # Main analysis & modeling notebook
├── Task_07_SalesForecasting.pdf   # PDF export
├── train.csv                      # Training data
├── test.csv                       # Test data
├── features.csv                   # External store features
├── stores.csv                     # Store metadata
└── README.md

🔮 Future Improvements

  • ARIMA / SARIMA — time series specific models
  • XGBoost / LightGBM — better accuracy
  • Feature importance analysis with SHAP
  • Streamlit dashboard for interactive forecasting
  • Cross-validation with time-series split
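
For the time-series cross-validation item, one possible approach is to split on unique calendar weeks rather than on rows, so that every store and department from the same week lands in the same fold. The sketch below uses scikit-learn's `TimeSeriesSplit` for this. The helper `time_ordered_folds` and the synthetic dates are hypothetical and only illustrate the idea.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import TimeSeriesSplit


def time_ordered_folds(dates: pd.Series, n_splits: int = 3):
    """Yield (train_mask, valid_mask) boolean arrays split by calendar week."""
    unique_weeks = np.sort(dates.unique())
    splitter = TimeSeriesSplit(n_splits=n_splits)
    # Split the ordered list of weeks, then map each fold back to rows.
    for train_idx, valid_idx in splitter.split(unique_weeks):
        train_mask = dates.isin(unique_weeks[train_idx]).to_numpy()
        valid_mask = dates.isin(unique_weeks[valid_idx]).to_numpy()
        yield train_mask, valid_mask


# Synthetic data: 8 consecutive weeks, two rows (departments) per week.
weeks = pd.date_range("2010-02-05", periods=8, freq="7D")
dates = pd.Series(weeks.repeat(2))

folds = list(time_ordered_folds(dates, n_splits=3))
```

Each fold trains only on weeks that precede its validation weeks, which avoids leaking future sales into training the way a shuffled K-fold split would.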

👨‍💻 Author

Tashfeen Aziz — AI/ML Engineer & Python Developer



If you found this project helpful, please give it a star!
