An implementation of retail demand forecasting with LightGBM, covering time series feature engineering (date parts, lags, rolling means, exponentially weighted means) and a custom SMAPE evaluation metric.
- Project Overview
- Technical Architecture
- Installation & Setup
- Implementation Details
- Feature Engineering
- Model Training
- Results & Visualization
- Time Series Processing
  - Date-based feature extraction
  - Lag feature generation
  - Rolling mean calculations
  - Exponentially weighted means
- Advanced Analytics
  - Store-level analysis
  - Item-level forecasting
  - Custom SMAPE evaluation
  - Robust validation strategy
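The feature list mentions exponentially weighted means, but no code for them appears in this README. The following is a minimal sketch, assuming the same `store`, `item`, and `sales` columns used by the other feature functions and rows already sorted by date within each store-item pair. The function name `ewm_features`, its parameters, and the column naming scheme are illustrative assumptions rather than the project's actual code.

```python
import pandas as pd


def ewm_features(dataframe, alphas, lags):
    """Add exponentially weighted mean features (hypothetical helper).

    For each (alpha, lag) pair, sales are shifted by `lag` rows within each
    store-item group, then smoothed with pandas' ewm(alpha=...).mean().
    """
    for alpha in alphas:
        for lag in lags:
            # Drop the dot from alpha so the column name stays simple.
            col = f"sales_ewm_alpha_{str(alpha).replace('.', '')}_lag_{lag}"
            dataframe[col] = dataframe.groupby(["store", "item"])["sales"].transform(
                lambda x: x.shift(lag).ewm(alpha=alpha).mean()
            )
    return dataframe


# Tiny made-up example: one store, one item, three days of sales.
demo = pd.DataFrame({
    "date": pd.date_range("2020-01-01", periods=3),
    "store": [1, 1, 1],
    "item": [1, 1, 1],
    "sales": [10.0, 20.0, 30.0],
})
result = ewm_features(demo, alphas=[0.5], lags=[1])
print(result)
```

Shifting before smoothing means each row only sees sales from earlier rows, which is the same leakage guard the rolling mean features use.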
graph TD
A[Raw Sales Data] --> B[Feature Engineering]
B --> C1[Time Features]
B --> C2[Lag Features]
B --> C3[Rolling Means]
B --> C4[EWM Features]
C1 & C2 & C3 & C4 --> D[LightGBM Model]
D --> E[Sales Forecasting]
E --> F[Performance Evaluation]
# requirements.txt
numpy>=1.20.0
pandas>=1.3.0
lightgbm>=3.3.0
matplotlib>=3.4.0
seaborn>=0.11.0
statsmodels>=0.13.0

- Python 3.8+
- 8GB RAM (minimum)
- Storage for data processing
# Install dependencies
pip install numpy pandas lightgbm matplotlib seaborn statsmodels
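The `lag_features` function later in this README calls `random_noise(dataframe)` without showing its definition. The following is a minimal sketch, assuming zero-mean Gaussian noise with one value per row. The `scale` default and the `seed` parameter are illustrative assumptions, not values taken from the project.

```python
import numpy as np
import pandas as pd


def random_noise(dataframe, scale=1.6, seed=None):
    """Return one Gaussian noise value per row of `dataframe` (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    return rng.normal(loc=0.0, scale=scale, size=len(dataframe))


# Made-up data: adding the noise array to a column keeps the column's length.
demo = pd.DataFrame({"sales": [10, 12, 9, 14]})
noisy = demo["sales"] + random_noise(demo, seed=42)
print(noisy.shape)
```

Adding a small amount of noise to lag features is a common way to discourage a tree model from memorising exact past sales values. Passing a fixed `seed` keeps feature generation reproducible between runs.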
import numpy as np
import pandas as pd

# Load data
train = pd.read_csv('train.csv', parse_dates=['date'])
test = pd.read_csv('test.csv', parse_dates=['date'])

def create_time_features(df):
    df['month'] = df.date.dt.month
    df['day_of_month'] = df.date.dt.day
    df['day_of_year'] = df.date.dt.dayofyear
    # Series.dt.weekofyear was removed in pandas 2.0; isocalendar() replaces it
    df['week_of_year'] = df.date.dt.isocalendar().week.astype(int)
    df['day_of_week'] = df.date.dt.dayofweek  # Monday=0 ... Sunday=6
    df['year'] = df.date.dt.year
    # weekday // 4 is 1 for Friday through Sunday, so this flag includes Friday
    df["is_wknd"] = df.date.dt.weekday // 4
    df['is_month_start'] = df.date.dt.is_month_start.astype(int)
    df['is_month_end'] = df.date.dt.is_month_end.astype(int)
    return df

def lag_features(dataframe, lags):
    """
    Creates lagged features with random noise.
    Args:
        dataframe (pd.DataFrame): Input dataframe
        lags (list): List of lag periods
    """
    for lag in lags:
        # random_noise(dataframe) must return one noise value per row of dataframe
        dataframe['sales_lag_' + str(lag)] = dataframe.groupby(["store", "item"])['sales'].transform(
            lambda x: x.shift(lag)) + random_noise(dataframe)
    return dataframe

def roll_mean_features(dataframe, windows):
    """
    Creates rolling mean features with triangular window.
    Args:
        dataframe (pd.DataFrame): Input dataframe
        windows (list): List of window sizes (each must be >= 10, the min_periods value)
    """
    for window in windows:
        # shift(1) keeps the current row's sales out of its own rolling mean;
        # win_type="triang" requires SciPy
        dataframe['sales_roll_mean_' + str(window)] = dataframe.groupby(["store", "item"])['sales'].transform(
            lambda x: x.shift(1).rolling(window=window, min_periods=10, win_type="triang").mean())
    return dataframe

def smape(preds, target):
    """
    Calculates Symmetric Mean Absolute Percentage Error.
    Args:
        preds: Model predictions (NumPy array)
        target: Actual values (NumPy array)
    Returns:
        float: SMAPE score in percent (0 to 200)
    """
    # n counts all pairs, so pairs where both values are 0 contribute zero error
    n = len(preds)
    # Drop pairs where both values are 0 to avoid dividing 0 by 0
    masked_arr = ~((preds == 0) & (target == 0))
    preds, target = preds[masked_arr], target[masked_arr]
    num = np.abs(preds - target)
    denom = np.abs(preds) + np.abs(target)
    smape_val = (200 * np.sum(num / denom)) / n
    return smape_val

lgb_params = {
    'metric': {'mae'},
    'num_leaves': 10,
    'learning_rate': 0.02,
    'feature_fraction': 0.8,
    'max_depth': 5,
    'verbose': 0,
    'num_boost_round': 2000,
    'early_stopping_rounds': 200,
    'nthread': -1
}

def plot_store_forecast(forecast_df, store_id, item_id):
    """
    Plots forecasted sales for specific store and item.
    Args:
        forecast_df: DataFrame with forecasts
        store_id: Store identifier
        item_id: Item identifier
    """
    forecast_df[
        (forecast_df.store == store_id) &
        (forecast_df.item == item_id)
    ].set_index("date").sales.plot(
        figsize=(20, 9),
        legend=True,
        label=f"Store {store_id} Item {item_id} Forecast"
    )

- Fork repository
- Create feature branch
- Implement changes
- Add tests
- Submit pull request
- Follow PEP 8
- Document all functions
- Maintain clean notebook outputs
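As an illustration of the "Add tests" step, a test for the `smape` metric might look like the sketch below. It repeats the `smape` definition from this README so the block runs on its own. The test function names and the sample values are made up for illustration, and the plain `assert` style is only one option (it also works under a runner such as pytest).

```python
import numpy as np


def smape(preds, target):
    """SMAPE as defined earlier in this README (copied so the example is self-contained)."""
    n = len(preds)
    masked_arr = ~((preds == 0) & (target == 0))
    preds, target = preds[masked_arr], target[masked_arr]
    num = np.abs(preds - target)
    denom = np.abs(preds) + np.abs(target)
    return (200 * np.sum(num / denom)) / n


def test_smape_perfect_forecast_is_zero():
    values = np.array([5.0, 10.0, 15.0])
    assert smape(values, values) == 0.0


def test_smape_known_value():
    # Pair 1: exact match, error 0.
    # Pair 2: both zero, masked out but still counted in n.
    # Pair 3: |50 - 100| / (50 + 100) = 1/3.
    preds = np.array([100.0, 0.0, 50.0])
    target = np.array([100.0, 0.0, 100.0])
    expected = 200 * (1 / 3) / 3
    assert abs(smape(preds, target) - expected) < 1e-9


test_smape_perfect_forecast_is_zero()
test_smape_known_value()
```

The second test pins down the zero-handling convention: a pair where both forecast and actual are 0 adds nothing to the numerator but still counts toward `n`.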
This project is licensed under the MIT License.