python -m venv .venv ..venv\Scripts\Activate.ps1
RetailSense360 is a retail analytics project that combines data preprocessing, customer segmentation (RFM), market basket analysis, customer lifetime value (LTV) modeling, SQL business analysis, and a Power BI dashboard.
- Clean and standardize retail customer shopping data.
- Build customer segments using RFM and clustering.
- Discover product affinity rules for cross-sell opportunities.
- Train an LTV model for customer value estimation.
- Enable SQL-driven business analysis on PostgreSQL.
- Visualize KPIs and insights in Power BI.
RetailSense3601/
|- data/
| |- raw/
| | |- customer_shopping_behavior.csv
| |- processed/
| | |- cleaned_data.csv
| | |- rfm_output.csv
|- src/
| |- data_preprocessing.py
| |- db_connection.py
| |- rfm_analysis.py
| |- market_basket.py
| |- ltv_model.py
|- notebooks/
| |- 01_EDA.ipynb
| |- 02_RFM_Analysis.ipynb
| |- 03_Market_Basket.ipynb
| |- 04_LTV_Prediction.ipynb
|- models/
| |- ltv_model.pkl
|- reports/
| |- Problem Overview.pdf
| |- sqlquries.sql
|- Dashboard/
| |- Retail Sense 360 Dashboard.pbix
|- .env
|- .gitignore
|- README.md
- Python (pandas, scikit-learn, SQLAlchemy, joblib, python-dotenv)
- PostgreSQL (data storage and SQL analytics)
- Jupyter Notebook (EDA and analysis workflow)
- Power BI (interactive dashboarding)
- MLXtend (Apriori and association rules for market basket analysis)
- Create and activate virtual environment:
python -m venv .venv
.\.venv\Scripts\Activate.ps1- Install dependencies:
pip install pandas scikit-learn joblib sqlalchemy python-dotenv psycopg2-binary mlxtend jupyter- Configure environment variables in
.env:
DB_USER=your_user
DB_PASSWORD=your_password
DB_HOST=localhost
DB_PORT=5432
DB_NAME=your_database- Run preprocessing on raw data to generate clean features (see
src/data_preprocessing.pyand notebooks). - Perform EDA and analysis through notebooks in
notebooks/. - Generate RFM and segmentation using
src/rfm_analysis.py. - Run association-rule mining using
src/market_basket.py. - Train LTV model using
src/ltv_model.py(model saved asmodels/ltv_model.pkl). - Upload processed CSV to PostgreSQL using:
python src/db_connection.py- Execute business SQL queries from
reports/sqlquries.sql. - Open
Dashboard/Retail Sense 360 Dashboard.pbixin Power BI Desktop for reporting.
data_preprocessing.py: Cleans nulls/duplicates, normalizes columns, engineers frequency and monetary features.rfm_analysis.py: Computes customer-level RFM metrics and applies KMeans clustering.market_basket.py: Builds Apriori frequent itemsets and association rules with robust empty-case handling.ltv_model.py: Trains RandomForestRegressor for LTV prediction and exports trained model.db_connection.py: Loads DB credentials from.env, creates SQLAlchemy engine, and uploads processed CSV to PostgreSQL.
The SQL script includes analysis for:
- Revenue by gender, category, location, payment method, and season
- Subscriber vs non-subscriber spend comparison
- Discount and promo-code behavior
- Top products by rating and discount usage
- Repeat buyers and customer segmentation logic
- Window functions (running totals, ranking, lag analysis)
Dashboard/Retail Sense 360 Dashboard.pbix contains the Power BI dashboard layer to present key business and customer insights derived from processed data and SQL analysis.
- Keep raw data immutable under
data/raw/. - Regenerate
data/processed/cleaned_data.csvafter preprocessing changes. - Re-train
models/ltv_model.pklwhen feature logic or source data changes. - Consider adding a
requirements.txtfile for reproducible environment setup. - Consider adding a single
main.pypipeline entrypoint for end-to-end execution.