This project focuses on segmenting customers based on their purchasing behavior using RFM analysis and K-Means clustering. The goal is to identify the most profitable products and the characteristics of the most loyal customers. The insights derived will help the marketing team optimize targeting strategies, enhance product positioning, and maximize profitability.
- Identify the top 3 most profitable products based on revenue.
- Analyze customer loyalty based on purchasing behavior.
- Understand lifestage and spending patterns of the most loyal customers.
- Provide data-driven recommendations for marketing and sales strategies.
- Top Selling Products:
  - Dorito Corn Chip Supreme
  - Smiths Crinkle Chips (Original)
  - Smiths Crinkle Chips (Salt & Vinegar)
- Most Loyal Customers:
  - Older Singles/Couples & Older Families
  - Primarily in the Mainstream & Budget segments
- Spending Patterns:
  - Older families in the budget segment are the highest spenders.
  - Younger singles spend the least.
- Implement personalized marketing campaigns for high-value customer segments.
- Introduce bundling strategies for best-selling products.
- Expand customer loyalty programs to encourage repeat purchases.
purchase_behaviour.csv - Contains customer demographic details based on loyalty card usage.
transaction_data.csv - Records all customer transactions, including products purchased and spending details.
https://drive.google.com/drive/folders/1JLHEIQp95b6Jo3iiXGYfIdKUrs8uWJn1
- Converted date format from integer to datetime.
- Removed anomalies (negative/zero values in TOT_SALES & PROD_QTY).
- Dropped duplicates and standardized product names.
- Merged transaction data with customer demographics.
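
The cleaning steps above might look roughly like the following sketch. Only `TOT_SALES`, `PROD_QTY`, `LIFESTAGE`, and `PREMIUM_CUSTOMER` are named in this README; the columns `DATE`, `PROD_NAME`, and `LYLTY_CARD_NBR`, the helper name `clean_transactions`, and the assumption that integer dates are Excel-style serial day numbers (origin 1899-12-30) are illustrative and should be checked against the actual files.

```python
import pandas as pd


def clean_transactions(tx: pd.DataFrame, customers: pd.DataFrame) -> pd.DataFrame:
    """Clean raw transactions and attach customer demographics (hypothetical helper)."""
    tx = tx.copy()
    # Assumption: DATE holds Excel-style serial day numbers.
    tx["DATE"] = pd.to_datetime(tx["DATE"], unit="D", origin="1899-12-30")
    # Standardize product names: trim and collapse repeated whitespace.
    tx["PROD_NAME"] = (
        tx["PROD_NAME"].str.strip().str.replace(r"\s+", " ", regex=True)
    )
    # Remove anomalies: rows with zero or negative sales or quantity.
    tx = tx[(tx["TOT_SALES"] > 0) & (tx["PROD_QTY"] > 0)]
    tx = tx.drop_duplicates()
    # Inner join keeps only transactions that match a known loyalty card.
    return tx.merge(customers, on="LYLTY_CARD_NBR", how="inner")
```

In the notebook this would typically be called with the two frames loaded via `pd.read_csv("transaction_data.csv")` and `pd.read_csv("purchase_behaviour.csv")`.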
- Identified top-selling and most profitable products.
- Analyzed customer spending patterns by lifestage & premium category.
- Visualized transaction trends using bar charts and heatmaps.
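
As a rough illustration of the exploratory steps above, the sketch below ranks products by revenue and builds the table behind a LIFESTAGE by PREMIUM_CUSTOMER heatmap. The `PROD_NAME` column and both function names are assumptions, not taken from the notebook.

```python
import pandas as pd


def top_products_by_revenue(df: pd.DataFrame, n: int = 3) -> pd.Series:
    """Total TOT_SALES per product, largest first."""
    return df.groupby("PROD_NAME")["TOT_SALES"].sum().nlargest(n)


def spend_by_segment(df: pd.DataFrame) -> pd.DataFrame:
    """Total spend per LIFESTAGE (rows) and PREMIUM_CUSTOMER (columns)."""
    return df.pivot_table(
        index="LIFESTAGE",
        columns="PREMIUM_CUSTOMER",
        values="TOT_SALES",
        aggfunc="sum",
        fill_value=0,  # segments with no sales show as 0 instead of NaN
    )
```

The table returned by `spend_by_segment` can be passed directly to `seaborn.heatmap(table, annot=True, fmt=".0f")` to draw the heatmap.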
- Recency: Days since last purchase.
- Frequency: Total transactions per customer.
- Monetary: Total spending per customer.
- Identified top loyal customers based on RFM scores.
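
The three RFM definitions above can be computed with a single group-by. This is a minimal sketch under stated assumptions: `LYLTY_CARD_NBR`, `TXN_ID`, and `DATE` are assumed column names, `compute_rfm` is a hypothetical helper, and recency is measured from the day after the last transaction in the data, which may differ from the reference date used in the notebook.

```python
import pandas as pd


def compute_rfm(df: pd.DataFrame) -> pd.DataFrame:
    """Return one row per customer with Recency, Frequency, Monetary."""
    # Assumption: the reference date is one day after the latest transaction.
    snapshot = df["DATE"].max() + pd.Timedelta(days=1)
    return df.groupby("LYLTY_CARD_NBR").agg(
        # Days since the customer's most recent purchase.
        Recency=("DATE", lambda d: (snapshot - d.max()).days),
        # Number of distinct transactions.
        Frequency=("TXN_ID", "nunique"),
        # Total amount spent.
        Monetary=("TOT_SALES", "sum"),
    )
```

Sorting the result by low `Recency` and high `Frequency` and `Monetary` is one simple way to shortlist loyal customers.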
- Standardized RFM data for clustering.
- Determined optimal clusters using Elbow Method & Silhouette Score.
- Segmented customers into 3 behavior-based groups.
- Analyzed lifestage & premium category distribution in clusters.
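
The clustering steps above could be sketched as follows. The function names, the candidate range of `k`, and the random seed are illustrative assumptions; `X` stands for the RFM table (one row per customer, three numeric columns).

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler


def evaluate_k(X, k_values=range(2, 7), seed=42):
    """Return {k: (inertia, silhouette)} for each candidate cluster count."""
    X_scaled = StandardScaler().fit_transform(X)
    results = {}
    for k in k_values:
        model = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(X_scaled)
        # Inertia feeds the elbow plot; silhouette measures cluster separation.
        results[k] = (model.inertia_, silhouette_score(X_scaled, model.labels_))
    return results


def segment(X, n_clusters=3, seed=42):
    """Standardize the features and return one K-Means label per row."""
    X_scaled = StandardScaler().fit_transform(X)
    model = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    return model.fit_predict(X_scaled)
```

Plotting inertia against `k` gives the elbow curve, while the `k` with the highest silhouette score is a common second opinion. Standardizing first matters because Monetary values are usually on a much larger scale than Frequency.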
- Scatter Plot: Customer segments based on total transactions & spending
- 3D Plot: RFM-based customer segmentation
- Bar Charts: Spending analysis per cluster
- Heatmap: Spending patterns by LIFESTAGE & PREMIUM_CUSTOMER
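
For the 3D plot listed above, a minimal matplotlib sketch might look like this. The function name, colormap, and axis labels are assumptions rather than the notebook's actual styling.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend; drop this line inside a notebook
import matplotlib.pyplot as plt


def plot_rfm_3d(recency, frequency, monetary, labels):
    """Scatter customers in RFM space, colored by cluster label."""
    fig = plt.figure(figsize=(7, 6))
    ax = fig.add_subplot(projection="3d")
    points = ax.scatter(recency, frequency, monetary, c=labels, cmap="viridis", s=15)
    ax.set_xlabel("Recency (days)")
    ax.set_ylabel("Frequency")
    ax.set_zlabel("Monetary")
    fig.colorbar(points, ax=ax, label="Cluster")
    return fig
```

Calling `plot_rfm_3d(...).savefig("rfm_clusters.png")` writes the figure to disk; the file name is only an example.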
- Programming Language: Python 🐍
- Libraries Used:
  - pandas - Data manipulation
  - numpy - Numerical computing
  - matplotlib & seaborn - Data visualization
  - scikit-learn - Machine learning (K-Means, StandardScaler)
To execute the analysis, open the Jupyter Notebook and run all cells:
jupyter notebook customer_segmentation.ipynb

- Data Preprocessing: Load and clean customer transaction data.
- Feature Engineering: Compute RFM (Recency, Frequency, Monetary) values.
- Clustering: Apply K-Means clustering to segment customers.
- Visualization: Generate 2D & 3D cluster plots.
- Analysis: Interpret cluster characteristics and spending behavior.
- Cluster 0 → Young Singles/Couples (Low Spending, Low Transactions)
- Cluster 1 → Older Families (High Spending, High Transactions)
- Cluster 2 → Older Singles/Couples (Moderate Spending, Moderate Transactions)
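
Cluster descriptions like the ones above are usually read off a per-cluster summary. The sketch below assumes a customer-level table with columns `Cluster`, `Recency`, `Frequency`, `Monetary`, and `LIFESTAGE`; these names and both helpers are hypothetical.

```python
import pandas as pd


def profile_clusters(rfm: pd.DataFrame) -> pd.DataFrame:
    """Mean RFM values and customer count per cluster."""
    profile = rfm.groupby("Cluster")[["Recency", "Frequency", "Monetary"]].mean()
    profile["Customers"] = rfm.groupby("Cluster").size()
    return profile


def lifestage_mix(rfm: pd.DataFrame) -> pd.DataFrame:
    """Share of each LIFESTAGE within each cluster (each row sums to 1)."""
    return pd.crosstab(rfm["Cluster"], rfm["LIFESTAGE"], normalize="index")
```

Note that K-Means label numbers are arbitrary, so the mapping from cluster ID to description can change between runs unless the random seed is fixed.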
- 2D Scatter Plot: Transaction Frequency vs. Spending
- 3D RFM Segmentation: Recency, Frequency, Monetary
- Heatmap: Spending by LIFESTAGE & PREMIUM_CUSTOMER
- Adjust the number of clusters in K-Means (n_clusters=3).
- Try different clustering techniques (Hierarchical, DBSCAN).
- Tune feature scaling and analyze the impact.
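
To experiment with the alternatives suggested above, something like the following sketch could replace the K-Means step. The function name and the `eps` and `min_samples` defaults are placeholder assumptions that would need tuning on the real RFM data.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering, DBSCAN
from sklearn.preprocessing import StandardScaler


def alternative_segments(X, n_clusters=3, eps=0.5, min_samples=5):
    """Return (hierarchical_labels, dbscan_labels) for standardized features."""
    X_scaled = StandardScaler().fit_transform(X)
    # Ward linkage merges the pair of clusters that least increases variance.
    hierarchical = AgglomerativeClustering(
        n_clusters=n_clusters, linkage="ward"
    ).fit_predict(X_scaled)
    # DBSCAN infers the number of clusters itself; label -1 marks noise points.
    dbscan = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(X_scaled)
    return hierarchical, dbscan
```

Unlike K-Means, DBSCAN does not take a cluster count, so comparing its output with the 3-cluster solution is a quick check on whether the segments reflect real density structure.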






