This project uses unsupervised machine learning to identify distinct customer segments within a retail dataset. Segmenting customers along several dimensions (age, annual income, and spending habits) gives marketing teams actionable insights for designing targeted campaigns.
- Unsupervised Learning: Discovering hidden structures in data without the need for pre-defined labels.
- Clustering Fundamentals & K-Means: A deep dive into how the K-Means algorithm groups similar data points by minimizing intra-cluster variance.
- The Elbow Method: A critical technique used to programmatically determine the optimal number of clusters (k) by analyzing the Within-Cluster Sum of Squares (WCSS).
- Multi-dimensional EDA: Using 2D and 3D visualizations to explore complex relationships between features.
- Hierarchical Clustering: Implementing an alternative clustering strategy and using Dendrograms to validate the optimal number of segments.
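As a rough illustration of the Elbow Method described in the list above, the sketch below fits K-Means for several values of k and records the WCSS, which scikit-learn exposes as `inertia_`. The data is synthetic (three artificial blobs, not the project's dataset), and picking the elbow via the largest second difference of the WCSS curve is one simple heuristic assumed here, not necessarily the rule this project uses.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic stand-in data: three well-separated 2D blobs (NOT the project's dataset).
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 10.0], [0.0, 10.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(50, 2)) for c in centers])

# Fit K-Means for k = 1..8 and collect the WCSS (inertia_) of each fit.
ks = list(range(1, 9))
wcss = [
    KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_
    for k in ks
]

# Assumed heuristic: the elbow is where the curve bends most sharply,
# i.e. where the second difference of WCSS is largest.
second_diff = np.diff(wcss, n=2)          # entry i corresponds to ks[i + 1]
elbow_k = ks[int(np.argmax(second_diff)) + 1]

for k, w in zip(ks, wcss):
    print(f"k={k}: WCSS={w:.1f}")
print("Suggested k:", elbow_k)
```

In practice the WCSS curve is usually also plotted against k and inspected by eye, since real data rarely has an elbow as clean as these blobs.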
- Analyzed distributions of Age, Annual Income, and Spending Scores.
- Utilized 2D and 3D scatter plots to visualize natural groupings and correlations before modeling.
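The exploratory steps above could look roughly like the following. This is a sketch under assumptions: the column names (`Age`, `Annual Income`, `Spending Score`) and the randomly generated values are placeholders, and plain Matplotlib is used for the 3D view here in place of an interactive Plotly Express figure.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for scripts; remove for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Placeholder data with assumed column names (NOT the project's dataset).
rng = np.random.default_rng(1)
df = pd.DataFrame({
    "Age": rng.integers(18, 70, size=200),
    "Annual Income": rng.normal(60, 25, size=200).clip(15, 140),
    "Spending Score": rng.integers(1, 101, size=200),
})

fig = plt.figure(figsize=(12, 4))

# 1) Distribution of a single feature.
ax1 = fig.add_subplot(1, 3, 1)
ax1.hist(df["Age"], bins=15)
ax1.set(title="Age distribution", xlabel="Age", ylabel="Customers")

# 2) 2D scatter of two features to look for natural groupings.
ax2 = fig.add_subplot(1, 3, 2)
ax2.scatter(df["Annual Income"], df["Spending Score"], s=12)
ax2.set(title="Income vs. spending", xlabel="Annual Income", ylabel="Spending Score")

# 3) 3D scatter of all three features.
ax3 = fig.add_subplot(1, 3, 3, projection="3d")
ax3.scatter(df["Age"], df["Annual Income"], df["Spending Score"], s=12)
ax3.set_xlabel("Age")
ax3.set_ylabel("Annual Income")
ax3.set_zlabel("Spending Score")

# Pairwise correlations as a numeric complement to the plots.
corr = df.corr()
print(corr.round(2))
```

Call `fig.savefig(...)` or switch to an interactive backend and `plt.show()` to view the figure.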
Recognizing that segmentation is not a one-size-fits-all process, this project develops two distinct models:
- Income-based Model: Segments customers based on Annual Income vs. Spending Score.
- Age-based Model: Segments customers based on Age vs. Spending Score.
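A minimal sketch of fitting the two models side by side is shown below. The helper `fit_segments`, the column names, the random data, and the cluster counts (5 and 4) are all illustrative assumptions; in the project the counts would come from the Elbow Method rather than being hard-coded.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

# Placeholder data with assumed column names (NOT the project's dataset).
rng = np.random.default_rng(2)
df = pd.DataFrame({
    "Age": rng.integers(18, 70, size=200),
    "Annual Income": rng.integers(15, 140, size=200),
    "Spending Score": rng.integers(1, 101, size=200),
})

def fit_segments(frame, features, k):
    """Hypothetical helper: fit K-Means on `features` and return one label per row."""
    model = KMeans(n_clusters=k, n_init=10, random_state=0)
    return model.fit_predict(frame[features])

# Income-based model: Annual Income vs. Spending Score (k=5 is an assumed value).
df["income_segment"] = fit_segments(df, ["Annual Income", "Spending Score"], k=5)

# Age-based model: Age vs. Spending Score (k=4 is an assumed value).
df["age_segment"] = fit_segments(df, ["Age", "Spending Score"], k=4)

# Cross-tabulate to see how the two segmentations overlap.
overlap = pd.crosstab(df["income_segment"], df["age_segment"])
print(overlap)
```

The cross-tabulation is one way to see that the two models slice the same customers differently, which is the point of keeping them separate.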
- Applied the Elbow Method to both models, choosing each cluster count at the point where the WCSS stops dropping sharply.
- Introduced Hierarchical Clustering as a secondary validation tool, using a dendrogram to cross-check the number of segments suggested by K-Means.
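The hierarchical cross-check can be sketched as follows. This example uses SciPy's `linkage`, `dendrogram`, and `fcluster` on synthetic blobs; Ward linkage and the "largest gap between consecutive merge heights" rule for reading a cluster count off the dendrogram are assumptions made for illustration, not choices confirmed by the project.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, fcluster, linkage

# Synthetic stand-in data: three well-separated 2D blobs (NOT the project's dataset).
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 10.0], [0.0, 10.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(50, 2)) for c in centers])

# Build the merge tree; Ward linkage is an assumed choice here.
Z = linkage(X, method="ward")

# Assumed heuristic: cut the tree at the largest jump in merge height.
# After merge i (0-based) has happened, len(X) - (i + 1) clusters remain.
merge_heights = Z[:, 2]
gaps = np.diff(merge_heights)
suggested_k = len(X) - (int(np.argmax(gaps)) + 1)

# Flat cluster labels for the suggested number of clusters.
labels = fcluster(Z, t=suggested_k, criterion="maxclust")

# Dendrogram structure only; drop no_plot=True (with Matplotlib installed) to draw it.
tree = dendrogram(Z, no_plot=True, truncate_mode="lastp", p=12)

print("Suggested number of clusters:", suggested_k)
```

If this count agrees with the elbow found for K-Means, that is some evidence the segmentation is not an artifact of one algorithm.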
Translated abstract clusters into quantitative personas (e.g., "Target Customers," "Sensible Spenders," "Careless Consumers"), giving marketing teams concrete segments to target.
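One possible way to turn cluster labels into personas is to profile each cluster by its mean feature values and apply simple naming rules. Everything specific in this sketch is hypothetical: the six example rows, the `name_persona` helper, the cut-off values, and the mapping of each persona name to an income/spending pattern are assumptions for illustration, not the definitions used in the project.

```python
import pandas as pd

# Tiny hand-made example with pre-assigned segment labels (illustrative only).
df = pd.DataFrame({
    "segment":        [0, 0, 1, 1, 2, 2],
    "Annual Income":  [85, 95, 80, 100, 20, 30],
    "Spending Score": [80, 90, 10, 20, 75, 85],
})

# Profile each segment by its average income and spending score.
profile = df.groupby("segment")[["Annual Income", "Spending Score"]].mean()

def name_persona(row, income_cut=60.0, score_cut=50.0):
    """Hypothetical rule: name a segment from its mean income and spending."""
    high_income = row["Annual Income"] >= income_cut
    high_spend = row["Spending Score"] >= score_cut
    if high_income and high_spend:
        return "Target Customers"
    if high_income:
        return "Sensible Spenders"
    if high_spend:
        return "Careless Consumers"
    return "Other"

profile["persona"] = profile.apply(name_persona, axis=1)
print(profile)
```

Keeping the cluster means next to each name ties every persona to numbers that can be rechecked when the model is refit.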
- Python
- Pandas & NumPy for data manipulation.
- Scikit-Learn for K-Means and Hierarchical Clustering algorithms.
- Seaborn & Matplotlib for statistical data visualization.
- Plotly Express for interactive 3D cluster exploration.
This project demonstrates how different clustering approaches uncover different facets of customer behavior. By combining K-Means and Hierarchical methods, we achieve a more nuanced understanding of the customer base, moving beyond simple demographics to behavior-based insights.