This project implements a customer segmentation analysis using K-Means clustering. It groups customers based on their behavioral and financial attributes (Age, Annual Income, and Spending Score) to help identify distinct market segments.
- Synthetic Data Generation: Automatically generates a realistic dataset if none exists.
- Data Preprocessing: Handles feature scaling using `StandardScaler`.
- Optimal Cluster Determination:
  - Elbow Method: Visualization to determine the optimal $K$.
  - Silhouette Score: Quantitative metric to validate cluster separation.
- K-Means Clustering: Groups customers into optimized segments.
- Visualization:
  - 2D Scatter Plots (Income vs. Spending Score, etc.)
  - 3D Scatter Plots (Age vs. Income vs. Spending Score)
- Insight Extraction: Exports segmented data to CSV and prints cluster profiles.
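The scaling, cluster-selection, and clustering steps listed above can be sketched roughly as follows. This is a minimal illustration using scikit-learn, not the code in `src/`. The synthetic centers, the range of $K$ values tried, and the rule of picking the $K$ with the highest silhouette score are all assumptions made for the example.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Hypothetical synthetic data: three columns standing in for
# Age, Annual Income, and Spending Score (values are illustrative only).
rng = np.random.default_rng(42)
centers = np.array([[25, 30, 80], [45, 90, 20], [35, 60, 50], [60, 40, 30]])
X = np.vstack([c + rng.normal(scale=[4, 6, 6], size=(50, 3)) for c in centers])

# Standardize so no single feature dominates the Euclidean distances.
X_scaled = StandardScaler().fit_transform(X)

inertias, silhouettes = {}, {}
for k in range(2, 9):
    model = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X_scaled)
    inertias[k] = model.inertia_  # plotted against k, this gives the elbow curve
    silhouettes[k] = silhouette_score(X_scaled, model.labels_)

# Assumed selection rule: take the k with the highest silhouette score.
best_k = max(silhouettes, key=silhouettes.get)
labels = KMeans(n_clusters=best_k, n_init=10, random_state=42).fit_predict(X_scaled)
print(best_k, np.bincount(labels))
```

The elbow curve is usually read by eye, so pairing it with the silhouette score gives a number that can be compared across values of $K$ when the bend in the curve is ambiguous.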
Customer_Segmentation/
├── data/
│ ├── customer_data.csv # Input dataset (generated)
│ └── segmented_customers.csv # Output with Cluster labels
├── src/
│ ├── data_loader.py # Data generation and loading
│ ├── preprocessing.py # Feature scaling
│ ├── clustering.py # K-Means logic
│ └── visualization.py # Plotting functions
├── main.py # Entry point script
├── requirements.txt # Python dependencies
└── README.md # Project documentation
- Clone the repository (if applicable) or navigate to the project folder.
- Install dependencies: `pip install -r requirements.txt`
Run the main analysis script: `python main.py`

- The script will generate `data/customer_data.csv` if it doesn't exist.
- It will display the Elbow Method and Silhouette Score plots.
  - Note: You must close these plot windows for the script to continue.
- It will automatically select the optimal number of clusters (typically between 3 and 6).
- It will perform clustering and display 2D and 3D visualizations.
- Finally, it saves the results to `data/segmented_customers.csv` and prints a summary of each cluster's characteristics.
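As a rough idea of what the final export and cluster summary might look like, here is a small pandas sketch. The column names, the sample rows, and the temporary output path are assumptions for illustration. The project's actual schema and summary format may differ.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical rows; column names are assumptions, not the project's schema.
df = pd.DataFrame({
    "Age": [23, 25, 47, 52, 34, 36],
    "Annual Income": [30, 34, 88, 92, 60, 58],
    "Spending Score": [81, 77, 18, 22, 50, 55],
    "Cluster": [0, 0, 1, 1, 2, 2],
})

# One row per cluster: the mean of each feature plus the cluster size.
profile = df.groupby("Cluster").mean().round(1)
profile["Count"] = df.groupby("Cluster").size()
print(profile)

# Written to a temporary directory here so the sketch does not touch data/.
# index=False keeps the pandas row index out of the file.
out_path = Path(tempfile.gettempdir()) / "segmented_customers.csv"
df.to_csv(out_path, index=False)
```

Per-cluster means are a simple way to describe segments, for example a young, low-income, high-spending group versus an older, high-income, low-spending one.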



