Customer Segmentation using K-Means Clustering

This project implements a customer segmentation analysis using K-Means clustering. It groups customers by demographic, financial, and behavioral attributes (Age, Annual Income, and Spending Score) to identify distinct market segments.

Features

Synthetic Data Generation: Automatically generates a realistic dataset if none exists.
Data Preprocessing: Handles feature scaling using StandardScaler.
Optimal Cluster Determination:
- Elbow Method: Visualization to determine the optimal $K$.
- Silhouette Score: Quantitative metric to validate cluster separation.
K-Means Clustering: Groups customers into optimized segments.
Visualization:
- 2D Scatter Plots (Income vs. Spending Score, etc.)
- 3D Scatter Plots (Age vs. Income vs. Spending Score)
Insight Extraction: Exports segmented data to CSV and prints cluster profiles.
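
The scaling and cluster-selection steps listed above can be sketched roughly as follows. This is a minimal illustration and not the code in src/: the helper names make_synthetic_customers and evaluate_k, the group centers, and the candidate range of K (2 to 8) are all assumptions made for the example. It standardizes three features with StandardScaler, records inertia (the quantity plotted in an elbow chart) and the silhouette score for each K, and picks the K with the highest silhouette.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler


def make_synthetic_customers(n_per_group=60, seed=42):
    """Hypothetical generator: columns are Age, Annual Income, Spending Score."""
    rng = np.random.default_rng(seed)
    # Illustrative group centers, chosen only for this sketch.
    centers = np.array(
        [[25, 30, 80], [45, 85, 20], [35, 60, 50], [60, 40, 30]], dtype=float
    )
    groups = [
        rng.normal(loc=c, scale=[4, 6, 6], size=(n_per_group, 3)) for c in centers
    ]
    return np.vstack(groups)


def evaluate_k(X_scaled, k_values, seed=42):
    """Fit K-Means for each candidate K; return inertia and silhouette per K."""
    inertias, silhouettes = {}, {}
    for k in k_values:
        model = KMeans(n_clusters=k, n_init=10, random_state=seed)
        labels = model.fit_predict(X_scaled)
        inertias[k] = model.inertia_  # within-cluster sum of squares (elbow curve)
        silhouettes[k] = silhouette_score(X_scaled, labels)
    return inertias, silhouettes


X = make_synthetic_customers()
# Standardize so that no single feature dominates the distance computation.
X_scaled = StandardScaler().fit_transform(X)

inertias, silhouettes = evaluate_k(X_scaled, range(2, 9))
best_k = max(silhouettes, key=silhouettes.get)
print("K with the highest silhouette score:", best_k)
```

Choosing K by the maximum silhouette score is only one possible rule. The project may combine it with the elbow plot or use a different criterion.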

Project Structure

Customer_Segmentation/
├── data/
│   ├── customer_data.csv       # Input dataset (generated)
│   └── segmented_customers.csv # Output with Cluster labels
├── src/
│   ├── data_loader.py          # Data generation and loading
│   ├── preprocessing.py        # Feature scaling
│   ├── clustering.py           # K-Means logic
│   └── visualization.py        # Plotting functions
├── main.py                     # Entry point script
├── requirements.txt            # Python dependencies
└── README.md                   # Project documentation

Installation

Clone the repository (if applicable) or navigate to the project folder.
Install dependencies:
```
pip install -r requirements.txt
```

Usage

Run the main analysis script:

python main.py

What to Expect

The script will generate data/customer_data.csv if it doesn't exist.
It will display the Elbow Method and Silhouette Score plots.
- Note: You must close these plot windows for the script to continue.
It will automatically select the optimal number of clusters (typically between 3 and 6).
It will perform clustering and display 2D and 3D visualizations.
Finally, it saves the results to data/segmented_customers.csv and prints a summary of each cluster's characteristics.
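
As a rough idea of what a cluster summary might contain, the sketch below averages each feature per cluster and counts the members. The column names, the profile_clusters helper, and the four toy rows are assumptions for illustration only. The actual output format of main.py may differ.

```python
import pandas as pd


def profile_clusters(df, label_col="Cluster"):
    """Hypothetical helper: mean of every feature per cluster, plus cluster size."""
    features = [c for c in df.columns if c != label_col]
    profile = df.groupby(label_col)[features].mean()
    profile["Count"] = df.groupby(label_col).size()
    return profile


# Toy rows, invented purely to show the shape of the summary.
toy = pd.DataFrame(
    {
        "Age": [22, 25, 48, 52],
        "Annual Income": [30, 34, 90, 86],
        "Spending Score": [80, 76, 20, 24],
        "Cluster": [0, 0, 1, 1],
    }
)

profile = profile_clusters(toy)
print(profile)
```

A labeled frame like this can then be written out with DataFrame.to_csv(path, index=False), which matches the CSV export step described above.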

Technologies Used

Python 3.x
Pandas (Data manipulation)
Scikit-learn (Clustering and Preprocessing)
Matplotlib & Seaborn (Visualization)
NumPy (Numerical operations)
