Customer Segmentation using K-Means Clustering

This project segments customers with K-Means clustering. It groups customers by one demographic attribute (Age), one financial attribute (Annual Income), and one behavioral attribute (Spending Score) to identify distinct market segments.

Features

  • Synthetic Data Generation: Automatically generates a realistic dataset if none exists.
  • Data Preprocessing: Handles feature scaling using StandardScaler.
  • Optimal Cluster Determination:
    • Elbow Method: Visualization to determine the optimal $K$.
    • Silhouette Score: Quantitative metric to validate cluster separation.
  • K-Means Clustering: Groups customers into optimized segments.
  • Visualization:
    • 2D Scatter Plots (Income vs. Spending Score, etc.)
    • 3D Scatter Plots (Age vs. Income vs. Spending Score)
  • Insight Extraction: Exports segmented data to CSV and prints cluster profiles.
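
The cluster-selection features above can be sketched roughly as follows. This is a minimal illustration, not the code in `src/clustering.py`: the function name `choose_k`, the candidate range of 2 to 8, the toy data, and the rule of picking the $K$ with the highest silhouette score are all assumptions made for the example.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler


def choose_k(features, k_values=range(2, 9), random_state=42):
    """Scale the features, fit K-Means for each candidate K, and return
    (best_k, inertias, silhouettes). Hypothetical helper for illustration."""
    # Standardize so Age, Income, and Spending Score contribute equally.
    scaled = StandardScaler().fit_transform(features)
    inertias, silhouettes = {}, {}
    for k in k_values:
        model = KMeans(n_clusters=k, n_init=10, random_state=random_state).fit(scaled)
        inertias[k] = model.inertia_  # values plotted in an elbow chart
        silhouettes[k] = silhouette_score(scaled, model.labels_)
    # Assumed selection rule: the K with the highest silhouette score.
    best_k = max(silhouettes, key=silhouettes.get)
    return best_k, inertias, silhouettes


# Toy data (assumption): three well-separated groups of 50 customers each,
# with columns in the order Age, Annual Income, Spending Score.
rng = np.random.default_rng(0)
centers = np.array([[25, 30, 80], [45, 70, 40], [65, 110, 15]])
features = np.vstack([c + rng.normal(scale=3.0, size=(50, 3)) for c in centers])

best_k, inertias, silhouettes = choose_k(features)
print("selected K:", best_k)
```

Inertia always tends to fall as $K$ grows, which is why the elbow plot is read by eye, while the silhouette score gives a single number that can be compared across candidates.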

Project Structure

Customer_Segmentation/
├── data/
│   ├── customer_data.csv       # Input dataset (generated)
│   └── segmented_customers.csv # Output with Cluster labels
├── src/
│   ├── data_loader.py          # Data generation and loading
│   ├── preprocessing.py        # Feature scaling
│   ├── clustering.py           # K-Means logic
│   └── visualization.py        # Plotting functions
├── main.py                     # Entry point script
├── requirements.txt            # Python dependencies
└── README.md                   # Project documentation

Installation

  1. Clone the repository, or navigate to the project folder if you already have a copy.

  2. Install dependencies:

    pip install -r requirements.txt

Usage

Run the main analysis script:

python main.py

What to Expect

  1. The script will generate data/customer_data.csv if it doesn't exist.
  2. It will display the Elbow Method and Silhouette Score plots.
    • Note: You must close these plot windows for the script to continue.
  3. It will automatically select the optimal number of clusters (typically between 3 and 6).
  4. It will perform clustering and display 2D and 3D visualizations.
  5. Finally, it saves the results to data/segmented_customers.csv and prints a summary of each cluster's characteristics.
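
The non-plotting steps above (generate the data if it is missing, cluster, profile, save) might look something like the sketch below. The column names, the function names `load_or_generate` and `segment_and_profile`, the value ranges of the synthetic data, and the fixed choice of four clusters are assumptions for illustration; the actual `main.py` may differ. The example writes to a temporary directory so it does not touch `data/`.

```python
import tempfile
from pathlib import Path

import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

FEATURES = ["Age", "Annual Income", "Spending Score"]  # assumed column names


def load_or_generate(path, n_customers=200, seed=42):
    """Read the CSV if it exists; otherwise create synthetic data and save it."""
    path = Path(path)
    if path.exists():
        return pd.read_csv(path)
    rng = np.random.default_rng(seed)
    df = pd.DataFrame({
        "Age": rng.integers(18, 71, size=n_customers),
        "Annual Income": rng.normal(60, 20, size=n_customers).clip(15, 140).round(1),
        "Spending Score": rng.integers(1, 101, size=n_customers),
    })
    path.parent.mkdir(parents=True, exist_ok=True)
    df.to_csv(path, index=False)
    return df


def segment_and_profile(df, n_clusters, out_path, seed=42):
    """Cluster on scaled features, save labeled rows, return per-cluster means."""
    scaled = StandardScaler().fit_transform(df[FEATURES])
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit_predict(scaled)
    segmented = df.assign(Cluster=labels)
    # Profile each cluster in the original (unscaled) units.
    profile = segmented.groupby("Cluster")[FEATURES].mean().round(1)
    profile["Size"] = segmented["Cluster"].value_counts().sort_index()
    segmented.to_csv(out_path, index=False)
    return segmented, profile


with tempfile.TemporaryDirectory() as tmp:
    customers = load_or_generate(Path(tmp) / "customer_data.csv")
    segmented, profile = segment_and_profile(
        customers, n_clusters=4, out_path=Path(tmp) / "segmented_customers.csv"
    )
    print(profile)
```

Profiling on the unscaled columns keeps the summary readable (ages in years, scores on their original scale), even though the clustering itself runs on standardized values.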

Technologies Used

  • Python 3.x
  • Pandas (Data manipulation)
  • Scikit-learn (Clustering and Preprocessing)
  • Matplotlib & Seaborn (Visualization)
  • NumPy (Numerical operations)
