Skip to content

vipin-s27/E-Commerce-Sales-Analysis-Python-SQL

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

19 Commits
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

πŸ›’ E-Commerce Sales Insights Analysis using Python & SQL

This project delivers a complete end-to-end Sales Insights Analysis using a multi-table e-commerce dataset. It includes Python-based data loading, cleaning, preprocessing, MySQL data modeling, advanced SQL analytics, and visualization-driven insights

πŸ“Œ Overview

  • Multiple e-commerce datasets were loaded, cleaned, and preprocessed using Python.
  • Missing values, date formats, and duplicates were handled to prepare high-quality data.
  • Cleaned tables were uploaded into a MySQL database for structured analysis.
  • SQL queries were used to analyze sales trends, customer behavior, and seller performance.
  • Python visualizations helped present key insights clearly and effectively.

πŸ“Š Dataset Summary

The project uses 7 interconnected tables:

  • customers – customer_id, city, state
  • geolocation – latitude & longitude of zip codes
  • orders – timestamps and delivery details
  • order_items – product, seller, quantity, price
  • products – product attributes (size, weight, length, etc.)
  • payments – payment type, installments, price
  • sellers – seller location details

πŸ“‚ Dataset used: E-Commerce Dataset

πŸ’» Project Code: View Notebook

🧹 Data Cleaning & Preprocessing

The raw e-commerce dataset was cleaned and preprocessed using Python and SQL to prepare it for analysis. The following steps were performed:

  • Loaded Data: Imported 7 tables (customers, geolocation, orders, order_items, payments, products, sellers) and checked their structure.
  • Handled Missing Values: Filled numerical columns with medians and categorical columns with "unknown".
  • Removed Duplicates: Checked all tables and removed duplicate rows where necessary.
  • Converted Dates: Converted relevant columns in orders to datetime format.
  • Uploaded to MySQL: Cleaned tables were stored in a MySQL database for efficient querying and analysis.

πŸ“ˆ Key Sales Metrics & Visual Analysis

1️⃣ Number of Orders per Month in 2018

image

This bar chart shows the number of orders placed each month in 2018, highlighting peak and low sales periods.

2️⃣ Average Number of Products per Order by Customer City

image

This bar chart displays the average number of products per order for different customer cities, highlighting cities where customers tend to buy more items per order.

3️⃣ Percentage of Total Revenue Contributed by Each Product Category

image

This analysis calculates the contribution of each product category to the total revenue. The bar chart highlights the top 5 categories generating the most sales.

4️⃣ Cumulative monthly sales – 2017

Screenshot 2025-12-03 191004

This table shows the cumulative sales for each month in 2017, highlighting monthly sales growth trends and seasonal patterns throughout the year.

5️⃣ Top 3 Customers by Spending per Year

image

This section shows the top 3 customers who spent the most money in each year. It highlights the highest-value customers and helps identify key buyers over time.

πŸ› οΈ Tools & Technologies

  • Python: Pandas, NumPy, Matplotlib, Seaborn for data manipulation, analysis, and visualization
  • MySQL Server: For structured data storage, complex queries, and aggregation
  • Jupyter Notebook / VS Code: Development and analysis environment
  • SQLAlchemy / PyMySQL: Python-MySQL integration for uploading and querying datasets

πŸ”§ Project Workflow

1. Data Loading & Exploration

  • Imported 7 interconnected e-commerce tables using Pandas
  • Reviewed structure and summary statistics using .info(), .describe(), and .head()
  • Checked for missing values, duplicates, and inconsistent data

2. Data Cleaning & Preprocessing

  • Filled missing numeric values with median; categorical fields with "unknown"
  • Removed duplicate rows where necessary
  • Converted date columns to datetime format for analysis
  • Uploaded clean tables to MySQL for structured querying

3. SQL Analysis

  • Basic Metrics: Number of orders per year, unique customer cities, orders per state
  • Intermediate Analysis: Average products per order per city, revenue share by product category, top sellers
  • Advanced Analysis: Cumulative monthly sales, moving average of customer order value, year-over-year growth, top 3 customers by spending per year

4. Visualization

  • Bar charts for monthly orders, customer distribution, revenue contribution by category, and top sellers
  • Line charts / cumulative plots for monthly sales trends
  • Heatmaps or correlation charts for price vs purchase frequency

πŸ’‘ Key Insights from the Analysis

1. Sales Trends & Seasonality

  • Orders show clear peaks and dips over months in 2018
  • Cumulative monthly sales reveal growth trends and seasonal patterns

2. Customer Behavior

  • Certain cities have higher average products per order
  • Top 3 customers per year contribute significantly to total revenue

3. Product & Revenue Insights

  • Top product categories contribute the highest revenue
  • Product price and purchase frequency show moderate correlation

4. Payment Insights

  • A notable percentage of orders were paid in installments

πŸ“¬ Contact

Email: vipinsuryavanshi.vs@gmail.com

LinkedIn: Vipin Suryavanshi

GitHub: Vipin-s27

About

E-Commerce sales analysis using Python for data cleaning and SQL for generating insights, with clear visualizations.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors