This project delivers a complete end-to-end Sales Insights Analysis using a multi-table e-commerce dataset. It includes Python-based data loading, cleaning, preprocessing, MySQL data modeling, advanced SQL analytics, and visualization-driven insights
- Multiple e-commerce datasets were loaded, cleaned, and preprocessed using Python.
- Missing values, date formats, and duplicates were handled to prepare high-quality data.
- Cleaned tables were uploaded into a MySQL database for structured analysis.
- SQL queries were used to analyze sales trends, customer behavior, and seller performance.
- Python visualizations helped present key insights clearly and effectively.
The project uses 7 interconnected tables:
- customers β customer_id, city, state
- geolocation β latitude & longitude of zip codes
- orders β timestamps and delivery details
- order_items β product, seller, quantity, price
- products β product attributes (size, weight, length, etc.)
- payments β payment type, installments, price
- sellers β seller location details
π Dataset used: E-Commerce Dataset
π» Project Code: View Notebook
The raw e-commerce dataset was cleaned and preprocessed using Python and SQL to prepare it for analysis. The following steps were performed:
- Loaded Data: Imported 7 tables (customers, geolocation, orders, order_items, payments, products, sellers) and checked their structure.
- Handled Missing Values: Filled numerical columns with medians and categorical columns with "unknown".
- Removed Duplicates: Checked all tables and removed duplicate rows where necessary.
- Converted Dates: Converted relevant columns in orders to datetime format.
- Uploaded to MySQL: Cleaned tables were stored in a MySQL database for efficient querying and analysis.
This bar chart shows the number of orders placed each month in 2018, highlighting peak and low sales periods.
This bar chart displays the average number of products per order for different customer cities, highlighting cities where customers tend to buy more items per order.
This analysis calculates the contribution of each product category to the total revenue. The bar chart highlights the top 5 categories generating the most sales.
This table shows the cumulative sales for each month in 2017, highlighting monthly sales growth trends and seasonal patterns throughout the year.
This section shows the top 3 customers who spent the most money in each year. It highlights the highest-value customers and helps identify key buyers over time.
- Python: Pandas, NumPy, Matplotlib, Seaborn for data manipulation, analysis, and visualization
- MySQL Server: For structured data storage, complex queries, and aggregation
- Jupyter Notebook / VS Code: Development and analysis environment
- SQLAlchemy / PyMySQL: Python-MySQL integration for uploading and querying datasets
- Imported 7 interconnected e-commerce tables using Pandas
- Reviewed structure and summary statistics using .info(), .describe(), and .head()
- Checked for missing values, duplicates, and inconsistent data
- Filled missing numeric values with median; categorical fields with "unknown"
- Removed duplicate rows where necessary
- Converted date columns to datetime format for analysis
- Uploaded clean tables to MySQL for structured querying
- Basic Metrics: Number of orders per year, unique customer cities, orders per state
- Intermediate Analysis: Average products per order per city, revenue share by product category, top sellers
- Advanced Analysis: Cumulative monthly sales, moving average of customer order value, year-over-year growth, top 3 customers by spending per year
- Bar charts for monthly orders, customer distribution, revenue contribution by category, and top sellers
- Line charts / cumulative plots for monthly sales trends
- Heatmaps or correlation charts for price vs purchase frequency
- Orders show clear peaks and dips over months in 2018
- Cumulative monthly sales reveal growth trends and seasonal patterns
- Certain cities have higher average products per order
- Top 3 customers per year contribute significantly to total revenue
- Top product categories contribute the highest revenue
- Product price and purchase frequency show moderate correlation
- A notable percentage of orders were paid in installments
Email: vipinsuryavanshi.vs@gmail.com
LinkedIn: Vipin Suryavanshi
GitHub: Vipin-s27