This project involves an end-to-end data analysis of a retail sales dataset using Microsoft SQL Server (MSSQL). The goal is to explore the data, uncover key business insights, and answer real-world business questions through structured SQL queries.
The analysis covers:
- Sales trends across different time periods
- Customer behavior and top performing customers
- Category-wise and gender-wise sales performance
- Shift-based purchase patterns (Morning/Afternoon/Evening)
- Best performing months by average sales
Tool: Microsoft SQL Server (MSSQL/SSMS) Skills: Data Cleaning, EDA, Advanced SQL
- The project starts by creating the database
SQL_PROJECT1
CREATE DATABASE SQL_PROJECT1
USE SQL_PROJECT1- A table is created named
Retail_Salesto store the data of the sales. The table contains the column of transaction_ID, sales_date, sale_time, customer_ID, gender,age, category, quantity, price per unit, cogs, and total sales.
CREATE TABLE Retail_Sales(
Transaction_ID INT PRIMARY KEY,
Sales_Date DATE,
Sale_time TIME,
Customer_ID INT,
Gender VARCHAR(7),
Age INT,
Category VARCHAR(15),
Quantity INT,
Price_per_unit FLOAT,
Cogs FLOAT,
Total_Sales FLOAT
)
INSERT INTO Retail_Sales
SELECT *
FROM SQL_PROJECT1.dbo.RETAIL;- The column names, their data-type and description about them is given below:-
| Column | Data Type | Description |
|---|---|---|
| Transaction_ID | INT | Unique transaction ID (Primary Key) |
| Sales_Date | DATE | Date of transaction |
| Sale_Time | TIME | Time of transaction |
| Customer_ID | INT | Unique customer identifier |
| Gender | VARCHAR(7) | Customer gender |
| Age | INT | Customer age |
| Category | VARCHAR(15) | Product category |
| Quantity | INT | Units purchased |
| Price_per_unit | FLOAT | Price per unit |
| Cogs | FLOAT | Cost of goods sold |
| Total_Sales | FLOAT | Total sale amount |
- Checked NULL values across all 11 columns.
SELECT *
FROM Retail_sales
WHERE
Transaction_ID IS NULL
OR
Sales_date IS NULL
OR
Sale_Time IS NULL
OR
Customer_ID IS NULL
OR
Gender IS NULL
OR
Age IS NULL
OR
Category IS NULL
OR
Quantity IS NULL
OR
Price_per_unit IS NULL
OR
Cogs IS NULL
OR
Total_Sales IS NULL;- By checking the NULL values, it is found that there are only 13 rows where NULL values are present.
- Deleted incomplete/corrupted records using DELETE + IS NULL, it is done to ensure data-accuracy and integrity.
DELETE FROM Retail_Sales
WHERE
Transaction_ID IS NULL
OR
Sales_date IS NULL
OR
Sale_Time IS NULL
OR
Customer_ID IS NULL
OR
Gender IS NULL
OR
Age IS NULL
OR
Category IS NULL
OR
Quantity IS NULL
OR
Price_per_unit IS NULL
OR
Cogs IS NULL
OR
Total_Sales IS NULL;- Total Transactions: Counting total number of transactions recorded in the dataset
SELECT COUNT(*) AS [Total Sales]
FROM Retail_Sales;- Total Unique Customers: Counting total number of unique customers who made purchases
SELECT COUNT(DISTINCT(Customer_ID)) as [Total Customers]
FROM Retail_Sales;- Total Categories: Counting total number of distinct product categories available
SELECT COUNT(DISTINCT(Category)) AS [Distinct Category]
FROM Retail_Sales;- Names of the Categories: Retrieving the names of all available product categories
SELECT DISTINCT Category AS [Types of Categories]
FROM Retail_Sales;- So after the data-exploration of this dataset, it is found that there are total of 1987 sales were made, in which there are 155 numbers of unique customers are present.
- Also, it has also been found that there are total of 3 categories of purchases customers are making, and they are beauty, electronics and clothing.
Retrieved all transactions on 2022-11-05
SELECT*
FROM Retail_Sales
WHERE Sales_Date LIKE '2022-11-05';Filtered Clothing category with quantity ≥ 4 in November 2022
SELECT*
FROM Retail_Sales
WHERE
Category = 'Clothing'
AND
Quantity >= 4
AND
MONTH(Sales_Date) = 11
AND
YEAR(Sales_Date) = 2022;Aggregated total revenue and order count per category
SELECT
Category,
SUM(Total_Sales) AS [Total Sales],
COUNT(Total_Sales) AS [Total Purchases]
FROM Retail_Sales
GROUP BY Category
ORDER BY SUM(Total_Sales);Found average customer age purchasing Beauty products
SELECT ROUND(AVG(Age),2) AS [Average Age of Beauty Category]
FROM Retail_Sales
WHERE Category LIKE 'Beauty';Retrieved all transactions where Total Sales > 1000
SELECT*
FROM Retail_Sales
WHERE Total_Sales>1000;Counted total transactions grouped by category and gender
SELECT
Category,
Gender,
COUNT(*) AS [Total Transactions]
FROM Retail_Sales
GROUP BY
Category,
Gender
ORDER BY
Category,
COUNT(*);Used Window Functions (RANK + PARTITION BY YEAR) to find highest average sales month for each year
SELECT*
FROM (
SELECT
YEAR(Sales_Date) AS [Year],
MONTH(Sales_Date) AS [Month Number],
ROUND(AVG(Total_Sales),2) AS [Average Sale],
RANK() OVER(
PARTITION BY YEAR(Sales_Date)
ORDER BY AVG(Total_Sales) DESC
) AS [Rank]
FROM Retail_Sales
GROUP BY
YEAR(Sales_Date),
MONTH(Sales_Date)
) AS T1
WHERE RANK=1;Identified top 5 highest spending customers
SELECT TOP 5
Customer_ID,
SUM(Total_Sales) AS [Total Sales]
FROM Retail_Sales
GROUP BY Customer_ID
ORDER BY SUM(Total_Sales) DESC;Counted distinct customers per product category
SELECT
Category,
COUNT(DISTINCT(Customer_ID)) AS [Unique Customers]
FROM Retail_Sales
GROUP BY Category
ORDER BY COUNT(DISTINCT(Customer_ID));Classified transactions into Morning / Afternoon / Evening using CASE + DATEPART — solved using both Subquery and CTE
WITH ShiftCTE AS (
SELECT*,
CASE
WHEN DATEPART(HOUR, Sale_Time) < 12 THEN 'Morning'
WHEN DATEPART(HOUR, Sale_Time) BETWEEN 12 AND 17 THEN 'Afternoon'
ELSE 'Evening'
END AS [Shift]
FROM Retail_Sales
)
SELECT
Shift,
COUNT(*) AS [Total Purchases]
FROM ShiftCTE
GROUP BY Shift;- DDL: CREATE DATABASE, CREATE TABLE
- DML: INSERT, DELETE, SELECT
- Aggregate Functions: SUM, COUNT, AVG, ROUND
- Window Functions: RANK, PARTITION BY
- CASE Statements
- CTEs (Common Table Expressions)
- Subqueries
- DATEPART for time-based analysis
- Filtering: WHERE, AND, OR, IS NULL, LIKE
-
Best Performing Month: The highest average sales month was found for each year using Window Functions — helping identify seasonal trends.
-
Top Category by Revenue: The highest revenue was generated by the category of the Electronics. It was almost 35% of the total revenue.
-
Top 5 Customers: A small group of customers contribute significantly to overall revenue useful for loyalty programs
-
Gender & Category Insights: In each category the major purchasers were Male, while in the beauty category female gender has been the major buyers, which is also a quite knowing fact.
-
Average Customer Age (Beauty): Beauty category attracts customers of average age of 40 years.
-
High Value Transactions: Multiple transactions exceeded ₹1000 in total sales indicating premium purchasing behavior.
-
Shift Analysis: Evening shift recorded the highest number of purchases useful for staffing and promotions.
-
Unique Customers per Category: Each category attracts a distinct customer base with minimal overlap.
Sanjeevan Pal