This project performs a full-cycle analysis on the Brazilian Olist E-Commerce Dataset, focusing on sales trends, product performance, and customer segmentation.
The goal is to generate actionable business insights and present them in a professional, portfolio-ready format.
- Source: Brazilian E-Commerce Public Dataset by Olist
- Contains real, anonymized e-commerce transactions: orders, order items, products, payments, reviews, and customer information.
- Python
- Pandas, NumPy
- Matplotlib, Seaborn
- Scikit-learn
- Jupyter Notebook
- Data Loading – Load all relevant datasets (orders, items, customers, products, payments, reviews)
- Data Cleaning & Preprocessing – Handle missing values, convert timestamps to datetimes, and merge the tables on shared keys (e.g., order_id, customer_id, product_id)
- Feature Engineering – Create metrics like delivery time, monthly revenue, and average order value
- Exploratory Data Analysis (EDA) – Analyze revenue trends, order sizes, top product categories, delivery performance
- Customer Segmentation – Apply RFM scoring and KMeans clustering to identify Top, Loyal, and At-Risk customers
- Visualizations with Insights – Key plots with mini takeaways for business decisions
- Key Insights & Conclusion – Summary of actionable metrics and recommendations
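The feature-engineering step (delivery time, monthly revenue, average order value) can be sketched as below. This is a minimal illustration, not the notebook's code: `build_order_features` is a hypothetical helper, the toy rows are invented, and it assumes order value is the sum of item `price` (excluding `freight_value`); the notebook may define average order value differently, for example from `payment_value`.

```python
import pandas as pd


def build_order_features(orders: pd.DataFrame, items: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: one row per delivered order with item count,
    order value, delivery time in days and purchase month."""
    orders = orders.copy()
    # Timestamps arrive as strings in the CSVs; convert them to datetimes
    for col in ["order_purchase_timestamp", "order_delivered_customer_date"]:
        orders[col] = pd.to_datetime(orders[col])
    # Only delivered orders with a recorded delivery date have a delivery time
    delivered = orders[orders["order_status"] == "delivered"].dropna(
        subset=["order_delivered_customer_date"]
    )
    # The items table has one row per item; collapse it to one row per order
    per_order = (
        items.groupby("order_id")
        .agg(n_items=("order_item_id", "count"), order_value=("price", "sum"))
        .reset_index()
    )
    df = delivered.merge(per_order, on="order_id", how="inner")
    # Delivery time in (fractional) days from purchase to customer delivery
    df["delivery_days"] = (
        df["order_delivered_customer_date"] - df["order_purchase_timestamp"]
    ).dt.total_seconds() / 86400
    df["month"] = df["order_purchase_timestamp"].dt.to_period("M")
    return df


# Toy data using Olist column names. With the real files you would instead load e.g.
# orders = pd.read_csv("data/olist_orders_dataset.csv")
# items = pd.read_csv("data/olist_order_items_dataset.csv")
orders = pd.DataFrame({
    "order_id": ["o1", "o2", "o3"],
    "customer_id": ["c1", "c2", "c3"],
    "order_status": ["delivered", "delivered", "canceled"],
    "order_purchase_timestamp": ["2017-11-01 10:00:00", "2017-11-20 12:00:00", "2017-12-01 09:00:00"],
    "order_delivered_customer_date": ["2017-11-11 10:00:00", "2017-11-25 12:00:00", None],
})
items = pd.DataFrame({
    "order_id": ["o1", "o1", "o2", "o3"],
    "order_item_id": [1, 2, 1, 1],
    "price": [50.0, 30.0, 120.0, 10.0],
})

df = build_order_features(orders, items)
monthly_revenue = df.groupby("month")["order_value"].sum()  # revenue per purchase month
aov = df["order_value"].mean()                               # average order value
median_delivery = df["delivery_days"].median()               # median delivery time in days
print(monthly_revenue, aov, median_delivery)
```

Restricting to delivered orders keeps canceled or in-transit orders from producing missing delivery times; whether the notebook also excludes them from revenue is not stated here.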
- Revenue Trend: Steady growth over time with Q4 seasonal peaks
- Top Categories: Health & Beauty, Watches & Gifts, and Bed, Bath & Table generate the highest revenue
- Order Behavior: Most orders contain 1–3 items; average order value ≈ BRL 160
- Delivery Performance: Median delivery time ≈ 10 days; faster deliveries correlate with higher review scores
- Customer Segments: Three clusters identified — Top, Loyal, At-Risk
- Retention Opportunities: Targeting at-risk customers and improving delivery speed can boost repeat purchases
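The segmentation behind the Top, Loyal, and At-Risk clusters can be approximated as follows. This sketch assumes a per-order table that already carries `customer_unique_id` (joined from the customers table on `customer_id`) and an `order_value` column. It clusters standardized raw recency, frequency, and monetary values with scikit-learn's `KMeans` rather than quintile RFM scores, and the rule that names the clusters (highest spend → Top, most days since last purchase among the rest → At-Risk, remainder → Loyal) is a hypothetical heuristic, not necessarily the notebook's.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler


def rfm_table(orders: pd.DataFrame) -> pd.DataFrame:
    """Recency (days), frequency (distinct orders) and monetary (total spend) per customer."""
    # Measure recency from the day after the last purchase in the data
    snapshot = orders["order_purchase_timestamp"].max() + pd.Timedelta(days=1)
    rfm = orders.groupby("customer_unique_id").agg(
        last_purchase=("order_purchase_timestamp", "max"),
        frequency=("order_id", "nunique"),
        monetary=("order_value", "sum"),
    )
    rfm["recency"] = (snapshot - rfm.pop("last_purchase")).dt.days
    return rfm[["recency", "frequency", "monetary"]]


def segment(rfm: pd.DataFrame, seed: int = 42) -> pd.DataFrame:
    """Cluster customers into 3 groups and name them with a hypothetical heuristic."""
    # Standardize so no single RFM dimension dominates the Euclidean distance
    X = StandardScaler().fit_transform(rfm[["recency", "frequency", "monetary"]])
    out = rfm.copy()
    out["cluster"] = KMeans(n_clusters=3, n_init=10, random_state=seed).fit_predict(X)
    means = out.groupby("cluster")[["recency", "monetary"]].mean()
    top = means["monetary"].idxmax()               # highest average spend
    rest = means.drop(index=top)
    at_risk = rest["recency"].idxmax()             # longest time since last purchase
    labels = {c: "Loyal" for c in rest.index}
    labels[top], labels[at_risk] = "Top", "At-Risk"
    out["segment"] = out["cluster"].map(labels)
    return out


# Toy orders: (customer, days before base date, order value)
base = pd.Timestamp("2018-09-01")
spec = [("A", 10, 1000.0), ("B", 10, 1000.0), ("C", 10, 1000.0),
        ("D", 30, 100.0), ("E", 30, 100.0), ("F", 30, 100.0),
        ("G", 300, 100.0), ("H", 300, 100.0), ("I", 300, 100.0)]
orders = pd.DataFrame([
    {"customer_unique_id": c, "order_id": f"o_{c}",
     "order_purchase_timestamp": base - pd.Timedelta(days=d), "order_value": v}
    for c, d, v in spec
])

segments = segment(rfm_table(orders))
print(segments[["recency", "frequency", "monetary", "segment"]])
```

Using `customer_unique_id` rather than `customer_id` matters for frequency, since in Olist each order gets its own `customer_id`. On real data, monetary values are typically right-skewed, so a log transform before scaling is a common variant.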

Revenue grows steadily over time with seasonal peaks in Q4, indicating strong holiday sales.

Most orders contain 1–3 items, so the typical purchase is a small basket.

Health & Beauty, Watches & Gifts, and Electronics dominate revenue — focus areas for marketing and inventory.
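A category revenue ranking like the one above can be sketched by joining items to products and to Olist's category translation table. This assumes the translation table has columns `product_category_name` and `product_category_name_english`, that revenue is the sum of item `price`, and that uncategorized products are grouped as "unknown"; `top_categories` is a hypothetical helper and the rows are toy data.

```python
import pandas as pd


def top_categories(items: pd.DataFrame, products: pd.DataFrame,
                   translation: pd.DataFrame, n: int = 3) -> pd.Series:
    """Hypothetical helper: the n product categories with the highest item revenue."""
    df = items.merge(products[["product_id", "product_category_name"]],
                     on="product_id", how="left")
    df = df.merge(translation, on="product_category_name", how="left")
    # Prefer the English name; fall back to the Portuguese name, then "unknown"
    df["category"] = (df["product_category_name_english"]
                      .fillna(df["product_category_name"])
                      .fillna("unknown"))
    return df.groupby("category")["price"].sum().nlargest(n)


items = pd.DataFrame({
    "product_id": ["p1", "p1", "p2", "p3", "p4"],
    "price": [100.0, 50.0, 200.0, 30.0, 20.0],
})
products = pd.DataFrame({
    "product_id": ["p1", "p2", "p3", "p4"],
    "product_category_name": ["beleza_saude", "relogios_presentes", "cama_mesa_banho", None],
})
translation = pd.DataFrame({
    "product_category_name": ["beleza_saude", "relogios_presentes", "cama_mesa_banho"],
    "product_category_name_english": ["health_beauty", "watches_gifts", "bed_bath_table"],
})

top = top_categories(items, products, translation)
print(top)
```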

Faster deliveries generally receive higher customer review scores, showing the importance of logistics.
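One way to check the delivery-speed versus review-score relationship is to bucket delivery times and compare mean scores, plus a rank (Spearman) correlation. This is a sketch under assumptions: it expects a per-order table with a `delivery_days` column, averages multiple reviews for the same order, uses illustrative 7-day buckets, and computes Spearman's rho as the Pearson correlation of ranks to avoid a SciPy dependency. `review_by_delivery` is a hypothetical helper and the rows are toy data.

```python
import pandas as pd


def review_by_delivery(order_feats: pd.DataFrame, reviews: pd.DataFrame,
                       bins=(0, 7, 14, 21, float("inf"))):
    """Hypothetical helper: mean review score per delivery-time bucket and Spearman rho."""
    # An order can have more than one review row; average them per order
    per_order = reviews.groupby("order_id", as_index=False)["review_score"].mean()
    df = order_feats.merge(per_order, on="order_id", how="inner")
    # Left-closed buckets: [0, 7), [7, 14), [14, 21), [21, inf) days
    df["delivery_bucket"] = pd.cut(df["delivery_days"], bins=list(bins), right=False)
    table = df.groupby("delivery_bucket", observed=True)["review_score"].agg(["mean", "count"])
    # Spearman's rho = Pearson correlation of the ranks
    rho = df["delivery_days"].rank().corr(df["review_score"].rank())
    return table, rho


order_feats = pd.DataFrame({
    "order_id": ["o1", "o2", "o3", "o4", "o5"],
    "delivery_days": [3.0, 5.0, 10.0, 20.0, 30.0],
})
reviews = pd.DataFrame({
    "order_id": ["o1", "o1", "o2", "o3", "o4", "o5"],
    "review_score": [5, 5, 5, 4, 2, 1],
})

table, rho = review_by_delivery(order_feats, reviews)
print(table)
print(f"Spearman rho: {rho:.3f}")
```

A negative rho means longer deliveries tend to get lower scores. A rank correlation fits here because review scores are ordinal (1 to 5) and delivery times are skewed; it describes association, not causation.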
- Install dependencies:
  `pip install -r requirements.txt`
- Open the notebook:
  `jupyter notebook Olist_Analysis.ipynb`

To run the notebook locally:
- Download all CSV files from the Kaggle dataset.
- Create a folder named `data/` in the root directory of this project.
- Place all CSVs inside `data/` (e.g., `data/olist_orders_dataset.csv`).
- All plots are saved in the `images/` folder and referenced in this README.
- Executive summary PDF: `Olist_Executive_Summary.pdf`