Analyzed 180,519 real supply chain orders to find out why 57% of deliveries are late - and what to do about it.
π (https://karan-supply-chain.streamlit.app/)
note on data: Full analysis was run locally on all 180,519 real orders. The live dashboard uses a 50,000 order sample β all 23 regions, all shipping modes, all customer segments fully represented. The findings are consistent with the full dataset. Full SQL query outputs are available in the
/outputs/sqlfolder.
I built this because late deliveries are one of the most expensive and preventable problems in supply chain management.
When I ran the analysis on 180,519 real orders, the number that came back stopped me:
Only 42.7% of orders are delivered on time. 103,400 orders β more than half β arrived late.
That is not a small operational issue. That is a systemic failure that costs companies millions in customer satisfaction, returns, and repeat business. This dashboard identifies exactly where the failures happen and what to fix first.
| Finding | Detail |
|---|---|
| On-time delivery rate | 42.7% - industry standard is 85%+ |
| Total late orders | 103,400 out of 180,519 |
| Worst performing region | Central Africa - 60.7% late rate |
| Best performing region | Canada - 51.9% late rate |
| Highest revenue category | Fishing - $2.4M+ |
| Average profit margin | 12.1% across all categories |
| Total revenue analyzed | $36.7 million |
| Date range | January 2015 - January 2018 |
The C-suite view. Late delivery rate by region and shipping mode side by side. The story is immediate β every region is failing, but some are worse than others.
Where exactly are we failing? Late delivery breakdown by region with a distribution chart showing how bad the delays actually are. Most late orders are only 1-2 days late β which means this is fixable with better carrier management, not a structural problem.
Which product categories make the most money? Which customer segments spend the most? Monthly sales trend showing seasonality patterns across 3 years of real data.
Classic supply chain Six Sigma tool applied to real data. Class A products generate 80% of revenue from a small number of SKUs β these need priority stock protection. Class C products are consuming logistics bandwidth without proportional return.
This tab is what makes this project different from every other supply chain dashboard on GitHub. Full Lean Six Sigma structured analysis β Define, Measure, Analyze, Improve, Control β applied to the data findings. This is how a Black Belt thinks about a problem.
Beyond the dashboard, I loaded the full dataset into SQLite and wrote 15 SQL queries
answering real business questions. Results are saved as CSV files in /outputs/sql/.
-- Sample queries included:
-- Q1: Late delivery rate by region β ranked worst to best
-- Q2: Top 15 revenue categories with profit margins
-- Q3: Customer segment revenue and order value analysis
-- Q4: Shipping mode performance β which carrier is failing most?
-- Q5: Top 20 products by revenue with margin breakdown
-- Q6: Monthly revenue trend β 2015 to 2018
-- Q7: Market performance β which global market is most profitable?
-- Q8: Department revenue and margin analysis
-- Q9: Top 20 highest value customers
-- Q10: Delivery status distribution across all orders
-- Q11: Revenue and profit by payment type
-- Q12: Late orders by category β revenue at risk calculation
-- Q13: Order status analysis β cancelled, pending, complete
-- Q14: Region revenue ranking using SQL window functions (RANK OVER)
-- Q15: Executive KPI summary β single query, all key metricsPython 3.11 β Core analysis and dashboard
Pandas β Data processing (180,519 orders)
Plotly β Interactive visualizations
Streamlit β Live dashboard deployment
SQL + SQLite β 15 business queries on full dataset
Lean Six Sigma β DMAIC analytical framework (Black Belt)
supply-chain-analytics/
β
βββ data/
β βββ DataCoSupplyChainDataset.csv β full dataset (not uploaded β 95MB)
β βββ supply_chain_sample.csv β 50k sample for cloud dashboard
β
βββ outputs/
β βββ 01_late_delivery_by_region.html
β βββ 02_late_delivery_by_shipping_mode.html
β βββ 03_revenue_by_category.html
β βββ 04_customer_segment_revenue.html
β βββ 05_monthly_sales_trend.html
β βββ 06_abc_inventory_segmentation.html
β βββ sql/
β βββ Q1_Late_Rate_By_Region.csv
β βββ Q2_Revenue_By_Category.csv
β βββ ... (15 SQL query outputs)
β
βββ dashboard.py β Streamlit dashboard
βββ analyze.py β Full Python analysis (180k orders)
βββ sql_analysis.py β 15 SQL queries via SQLite
βββ requirements.txt
βββ README.md
git clone https://github.com/Karant15/Supply-Chain-Analytics.git
cd Supply-Chain-Analytics
python -m venv venv
venv\Scripts\activate # Windows
pip install -r requirements.txt
# Download full dataset from Mendeley Data (link below) β place in /data folder
# Run full Python analysis
python analyze.py
# Run SQL analysis
python sql_analysis.py
# Launch dashboard
streamlit run dashboard.py| Phase | Action | Finding |
|---|---|---|
| Define | Set problem scope | 57.3% late delivery rate - goal: 70%+ on time |
| Measure | Baseline KPIs | 103,400 late orders - $36.7M revenue analyzed |
| Analyze | Root cause identification | Central Africa worst (60.7%) - First Class shipping has the highest late rate |
| Improve | Recommendations | Carrier SLA enforcement + regional logistics partner review + Class C SKU reduction |
| Control | Monitoring | This live dashboard - filter by region, segment, year in real time |
Before my MS in Data Analytics I spent 5 years managing client accounts with 30+ NHS hospitals in the UK. Supply chain and staffing analytics were part of every conversation β which hospitals were running low on specialists, which logistics routes were failing, which procurement decisions were draining budgets.
This project brings that domain knowledge together with real data science. It is not just a dashboard β it is a decision-making tool built the way a Six Sigma Black Belt would build it.
Karan Trivedi | MS Data Analytics, Webster University (Dec 2024)
- Lean Six Sigma Black Belt β Benchmark Six Sigma (2021)
- 7+ years healthcare, recruitment, and business analytics
- Former Senior Accounts Manager β 30+ NHS hospital accounts
π§ krntrivedi@gmail.com π LinkedIn π₯ Healthcare Project π GitHub
DataCo Smart Supply Chain for Big Data Analysis
- Source: Mendeley Data
- Records: 180,519 orders Γ 53 columns
- Coverage: 23 global regions | Multiple markets | 2015β2018
- License: CC BY 4.0 - open research data