6 changes: 6 additions & 0 deletions .saturn/templates-enterprise.json
@@ -113,6 +113,12 @@
"thumbnail_image_url": "https://saturn-public-assets.s3.us-east-2.amazonaws.com/example-thumbnails/openclaw.png",
"weight": 1950,
"recipe_path": "examples/openclaw/.saturn/saturn.json"
},
{
"title": "Streamlit EDA Dashboard (Python)",
"thumbnail_image_url": "https://saturn-public-assets.s3.us-east-2.amazonaws.com/example-thumbnails/dashboard.png",
"weight": 2000,
"recipe_path": "examples/data-science-analystics/cpu-streamlit-eda/.saturn/saturn.json"
}
]
}
6 changes: 6 additions & 0 deletions .saturn/templates-hosted.json
@@ -113,6 +113,12 @@
"thumbnail_image_url": "https://saturn-public-assets.s3.us-east-2.amazonaws.com/example-thumbnails/openclaw.png",
"weight": 1950,
"recipe_path": "examples/openclaw/.saturn/saturn.json"
},
{
"title": "Streamlit EDA Dashboard (Python)",
"thumbnail_image_url": "https://saturn-public-assets.s3.us-east-2.amazonaws.com/example-thumbnails/dashboard.png",
"weight": 2000,
"recipe_path": "examples/data-science-analystics/cpu-streamlit-eda/.saturn/saturn.json"
}
]
}
@@ -0,0 +1,18 @@
{
"name": "example-streamlit-eda",
"image_uri": "public.ecr.aws/saturncloud/saturn-python:2025.05.01",
"description": "One-click EDA profiler and stats dashboard built with Streamlit. Profiles the Tips dataset with interactive filters, statistical summaries, and distribution charts.",
"working_directory": "/home/jovyan/examples/examples/data-science-analystics/cpu-streamlit-eda",
"start_script": "pip install -r requirements.txt",
"git_repositories": [
{
"url": "https://github.com/saturncloud/examples",
"path": "/home/jovyan/examples"
}
],
"deployment": {
"instance_type": "large",
"command": "streamlit run app.py --server.port 8000 --server.address 0.0.0.0"
},
"version": "2022.01.06"
}
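The recipe's fields have to agree with each other: `working_directory` must sit inside the directory where the repository is cloned, and the port in the deployment command is the one the server listens on. A minimal sketch of that consistency check, assuming the relevant fields are pasted in as a string (the helper name `check_recipe` is hypothetical and not part of any Saturn Cloud tooling):

```python
import json
import shlex
from pathlib import PurePosixPath

# Fields copied from the recipe above; only the ones the check needs.
RECIPE = """
{
  "working_directory": "/home/jovyan/examples/examples/data-science-analystics/cpu-streamlit-eda",
  "start_script": "pip install -r requirements.txt",
  "git_repositories": [
    {"url": "https://github.com/saturncloud/examples", "path": "/home/jovyan/examples"}
  ],
  "deployment": {
    "instance_type": "large",
    "command": "streamlit run app.py --server.port 8000 --server.address 0.0.0.0"
  }
}
"""

def check_recipe(text):
    """Hypothetical helper: return (working dir is inside a cloned repo, server port)."""
    recipe = json.loads(text)
    workdir = PurePosixPath(recipe["working_directory"])
    clone_paths = [PurePosixPath(repo["path"]) for repo in recipe["git_repositories"]]
    inside_clone = any(p == workdir or p in workdir.parents for p in clone_paths)

    # Pull the value that follows --server.port out of the deployment command.
    args = shlex.split(recipe["deployment"]["command"])
    port = int(args[args.index("--server.port") + 1])
    return inside_clone, port

inside_clone, port = check_recipe(RECIPE)
print(inside_clone, port)
```

The doubled `examples/examples` in `working_directory` is consistent with the rest of the recipe: the repository is cloned to `/home/jovyan/examples`, and the template lives under a top-level `examples/` directory inside it.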
127 changes: 97 additions & 30 deletions examples/data-science-analystics/cpu-streamlit-eda/README.md
@@ -1,65 +1,132 @@

# 🚀 Streamlit EDA Dashboard
# 📊 Streamlit EDA Dashboard

<div align="center">
<img src="./icon-bar-histogram.png" width="300">
</div>

### **Overview**
This template provides a rapid deployment setup for a web-based Exploratory Data Analysis (EDA) tool. Designed for **CPU resources**, it allows you to transform a local Python environment into a functional analytics dashboard. The primary goal is to provide a "One-click profiler" that automates data inspection, statistical summaries, and distribution plotting through an intuitive browser interface.
### Overview
This template deploys a browser-based **Exploratory Data Analysis (EDA)** dashboard built with Streamlit on Saturn Cloud. It gives you instant statistical profiling, interactive filtering, and distribution charts — all without writing a single line of code.

### **Dataset Overview**
The template utilizes the **Tips** toy dataset, which contains records of restaurant bills, tip amounts, and demographic data such as the day of the week and time of day. It is an excellent dataset for demonstrating the power of categorical filtering and numerical profiling in an automated dashboard environment.
The app ships with the built-in **Tips** dataset so it works the moment you start it. When you're ready to analyse your own data, simply upload any CSV file directly from the browser and the dashboard immediately re-profiles it.

### **Tech Stack**
* **Python**: The core logic layer for data processing and app execution.
* **Pandas**: Manages the underlying data frames and performs statistical profiling calculations.
* **Streamlit**: The primary framework used to build the interactive UI and handle real-time visualization updates.
---

## ✨ What the App Does

| Feature | Description |
|---|---|
| **CSV file uploader** | Drag and drop any CSV file in the sidebar to replace the default dataset |
| **Auto-detect columns** | Filters and chart options populate automatically based on your data's columns |
| **Categorical filters** | Pick any text/category column to slice the data by its unique values |
| **Statistical summary** | Instant count, mean, std, min/max, and percentiles for all numeric columns |
| **Raw data preview** | Shows the first 10 rows of your filtered dataset |
| **Distribution chart** | Select any numeric column to plot a histogram with a density curve |
| **Default dataset** | Falls back to the Tips dataset when no file is uploaded |

---

## 📁 Uploading Your Own Data

### Supported Format
- **CSV files only** (`.csv`)
- UTF-8 encoding recommended
- First row must be the column headers

### What Makes a Good Dataset for This App

Your CSV should have at least:
- **One categorical column** (text values like names, categories, labels) — used to power the sidebar filter
- **One numeric column** (integers or decimals) — used to generate the distribution chart and statistics

The more columns your CSV has, the more the app gives you to explore — you can switch between any of them using the dropdowns.

### Good Examples to Try

| Dataset type | Categorical columns | Numeric columns |
|---|---|---|
| Sales data | Region, Product, Sales Rep | Revenue, Units Sold, Discount |
| HR / employee data | Department, Job Title, Gender | Salary, Age, Years at Company |
| Survey results | Country, Response, Category | Score, Rating, Count |
| E-commerce orders | Status, Category, Payment Method | Price, Quantity, Delivery Days |
| Sensor / IoT readings | Location, Device, Status | Temperature, Pressure, Voltage |

### What to Avoid
- **Very wide CSVs** (100+ columns) — the app handles them but dropdowns get crowded
- **Purely numeric CSVs** (no text columns) — the categorical filter won't have anything to work with
- **Large files over ~100MB** — the app will load them but may feel slow in the browser
- **Nested or multi-header CSVs** — the app expects a flat, single-header structure
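The guidance above can be turned into a quick pre-upload check. The sketch below uses only the standard library rather than the app's pandas-based detection, so its numeric test (every non-empty value parses as a float) only approximates what pandas would infer. `summarize_csv` is a hypothetical helper name and the sample rows are invented for illustration.

```python
import csv
import io

def summarize_csv(text):
    """Hypothetical helper: split a flat, single-header CSV's columns into numeric and text."""
    reader = csv.DictReader(io.StringIO(text))
    rows = list(reader)
    columns = reader.fieldnames or []
    numeric, textual = [], []
    for column in columns:
        # Ignore empty cells so a few missing values do not turn a column into text.
        values = [row.get(column) for row in rows if row.get(column) not in (None, "")]
        try:
            for value in values:
                float(value)
            is_numeric = bool(values)
        except ValueError:
            is_numeric = False
        (numeric if is_numeric else textual).append(column)
    return numeric, textual

SAMPLE = """region,product,revenue,units_sold
North,Widget,1200.50,10
South,Gadget,830,7
"""

numeric_cols, text_cols = summarize_csv(SAMPLE)
print("numeric:", numeric_cols)
print("text:", text_cols)
if not text_cols:
    print("No text columns: the sidebar filter will have nothing to offer.")
if not numeric_cols:
    print("No numeric columns: the distribution chart will be unavailable.")
```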

### How to Upload
1. Open the app URL in your browser
2. In the **left sidebar**, find the **📂 Data Source** section
3. Click **"Browse files"** or drag and drop your CSV onto the uploader
4. The dashboard instantly re-renders with your data — no refresh needed
5. Use the **Filter by column** dropdown to pick a categorical column to slice by
6. Use the **Select column to visualize** dropdown in the chart section to switch between numeric columns

---

## 🛠️ Tech Stack

| Library | Role |
|---|---|
| **Streamlit** | Interactive web app framework — handles the UI, file uploader, widgets, and layout |
| **Pandas** | Data loading, filtering, and statistical profiling |
| **Seaborn** | Distribution chart rendering and default Tips dataset |
| **Matplotlib** | Chart rendering backend used by Seaborn |

---

## 🛠️ Local Setup Instructions
## 🚀 Deploying on Saturn Cloud

This template includes a `.saturn/saturn.json` recipe that configures everything automatically. When you create a deployment from this template on Saturn Cloud:

1. The repo is cloned fresh on every start
2. All dependencies from `requirements.txt` are installed automatically via the start script
3. The Streamlit server starts on port 8000 and binds to `0.0.0.0`, as set in the recipe's deployment command, so Saturn's proxy can reach it
4. The app is immediately accessible at your deployment URL

No manual configuration needed.

---

## 💻 Local Setup

If you want to run the dashboard locally:

### 1. Create and Activate a Virtual Environment

### 1. Create and Activate Virtual Environment
Open your terminal on your host machine and execute the following:
```bash
# Create environment
python -m venv streamlit_env

# Activate (Windows)
streamlit_env\Scripts\activate

# Activate (macOS/Linux)
source streamlit_env/bin/activate

# Activate (Windows)
streamlit_env\Scripts\activate
```

### 2. Install Dependencies

```bash
pip install streamlit pandas matplotlib seaborn

```
or

```bash
pip install -r requirements.txt

```

### 3. Run the Dashboard
### 3. Run the App

```bash
streamlit run app.py

```

---
The app will open at `http://localhost:8501` in your browser.

## 🔗 Resources and Support
---

* **Dashboard Platform**: [Streamlit Cloud](https://streamlit.io/cloud)
* **Library**: [Streamlit Documentation](https://docs.streamlit.io/)
* **Library**: [Pandas API Reference](https://pandas.pydata.org/docs/)
## 🔗 Resources

---
* [Streamlit Documentation](https://docs.streamlit.io/)
* [Pandas API Reference](https://pandas.pydata.org/docs/)
* [Seaborn Documentation](https://seaborn.pydata.org/)
* [Saturn Cloud Docs](https://saturncloud.io/docs/)
80 changes: 53 additions & 27 deletions examples/data-science-analystics/cpu-streamlit-eda/app.py
@@ -6,42 +6,68 @@
# --- PAGE CONFIGURATION ---
st.set_page_config(page_title="EDA Profiler", layout="wide")

# --- INTRODUCTION ---
# --- TITLE ---
st.title("📊 One-Click EDA Dashboard")
st.markdown("This dashboard provides instant statistical profiling and visualization for the Tips dataset.")

# --- STEP 2: LOAD AND PREPARE DATASET ---
# We load the Tips dataset from seaborn's built-in repository.
@st.cache_data # Cache data to prevent reloading on every interaction
def load_data():
    return sns.load_dataset('tips')

df = load_data()

# --- STEP 3: SIDEBAR FILTERS ---
# We enable global filters to allow users to slice data by specific days of the week.
st.sidebar.header("Global Filters")
selected_day = st.sidebar.multiselect(
    "Select Day",
    options=df['day'].unique(),
    default=df['day'].unique()
)
filtered_df = df[df['day'].isin(selected_day)]
st.markdown("Upload any CSV file to instantly profile your data — or explore the built-in Tips dataset to get started.")

# --- SIDEBAR: DATA SOURCE ---
st.sidebar.header("📂 Data Source")
uploaded_file = st.sidebar.file_uploader("Upload your CSV", type=["csv"])

@st.cache_data
def load_default():
    return sns.load_dataset("tips")

@st.cache_data
def load_uploaded(file):
    return pd.read_csv(file)

if uploaded_file is not None:
    df = load_uploaded(uploaded_file)
    st.sidebar.success(f"✅ Loaded: {uploaded_file.name}")
else:
    df = load_default()
    st.sidebar.info("💡 No file uploaded — showing built-in Tips dataset. Upload a CSV above to use your own data.")

# --- SIDEBAR: FILTERS ---
st.sidebar.header("🔎 Filters")

# Dynamically detect categorical columns and let user pick which one to filter on
categorical_cols = df.select_dtypes(include=["object", "category"]).columns.tolist()

if categorical_cols:
    filter_col = st.sidebar.selectbox("Filter by column", options=categorical_cols)
    selected_values = st.sidebar.multiselect(
        f"Select {filter_col} values",
        options=df[filter_col].unique(),
        default=df[filter_col].unique()
    )
    filtered_df = df[df[filter_col].isin(selected_values)]
else:
    filtered_df = df
    st.sidebar.info("No categorical columns found for filtering.")

# --- STEP 4: STATISTICS PROFILER ---
# This block calculates and displays the numerical summary statistics for the filtered data.
col1, col2 = st.columns(2)

with col1:
    st.subheader("🔢 Statistical Summary")
    st.dataframe(filtered_df.describe(), use_container_width=True)

with col2:
    st.subheader("📋 Raw Data Sample")
    st.write(filtered_df.head(10))
    st.dataframe(filtered_df.head(10), use_container_width=True)

# --- STEP 5: VISUALIZATION ---
# Using Seaborn and Matplotlib to render the distribution of bill amounts interactively.
st.subheader("📈 Distribution of Total Bills")
fig, ax = plt.subplots(figsize=(10, 4))
sns.histplot(filtered_df['total_bill'], kde=True, ax=ax, color="#FF4B4B")
st.pyplot(fig)
st.subheader("📈 Distribution Chart")

numeric_cols = filtered_df.select_dtypes(include="number").columns.tolist()

if numeric_cols:
    chart_col = st.selectbox("Select column to visualize", options=numeric_cols)
    fig, ax = plt.subplots(figsize=(10, 4))
    sns.histplot(filtered_df[chart_col], kde=True, ax=ax, color="#FF4B4B")
    ax.set_xlabel(chart_col)
    st.pyplot(fig)
else:
    st.info("No numeric columns available for visualization.")
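Stripped of the Streamlit widgets, the new flow is: keep the rows whose value in the chosen column is among the selected values, then summarize a numeric column of what remains. A plain-Python sketch of that logic follows. It uses the standard library in place of pandas, so `filter_rows` and `summarize` are hypothetical stand-ins for `df[df[filter_col].isin(selected_values)]` and a subset of `describe()`, and the sample rows are invented for illustration.

```python
from statistics import mean

# Invented sample rows shaped like a slice of a bills table.
ROWS = [
    {"day": "Sat", "total_bill": 20.0},
    {"day": "Sun", "total_bill": 30.0},
    {"day": "Sat", "total_bill": 10.0},
    {"day": "Thur", "total_bill": 15.0},
]

def filter_rows(rows, column, selected):
    """Keep rows whose `column` value is one of `selected` (stand-in for isin)."""
    allowed = set(selected)
    return [row for row in rows if row[column] in allowed]

def summarize(rows, column):
    """Count, mean, min and max of one numeric column; None when no rows are left."""
    values = [row[column] for row in rows]
    if not values:
        return None
    return {"count": len(values), "mean": mean(values), "min": min(values), "max": max(values)}

weekend = filter_rows(ROWS, "day", ["Sat", "Sun"])
print(summarize(weekend, "total_bill"))
print(summarize(filter_rows(ROWS, "day", []), "total_bill"))
```

Deselecting every value leaves no rows at all; the sketch returns `None` for that case, and it may be worth checking how the dashboard behaves with an empty selection as well.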