A hands-on, fully-dockerized tutorial. A small sensor simulator streams synthetic data into a Postgres warehouse, and you build an operational KPI dashboard in Metabase — step by step, with screenshots for every click.
Why synthetic data? Because the point is to mimic a real-time data-producing pipeline without needing real plant access, credentials, or a big download. Simulated "machines" continuously write production telemetry into Postgres — exactly the shape of data a real PLC/sensor system produces — so the whole stack comes alive with one command and you can practise the analyst's real job: turning a live data stream into KPIs, dashboards, and decisions.
The example domain is manufacturing (machines, sensors, OEE), chosen to mirror a Data Analyst brief: exploratory analysis of operational data, dashboards, KPI/metric definition, descriptive + diagnostic views, and functional data-quality validation.
Note: the data here is synthetic, so the figures are illustrative — not a real finding about real machines. The point of this section is to show how an analyst reads a dashboard and turns it into a decision. The actual conclusions are always specific to the real data and business you're working with; the same chart in a different context would tell a different story.
Reading the dashboard top-down tells a clear operational story. All three machines show healthy Quality (~0.93) and Performance (~0.92) — they're not producing bad parts or running slowly. The low OEE (0.38–0.45, well below the ~0.85 world-class benchmark) is driven almost entirely by Availability: the machines are stopped too often. The failure breakdown locates the cause — stoppages are dominated by heat-dissipation (HDF) and power (PWF) failures, and PRESS-01 carries a PWF problem the others don't. The takeaway: focus maintenance on availability, not quality or speed — specifically, investigate PRESS-01's power supply and the heat-dissipation conditions affecting the line.
That single sentence — what's fine, what's wrong, the specific cause, the action — is the actual deliverable; the dashboard is the evidence for it. (More on this workflow in docs/02-data-analyst-playbook.md.)
- docs/01-what-is-a-data-analyst.md — what the role is, what a KPI is, and the reports different industries (fintech, IoT, e-commerce) ask for.
- docs/02-data-analyst-playbook.md — the craft: how a business question becomes a metric, becomes SQL, becomes a chart, becomes a decision.
- docs/03-data-platforms-explained.md — why this repo uses one Postgres, and how operational databases, data warehouses, and data lakes differ in the real world.
- docs/04-setup-runbook.md — the full step-by-step build, with a screenshot at every step and a "configure-as-code" section.
machine-cnc01 ┐
machine-cnc02 ├──► Postgres ("plant" DB) ◄── Metabase ──► dashboard
machine-press01 ┘ (warehouse) (BI layer) localhost:3000
(sensor sim) (decisions)
All three machines run the same image; only their command-line arguments
differ. That is the Dockerfile ENTRYPOINT (the script) + CMD (per-machine
args, overridden in compose) pattern — one build, many machines.
In a real system the sensors would write to a live operational/time-series database, and a pipeline would move that data into a separate warehouse for analytics. Here a single Postgres plays the warehouse role for simplicity — see docs/03-data-platforms-explained.md.
- Postgres 16 — the warehouse the sensors write to and Metabase reads from.
- Metabase — open-source BI; SQL questions assembled into a dashboard.
- Python sensor simulator — AI4I-style telemetry (temperatures, speed, torque, tool wear) plus five failure modes and OEE-ready production fields.
- Docker Compose — brings the whole thing up with one command.
docker compose up --buildEach machine backfills ~2000 historical cycles instantly, then streams live, so dashboards have data within seconds. Check the warehouse directly:
docker compose exec postgres psql -U sensors -d plant \
-c "SELECT machine_id, count(*) FROM sensor_readings GROUP BY 1;"Run it a few times — the counts climb (here 2.6k → 3.5k → 93k across runs), which proves the sensor stream is genuinely live, not a one-off load:
Then connect Metabase and build the dashboard by following
docs/04-setup-runbook.md. In short: open
http://localhost:3000, add the Postgres database (host postgres, db plant,
user sensors, password sensors_pw), create one SQL question per file in
sql/kpi/, and arrange them on a dashboard.
"Live" dashboards: the sensor stream is genuinely continuous, but a Metabase chart runs its SQL once on load. Turn on the dashboard's auto-refresh (clock icon → 1 minute) for a continuously-updating view. It's near-real-time (1-minute floor), which is what "real-time" means for an operations dashboard in practice.
One file per question in sql/kpi/:
| Query | Question it answers | Chart |
|---|---|---|
| OEE by Machine | How effective is each machine overall? | bar |
| Failure Breakdown by Type | Which failure modes dominate, per machine? | bar |
| Downtime % by Hour | When and how much are machines stopped? | line |
| Tool Wear Trend | How does tool wear evolve between changes? | line |
| Data Quality Checks | Is the data internally consistent? (all 0 = healthy) | table |
.
├── README.md
├── docker-compose.yml # Postgres + Metabase + 3 machines
├── sensor/
│ ├── sensor.py # the simulator (ENTRYPOINT target)
│ ├── Dockerfile # single-stage (prebuilt psycopg2 wheel)
│ ├── Dockerfile.multistage # learning variant: builder/runtime split
│ └── requirements.txt
├── sql/kpi/ # one SQL file per Metabase question
└── docs/
├── 01-what-is-a-data-analyst.md # the role, KPIs, industry reports
├── 02-data-analyst-playbook.md # the analyst craft, explained
├── 03-data-platforms-explained.md # DB vs warehouse vs lake
├── 04-setup-runbook.md # illustrated step-by-step build
└── *.png # screenshots
- Data is synthetic by design (see the "why" above). The simulator is modelled on the UCI AI4I 2020 Predictive Maintenance dataset, so the columns and failure modes mirror a real machine log. Swap in the real CSV to validate distributions.
- Tweak a machine by editing its
command:args in compose, e.g.--interval 0.2(rows/sec),--down-prob 0.08,--backfill 20000.

