Role: Analytics Engineer
Stack: Docker · PostgreSQL · dbt · Python · Metabase · VS Code
This project analyzes Voice AI interactions to evaluate accessibility, efficiency, adoption, friction, and error reduction using real conversational data.
The solution is implemented using a modern analytics engineering stack, emphasizing:
- Reproducibility
- Data quality & governance
- Clear analytical reasoning
- Stakeholder-ready insights
The final output supports:
- KPI tracking
- Root-cause analysis of friction and abandonment
- Error-reduction measurement
- Trustworthy dashboards via Metabase
| Layer | Technology | Purpose |
|---|---|---|
| Orchestration | Docker / Docker Compose | Reproducible local environment |
| Database | PostgreSQL | Raw + analytics warehouse |
| Transformation | dbt | Staging, marts, testing, governance |
| Analysis | Python (Pandas, SQLAlchemy, sklearn) | Statistical analysis & modeling |
| BI / Visualization | Metabase | Stakeholder dashboards |
| Dev Environment | VS Code (Dev Containers) | Seamless containerized development |
```
CSV Data
   ↓
PostgreSQL (raw tables)
   ↓
dbt Staging Models (stg_*)
   ↓
dbt Fact Models (fact_voice_ai_sessions)
   ↓
Metabase Dashboards
   ↓
Stakeholder Insights
```
Design principles:
- One fact table = one grain (session-level)
- No business logic in dashboards
- All metrics validated before exposure
- Python used only where SQL is insufficient
Prerequisites:
- Docker & Docker Compose
- VS Code + Dev Containers extension
```bash
git clone <repo-url>
cd <repo-name>
docker compose up --build
```

This starts:
- `postgres` (persistent volume)
- `analytics` (Python + dbt)
- `metabase`
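For reference, a minimal `docker-compose.yml` consistent with these three services might look like the sketch below. Image tags, credentials, and volume names are illustrative assumptions, not the project's actual file.

```yaml
# Illustrative sketch only -- tags, credentials, and build context
# are assumptions, not this repository's real compose file.
services:
  postgres:
    image: postgres:16
    environment:
      POSTGRES_USER: analyst
      POSTGRES_DB: irembo
      POSTGRES_PASSWORD: analyst      # assumption
    volumes:
      - pgdata:/var/lib/postgresql/data   # persistent volume
    ports:
      - "5432:5432"

  analytics:
    build: .                          # Python + dbt image from the Dockerfile
    volumes:
      - .:/workspace                  # local workspace mounted into container
    depends_on:
      - postgres

  metabase:
    image: metabase/metabase
    ports:
      - "3000:3000"
    depends_on:
      - postgres

volumes:
  pgdata:
```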
In VS Code:
Command Palette → Dev Containers: Attach to Running Container
→ analytics
Your local workspace is now mounted into the container.
Data is loaded once and persists via Docker volumes.
```bash
docker exec -it postgres psql -U analyst -d irembo
```

```bash
cd /workspace/dbt
dbt deps
dbt debug
dbt build
```

This will:
- Build staging models
- Build fact models
- Run data quality tests
Open Metabase at http://localhost:3000

Connect Metabase to:
- Host: `postgres`
- Database: `irembo`
- Schema: `analytics`
```
.
├── docker-compose.yml
├── Dockerfile
├── requirements.txt
├── data/
│   └── raw/
│       └── all.csv
├── scripts/
│   └── load_csvs.py
├── dbt/
│   ├── dbt_project.yml
│   ├── packages.yml
│   ├── models/
│   │   ├── staging/
│   │   └── marts/
│   └── tests/
│       └── tests.sql
├── notebooks/
│   └── 01_friction_analysis.ipynb
└── README.md
```
- Completion rate for first-time users
- Rural vs urban completion gap
- Average ASR confidence by user type
- Average turns per session
- Average session duration
- Error rate per session
- Share of sessions using Voice AI
- Repeat usage rate
- Voice vs non-voice completion delta
Approach:
KPIs are session-level metrics aggregated from turn-level data and materialized in fact_voice_ai_sessions.
fact_voice_ai_sessions
- Grain: 1 row per session
- Source: `voice_turns` only (applications excluded after validation)
- Metrics:
  - Error proportions
  - ASR & intent confidence
  - Turn counts
  - Final outcome
Why: Sessions are the natural decision unit for Voice AI effectiveness.
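The session-level rollup described above can be sketched in pandas. The turn-level column names (`session_id`, `is_error`, `asr_confidence`, `outcome`) are assumptions about the `voice_turns` schema; the production logic lives in the dbt model, not Python.

```python
import pandas as pd

# Toy turn-level rows; real data comes from the voice_turns source
# (column names here are illustrative assumptions).
turns = pd.DataFrame({
    "session_id":     ["s1", "s1", "s1", "s2", "s2"],
    "is_error":       [0, 1, 0, 0, 0],
    "asr_confidence": [0.9, 0.4, 0.8, 0.95, 0.9],
    "outcome":        ["", "", "completed", "", "abandoned"],
})

# One row per session: turn count, error proportion
# (error_turns / total_turns), mean ASR confidence, final outcome.
fact = (
    turns.groupby("session_id")
         .agg(turn_count=("is_error", "size"),
              error_rate=("is_error", "mean"),
              avg_asr_confidence=("asr_confidence", "mean"),
              final_outcome=("outcome", "last"))
         .reset_index()
)
```

The same grain rule applies here as in the dbt model: every metric is an aggregate over one session, never over a mix of sessions.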
- Primary friction driver is ASR performance, not conversation length.
- Users abandon early, indicating initial recognition issues.
- Voice AI performs better for first-time digital users than non-voice channels.
- Rural users benefit disproportionately from Voice AI accessibility.
Tools Used:
- SQL → aggregations & KPIs
- Python → Logistic Regression, segmentation, statistical validation
An error is any turn marked as `misunderstanding`, `silence`, or a repeated intent failure.

Error rate per session:

```
error_turns / total_turns
```
- Compare pre/post model deployments
- Control for user mix & session length
- Track statistically significant deltas
- Segment by channel & user type
- Avoid raw averages
- Require confidence intervals
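As a sketch of the confidence-interval requirement, a normal-approximation interval for the difference between pre- and post-deployment error rates could look like this. The counts are hypothetical, not the project's data.

```python
import math

def proportion_delta_ci(errors_a, total_a, errors_b, total_b, z=1.96):
    """95% CI for p_a - p_b via the normal approximation
    for a difference of two proportions."""
    p_a = errors_a / total_a
    p_b = errors_b / total_b
    se = math.sqrt(p_a * (1 - p_a) / total_a + p_b * (1 - p_b) / total_b)
    delta = p_a - p_b
    return delta - z * se, delta + z * se

# Hypothetical pre/post deployment error counts
lo, hi = proportion_delta_ci(180, 1000, 120, 1000)

# Only report the delta if the interval excludes zero
significant = lo > 0 or hi < 0
```

This is the simplest defensible interval; in practice one would also control for user mix and session length, as noted above.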
Modeling non-completion revealed:
- ASR confidence as the strongest abandonment predictor
- Errors occur early, not due to long sessions
➡️ Strategy shift: Fix ASR first, not dialogue flow.
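To illustrate this kind of model, here is a logistic regression on synthetic session data. The data-generating process is an assumption constructed to mirror the stated finding, not the project's real dataset.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 500

# Synthetic sessions: abandonment driven mainly by low ASR confidence
# and only weakly by session length (assumption mirroring the finding).
asr_confidence = rng.uniform(0.3, 1.0, n)
turn_count = rng.integers(2, 20, n)
logit = 4 - 6 * asr_confidence + 0.02 * turn_count
abandoned = rng.binomial(1, 1 / (1 + np.exp(-logit)))

X = np.column_stack([asr_confidence, turn_count])
model = LogisticRegression().fit(X, abandoned)

# A strongly negative ASR coefficient relative to the turn-count
# coefficient is what "ASR is the strongest predictor" looks like here.
asr_coef, turns_coef = model.coef_[0]
```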
- Not null & uniqueness
- Relationship tests
- Accepted ranges (0–1)
- Business logic tests:
  - Error proportions ≤ 100%
  - Sessions must have turns
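These checks map naturally onto dbt schema tests. A hypothetical `schema.yml` fragment is sketched below; the column names and the use of `dbt_utils.accepted_range` are assumptions about this project's configuration.

```yaml
# Illustrative schema.yml fragment -- column names are assumptions.
version: 2

models:
  - name: fact_voice_ai_sessions
    columns:
      - name: session_id
        tests:
          - not_null
          - unique
      - name: avg_asr_confidence
        tests:
          - dbt_utils.accepted_range:   # requires dbt_utils in packages.yml
              min_value: 0
              max_value: 1
      - name: error_rate
        tests:
          - dbt_utils.accepted_range:
              min_value: 0
              max_value: 1
```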
- No raw audio
- User IDs anonymized
- Aggregated reporting only
Metabase only queries certified dbt models.
- Deterministic builds
- Persistent volumes
- Versioned transformations
- Automated data quality enforcement
- BI governance boundary
This project was designed as:
- A real analytics platform, not a one-off analysis
- A showcase of 6+ years of Analytics Engineering maturity
- A system that scales beyond the assignment
Key philosophy: If Metabase can see it, it has already been tested.
- CI pipeline for dbt tests
- Snapshotting for longitudinal analysis
- Feature store for ML reuse