Role: Analytics Engineer
Stack: Docker · PostgreSQL · dbt · Python · Metabase · VS Code
This project analyzes Voice AI interactions to evaluate accessibility, efficiency, adoption, friction, and error reduction using real conversational data.
The solution is implemented using a modern analytics engineering stack, emphasizing:
- Reproducibility
- Data quality & governance
- Clear analytical reasoning
- Stakeholder-ready insights
The final output supports:
- KPI tracking
- Root-cause analysis of friction and abandonment
- Error-reduction measurement
- Trustworthy dashboards via Metabase
| Layer | Technology | Purpose |
|---|---|---|
| Orchestration | Docker / Docker Compose | Reproducible local environment |
| Database | PostgreSQL | Raw + analytics warehouse |
| Transformation | dbt | Staging, marts, testing, governance |
| Analysis | Python (Pandas, SQLAlchemy, sklearn) | Statistical analysis & modeling |
| BI / Visualization | Metabase | Stakeholder dashboards |
| Dev Environment | VS Code (Dev Containers) | Seamless containerized development |
```
CSV Data
   ↓
PostgreSQL (raw tables)
   ↓
dbt Staging Models (stg_*)
   ↓
dbt Fact Models (fact_voice_ai_sessions)
   ↓
Metabase Dashboards
   ↓
Stakeholder Insights
```
Design principles:
- One fact table = one grain (session-level)
- No business logic in dashboards
- All metrics validated before exposure
- Python used only where SQL is insufficient
- Docker & Docker Compose
- VS Code + Dev Containers extension
```bash
git clone <repo-url>
cd <repo-name>
docker compose up --build
```

This starts three services:
- `postgres` (persistent volume)
- `analytics` (Python + dbt)
- `metabase`
In VS Code:
Command Palette → Dev Containers: Attach to Running Container → `analytics`
Your local workspace is now mounted into the container.
Data is loaded once and persists via Docker volumes.
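The CSV load itself lives in `scripts/load_csvs.py`. As a hedged sketch of what such a loader might look like (the function name, table name, and connection URL below are illustrative, not the script's actual contents):

```python
import pandas as pd
from sqlalchemy import create_engine


def load_csv(csv_path: str, table: str, db_url: str) -> int:
    """Load one CSV into a database table, replacing any prior load.

    Illustrative only: the real scripts/load_csvs.py may batch files,
    target a specific schema, or use COPY for speed.
    """
    engine = create_engine(db_url)
    df = pd.read_csv(csv_path)
    df.to_sql(table, engine, if_exists="replace", index=False)
    return len(df)


# Example (connection URL assumed to match docker-compose.yml settings):
# load_csv("data/raw/all.csv", "raw_all", "postgresql://analyst@localhost/irembo")
```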
Inspect the database:

```bash
docker exec -it postgres psql -U analyst -d irembo
```

Run dbt:

```bash
cd /workspace/dbt
dbt deps
dbt debug
dbt build
```

This will:
- Build staging models
- Build fact models
- Run data quality tests
Open Metabase at http://localhost:3000 and connect it to:
- Host: `postgres`
- Database: `irembo`
- Schema: `analytics`
```
.
├── docker-compose.yml
├── Dockerfile
├── requirements.txt
├── data/
│   └── raw/
│       └── all.csv
├── scripts/
│   └── load_csvs.py
├── dbt/
│   ├── dbt_project.yml
│   ├── packages.yml
│   ├── models/
│   │   ├── staging/
│   │   └── marts/
│   └── tests/
│       └── tests.sql
├── notebooks/
│   └── 01_friction_analysis.ipynb
└── README.md
```
- Completion rate for first-time users
- Rural vs urban completion gap
- Average ASR confidence by user type
- Average turns per session
- Average session duration
- Error rate per session
- Share of sessions using Voice AI
- Repeat usage rate
- Voice vs non-voice completion delta
Approach:
KPIs are session-level metrics aggregated from turn-level data and materialized in fact_voice_ai_sessions.
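As an illustration of that turn-to-session rollup (the column names here are invented for the example; the real ones live in the dbt staging models):

```python
import pandas as pd

# Illustrative turn-level data; real column names come from the stg_* models.
turns = pd.DataFrame({
    "session_id":     [1, 1, 1, 2, 2],
    "is_error":       [0, 1, 0, 0, 0],
    "asr_confidence": [0.9, 0.4, 0.8, 0.95, 0.92],
    "completed":      [1, 1, 1, 0, 0],  # session outcome repeated on each turn
})

# Aggregate turn-level rows to the session grain (1 row per session),
# mirroring what fact_voice_ai_sessions materializes in SQL.
sessions = turns.groupby("session_id").agg(
    total_turns=("is_error", "size"),
    error_turns=("is_error", "sum"),
    avg_asr_confidence=("asr_confidence", "mean"),
    completed=("completed", "max"),
).reset_index()

sessions["error_rate"] = sessions["error_turns"] / sessions["total_turns"]
print(sessions)
```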
`fact_voice_ai_sessions`
- Grain: 1 row per session
- Source: `voice_turns` only (applications excluded after validation)
- Metrics:
  - Error proportions
  - ASR & intent confidence
  - Turn counts
  - Final outcome
Why: Sessions are the natural decision unit for Voice AI effectiveness.
- Primary friction driver is ASR performance, not conversation length.
- Users abandon early, indicating initial recognition issues.
- Voice AI performs better for first-time digital users than non-voice channels.
- Rural users benefit disproportionately from Voice AI accessibility.
Tools Used:
- SQL → aggregations & KPIs
- Python → logistic regression, segmentation, statistical validation
An error is any turn marked as:
- `misunderstanding`
- `silence`
- repeated intent failures

Error rate = `error_turns / total_turns`
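A minimal sketch of that formula in Python (the event labels follow the definition above; detecting "repeated intent failures" would need extra session state not shown here):

```python
ERROR_EVENTS = {"misunderstanding", "silence"}


def session_error_rate(turn_events):
    """Compute error_turns / total_turns for one session's turn events.

    Sketch only: repeated-intent-failure detection is omitted.
    """
    if not turn_events:
        return 0.0
    error_turns = sum(1 for event in turn_events if event in ERROR_EVENTS)
    return error_turns / len(turn_events)


print(session_error_rate(["ok", "silence", "ok", "misunderstanding"]))  # 0.5
```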
- Compare pre/post model deployments
- Control for user mix & session length
- Track statistically significant deltas
- Segment by channel & user type
- Avoid raw averages
- Require confidence intervals
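One way to satisfy the "require confidence intervals" rule is a normal-approximation interval for the pre/post difference in error rates. This is a sketch under that assumption, not the project's actual test:

```python
from math import sqrt


def error_rate_delta_ci(err_pre, n_pre, err_post, n_post, z=1.96):
    """95% CI for the change in session error rate between deployments,
    using the normal approximation for a difference of proportions."""
    p_pre, p_post = err_pre / n_pre, err_post / n_post
    se = sqrt(p_pre * (1 - p_pre) / n_pre + p_post * (1 - p_post) / n_post)
    delta = p_post - p_pre
    return delta, (delta - z * se, delta + z * se)


# Hypothetical counts: 120/1000 error sessions before, 90/1000 after.
delta, (lo, hi) = error_rate_delta_ci(err_pre=120, n_pre=1000, err_post=90, n_post=1000)
# An interval that excludes 0 counts as a statistically significant delta.
```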
Modeling non-completion revealed:
- ASR confidence as the strongest abandonment predictor
- Errors occur early, not due to long sessions
➡️ Strategy shift: fix ASR first, not dialogue flow.
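A sketch of the kind of model behind that finding, using scikit-learn on synthetic data (the feature names and the data-generating coefficients are illustrative only):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(42)
n = 2000
asr_confidence = rng.uniform(0.2, 1.0, n)
turns = rng.integers(2, 15, n)

# Synthetic ground truth: abandonment driven mainly by low ASR confidence,
# with session length contributing almost nothing.
p_abandon = 1 / (1 + np.exp(-(3.0 - 5.0 * asr_confidence + 0.02 * turns)))
abandoned = rng.binomial(1, p_abandon)

# Model non-completion from session features, as in the friction notebook.
X = np.column_stack([asr_confidence, turns])
model = LogisticRegression().fit(X, abandoned)
print(dict(zip(["asr_confidence", "turns"], model.coef_[0])))
```

On data generated this way, the fitted coefficient on `asr_confidence` dominates the one on `turns`, which is the shape of evidence behind the "fix ASR first" conclusion.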
- Not null & uniqueness
- Relationship tests
- Accepted ranges (0–1)
- Business logic tests:
  - Error proportions ≤ 100%
  - Sessions must have turns
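The same guarantees can be spot-checked in Python during notebook work. A sketch mirroring the dbt tests (column names are assumed, not taken from the actual models):

```python
import pandas as pd


def validate_sessions(df: pd.DataFrame) -> None:
    """Mirror the dbt-layer quality tests on a session-grain DataFrame."""
    assert df["session_id"].notna().all(), "session_id must be not null"
    assert df["session_id"].is_unique, "one row per session (grain test)"
    assert df["asr_confidence"].between(0, 1).all(), "confidence in [0, 1]"
    assert (df["error_rate"] <= 1).all(), "error proportion <= 100%"
    assert (df["total_turns"] > 0).all(), "sessions must have turns"


validate_sessions(pd.DataFrame({
    "session_id":     [1, 2],
    "asr_confidence": [0.7, 0.9],
    "error_rate":     [0.33, 0.0],
    "total_turns":    [3, 2],
}))  # a clean frame passes silently
```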
- No raw audio
- User IDs anonymized
- Aggregated reporting only
Metabase only queries certified dbt models.
- Deterministic builds
- Persistent volumes
- Versioned transformations
- Automated data quality enforcement
- BI governance boundary
This project was designed as:
- A real analytics platform, not a one-off analysis
- A showcase of 6+ years of Analytics Engineering maturity
- A system that scales beyond the assignment
Key philosophy: If Metabase can see it, it has already been tested.
- CI pipeline for dbt tests
- Snapshotting for longitudinal analysis
- Feature store for ML reuse