📍 Richmond, Kentucky, USA | 🇺🇸 U.S. Permanent Resident — No Sponsorship Required
I design and ship production data science systems — ML pipelines, causal inference engines, AI platforms, and real-time analytics infrastructure — applied to healthcare and population health problems at scale.
My background combines hands-on engineering with deep quantitative methodology: I build the models and I understand the math behind them. My systems have run in production against live government health databases, served predictions to frontline health workers in real time, and informed decisions affecting millions of individuals.
Core areas:
- 🤖 AI / LLM Systems — RAG pipelines, multi-agent architectures, healthcare Q&A platforms
- 🧠 Machine Learning & MLOps — end-to-end pipelines, model calibration, SHAP explainability, drift monitoring, CI/CD
- 📊 Healthcare Analytics — risk stratification, population health modeling, clinical decision intelligence
- 🔬 Causal Inference & RWE — PSM, DiD, ITS, TMLE, SuperLearner — production-grade, not just academic
Philosophy: Models that don't deploy don't matter. Data science should produce systems, not papers.
| What I Built | Result |
|---|---|
| Immunization defaulter risk engine (Kenya MOH eCHIS · live production DB) | ROC-AUC 0.892 · 6,864 patients · 4,672 CHW areas · live app ↗ |
| ML predictive models for health outcomes | ~30% improvement in prediction accuracy |
| Automated data pipelines (ClickHouse + Python + dbt) | Reporting latency: 10–14 days → real-time |
| Causal inference & RWE studies | 25+ production studies informing program decisions |
| Medicare risk adjustment pipeline (U.S.) | Validated ATT of −$391/member, p<0.0001 |
| Healthcare analytics platforms | Scale: 8.5M+ individuals across multiple health systems |
| Peer-reviewed publications | 30+ articles incl. The Lancet Global Health |
| Project | Description | Stack |
|---|---|---|
| Immunization Defaulter Risk Engine | Production XGBoost pipeline predicting vaccine defaulter risk for 6,864 children across 4,672 CHW areas. Data drawn directly from Kenya Ministry of Health eCHIS. Per-patient SHAP explainability, isotonic calibration (ECE=0.023), PSI drift monitoring, RBAC Streamlit dashboard, FastAPI serving. | Python · XGBoost · SHAP · FastAPI · PostgreSQL · MLflow · Streamlit |
| Medicare Risk Adjustment Pipeline | Validated U.S. Medicare RAF pipeline — ATT −$391/member, p<0.0001 | Python · R · SQL · CMS HCC |
| Insurance Premium Prediction | End-to-end ML pipeline with CI/CD, MLflow tracking and SHAP explainability | Python · XGBoost · MLflow · SHAP |
| DHS RAG System | Semantic intelligence system for Demographic & Health Survey datasets | Python · RAG · Vector Search |
| Multimodal PDF RAG System | Document intelligence platform with OCR, table extraction and semantic search | Python · FastAPI · React |
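
The Immunization Defaulter Risk Engine row above cites two monitoring metrics: expected calibration error (ECE) and the population stability index (PSI). For readers who have not met them, here is a minimal sketch of how each is commonly computed. It is illustrative only and is not the production code. The function names, the 10-bin default, the `eps` floor, and the synthetic data are all assumptions made for this example.

```python
import numpy as np


def expected_calibration_error(y_true, y_prob, n_bins=10):
    """Weighted mean gap between predicted probability and observed rate per bin."""
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.asarray(y_prob, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # Equal-width bins on [0, 1]; interior edges map each score to bin 0..n_bins-1.
    bin_ids = np.digitize(y_prob, edges[1:-1])
    ece = 0.0
    for b in range(n_bins):
        mask = bin_ids == b
        if mask.any():
            gap = abs(y_true[mask].mean() - y_prob[mask].mean())
            ece += mask.mean() * gap  # weight by the share of samples in the bin
    return float(ece)


def population_stability_index(expected, actual, n_bins=10, eps=1e-6):
    """PSI between a reference sample and a newer sample of the same feature or score."""
    expected = np.asarray(expected, dtype=float)
    actual = np.asarray(actual, dtype=float)
    # Bin edges come from quantiles of the reference (training-time) sample.
    edges = np.quantile(expected, np.linspace(0.0, 1.0, n_bins + 1))
    inner = edges[1:-1]
    exp_frac = np.bincount(np.digitize(expected, inner), minlength=n_bins) / expected.size
    act_frac = np.bincount(np.digitize(actual, inner), minlength=n_bins) / actual.size
    # Floor the fractions so empty bins do not produce log(0).
    exp_frac = np.clip(exp_frac, eps, None)
    act_frac = np.clip(act_frac, eps, None)
    return float(np.sum((act_frac - exp_frac) * np.log(act_frac / exp_frac)))


# Synthetic example data (hypothetical, for illustration only).
rng = np.random.default_rng(0)
reference_scores = rng.normal(0.0, 1.0, 5000)
shifted_scores = rng.normal(1.0, 1.0, 5000)
psi_same = population_stability_index(reference_scores, reference_scores)
psi_shifted = population_stability_index(reference_scores, shifted_scores)
```

A lower ECE means predicted probabilities sit closer to observed event rates. A PSI near zero means the newer sample is distributed much like the reference sample, and larger values indicate drift.
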
| Project | Description | Stack |
|---|---|---|
| AI-Powered Research Assistant | Production RAG platform for scientific paper intelligence with modular LangGraph workflows | Python · LangGraph · LangChain · ChromaDB · FastAPI |
| Automated Research & Report Generation | Multi-agent AI system for research retrieval, synthesis and structured reporting | Python · LangGraph · FastAPI |
| MultiAgent Research Graph | AI knowledge graph generator from natural language queries | Python · LangGraph · LLMs |
| Healthcare Q&A RAG Platform | Enterprise healthcare knowledge retrieval with vector search and RBAC | Python · FastAPI · ChromaDB |
| Project | Description | Stack |
|---|---|---|
| Medical Diagnosis AI | ML prototype for clinical diagnostic support | Python · scikit-learn |
| KDHS Memory Bot | Multimodal RAG chatbot for large public health survey datasets | Python · OCR · Vector DB |
| Kenya Community Health AI | AI analytics platform integrating national digital health systems | Python · Multi-Agent AI |
- Languages: Python · R · SQL
- Machine Learning: scikit-learn · XGBoost · PyTorch · TensorFlow · MLflow · SHAP · Optuna · Survival models
- AI / LLM: LangChain · LangGraph · RAG · Vector Databases (ChromaDB, Pinecone) · Multi-Agent Systems · Prompt Engineering
- Data Infrastructure: AWS (Redshift · Glue · SageMaker · S3) · ClickHouse · PostgreSQL · dbt · FastAPI · Docker · Streamlit · Airflow
- Visualization & BI: Power BI · Tableau · Plotly · ggplot2
- Causal & Statistical Methods: PSM · Difference-in-Differences · Interrupted Time Series · TMLE · SuperLearner · Bayesian modeling · Mixed-effects models · Pharmacoepidemiology
- PhD, Epidemiology (Quantitative Methods, Causal Inference & Health Data Science): advanced training in study design, statistical theory, and evidence generation, applied directly to ML model validation, experiment design, and real-world evidence production.
- MSc, Health Systems Management
- BSc, Statistics
Certifications:
- Stanford University — Machine Learning in Medicine
- AWS Certified Data Science & Analytics
- Google Data Analytics Professional Certificate
- DataCamp Machine Learning Scientist Track
- Generative AI (multiple platforms)
A common assumption: PhD = academic researcher = not hands-on.
That's not my profile.
My PhD is in quantitative epidemiology — which means advanced statistics, causal modeling, experimental design, and evidence validation. These are the same foundations that make a data scientist rigorous: knowing why a model works, not just that it works.
In practice, I:
- Build and ship ML pipelines against live government health databases — not toy datasets
- Design causal inference studies that hold up to scrutiny
- Write production Python and SQL, not just R markdown
- Lead analytics engineering alongside research
The PhD makes the data science better. It doesn't replace it.
Open to hands-on and leadership roles across data science, healthcare analytics, and AI:
- Senior / Principal / Lead Data Scientist
- Healthcare Data Scientist
- Clinical Data Scientist
- Population Health Analyst / Analytics Lead
- Real-World Evidence Scientist / Analyst
- HEOR Data Scientist
- Decision Science / Advanced Analytics
- Director / Associate Director, Data Science or Epidemiology
Target sectors: Pharma · Biotech · CRO · Health tech · Payers & Insurers · Clinical AI · Population health
Data Science · Healthcare Analytics · AI Systems · Causal Inference · Real-World Evidence
📩 keyegon@gmail.com | 🔗 LinkedIn | 🌐 Portfolio

