A natural language interface for querying CDC chronic disease health data. Ask questions in plain English and get AI-powered answers backed by real data.
- Natural language to SQL — Type any health-related question and the app generates SQLite SQL automatically
- Plain English answers — Results are summarized into a clear 1-2 sentence response
- Auto charts — Bar charts are generated automatically when numeric data is returned
- Example questions sidebar — One-click preset questions to get started quickly
- Raw data & SQL viewer — Expandable sections to inspect the underlying data and query
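
The features above can be read as a small pipeline: build a prompt from the schema and the question, ask the model for SQL, run only read-only queries, and check whether the result has numeric columns worth charting. The following is a minimal sketch of that flow, not the code in `app.py`. The `indicators` table, its columns, the placeholder rows, and the helpers `build_prompt`, `run_select`, `numeric_columns`, and `fake_llm` are all hypothetical; `fake_llm` stands in for the hosted model call.

```python
import sqlite3


def build_prompt(question: str, schema: str) -> str:
    """Combine the table schema and the user's question into one LLM prompt."""
    return (
        "You translate questions into SQLite SQL.\n"
        f"Schema:\n{schema}\n"
        f"Question: {question}\n"
        "Return a single SELECT statement and nothing else."
    )


def run_select(conn, sql):
    """Execute model-generated SQL, refusing anything that is not one SELECT."""
    statement = sql.strip().rstrip(";")
    if not statement.lower().startswith("select") or ";" in statement:
        raise ValueError("only a single SELECT statement is allowed")
    cursor = conn.execute(statement)
    columns = [d[0] for d in cursor.description]
    return columns, cursor.fetchall()


def numeric_columns(columns, rows):
    """Names of columns whose values are all numeric (bar chart candidates)."""
    return [
        name
        for i, name in enumerate(columns)
        if rows and all(isinstance(r[i], (int, float)) for r in rows)
    ]


def fake_llm(prompt: str) -> str:
    # Hypothetical stand-in for the hosted model; returns a canned query.
    return (
        "SELECT state, value FROM indicators "
        "WHERE topic = 'Obesity' ORDER BY value DESC LIMIT 1;"
    )


# Toy in-memory table with made-up placeholder values (not CDC data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE indicators (state TEXT, topic TEXT, value REAL)")
conn.executemany(
    "INSERT INTO indicators VALUES (?, ?, ?)",
    [("AA", "Obesity", 10.0), ("BB", "Obesity", 20.0)],
)

schema = "indicators(state TEXT, topic TEXT, value REAL)"
sql = fake_llm(build_prompt("Which state has the highest obesity rate?", schema))
columns, rows = run_select(conn, sql)
chartable = numeric_columns(columns, rows)
```

Rejecting anything other than a single `SELECT` is one simple way to keep model-generated SQL from modifying the database; opening the SQLite file in read-only mode would be a stricter alternative.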
| Layer | Technology |
|---|---|
| Frontend | Streamlit |
| Database | SQLite (health.db) |
| LLM | Qwen/Qwen2.5-72B-Instruct via Hugging Face |
| Data | U.S. CDC Chronic Disease Indicators |
mediquery/
├── app.py # Main Streamlit app
├── setup_db.py # Script to load CSV into SQLite
├── health.db # SQLite database
├── U.S._Chronic_Disease_Indicators.csv # Source dataset
├── .env # Environment variables (not committed)
└── README.md
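
For readers curious what loading the CSV into SQLite involves, here is a minimal sketch. It is not the contents of `setup_db.py`: it uses only the standard library instead of pandas, and the function name `load_csv_to_sqlite` and the default table name `indicators` are assumptions made for illustration.

```python
import csv
import sqlite3


def load_csv_to_sqlite(csv_path, db_path, table="indicators"):
    """Create `table` from the CSV header and insert every row as TEXT.

    Returns the number of rows stored.
    """
    # utf-8-sig transparently drops a byte-order mark if the file has one.
    with open(csv_path, newline="", encoding="utf-8-sig") as f:
        reader = csv.reader(f)
        header = next(reader)
        # Quote column names so headers with spaces or symbols stay valid.
        cols = ", ".join('"{}" TEXT'.format(h.replace('"', '""')) for h in header)
        placeholders = ", ".join("?" for _ in header)
        conn = sqlite3.connect(db_path)
        try:
            with conn:  # commits on success, rolls back on error
                conn.execute(f'DROP TABLE IF EXISTS "{table}"')
                conn.execute(f'CREATE TABLE "{table}" ({cols})')
                conn.executemany(
                    f'INSERT INTO "{table}" VALUES ({placeholders})', reader
                )
            count = conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]
        finally:
            conn.close()
    return count
```

A call such as `load_csv_to_sqlite("U.S._Chronic_Disease_Indicators.csv", "health.db")` would rebuild the table from scratch. Because this sketch stores every column as `TEXT`, numeric comparisons in SQL would need a `CAST`; a pandas-based loader would typically infer numeric types instead.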

    cd mediquery
    pip install streamlit pandas requests python-dotenv transformers huggingface_hub torch

Create a .env file in the project root:
HF_TOKEN=hf_your_token_here
Get your token at: https://huggingface.co/settings/tokens (READ permission is sufficient)
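
As a rough illustration of how the token might be read at startup, the sketch below loads `.env` with python-dotenv and fails early with a clear message when `HF_TOKEN` is missing. The helper name `get_hf_token` is hypothetical, and the import fallback is only there so the snippet still runs where python-dotenv is not installed.

```python
import os

try:
    from dotenv import load_dotenv  # provided by the python-dotenv package
except ImportError:  # keep the sketch usable if the package is missing
    load_dotenv = None


def get_hf_token() -> str:
    """Return HF_TOKEN from the environment, reading .env first when possible."""
    if load_dotenv is not None:
        load_dotenv()  # by default, does not override variables already set
    token = os.getenv("HF_TOKEN", "").strip()
    if not token:
        raise RuntimeError("HF_TOKEN is not set; add it to .env or export it")
    return token
```

The returned value could then be passed as the `token` argument when constructing a `huggingface_hub.InferenceClient`, assuming the app talks to the model through that client.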

    python setup_db.py
    python -m streamlit run app.py

Open your browser at http://localhost:8501
- Which state has the highest obesity rate?
- What is the diabetes prevalence by state?
- Which state has the most cancer cases?
- Show alcohol-related indicators by state
- What are the top 5 states for asthma rates?
- Compare mental health indicators across states
U.S. Chronic Disease Indicators (CDI) — CDC Open Data