MoveSmart is a personalized decision-support system that recommends cities based on user-defined preferences such as affordability, weather, lifestyle, and urban characteristics.
Unlike static city-ranking websites, MoveSmart lets users actively shape city recommendations through interactive preferences and a transparent scoring system. It brings together a wide range of information about each city in one place, so users don’t have to dig through multiple sources to find what they need.
Built as a data science capstone project focused on recommendation systems, interpretability, and real-world evaluation challenges.
👉 Try the dashboard here:
Choosing a city to live in or relocate to is a high-impact decision involving multiple tradeoffs (cost, climate, lifestyle, infrastructure).
However, existing platforms:
- Provide static rankings with no personalization
- Lack transparency in how rankings are generated
- Do not adapt to individual user priorities
MoveSmart addresses this gap by building a user-driven and explainable city recommendation system.
- Build a personalized recommendation engine for cities
- Enable user-driven weighting of preferences
- Combine structured data + semantic (text-based) signals
- Provide an interactive dashboard for exploration and comparison
- Design an explainable scoring system suitable for real-world decision-making
- Personalized city recommendations using user sliders
- Multi-factor scoring (affordability, weather, urban lifestyle, etc.)
- Semantic preference matching using text embeddings
- Hybrid scoring system with adjustable feature weights
- Interactive Streamlit dashboard for exploration
- Python – Pandas, NumPy, Scikit-learn, GeoPandas, Boto3, Requests etc
- Streamlit – Interactive frontend dashboard
- Sentence Transformers – Semantic similarity modeling
- AWS Bedrock – LLM
| Entry point | Role |
|---|---|
app.py |
Streamlit UI (streamlit run app.py). Reads data/final/Final_Enriched_Dataset.csv. |
index.html |
Static about page for the project (e.g. deployed on Netlify); linked from the Streamlit UI. |
Other Python modules (src/recommender.py, src/visualizations.py, src/rag_explanation.py) are imported by the app. Clustering used in the final dataset lives in models/cluster_model.py.
movesmart/
├── app.py # Streamlit app (main UI)
├── index.html # Static about page for the project (separate from the Streamlit app)
├── assets/
│ └── flow_chart.png # Pipeline / architecture diagram (shown at top of this README)
├── data/
│ ├── raw/ # Primary source inputs (see Step 0); large files are often obtained locally and omitted from git
│ ├── processed/ # Per-source CBSA tables (loader outputs)
│ ├── evaluation/ # Stores evaluation results and analysis
│ ├── clustering_output/ # Clustering outputs and evaluation artifacts
│ └── final/ # Final_Base_Dataset.csv, Final_Enriched_Dataset.csv
├── exploratory_notebooks/
│ ├── 01_data_eda.ipynb # Exploratory Data Analysis notebook
│ ├── 02_clustering.ipynb # Exploratory Clustering notebook + recommender scratch work
│ ├── 05_sensitivityanalysis.ipynb # Sensitivity Analysis of Recommender Scoring Methods
│ └── 06_evaluation.ipynb # Evaluation notebook (semantic search, summaries, explanations)
├── models/
│ └── cluster_model.py # KMeans / PCA; used by final_dataset_loader
├── src/
│ ├── __init__.py
│ ├── census_data_loader.py
│ ├── crime_data_loader.py
│ ├── places_data_loader.py
│ ├── walkability_data_loader.py
│ ├── weather_data_loader.py # slow; normally skipped (use Weather_Data.csv)
│ ├── final_dataset_loader.py # merges processed → final + scores + clusters
│ ├── standardize_scores.py # score columns (imported by final_dataset_loader)
│ ├── recommender.py
│ ├── visualizations.py
│ ├── wiki_text_loader.py # Calls Wikipedia/Wikivoyage APIs and uses LLM to write CBSA metro/micro summaries to data/processed/
│ ├── semantic_search.py # Embeds CBSA summaries into ChromaDB and semantic-searches that index for user queries
│ └── rag_explanation.py # Bedrock Haiku: grounded prompt from user prefs, theme scores, and CBSA summary → “Why this city?” text
└── requirements.txt
- Python 3.11+: This project requires Python 3.11 to support modern type hinting and stable library dependencies.
python -m venv .venvWindows (PowerShell):
.\.venv\Scripts\Activate.ps1
python -m pip install -r requirements.txtmacOS / Linux / Git Bash:
source .venv/bin/activate
python -m pip install -r requirements.txt| Area | Packages |
|---|---|
| App | streamlit, plotly, pandas, numpy |
| Census / crime / walkability / weather HTTP | requests, urllib3 |
| PLACES spatial join | geopandas (+ GDAL stack via pip or conda) |
Clustering + scaling in models/cluster_model.py |
scikit-learn |
| Semantic search in recommender | chromadb, sentence-transformers |
| Bedrock-backed explanation generation | boto3 |
| Wiki/raw Excel ingestion | boto3, openpyxl, requests |
| Optional notebook/evaluation workflow | jupyter, ipykernel (included in requirements.txt) |
All options assume you’ve completed Setup (virtualenv + pip install -r requirements.txt) and you’re running commands from the repo root.
Command examples use PowerShell syntax where shown; the same python and streamlit commands work in bash or zsh on macOS and Linux.
-
Ensure
data/final/Final_Enriched_Dataset.csvexists and thechroma_db/directory is populated (run Step 3 in Option 3 if needed).- If you don’t have the final CSV yet, generate it via Option 2 (from
data/processed/) or Option 3 (full pipeline from raw data).
- If you don’t have the final CSV yet, generate it via Option 2 (from
-
app.pyrequires AWS credentials in.streamlit/secrets.toml(see AWS / Bedrock setup below). -
Start the UI:
streamlit run app.pyUse this if you already have the processed inputs under data/processed/ (for example: Census_Data.csv, Crime_Data.csv, Places_Data.csv, Walkability_Data.csv, plus the provided Weather_Data.csv and wiki summaries CSV).
- Generate the final dataset:
python -m src.final_dataset_loader-
If
chroma_db/is missing or empty, run Step 3 in Option 3 so semantic / keyword matching works in the app (needsdata/processed/cbsa_wiki_wikivoyage_summaries_df.csv). -
app.pyrequires AWS credentials in.streamlit/secrets.toml(see AWS / Bedrock setup below). -
Run the app:
streamlit run app.pyFollow Step 0 → Step 1 → Step 2 → Step 3 → Step 4 below. Use this when rebuilding processed tables and the final dataset from raw inputs (not only the prebuilt files under data/processed/).
All commands assume the repository root as the current working directory.
Download raw files from this Google Drive folder and place them under data/raw/:
| Loader | Required paths (defaults in code) |
|---|---|
| Census | data/raw/2023_Gaz_cbsa_national.txt (Census CBSA gazetteer). ACS tables are fetched from api.census.gov (optional CENSUS_API_KEY). |
| Crime | data/raw/FBI_Crime_Data_By_City_with_Counties.csv, data/raw/ZIP_CBSA_122023.csv |
| PLACES | data/raw/PLACES__Census_Tract_Data_(GIS_Friendly_Format),_2025_release_20260314.csv (or your tract file with the same column expectations), data/raw/shapefiles/tl_2023_us_cbsa.shp plus sidecars (.dbf, .shx, .prj, …). |
| Walkability | data/raw/EPA_SmartLocationDatabase_V3_Jan_2021_Final.csv |
| Weather | Skipped for normal reproduction — use committed data/processed/Weather_Data.csv. Full rebuild uses the gazetteer + thousands of NOAA downloads (many hours). |
| Wiki text | data/raw/list2_2023.xlsx (cities by CBSA/metro/micro). Fetches Wikipedia/Wikivoyage intro text and uses Bedrock to write per–metro/micro summaries under data/processed/ (slow; optional). |
Optional Census API key (better rate limits): set CENSUS_API_KEY in your shell (for example $env:CENSUS_API_KEY='…' in PowerShell) before running the census loader. The loader also runs without a key.
Run in this order (census first is conventional; crime/places/walkability only depend on raw files, not on each other):
# Windows PowerShell (from repo root)
python -m src.census_data_loader
python -m src.crime_data_loader
python -m src.places_data_loader
python -m src.walkability_data_loaderSkip weather and keep using the repo’s data/processed/Weather_Data.csv. Do not run weather_data_loader unless you intend to wait for a full NOAA pull.
If you must rebuild weather:
python -m src.weather_data_loaderThat writes data/processed/Weather_Data.csv (and uses data/raw/weather/noaa_monthly_normals/ as a cache).
Skip wiki text and keep using the repo’s data/processed/cbsa_wiki_wikivoyage_summaries_df.csv (or generate it once and reuse). Do not run wiki_text_loader unless you intend to wait for many Wikipedia/Wikivoyage API calls plus Bedrock summarization per CBSA.
AWS credentials (Step B in AWS setup) are required to rebuild wiki summaries
python src/wiki_text_loader.pypython -m src.final_dataset_loaderBuilds the persisted vector index under chroma_db/ from data/processed/cbsa_wiki_wikivoyage_summaries_df.csv (required for keyword / semantic matching in the app).
python src/semantic_search.pyNotes:
- Requires internet access the first time (downloads the
sentence-transformers/all-MiniLM-L6-v2model). - If your environment blocks TLS/certificate validation, fix local cert trust first or this step will fail.
- After a successful run, start or rerun the app as usual.
Outputs:
| File/ Folder | Description |
|---|---|
data/processed/Census_Data.csv |
Census loader |
data/processed/Crime_Data.csv |
Crime loader |
data/processed/Places_Data.csv |
PLACES loader |
data/processed/Walkability_Data.csv |
Walkability loader |
data/processed/Weather_Data.csv |
Weather loader (or committed copy) |
data/processed/cbsa_wiki_wikivoyage_summaries_df.csv |
Wiki text loader (Wikipedia/Wikivoyage + Bedrock summaries per CBSA) |
data/final/Final_Base_Dataset.csv |
Merged + imputed base |
data/final/Final_Enriched_Dataset.csv |
Base + feature/composite scores + cluster columns (app input) |
chroma_db/ |
ChromaDB store: vector embeddings of CBSA summary text |
Set AWS credentials in .streamlit/secrets.toml
streamlit run app.pyAmazon Bedrock is used in two places: the Streamlit “Why this city?” explanations, and wiki_text_loader.py, which calls an LLM to turn Wikipedia/Wikivoyage text into per-CBSA summaries.
- Bedrock Runtime permission:
bedrock:InvokeModel - Model access enabled in Bedrock console for:
anthropic.claude-3-haiku-20240307-v1:0 - Region:
us-east-1
Use the below steps to set the AWS secrets.
step A — Streamlit secrets (local machine only) to run the app
Create .streamlit/secrets.toml in the project root directory locally (never commit) and add secrets:
AWS_ACCESS_KEY_ID="..."
AWS_SECRET_ACCESS_KEY="..."
AWS_SESSION_TOKEN="..."step B — Environment variables (temporary credentials) to run .py files
AWS_ACCESS_KEY_ID=...
AWS_SECRET_ACCESS_KEY=...
AWS_SESSION_TOKEN=... step C — Environment variables (temporary credentials) to run .ipynb files
Create .env in the project root directory locally (never commit) and add secrets:
AWS_ACCESS_KEY_ID=...
AWS_SECRET_ACCESS_KEY=...
AWS_SESSION_TOKEN=... - Never hardcode
AWS_ACCESS_KEY_ID,AWS_SECRET_ACCESS_KEY, orAWS_SESSION_TOKENin source files. - If credentials were ever committed in code history, rotate/revoke them immediately.
Cursor was used sporadically throughout this project. Specifically it was used to help set up the framework of the data loader files, but many edits were made outside of that initial setup, so line-by-line attribution to Cursor is not possible.
This project utilized Gen AI (Claude) for UI and styling components to ensure a consistent user experience:
index.html: The structural framework and CSS were generated by Claude. All project-specific text and documentation content were manually authored and refined by the project team.app.py: Custom CSS styling was generated by Claude to maintain visual parity between the Streamlit application and the static About page.
Respect terms of use for Census API, CDC PLACES, FBI crime statistics, EPA Smart Location Database, and NOAA normals when redistributing derived files.
| Issue | Fix |
|---|---|
pip install errors on requirements.txt |
Some pins are not Windows-friendly (e.g. uvloop, torch/torchvision/numpy conflicts). Use requirements.windows.txt from the repo root instead, then continue with the same venv activation steps. |
import torch fails — WinError 126 / fbgemm.dll / missing libomp140.x86_64.dll |
Install the Microsoft Visual C++ Redistributable 2015–2022 (x64). Still fails? Also install the x86 redistributable. Still fails? Install Visual Studio Build Tools 2022 with Desktop development with C++ (MSVC + Windows SDK), restart, and retry. |
NumPy + PyTorch warnings / _ARRAY_API / "compiled using NumPy 1.x" |
With torch==2.4.0, pin NumPy 1.x: pip install "numpy<2" |
| Hugging Face download errors (xet / corrupt) | Delete %USERPROFILE%\.cache\huggingface\hub\models--sentence-transformers--all-MiniLM-L6-v2, then rerun with $env:HF_HUB_DISABLE_XET = "1" (PowerShell). |
CERTIFICATE_VERIFY_FAILED on huggingface.co |
Install certifi, then run: $env:SSL_CERT_FILE = python -c "import certifi; print(certifi.where())" and set $env:REQUESTS_CA_BUNDLE to the same path. Retry. |
