🏛️ Congressional Twitter Intelligence

A decade of congressional tweets analyzed to build a data-driven lobbying targeting system.

1,243,370 tweets · 548 members · 2008–2017

📊 Figures


Fig 1–2: Top 20 Bigrams by Party
Fig 3: Top 15 Bipartisan Bigrams
Fig 4: Vocabulary Divergence Over Time
Fig 5: Retweet Distribution, Senate vs House
Fig 6: Sentiment vs Log Retweet Count
Fig 7: OLS Regression Coefficients
Fig 8: Top 20 Members by LLS
Fig 9: Bipartisan Window Score (BWS) Heatmap by State × Month

🗂️ Project Overview

This capstone project analyzes a decade of congressional Twitter activity to build a data-driven lobbying targeting system. The tweet data carries no political metadata: party, chamber, and state are all missing. We recovered these fields through a three-step enrichment join against the @unitedstates legislators database, which matched 74.8% of members.
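The three join steps are not spelled out in this README. As a minimal sketch only, a cascading join can try progressively looser Twitter-handle normalizations and fill just the rows that earlier, stricter passes left unmatched. The column names (`screen_name` on the tweets; `twitter`, `party`, `type`, `state` on a flattened legislators table) and the three passes below are hypothetical, not the project's documented method.

```python
import pandas as pd

META_COLS = ["party", "type", "state"]  # hypothetical legislator columns


def _exact(s: pd.Series) -> pd.Series:
    # Pass 1 (hypothetical): handle exactly as stored.
    return s


def _casefold(s: pd.Series) -> pd.Series:
    # Pass 2 (hypothetical): trim whitespace, drop a leading '@', lowercase.
    return s.str.strip().str.lstrip("@").str.lower()


def _alnum(s: pd.Series) -> pd.Series:
    # Pass 3 (hypothetical): additionally drop underscores and other non-alphanumerics.
    return _casefold(s).str.replace(r"[^a-z0-9]", "", regex=True)


def enrich_tweets(tweets: pd.DataFrame, legislators: pd.DataFrame) -> pd.DataFrame:
    """Attach party/chamber/state to tweets via a cascade of handle matches.

    Assumed columns: tweets["screen_name"]; legislators["twitter"] plus META_COLS.
    """
    out = tweets.copy()
    for col in META_COLS:
        out[col] = pd.NA
    known = legislators.dropna(subset=["twitter"])
    for normalize in (_exact, _casefold, _alnum):
        # Build a key -> metadata lookup at this normalization level.
        lookup = (
            known.assign(key=normalize(known["twitter"]))
            .drop_duplicates("key")
            .set_index("key")[META_COLS]
        )
        # Only touch rows that no stricter pass has matched yet.
        unmatched = out["party"].isna()
        keys = normalize(out.loc[unmatched, "screen_name"])
        for col in META_COLS:
            out.loc[unmatched, col] = keys.map(lookup[col])
    return out
```

Filling only still-unmatched rows means a strict match is never overwritten by a looser one, and any row left empty after the last pass counts toward the unmatched share.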

🚀 Quickstart

Note: The raw dataset is not included because it exceeds GitHub's file size limit. See the Data section below.

git clone https://github.com/username/congressional-twitter-intelligence.git
cd congressional-twitter-intelligence
pip install -r requirements.txt
jupyter notebook notebooks/M3.ipynb

🛠️ Stack

Tool	Purpose
SQLite	Primary database and storage
Python / pandas	Data wrangling and analysis
scikit-learn	TF-IDF vectorization, OLS regression
TextBlob	Sentiment scoring
matplotlib	Data visualization
scipy	Pearson correlation tests

📁 Structure

repo/
├── figures/                        # All saved charts & plots (Fig 1–9 + supplementary)
├── data/
│   ├── export/                     # CSV exports for Tableau
│   └── US_PoliticalTweets.tar.gz   # Raw dataset (not tracked — see Data section)
├── notebooks/
│   ├── M3.ipynb                    # TF-IDF, regression, custom metrics
│   ├── US_Political_Tweet_War_M2.ipynb  # Descriptive statistics
│   ├── Project_Proposal.ipynb      # Project proposal
│   └── Project_Proposal.pdf        # Proposal PDF export
├── presentation/
│   └── Lobbyists4America.html      # Final HTML presentation
├── sql/
│   └── SQL-Query.ipynb             # SQL queries notebook
├── .gitignore
└── README.md

💾 Data

The raw dataset (US_PoliticalTweets.tar.gz, 229 MB) is not included in this repo because it exceeds GitHub's file size limit.

Download it from: [link to original source or Google Drive]

Once downloaded, place it in the data/ folder and run notebooks/M3.ipynb from the top.

⚠️ Caveats

The dataset covers 2008–2017 only, so later developments such as Twitter's 280-character limit (introduced in late 2017), subsequent follower growth, and shifts in political intensity are not reflected
24.4% of tweets could not be assigned a party (e.g. Independent members or gaps in the legislator data)
Sentiment was scored with TextBlob on a random sample of 20,000 tweets, not the full corpus
LLS and Bipartisan Window Score (BWS) values should be re-validated against updated data before operational use
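The sampled sentiment analysis behind Fig 6 can be sketched as follows: draw a random sample, score polarity with TextBlob's `sentiment.polarity`, and test the correlation with `scipy.stats.pearsonr`. The column names (`text`, `retweet_count`), the seed, and the `log1p` transform (chosen so zero-retweet tweets stay finite) are assumptions; the project's exact transform is not stated here. The scorer is injectable so the function runs without TextBlob installed.

```python
import numpy as np
import pandas as pd
from scipy.stats import pearsonr


def textblob_polarity(text: str) -> float:
    # Imported lazily so the rest of the sketch runs without TextBlob installed.
    from textblob import TextBlob
    return TextBlob(text).sentiment.polarity  # polarity in [-1.0, 1.0]


def sentiment_vs_retweets(df: pd.DataFrame, n: int = 20_000, seed: int = 0,
                          score_fn=textblob_polarity, text_col: str = "text",
                          rt_col: str = "retweet_count") -> tuple:
    """Score a random sample and correlate polarity with log retweet count.

    Column names, the seed, and the log1p transform are assumptions.
    """
    sample = df.sample(n=min(n, len(df)), random_state=seed)
    polarity = sample[text_col].map(score_fn).to_numpy(dtype=float)
    # log(1 + x) keeps tweets with zero retweets finite.
    log_rt = np.log1p(sample[rt_col].to_numpy(dtype=float))
    r, p = pearsonr(polarity, log_rt)
    return float(r), float(p)
```

Because the sample is random, fixing `seed` makes the reported correlation reproducible; re-running with a different seed is a cheap check that the result is not an artifact of one draw.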
