
🏛️ Congressional Twitter Intelligence

A decade of congressional tweets analyzed to build a data-driven lobbying targeting system.

Python · SQLite · Jupyter · scikit-learn · License: MIT

1,243,370 tweets · 548 members · 2008–2017


📊 Figures

Fig 1–2: Top 20 Bigrams by Party
Fig 3: Top 15 Bipartisan Bigrams
Fig 4: Vocabulary Divergence Over Time
Fig 5: Retweet Distribution: Senate vs House
Fig 6: Sentiment vs Log Retweet Count
Fig 7: OLS Regression Coefficients
Fig 8: Top 20 Members by LLS
Fig 9: Bipartisan Window Score Heatmap by State × Month

🗂️ Project Overview

This capstone project analyzes a decade of congressional Twitter activity to build a data-driven lobbying targeting system. The tweet data carries no political metadata: party, chamber, and state are all missing. We recovered these fields through a three-step enrichment join against the @unitedstates legislators database, which matched 74.8% of members.
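The enrichment join can be pictured as a chain of fallback matches. The sketch below is illustrative only: the README does not say which keys the three steps use, so the table names (`members`, `legislators`), the column names, and the choice of handle, then account ID, then name as match keys are all assumptions, and the rows are made up.

```python
import sqlite3

def enrich_members(conn):
    """Attach party, chamber, and state to each account via three fallback joins.

    Hypothetical steps: (1) Twitter handle, (2) numeric account ID, (3) full name.
    Assumes each legislators row is complete and matches at most one account per key.
    """
    query = """
        SELECT
            m.screen_name,
            COALESCE(h.party,   i.party,   n.party)   AS party,
            COALESCE(h.chamber, i.chamber, n.chamber) AS chamber,
            COALESCE(h.state,   i.state,   n.state)   AS state
        FROM members AS m
        LEFT JOIN legislators AS h ON LOWER(h.twitter_handle) = LOWER(m.screen_name)
        LEFT JOIN legislators AS i ON i.twitter_id = m.user_id
        LEFT JOIN legislators AS n ON LOWER(n.full_name) = LOWER(m.display_name)
        ORDER BY m.screen_name
    """
    return conn.execute(query).fetchall()

# Made-up rows: one account per matching step, plus one that never matches.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE members (user_id INTEGER, screen_name TEXT, display_name TEXT);
    CREATE TABLE legislators (
        twitter_handle TEXT, twitter_id INTEGER, full_name TEXT,
        party TEXT, chamber TEXT, state TEXT
    );
    INSERT INTO members VALUES
        (1, 'RepAlpha', 'Alex Alpha'),
        (2, 'sen_beta_old', 'Blake Beta'),
        (3, 'GammaForCongress', 'Casey Gamma'),
        (4, 'unknown_acct', 'No Match');
    INSERT INTO legislators VALUES
        ('repalpha', NULL, 'Alex Alpha', 'D', 'House', 'CA'),
        ('SenBeta', 2, 'Blake Beta', 'R', 'Senate', 'TX'),
        (NULL, NULL, 'casey gamma', 'D', 'House', 'NY');
""")

rows = enrich_members(conn)
matched = sum(1 for row in rows if row[1] is not None)
coverage = matched / len(rows)
print(f"Recovered metadata for {coverage:.1%} of accounts")
```

Accounts that fail all three steps keep NULL metadata, which is how a coverage figure below 100% arises.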


🚀 Quickstart

Note: The raw dataset is not included due to GitHub's file size limit. See Data below.

git clone https://github.com/username/congressional-twitter-intelligence.git
cd congressional-twitter-intelligence
pip install -r requirements.txt
jupyter notebook notebooks/M3.ipynb

🛠️ Stack

| Tool | Purpose |
| --- | --- |
| SQLite | Primary database and storage |
| Python / pandas | Data wrangling and analysis |
| scikit-learn | TF-IDF vectorization, OLS regression |
| TextBlob | Sentiment scoring |
| matplotlib | Data visualization |
| scipy | Pearson correlation tests |
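As a rough illustration of the Pearson correlation step (sentiment against log retweet count, as in Fig 6), here is a dependency-free sketch. The project lists scipy for this test; the hand-written `pearson_r` below is a stand-in so the example runs with the standard library only. The use of `log1p` to handle zero-retweet tweets is an assumption, and the numbers are made up.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)

# Made-up example values, not taken from the dataset.
sentiment = [-0.5, 0.0, 0.2, 0.6, 0.9]   # polarity scores in [-1, 1]
retweets = [0, 3, 10, 40, 120]           # raw retweet counts

# log1p(c) = log(1 + c), so tweets with zero retweets stay defined (assumption).
log_retweets = [math.log1p(count) for count in retweets]

r = pearson_r(sentiment, log_retweets)
print(f"Pearson r = {r:.3f}")
```

With scipy installed, `scipy.stats.pearsonr(sentiment, log_retweets)` returns the same coefficient together with a p-value.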

📁 Structure

repo/
├── figures/                        # All saved charts & plots (Fig 1–9 + supplementary)
├── data/
│   ├── export/                     # CSV exports for Tableau
│   └── US_PoliticalTweets.tar.gz   # Raw dataset (not tracked — see Data section)
├── notebooks/
│   ├── M3.ipynb                    # TF-IDF, regression, custom metrics
│   ├── US_Political_Tweet_War_M2.ipynb  # Descriptive statistics
│   ├── Project_Proposal.ipynb      # Project proposal
│   └── Project_Proposal.pdf        # Proposal PDF export
├── presentation/
│   └── Lobbyists4America.html      # Final HTML presentation
├── sql/
│   └── SQL-Query.ipynb             # SQL queries notebook
├── .gitignore
└── README.md

💾 Data

The raw dataset (US_PoliticalTweets.tar.gz, 229 MB) is not included in this repo because it exceeds GitHub's 100 MB per-file limit.

Download it from: [link to original source or Google Drive]

Once downloaded, place it in the data/ folder and run notebooks/M3.ipynb from the top.
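Before launching the notebook, it may help to confirm that the archive is in place and readable. The helper below is a hypothetical pre-flight check, not part of the repo. It only lists the archive's entries and makes no assumption about the files inside or about how the notebook loads them.

```python
import tarfile
from pathlib import Path

def list_archive_members(archive_path):
    """Return the entry names in a gzip-compressed tar archive.

    Raises FileNotFoundError if the archive has not been downloaded yet.
    """
    archive_path = Path(archive_path)
    if not archive_path.is_file():
        raise FileNotFoundError(f"{archive_path} not found; download it first")
    with tarfile.open(archive_path, "r:gz") as tar:
        return tar.getnames()

# Path taken from the repo layout above.
ARCHIVE = Path("data") / "US_PoliticalTweets.tar.gz"

if ARCHIVE.is_file():
    members = list_archive_members(ARCHIVE)
    print(f"{len(members)} entries, first few: {members[:5]}")
else:
    print(f"{ARCHIVE} not found; download it and place it in data/ first")
```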


⚠️ Caveats

  • Dataset covers 2008–2017 only, so it largely predates Twitter's 280-character limit (rolled out in November 2017) and misses later follower growth and shifts in political intensity
  • 24.4% of tweets could not be matched to a party (Independent members, data gaps)
  • Sentiment was scored with TextBlob on a 20K random sample, not the full corpus
  • LLS and BWS (Bipartisan Window Score) values should be re-validated with updated data before operational use
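Because sentiment is scored on a random sample rather than the full corpus, results are only repeatable if the sample is. The README does not say how the 20K sample was drawn, so the sketch below is one possible approach with a fixed seed. The function name, the seed value, and the use of integer tweet IDs are all hypothetical.

```python
import random

def sample_tweet_ids(tweet_ids, k=20_000, seed=42):
    """Draw a reproducible random sample of tweet IDs without replacement.

    The sample size is capped at the population size so small inputs do not fail.
    """
    ids = list(tweet_ids)
    rng = random.Random(seed)  # private generator, so global random state is untouched
    return rng.sample(ids, min(k, len(ids)))

# Stand-in IDs; in practice these would come from the tweets table.
all_ids = range(1, 100_001)
sample = sample_tweet_ids(all_ids)
print(f"Sampled {len(sample)} of {len(all_ids)} tweets")
```

Re-running with the same seed and the same input order returns the same IDs, so sentiment scores can be regenerated and compared across runs.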