anth-chan/mongodb-tweet-analytics-engine

Assignment 5 - MongoDB with Node.js

This project uses Node.js to query a MongoDB database loaded with tweets from the 2020 IEEE VIS Conference.

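Prerequisites

  • Node.js with npm
  • A MongoDB server reachable at localhost:27017
  • The mongoimport tool (part of the MongoDB Database Tools)
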
Setup

1. Download the Dataset

Download the tweets dump file: https://johnguerra.co/viz/influentials/ieeevis/ieeevis2020/ieeevis2020Tweets.dump.bz2

2. Unzip the File

Extract ieeevis2020Tweets.dump from the downloaded .bz2 archive, for example with Keka (macOS) or 7-Zip (Windows).

3. Import into MongoDB

mongoimport -h localhost:27017 -d ieeevisTweets -c tweet --file ieeevis2020Tweets.dump
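After the import, a quick count confirms that the collection was populated. The following is a minimal sketch, not part of the repository: the function name countImportedTweets is hypothetical, and the URI, database, and collection names are taken from the mongoimport command above.

```javascript
// Values copied from the mongoimport command in this README.
const MONGO_URI = "mongodb://localhost:27017";
const DB_NAME = "ieeevisTweets";
const COLLECTION_NAME = "tweet";

// Hypothetical helper: returns the approximate number of imported documents.
async function countImportedTweets(uri = MONGO_URI) {
  // Loaded inside the function so the constants above can be reused
  // even before `npm install` has been run.
  const { MongoClient } = require("mongodb");
  const client = new MongoClient(uri);
  try {
    await client.connect();
    return await client
      .db(DB_NAME)
      .collection(COLLECTION_NAME)
      .estimatedDocumentCount();
  } finally {
    // Always release the connection, even if the count fails.
    await client.close();
  }
}
```

Calling countImportedTweets().then(console.log) from a script prints the count. A result of 0 usually means the import targeted a different database or collection name.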

4. Install Dependencies

npm install

Running the Queries

node Query1.js
node Query2.js
node Query3.js
node Query4.js
node Query5.js

Queries

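Query 1 - Non-Retweets/Replies: Returns the count of tweets that are not retweets or replies. The query checks that retweeted_status does not exist, which excludes retweets; in Twitter's tweet format, replies are marked separately by a non-null in_reply_to_status_id.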
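One way such a count could be written is sketched below. It is not necessarily what Query1.js does. The names originalTweetFilter and countOriginalTweets are hypothetical, and the in_reply_to_status_id condition is an assumption based on Twitter's v1.1 tweet format rather than something this README states.

```javascript
// Filter for tweets that are neither retweets nor replies.
const originalTweetFilter = {
  // Retweets embed the original tweet under retweeted_status.
  retweeted_status: { $exists: false },
  // Assumption: replies carry a non-null in_reply_to_status_id.
  // Matching on null selects documents where the field is null or missing.
  in_reply_to_status_id: null,
};

// `collection` is expected to be a Collection from the mongodb driver.
async function countOriginalTweets(collection) {
  return collection.countDocuments(originalTweetFilter);
}
```

With a connected client, this could be called as countOriginalTweets(client.db("ieeevisTweets").collection("tweet")). Dropping the in_reply_to_status_id line gives the retweet-only check described above.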

Query 2 - Top 10 by Followers: Returns the top 10 screen names ranked by their number of followers.
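A ranking like this can be expressed as an aggregation pipeline. The sketch below is illustrative and may differ from Query2.js. It assumes each tweet embeds its author under user with screen_name and followers_count fields, as in Twitter's v1.1 format; the names topFollowersPipeline and topByFollowers are hypothetical.

```javascript
const topFollowersPipeline = [
  // A user appears once per tweet, so collapse to one row per screen name.
  // $max keeps the largest follower count seen for that user.
  {
    $group: {
      _id: "$user.screen_name",
      followers: { $max: "$user.followers_count" },
    },
  },
  // Highest follower count first.
  { $sort: { followers: -1 } },
  { $limit: 10 },
];

// `collection` is expected to be a Collection from the mongodb driver.
async function topByFollowers(collection) {
  return collection.aggregate(topFollowersPipeline).toArray();
}
```

Grouping before sorting matters here: sorting raw tweets by follower count would return the same prolific user several times instead of ten distinct screen names.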

Query 3 - Most Tweeted At: Finds the person who was mentioned the most across all tweets.
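Counting mentions typically means flattening each tweet's mention list and grouping by the mentioned account. The sketch below assumes mentions are stored under entities.user_mentions, as in Twitter's v1.1 format; the actual Query3.js may use a different field. The names mostMentionedPipeline and mostMentioned are hypothetical.

```javascript
const mostMentionedPipeline = [
  // Emit one document per mention; tweets without mentions are dropped.
  { $unwind: "$entities.user_mentions" },
  // Count how often each account is mentioned.
  {
    $group: {
      _id: "$entities.user_mentions.screen_name",
      mentions: { $sum: 1 },
    },
  },
  // Most-mentioned account first, keep only the top one.
  { $sort: { mentions: -1 } },
  { $limit: 1 },
];

// `collection` is expected to be a Collection from the mongodb driver.
// Resolves to the top document, or undefined if no tweet has mentions.
async function mostMentioned(collection) {
  const [top] = await collection.aggregate(mostMentionedPipeline).toArray();
  return top;
}
```

Because retweets repeat the mentions of the original tweet, this count includes them; adding a $match stage on retweeted_status at the front would exclude retweets if that is preferred.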

Query 4 - Top Retweeted Users: Returns the top 10 users with the highest average retweet count, filtered to only users who tweeted more than 3 times.
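The average and the tweet-count filter can be computed in a single pipeline. This is a sketch under the assumption that each tweet has a numeric retweet_count and an embedded user.screen_name; it is not necessarily the pipeline in Query4.js. The names topRetweetedPipeline and topRetweetedUsers are hypothetical.

```javascript
const topRetweetedPipeline = [
  // Per user: number of tweets and mean retweet count.
  {
    $group: {
      _id: "$user.screen_name",
      tweetCount: { $sum: 1 },
      avgRetweets: { $avg: "$retweet_count" },
    },
  },
  // Keep only users who tweeted more than 3 times.
  { $match: { tweetCount: { $gt: 3 } } },
  // Highest average first.
  { $sort: { avgRetweets: -1 } },
  { $limit: 10 },
];

// `collection` is expected to be a Collection from the mongodb driver.
async function topRetweetedUsers(collection) {
  return collection.aggregate(topRetweetedPipeline).toArray();
}
```

The $match stage has to come after $group, since tweetCount only exists once the tweets have been aggregated per user.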

Query 5 - Separate User Collection: Migrates the embedded user data into its own users collection, and creates a new tweets_only collection that references users by ID instead of embedding the full user object.
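A migration of this kind can be done server-side with two aggregation pipelines that end in $out. The sketch below is one possible approach and may differ from Query5.js. The collection names users and tweets_only come from the description above; the reference field name user_id, the use of user.id as the key, and the names usersPipeline, tweetsOnlyPipeline, and splitUsers are assumptions.

```javascript
// Build the users collection: one document per distinct user.id.
const usersPipeline = [
  // Keep the first embedded user object seen for each user id.
  { $group: { _id: "$user.id", user: { $first: "$user" } } },
  // Promote the user object to the top level, keyed by the user id.
  { $replaceRoot: { newRoot: { $mergeObjects: ["$user", { _id: "$_id" }] } } },
  // $out replaces the target collection if it already exists.
  { $out: "users" },
];

// Build tweets_only: same tweets, with the user replaced by a reference.
const tweetsOnlyPipeline = [
  { $addFields: { user_id: "$user.id" } }, // hypothetical field name
  { $project: { user: 0 } },               // drop the embedded user object
  { $out: "tweets_only" },
];

// `collection` is expected to be the tweet Collection from the mongodb driver.
async function splitUsers(collection) {
  // toArray() drives the cursor so the $out stage actually executes.
  await collection.aggregate(usersPipeline).toArray();
  await collection.aggregate(tweetsOnlyPipeline).toArray();
}
```

Since a user's profile can change between tweets, $first keeps whichever snapshot is encountered first; sorting by created_at before grouping would make that choice deterministic.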

Dependencies

  • mongodb — MongoDB Node.js driver

About

MongoDB-based analytics engine for processing and querying large-scale tweet data, enabling efficient data exploration and insight generation.
