This project uses Node.js to query a MongoDB database loaded with tweets from the 2020 IEEE VIS Conference.
Download the tweets dump file: https://johnguerra.co/viz/influentials/ieeevis/ieeevis2020/ieeevis2020Tweets.dump.bz2
Use Keka (Mac) or 7-Zip (Windows) to extract the .dump file from the downloaded .bz2 archive.
Import the tweets into MongoDB:
mongoimport -h localhost:27017 -d ieeevisTweets -c tweet --file ieeevis2020Tweets.dump
Install the dependencies:
npm install
Run the queries:
node Query1.js
node Query2.js
node Query3.js
node Query4.js
node Query5.js
Query 1 - Non-Retweets/Replies: Returns the count of tweets that are not retweets or replies (i.e. where retweeted_status does not exist).
Query 2 - Top 10 by Followers: Returns the top 10 screen names ranked by their number of followers.
Query 3 - Most Tweeted At: Finds the person who was mentioned the most across all tweets.
Query 4 - Top Retweeted Users: Returns the top 10 users with the highest average retweet count, filtered to only users who tweeted more than 3 times.
Query 5 - Separate User Collection: Migrates the embedded user data into its own users collection, and creates a new tweets_only collection that references users by ID instead of embedding the full user object.
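As a rough sketch, the aggregations behind Queries 3, 4, and 5 could be written as pipeline builders like the ones below. The function names, the `user_id` reference field, and the tweet field names (`entities.user_mentions`, `user.screen_name`, `retweet_count`, `user.id`) are assumptions based on the standard Twitter API v1.1 tweet layout; they are not taken from the actual Query files.

```javascript
// Hypothetical helpers: names and field paths are assumptions, not the project's code.

// Query 3: unwind the mentions array so each mention becomes one row,
// count rows per mentioned screen name, and keep the largest count.
function mostMentionedPipeline() {
  return [
    { $unwind: "$entities.user_mentions" },
    { $group: { _id: "$entities.user_mentions.screen_name", mentions: { $sum: 1 } } },
    { $sort: { mentions: -1 } },
    { $limit: 1 },
  ];
}

// Query 4: average retweets per user, keeping only users with more
// than `minTweets` tweets, then the top `limit` by that average.
function topRetweetedPipeline(minTweets = 3, limit = 10) {
  return [
    {
      $group: {
        _id: "$user.screen_name",
        tweetCount: { $sum: 1 },
        avgRetweets: { $avg: "$retweet_count" },
      },
    },
    { $match: { tweetCount: { $gt: minTweets } } },
    { $sort: { avgRetweets: -1 } },
    { $limit: limit },
  ];
}

// Query 5, part 1: one document per distinct user, written to `users`.
function usersPipeline() {
  return [
    { $group: { _id: "$user.id", doc: { $first: "$user" } } },
    { $replaceRoot: { newRoot: "$doc" } },
    { $out: "users" },
  ];
}

// Query 5, part 2: copy each tweet with a `user_id` reference in place of
// the embedded user object, written to `tweets_only`.
function tweetsOnlyPipeline() {
  return [
    { $addFields: { user_id: "$user.id" } },
    { $project: { user: 0 } },
    { $out: "tweets_only" },
  ];
}
```

Each pipeline would be passed to `aggregate()` on the `tweet` collection. Pipelines ending in `$out` only write their target collection once the cursor is consumed, for example with `.toArray()`.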
Dependencies: mongodb (MongoDB Node.js driver)
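For orientation, here is a minimal sketch of how a query script might use the driver to run Query 1 and Query 2 against the imported data. The database and collection names come from the `mongoimport` command above; the function names and the `user.screen_name` / `user.followers_count` field paths are assumptions based on the usual Twitter API v1.1 tweet layout, and the real Query files may be organized differently.

```javascript
// Hypothetical sketch: helper names and field paths are assumptions.

// Query 1 filter, following the description: tweets without `retweeted_status`.
const nonRetweetFilter = { retweeted_status: { $exists: false } };

// Query 2: a user appears once per tweet, so group by screen name first,
// keep the highest follower count seen, then take the top `limit`.
function topFollowersPipeline(limit = 10) {
  return [
    { $group: { _id: "$user.screen_name", followers: { $max: "$user.followers_count" } } },
    { $sort: { followers: -1 } },
    { $limit: limit },
  ];
}

// Runs both queries with an already-constructed MongoClient.
async function runQueries(client) {
  const tweets = client.db("ieeevisTweets").collection("tweet");
  const originalCount = await tweets.countDocuments(nonRetweetFilter);
  const topUsers = await tweets.aggregate(topFollowersPipeline()).toArray();
  return { originalCount, topUsers };
}

// Possible usage (needs `npm install` and a MongoDB server on localhost:27017):
//   const { MongoClient } = require("mongodb");
//   const client = new MongoClient("mongodb://localhost:27017");
//   client.connect()
//     .then(() => runQueries(client))
//     .then(console.log)
//     .finally(() => client.close());
```

Passing the client in as a parameter keeps the query logic separate from connection handling, so the same function can be exercised with a stub in place of a live server.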