
Twitter Sentiment Analysis

App Screenshot

Objective

The objective of this project is to determine which sentiment analysis method gives the best results on X (formerly Twitter) data. We compare the performance of traditional machine learning models, deep neural networks, and Transformers, leveraging Natural Language Processing (NLP) techniques. We evaluate these models using accuracy, confusion matrices, precision, and loss.

Dataset

The dataset used is Kaggle's Sentiment140. It consists of 1.6 million tweets labelled as positive or negative. You can access the dataset on Kaggle at the following link: https://www.kaggle.com/datasets/kazanova/sentiment140

Exploratory Data Analysis

  • Distribution of Positive and Negative Tweets


  • Distribution of @UserMentions, Links & #Hashtags


Word Clouds

  • Frequency of Positive words


  • Frequency of Negative words


Natural Language Processing (NLP)

NLP plays a pivotal role in extracting, processing, and understanding textual data to determine the sentiment expressed in it. NLP techniques are used to clean and prepare the text for analysis: removing noise (e.g., stopwords, punctuation), normalizing text (e.g., lowercasing, stemming, and lemmatization), and handling slang and abbreviations.
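As a concrete illustration, a minimal tweet-cleaning function might look like the sketch below. The stopword set here is a tiny illustrative subset (a real pipeline would typically use a full list such as NLTK's), and the regexes cover only the cleanup steps named above.

```python
import re

# Tiny illustrative stopword subset; the project likely uses a fuller list.
STOPWORDS = {"a", "an", "the", "is", "are", "to", "and", "of", "in", "it"}

def clean_tweet(text: str) -> str:
    """Normalize a raw tweet for sentiment analysis."""
    text = text.lower()                       # normalize case
    text = re.sub(r"https?://\S+", "", text)  # strip links
    text = re.sub(r"[@#]\w+", "", text)       # strip @mentions and #hashtags
    text = re.sub(r"[^a-z\s]", "", text)      # strip punctuation and digits
    tokens = [t for t in text.split() if t not in STOPWORDS]
    return " ".join(tokens)

print(clean_tweet("Loving the new update!! http://t.co/xyz @dev #happy"))
# → loving new update
```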

Sentiment Analysis models

Machine Learning Models

  • Logistic Regression

A widely used statistical model for binary classification. It predicts the probability that a given input belongs to a particular class by applying the logistic (sigmoid) function to a linear combination of the input features.

Tokenizer: TfidfVectorizer

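A minimal sketch of the TF-IDF + Logistic Regression setup in scikit-learn is shown below. The toy tweets and labels are illustrative only, not the actual Sentiment140 training code.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Illustrative toy data: 1 = positive, 0 = negative.
tweets = ["i love this phone", "great day today",
          "worst service ever", "i hate waiting"]
labels = [1, 1, 0, 0]

# TfidfVectorizer turns text into weighted term features;
# LogisticRegression maps them to a class probability via the sigmoid.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(tweets, labels)

print(model.predict(["i love this day"]))
```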

  • Naive Bayes

A probabilistic classifier based on Bayes' theorem, which assumes the features are conditionally independent given the class.

Tokenizer: CountVectorizer

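The bag-of-words Naive Bayes variant can be sketched the same way, swapping in CountVectorizer (raw term counts) and MultinomialNB. Again, the data below is a toy illustration, not the project's training script.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Illustrative toy data: 1 = positive, 0 = negative.
tweets = ["what a great movie", "feeling awesome today",
          "this is awful", "so sad and awful"]
labels = [1, 1, 0, 0]

# CountVectorizer produces raw word counts, which MultinomialNB models
# with per-class word likelihoods (Laplace-smoothed by default).
nb = make_pipeline(CountVectorizer(), MultinomialNB())
nb.fit(tweets, labels)

probs = nb.predict_proba(["awful movie"])[0]  # [P(negative), P(positive)]
print(nb.predict(["awful movie"]))
```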

  • Evaluation Table

    Model                 Accuracy (%)   Dataset size
    Logistic Regression   78.1           1,600,000
    Naive Bayes           76.7           1,600,000

Deep Neural Network

  • LSTM (Long Short-Term Memory)

A type of Recurrent Neural Network. An LSTM recurrent unit tries to "remember" all the relevant knowledge the network has seen so far and to "forget" irrelevant data.

Embeddings: Word2Vec
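The "remember/forget" behavior comes from the LSTM cell's gates. The project presumably uses a deep learning framework such as Keras, but the single-step mechanics can be sketched in plain NumPy (the shapes and random inputs below are illustrative, standing in for Word2Vec embeddings):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM step. W, U, b stack the parameters of the four gates
    (forget, input, output, candidate)."""
    z = W @ x + U @ h_prev + b                   # shape (4 * hidden,)
    f, i, o, g = np.split(z, 4)
    f, i, o = sigmoid(f), sigmoid(i), sigmoid(o)  # gates squashed to (0, 1)
    g = np.tanh(g)                                # candidate cell update
    c = f * c_prev + i * g    # forget part of the old memory, add new info
    h = o * np.tanh(c)        # output gate decides what the cell exposes
    return h, c

hidden, embed = 4, 3
rng = np.random.default_rng(0)
W = rng.normal(size=(4 * hidden, embed))
U = rng.normal(size=(4 * hidden, hidden))
b = np.zeros(4 * hidden)

h, c = np.zeros(hidden), np.zeros(hidden)
for x in rng.normal(size=(5, embed)):  # 5 "word embeddings" in sequence
    h, c = lstm_step(x, h, c, W, U, b)
print(h.shape)  # (4,)
```

The forget gate `f` scales down the previous cell state `c_prev`, which is exactly the "forget irrelevant data" behavior described above.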

  • Evaluation Table
Model   Accuracy (%)   Dataset size   Number of Epochs
LSTM    79             300,000        8

Transformer

  • BERT(Bidirectional Encoder Representations from Transformer)

BERT is a deep bidirectional, unsupervised language representation, pre-trained on a plain-text corpus. BERT converts words into numbers, which matters because machine learning models take numbers, not words, as inputs; this allows you to train machine learning models on your textual data.

BERT model used: bert-base-multilingual-uncased-sentiment

Only a limited number of epochs could be used due to BERT's long training time, and the dataset size was greatly reduced for the same reason.
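For inference, this kind of model is commonly loaded through the Hugging Face `transformers` library. The hub id `nlptown/bert-base-multilingual-uncased-sentiment` and its 1-5 star label format are assumptions based on the model name above, as is the star-to-binary mapping (3+ stars treated as positive):

```python
def stars_to_sentiment(label: str) -> str:
    """Map a '1 star'..'5 stars' label to a binary sentiment.
    Treating 3 stars as positive is an assumption for the binary task."""
    stars = int(label.split()[0])
    return "negative" if stars <= 2 else "positive"

# With the `transformers` package installed, inference would look like:
#   from transformers import pipeline
#   clf = pipeline("sentiment-analysis",
#                  model="nlptown/bert-base-multilingual-uncased-sentiment")
#   stars_to_sentiment(clf("I really enjoyed this!")[0]["label"])

print(stars_to_sentiment("4 stars"))  # → positive
```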

  • Evaluation Table
Model   No of Epochs   Train Loss   Precision (%)   Recall (%)   F1 score (%)   Dataset size
BERT    3              38.2         80              77.6         75.4           20,000

Sentiment Analyzer

A sentiment analyzer is built with a Streamlit interface. The user uploads the file whose sentiment is to be determined; after analysis, the app shows the predicted sentiment next to each text and, as a visual aid, provides a bar chart and a pie chart of the overall results. If needed, the user can download a CSV file containing all the sentiment results directly from the app.
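The aggregation behind the charts and the CSV download can be sketched with pandas. The column names and toy rows below are hypothetical, standing in for the frame the app would build after scoring an uploaded file:

```python
import pandas as pd

# Hypothetical per-row results after scoring an uploaded file.
results = pd.DataFrame({
    "text": ["love it", "meh", "terrible", "so good"],
    "sentiment": ["positive", "negative", "negative", "positive"],
})

# Class counts: the data behind the bar chart and pie chart.
counts = results["sentiment"].value_counts()
print(counts.to_dict())  # {'positive': 2, 'negative': 2}

# Serialized results, as offered for CSV download.
csv_bytes = results.to_csv(index=False).encode()
```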

Model used: bert-base-multilingual-uncased-sentiment

  • Demo video
0801.mp4
  • Batch sentiment analysis
Bacth_Sentiment_Analysis.mp4


