The objective of this project is to determine which sentiment analysis method provides better results when analyzing X data. We compared the performance of traditional machine learning models, deep neural networks, and Transformers by leveraging Natural Language Processing (NLP) techniques. We evaluate these models using accuracy, confusion matrix, precision, and loss.
The dataset used is Kaggle's Sentiment140. It consists of 1.6 million tweets labelled as positive or negative. You can access the dataset on Kaggle at the following link: https://www.kaggle.com/datasets/kazanova/sentiment140
- Distribution of Positive and Negative Tweets
- Distribution of @UserMentions , Links & #Hashtags
- Frequency of Positive words
- Frequency of Negative words
NLP plays a pivotal role in extracting, processing, and understanding textual data to determine the sentiment expressed within it. NLP techniques are used to clean and prepare textual data for analysis. This includes removing noise (e.g., stopwords and punctuation), normalizing text (e.g., lowercasing, stemming, and lemmatization), and handling slang and abbreviations.
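The cleaning steps above can be sketched with plain Python and regular expressions. This is a minimal illustration, not the project's actual pipeline; the tiny `STOPWORDS` set is a placeholder assumption (a real pipeline would use a full stopword list, e.g. from NLTK):

```python
import re

# Illustrative stopword list only; a real pipeline would use NLTK's full list.
STOPWORDS = {"a", "an", "the", "is", "are", "was", "to", "and", "of", "in", "i", "this"}

def clean_tweet(text: str) -> str:
    """Normalize a raw tweet: lowercase, strip links/mentions/hashtags,
    drop punctuation, and remove stopwords."""
    text = text.lower()                        # normalize case
    text = re.sub(r"https?://\S+", "", text)   # strip links
    text = re.sub(r"[@#]\w+", "", text)        # strip @mentions and #hashtags
    text = re.sub(r"[^a-z\s]", "", text)       # drop punctuation and digits
    tokens = [t for t in text.split() if t not in STOPWORDS]
    return " ".join(tokens)

print(clean_tweet("@user I LOVE this!! http://t.co/xyz #happy"))  # -> "love"
```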
- ML Models: Logistic Regression , Naive Bayes
- Neural Network: LSTM
- Transformer: BERT
- Logistic Regression
A widely used statistical model for binary classification. It predicts the probability that a given input belongs to a particular class by applying the logistic function (also known as the sigmoid function) to a linear combination of input features.
Tokenizer: TfidfVectorizer
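A minimal sketch of this setup with scikit-learn, assuming the standard `TfidfVectorizer` + `LogisticRegression` pipeline. The six example tweets and their labels are invented stand-ins for the Sentiment140 data (the real project trains on 1.6 million rows):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy stand-in for the Sentiment140 tweets; 1 = positive, 0 = negative.
texts = ["i love this", "great day so happy", "awesome fun time",
         "i hate this", "terrible awful day", "worst time ever"]
labels = [1, 1, 1, 0, 0, 0]

# TF-IDF turns each tweet into a weighted bag-of-words vector; logistic
# regression applies the sigmoid to a linear score over those weights.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(texts, labels)

print(model.predict(["what a great happy day"]))  # expected: [1]
```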
- Naive Bayes
A probabilistic classifier based on Bayes' theorem, with the "naive" assumption that features are conditionally independent given the class.
Tokenizer: CountVectorizer
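The analogous sketch for Naive Bayes, assuming the `CountVectorizer` + `MultinomialNB` pairing from scikit-learn; again, the training texts are toy examples, not the real corpus:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy stand-in for the tweet corpus; 1 = positive, 0 = negative.
texts = ["love this movie", "happy great fun", "best day ever",
         "hate this movie", "sad awful day", "worst thing ever"]
labels = [1, 1, 1, 0, 0, 0]

# CountVectorizer builds raw word counts; MultinomialNB applies Bayes' theorem
# under the assumption that words occur independently given the class.
nb_model = make_pipeline(CountVectorizer(), MultinomialNB())
nb_model.fit(texts, labels)

print(nb_model.predict(["great fun movie"]))  # expected: [1]
```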
- Evaluation Table
| Model | Accuracy | Dataset size |
|---|---|---|
| Logistic Regression | 78.1 | 1,600,000 |
| Naive Bayes | 76.7 | 1,600,000 |
- LSTM (Long Short-Term Memory)
A type of recurrent neural network. An LSTM recurrent unit tries to "remember" all the relevant knowledge the network has seen so far and to "forget" irrelevant data.
Embeddings: Word2Vec
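The "remember/forget" behaviour comes from the LSTM cell's gates. Below is a minimal NumPy sketch of a single LSTM time step, purely to illustrate the mechanism; it is not the project's actual Keras/Word2Vec implementation, and the random weights are placeholders:

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM time step: gates decide what to forget, store, and emit."""
    z = W @ x + U @ h_prev + b   # stacked pre-activations, shape (4*n,)
    n = len(h_prev)
    f = sigmoid(z[:n])           # forget gate: how much old cell state to keep
    i = sigmoid(z[n:2*n])        # input gate: how much new information to write
    o = sigmoid(z[2*n:3*n])      # output gate: how much cell state to expose
    g = np.tanh(z[3*n:])         # candidate cell state
    c = f * c_prev + i * g       # "remember" old state, blend in the new
    h = o * np.tanh(c)           # hidden state passed to the next step
    return h, c

rng = np.random.default_rng(0)
n, d = 4, 3                                  # hidden size, input size
W = rng.normal(size=(4 * n, d))
U = rng.normal(size=(4 * n, n))
b = np.zeros(4 * n)
h, c = np.zeros(n), np.zeros(n)
for x in rng.normal(size=(5, d)):            # a sequence of 5 "word vectors"
    h, c = lstm_step(x, h, c, W, U, b)
print(h.shape)  # (4,)
```

In the actual model, the input vectors `x` would be the Word2Vec embeddings of the tweet's words rather than random noise.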
- Evaluation Table
| Model | Accuracy | Dataset size | Number of Epochs |
|---|---|---|---|
| LSTM | 79 | 300,000 | 8 |
- BERT (Bidirectional Encoder Representations from Transformers)
BERT is a deep bidirectional, unsupervised language representation, pre-trained on a plain-text corpus. BERT converts words into numbers; this matters because machine learning models take numbers, not words, as inputs, which lets you train models directly on your textual data.
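BERT's word-to-number step is WordPiece tokenization: unknown words are greedily split into known subwords, each mapped to a vocabulary ID. The toy vocabulary and IDs below are entirely made up for illustration; real BERT ships a ~30k-token vocabulary plus special tokens like `[CLS]` and `[SEP]`:

```python
# Made-up toy vocabulary; IDs are placeholders for illustration only.
VOCAB = {"[UNK]": 0, "play": 1, "##ing": 2, "##ed": 3, "great": 4, "un": 5}

def wordpiece(word, vocab):
    """Greedy longest-match-first subword split, in the style of BERT's tokenizer."""
    pieces, start = [], 0
    while start < len(word):
        end, cur = len(word), None
        while start < end:
            sub = word[start:end]
            if start > 0:
                sub = "##" + sub     # continuation pieces carry a '##' prefix
            if sub in vocab:
                cur = sub
                break
            end -= 1
        if cur is None:
            return ["[UNK]"]         # no piece matched: whole word is unknown
        pieces.append(cur)
        start = end
    return pieces

tokens = wordpiece("playing", VOCAB)
ids = [VOCAB[t] for t in tokens]
print(tokens, ids)  # ['play', '##ing'] [1, 2]
```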
BERT model used: bert-base-multilingual-uncased-sentiment
I was able to use only a limited number of epochs due to BERT's long training time, and the dataset size was greatly reduced for the same reason.
- Evaluation Table
| Model | No of Epochs | Train Loss | Precision | Recall | F1 score | Dataset size |
|---|---|---|---|---|---|---|
| BERT | 3 | 38.2 | 80 | 77.6 | 75.4 | 20,000 |
A sentiment analyzer is built using the Streamlit interface. The user can upload a file of texts to be analyzed; after analysis, the app shows the predicted sentiment next to each text and, to aid visual inspection, provides a bar chart and a pie chart of the overall results. If needed, the user can download a CSV file containing all the sentiment results directly from the app.
Model used: bert-base-multilingual-uncased-sentiment
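The batch step behind the app can be sketched with pandas. This is only an outline of the data flow: `fake_bert_sentiment` is a hypothetical stub standing in for the real BERT model, and the in-memory CSV simulates the user's uploaded file:

```python
import io
import pandas as pd

def fake_bert_sentiment(text: str) -> str:
    """Hypothetical stand-in for the real model
    (bert-base-multilingual-uncased-sentiment)."""
    return "positive" if ("good" in text or "love" in text) else "negative"

# Simulate an uploaded CSV with a 'text' column.
uploaded = io.StringIO("text\nlove this phone\nbad battery\ngood camera\n")
df = pd.read_csv(uploaded)

# Score every row, then aggregate counts for the bar and pie charts.
df["sentiment"] = df["text"].apply(fake_bert_sentiment)
counts = df["sentiment"].value_counts()
print(counts.to_dict())  # {'positive': 2, 'negative': 1}

# Export the per-row results, as the app's CSV download does.
result_csv = df.to_csv(index=False)
```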
- Demo video
0801.mp4
- Batch sentiment analysis