nithu0035/sentiment-analysis-LLM


📊 Employee Sentiment Analysis: LLM & NLP Pipeline

An end-to-end NLP pipeline that analyzes employee email feedback to detect sentiment, rank employees, identify flight risks, and model sentiment trends over time using TextBlob and Scikit-learn.


✨ What This Project Does

  • 🏷️ Sentiment Labelling: TextBlob polarity scoring maps each feedback entry to Positive / Neutral / Negative
  • 📈 Monthly Sentiment Scoring: Aggregates average sentiment per month and plots it as a line chart
  • 🏆 Employee Ranking: Ranks all employees by their average sentiment score
  • ⚠️ Flight Risk Detection: Rule-based heuristic flags high-risk employees (avg sentiment ≤ -0.1 with ≥ 3 feedback entries)
  • 📉 Linear Regression Trend: Fits a trend line over monthly sentiment to detect improving or declining patterns
  • 🔍 EDA & Visualizations: Text length distributions, feedback counts per year, sentiment distribution charts

πŸ› οΈ Tech Stack

Layer Technology
Language Python 3.9+
Sentiment Engine TextBlob
Data Processing Pandas, NumPy
ML / Trend Model Scikit-learn (LinearRegression)
Visualizations Matplotlib, Seaborn
Notebook Jupyter
Config python-dotenv

πŸ“ Project Structure

sentiment-analysis-LLM/
β”œβ”€β”€ data/
β”‚   └── employee_feedback.csv       # Email dataset (included)
β”œβ”€β”€ notebooks/
β”‚   └── employee_sentiment_analysis.ipynb  # Full interactive pipeline
β”œβ”€β”€ src/
β”‚   └── sentiment_pipeline.py       # Standalone Python script
β”œβ”€β”€ outputs/                        # Generated on run
β”‚   β”œβ”€β”€ feedback_with_sentiment.csv
β”‚   β”œβ”€β”€ employee_ranking_and_flight_risk.csv
β”‚   └── monthly_sentiment_trend.csv
β”œβ”€β”€ .env.example
β”œβ”€β”€ requirements.txt
└── README.md

📂 Dataset

The dataset data/employee_feedback.csv contains employee email data with these columns:

| Column | Used As | Description |
| --- | --- | --- |
| from | employee_id | Sender email (employee identifier) |
| body | feedback_text | Email content (analyzed for sentiment) |
| date | date | Date of the feedback |
| Subject | - | Email subject (kept for reference) |
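
The column mapping above, together with the cleaning step described under Methodology, can be sketched roughly as follows. This is a dependency-free illustration and not the repository's code, which uses Pandas and may differ. The helper name `clean_rows`, the `text_length` field name, the sample rows, and the ISO date format are all assumptions made for the example.

```python
import csv
import io
from datetime import datetime

# Raw CSV header -> name used in the pipeline (taken from the table above).
COLUMN_MAP = {"from": "employee_id", "body": "feedback_text", "date": "date"}


def clean_rows(csv_file):
    """Rename columns, parse dates, drop incomplete rows, add a text length."""
    cleaned = []
    for raw in csv.DictReader(csv_file):
        row = {new: (raw.get(old) or "").strip() for old, new in COLUMN_MAP.items()}
        if not all(row.values()):
            continue  # drop rows with a missing sender, body, or date
        try:
            # Assumption: ISO-style dates such as 2010-05-10.
            row["date"] = datetime.fromisoformat(row["date"])
        except ValueError:
            continue  # treat unparseable dates like missing values
        row["text_length"] = len(row["feedback_text"])
        cleaned.append(row)
    return cleaned


# Made-up sample data; the second row has an empty body and is dropped.
sample = io.StringIO(
    "Subject,body,date,from\n"
    "Update,Great work on the release,2010-05-10,a@example.com\n"
    "Hello,,2010-05-11,b@example.com\n"
)
rows = clean_rows(sample)
print(rows)
```

With Pandas, the same idea is usually expressed with `DataFrame.rename`, `pandas.to_datetime`, and `DataFrame.dropna`.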

βš™οΈ Setup & Installation

1. Clone the repository

git clone https://github.com/nithu0035/sentiment-analysis-LLM.git
cd sentiment-analysis-LLM

2. Create a virtual environment (recommended)

python -m venv venv

# Windows:
venv\Scripts\activate

# macOS / Linux:
source venv/bin/activate

3. Install dependencies

pip install -r requirements.txt

4. Configure environment variables

cp .env.example .env

.env should contain:

DATA_PATH=./data/employee_feedback.csv
OUTPUT_DIR=./outputs
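
As a rough illustration of how a script might pick up these two settings, the sketch below reads them from the environment and falls back to the values shown above. The function name `load_settings` is hypothetical, and whether the repository's script applies the same defaults is an assumption.

```python
import os


def load_settings(env=None):
    """Return (data_path, output_dir), falling back to the values shown above."""
    env = os.environ if env is None else env
    data_path = env.get("DATA_PATH", "./data/employee_feedback.csv")
    output_dir = env.get("OUTPUT_DIR", "./outputs")
    return data_path, output_dir


try:
    # python-dotenv copies entries from a local .env file into os.environ.
    from dotenv import load_dotenv

    load_dotenv()
except ImportError:
    pass  # without python-dotenv, only real environment variables are seen

data_path, output_dir = load_settings()
print(data_path, output_dir)
```

Passing a plain dictionary, for example `load_settings({"DATA_PATH": "x.csv"})`, makes the lookup easy to exercise without touching the real environment.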

▶️ Usage

Option A: Jupyter Notebook (recommended)

jupyter notebook

Open notebooks/employee_sentiment_analysis.ipynb and run all cells (Kernel → Restart & Run All).

Option B: Python Script

python src/sentiment_pipeline.py

Both options produce the same three output CSVs, saved to the outputs/ folder.


🧠 Methodology

1. Data Cleaning: Parse dates, drop nulls, rename columns, add a text length feature.

2. Sentiment Analysis: TextBlob polarity score in the range [-1, 1], mapped to three labels:

  • score > 0.05 → Positive
  • score < -0.05 → Negative
  • otherwise → Neutral
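
The threshold rule above can be written as a small function. This is a minimal sketch: the names `label_sentiment` and `polarity` are hypothetical, and the repository's own implementation may be organized differently. The `polarity` helper uses TextBlob's `sentiment.polarity` attribute and needs the `textblob` package only when it is called.

```python
def label_sentiment(score, threshold=0.05):
    """Map a polarity score in [-1, 1] to Positive / Negative / Neutral."""
    if score > threshold:
        return "Positive"
    if score < -threshold:
        return "Negative"
    return "Neutral"  # includes scores exactly at +/- threshold


def polarity(text):
    """Return the TextBlob polarity of a text (requires the textblob package)."""
    from textblob import TextBlob  # imported lazily so the rule works without it

    return TextBlob(text).sentiment.polarity


for score in (0.4, 0.0, -0.2):
    print(score, label_sentiment(score))
```

Keeping the thresholds in one function means the Positive / Negative cut-offs can be tuned in a single place.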

3. Monthly Aggregation: Group by year-month, compute average sentiment and feedback count, plot the time series.

4. Employee Ranking: Group by employee ID, rank by average sentiment score in descending order.

5. Flight Risk Flag: Employees with avg sentiment ≤ -0.1 AND ≥ 3 feedback entries are flagged as High risk.
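
Steps 4 and 5 can be sketched together in plain Python. The function name `rank_employees`, the output field names, and the sample scores are assumptions for illustration. The repository performs this step with Pandas, and its column names and tie handling may differ.

```python
from collections import defaultdict


def rank_employees(scored_feedback, risk_threshold=-0.1, min_feedback=3):
    """Rank employees by average sentiment and flag high flight risk.

    scored_feedback: iterable of (employee_id, polarity_score) pairs.
    """
    by_employee = defaultdict(list)
    for employee_id, score in scored_feedback:
        by_employee[employee_id].append(score)

    summary = []
    for employee_id, scores in by_employee.items():
        avg = sum(scores) / len(scores)
        summary.append({
            "employee_id": employee_id,
            "avg_sentiment": avg,
            "feedback_count": len(scores),
            # Rule from step 5: low average AND enough feedback entries.
            "high_flight_risk": avg <= risk_threshold and len(scores) >= min_feedback,
        })

    # Highest average first; ties keep their original order in this sketch.
    summary.sort(key=lambda row: row["avg_sentiment"], reverse=True)
    for rank, row in enumerate(summary, start=1):
        row["rank"] = rank
    return summary


# Made-up scores: "b" is flagged, "c" is negative but has too few entries.
ranking = rank_employees([
    ("a", 0.5), ("a", 0.3),
    ("b", -0.2), ("b", -0.3), ("b", -0.1),
    ("c", -0.5),
])
for row in ranking:
    print(row)
```

The minimum-count condition keeps a single strongly negative email from flagging an employee on its own.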

6. Linear Regression Trend: Encodes months as an integer time index, fits LinearRegression, and plots the actual values against the predicted trend line. A positive slope indicates improving sentiment over time; a negative slope indicates declining sentiment.
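
Steps 3 and 6 can be illustrated without any dependencies. For a single feature with an intercept, scikit-learn's LinearRegression fits an ordinary least-squares line, so the closed-form slope and intercept below describe the same line. The function names and sample data are hypothetical, and numbering the months by position (0, 1, 2, ...) is an assumption that treats missing months as adjacent.

```python
from collections import defaultdict
from datetime import date


def monthly_average(records):
    """records: iterable of (date, score). Returns sorted (month, avg, count) tuples."""
    buckets = defaultdict(list)
    for day, score in records:
        buckets[day.strftime("%Y-%m")].append(score)
    return [(month, sum(s) / len(s), len(s)) for month, s in sorted(buckets.items())]


def fit_trend(values):
    """Least-squares line through (0, v0), (1, v1), ...; returns (slope, intercept)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two months to fit a trend")
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(values))
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean


# Made-up scores spanning three months.
monthly = monthly_average([
    (date(2010, 1, 5), 0.1), (date(2010, 1, 20), 0.3),
    (date(2010, 2, 3), 0.3),
    (date(2010, 3, 9), 0.4),
])
slope, intercept = fit_trend([avg for _, avg, _ in monthly])
predicted = [intercept + slope * i for i in range(len(monthly))]
print("improving" if slope > 0 else "declining or flat", round(slope, 3))
```

With scikit-learn, the equivalent values are available as `coef_` and `intercept_` after calling `LinearRegression().fit(X, y)` on the month index and the monthly averages.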


📤 Output Files

| File | Description |
| --- | --- |
| feedback_with_sentiment.csv | All feedback rows with polarity score and label |
| employee_ranking_and_flight_risk.csv | Per-employee avg sentiment, feedback count, rank, and flight risk flag |
| monthly_sentiment_trend.csv | Monthly avg sentiment with linear regression predictions |

📄 License

MIT

👤 Author

Gudipatoju Nitesh
GitHub: @nithu0035
