An end-to-end NLP pipeline that analyzes employee email feedback to detect sentiment, rank employees, identify flight risks, and model sentiment trends over time using TextBlob and Scikit-learn.
- Sentiment Labelling: TextBlob polarity scoring maps each feedback entry to Positive / Neutral / Negative
- Monthly Sentiment Scoring: Aggregates average sentiment over time with a line chart
- Employee Ranking: Ranks all employees by their average sentiment score
- Flight Risk Detection: Rule-based heuristic flags high-risk employees (avg sentiment ≤ -0.1 with ≥ 3 feedback entries)
- Linear Regression Trend: Fits a trend line over monthly sentiment to detect improving or declining patterns
- EDA & Visualizations: Text length distributions, feedback counts per year, sentiment distribution charts
| Layer | Technology |
|---|---|
| Language | Python 3.9+ |
| Sentiment Engine | TextBlob |
| Data Processing | Pandas, NumPy |
| ML / Trend Model | Scikit-learn (LinearRegression) |
| Visualizations | Matplotlib, Seaborn |
| Notebook | Jupyter |
| Config | python-dotenv |
    sentiment-analysis-LLM/
    ├── data/
    │   └── employee_feedback.csv               # Email dataset (included)
    ├── notebooks/
    │   └── employee_sentiment_analysis.ipynb   # Full interactive pipeline
    ├── src/
    │   └── sentiment_pipeline.py               # Standalone Python script
    ├── outputs/                                # Generated on run
    │   ├── feedback_with_sentiment.csv
    │   ├── employee_ranking_and_flight_risk.csv
    │   └── monthly_sentiment_trend.csv
    ├── .env.example
    ├── requirements.txt
    └── README.md
The dataset data/employee_feedback.csv contains employee email data with these columns:
| Column | Used As | Description |
|---|---|---|
| `from` | `employee_id` | Sender email (employee identifier) |
| `body` | `feedback_text` | Email content (analyzed for sentiment) |
| `date` | `date` | Date of the feedback |
| `Subject` | - | Email subject (kept for reference) |
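
As a rough illustration of the column mapping above, the sketch below renames the raw columns, parses dates, drops incomplete rows, and adds a text length feature with pandas. The sample rows, `COLUMN_MAP`, `load_feedback`, and the `text_length` column name are hypothetical, and the dates are assumed to be in a format `pd.to_datetime` can parse; the actual script may differ.

```python
import io

import pandas as pd

# Hypothetical sample standing in for data/employee_feedback.csv.
raw_csv = io.StringIO(
    "Subject,body,date,from\n"
    "Update,Great work on the release,2010-05-10,alice@example.com\n"
    "Re: Update,,2010-06-01,bob@example.com\n"
)

# Raw column name -> name used in the rest of the pipeline.
COLUMN_MAP = {"from": "employee_id", "body": "feedback_text"}


def load_feedback(source):
    """Read the CSV, rename columns, parse dates, and drop incomplete rows."""
    df = pd.read_csv(source)
    df = df.rename(columns=COLUMN_MAP)
    # Unparseable dates become NaT and are dropped with the other nulls.
    df["date"] = pd.to_datetime(df["date"], errors="coerce")
    df = df.dropna(subset=["employee_id", "feedback_text", "date"])
    # Assumed feature name: character count of each feedback text.
    df["text_length"] = df["feedback_text"].str.len()
    return df.reset_index(drop=True)


feedback = load_feedback(raw_csv)
print(feedback[["employee_id", "date", "text_length"]])
```

The second sample row has an empty `body`, so it is dropped during cleaning and only one row remains.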
    git clone https://github.com/nithu0035/sentiment-analysis-LLM.git
    cd sentiment-analysis-LLM

    python -m venv venv
    # Windows:
    venv\Scripts\activate
    # macOS / Linux:
    source venv/bin/activate

    pip install -r requirements.txt

    cp .env.example .env

`.env` should contain:

    DATA_PATH=./data/employee_feedback.csv
    OUTPUT_DIR=./outputs

To run the notebook, start Jupyter:

    jupyter notebook

Open `notebooks/employee_sentiment_analysis.ipynb` and run all cells (Kernel → Restart & Run All).

To run the standalone script instead:

    python src/sentiment_pipeline.py

Both options produce the same 3 output CSVs, saved to the `outputs/` folder.
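
For reference, the two `.env` values could be read in Python roughly as follows. This is a minimal sketch, not the project's actual loader: `read_config` is a hypothetical helper, and the fallback defaults simply mirror the values listed for `.env`.

```python
import os
from pathlib import Path

try:
    # python-dotenv is listed in the tech stack; skip it if not installed.
    from dotenv import load_dotenv

    # Loads key=value pairs from .env into os.environ; a missing file is ignored.
    load_dotenv(dotenv_path=".env")
except ImportError:
    pass


def read_config(env=os.environ):
    """Return (data_path, output_dir), falling back to the documented values."""
    data_path = Path(env.get("DATA_PATH", "./data/employee_feedback.csv"))
    output_dir = Path(env.get("OUTPUT_DIR", "./outputs"))
    return data_path, output_dir


data_path, output_dir = read_config()
print(f"Reading from {data_path}, writing to {output_dir}")
```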
1. Data Cleaning: Parse dates, drop nulls, rename columns, add text length feature.
2. Sentiment Analysis: TextBlob polarity score in range [-1, 1] mapped to three labels:
   - `score > 0.05` → Positive
   - `score < -0.05` → Negative
   - otherwise → Neutral
3. Monthly Aggregation: Group by year-month, compute average sentiment and feedback count, plot time-series.
4. Employee Ranking: Group by employee ID, rank by average sentiment score descending.
5. Flight Risk Flag: Employees with avg sentiment ≤ -0.1 AND ≥ 3 feedback entries are flagged as High risk.
6. Linear Regression Trend: Encodes months as integer time index, fits LinearRegression, plots actual vs predicted trend line. A positive slope = improving sentiment over time.
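
The labelling thresholds and the flight risk rule from the steps above can be sketched in plain Python. This is an illustration under stated assumptions rather than the project's code: the function names, the `high_flight_risk` field, and the sample scores are hypothetical, and the polarity values are taken as already computed (in the pipeline they would come from TextBlob, for example `TextBlob(text).sentiment.polarity`).

```python
from collections import defaultdict
from statistics import mean

POSITIVE_THRESHOLD = 0.05   # score above this -> Positive
NEGATIVE_THRESHOLD = -0.05  # score below this -> Negative
RISK_MAX_AVG = -0.1         # avg sentiment at or below this ...
RISK_MIN_COUNT = 3          # ... with at least this many entries -> High risk


def label_sentiment(polarity):
    """Map a polarity score in [-1, 1] to one of the three labels."""
    if polarity > POSITIVE_THRESHOLD:
        return "Positive"
    if polarity < NEGATIVE_THRESHOLD:
        return "Negative"
    return "Neutral"


def rank_employees(scored_rows):
    """Rank employees by average polarity (descending) and flag flight risk.

    scored_rows is an iterable of (employee_id, polarity) pairs.
    """
    by_employee = defaultdict(list)
    for employee_id, polarity in scored_rows:
        by_employee[employee_id].append(polarity)

    summary = [
        {
            "employee_id": employee_id,
            "avg_sentiment": mean(scores),
            "feedback_count": len(scores),
        }
        for employee_id, scores in by_employee.items()
    ]
    summary.sort(key=lambda row: row["avg_sentiment"], reverse=True)

    for rank, row in enumerate(summary, start=1):
        row["rank"] = rank
        row["high_flight_risk"] = (
            row["avg_sentiment"] <= RISK_MAX_AVG
            and row["feedback_count"] >= RISK_MIN_COUNT
        )
    return summary


# Hypothetical polarity scores, for illustration only.
rows = [
    ("a@example.com", 0.4), ("a@example.com", 0.2),
    ("b@example.com", -0.3), ("b@example.com", -0.2), ("b@example.com", -0.1),
    ("c@example.com", -0.5),
]
ranking = rank_employees(rows)
for row in ranking:
    print(row["rank"], row["employee_id"], row["high_flight_risk"])
```

In this sample, `c@example.com` has the lowest average but only one feedback entry, so the minimum-count condition keeps it from being flagged.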
| File | Description |
|---|---|
| `feedback_with_sentiment.csv` | All feedback rows with polarity score and label |
| `employee_ranking_and_flight_risk.csv` | Per-employee avg sentiment, feedback count, rank, and flight risk flag |
| `monthly_sentiment_trend.csv` | Monthly avg sentiment with linear regression predictions |
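
The predictions in `monthly_sentiment_trend.csv` come from a straight-line fit over an integer month index. The project uses scikit-learn's `LinearRegression` for this; the sketch below swaps in the closed-form ordinary least squares solution for a single feature so that it runs without dependencies, and it should give the same line up to floating-point rounding. `fit_trend` and the sample values are hypothetical.

```python
def fit_trend(monthly_avg):
    """Fit y = intercept + slope * t, where t = 0, 1, 2, ... indexes the months.

    monthly_avg is a list of average sentiment values in chronological order.
    Returns (slope, intercept, predictions).
    """
    n = len(monthly_avg)
    if n < 2:
        raise ValueError("need at least two months to fit a trend")

    months = range(n)
    t_mean = sum(months) / n
    y_mean = sum(monthly_avg) / n

    # Closed-form least squares for one feature: slope = cov(t, y) / var(t).
    s_tt = sum((t - t_mean) ** 2 for t in months)
    s_ty = sum((t - t_mean) * (y - y_mean) for t, y in zip(months, monthly_avg))
    slope = s_ty / s_tt
    intercept = y_mean - slope * t_mean

    predictions = [intercept + slope * t for t in months]
    return slope, intercept, predictions


# Hypothetical monthly averages, for illustration only.
sample = [0.10, 0.08, 0.12, 0.15]
slope, intercept, predictions = fit_trend(sample)
direction = "improving" if slope > 0 else "declining or flat"
print(f"slope={slope:.4f} ({direction})")
```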
Gudipatoju Nitesh
GitHub: @nithu0035