This project builds a data-driven system to evaluate instructor effectiveness using course performance, engagement, and feedback metrics.
Instead of relying on a single metric, the pipeline combines:
- Course quality estimation
- Course difficulty adjustment
- Instructor experience
- Machine learning models for prediction and ranking
The final output includes:
- Instructor effectiveness scores
- Tier-based segmentation (Low / Medium / High)
- Insights into key drivers of teaching success
The dataset contains information on:
-
Courses (25 unique)
-
Instructors (120 unique)
-
Performance Metrics, including:
- Completion rate
- Dropout rate
- Average quiz score
- Score improvement
- Watch time
- Assignment submission rate
- Forum activity
- Feedback score
- Feedback response rate
Key findings:
-
Strong negative correlation between completion rate and dropout rate
-
Positive relationships between:
- Learning outcomes (scores, improvement)
- Engagement metrics
- Feedback scores
-
Most features capture a shared concept: student success
- Applied StandardScaler to normalize features
- Ensures fair contribution across variables
A hybrid scoring system combining:
-
PCA (Principal Component Analysis) Captures maximum variance (linear relationships)
-
Random Forest Captures non-linear feature importance
-
Linear Regression (LR) Provides baseline linear weighting
Final Formula:
Course Quality Score =
0.3 × PCA Score +
0.5 × RF Score +
0.2 × LR Score
Derived using:
- Inverse completion rate
- Low score improvement
- Low quiz performance
Processed via:
- Min-Max scaling
- PCA reduction
Accounts for:
- Course quality
- Course difficulty
- Instructor experience
Formula:
Instructor Score =
Weighted Quality (difficulty-adjusted) +
λ × Experience
Where:
- Experience = log(1 + number of courses taught)
Predict instructor effectiveness from raw features
- Ridge Regression
- Lasso Regression
- Random Forest
- Extra Trees
- Gradient Boosting
- XGBoost
- Cross-validation (5-fold)
- RMSE (Root Mean Squared Error)
- R² score
-
Applied KMeans clustering
-
Grouped instructors into:
- High performers
- Medium performers
- Low performers
- Completion rate is the strongest predictor of effectiveness
- Learning metrics (score improvement, quiz scores) are highly influential
- Engagement metrics have moderate impact
- Difficulty-adjusted scoring provides a fairer evaluation
- Model predictions closely align with actual scores
-
Correlation ≠ causation
-
Confounding factors:
- Course difficulty
- Student background
-
Potential for metric manipulation
-
Limited contextual data (no student-level features)
-
Include:
- Student demographics
- Baseline knowledge levels
- Course type & difficulty labels
-
Add:
- Causal inference methods
- Time-series analysis
-
Improve fairness:
- Bias detection & correction
-
Deploy:
- Real-time instructor dashboard
- Python
- Pandas, NumPy
- Scikit-learn
- XGBoost
- Matplotlib, Seaborn
├── notebook.ipynb # Full analysis pipeline
├── data/ # Input dataset
├── outputs/ # Plots & results
└── README.md # Project documentation
This project demonstrates how data science + machine learning can be used to:
- Evaluate instructor effectiveness
- Identify key success drivers
- Support better educational decisions
However, the model should be used as a:
decision-support tool, not a standalone evaluation system
For questions or collaboration, feel free to reach out.