Overview: This project analyzes transportation data and builds machine learning models for classification and regression tasks. The goal is to understand and accurately forecast key transportation metrics.
Loading Data: Import the transportation data and preprocess it, including balancing the datasets and encoding categorical variables as numeric features.
Feature Engineering: Create additional features such as time zone dummy variables to enhance model performance.
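
The preprocessing and feature engineering steps above can be sketched as follows. This is a minimal illustration, not the project's actual pipeline: the toy data, the column names (`time_zone`, `distance_km`, `delayed`), and the helper `balance_by_downsampling` are all hypothetical, and downsampling is only one of several ways to balance a dataset.

```python
import pandas as pd

# Hypothetical toy data; column names and values are illustrative only.
df = pd.DataFrame({
    "time_zone": ["EST", "CST", "PST", "EST", "CST", "EST"],
    "distance_km": [2.5, 7.1, 3.3, 1.2, 9.8, 4.4],
    "delayed": [0, 0, 1, 0, 1, 0],
})

# Encode the categorical column as time zone dummy variables (tz_CST, tz_EST, tz_PST).
encoded = pd.get_dummies(df, columns=["time_zone"], prefix="tz", dtype=int)


def balance_by_downsampling(frame, label, seed=0):
    """Downsample every class to the size of the smallest class (assumed strategy)."""
    smallest = frame[label].value_counts().min()
    parts = [
        group.sample(n=smallest, random_state=seed)
        for _, group in frame.groupby(label)
    ]
    return pd.concat(parts).reset_index(drop=True)


balanced = balance_by_downsampling(encoded, "delayed")
print(balanced["delayed"].value_counts().to_dict())
print(list(balanced.columns))
```

Encoding before balancing keeps every dummy column present even if a category happens to be dropped during sampling.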
Model Selection: Evaluate multiple models including Gradient Boosting Classifier, K-Nearest Neighbors (KNN), and others.
Training: Fit each model on the training split using the tuned hyperparameters.
Hyperparameter Tuning: Test different learning rates and numbers of estimators for the Gradient Boosting Classifier.
Prediction: Generate predictions on test data and evaluate performance.
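
One possible way to carry out the tuning, training, and prediction steps above is a cross-validated grid search. The sketch below uses synthetic data from `make_classification` as a stand-in for the transportation dataset, and the grid values are illustrative assumptions; the project may instead loop over the parameters manually.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in data (assumption: the real features are not shown here).
X, y = make_classification(
    n_samples=300, n_features=8, n_informative=4, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Illustrative grid over the two hyperparameters named in the text.
param_grid = {
    "learning_rate": [0.05, 0.1, 0.5],
    "n_estimators": [50, 100],
}

search = GridSearchCV(
    GradientBoostingClassifier(random_state=0),
    param_grid,
    cv=3,
    scoring="accuracy",
)
search.fit(X_train, y_train)  # refits the best model on the full training split

best_model = search.best_estimator_
y_pred = best_model.predict(X_test)
test_accuracy = best_model.score(X_test, y_test)

print("best parameters:", search.best_params_)
print("test accuracy:", test_accuracy)
```

Tuning inside cross-validation on the training split keeps the test split untouched until the final evaluation.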
Performance Metrics: Use metrics such as accuracy, precision, recall, F1 score, and ROC-AUC to evaluate model performance.
Confusion Matrix: Visualize confusion matrices to see which classes each model predicts correctly and which it confuses.
Visualization: Plot learning curves and trends to interpret model behavior and performance.
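
As a small worked example of the metrics listed above, the sketch below computes them with `sklearn.metrics` on hand-made labels. The arrays `y_true`, `y_pred`, and `y_score` are invented for illustration and are not project results.

```python
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    f1_score,
    precision_score,
    recall_score,
    roc_auc_score,
)

# Invented binary labels, hard predictions, and predicted probabilities.
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 1, 1, 1, 1, 0]
y_score = [0.1, 0.2, 0.3, 0.6, 0.9, 0.8, 0.7, 0.4]

metrics = {
    "accuracy": accuracy_score(y_true, y_pred),
    "precision": precision_score(y_true, y_pred),
    "recall": recall_score(y_true, y_pred),
    "f1": f1_score(y_true, y_pred),
    # ROC-AUC is computed from scores, not from hard predictions.
    "roc_auc": roc_auc_score(y_true, y_score),
}

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_true, y_pred)

for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
print(cm)
```

Here one negative and one positive are misclassified, so accuracy, precision, recall, and F1 all equal 0.75, while ROC-AUC is 0.9375 because 15 of the 16 positive/negative score pairs are ranked correctly. If matplotlib is installed, `ConfusionMatrixDisplay.from_predictions(y_true, y_pred)` can draw the same matrix as a plot.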
Gradient Boosting Classifier: Train and evaluate the model with different hyperparameters.
K-Nearest Neighbors: Implement and evaluate a KNN classifier.
Model Scoring: Score each model on both the training and test sets; a large gap between the two scores suggests overfitting.
Comparison: Compare results across different models and select the best-performing one.
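
The scoring and comparison steps above might look like the following sketch. It again uses synthetic data in place of the transportation dataset, and the model settings (default Gradient Boosting, `n_neighbors=5`, feature scaling before KNN) are assumptions chosen for illustration rather than the project's configuration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption).
X, y = make_classification(
    n_samples=400, n_features=10, n_informative=5, random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

models = {
    "gradient_boosting": GradientBoostingClassifier(random_state=42),
    # KNN is distance-based, so features are standardized first.
    "knn": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    results[name] = {
        "train": model.score(X_train, y_train),  # accuracy on training data
        "test": model.score(X_test, y_test),     # accuracy on held-out data
    }

# Select the model with the highest test accuracy.
best_name = max(results, key=lambda n: results[n]["test"])

for name, scores in results.items():
    print(f"{name}: train={scores['train']:.3f} test={scores['test']:.3f}")
print("best model:", best_name)
```

Selecting on test accuracy alone is the simplest rule; with imbalanced classes, F1 or ROC-AUC may be a better selection criterion.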