In Formula 1, every tenth of a second matters. Race engineers use predictive models to decide when to pit, which tyre compound to deploy, and how fast a car will degrade over a stint.
This project applies Simple and Multiple Linear Regression to real F1 telemetry data (101,371 laps across the 2022–2025 seasons) to predict lap times — a real problem in modern motorsport analytics.
| Property | Value |
|---|---|
| Name | F1 Strategy Dataset v4 |
| Rows | 101,371 individual laps |
| Features | 16 columns |
| Seasons | 2022, 2023, 2024, 2025 |
| Target | LapTime (s) — continuous numerical |
Key Features Used:
- `TyreLife`: laps completed on the current set of tyres
- `LapNumber`: current lap in the race
- `RaceProgress`: fraction of race completed (0–1)
- `Position`: current race position
- `Compound`: tyre type (SOFT / MEDIUM / HARD)
Simple Linear Regression:
- Feature: `TyreLife` (single predictor)
- Equation: `LapTime = m × TyreLife + b`
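The single-predictor equation can be fitted in a few lines with scikit-learn. The following is a minimal sketch rather than the notebook's actual code: it uses synthetic stand-in data (a 90 s base lap that slows by 0.08 s per lap of tyre age, both numbers invented for illustration) so that it runs without the CSV.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic stand-in data (NOT from the real dataset): a 90 s base lap
# that slows by 0.08 s per lap of tyre age, plus a little noise.
rng = np.random.default_rng(42)
tyre_life = rng.integers(1, 40, size=500)
lap_time = 90.0 + 0.08 * tyre_life + rng.normal(0.0, 0.3, size=500)

# scikit-learn expects a 2-D feature matrix, hence the reshape.
X = tyre_life.reshape(-1, 1)
simple_model = LinearRegression().fit(X, lap_time)

m = simple_model.coef_[0]    # slope: seconds lost per lap of tyre age
b = simple_model.intercept_  # intercept: predicted lap time at TyreLife = 0
print(f"LapTime = {m:.3f} x TyreLife + {b:.2f}")
```

With the real data, `tyre_life` and `lap_time` would presumably come from the `TyreLife` and `LapTime` columns of the CSV instead.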
Multiple Linear Regression:
- Features: `TyreLife`, `LapNumber`, `RaceProgress`, `Position`, `Compound_Speed`
- Equation: `LapTime = m₁×TyreLife + m₂×LapNumber + m₃×RaceProgress + m₄×Position + m₅×Compound_Speed + b`
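A rough sketch of the five-feature model is shown below. The README does not define how `Compound_Speed` is derived, so the ordinal mapping from `Compound` used here is an assumption made for illustration only, and the laps are synthetic with invented coefficients.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# ASSUMPTION: the README does not define Compound_Speed; this ordinal
# mapping (softer compound = higher value) is purely illustrative.
COMPOUND_SPEED = {"HARD": 1, "MEDIUM": 2, "SOFT": 3}
FEATURES = ["TyreLife", "LapNumber", "RaceProgress", "Position", "Compound_Speed"]

# Synthetic stand-in laps (NOT the real dataset).
rng = np.random.default_rng(0)
n = 1000
total_laps = rng.integers(50, 71, size=n)    # race length varies by track
lap_number = rng.integers(1, total_laps + 1)  # one lap drawn per row
laps = pd.DataFrame({
    "TyreLife": rng.integers(1, 40, size=n),
    "LapNumber": lap_number,
    "RaceProgress": lap_number / total_laps,
    "Position": rng.integers(1, 21, size=n),
    "Compound": rng.choice(list(COMPOUND_SPEED), size=n),
})
laps["Compound_Speed"] = laps["Compound"].map(COMPOUND_SPEED).astype(int)

# Invented ground truth used only to generate the toy target.
laps["LapTime"] = (
    92.0
    + 0.08 * laps["TyreLife"]
    - 0.03 * laps["LapNumber"]
    + 0.02 * laps["Position"]
    - 0.50 * laps["Compound_Speed"]
    + rng.normal(0.0, 0.3, size=n)
)

multi_model = LinearRegression().fit(laps[FEATURES], laps["LapTime"])
coefs = dict(zip(FEATURES, multi_model.coef_))
for name, value in coefs.items():
    print(f"{name:>15}: {value:+.3f}")
```

Note that `LapNumber` and `RaceProgress` are strongly correlated, so their individual coefficients are harder to interpret than those of the other features.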
| Metric | Simple LR | Multiple LR | Improvement |
|---|---|---|---|
| MAE | — s | — s | ↓ ~X% |
| RMSE | — s | — s | ↓ ~X% |
| R² Score | — | — | ↑ + X |
Run the notebook to see your actual metrics filled in!
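For reference, the three metrics in the table can be computed along these lines. This is a sketch with toy numbers, not output from the dataset; `regression_report` is a hypothetical helper name, not something the notebook is known to define.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

def regression_report(y_true, y_pred):
    """Return MAE and RMSE (both in seconds) plus R² for one model."""
    mae = mean_absolute_error(y_true, y_pred)
    rmse = np.sqrt(mean_squared_error(y_true, y_pred))
    r2 = r2_score(y_true, y_pred)
    return {"MAE": mae, "RMSE": rmse, "R2": r2}

# Toy values for illustration only, not results from the dataset.
y_true = np.array([90.0, 91.0, 92.0, 93.0])
y_pred = np.array([90.5, 90.5, 92.5, 92.5])

report = regression_report(y_true, y_pred)
print(report)
```

Calling the helper once per model on the same held-out laps gives the two columns of the table, and the improvement column follows from their difference.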
f1-lap-time-regression/
│
├── F1_LapTime_Regression.ipynb ← Main notebook (run this!)
├── f1_strategy_dataset_v4.csv ← Dataset
├── README.md ← You are here
└── plots/ ← Exported visualizations
    ├── plot1_lapdist.png
    ├── plot2_heatmap.png
    ├── plot3_degradation.png
    ├── plot4_raceprogress.png
    ├── plot5_simple_lr.png
    ├── plot6_multi_lr.png
    └── plot7_comparison.png
- Click the Open in Colab badge above
- Upload `f1_strategy_dataset_v4.csv` when prompted
- Run all cells (Runtime → Run all)
# Clone the repo
git clone https://github.com/YOUR_USERNAME/f1-lap-time-regression.git
cd f1-lap-time-regression
# Install dependencies
pip install pandas numpy matplotlib seaborn plotly scikit-learn
# Launch Jupyter
jupyter notebook F1_LapTime_Regression.ipynb

Requirements:
- pandas >= 1.5
- numpy >= 1.23
- matplotlib >= 3.6
- seaborn >= 0.12
- plotly >= 5.11
- scikit-learn >= 1.2
- Tyre degradation is real and measurable: lap times rise with tyre age, approximately linearly to a first approximation, and most strongly on SOFT compounds
- Multiple features significantly reduce prediction error vs a single-variable model
- Data leakage pitfall: `LapTime_Delta` and `Cumulative_Degradation` were excluded as they're derived directly from the target variable
- Safety car laps and weather transitions are the largest residual outliers; future models should flag these
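The leakage point above can be made concrete with a small sketch that drops the target-derived columns before training. The column names come from this README; the helper name `split_features_target` and the three-row frame are made up for illustration.

```python
import pandas as pd

TARGET = "LapTime"
# Columns this README names as derived directly from the target.
LEAKY_COLUMNS = ["LapTime_Delta", "Cumulative_Degradation"]

def split_features_target(df):
    """Return (X, y) with the target and leaky columns removed from X."""
    # errors="ignore" keeps this safe if a column is already absent.
    X = df.drop(columns=[TARGET, *LEAKY_COLUMNS], errors="ignore")
    y = df[TARGET]
    return X, y

# Tiny made-up frame for illustration only.
laps = pd.DataFrame({
    "TyreLife": [1, 2, 3],
    "LapTime": [90.1, 90.2, 90.4],
    "LapTime_Delta": [0.0, 0.1, 0.2],
    "Cumulative_Degradation": [0.0, 0.1, 0.3],
})

X, y = split_features_target(laps)
print(list(X.columns))
```

Leaving either column in `X` would let the model reconstruct the target almost exactly, which inflates R² without improving real predictions.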
- Add `TrackTemp_C`: grip varies massively with track temperature
- Add `FuelLoad_kg`: fuel burn improves lap time by ~0.03s/lap
- Add `IsSafetyCar` flag: removes major outliers
- Try Polynomial Regression: tyre degradation is non-linear
- Try Random Forest / XGBoost: handles interaction effects
- Integrate FastF1 API for real-time live race predictions
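As one possible starting point for the polynomial regression idea, the sketch below compares a straight-line fit with a degree-2 fit on synthetic laps that include an invented quadratic wear term. It is not part of the current notebook, and the scores are in-sample, so a real comparison would use a held-out split.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic stand-in data (NOT the real dataset): degradation that
# accelerates with tyre age via an invented quadratic term.
rng = np.random.default_rng(7)
tyre_life = rng.integers(1, 40, size=600)
lap_time = (
    90.0
    + 0.02 * tyre_life
    + 0.003 * tyre_life**2
    + rng.normal(0.0, 0.3, size=600)
)
X = tyre_life.reshape(-1, 1)

# Baseline: straight line in TyreLife.
linear = LinearRegression().fit(X, lap_time)

# Degree-2 model: adds a TyreLife² column, then fits ordinary least squares.
quadratic = make_pipeline(
    PolynomialFeatures(degree=2, include_bias=False),
    LinearRegression(),
).fit(X, lap_time)

r2_linear = linear.score(X, lap_time)
r2_quadratic = quadratic.score(X, lap_time)
print(f"R² linear:    {r2_linear:.3f}")
print(f"R² quadratic: {r2_quadratic:.3f}")
```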
Devendra Agrawal
LinkedIn | GitHub | Kaggle
This project is licensed under the MIT License.
Dataset used for educational purposes only.
Built as part of a Machine Learning coursework assignment.
F1 data analysis | Python | scikit-learn | 2025