Bulldozer Price Regression (End-to-End ML Project)

This project is an end-to-end machine learning workflow that predicts the sale price of bulldozers from historical auction data.

It is based on the Bluebook for Bulldozers Kaggle competition and covers the full ML pipeline, from data preprocessing through model evaluation and prediction.

Project Objective

The goal of this project is:

To predict the future sale price of a bulldozer based on its characteristics and past sales data.

Since the output is a continuous value, this is a regression problem.
Because every sale has a date and the model is trained on past sales to predict later ones, it also draws on time series forecasting concepts.

Dataset

The dataset comes from the Kaggle competition and consists of:

Train.csv → Historical data through the end of 2011
Valid.csv → Validation data (Jan–Apr 2012)
Test.csv → Test data (May–Nov 2012, without the target column)

Key Features

Machine specifications (ModelID, ProductSize, etc.)
Usage data (MachineHoursCurrentMeter)
Sale information (state, auctioneerID)
Time-based feature (saledate)

Target Variable

SalePrice

Machine Learning Workflow

This project follows a structured ML pipeline:

1. Problem Definition

Predict bulldozer sale price using historical data.

2. Data Exploration

Load dataset with Pandas
Understand structure, missing values, and data types
Identify key features
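
The exploration step can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the inline CSV is a made-up stand-in for Train.csv with only a handful of columns, and the variable names are assumptions.

```python
import io

import pandas as pd

# Made-up stand-in for Train.csv; the real file has far more rows and columns.
sample_csv = io.StringIO(
    "SalesID,SalePrice,MachineHoursCurrentMeter,state,saledate\n"
    "1,66000,68.0,Alabama,11/16/2006 0:00\n"
    "2,57000,,North Carolina,3/26/2004 0:00\n"
    "3,10000,2838.0,,2/26/2004 0:00\n"
)

# parse_dates turns saledate into a datetime column instead of plain text.
df = pd.read_csv(sample_csv, parse_dates=["saledate"])

# Sorting by date keeps the rows in chronological order for later steps.
df = df.sort_values("saledate").reset_index(drop=True)

# Structure, data types, and missing values per column.
print(df.dtypes)
missing_per_column = df.isna().sum()
print(missing_per_column)
```

With the real data, the same calls would be pointed at the path of Train.csv instead of the in-memory sample.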

3. Feature Engineering

Convert saledate into:
- Year
- Month
- Day
- Day of week
Handle missing values:
- Numerical → filled with median
- Categorical → converted to numerical codes
Add missing-value indicator columns
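
The feature engineering steps above might look like the sketch below. The function name `preprocess`, the derived column names (`saleYear`, `_is_missing`, and so on), and the sample rows are assumptions made for illustration; the notebook may organize this differently.

```python
import pandas as pd


def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Expand saledate, fill missing values, and encode categories (hypothetical helper)."""
    df = df.copy()

    # Split the sale date into separate numeric features, then drop the original.
    df["saleYear"] = df["saledate"].dt.year
    df["saleMonth"] = df["saledate"].dt.month
    df["saleDay"] = df["saledate"].dt.day
    df["saleDayOfWeek"] = df["saledate"].dt.dayofweek
    df = df.drop(columns="saledate")

    for name in list(df.columns):
        col = df[name]
        if pd.api.types.is_numeric_dtype(col):
            if col.isna().any():
                # Record where values were missing, then fill with the median.
                df[name + "_is_missing"] = col.isna()
                df[name] = col.fillna(col.median())
        else:
            df[name + "_is_missing"] = col.isna()
            # Category codes are -1 for missing values; adding 1 maps missing to 0.
            df[name] = pd.Categorical(col).codes + 1
    return df


raw = pd.DataFrame(
    {
        "saledate": pd.to_datetime(["2006-11-16", "2004-03-26", "2004-02-26"]),
        "MachineHoursCurrentMeter": [68.0, None, 2838.0],
        "state": ["Alabama", None, "Texas"],
    }
)
features = preprocess(raw)
print(features)
```

One design point worth noting: this sketch computes medians and category codes from whatever frame it is given. Applying it separately to training and validation data can produce different fill values and codes, so reusing the training set's values is usually the safer choice.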

4. Data Preprocessing

Convert categorical variables into numeric format
Ensure training and test data have the same feature structure
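
One possible way to keep the feature structure identical is to reindex the test frame against the training columns. The frames below are tiny invented examples, and the column `auctioneerID_is_missing` is assumed to be an indicator that exists only in the training data because the test data had no missing values in that column.

```python
import pandas as pd

# Invented training and test features for illustration.
train_X = pd.DataFrame(
    {
        "YearMade": [1999, 2004],
        "auctioneerID": [1.0, 2.0],
        "auctioneerID_is_missing": [False, True],
    }
)
test_X = pd.DataFrame({"auctioneerID": [3.0], "YearMade": [2001]})

# Reorder test columns to match training and add any absent column,
# filled with False ("nothing was missing here").
test_aligned = test_X.reindex(columns=train_X.columns, fill_value=False)
print(test_aligned)
```

Reindexing also drops any column that appears only in the test frame, so the model always sees the columns it was trained on, in the same order.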

5. Model Building

Model used: RandomForestRegressor
Reason:
- Works well on structured/tabular data
- Handles non-linear relationships
- No need for feature scaling
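
As a rough illustration of the model-building step, the sketch below fits a RandomForestRegressor on synthetic data. The data, the non-linear target, and the hyperparameter values are invented for the example and are not the project's settings.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in for the preprocessed features and sale prices.
rng = np.random.default_rng(42)
X_train = rng.uniform(0, 10, size=(200, 3))
# Non-linear target: the forest needs no scaling or manual transformation.
y_train = 1000 + 50 * X_train[:, 0] ** 2 + rng.normal(0, 5, size=200)

model = RandomForestRegressor(n_estimators=50, random_state=42)
model.fit(X_train, y_train)

preds = model.predict(X_train[:5])
print(preds)
```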

6. Model Evaluation

The evaluation metric used is:

RMSLE (Root Mean Squared Log Error)

Why RMSLE?

Penalizes relative (percentage) errors rather than absolute ones, so a $1,000 miss on a cheap machine counts for more than the same miss on an expensive one
Suitable for price prediction problems with wide value ranges

Other metrics:

MAE (Mean Absolute Error)
R² Score
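
The three metrics can be computed along these lines. The `rmsle` helper and the two example prices are assumptions for illustration; the helper uses log(1 + x), which is the common definition of RMSLE.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, r2_score


def rmsle(y_true, y_pred):
    """Root mean squared log error, using log(1 + x) on both arrays."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((np.log1p(y_pred) - np.log1p(y_true)) ** 2)))


# Two invented sales, both over-predicted by 10%.
y_true = [10_000, 100_000]
y_pred = [11_000, 110_000]

score_rmsle = rmsle(y_true, y_pred)
score_mae = mean_absolute_error(y_true, y_pred)
score_r2 = r2_score(y_true, y_pred)
print(score_rmsle, score_mae, score_r2)
```

Both predictions are off by the same ratio, so each contributes the same log error to RMSLE, while MAE is dominated by the more expensive machine.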

Model Optimization

Used RandomizedSearchCV for hyperparameter tuning
Tuned parameters such as:
- n_estimators
- max_depth
- min_samples_split
- max_features

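Randomized search samples a fixed number of parameter combinations instead of trying every one, which keeps training time manageable while still giving the model a chance to improve on the default settings.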
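
A minimal sketch of the tuning step follows. The candidate values, `n_iter`, `cv`, and the synthetic data are placeholders chosen so the example runs quickly; they are not the values used in the project.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV

# Synthetic stand-in for the preprocessed training data.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(120, 4))
y = 20 * X[:, 0] + 5 * X[:, 1] ** 2 + rng.normal(0, 1, size=120)

# Candidate values for the parameters listed above (illustrative only).
param_distributions = {
    "n_estimators": [10, 20, 40],
    "max_depth": [None, 3, 5, 10],
    "min_samples_split": [2, 4, 8],
    "max_features": [0.5, 1.0, "sqrt"],
}

search = RandomizedSearchCV(
    RandomForestRegressor(random_state=0),
    param_distributions=param_distributions,
    n_iter=5,  # number of random combinations to try
    cv=3,  # 3-fold cross-validation per combination
    random_state=0,
)
search.fit(X, y)

best_params = search.best_params_
print(best_params)
```

After fitting, `search.best_estimator_` holds a model refit on all of `X` and `y` with the best combination found.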

Predictions

Preprocessed test data using the same pipeline
Matched feature columns with training data
Generated predictions using the trained model
Created submission file with:
- SalesID
- Predicted SalePrice
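
Building the submission file could look like the sketch below. The IDs and predicted prices are invented, and the output is written to an in-memory buffer so the example does not create a file; a real run would pass a file path to `to_csv` instead.

```python
import io

import pandas as pd

# Invented IDs and predictions standing in for the real test set output.
test_sales_ids = [1227829, 1227844]
test_preds = [21500.0, 18750.5]

submission = pd.DataFrame({"SalesID": test_sales_ids, "SalePrice": test_preds})

# index=False keeps the row index out of the CSV.
buffer = io.StringIO()
submission.to_csv(buffer, index=False)
csv_lines = buffer.getvalue().splitlines()
print(csv_lines)
```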

Contact

If you have any suggestions or feedback, feel free to connect!
