pranjalisr/BulldozerPricePrediction

Bulldozer Price Regression (End-to-End ML Project)

This project is an end-to-end machine learning workflow to predict the sale price of bulldozers using historical auction data.

It is based on the Bluebook for Bulldozers Kaggle competition and follows a complete ML pipeline — from data preprocessing to model evaluation and prediction.

Project Objective

The goal of this project is:

To predict the future sale price of a bulldozer based on its characteristics and past sales data.

Since the output is a continuous value, this is a regression problem.
Additionally, because the data includes time-based features (sale dates), it also involves time series forecasting concepts.


Dataset

The dataset comes from the Kaggle competition and consists of:

  • Train.csv → Historical data up to 2011
  • Valid.csv → Validation data (Jan–Apr 2012)
  • Test.csv → Test data (May–Nov 2012, without target)

Key Features

  • Machine specifications (ModelID, ProductSize, etc.)
  • Usage data (MachineHoursCurrentMeter)
  • Sale information (state, auctioneerID)
  • Time-based feature (saledate)

Target Variable

  • SalePrice

Machine Learning Workflow

This project follows a structured ML pipeline:

1. Problem Definition

Predict bulldozer sale price using historical data.

2. Data Exploration

  • Load dataset with Pandas
  • Understand structure, missing values, and data types
  • Identify key features
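
As a rough illustration of the exploration step, the sketch below loads a tiny in-memory CSV with Pandas, parses the sale date, sorts by it, and counts missing values. The three rows are made-up stand-ins for Train.csv (and use ISO dates for simplicity); only the column names come from the dataset description above.

```python
import io

import pandas as pd

# Made-up sample rows standing in for Train.csv (assumption: the real file
# has many more columns and rows, and its date format may differ).
sample_csv = io.StringIO(
    "SalesID,SalePrice,YearMade,MachineHoursCurrentMeter,state,saledate\n"
    "1,66000,2004,68.0,Alabama,2006-11-16\n"
    "2,57000,1996,,North Carolina,2004-03-26\n"
    "3,10000,2001,2838.0,,2004-02-26\n"
)

# Parse saledate as a datetime so it can be sorted and split into parts later.
df = pd.read_csv(sample_csv, parse_dates=["saledate"])

# Sorting by date keeps the time order explicit for later train/validation splits.
df = df.sort_values("saledate").reset_index(drop=True)

# Structure, data types, and missing values per column.
print(df.dtypes)
missing_counts = df.isna().sum()
print(missing_counts)
```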

3. Feature Engineering

  • Convert saledate into:
    • Year
    • Month
    • Day
    • Day of week
  • Handle missing values:
    • Numerical → filled with median
    • Categorical → converted to numerical codes
  • Add missing-value indicator columns
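
The feature engineering steps above can be sketched roughly as follows. This is a minimal sketch, not necessarily the notebook's exact code: the helper names (add_date_parts, fill_and_encode), the new column names (saleYear, the _is_missing suffix), and the choice to shift category codes by one so that missing values become 0 are all assumptions made for illustration.

```python
import pandas as pd


def add_date_parts(df, date_col="saledate"):
    """Return a copy of df with the date column expanded into numeric parts."""
    out = df.copy()
    dates = pd.to_datetime(out[date_col])
    out["saleYear"] = dates.dt.year
    out["saleMonth"] = dates.dt.month
    out["saleDay"] = dates.dt.day
    out["saleDayOfWeek"] = dates.dt.dayofweek  # Monday is 0
    return out.drop(columns=[date_col])


def fill_and_encode(df):
    """Fill numeric gaps with the median and turn other columns into codes."""
    out = df.copy()
    for col in list(out.columns):  # snapshot, since new columns are added below
        if pd.api.types.is_numeric_dtype(out[col]):
            if out[col].isna().any():
                # Record where the value was missing before filling it.
                out[col + "_is_missing"] = out[col].isna()
                out[col] = out[col].fillna(out[col].median())
        else:
            out[col + "_is_missing"] = out[col].isna()
            # Categorical codes use -1 for missing; adding 1 maps missing to 0.
            out[col] = pd.Categorical(out[col]).codes + 1
    return out


# Made-up example rows using column names from the dataset description.
raw = pd.DataFrame(
    {
        "saledate": ["2011-03-15", "2010-12-01"],
        "MachineHoursCurrentMeter": [1200.0, None],
        "ProductSize": ["Medium", None],
    }
)
features = fill_and_encode(add_date_parts(raw))
print(features)
```

In a real run, the medians and category mappings would normally be computed on the training data only and then reused for validation and test data, so that no information leaks across the time split.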

4. Data Preprocessing

  • Convert categorical variables into numeric format
  • Ensure training and test data have the same feature structure
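
One way to make the feature structure match is to reindex the test frame onto the training columns. The sketch below assumes this approach; the helper name align_to_training and the example columns are hypothetical. A typical mismatch is an indicator column that exists only in the training data because the test data had no missing values in that field.

```python
import pandas as pd


def align_to_training(train_X, other_X, fill_value=0):
    """Give other_X the same columns, in the same order, as train_X.

    Columns missing from other_X are created and filled with fill_value;
    columns that train_X does not have are dropped.
    """
    return other_X.reindex(columns=train_X.columns, fill_value=fill_value)


# Hypothetical preprocessed frames with mismatched columns.
train_X = pd.DataFrame(
    {"YearMade": [2004], "auctioneerID_is_missing": [False], "saleYear": [2011]}
)
test_X = pd.DataFrame({"saleYear": [2012], "YearMade": [2006], "extra": [1]})

aligned = align_to_training(train_X, test_X)
print(aligned)
```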

5. Model Building

  • Model used: RandomForestRegressor
  • Reason:
    • Works well on structured/tabular data
    • Handles non-linear relationships
    • No need for feature scaling
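
A minimal sketch of fitting the model, using synthetic data in place of the preprocessed bulldozer features. The hyperparameter values here are illustrative assumptions, not the project's settings. The target depends on the first feature quadratically, and the features are left unscaled, which a random forest handles without extra preparation.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in for the preprocessed feature matrix and SalePrice.
rng = np.random.default_rng(42)
X = rng.uniform(0, 10, size=(200, 3))
y = 1000 + 50 * X[:, 0] ** 2 + rng.normal(0, 10, size=200)

# random_state makes the fit repeatable; n_estimators is an arbitrary choice.
model = RandomForestRegressor(n_estimators=50, random_state=42)
model.fit(X, y)

# score() returns R^2; on the training data it is optimistic by construction.
train_r2 = model.score(X, y)
print(f"Training R^2: {train_r2:.3f}")
```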

6. Model Evaluation

The evaluation metric used is:

  • RMSLE (Root Mean Squared Log Error)

Why RMSLE?

  • Penalizes relative (percentage) errors rather than absolute dollar differences
  • Suitable for price prediction problems with wide value ranges

Other metrics:

  • MAE (Mean Absolute Error)
  • R² Score
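
The metrics above can be computed as in the sketch below. The rmsle helper is written here with NumPy's log1p, which matches the usual definition based on log(1 + value); the prices are made-up numbers chosen to show that RMSLE treats equal percentage errors alike, while the same dollar error counts for more on a cheap machine.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, r2_score


def rmsle(y_true, y_pred):
    """Root mean squared log error, using log(1 + x) on both inputs."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((np.log1p(y_pred) - np.log1p(y_true)) ** 2)))


# Made-up prices: both predictions are 10% too high.
y_true = [10_000, 100_000]
y_pred = [11_000, 110_000]

print("RMSLE:", rmsle(y_true, y_pred))
print("MAE:  ", mean_absolute_error(y_true, y_pred))
print("R^2:  ", r2_score(y_true, y_pred))

# The same 10,000 absolute error is penalized more on the cheaper machine.
cheap_error = rmsle([10_000], [20_000])
expensive_error = rmsle([100_000], [110_000])
```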

Model Optimization

  • Used RandomizedSearchCV for hyperparameter tuning
  • Tuned parameters such as:
    • n_estimators
    • max_depth
    • min_samples_split
    • max_features

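Randomized search evaluates a fixed number of sampled parameter combinations instead of the full grid, which improves model performance while keeping training time manageable.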
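
A rough sketch of the tuning step on synthetic data. The candidate values, n_iter, and cv settings are assumptions picked to keep the example fast, not the values used in the project. Plain cross-validation is used here for brevity; with time-ordered data, scoring candidates on the separate validation period is a reasonable alternative.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV

# Synthetic stand-in for the preprocessed training data.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(120, 4))
y = 500 + 30 * X[:, 0] + 5 * X[:, 1] ** 2 + rng.normal(0, 5, size=120)

# Candidate values for the parameters named above (illustrative only).
param_distributions = {
    "n_estimators": [10, 20, 40],
    "max_depth": [None, 3, 5, 10],
    "min_samples_split": [2, 4, 8],
    "max_features": [0.5, 1.0, "sqrt"],
}

# n_iter caps how many combinations are tried (5 out of 108 possible here).
search = RandomizedSearchCV(
    RandomForestRegressor(random_state=0),
    param_distributions=param_distributions,
    n_iter=5,
    cv=3,
    random_state=0,
)
search.fit(X, y)

best_params = search.best_params_
print(best_params)
```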


Predictions

  • Preprocessed test data using the same pipeline
  • Matched feature columns with training data
  • Generated predictions using the trained model
  • Created submission file with:
    • SalesID
    • Predicted SalePrice
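
Building the submission file could look roughly like this. The helper name make_submission, the IDs, and the predicted prices are hypothetical placeholders; in the real workflow the predictions would come from the trained model applied to the preprocessed Test.csv.

```python
import io

import pandas as pd


def make_submission(sales_ids, predictions):
    """Pair each SalesID with its predicted SalePrice."""
    return pd.DataFrame(
        {"SalesID": list(sales_ids), "SalePrice": list(predictions)}
    )


# Placeholder IDs and predictions, for illustration only.
submission = make_submission([101, 102], [17500.0, 22000.0])

# Written to an in-memory buffer here; pass a file path to save to disk instead.
buffer = io.StringIO()
submission.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text)
```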

Contact

If you have any suggestions or feedback, feel free to connect!

