WenxiZhao126/Statistics-Project


Statistics-Project

UVA MSDS 2018 Fall 2018

Jiangxue Han (jh6rg), Shaoran Li (sl4bz), Jingyi Luo (jl6zh), Mengyao Zhang (mz6jv), Wenxi Zhao (wz8nx)

Overview

This project analyzes the features that drive housing prices in King County, Washington, United States. It searches for the statistical model that best explains the relationship between those features and housing price, and that can predict future housing prices. The dataset used in this study is obtained from Kaggle, an online data repository. Both discrete and continuous variables will be cleaned and prepared for model development. Statistical techniques such as stepwise-type procedures, various forms of transformation, and influential point analysis will be used for model development and for validating model adequacy.
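As an illustration of what influential point analysis involves, the following minimal sketch computes Cook's distance for a regression with a single predictor. It is a simplification under stated assumptions, not the project's actual code: the study uses many features, the function name `cooks_distances` is hypothetical, and the living-area and price values are made up for illustration.

```python
def cooks_distances(x, y):
    """Cook's distance for each observation in a simple linear regression y ~ x."""
    n = len(x)
    p = 2  # number of fitted parameters: intercept and slope
    x_mean = sum(x) / n
    y_mean = sum(y) / n

    # Ordinary least squares fit for one predictor.
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean

    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    mse = sum(e ** 2 for e in residuals) / (n - p)

    # Leverage of each observation (diagonal of the hat matrix).
    leverages = [1.0 / n + (xi - x_mean) ** 2 / sxx for xi in x]

    # D_i = e_i^2 / (p * MSE) * h_i / (1 - h_i)^2
    return [
        (e ** 2 / (p * mse)) * (h / (1.0 - h) ** 2)
        for e, h in zip(residuals, leverages)
    ]


# Hypothetical data: living area in square feet, price in thousands of USD.
# The last house is unusually large relative to the others.
living_area = [1000.0, 1500.0, 2000.0, 2500.0, 6000.0]
price = [300.0, 400.0, 500.0, 600.0, 700.0]

distances = cooks_distances(living_area, price)
most_influential = distances.index(max(distances))
print(most_influential, distances)
```

A common rule of thumb flags observations with a Cook's distance above 1 for closer inspection; whether such a point is removed, kept, or handled with a transformation is a modeling decision.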

Introduction

Housing price has been a popular topic for machine learning and regression techniques in recent years. People might not necessarily associate an ideal home with features such as the square footage of the basement or the latitude of the house; nevertheless, this study explores these features and closely inspects their relationship to housing price.

Data Description

The dataset used in this study is obtained from the Kaggle data repository and contains house sale prices for King County, Washington, United States, which includes Seattle. It covers sales in King County between May 2014 and May 2015, and the data are observational. There are 20 features, including both quantitative and qualitative variables. The response variable for prediction is housing price. Refer to Appendix, Table 1 for a detailed description of these variables.

Objectives

The objective of the study is to identify the most influential features that drive housing price in King County. Feature engineering and regression techniques will be explored, namely stepwise-type procedures, various forms of transformation, and influential point analysis. Patterns found in residual plots and influential points will be evaluated and addressed to arrive at a more adequate model. The model is evaluated on the root mean square error (RMSE) between its predictions and the true values. The expectation is that more than one feature will be relevant to housing price, and that the resulting statistical model will achieve an adequate R-squared and small squared residual errors when predicting housing price in King County. Patterns extracted from this dataset could potentially be extended to future studies of other US areas with features similar to those of King County, and used for housing price prediction there.
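The two evaluation measures named above, root mean square error and R-squared, can be computed as in the short sketch below. The helper names `rmse` and `r_squared` and the sale prices are assumptions made for illustration; they are not taken from the project or the dataset.

```python
import math


def rmse(actual, predicted):
    """Root mean square error between observed and predicted values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(actual))


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Hypothetical sale prices in USD (illustrative values only).
actual_prices = [300000.0, 450000.0, 520000.0, 610000.0]
predicted_prices = [310000.0, 440000.0, 500000.0, 630000.0]

model_rmse = rmse(actual_prices, predicted_prices)
model_r2 = r_squared(actual_prices, predicted_prices)
print(f"RMSE: {model_rmse:.2f}, R-squared: {model_r2:.4f}")
```

RMSE is expressed in the same units as the response (here, dollars), which makes it easy to interpret as a typical prediction error, while R-squared is unitless and describes the share of price variation the model explains.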
