Student: Anusara Ranasinghe
Index Number: ADAI2401014
Qualification: DLC Higher Diploma in AI and Data Science
Module: AID232 - Machine Learning
Lecturer: Mr. Ravidu Bandara
Ames Housing Dataset — Dean De Cock (2011)
Source: https://www.kaggle.com/datasets/prevek18/ames-housing-dataset
Records: 2,930 residential sales from Ames, Iowa (2006-2010)
Features: 80 physical and locational attributes
This is a supervised regression task: predict the sale price of residential houses from 80 physical and locational features. The dataset contains 2,930 records of residential sales in Ames, Iowa (2006-2010).
Task (T): Predict sale price (continuous USD value)
Experience (E): 2,930 labeled house sale records with 80 features
Performance (P): RMSE and R-squared on a held-out 20% test set
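The evaluation protocol above can be sketched as follows. This is a minimal illustration, not the code in src/model.py: the helper name `evaluate_holdout`, the random seed, and the synthetic stand-in data are all assumptions made for the example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split


def evaluate_holdout(model, X, y, test_size=0.2, seed=42):
    """Fit on 80% of the data and report RMSE and R-squared on the held-out 20%.

    Hypothetical helper for illustration; the seed is an arbitrary choice.
    """
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=seed
    )
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    # RMSE is the square root of the mean squared error, in the target's units
    rmse = float(np.sqrt(mean_squared_error(y_test, pred)))
    r2 = float(r2_score(y_test, pred))
    return rmse, r2


# Synthetic stand-in data (NOT the Ames dataset): a linear target plus noise
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
y = X @ np.array([3.0, -2.0, 0.0, 1.5, 0.0]) + rng.normal(scale=0.1, size=500)

rmse, r2 = evaluate_holdout(LinearRegression(), X, y)
print(f"RMSE: {rmse:.3f}  R-squared: {r2:.3f}")
```

RMSE is expressed in the same unit as the target (USD for sale price), while R-squared is unitless, so the two are usually read together.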
Primary model: Lasso Regression (L1 Regularisation)
Compared against: Linear Regression, Ridge Regression (L2), Polynomial Regression (degree=2)
Selection rationale: Lasso performs automatic feature selection by driving the coefficients of weak features to exactly zero, which is particularly useful with the 200+ one-hot encoded features in this dataset. This reduces the risk of overfitting while keeping the model interpretable.
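The feature-selection effect described above can be seen in a small sketch. It uses synthetic data rather than the Ames dataset, and the `alpha` value, sample sizes, and coefficients are assumptions chosen only to make the effect visible; they are not the tuned values from this project.

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data: 50 features, of which only the first 5 influence the target
rng = np.random.default_rng(0)
n_samples, n_features, n_informative = 300, 50, 5
X = rng.normal(size=(n_samples, n_features))
true_coef = np.zeros(n_features)
true_coef[:n_informative] = [5.0, -4.0, 3.0, -2.0, 1.0]
y = X @ true_coef + rng.normal(scale=0.5, size=n_samples)

# Scale features first so the L1 penalty treats every feature equally
lasso = make_pipeline(StandardScaler(), Lasso(alpha=0.1))  # alpha is illustrative
ols = make_pipeline(StandardScaler(), LinearRegression())
lasso.fit(X, y)
ols.fit(X, y)

lasso_coef = lasso.named_steps["lasso"].coef_
ols_coef = ols.named_steps["linearregression"].coef_

# Count coefficients that are exactly zero under each model
n_zero_lasso = int(np.sum(lasso_coef == 0))
n_zero_ols = int(np.sum(ols_coef == 0))
print(f"Zero coefficients - Lasso: {n_zero_lasso}, Linear Regression: {n_zero_ols}")
```

Plain linear regression assigns a small non-zero weight to every noise feature, whereas the L1 penalty removes most of them outright. Ridge (L2) would shrink those weights but generally not set them to exactly zero.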
Why not other algorithms?
- Logistic Regression: classification algorithm, cannot predict a continuous price
- SVM: only the classification variant was covered in the module
- Naive Bayes: classification algorithm, incompatible with regression output
- Open the project in PyCharm Professional (or any Python IDE)
- Install dependencies:
  pip install pandas numpy matplotlib seaborn scikit-learn scipy
- Place AmesHousing.csv inside the data/ folder
- Run the main script:
  python src/model.py
  Or in PyCharm: Right-click src/model.py -> Run
- To explore the step-by-step notebooks, open the notebooks/ folder and run the cells in order:
  - 01_EDA.ipynb: Exploratory Data Analysis
  - 02_Preprocessing.ipynb: Data cleaning and preparation
  - 03_ModelBuilding.ipynb: Train and tune all 4 models
  - 04_Evaluation.ipynb: Results visualisation and comparison
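For orientation, the loading and one-hot encoding stage that the steps above lead into might look roughly like this. It is a sketch under stated assumptions, not the contents of src/model.py or the notebooks: the helper `prepare_features`, the target column name `SalePrice`, and the median imputation are all assumptions made for illustration.

```python
from pathlib import Path

import pandas as pd

DATA_PATH = Path("data") / "AmesHousing.csv"  # location from the setup steps
TARGET = "SalePrice"  # assumed name of the target column


def prepare_features(df, target=TARGET):
    """Split off the target, impute numeric gaps, and one-hot encode categoricals.

    Hypothetical helper; the project's actual preprocessing may differ.
    """
    y = df[target]
    X = df.drop(columns=[target])
    # Fill missing numeric values with each column's median (an assumed strategy)
    num_cols = X.select_dtypes(include="number").columns
    X[num_cols] = X[num_cols].fillna(X[num_cols].median())
    # Expand every categorical column into 0/1 indicator columns
    X = pd.get_dummies(X)
    return X, y


# Only runs when the CSV has been placed in data/ as described above
if DATA_PATH.exists():
    features, prices = prepare_features(pd.read_csv(DATA_PATH))
    print(features.shape)
```

One-hot encoding is what expands the original attributes into the much larger feature count that motivates the choice of Lasso.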
HousePricePrediction/
├── data/
│ └── AmesHousing.csv
├── notebooks/
│ ├── 01_EDA.ipynb
│ ├── 02_Preprocessing.ipynb
│ ├── 03_ModelBuilding.ipynb
│ └── 04_Evaluation.ipynb
├── src/
│ └── model.py
├── outputs/ (generated charts)
├── presentation/
│ └── slides.pptx
└── README.md