Housing Price Prediction Case Study

Author

Wil Jones, Savanna Whittaker, Mahonri Reynolds, Teghearnan Stevens, Kalina Warner

Published

May 30, 2025

Housing Price Prediction Case Study

Housing Price Prediction

This project focused on predicting home prices using machine learning models. The goal was to uncover the key features driving housing prices and to create a reliable model for real-world estimation.

Project Summary

This case study, conducted for Reddic Housing LLC, explored how location, property features, and engineered variables impact housing prices. Our final model blended XGBoost and Random Forest to maximize predictive accuracy.

Tools & Technologies

  • Python for scripting and modeling
  • Jupyter Notebook for interactive development
  • Pandas & NumPy for data cleaning and transformation
  • Scikit-learn and XGBoost for modeling
  • Matplotlib / Seaborn for visualizations
  • Quarto for documentation and report generation

Key Techniques Used

Feature Engineering
Added custom variables like luxury score, age, and view rating to enrich input data

Model Blending
Combined XGBoost and Random Forest regressors to improve stability and accuracy

Error Handling
Applied outlier smoothing and tested multiple metrics (RMSE, R², median error)

Location Awareness
Used latitude and longitude directly to proxy proximity to important amenities (parks, schools, etc.)

SQL Scenario Highlights

As part of the supporting analysis, we explored CSV data using filtering and logic conditions. Examples include:

  • Range Filtering: Only properties above/below certain price thresholds
  • Missing Data Checks: Identified records with missing view or year built values
  • Aggregation: Calculated average prices by region or view score
  • Join Simulation: Merged multiple CSV-derived frames for composite analysis

Results Summary

  • Median Error: $33,842.78
  • RMSE: $109,329.21
  • R² Score: 0.9108 (91% variance explained)

The combination of models and domain-based feature engineering led to strong performance on unseen test data.

Visuals


Figure 1. Predicted vs Actual Housing Prices


Figure 2. Most Influential Features

Supplementary Materials


View the full Colab Notebook for reproducible code and model walkthrough.