Predict the House Prices - King County
We will be working with a dataset of house sale prices in King County. As a data scientist, you are responsible for building a machine learning model that predicts the future sale price of each house from a set of input variables. The target variable in this dataset is 'price'. You are also given a new, unseen test dataset on which you must predict the price of each house.
Submissions are evaluated on the Root-Mean-Squared-Error (RMSE) between your model's predicted values and the true sale prices on the new, unseen test dataset mentioned under the submission guidelines below.
How is the evaluation done?
Once you generate and submit the target variable predictions on the evaluation dataset, your submission is compared with the true values of the target variable.
The true (actual) values of the target variable are hidden on the DPhi Practice platform so that we can evaluate your model's performance on the evaluation data. Finally, the Root-Mean-Squared-Error (RMSE) for your model is computed and displayed.
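As a rough illustration of the metric: RMSE is the square root of the mean of the squared differences between predicted and true values, so lower is better and large errors are penalized more heavily. The sketch below uses only the Python standard library. The function name `rmse` and the sample prices are illustrative assumptions, not the platform's actual scoring code.

```python
import math

def rmse(y_true, y_pred):
    """Root-mean-squared error between two equal-length sequences of numbers."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("inputs must not be empty")
    # Square each residual, average the squares, then take the square root.
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Made-up prices, for illustration only (not taken from the dataset).
actual = [300000.0, 450000.0, 520000.0]
predicted = [310000.0, 440000.0, 500000.0]
score = rmse(actual, predicted)
print(f"RMSE: {score:.2f}")
```

Because RMSE is expressed in the same unit as the target, a score can be read directly as a typical prediction error in price units.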
About the dataset
This dataset contains the sale prices of houses in King County (Seattle). It includes homes sold between May 2014 and May 2015.
To load the dataset in your Jupyter notebook, use the commands below:
import pandas as pd
house_data = pd.read_csv('https://raw.githubusercontent.com/dphi-official/Datasets/master/kc_house_data/kc_house_data.csv')
Before doing anything else, we should understand what the dataset contains: what its features are and how the data is structured.
- price: the price of the house. This is our target variable.
- bedrooms: Number of bedrooms
- bathrooms: Number of bathrooms
- sqft_living: Square footage of the house
- sqft_lot: Square footage of the lot
- floors: Number of floors (levels)
- waterfront: 1 = Waterfront view; 0 = No waterfront view
- view: Rating of the view from the property, from 0 (no view) to 4 (excellent view)
- condition: Overall condition of the property, from 1 (worn out) to 5 (excellent)
- grade: Overall grade given to the housing unit, based on the King County grading system, from 1 (poor) to 13 (excellent)
- sqft_above: Square footage of house apart from the basement
- sqft_below: Square footage of the basement
- yr_built: Year the house was built
- yr_renovated: Year the house was renovated
- zipcode: Zipcode
- lat: Latitude coordinate
- long: Longitude coordinate
- sqft_living15: Square footage of the house in 2015 (implies some renovations)
- sqft_lot15: Square footage of the lot in 2015 (implies some renovations)
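To get a first look at the structure described above, a small helper along the following lines can be useful. This is only a sketch: `describe_structure` is a hypothetical helper name, and the three-row frame is made-up data so the example runs without downloading anything.

```python
import pandas as pd

def describe_structure(df: pd.DataFrame) -> pd.DataFrame:
    """Return one row per column: dtype, missing-value count, distinct-value count."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "n_missing": df.isna().sum(),
        "n_unique": df.nunique(),
    })

# Tiny made-up frame standing in for house_data (illustrative values only).
sample = pd.DataFrame({
    "price": [221900.0, 538000.0, 180000.0],
    "bedrooms": [3, 3, 2],
    "waterfront": [0, 0, 0],
})

summary = describe_structure(sample)
print(summary)            # per-column dtype, missing values, distinct values
print(sample.describe())  # count, mean, std, min, quartiles, max per numeric column
```

With the real data loaded, the same call would be `describe_structure(house_data)`; `house_data.head()` and `house_data.shape` are also quick ways to check the first rows and the number of rows and columns.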
Load the evaluation dataset (name it 'eval_data'). You can load the data using the command below.
eval_data = pd.read_csv('https://raw.githubusercontent.com/dphi-official/Datasets/master/kc_house_data/kc_house_new_test_data.csv')
The price column is deliberately absent from this dataset, because it is what you need to predict.
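One possible way to go from training data to predictions on rows without a price is sketched below. It fits a plain ordinary-least-squares baseline with NumPy, which is an assumption chosen for brevity and not a method prescribed by the challenge. The helper names, the two chosen features, the made-up rows, and the single-column 'price' output layout are all hypothetical; check the submission guidelines for the required format.

```python
import numpy as np
import pandas as pd

def fit_linear_baseline(train, features, target="price"):
    """Fit ordinary least squares with an intercept; return the coefficient vector."""
    X = np.column_stack([np.ones(len(train)), train[features].to_numpy(dtype=float)])
    y = train[target].to_numpy(dtype=float)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def predict_linear_baseline(coef, data, features):
    """Apply fitted coefficients to rows that have the same feature columns."""
    X = np.column_stack([np.ones(len(data)), data[features].to_numpy(dtype=float)])
    return X @ coef

# Made-up rows standing in for house_data and eval_data (illustrative only).
train = pd.DataFrame({
    "sqft_living": [1000, 1500, 2000, 2500],
    "bedrooms": [2, 3, 3, 4],
    "price": [200000.0, 300000.0, 400000.0, 500000.0],
})
new_rows = pd.DataFrame({"sqft_living": [1800], "bedrooms": [3]})  # no price column

features = ["sqft_living", "bedrooms"]
coef = fit_linear_baseline(train, features)

# Assumed output layout: one 'price' prediction per evaluation row.
submission = pd.DataFrame({"price": predict_linear_baseline(coef, new_rows, features)})
print(submission)
```

With the real data, `train` would be `house_data` and `new_rows` would be `eval_data`. A linear baseline like this mainly gives a reference RMSE against which stronger models can be compared.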
This dataset was downloaded from Kaggle.
To participate in this challenge, you must either create a team of at least members or join an existing team