About the dataset

This dataset contains 14 attributes: 13 features and one target. The target variable, MEDV, is the median value of owner-occupied homes in thousands of USD.

To load the training data in your Jupyter notebook, use the command below:

import pandas as pd
boston_data = pd.read_csv("https://raw.githubusercontent.com/dphi-official/Datasets/master/Boston_Housing/Training_set_boston.csv")

Data Description

CRIM: per capita crime rate by town
ZN: proportion of residential land zoned for lots over 25,000 sq.ft.
INDUS: proportion of non-retail business acres per town
CHAS: Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)
NOX: nitric oxides concentration (parts per 10 million)
RM: average number of rooms per dwelling
AGE: proportion of owner-occupied units built prior to 1940
DIS: weighted distances to five Boston employment centres
RAD: index of accessibility to radial highways
TAX: full-value property-tax rate per 10,000 USD
PTRATIO: pupil-teacher ratio by town
B: 1000(Bk - 0.63)^2 where Bk is the proportion of blacks by town
LSTAT: percentage of the population classed as lower status
MEDV: median value of owner-occupied homes in thousands of USD (the target variable)
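
Before modelling, it can help to confirm that a loaded frame actually carries the attributes listed above. The following is a minimal sketch, assuming the CSV headers match the attribute names in this description exactly; the names FEATURES, TARGET and missing_columns are hypothetical helpers introduced here for illustration, not part of the challenge materials.

```python
import pandas as pd

# Attribute names copied from the data description above.
# Assumption: the CSV headers use exactly these names.
FEATURES = ["CRIM", "ZN", "INDUS", "CHAS", "NOX", "RM", "AGE",
            "DIS", "RAD", "TAX", "PTRATIO", "B", "LSTAT"]
TARGET = "MEDV"


def missing_columns(df: pd.DataFrame, require_target: bool = True) -> list:
    """Return the expected column names that are absent from df."""
    expected = FEATURES + ([TARGET] if require_target else [])
    return [name for name in expected if name not in df.columns]
```

Calling missing_columns(boston_data) on the training frame should return an empty list if the assumption holds. For a file without the target column, pass require_target=False.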

Evaluation Dataset

Load the evaluation data and name it 'eval_data', using the command below:

eval_data = pd.read_csv('https://raw.githubusercontent.com/dphi-official/Datasets/master/Boston_Housing/Testing_set_boston.csv')

The target column (MEDV) is deliberately omitted from the evaluation data because you need to predict it.
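
One possible way to produce those predictions is a plain least-squares baseline: fit on the training data, then predict for the evaluation rows. The sketch below is an illustration under stated assumptions, not the method the challenge prescribes. It assumes all feature columns are numeric with no missing values and that both frames share the same feature columns. The function name fit_predict is hypothetical.

```python
import numpy as np
import pandas as pd


def fit_predict(train: pd.DataFrame, eval_df: pd.DataFrame,
                target: str = "MEDV") -> pd.Series:
    """Fit ordinary least squares on train and predict target for eval_df."""
    # Every training column except the target is treated as a feature.
    features = [c for c in train.columns if c != target]

    # Prepend a column of ones so the model learns an intercept.
    X_train = np.column_stack(
        [np.ones(len(train)), train[features].to_numpy(dtype=float)])
    X_eval = np.column_stack(
        [np.ones(len(eval_df)), eval_df[features].to_numpy(dtype=float)])
    y_train = train[target].to_numpy(dtype=float)

    # Least-squares coefficients; rcond=None selects NumPy's default cutoff.
    coef, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)

    return pd.Series(X_eval @ coef, index=eval_df.index, name=target)
```

With the two frames loaded by the commands above, fit_predict(boston_data, eval_data) would return one predicted MEDV value per evaluation row. A linear baseline like this is mainly a reference point against which to compare stronger models.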

Reference

This dataset is adapted from:

Harrison, David; Rubinfeld, Daniel. Hedonic housing prices and the demand for clean air. Journal of Environmental Economics and Management, Volume 5, Issue 1, March 1978, Pages 81-102. Available from Carnegie Mellon University, Statistics and Data Science: http://lib.stat.cmu.edu/datasets/boston.
