Data Sprint #23: Used Cars Prices
What determines the price of a used car?
The value of a car starts dropping right from the date it is bought. The depreciation continues with each passing year. In fact, in the first year, the value of the car drops by 20% of the price at which it was bought.
According to BankBazaar, this figure reflects the rate at which the vehicle as a whole depreciates with age.
Car owners need to know what their vehicle is worth, both when selling it and when buying suitable insurance cover for it. An accurate valuation helps ensure that you get the best price for the vehicle.
Most car buyers appreciate honesty and are more likely to buy a car when they think that they would save some money through the deal. So, it is crucial to price a used car accurately by identifying a fair price for that model.
As a data scientist, use the power of data science to calculate a fair price for a car.
Build a machine learning model to calculate a fair price for the given car.
What will you learn?
- Exploratory Data Analysis (Learn it here)
- Data Preparation (Learn it here)
- Natural Language Processing (Learn it here)
- Supervised Learning Algorithms - Regression (Learn it here)
Submissions are evaluated using the Root Mean Squared Error (RMSE).
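For reference, RMSE is the square root of the mean of the squared differences between actual and predicted values, so a lower score is better and large errors are penalised more heavily than small ones. The snippet below is a minimal, dependency-free sketch of that calculation; the function name `rmse` and the example prices are invented for illustration and are not part of the official evaluation code.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if len(actual) == 0:
        raise ValueError("inputs must not be empty")
    # Square each error, average the squares, then take the square root.
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Invented prices, for illustration only.
actual_prices = [10000, 15000, 20000]
predicted_prices = [11000, 14000, 22000]
score = rmse(actual_prices, predicted_prices)
print(score)
```

Here the errors are 1000, 1000, and 2000, so the single 2000 error contributes four times as much to the mean squared error as each 1000 error.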
How do we do it?
Once we release the data, anyone can download it, build a model, and make a submission. We give competitors a set of data (training data) with both the independent and dependent variables.
We also release another set of data (test dataset) with just the independent variables, and we hide the dependent variable that corresponds with this set. You submit the predicted values of the dependent variable for this set and we compare them against the actual values.
The predictions are evaluated based on the evaluation metric defined in the Datathon.
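The train/test workflow described above can be sketched with a deliberately naive baseline: learn something from the training data (which includes the target) and produce one prediction per row of the test data (which does not). This is only an illustration under stated assumptions. The toy rows are invented, only the column names `year`, `odometer`, and `car_price` come from the dataset description, and predicting the mean training price is a placeholder for a real regression model.

```python
import pandas as pd

# Hypothetical toy data; the values are invented for illustration.
train = pd.DataFrame({
    "year": [2010, 2015, 2018, 2020],
    "odometer": [120000, 80000, 40000, 15000],
    "car_price": [4000, 9000, 16000, 25000],  # dependent variable (target)
})
test = pd.DataFrame({
    "year": [2012, 2019],
    "odometer": [100000, 30000],
    # no car_price column: this is what has to be predicted
})

# Baseline "model": predict the mean training price for every test row.
baseline_price = train["car_price"].mean()
predictions = pd.Series(baseline_price, index=test.index, name="prediction")
print(predictions)
```

Any real model should beat this baseline on RMSE; if it does not, that usually points to a problem in data preparation or feature handling.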
About the Data
The dataset contains all relevant information that Craigslist provides on car sales including columns like price, condition, manufacturer, and more. Some of the features are listed below:
- year: entry year of the car
- car_price: entry price of the car (the target variable)
- manufacturer: manufacturer of the vehicle
- model_name: the model of the car that is listed
- fuel_type: fuel type that the listed car supports
- #cylinders: number of cylinders
- odometer: miles travelled by the vehicle
- title_status: title status of the vehicle
- vin: vehicle identification number
- drive: type of drive of the vehicle
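Listing data of this kind often mixes numeric and text columns and may contain gaps, so a quick first check is to count missing values and see which columns are numeric. The sketch below uses a small invented table; only the column names are taken from the list above, and whether the real dataset has missing values in these columns is an assumption to verify on the actual data.

```python
import pandas as pd

# Hypothetical toy rows; values are invented for illustration.
toy = pd.DataFrame({
    "year": [2012.0, 2018.0, float("nan")],
    "manufacturer": ["ford", None, "toyota"],
    "odometer": [85000.0, 30000.0, 120000.0],
    "car_price": [7500, 21000, 5200],
})

# Number of missing entries in each column.
missing_per_column = toy.isna().sum()

# Columns that can be fed to a regression model without encoding.
numeric_columns = toy.select_dtypes(include="number").columns.tolist()

print(missing_per_column)
print(numeric_columns)
```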
To load the training data in your Jupyter notebook, use the command below:
import pandas as pd
train_data = pd.read_csv("https://raw.githubusercontent.com/dphi-official/Datasets/master/used_car_price/train_set_label.csv")
Saving Prediction File & Sample Submission:
You can find more details on how to save a prediction file here: https://discuss.dphi.tech/t/how-to-submit-predictions/548
Sample submission: Submit a CSV file with a header row. The header name must be 'prediction'; otherwise the submission will fail with an evaluation error. A sample submission file can be found here
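One way to produce a file that meets the header requirement is to put the predictions in a single-column pandas DataFrame named `prediction` and write it without the index. This is a sketch rather than the official procedure: the prices are invented, and the file name `submission.csv` in the comment is an assumption, so check the linked instructions for any naming rules.

```python
import pandas as pd

# Hypothetical predicted prices; in practice these come from your model.
predicted_prices = [13500.0, 8200.5, 21000.0]

# A single column whose header is exactly "prediction".
submission = pd.DataFrame({"prediction": predicted_prices})

# With no path, to_csv returns the CSV text, which is handy for a quick check.
csv_text = submission.to_csv(index=False)
print(csv_text)

# To write a file instead (file name is an assumption):
# submission.to_csv("submission.csv", index=False)
```

Passing `index=False` matters here: without it, pandas writes the row index as an extra unnamed first column, and the file would no longer have `prediction` as its only header.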
Load the test data (name it test_data) using the command below.
test_data = pd.read_csv('https://raw.githubusercontent.com/dphi-official/Datasets/master/used_car_price/test_set_label.csv')
The target column is deliberately omitted from this file because you need to predict it.
We would like to thank Austin Reese for providing us with this dataset. He is a Software Engineer at Advanced Wireless Communications.
To participate in this challenge, you must either create a team or join an existing one.