Context and objective
We will be working with a dataset that relates the height and weight of 10,000 individuals. Your task as a data scientist is to build machine learning models to predict the value of weight based on the height of individuals. Can you model this relationship?
Once you generate and submit predictions of the target variable for the evaluation dataset, your submission will be compared with the true values of the target variable.
The true (actual) values of the target variable are hidden on the DPhi Practice platform so that your model's performance can be evaluated on unseen data. Finally, a Root-Mean-Squared-Error (RMSE) score for your model will be generated and displayed.
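For reference, RMSE is the square root of the mean of the squared differences between actual and predicted values, so lower is better and the score is in the same units as the target (pounds here). The following is a minimal sketch of that computation, not the platform's own scoring code; the function name rmse and the sample numbers are hypothetical and chosen only for illustration.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-squared error between two equal-length sequences."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("inputs must be non-empty")
    # Square each prediction error, average the squares, then take the root
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical weights in pounds, for illustration only (not from the dataset)
actual = [120.0, 150.0, 130.0]
predicted = [123.0, 146.0, 130.0]
print(rmse(actual, predicted))
```

Computing this score on a held-out part of the training data gives a rough local estimate of model quality before submitting.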
About the dataset
This dataset contains two attributes. The target variable is the weight of the sampled individuals.
- Height(Inches): Height of the individuals in inches
- Weight(Pounds): Weight of the individuals in pounds
To load the training data in your Jupyter notebook, use the command below:
import pandas as pd
heights_data = pd.read_csv("https://raw.githubusercontent.com/dphi-official/Datasets/master/Weights_Heights_Predict_Weights/Training_set_weights.csv")
Load the evaluation data and name it weights_eval. You can load it using the command below:
weights_eval = pd.read_csv('https://raw.githubusercontent.com/dphi-official/Datasets/master/Weights_Heights_Predict_Weights/Testing_set_weights.csv')
The target column is deliberately absent from the evaluation data, since you need to predict it.
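One simple way to model the height-to-weight relationship is a straight-line fit by ordinary least squares. The sketch below is only one possible baseline, not a prescribed solution. It is written in plain Python so that it runs without extra dependencies; in practice a library such as scikit-learn's LinearRegression is the more usual choice. The helper names fit_line and predict and the toy numbers are hypothetical.

```python
def fit_line(heights, weights):
    """Least-squares fit of weight = slope * height + intercept."""
    n = len(heights)
    if n != len(weights) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_h = sum(heights) / n
    mean_w = sum(weights) / n
    # Sum of squared height deviations, and height-weight co-deviations
    sxx = sum((h - mean_h) ** 2 for h in heights)
    sxy = sum((h - mean_h) * (w - mean_w) for h, w in zip(heights, weights))
    if sxx == 0:
        raise ValueError("all heights are identical; slope is undefined")
    slope = sxy / sxx
    intercept = mean_w - slope * mean_h
    return slope, intercept


def predict(heights, slope, intercept):
    """Apply the fitted line to a sequence of heights."""
    return [slope * h + intercept for h in heights]


# Hypothetical toy data lying exactly on weight = 3 * height - 80
train_heights = [60.0, 65.0, 70.0, 75.0]
train_weights = [100.0, 115.0, 130.0, 145.0]
slope, intercept = fit_line(train_heights, train_weights)

# Stand-in for the evaluation heights, which have no weight column
eval_predictions = predict([68.0, 72.0], slope, intercept)
print(slope, intercept, eval_predictions)
```

In the notebook, the toy lists would be replaced by the columns of heights_data and weights_eval converted to lists, for example with heights_data.iloc[:, 0].tolist(). That call assumes the height column comes first in the CSV, which should be checked against the loaded data.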
This dataset is adapted from:
University of California, Los Angeles; Statistics Online Computational Resource (SOCR). The complete Human Weight/Height Dataset. 2008. Available at: http://socr.ucla.edu/docs/resources/SOCR_Data/SOCR_Data_Dinov_020108_HeightsWeights.html.
To participate in this challenge, you must either create a team with the minimum required number of members or join an existing team.