Data Sprint #1 - Census Income
What is Census?
The United Nations defines a population census as the total process of collecting, compiling, and publishing demographic, economic, and social data pertaining, at a specific time, to all persons in a country or a delimited part of a country.
There are many variables to consider when doing a population census. This dataset contains the grouped and collected census data from the 1994 and 1995 population surveys conducted by the US Census Bureau.
Your task is to build machine learning models that predict the income level (the target variable) of each collaborator in the evaluation set. A value of 0 denotes a collaborator with an annual income below 50,000 USD, and a value of 1 denotes a collaborator with an annual income of 50,000 USD or more.
Submissions are evaluated using the F1 score. How do we do it?
Once you generate and submit your predictions of the target variable for the evaluation dataset, they are compared with the true values of the target variable.
The true values of the target variable are hidden on the DPhi platform so that we can evaluate your model's performance on unseen data. Finally, an F1 score for your model is generated and displayed.
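To make the metric concrete, here is a minimal sketch of how an F1 score can be computed from true and predicted labels. It assumes the binary F1 score with class 1 treated as the positive class; the exact averaging used by the platform is not stated here, so treat this as an illustration rather than the platform's scoring code. The function name f1_score_binary and the toy labels are hypothetical.

```python
def f1_score_binary(y_true, y_pred, positive=1):
    """Binary F1: harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # true positives
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # false positives
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # false negatives
    if tp == 0:
        # No correct positive prediction: F1 is conventionally 0 here.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy labels (made up for illustration): 2 true positives, 1 false positive, 1 false negative.
toy_true = [1, 0, 1, 1, 0, 0]
toy_pred = [1, 0, 0, 1, 1, 0]
print(round(f1_score_binary(toy_true, toy_pred), 3))
```

With these toy labels, precision and recall are both 2/3, so the F1 score is also about 0.667. Because F1 ignores true negatives, a model that always predicts 0 scores 0 under this definition, no matter how common the 0 class is.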
Start Date: 7th August 2020, 21:00 hours IST / 17:30 hours CET
End Date: 10th August 2020, 21:00 hours IST / 17:30 hours CET
About the dataset
This dataset contains 41 attributes. The target variable is the income level: 0 denotes a collaborator with an annual income below 50,000 USD, and 1 denotes a collaborator with an annual income of 50,000 USD or more.
To load the training data in your Jupyter notebook, use the command below:
import pandas as pd
census_data = pd.read_csv("https://raw.githubusercontent.com/dphi-official/Datasets/master/Census_Income/Training_set_census.csv")
Load the evaluation data into a DataFrame named census_eval using the command below:
census_eval = pd.read_csv('https://raw.githubusercontent.com/dphi-official/Datasets/master/Census_Income/Testing_set_census.csv')
The target column is deliberately absent from the evaluation data because you need to predict it.
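Since the evaluation data omits the target column, one quick sanity check is to compare the columns of the two DataFrames: whatever appears only in the training data should be the target. The sketch below shows the idea on small made-up frames so that it runs without a network connection. The helper name find_target_columns and the column names age, education, and income_level are hypothetical, not taken from the actual files.

```python
import pandas as pd


def find_target_columns(train_df, eval_df):
    """Return the columns present in train_df but missing from eval_df, in order."""
    return [col for col in train_df.columns if col not in eval_df.columns]


# Toy stand-ins for census_data and census_eval (column names are assumptions).
train_toy = pd.DataFrame(
    {
        "age": [25, 40, 58],
        "education": ["Bachelors", "HS-grad", "Masters"],
        "income_level": [0, 0, 1],
    }
)
eval_toy = train_toy.drop(columns=["income_level"])

print(find_target_columns(train_toy, eval_toy))
```

With the real data, the same call would be find_target_columns(census_data, census_eval). If it returns exactly one column name, that column can be separated from the features before training.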
This dataset is adapted from:
Ronny Kohavi and Barry Becker. Data Mining and Visualization. Silicon Graphics. 2019. Available at: UCI Machine Learning Repository. Irvine, CA: University of California, School of Information and Computer Science
To participate in this challenge, you must either create a team or join an existing one.