Datathon

Ended

Data Sprint #1 - Census Income

Predict Income Level

Medium | 253 Submissions

Context

What is a Census?

The United Nations defines a population census as the total process of collecting, compiling, and publishing demographic, economic, and social data pertaining, at a specific time, to all persons in a country or a delimited part of a country.

Reference: The New Yorker

 

There are many variables to consider when doing a population census. This dataset contains the grouped and collected census data from the 1994 and 1995 population surveys conducted by the US Census Bureau.


Objective

Your task is to build machine learning models that predict the income level (the target variable) of each individual in the evaluation set. A value of 0 denotes an individual whose annual income is less than 50,000 USD, and a value of 1 denotes an individual whose annual income is equal to or greater than 50,000 USD.


Evaluation Criteria

Submissions are evaluated using the F1 score. How do we do it?

Once you generate and submit your predictions of the target variable for the evaluation dataset, they are compared with the true values of the target variable.

The true (actual) values of the target variable are hidden on the DPhi platform so that we can evaluate your model's performance on unseen data. Finally, an F1 score for your model is generated and displayed.
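To make the metric concrete, here is a minimal sketch of how an F1 score can be computed from predicted and true labels. It assumes the binary F1 for the positive class (label 1); the page does not state which averaging mode the platform uses, so treat that as an assumption. The function name and the sample labels are made up for illustration.

```python
def f1_score_binary(y_true, y_pred, positive=1):
    """F1 for the positive class, computed from raw counts (illustrative helper)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # true positives
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # false positives
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # false negatives
    if tp == 0:
        # No correct positive prediction: F1 is conventionally reported as 0
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Made-up labels, for illustration only
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
score = f1_score_binary(y_true, y_pred)
print(score)
```

In practice, `sklearn.metrics.f1_score(y_true, y_pred)` computes the same quantity with its default settings (binary averaging, positive label 1), so you can use it on a held-out validation split to estimate your score before submitting.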


Timeline

Start Date: 7th August 2020, 21:00 hours IST / 17:30 hours CEST

End Date: 10th August 2020, 21:00 hours IST / 17:30 hours CEST
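If you want to check these times against your own time zone, the conversion can be sketched with Python's standard library. The sketch assumes fixed offsets: IST is UTC+5:30, and central Europe observes summer time (UTC+2) in August. Replace the second offset with your own to get your local time.

```python
from datetime import datetime, timedelta, timezone

# Fixed-offset time zones (assumption: no daylight-saving change during the event)
IST = timezone(timedelta(hours=5, minutes=30), "IST")
CENTRAL_EUROPE_SUMMER = timezone(timedelta(hours=2), "UTC+2")

start_ist = datetime(2020, 8, 7, 21, 0, tzinfo=IST)
end_ist = datetime(2020, 8, 10, 21, 0, tzinfo=IST)

# Convert the start time to UTC and to central European summer time
start_utc = start_ist.astimezone(timezone.utc)
start_central_europe = start_ist.astimezone(CENTRAL_EUROPE_SUMMER)

# Total length of the challenge window
duration = end_ist - start_ist

print(start_utc.isoformat())
print(start_central_europe.isoformat())
print(duration)
```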

 

 

About the dataset

This dataset contains 41 attributes. The target variable is the income level: 0 denotes an individual whose annual income is less than 50,000 USD, and 1 denotes an individual whose annual income is equal to or greater than 50,000 USD.

To load the training data in your Jupyter notebook, use the following commands:

import pandas as pd

census_data = pd.read_csv("https://raw.githubusercontent.com/dphi-official/Datasets/master/Census_Income/Training_set_census.csv")

Evaluation Dataset

Load the evaluation data and name it census_eval, using the following command:

census_eval = pd.read_csv('https://raw.githubusercontent.com/dphi-official/Datasets/master/Census_Income/Testing_set_census.csv')

The target column is deliberately omitted from this file, because it is what you need to predict.
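One quick way to confirm which column is the target is to compare the column names of the two files: whatever appears in the training set but not in the evaluation set is the column to predict. The helper below is a small sketch of that check; its name and the column names in the example are hypothetical, since this page does not list the actual column names.

```python
def missing_from_eval(train_columns, eval_columns):
    """Return the training columns absent from the evaluation set, in original order."""
    eval_set = set(eval_columns)
    return [col for col in train_columns if col not in eval_set]


# Hypothetical column names, for illustration only
train_cols = ["age", "education", "income_level"]
eval_cols = ["age", "education"]

target_candidates = missing_from_eval(train_cols, eval_cols)
print(target_candidates)
```

With the DataFrames loaded as described, the call would be `missing_from_eval(census_data.columns, census_eval.columns)`. If it returns exactly one name, that is the target column; the remaining columns are the features shared by both files.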


References

This dataset is adapted from:

Ronny Kohavi and Barry Becker. Data Mining and Visualization. Silicon Graphics. 2019. Available at: UCI Machine Learning Repository. Irvine, CA: University of California, School of Information and Computer Science

 


File Format

Your submission should be in CSV format.

Predictions

This file should have a header row containing the column name 'prediction'.
Please see the instructions to save a prediction file under the “Data” tab.
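As a rough illustration of the expected layout, the sketch below writes a list of 0/1 predictions to a CSV file whose header row is 'prediction'. It assumes one prediction per line, in the same order as the rows of the evaluation set; the function name and the file name `submission.csv` are hypothetical, and the instructions under the “Data” tab remain the authoritative source.

```python
import csv


def write_submission(predictions, path="submission.csv"):
    """Write predictions to a CSV file with a single 'prediction' header (illustrative)."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["prediction"])    # header row required by the platform
        for value in predictions:
            writer.writerow([int(value)])  # one 0/1 label per line


# Made-up predictions, for illustration only
write_submission([0, 1, 1, 0], "submission.csv")
```

If your predictions are already in pandas, `pd.DataFrame({"prediction": predictions}).to_csv("submission.csv", index=False)` gives the same layout; `index=False` matters because it keeps the row index out of the file.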

To participate in this challenge, you must either create a team or join an existing one.