Data Sprint #24: Tinder Millennial Match
Tinder is a casual dating app that lets users make split-second decisions about whether they like a potential match. A user swipes right on a profile to signal interest. If the other person also swipes right, a match is made and both parties are alerted.
Tinder is a massive phenomenon in the online dating world. Because of its vast user base, it potentially offers lots of data that is exciting to analyze.
We have collected a small dataset describing the match rate of individuals from different universities and whether the app has helped them find a relationship.
Your task is to build a machine learning model that predicts whether an individual's use of the app led to a relationship (the "It became a relationship" target).
What will you learn?
- Exploratory Data Analysis (Learn it here)
- Supervised Learning Algorithms - Classification (Learn it here)
Submissions are evaluated using Accuracy Score.
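As a rough illustration of the metric, accuracy is the fraction of predictions that exactly match the actual labels. The sketch below is a plain-Python version written for this explanation; the function name `accuracy` and the label lists are made up, and scikit-learn's `sklearn.metrics.accuracy_score` computes the same quantity.

```python
def accuracy(y_true, y_pred):
    """Return the fraction of predictions that equal the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("cannot compute accuracy on empty inputs")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Made-up labels for illustration only: 3 of the 4 predictions are correct.
actual = [1, 0, 1, 1]
predicted = [1, 0, 0, 1]
score = accuracy(actual, predicted)
print(score)
```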
How do we do it?
Once we release the data, anyone can download it, build a model, and make a submission. We give competitors a set of data (training data) with both the independent and dependent variables.
We also release a second set of data (the test dataset) that contains only the independent variables; the corresponding dependent variable is hidden. You submit your predicted values of the dependent variable for this set, and we compare them against the actual values.
The predictions are evaluated based on the evaluation metric defined in the datathon.
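Because the test labels are hidden, one common way to estimate your score before submitting is to hold out part of the training data and treat it as a stand-in test set. The sketch below shows that idea in plain Python with a majority-class baseline; the helper names, the toy rows, and the 25% hold-out fraction are all assumptions made for illustration, not part of the datathon.

```python
import random
from collections import Counter


def holdout_split(features, labels, holdout_fraction=0.25, seed=0):
    """Shuffle row indices and split them into a training part and a held-out part."""
    indices = list(range(len(labels)))
    random.Random(seed).shuffle(indices)
    n_holdout = max(1, int(len(indices) * holdout_fraction))
    holdout_idx, train_idx = indices[:n_holdout], indices[n_holdout:]

    def pick(seq, idx):
        return [seq[i] for i in idx]

    return (pick(features, train_idx), pick(labels, train_idx),
            pick(features, holdout_idx), pick(labels, holdout_idx))


def majority_class(labels):
    """Return the most frequent label; a trivial baseline 'model'."""
    return Counter(labels).most_common(1)[0][0]


# Toy, made-up rows standing in for the training data.
features = [[0.1], [0.4], [0.35], [0.8], [0.05], [0.6], [0.2], [0.9]]
labels = [0, 0, 1, 1, 0, 1, 0, 1]

X_train, y_train, X_hold, y_hold = holdout_split(features, labels)

# "Train" on the training part only, then predict the held-out rows.
baseline = majority_class(y_train)
predictions = [baseline for _ in X_hold]

# Compare predictions with the held-out labels, as the platform does with the hidden ones.
holdout_accuracy = sum(p == t for p, t in zip(predictions, y_hold)) / len(y_hold)
print(holdout_accuracy)
```

Any real model should beat this baseline on the held-out rows; if it does not, the features or the model likely need another look.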
About the Data
The dataset contains information about the match rate of the individuals from different universities, and whether the app (i.e. Tinder) has helped them find a relationship.
- ID : User id
- Segment type : Medium of Usage
- Segment Description : Name of Universities
- Answer : Do you use Tinder?
- Count : Number of Matches
- Percentage : Share of matches (values range from 0 to 1; they are not multiplied by 100)
- It became a relationship : Success of relationship (Target)
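Several of these columns are text, so they usually need to be converted to numbers before a classifier can use them. The sketch below shows one possible approach, one-hot encoding with pandas, on a tiny made-up frame. The row values are invented, and the exact column spellings in the real CSV are an assumption, so check `train_data.columns` first.

```python
import pandas as pd

# Made-up rows that only mimic the column layout described above.
toy = pd.DataFrame({
    "Segment type": ["web", "mobile", "web"],
    "Answer": ["Yes", "No", "Yes"],
    "Count": [12, 3, 7],
    "Percentage": [0.40, 0.10, 0.25],  # stored on a 0-to-1 scale
})

# Optional: express the share as a percentage for easier reading.
toy["Percentage (%)"] = toy["Percentage"] * 100

# One-hot encode the text columns so every feature is numeric.
encoded = pd.get_dummies(toy, columns=["Segment type", "Answer"])
print(encoded.columns.tolist())
```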
To load the training data in your Jupyter notebook, use the command below:
import pandas as pd
train_data = pd.read_csv("https://raw.githubusercontent.com/dphi-official/Datasets/master/Tinder_Millennial_Match/train_set_label.csv")
Saving Prediction File & Sample Submission
You can find more details on how to save a prediction file here: https://discuss.dphi.tech/t/how-to-submit-predictions/548
Sample submission: Submit a CSV file with a header row. The header name must be ‘prediction’; otherwise the submission will fail with an evaluation error. A sample submission file can be found here
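A minimal sketch of writing such a file with pandas is shown below. The prediction values and the file name `submission.csv` are placeholders chosen for illustration; in practice the list would hold your model's output for each row of the test data, in the same order.

```python
import pandas as pd

# Placeholder predictions; replace with your model's output for the test data.
predictions = [1, 0, 1, 0]

# A single column whose header is exactly 'prediction', as the evaluator expects.
submission = pd.DataFrame({"prediction": predictions})

# index=False keeps the row index out of the file, so the CSV has one column only.
submission.to_csv("submission.csv", index=False)
```

Reading the file back with `pd.read_csv("submission.csv")` is a quick way to confirm that it has one column named `prediction` and one row per test example.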
Load the test data (name it test_data) using the command below:
test_data = pd.read_csv('https://raw.githubusercontent.com/dphi-official/Datasets/master/Tinder_Millennial_Match/test_set_label.csv')
The target column is deliberately absent from this file because it is what you need to predict.
Dataset by Adam Halper, Founder / CEO of Whatsgoodly, a millennial social polling company. Adam is an entrepreneur / app developer who studied HCI at Stanford. Technology has made people more lonely, and Adam wants to change that.
To participate in this challenge, you must either create a team or join an existing one.