Datathon

Ended

Data Sprint #24: Tinder Millennial Match

Predict if the relationship succeeded or not

Medium | 754 Submissions

Problem Statement

Tinder is a casual dating app that lets users make split-second decisions about whether they like a potential match. A user swipes right on a profile to like it. If the other person also swipes right, a match is made and both parties are alerted.

Tinder is a massive phenomenon in the online dating world. Because of its vast user base, it potentially offers lots of data that is exciting to analyze. 

We have collected a small dataset that records the match rates of individuals from different universities and whether the app has helped them find a relationship.


Objective

You are required to build a machine learning model that predicts whether or not an individual's use of the app led to a successful relationship (the "It became a relationship" target).


What will you learn?
  • Exploratory Data Analysis (Learn it here)
  • Supervised Learning Algorithms - Classification (Learn it here)

Evaluation Criteria

Submissions are evaluated using Accuracy Score.
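
As a quick illustration of the metric, accuracy is the fraction of predictions that exactly match the true labels. The sketch below computes it in plain Python on made-up labels (they are not taken from the competition data). If you use scikit-learn, `sklearn.metrics.accuracy_score` returns the same quantity.

```python
def accuracy(y_true, y_pred):
    """Return the fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("cannot compute accuracy on empty input")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Made-up labels for illustration only (not from the competition data).
y_true = [1, 1, 0, 0, 1]
y_pred = [1, 0, 0, 0, 1]

score = accuracy(y_true, y_pred)  # 4 of the 5 predictions match
print(score)
```

Because accuracy counts every mistake equally, it is worth checking how balanced the two classes are before trusting a high score.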

How do we do it? 

Once we release the data, anyone can download it, build a model, and make a submission. We give competitors a set of data (training data) with both the independent and dependent variables. 

We also release another set of data (test dataset) with just the independent variables, and we hide the dependent variable that corresponds with this set. You submit the predicted values of the dependent variable for this set, and we compare them against the actual values.

The predictions are evaluated based on the evaluation metric defined in the datathon.
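
Since the test labels are hidden, one common way to estimate your score before submitting is to hold out part of the training data and evaluate on it. The sketch below is only an illustration under assumptions: the rows are made up, the helper names `holdout_split` and `majority_class` are hypothetical, and the "model" is a trivial baseline that always predicts the most frequent class.

```python
import random
from collections import Counter


def holdout_split(rows, valid_fraction=0.2, seed=0):
    """Shuffle the rows and split them into (train, valid) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_valid = max(1, int(len(shuffled) * valid_fraction))
    return shuffled[n_valid:], shuffled[:n_valid]


def majority_class(labels):
    """Return the most frequent label."""
    return Counter(labels).most_common(1)[0][0]


# Made-up (features, label) rows for illustration only.
rows = [((i,), 1 if i % 3 else 0) for i in range(30)]

train_rows, valid_rows = holdout_split(rows)

# "Train" the baseline on the training part only.
baseline = majority_class([label for _, label in train_rows])

# Score it on the held-out part, which plays the role of the hidden test set.
valid_accuracy = sum(label == baseline for _, label in valid_rows) / len(valid_rows)
print(baseline, valid_accuracy)
```

Any real model you build should beat this kind of baseline on the held-out rows; if it does not, the model has learned little beyond the class frequencies.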


 

The baseline notebook is available here.

About the Data

The dataset contains information about the match rate of individuals from different universities, and whether the app (i.e. Tinder) has helped them find a relationship.

Data Description

  1. ID : User id
  2. Segment type : Medium of Usage
  3. Segment Description : Name of Universities
  4. Answer : Do you use Tinder?
  5. Count : Number of Matches
  6. Percentage : % of matches (the value ranges from 0 to 1 because it is not multiplied by 100)
  7. It became a relationship : Success of relationship (Target)

To load the training data in your Jupyter notebook, use the command below:

import pandas as pd

train_data = pd.read_csv("https://raw.githubusercontent.com/dphi-official/Datasets/master/Tinder_Millennial_Match/train_set_label.csv")
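
Once the data is loaded, a useful first check is the share of each class in the target, because it tells you what accuracy a trivial model would reach. The sketch below assumes the target column is spelled exactly "It became a relationship" as in the Data Description; the helper `target_balance` is hypothetical and the small DataFrame is made up so the example runs without downloading anything.

```python
import pandas as pd


def target_balance(df, target="It became a relationship"):
    """Return the share of each class in the target column."""
    return df[target].value_counts(normalize=True)


# Made-up stand-in for train_data (illustration only).
toy = pd.DataFrame({"It became a relationship": [1, 1, 1, 0]})

balance = target_balance(toy)
print(balance)
```

With the real data you would call `target_balance(train_data)` instead. If the column name in the CSV differs from the assumed one, pass the actual name through the `target` argument.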


Saving Prediction File & Sample Submission

You can find more details on how to save a prediction file here: https://discuss.dphi.tech/t/how-to-submit-predictions/548

Sample submission: You should submit a CSV file with a header row. A sample submission is shown below:

prediction
1
1
1
0
1
0
.
.
Etc.

Note that the header name must be ‘prediction’; otherwise the evaluation will throw an error. A sample submission file can be found here.
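
A minimal sketch of producing a file in this format with pandas is shown below. The prediction values are copied from the sample above, the helper `build_submission` is hypothetical, and the file name `submission.csv` in the comment is an assumption, since no required name is stated here.

```python
import io

import pandas as pd


def build_submission(predictions):
    """Wrap predictions in a one-column DataFrame with the required header."""
    return pd.DataFrame({"prediction": list(predictions)})


# Values taken from the sample submission shown above.
preds = [1, 1, 1, 0, 1, 0]
submission = build_submission(preds)

# Write to an in-memory buffer here so the result can be inspected.
# To write a real file: submission.to_csv("submission.csv", index=False)
buffer = io.StringIO()
submission.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text)
```

Passing `index=False` matters: without it pandas writes the row index as an extra first column, and the file no longer has a single 'prediction' column.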


Test Dataset

Load the test data and name it test_data. You can load it using the command below.

test_data = pd.read_csv('https://raw.githubusercontent.com/dphi-official/Datasets/master/Tinder_Millennial_Match/test_set_label.csv')

The target column is deliberately omitted here because you need to predict it.
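
Because the test set has no target, you need to prepare the same feature columns for both files before predicting. The sketch below shows one possible way to do that with one-hot encoding. It rests on assumptions: the column names are taken from the Data Description and may not match the CSV headers exactly, dropping the ID column is a modelling choice rather than a requirement, the helper `encode_features` is hypothetical, and the two small DataFrames are made up.

```python
import pandas as pd

# Assumed column names, taken from the Data Description.
TARGET = "It became a relationship"
ID_COL = "ID"
CATEGORICAL = ["Segment type", "Segment Description", "Answer"]


def encode_features(train_df, test_df):
    """One-hot encode the categorical columns and align test to train."""
    X_train = pd.get_dummies(
        train_df.drop(columns=[ID_COL, TARGET], errors="ignore"),
        columns=CATEGORICAL,
    )
    X_test = pd.get_dummies(
        test_df.drop(columns=[ID_COL], errors="ignore"),
        columns=CATEGORICAL,
    )
    # A category present in only one file would give mismatched columns,
    # so force the test features into the training column layout.
    X_test = X_test.reindex(columns=X_train.columns, fill_value=0)
    return X_train, train_df[TARGET], X_test


# Made-up rows for illustration only.
train_toy = pd.DataFrame({
    "ID": [1, 2, 3, 4],
    "Segment type": ["Mobile", "Web", "Mobile", "Web"],
    "Segment Description": ["Univ A", "Univ B", "Univ A", "Univ B"],
    "Answer": ["Yes", "No", "Yes", "No"],
    "Count": [10, 3, 7, 1],
    "Percentage": [0.5, 0.1, 0.4, 0.05],
    "It became a relationship": [1, 0, 1, 0],
})
test_toy = pd.DataFrame({
    "ID": [5, 6],
    "Segment type": ["Mobile", "Web"],
    "Segment Description": ["Univ A", "Univ C"],  # "Univ C" is unseen in train
    "Answer": ["Yes", "No"],
    "Count": [4, 2],
    "Percentage": [0.3, 0.2],
})

X_train, y_train, X_test = encode_features(train_toy, test_toy)
print(list(X_train.columns))
```

With the real data you would call `encode_features(train_data, test_data)` and then fit a classifier of your choice on `X_train` and `y_train` before predicting on `X_test`.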


Acknowledgement

Dataset by Adam Halper, Founder / CEO of Whatsgoodly, a millennial social polling company. Adam is an entrepreneur / app developer who studied HCI at Stanford. Technology has made people more lonely, and Adam wants to change that.

License: CC-BY-SA

 


File Format

Your submission should be in CSV format.

Predictions

This file should have a header row called 'prediction'.
Please see the instructions to save a prediction file under the “Data” tab.
