Datathon

Ended

Data Sprint #7: Bank Marketing

Predict whether a customer will subscribe to the product or not

Medium | 399 Submissions

Context

A bank is a financial institution licensed to receive deposits and make loans. Banks may also provide financial services such as wealth management, currency exchange, and safe deposit boxes. Marketing refers to activities a company undertakes to promote the buying or selling of a product or service. Marketing includes advertising, selling, and delivering products to consumers or other businesses. [source of information: Investopedia]

Image Courtesy: The Business Journals

Bank Marketing

Marketing of bank products refers to the various ways in which a bank can help a customer, such as operating accounts, making transfers, paying standing orders and selling foreign currency. Banking is the business activity of banks and similar institutions.


 

Objective

The marketing team of the bank has data related to the direct marketing campaigns of the previous year. The marketing campaigns were based on phone calls. Often, more than one contact with the same client was required to assess whether the product would be subscribed ('yes') or not ('no'). The bank conducted a similar marketing campaign this year too and stored the data related to each phone call.

Imagine you are hired as a Data Scientist at a bank in Portugal. The bank manager has asked for your help in understanding whether a client would subscribe to the product or not. You are required to build a Machine Learning model that predicts whether a customer will subscribe to the product. Here, the product is a term deposit.

What is a term deposit?

A term deposit is a fixed-term investment that includes the deposit of money into an account at a financial institution. Term deposit investments usually carry short-term maturities ranging from one month to a few years and will have varying levels of required minimum deposits. [source of information: Investopedia]

Image source: total advice partners

Evaluation Criteria

Submissions are evaluated using F1 Score.  

How do we do it?

Once you generate and submit the target variable predictions on the testing dataset, your submissions will be compared with the true values of the target variable. 

The true (actual) values of the target variable are hidden on the DPhi platform so that we can evaluate your model's performance on unseen data. Finally, an F1 score for your model will be generated and displayed.
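To make the metric concrete, here is a minimal sketch of how an F1 score can be computed from predictions and true labels. It assumes the binary form of F1 with 1 (subscribed) as the positive class; the exact variant used by the platform is not stated here, so treat this as an illustration rather than the official scorer. The function name f1_score_binary and the sample labels are made up for this example.

```python
def f1_score_binary(y_true, y_pred, positive=1):
    """Compute binary F1 for the given positive class (assumed to be 1 here)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Count true positives, false positives and false negatives.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)

    # With no true positives, precision and recall are zero or undefined; return 0.
    if tp == 0:
        return 0.0

    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # F1 is the harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# Illustrative labels only, not taken from the challenge data.
y_true = [0, 1, 1, 0, 1]
y_pred = [0, 1, 0, 0, 1]
score = f1_score_binary(y_true, y_pred)
print(round(score, 3))
```

In this example there are 2 true positives, 0 false positives and 1 false negative, so precision is 1, recall is 2/3 and the F1 score is 0.8.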


Timeline

Start Date: 25th September 2020, 21:00 hours IST / 17:30 hours CEST (please locate your time here)

End Date: 28th September 2020, 21:00 hours IST / 17:30 hours CEST (please locate your time here)


 

Would you like to understand the problem through code?

Don't worry! Here is your getting-started code.


 

The baseline notebook is available here.


Note: Some corrections have been made to the dataset. You can find the details here: https://discuss.dphi.tech/t/solve-data-sprint-7-challenge-dphi/1303/5?u=dphi_official

The data is related to the direct marketing campaigns of a Portuguese banking institution.

To load the training data in your Jupyter notebook, use the command below:

import pandas as pd

bank_marketing_data = pd.read_csv("https://raw.githubusercontent.com/dphi-official/Datasets/master/bank_marketing_data/training_set_label.csv")


Data Description

# bank client data:

age: Age of the client

job: Type of job

marital: Marital status of the client

education: Highest education of the client

default: Has credit in default?

balance: The client's account balance at the bank

housing: Whether the client has a housing loan or not

loan: Whether the client has any personal loan or not

 

# related to the last contact of the current campaign:

contact: Contact communication type

month: Last contact month of the year

day_of_week: Last contact day of the week

duration: Last contact duration in seconds. Important note: this attribute strongly affects the target (e.g., if duration=0, the client did not subscribe). However, the duration is not known before a call is made, and once the call ends the outcome is already known. This input should therefore be included only for benchmark purposes and should be discarded if the intention is to build a realistic predictive model.

 

# other attributes:

campaign: Number of contacts performed during this campaign for this client

pdays: Number of days that passed after the client was last contacted in a previous campaign

previous: Number of contacts performed before this campaign for this client

poutcome: outcome of the previous marketing campaign

 

# Output variable / Target variable:
subscribe: Whether the client has subscribed to the term deposit or not
- 0 implies NOT subscribed
- 1 implies subscribed
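
The note on duration above can be sketched in code. This is a minimal example, assuming you want a realistic model that does not rely on information known only after the call. The tiny DataFrame below is made up for illustration and contains only a few of the documented columns; the helper name split_features_target is hypothetical. Whether to keep duration for the competition itself is a separate decision that this sketch does not settle.

```python
import pandas as pd

# Made-up sample rows using a few of the documented column names (illustrative values only).
sample = pd.DataFrame({
    "age": [34, 51, 27],
    "job": ["technician", "retired", "student"],
    "duration": [120, 0, 305],
    "subscribe": [0, 0, 1],
})


def split_features_target(df, target="subscribe", leaky=("duration",)):
    """Return (X, y), removing the target and leakage-prone columns from X."""
    y = df[target]
    # errors="ignore" keeps this usable on data where a listed column is absent.
    X = df.drop(columns=[target, *leaky], errors="ignore")
    return X, y


X, y = split_features_target(sample)

# One-hot encode the categorical column so that a model can consume it.
X_encoded = pd.get_dummies(X, columns=["job"])
print(X_encoded.columns.tolist())
```

The same helper could be applied to bank_marketing_data after loading it, provided the column names match the description above.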


Test Dataset

Load the test data and name it test_data. You can load the data using the command below.

test_data = pd.read_csv('https://raw.githubusercontent.com/dphi-official/Datasets/master/bank_marketing_data/testing_set_label.csv')

The target column is deliberately absent from this file because you need to predict it.



Saving Prediction File & Sample Submission

You can find more details on how to save a prediction file here: https://discuss.dphi.tech/t/how-to-submit-predictions/548

Sample submission: Submit a CSV file with a header row. A sample submission is shown below.

prediction
0
1
1
0

Etc.

Note that the header name must be prediction; otherwise the submission will throw an evaluation error.
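
As a rough illustration of the expected file layout, the sketch below builds a one-column table with the header prediction and serializes it as CSV without the index column. The prediction values are placeholders, and the file name submission.csv in the comment is an assumption, not a platform requirement; the linked discussion post remains the reference for the actual submission steps.

```python
import io

import pandas as pd

# Placeholder predictions; in practice these come from your model on test_data.
predictions = [0, 1, 1, 0]

# A single column named "prediction", matching the required header.
submission = pd.DataFrame({"prediction": predictions})

# Write to an in-memory buffer here so that the output is easy to inspect.
buffer = io.StringIO()
submission.to_csv(buffer, index=False)  # index=False avoids an extra unnamed column
csv_text = buffer.getvalue()
print(csv_text)

# To write an actual file instead (hypothetical file name):
# submission.to_csv("submission.csv", index=False)
```

Passing index=False matters because the default would add the row index as a first column, which would not match the single-column sample shown above.
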

Acknowledgment

This data has been sourced from the UCI Machine Learning Repository.


File Format

Your submission should be in CSV format.

Predictions

This file should have a header row called 'prediction'.
Please see the instructions to save a prediction file under the “Data” tab.
