Data Sprint #7: Bank Marketing
A bank is a financial institution licensed to receive deposits and make loans. Banks may also provide financial services such as wealth management, currency exchange, and safe deposit boxes. Marketing refers to activities a company undertakes to promote the buying or selling of a product or service. Marketing includes advertising, selling, and delivering products to consumers or other businesses. [source of information: Investopedia]
Marketing of bank products refers to the various ways in which a bank can help a customer, such as operating accounts, making transfers, paying standing orders and selling foreign currency. Banking is the business activity of banks and similar institutions.
The bank's marketing team has data on the previous year's direct marketing campaigns. The campaigns were based on phone calls. Often, more than one contact with the same client was required to determine whether the product would be subscribed ('yes') or not ('no'). The bank ran a similar marketing campaign this year and stored the data related to each phone call.
Imagine you are hired as a Data Scientist at a bank in Portugal. The bank manager wants your help in understanding whether a client will subscribe to the product. You are required to build a machine learning model that predicts whether a customer will subscribe to the product. Here, the product is a term deposit.
What is a term deposit?
A term deposit is a fixed-term investment that includes the deposit of money into an account at a financial institution. Term deposit investments usually carry short-term maturities ranging from one month to a few years and will have varying levels of required minimum deposits. [source of information: Investopedia]
Submissions are evaluated using F1 Score.
How do we do it?
Once you generate and submit the target variable predictions on the testing dataset, your submissions will be compared with the true values of the target variable.
The true (actual) values of the target variable are hidden on the DPhi platform so that we can evaluate your model's performance on unseen data. Finally, an F1 score for your model will be generated and displayed.
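To make the metric concrete, here is a minimal sketch of how an F1 score can be computed for a binary target. It assumes class 1 (subscribed) is treated as the positive class; the exact averaging used by the platform is not stated here, so treat this as an illustration rather than the official scorer. The function name f1_score_binary and the sample labels are made up for the example.

```python
def f1_score_binary(y_true, y_pred, positive=1):
    """Return F1 = 2 * precision * recall / (precision + recall) for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # true positives
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # false positives
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # false negatives
    if tp == 0:
        # No correct positive predictions: precision and recall are both 0 (or undefined)
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Made-up labels, only to exercise the function
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 0]
score = f1_score_binary(y_true, y_pred)
print(round(score, 4))
```

If scikit-learn is available, sklearn.metrics.f1_score(y_true, y_pred) computes the same quantity for binary labels with its default settings.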
Start Date: 25th September 2020, 21:00 hours IST / 17:30 hours CET (please locate your time here)
End Date: 28th September 2020, 21:00 hours IST / 17:30 hours CET (please locate your time here)
Would you like to understand the problem through code?
Here is your getting started code
!! Some corrections have been made to the dataset. You can find the details here: https://discuss.dphi.tech/t/solve-data-sprint-7-challenge-dphi/1303/5?u=dphi_official
The data is related to the direct marketing campaigns of a Portuguese banking institution.
To load the training data in your Jupyter notebook, use the command below:
import pandas as pd
bank_marketing_data = pd.read_csv("https://raw.githubusercontent.com/dphi-official/Datasets/master/bank_marketing_data/training_set_label.csv")
# bank client data:
age: Age of the client
job: Type of job
marital: Marital status of the client
education: Highest education of the client
default: Has credit in default?
balance: The balance of the client’s account at the bank
housing: Whether the client has a housing loan or not
loan: Whether the client has a personal loan or not
# related to the last contact of the current campaign:
contact: Contact communication type
month: Last contact month of the year
day_of_week: Last contact day of the week
duration: Last contact duration in seconds. Important note: this attribute highly affects the output target (e.g., if duration=0 then y='no'). Yet, the duration is not known before a call is performed. Also, after the end of the call y is obviously known. Thus, this input should only be included for benchmark purposes and should be discarded if the intention is to have a realistic predictive model.
# other attributes:
campaign: Number of contacts performed during this campaign for this client
pdays: Number of days that passed after the client was last contacted in a previous campaign
previous: Number of contacts performed before this campaign for this client
poutcome: Outcome of the previous marketing campaign
# Output variable / Target variable:
subscribe: Whether the client has subscribed to the term deposit or not
- 0 implies NOT subscribed
- 1 implies subscribed
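The column descriptions above suggest a simple preparation step: separate the target, leave out duration if a realistic model is the goal, and encode categorical columns numerically. The sketch below illustrates this on a tiny made-up frame that only mimics a few of the documented columns; the row values are invented, and one-hot encoding via pd.get_dummies is just one reasonable choice, not a prescribed method.

```python
import pandas as pd

# Tiny invented frame that mimics a few of the documented columns
toy = pd.DataFrame({
    "age": [34, 51, 28],
    "job": ["admin.", "technician", "student"],
    "balance": [1200, -50, 300],
    "duration": [180, 0, 420],
    "subscribe": [1, 0, 0],
})

# Target: 1 = subscribed, 0 = not subscribed
y = toy["subscribe"]

# Drop the target, and drop duration because it is not known before a call is made
X = toy.drop(columns=["subscribe", "duration"])

# One-hot encode the categorical column so that every feature is numeric
X = pd.get_dummies(X, columns=["job"])

print(X.columns.tolist())
```

The same steps would apply to bank_marketing_data, with the full list of categorical columns passed to pd.get_dummies.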
Load the test data (name it test_data). You can load the data using the command below.
The target column is deliberately absent from the test data, since you need to predict it.
Saving Prediction File & Sample Submission
You can find more details on how to save a prediction file here: https://discuss.dphi.tech/t/how-to-submit-predictions/548
Sample submission: You should submit a CSV file with a header row. A sample submission can be found below.
Note that the header name should be prediction, else it will throw an evaluation error.
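As a minimal sketch of the submission format described above, the snippet below builds a single-column CSV whose header is prediction. The list of predictions is invented for illustration, and the file name in the final comment is an arbitrary choice; the linked guide remains the authoritative description of the submission steps.

```python
import pandas as pd

# Hypothetical model output: one 0/1 prediction per row of the test data
predictions = [0, 1, 0, 0, 1]

# Single column whose header is exactly "prediction"
submission = pd.DataFrame({"prediction": predictions})

# index=False keeps the DataFrame's row index out of the CSV
csv_text = submission.to_csv(index=False)
print(csv_text)

# To write a file instead of a string (file name chosen arbitrarily):
# submission.to_csv("submission.csv", index=False)
```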
This data has been sourced from the UCI Machine Learning Repository.
To participate in this challenge, you must either create a team or join an existing one.