!! Some corrections have been made to the dataset. You can find the details here: https://discuss.dphi.tech/t/solve-data-sprint-7-challenge-dphi/1303/5?u=dphi_official

The data is related to the direct marketing campaigns of a Portuguese banking institution.

To load the training data in your Jupyter notebook, use the command below:

import pandas as pd

bank_marketing_data = pd.read_csv("https://raw.githubusercontent.com/dphi-official/Datasets/master/bank_marketing_data/training_set_label.csv")

Data Description

# bank client data:

age: Age of the client

job: Type of job

marital: Marital status of the client

education: Highest education of the client

default: Has credit in default?

balance: The balance of the client’s account at the bank

housing: Whether the client has a housing loan or not

loan: Whether the client has a personal loan or not

# related to the last contact of the current campaign:

contact: Contact communication type

month: Last contact month of the year

day_of_week: Last contact day of the week

duration: Last contact duration in seconds. Important note: this attribute highly affects the output target (e.g., if duration=0 then y='no'). Yet, the duration is not known before a call is performed. Also, after the end of the call y is obviously known. Thus, this input should only be included for benchmark purposes and should be discarded if the intention is to have a realistic predictive model.
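
As a minimal sketch of the note above, the snippet below drops duration before modelling. The helper name drop_benchmark_only_columns and the tiny example frame are illustrative assumptions, not part of the challenge materials; in practice you would pass bank_marketing_data instead.

```python
import pandas as pd


def drop_benchmark_only_columns(df, columns=("duration",)):
    """Return a copy of df without the listed columns (only those present are dropped)."""
    present = [c for c in columns if c in df.columns]
    return df.drop(columns=present)


# Made-up rows standing in for the real training data.
example = pd.DataFrame(
    {"age": [30, 45], "duration": [120, 0], "subscribe": [1, 0]}
)

realistic = drop_benchmark_only_columns(example)
print(list(realistic.columns))
```

The original frame is left unchanged, so a benchmark model that keeps duration can still be trained for comparison.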

# other attributes:

campaign: number of contacts performed during this campaign and for this client

pdays: number of days that passed by after the client was last contacted from a previous campaign

previous: number of contacts performed before this campaign and for this client

poutcome: outcome of the previous marketing campaign

# Output variable / Target variable:
subscribe: Whether the client has subscribed to a term deposit
- 0 implies NOT subscribed
- 1 implies subscribed
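
Because the target is coded 0/1, its mean is the share of clients who subscribed, which gives a quick check of class balance. The sketch below assumes the target column is named subscribe as described above; the helper name subscription_share and the example rows are hypothetical.

```python
import pandas as pd


def subscription_share(df, target="subscribe"):
    """Return the fraction of rows whose 0/1 target equals 1."""
    return float(df[target].mean())


# Made-up labels standing in for the real training data.
example = pd.DataFrame({"subscribe": [0, 0, 0, 1]})

share = subscription_share(example)
counts = example["subscribe"].value_counts().to_dict()
print(share, counts)
```

On the real data you would call subscription_share(bank_marketing_data). A share far from 0.5 suggests that plain accuracy may be a misleading way to judge a model.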

Test Dataset

Load the test data and name it test_data, using the command below.

test_data = pd.read_csv('https://raw.githubusercontent.com/dphi-official/Datasets/master/bank_marketing_data/testing_set_label.csv')

The target column is deliberately omitted from the test data because you need to predict it.
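
Before predicting, it can help to confirm that the test data has every feature column of the training data and no target column. The following is a rough sketch on made-up frames; the helper name feature_columns_missing_from_test is an assumption introduced here, and on the real data you would pass bank_marketing_data and test_data.

```python
import pandas as pd


def feature_columns_missing_from_test(train, test, target="subscribe"):
    """List training columns, other than the target, that are absent from test."""
    return [c for c in train.columns if c != target and c not in test.columns]


# Made-up rows standing in for the real training and test data.
train_example = pd.DataFrame({"age": [30], "job": ["admin."], "subscribe": [0]})
test_example = pd.DataFrame({"age": [41], "job": ["technician"]})

missing = feature_columns_missing_from_test(train_example, test_example)
target_absent = "subscribe" not in test_example.columns
print(missing, target_absent)
```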

Saving Prediction File & Sample Submission

You can find more details on how to save a prediction file here: https://discuss.dphi.tech/t/how-to-submit-predictions/548

Sample submission: Submit a CSV file with a header row, with one prediction per line, as in the sample below.

prediction
0
1
1
0
etc.

Note that the header name should be `prediction`; otherwise, it will throw an evaluation error.
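
One possible way to produce such a file with pandas is sketched below. The helper name save_predictions, the file name submission_example.csv, and the four example predictions are assumptions for illustration; the linked guide remains the reference for the submission steps.

```python
import os
import tempfile

import pandas as pd


def save_predictions(predictions, path):
    """Write 0/1 predictions to a CSV whose only column is named `prediction`."""
    submission = pd.DataFrame({"prediction": [int(p) for p in predictions]})
    # index=False keeps the row index out of the file, leaving only the header and values.
    submission.to_csv(path, index=False)
    return submission


# Hypothetical output location and made-up predictions.
example_path = os.path.join(tempfile.gettempdir(), "submission_example.csv")
save_predictions([0, 1, 1, 0], example_path)

with open(example_path) as f:
    written_lines = f.read().splitlines()
print(written_lines[0])
```

With a trained model, the list of made-up values would be replaced by the model's predictions for test_data, in the same row order as the test file.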

Acknowledgment

This data has been sourced from the UCI Machine Learning Repository.

Data Sprint #7: Bank Marketing
