Data Sprint #9: Credit Risk
Credit risk is the risk of loss on a debt that arises when the borrower fails to repay the principal and related interest of a loan to the lender on the due dates.
When a bank receives a loan application, it has to decide, based on the applicant’s profile, whether to approve or reject it. There are two types of risk associated with this decision:
- If the applicant is a good credit risk, i.e. is likely to repay the loan, then rejecting the loan results in a loss to the bank
- If the applicant is a bad credit risk, i.e. is unlikely to repay the loan, then approving the loan results in a loss to the bank
It may be assumed that the second risk is the greater one, because the bank (or any other institution lending money to an untrustworthy party) has a higher chance of not being repaid the borrowed amount.
It is therefore up to the bank or other lender to evaluate the risks associated with lending money to a customer.
Imagine a bank in your locality. The bank has realized that applying data science methodologies can help it focus its resources efficiently, make smarter decisions on credit risk calculations, and improve performance.
Until now, the bank checked the credit risk of loan applicants manually by analyzing their bank-related data, which took months. This time it wants a smart data scientist who can automate the process.
You are required to build a machine learning model that helps you predict the credit risk of the loan applicants.
Submissions are evaluated using Accuracy Score.
How do we do it?
Once you generate and submit the target variable predictions on the test dataset, your submissions will be compared with the true values of the target variable.
The true (actual) values of the target variable are hidden on the DPhi platform so that your model's performance can be evaluated on unseen data. Finally, an accuracy score for your model is generated and displayed.
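As a rough illustration of the metric, accuracy is the fraction of predictions that match the true labels. The sketch below uses invented labels and a hypothetical helper named accuracy; the platform's own scoring code is not shown in this description, so treat this as an assumption about how the score is computed.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that exactly match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy on empty input")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Hypothetical labels, for illustration only.
y_true = ["good", "bad", "good", "good"]
y_pred = ["good", "good", "good", "bad"]

# Two of the four predictions match the true labels.
score = accuracy(y_true, y_pred)
```

scikit-learn's accuracy_score function in sklearn.metrics computes the same fraction, so you can use it to score your model on a held-out part of the training data before submitting.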
Start Date: 9th October 2020, 21:00 hours IST / 17:30 hours CET (please locate your time here)
End Date: 12th October 2020, 21:00 hours IST / 17:30 hours CET (please locate your time here)
About the Data
This dataset classifies loan applicants described by a set of attributes as good or bad credit risks.
To load the training data in your Jupyter notebook, use the command below:
import pandas as pd
audit_data = pd.read_csv("https://raw.githubusercontent.com/dphi-official/Datasets/master/credit_risk/training_set_labels.csv")
There are 20 attributes in the dataset. Some of them are mentioned below:
- checking_status: Status of the existing checking account
- duration: Duration in months
- credit_history: Credit history of the applicant
- purpose: Purpose of taking the earlier loans
- employment: How long the applicant has been in their present employment
- installment_commitment: Installment rate in percentage of disposable income
- personal_status: Personal status and sex
- other_parties: Other debtors/guarantors
- residence_since: How long the applicant has lived at their present residence
- other_payment_plans: Other installment plans
- existing_credits: Number of existing credits at this bank
- class: The target variable (good, bad)
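Before modelling, it can help to check how the two classes are balanced and which attributes are numeric or categorical. The sketch below is a minimal example on a tiny invented frame that reuses a few of the column names listed above; the values, and the variable names sample, baseline_accuracy, numeric_cols and categorical_cols, are assumptions for illustration and are not taken from the real dataset.

```python
import pandas as pd

# Tiny hypothetical sample; the values are invented for illustration.
sample = pd.DataFrame({
    "checking_status": ["<0", "no checking", "0<=X<200", "<0"],
    "duration": [6, 48, 12, 24],
    "existing_credits": [2, 1, 1, 1],
    "class": ["good", "bad", "good", "good"],
})

# Share of each class. The largest share equals the accuracy obtained by
# always predicting the majority class, a useful floor for any model.
class_share = sample["class"].value_counts(normalize=True)
baseline_accuracy = class_share.max()

# Split the feature columns by type; categorical columns usually need
# encoding before they can be passed to most models.
feature_cols = sample.columns.drop("class")
numeric_cols = sample[feature_cols].select_dtypes(include="number").columns.tolist()
categorical_cols = sample[feature_cols].select_dtypes(exclude="number").columns.tolist()
```

The same calls should work on the frame loaded from the training CSV, provided its target column is named class as described above.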
Saving Prediction File & Sample Submission
You can find more details on how to save a prediction file here: https://discuss.dphi.tech/t/how-to-submit-predictions/548
Sample submission: You should submit a CSV file with a header row. The sample submission can be found below.
Note that the header name must be prediction; otherwise the submission will throw an evaluation error.
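A minimal sketch of writing such a file with pandas is shown here. The file name submission.csv and the example labels are assumptions; check the linked guide for the exact label format the platform expects.

```python
import pandas as pd

# Hypothetical predictions; in practice these come from your model.
predictions = ["good", "bad", "good"]

# A single column whose header is exactly "prediction".
submission = pd.DataFrame({"prediction": predictions})

# index=False keeps the row index out of the file, so the CSV contains
# only the header row followed by one prediction per line.
submission.to_csv("submission.csv", index=False)
```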
Load the test data (name it test_data) using the command below.
test_data = pd.read_csv('https://raw.githubusercontent.com/dphi-official/Datasets/master/credit_risk/testing_set_labels.csv')
The target column is deliberately absent here, since you need to predict it.
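Because the test data has no target column, and may not contain every category seen in training, the encoded test features need to line up with the encoded training features. The sketch below shows one common way to do this with one-hot encoding. The small train and test frames are invented stand-ins, and one-hot encoding is only one possible choice here, not something the challenge prescribes.

```python
import pandas as pd

# Hypothetical stand-ins for the training and test frames.
train = pd.DataFrame({
    "checking_status": ["<0", "no checking", "0<=X<200"],
    "duration": [6, 48, 12],
    "class": ["good", "bad", "good"],
})
test = pd.DataFrame({
    "checking_status": ["<0", "<0"],
    "duration": [24, 36],
})

# One-hot encode the training features (target column removed first).
X_train = pd.get_dummies(train.drop(columns=["class"]))

# Encode the test set, then force it onto the training columns:
# categories missing from the test set become all-zero columns, and
# categories never seen in training are dropped.
X_test = pd.get_dummies(test).reindex(columns=X_train.columns, fill_value=0)
```

A model fitted on X_train can then be applied to X_test, since both frames share the same columns in the same order.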
This data has been sourced from the UCI Machine Learning Repository.
To participate in this challenge, you must either create a team of at least None members or join an existing team.