Datathon

Ended

Data Sprint #9: Credit Risk

Predict if a loan application should be rejected or approved

Medium | 570 Submissions

Context

Credit risk is the risk of loss on a debt that arises when the borrower fails to repay the principal and the related interest to the lender by the due dates.

When a bank receives a loan application, it has to decide whether to approve or reject it based on the applicant’s profile. There are two types of risk associated with this decision:

  • If the applicant has good credit risk, i.e. is likely to repay the loan, then rejecting the loan results in a loss to the bank
  • If the applicant has bad credit risk, i.e. is unlikely to repay the loan, then approving the loan results in a loss to the bank

It may be assumed that the second risk is the greater one, as the bank (or any other institution lending money to an untrustworthy party) has a higher chance of not being paid back the borrowed amount.

It is therefore up to the bank or other lending authority to evaluate the risks associated with lending money to a customer.


Problem Statement

Imagine a bank in your locality. The bank has realized that applying data science methodologies can help them focus their resources efficiently, make smarter decisions on credit risk calculations, and improve performance.

Until now, the bank has checked the credit risk of loan applicants manually by analyzing their bank-related data, a process that used to take months. Now it wants a smart data scientist who can automate this process.


Objective

You are required to build a machine learning model that predicts the credit risk of loan applicants.


Evaluation Criteria

Submissions are evaluated using Accuracy Score.

How do we do it? 

Once you generate and submit the target variable predictions on the test dataset, your submission will be compared with the true values of the target variable.

The true (actual) values of the target variable are hidden on the DPhi platform so that we can evaluate your model's performance on unseen data. Finally, an accuracy score for your model will be generated and displayed.
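The comparison described above can be sketched in a few lines of Python. This is a minimal illustration that assumes the standard definition of accuracy (the fraction of predictions that match the true labels); the platform's exact implementation is not described here, and the helper name `accuracy` and the example labels are hypothetical.

```python
def accuracy(y_true, y_pred):
    """Return the fraction of predictions that exactly match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("cannot compute accuracy on empty input")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Hypothetical labels, for illustration only (not taken from the dataset).
y_true = ["good", "good", "bad", "good"]
y_pred = ["good", "bad", "bad", "good"]

# Three of the four predictions match the true labels.
example_accuracy = accuracy(y_true, y_pred)
```

Note that accuracy treats both kinds of mistake equally, even though the Context section argues that approving a bad credit risk is the costlier one.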


Timeline

Start Date: 9th October 2020, 21:00 hours IST / 17:30 hours CEST (please locate your time here)

End Date: 12th October 2020, 21:00 hours IST / 17:30 hours CEST (please locate your time here)


Problem Setters: Nisrin Dhoondia, Manish KC


The baseline notebook is available here.

About the Data

This dataset classifies loan applicants described by a set of attributes as good or bad credit risks.

To load the training data in your Jupyter notebook, use the command below:

import pandas as pd

audit_data = pd.read_csv("https://raw.githubusercontent.com/dphi-official/Datasets/master/credit_risk/training_set_labels.csv")


Data Description

There are 20 attributes in the dataset. Some of them are mentioned below:

  • checking_status: Status of the existing checking account
  • duration: Duration in months
  • credit_history: Credit history of the applicant
  • purpose: Purpose of taking the earlier loans
  • employment: Present employment since
  • installment_commitment: Installment rate in percentage of disposable income
  • personal_status: Personal status and sex
  • other_parties: Other debtors/guarantors
  • residence_since: Present residence since
  • other_payment_plans: Other installment plans
  • existing_credits: Number of existing credits at this bank
  • class: The target variable (good, bad)
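Since the attributes listed above mix categorical columns (such as checking_status) with numeric ones (such as duration), one possible starting point is a pipeline that one-hot encodes the former and scales the latter before fitting a classifier. The sketch below is not the official baseline notebook. The function name `fit_baseline`, the choice of logistic regression, and the toy rows (including their category values) are assumptions made for illustration; only the column names and the `class` labels come from the description above.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler


def fit_baseline(train_df, target="class"):
    """Fit a simple classifier on a frame that contains the target column."""
    X = train_df.drop(columns=[target])
    y = train_df[target]

    # Treat every non-numeric column as categorical.
    numeric_cols = list(X.select_dtypes(include="number").columns)
    categorical_cols = [c for c in X.columns if c not in numeric_cols]

    preprocess = ColumnTransformer([
        # Categories unseen during training are encoded as all zeros.
        ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
        ("num", StandardScaler(), numeric_cols),
    ])
    model = Pipeline([
        ("preprocess", preprocess),
        ("classifier", LogisticRegression(max_iter=1000)),
    ])
    model.fit(X, y)
    return model


# Toy rows with made-up values, standing in for the real training data.
toy = pd.DataFrame({
    "checking_status": ["<0", "no checking", "<0", "no checking", "<0", "no checking"],
    "duration": [48, 12, 36, 6, 42, 10],
    "class": ["bad", "good", "bad", "good", "bad", "good"],
})
model = fit_baseline(toy)
toy_predictions = model.predict(toy.drop(columns=["class"]))
```

With the real data, `fit_baseline(audit_data)` would play the same role, ideally after holding out part of the training set to estimate accuracy before submitting.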

Saving Prediction File & Sample Submission

You can find more details on how to save a prediction file here: https://discuss.dphi.tech/t/how-to-submit-predictions/548

Sample submission: Submit a CSV file with a header row. A sample submission is shown below.

prediction
good
good
bad
good
etc.

Note that the header name should be prediction; otherwise, the submission will throw an evaluation error.
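A file in this format can be produced with pandas. The sketch below is one way to do it, not the platform's official instructions: the helper names `build_submission` and `save_submission`, the default file name `submission.csv`, and the example labels are all hypothetical.

```python
import pandas as pd


def build_submission(predictions):
    """Wrap predicted labels in a one-column frame with the required header."""
    return pd.DataFrame({"prediction": list(predictions)})


def save_submission(predictions, path="submission.csv"):
    """Write the predictions to a CSV file without the pandas row index."""
    # index=False keeps the file to a single 'prediction' column.
    build_submission(predictions).to_csv(path, index=False)


# Hypothetical predictions, for illustration only.
example = build_submission(["good", "good", "bad", "good"])

# With no path given, to_csv returns the CSV content as a string.
csv_text = example.to_csv(index=False)
```

Calling `save_submission(model_predictions)` with your own array of predicted labels would then write the file to upload. Omitting `index=False` would add an extra unnamed index column, which does not match the sample format.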

Test Dataset

Load the test data and name it test_data, using the command below.

test_data = pd.read_csv('https://raw.githubusercontent.com/dphi-official/Datasets/master/credit_risk/testing_set_labels.csv')

The target column is deliberately left out of this file, as it is what you need to predict.
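Before predicting, it can help to confirm that the test data has the same feature columns as the training data apart from the target. The check below is a small sketch under that assumption. The helper name `check_feature_columns` and the toy frames are hypothetical; only the `class` column name comes from the data description.

```python
import pandas as pd


def check_feature_columns(train_df, test_df, target="class"):
    """Return (missing, unexpected) feature columns of test_df relative to train_df."""
    expected = [c for c in train_df.columns if c != target]
    missing = [c for c in expected if c not in test_df.columns]
    unexpected = [c for c in test_df.columns if c not in expected]
    return missing, unexpected


# Toy frames with made-up values, standing in for the real training and test data.
toy_train = pd.DataFrame({"duration": [12, 48], "purpose": ["car", "radio"], "class": ["good", "bad"]})
toy_test = pd.DataFrame({"duration": [24], "purpose": ["car"]})

missing, unexpected = check_feature_columns(toy_train, toy_test)
```

Both lists being empty means the fitted model can be applied to the test frame as is. A target column showing up in `unexpected` would suggest the wrong file was loaded.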


Acknowledgement

This data has been sourced from the UCI Machine Learning Repository.


