# Introduction

Borrowing is a key service in society, and access to it can significantly affect a person's economic opportunities. Great care must therefore be taken when designing a credit decision mechanism, as it can greatly affect the well-being of the people who are granted or refused credit.

Credit scoring estimates the creditworthiness of a borrower to inform a lending decision. Historically, these decisions were made using quantitative and qualitative data to assess the borrower's future performance. Statistical techniques appear to remove biased human judgment from these decisions because quantitative measures seem more objective. However, the way data is processed and models are designed can also create or perpetuate bias that continues to affect some groups of people.

The concepts described below come from the Responsible AI framework developed by Dataiku.

# Data

The data used to build credit risk models traditionally comes from three sources:

- The applicant's form information: which contains declarative fields input by the applicant and the characteristics of the loan requested.
- The applicant's historical data from the bank's system if the applicant is a current customer: behavioral data about the customer's balances on previous credit products and other additional information.
- The applicant's scores and historical data from credit bureaux: the applicant's credit balances with other banks.

The analyst should perform thorough data quality checks before starting the modeling. Because the data comes from multiple systems and contains declarative fields that might be filled in incorrectly, it is important to run sanity checks to avoid including erroneous data points in the analysis. An outlier and fraud detection step also takes place before the analysis to remove the most abnormal observations. Once the data has reached a quality threshold and is reliable enough, the other considerations below come into play.
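The kind of sanity checks described above could be sketched as follows with pandas. The column names (`application_id`, `age`, `income`, `loan_amount`) and the plausibility thresholds are hypothetical and would need to be adapted to the actual application data.

```python
import pandas as pd


def sanity_report(df: pd.DataFrame) -> pd.Series:
    """Count the rows that fail each basic plausibility check."""
    checks = {
        "missing_income": df["income"].isna(),
        "negative_income": df["income"] < 0,
        # Rows with a missing age are flagged too, since between() is False for NaN.
        "implausible_age": ~df["age"].between(18, 100),
        "non_positive_loan_amount": df["loan_amount"] <= 0,
        "duplicate_application_id": df["application_id"].duplicated(keep=False),
    }
    return pd.Series({name: int(mask.sum()) for name, mask in checks.items()})


# Toy data with deliberately erroneous entries (illustrative values only).
applications = pd.DataFrame(
    {
        "application_id": [1, 2, 2, 4],
        "age": [34, 17, 45, 230],
        "income": [42000.0, None, -5.0, 38000.0],
        "loan_amount": [10000, 5000, 0, 7500],
    }
)

report = sanity_report(applications)
print(report)
```

A report like this only counts suspicious rows; whether to correct, exclude, or investigate them remains a decision for the analyst.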

## Privacy

Depending on the location, several privacy laws state how sensitive private information should be handled:
- [Fair Credit Reporting Act](https://en.wikipedia.org/wiki/Fair_Credit_Reporting_Act) in the US.
- [GDPR](https://en.wikipedia.org/wiki/General_Data_Protection_Regulation) in the EU.

One approach to handling sensitive information is to first flag the sensitive dataset, then remove or anonymize the identifying columns to create a dataset with a lower risk of privacy leakage.
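A minimal sketch of this approach is shown below, assuming a pandas dataset with hypothetical columns `name`, `email`, and `customer_id`. Note that salted hashing is pseudonymization rather than full anonymization, so it lowers the risk without removing it; the choice of columns and of hashing scheme is an assumption made for illustration.

```python
import hashlib

import pandas as pd

# Hypothetical classification of the columns flagged as sensitive.
DIRECT_IDENTIFIERS = ["name", "email"]  # dropped entirely
KEY_COLUMNS = ["customer_id"]           # kept for joins, but pseudonymized


def pseudonymize(value: object, salt: str) -> str:
    """Replace a value with a salted SHA-256 digest (a stable token)."""
    return hashlib.sha256(f"{salt}:{value}".encode("utf-8")).hexdigest()


def reduce_privacy_risk(df: pd.DataFrame, salt: str) -> pd.DataFrame:
    """Drop direct identifiers and pseudonymize key columns in a copy."""
    out = df.drop(columns=DIRECT_IDENTIFIERS)
    for col in KEY_COLUMNS:
        out[col] = out[col].map(lambda v: pseudonymize(v, salt))
    return out


# Toy data (illustrative values only).
customers = pd.DataFrame(
    {
        "customer_id": ["C001", "C002", "C001"],
        "name": ["Ann", "Bob", "Ann"],
        "email": ["ann@example.com", "bob@example.com", "ann@example.com"],
        "balance": [1200.0, 350.0, 980.0],
    }
)

safe = reduce_privacy_risk(customers, salt="example-salt")
print(safe.columns.tolist())
```

The same customer always maps to the same token, so the pseudonymized key can still be used to join datasets, provided the salt is kept secret.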

## Bias

Bias in the way credit risk is modeled is also addressed by legislation:
- [Equal Credit Opportunity Act](https://en.wikipedia.org/wiki/Equal_Credit_Opportunity_Act) in the US.
- EU Directives regarding equal opportunity.

Some obvious sensitive variables cannot be used in these models:
 - Race
 - Religion
 - Sex
 - Marital Status
 - Disability
 
An applicant's historical banking data will also be a product of social discrimination, so it is important to measure imbalance across attributes in the dataset. This means checking whether the target variable varies significantly across sensitive groups. One way to do this is a chi-square test of independence, which compares the observed counts with the counts expected if the target were independent of the group. Examples of these tests are available [here](article:31). In the case of severe imbalance, row-level weights could be used based on the methodology available in packages like [Fairlearn](https://fairlearn.org/) or [AI Fairness 360](https://ai-fairness-360.org/).
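As an illustration of the chi-square check mentioned above, the sketch below builds a contingency table of a binary target against a sensitive attribute and passes it to `scipy.stats.chi2_contingency`. The column names (`gender`, `default`) and the counts are made up for the example.

```python
import pandas as pd
from scipy.stats import chi2_contingency

# Toy data: 100 female and 200 male applicants with a binary default flag.
records = pd.DataFrame(
    {
        "gender": ["F"] * 100 + ["M"] * 200,
        "default": [0] * 80 + [1] * 20 + [0] * 150 + [1] * 50,
    }
)

# Observed counts of the target within each sensitive group.
observed = pd.crosstab(records["gender"], records["default"])

# `expected` holds the counts implied by independence between gender and default.
# For a 2x2 table, SciPy applies Yates' continuity correction by default.
chi2, p_value, dof, expected = chi2_contingency(observed)

print(observed)
print(f"chi2={chi2:.3f}, p-value={p_value:.3f}, dof={dof}")
```

A small p-value would suggest that the target rate differs across groups more than chance alone would explain; the significance threshold is left to the analyst.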
 
### Proxy Analysis
 
Even when the sensitive variables themselves are excluded, other variables can act as proxies for them. Statistical tools can help find these proxies so that action can be taken to reduce bias in the data. Here, information value statistics are computed against gender to check whether some variables are closely related to this sensitive variable.

![Information Value Sensitive Variable.png](PPQkD7R5TSdI)

One of the top variables included in the model (`occupation_type`) is also closely related to gender. This variable should therefore be handled carefully to check that its use does not introduce bias into the model.
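For readers who want to reproduce this kind of proxy check, here is a rough sketch of an information value computed against a binary sensitive variable. It uses the usual definition, the sum over categories of (p - q) * ln(p / q), where p and q are the shares of each category within the two groups. The additive smoothing, the toy values, and the `region` column are assumptions made for the example, not part of the project.

```python
import numpy as np
import pandas as pd


def information_value(
    feature: pd.Series, binary_variable: pd.Series, smoothing: float = 0.5
) -> float:
    """Information value of a categorical feature against a two-class variable."""
    # Smoothing (an assumption of this sketch) avoids log(0) for empty cells.
    counts = pd.crosstab(feature, binary_variable).astype(float) + smoothing
    if counts.shape[1] != 2:
        raise ValueError("binary_variable must take exactly two values")
    # Share of each feature category within each of the two classes.
    shares = counts / counts.sum(axis=0)
    first, second = shares.columns
    weight_of_evidence = np.log(shares[first] / shares[second])
    return float(((shares[first] - shares[second]) * weight_of_evidence).sum())


# Toy data: occupation is strongly related to gender, region is not.
gender = pd.Series(["F"] * 50 + ["M"] * 50, name="gender")
occupation_type = pd.Series(
    ["nurse"] * 40 + ["driver"] * 10 + ["nurse"] * 10 + ["driver"] * 40,
    name="occupation_type",
)
region = pd.Series((["north"] * 25 + ["south"] * 25) * 2, name="region")

iv_occupation = information_value(occupation_type, gender)
iv_region = information_value(region, gender)
print(f"occupation_type: {iv_occupation:.3f}, region: {iv_region:.3f}")
```

A higher value indicates that the feature separates the two gender groups more strongly and is therefore a more likely proxy.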

### Reject Inference

Reject inference addresses the fact that the data used to estimate credit risk consists only of previously accepted applications, because these are the only ones for which credit performance can be measured. As a result, groups of applicants who have never been granted credit remain ignored in the analysis, and bias against them might persist. Furthermore, some of these applicants might have performed well but were never given the chance, so taking rejected applications into account could also bring in additional creditworthy customers. We will discuss the different techniques for reject inference in a dedicated [article](article:11).

# Model

After multiple steps of feature selection and engineering, the model is trained. The trained model can then be checked for bias with respect to the sensitive variables. The chosen model is a logistic regression with no regularization, so no hyperparameter optimization takes place, but different feature handlings can be compared. Several iterations between model fitting and feature engineering may be needed to find the best set of features. However, the analyst should be careful not to overfit the data by going back and forth between the features and the model.
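To make the "no regularization, no hyperparameters" point concrete, the sketch below fits an unpenalized logistic regression by maximum likelihood using Newton's method in NumPy. This is an illustrative implementation on synthetic data, not the solver used in the project; the function name and the data-generating coefficients are assumptions.

```python
import numpy as np


def fit_logistic_regression(X: np.ndarray, y: np.ndarray, n_iter: int = 25) -> np.ndarray:
    """Maximum-likelihood logistic regression (no penalty) via Newton's method."""
    design = np.column_stack([np.ones(len(X)), X])  # prepend an intercept column
    beta = np.zeros(design.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-design @ beta))
        gradient = design.T @ (y - p)
        # Observed information matrix (negative Hessian of the log-likelihood).
        information = design.T @ (design * (p * (1.0 - p))[:, None])
        beta = beta + np.linalg.solve(information, gradient)
    return beta


# Synthetic data generated from known coefficients (intercept -1, slope 2).
rng = np.random.default_rng(0)
x = rng.normal(size=(5000, 1))
true_prob = 1.0 / (1.0 + np.exp(-(-1.0 + 2.0 * x[:, 0])))
y = (rng.random(5000) < true_prob).astype(float)

beta_hat = fit_logistic_regression(x, y)
print(beta_hat)  # estimated [intercept, slope]
```

Because nothing is tuned besides the coefficients themselves, comparing two feature handlings amounts to refitting on each design and comparing the results on held-out data.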

The cost matrix defines an objective function that reflects the business objective of the model. In this context of making credit decisions, where a positive prediction means that the loan is granted:

- a True Positive yields a positive revenue equal to the average loan revenue.
- a True Negative does not yield any revenue.
- a False Positive incurs a loss that could be estimated as the average loss given default, which is greater than the average loan revenue.
- a False Negative incurs an opportunity cost that amounts to the average loan revenue.

The metric derived from the cost matrix is only a rough estimate; other considerations will come into play when making the credit decision. The final model is evaluated against a holdout dataset to ensure that its performance remains stable out of sample.
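The cost matrix above can be turned into an average gain per application, as in the sketch below. Following the list above, class 1 is read as "loan granted and repaid" and class 0 as "default"; the monetary amounts are hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class CostMatrix:
    """Gain (positive) or loss (negative) attached to each outcome."""
    true_positive: float
    true_negative: float
    false_positive: float
    false_negative: float


def average_gain(y_true: Sequence[int], y_pred: Sequence[int], costs: CostMatrix) -> float:
    """Average gain per application under the given cost matrix."""
    total = 0.0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == 1 and actual == 1:
            total += costs.true_positive    # loan granted and repaid
        elif predicted == 1 and actual == 0:
            total += costs.false_positive   # loan granted, then default
        elif predicted == 0 and actual == 1:
            total += costs.false_negative   # good applicant rejected
        else:
            total += costs.true_negative    # bad applicant rejected
    return total / len(y_true)


# Hypothetical amounts, chosen only for illustration.
AVERAGE_LOAN_REVENUE = 1_000.0
AVERAGE_LOSS_GIVEN_DEFAULT = 5_000.0

costs = CostMatrix(
    true_positive=AVERAGE_LOAN_REVENUE,
    true_negative=0.0,
    false_positive=-AVERAGE_LOSS_GIVEN_DEFAULT,
    false_negative=-AVERAGE_LOAN_REVENUE,
)

y_true = [1, 1, 1, 0, 0, 1]
y_pred = [1, 1, 0, 1, 0, 1]
gain = average_gain(y_true, y_pred, costs)
print(gain)  # -500.0
```

In this toy example a single false positive outweighs three correctly granted loans, which illustrates why the loss given default dominates the choice of decision threshold.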

As seen above, the data contains more observations for male applicants than for female applicants, although there is no obvious reason for this imbalance. To make sure the model takes male and female applicants into account equally, female data points can be overweighted and male data points underweighted so that both groups contribute to the model in the same proportion.
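One simple way to obtain such weights, sketched here under the assumption that each group should carry the same total weight, is to give every row of a group the weight n / (k * n_group), where n is the number of rows and k the number of groups. The function name and the toy data are illustrative.

```python
import pandas as pd


def balancing_weights(groups: pd.Series) -> pd.Series:
    """Row weights such that every group carries the same total weight."""
    counts = groups.value_counts()
    # Each row of a group with n_group rows gets n / (k * n_group).
    per_group_weight = len(groups) / (len(counts) * counts)
    return groups.map(per_group_weight)


# Toy data: six male applicants and two female applicants.
gender = pd.Series(["M"] * 6 + ["F"] * 2, name="gender")
weights = balancing_weights(gender)

# Total weight per group is now identical, and the overall total is unchanged.
totals = weights.groupby(gender).sum()
print(totals)
```

Weights of this kind can usually be passed to the training step as row-level sample weights, for example through the `sample_weight` argument that scikit-learn's `LogisticRegression.fit` accepts.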

## Subpopulation Analysis

![RAI Subpopulation Analysis.png](Njk0sgBT32f9)

The subpopulation analysis view compares model metrics across subgroups of the observations. This breakdown lets us check whether the model performs better or worse for some subgroups according to the metric chosen by the analyst. If there is an explicit need to align these metrics across subgroups, remediation techniques exist that modify the output of the model, bearing in mind that they will also affect its overall performance.
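Outside the built-in view, a comparable breakdown can be approximated with pandas, as in this sketch. The column names (`gender`, `y_true`, `y_pred`) and the selection of metrics are assumptions; the actual view may compute different metrics.

```python
import pandas as pd


def subpopulation_metrics(
    df: pd.DataFrame, group_col: str, y_true: str = "y_true", y_pred: str = "y_pred"
) -> pd.DataFrame:
    """Compute a few classification metrics separately for each subgroup."""
    rows = {}
    for group, part in df.groupby(group_col):
        actual_positives = part[part[y_true] == 1]
        rows[group] = {
            "n": len(part),
            "accuracy": (part[y_true] == part[y_pred]).mean(),
            "predicted_positive_rate": part[y_pred].mean(),
            "true_positive_rate": (
                actual_positives[y_pred].mean() if len(actual_positives) else float("nan")
            ),
        }
    return pd.DataFrame.from_dict(rows, orient="index")


# Toy scored data (illustrative values only).
scored = pd.DataFrame(
    {
        "gender": ["F", "F", "F", "F", "M", "M", "M", "M"],
        "y_true": [1, 1, 0, 0, 1, 1, 0, 0],
        "y_pred": [1, 0, 0, 0, 1, 1, 1, 0],
    }
)

metrics = subpopulation_metrics(scored, "gender")
print(metrics)
```

In this toy example both groups have the same accuracy, yet their predicted positive rates and true positive rates differ, which is why the choice of metric matters when comparing subpopulations.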

# Report

The [dashboard](article:29) linked with the project contains a summary of the outputs of the modeling process. These outputs can be used to report on the model and on the analysis of the data that was used, and they can be complemented by more custom visualizations. The [webapp](article:16) also serves as a reporting tool: it helps management teams understand how the scorecard works and gives a comprehensive view of its structure.

# Resources

- https://blog.dataiku.com/walk-the-talk-ai-governance-at-scale

- 2020, Responsible A.I. Credit Scoring - A Legal Framework, Katja Langenbucher


