The primary objective of Probability of Default (PD) modeling is to estimate the probability that loans in a given portfolio default under various economic scenarios. There are two broad approaches: one directly forecasts the proportion of loans expected to default over specific future time intervals; the other, which we have adopted in this project, splits the portfolio into discrete segments that can be assessed individually, with modeled transitions between them. Our chosen approach uses transition matrices, as detailed in the IMF paper referenced in our [project resources](article:9), to capture how loans migrate between segments rather than only how many of them default.

# Transition Matrices

Through-the-Cycle (TTC) and Point-in-Time (PIT) transition matrices are essential components of credit risk modeling, particularly in the context of credit portfolio stress testing. These matrices help financial institutions assess the probability of credit events such as defaults or rating migrations occurring over time: 

- The TTC transition matrix provides a long-term perspective on credit risk by capturing credit events and rating migrations over a full economic cycle, which typically includes periods of economic expansion and contraction. This matrix is designed to be stable and less sensitive to short-term economic fluctuations. It reflects the average behavior of credit portfolios across different economic environments.
- PIT transition matrices, in contrast, offer a snapshot of credit risk at a specific point in time. They reflect the credit quality of a portfolio as of a particular date, considering current economic conditions and market factors. PIT matrices are more sensitive to short-term economic changes and capture the impact of economic cycles and market dynamics as they unfold.

![TTC Transition Matrix.png](BYCX2CMOg4fK)
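
As a minimal sketch of how the two kinds of matrices relate, the Python snippet below row-normalizes migration counts into one PIT matrix per period and builds a TTC estimate by pooling the counts over all periods. The counts, the three-state rating scale, and the pooling rule are illustrative assumptions, not this project's data or its exact estimation method.

```python
import numpy as np

def transition_matrix(counts):
    """Row-normalize a matrix of migration counts into transition probabilities."""
    counts = np.asarray(counts, dtype=float)
    row_totals = counts.sum(axis=1, keepdims=True)
    # Rows without observations stay at zero instead of dividing by zero.
    return np.divide(counts, row_totals,
                     out=np.zeros_like(counts), where=row_totals > 0)

# Hypothetical migration counts for three states (A, B, Default), two periods.
counts_by_period = np.array([
    [[90, 8, 2], [10, 80, 10], [0, 0, 100]],   # benign period
    [[80, 15, 5], [5, 75, 20], [0, 0, 100]],   # stressed period
])

# One PIT matrix per period.
pit_matrices = np.array([transition_matrix(c) for c in counts_by_period])

# Assumed TTC estimate: pool the counts over all periods, then normalize.
ttc_matrix = transition_matrix(counts_by_period.sum(axis=0))
```

In this toy data the A-to-Default probability is 2% in the benign period and 5% in the stressed period, while the pooled TTC value sits between the two.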

# Z-Score

The Z-score concept is described in the JP Morgan paper linked in the [resources](article:9). It consists of measuring the distance between each PIT transition matrix and the TTC matrix. With this approach, we assume that each transition is driven by a standard normal random variable $`X`$. The distribution is segmented into bins that correspond to the credit risk categories, which are equivalent to the credit ratings issued by Credit Rating Agencies. For a given initial credit rating $`G`$, the bins are the intervals $`(x_g^G, x_{g+1}^G]`$, where $`g`$ ranges over the set of possible next ratings. The probability of moving from credit rating $`G`$ to $`g`$ is defined as follows, where $`\Phi`$ is the standard normal cumulative distribution function:

```math
P(G, g) = \Phi(x_{g+1}^G) - \Phi(x_{g}^G)
```
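
The bin edges can be recovered from a row of the TTC matrix by inverting the formula above. The sketch below assumes the ratings in a row are ordered from worst (default) to best, so that the lowest bin of $`X`$ corresponds to default; the row values and the helper name `thresholds_from_row` are hypothetical.

```python
import numpy as np
from scipy.stats import norm

def thresholds_from_row(row):
    """Return the bin edges x_0, ..., x_K for one initial rating G.

    Assumes `row` holds P(G, g) with g ordered from worst (default) to best.
    """
    cum = np.clip(np.cumsum(row), 0.0, 1.0)
    upper = norm.ppf(cum)      # x_{g+1} = Phi^{-1}(sum of P(G, j) for j <= g)
    upper[-1] = np.inf         # last bin is open-ended (guards against rounding)
    return np.concatenate(([-np.inf], upper))

row = np.array([0.02, 0.08, 0.90])   # hypothetical TTC row: Default, B, A
x = thresholds_from_row(row)

# Applying P(G, g) = Phi(x_{g+1}) - Phi(x_g) gives back the original row.
recovered = norm.cdf(x[1:]) - norm.cdf(x[:-1])
```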

$`X`$ is modelled as:

```math
X = \sqrt{1 - \rho} Y + \sqrt{\rho} Z
```

Where $`Y`$ is the idiosyncratic component and $`Z`$ is the systemic component, which is common to all transitions; $`\rho`$ is the share of the variance of $`X`$ explained by $`Z`$. Since $`Z`$ explains a portion of the variance in each period, the goal is to find the $`Z_t`$ that minimizes the difference between the modeled transition probabilities and the observed ones. Conditional on $`Z_t`$, the modeled transition probability can be written as follows:

```math
\Delta(x_{g+1}^G,x_g^G,Z_t) = \Phi(\frac{x_{g+1}^G - \sqrt{\rho}Z_t}{\sqrt{1 - \rho}}) - \Phi(\frac{x_g^G - \sqrt{\rho}Z_t}{\sqrt{1 - \rho}})
```
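
To illustrate how $`Z_t`$ shifts the transition probabilities, here is a small sketch of $`\Delta`$ for a single initial rating. The bin edges, the value of $`\rho`$, and the worst-to-best ordering of the bins are assumptions made for the example.

```python
import numpy as np
from scipy.stats import norm

def conditional_probs(edges, z, rho):
    """Delta(x_{g+1}, x_g, Z_t) for every bin of one initial rating."""
    edges = np.asarray(edges, dtype=float)
    scaled = (edges - np.sqrt(rho) * z) / np.sqrt(1.0 - rho)
    cdf = norm.cdf(scaled)
    return cdf[1:] - cdf[:-1]

# Hypothetical edges for bins ordered Default, B, A (2%, 8%, 90% on average).
edges = np.array([-np.inf, norm.ppf(0.02), norm.ppf(0.10), np.inf])
rho = 0.1  # assumed value, for illustration only

downturn = conditional_probs(edges, z=-1.5, rho=rho)
upturn = conditional_probs(edges, z=1.5, rho=rho)
```

Under this ordering, a negative $`Z_t`$ pushes probability mass toward the default bin and a positive $`Z_t`$ pulls it away.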

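Then, for a given $`t`$ and $`\rho`$, the objective function minimized over $`Z_t`$ is the following weighted sum of squared differences, where $`P_t(G, g)`$ is the transition probability observed in period $`t`$ and $`n_{t,G}`$ is the number of observations with initial rating $`G`$ in that period:

```math
\min_{Z_t} \sum_G \sum_g \frac{n_{t,G}\left[P_t(G, g) - \Delta(x_{g+1}^G,x_g^G,Z_t)\right]^2}{\Delta(x_{g+1}^G,x_g^G,Z_t)\left[1 - \Delta(x_{g+1}^G,x_g^G,Z_t)\right]}
```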

To obtain the time series $`Z_t`$, we first optimize $`\rho`$ over the whole period such that $`Z_t`$ has a variance of one. We then compute $`Z_t`$ for each period using the optimized $`\rho`$. Below is the graph of the Z-Score, where drops corresponding to economic downturns (the 2008 financial crisis and the 2020 COVID-19 pandemic) are clearly visible.

![Extracted Z-Score.png](obknmErXmLqt)
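
The two-step procedure can be sketched as follows. This is an illustration under stated assumptions, not the project's actual implementation: the weights are the number of observations per initial rating, rating columns are ordered from worst (default) to best, default is omitted as an initial rating, and the PIT matrices are synthetic, generated from the model itself with a known $`\rho`$ of 0.1. All function names are hypothetical.

```python
import numpy as np
from scipy.optimize import brentq, minimize_scalar
from scipy.stats import norm

def bin_edges(ttc):
    """Bin edges per initial rating; columns assumed ordered worst -> best."""
    cum = np.clip(np.cumsum(ttc, axis=1), 0.0, 1.0)
    upper = norm.ppf(cum)
    upper[:, -1] = np.inf                       # last bin is open-ended
    lower = np.full((ttc.shape[0], 1), -np.inf)
    return np.hstack([lower, upper])

def model_probs(edges, z, rho):
    """Modeled transition probabilities for every (G, g) pair given z."""
    cdf = norm.cdf((edges - np.sqrt(rho) * z) / np.sqrt(1.0 - rho))
    return cdf[:, 1:] - cdf[:, :-1]

def fit_z(pit, n_start, edges, rho, bound=6.0):
    """Weighted least-squares estimate of Z_t for a single period."""
    def loss(z):
        d = np.clip(model_probs(edges, z, rho), 1e-10, 1.0 - 1e-10)
        return np.sum(n_start[:, None] * (pit - d) ** 2 / (d * (1.0 - d)))
    return minimize_scalar(loss, bounds=(-bound, bound), method="bounded").x

def fit_z_series(pits, n_starts, edges, rho):
    """Fit Z_t period by period for a fixed rho."""
    return np.array([fit_z(p, n, edges, rho) for p, n in zip(pits, n_starts)])

def fit_rho(pits, n_starts, edges, lo=1e-3, hi=0.5):
    """Find the rho for which the fitted Z_t series has unit variance."""
    def gap(rho):
        return np.var(fit_z_series(pits, n_starts, edges, rho)) - 1.0
    return brentq(gap, lo, hi, xtol=1e-6)

# Synthetic data: hypothetical TTC rows for initial ratings B and A,
# with columns ordered Default, B, A.
ttc = np.array([[0.10, 0.80, 0.10],
                [0.02, 0.08, 0.90]])
edges = bin_edges(ttc)

z_true = np.array([-1.8, -0.4, 0.1, 0.5, 0.7, 0.9])
z_true = (z_true - z_true.mean()) / z_true.std()    # zero mean, unit variance
pits = np.array([model_probs(edges, z, 0.1) for z in z_true])
n_starts = np.full((len(z_true), ttc.shape[0]), 500.0)

rho_hat = fit_rho(pits, n_starts, edges)            # step 1: calibrate rho
z_hat = fit_z_series(pits, n_starts, edges, rho_hat)  # step 2: extract Z_t
```

The root search in `fit_rho` relies on the variance of the fitted series being above one at `lo` and below one at `hi`; with real data the bracket may need adjusting.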

# Linear Regression

We then use historical macroeconomic variables to predict this Z-Score, which is in turn used to model the probability of default. Given the small number of observations (one per period) and the need for a model that is simple to explain, linear models are often the preferred option. Since our target variable, the Z-Score, is numeric, we use a linear regression. To avoid overfitting, only a few of the economic variables (and their lags) are included in the model; this selection can be automated or done by iteratively trying combinations. It is also important to run sanity checks to make sure that the coefficients make sense economically.
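
A minimal sketch of this regression step, using ordinary least squares from NumPy on synthetic data. The two macroeconomic variables (`gdp_growth`, `unemp_change`), the coefficients used to generate the data, and the expected signs in the sanity check are all assumptions for illustration; lagged variables would simply be added as further columns of the design matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
n_periods = 40

# Hypothetical macroeconomic series, one value per period.
gdp_growth = rng.normal(2.0, 1.5, n_periods)
unemp_change = rng.normal(0.0, 0.5, n_periods)

# Synthetic Z-Score: rises with growth, falls when unemployment rises.
z_score = (-1.0 + 0.5 * gdp_growth - 0.8 * unemp_change
           + rng.normal(0.0, 0.2, n_periods))

# Design matrix with an intercept column, fitted by ordinary least squares.
X = np.column_stack([np.ones(n_periods), gdp_growth, unemp_change])
coef, *_ = np.linalg.lstsq(X, z_score, rcond=None)
intercept, beta_gdp, beta_unemp = coef

# Sanity check on the economic sign of each coefficient (assumed signs).
signs_ok = bool(beta_gdp > 0 and beta_unemp < 0)

# In-sample fit quality.
fitted = X @ coef
r_squared = 1.0 - np.sum((z_score - fitted) ** 2) / np.sum((z_score - z_score.mean()) ** 2)
```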
