# Definition

Generalized Linear Models (GLMs) are a generalization of ordinary linear regression. Formally, the model can be written as:

```math
{\displaystyle \operatorname{E}(\mathbf{y}|\mathbf{X}) = g^{-1}(\mathbf{X}\boldsymbol{\beta})}
```

Where $`\mathbf{y}`$ is the target variable and $`\mathbf{X}`$ represents the features. The model assumes that $`\mathbf{y}`$ follows a predefined distribution belonging to the [Exponential family](https://en.wikipedia.org/wiki/Exponential_family) (which includes the Normal, Poisson and Gamma distributions, among others). $`g`$ is the link function (frequent choices include identity, log and inverse), and $`\boldsymbol{\beta}`$ is the vector of coefficients of the linear predictor $`\mathbf{X}\boldsymbol{\beta}`$.

Some special cases of GLMs are well-known: 

 - Ordinary linear regression, when the distribution is normal and the link function is the identity
 - Logistic regression, when the distribution is binomial and the link function is the logit
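To make the role of the link function concrete, here is a minimal sketch that computes $`\operatorname{E}(y|x) = g^{-1}(x\boldsymbol{\beta})`$ for a single observation under the link functions mentioned above. The coefficients and feature values are hypothetical, and `glm_mean` / `INVERSE_LINKS` are illustrative names, not part of any library.

```python
import math

# Inverse link functions g^{-1}, mapping the linear predictor eta = x . beta
# to the expected response E(y | x).
INVERSE_LINKS = {
    "identity": lambda eta: eta,                        # ordinary linear regression
    "log": lambda eta: math.exp(eta),                   # common for Poisson and gamma
    "inverse": lambda eta: 1.0 / eta,                   # often paired with gamma
    "logit": lambda eta: 1.0 / (1.0 + math.exp(-eta)),  # logistic regression
}

def glm_mean(x, beta, link):
    """Return E(y | x) = g^{-1}(x . beta) for a single observation."""
    eta = sum(xi * bi for xi, bi in zip(x, beta))
    return INVERSE_LINKS[link](eta)

# Hypothetical observation: an intercept term plus one feature.
x = [1.0, 2.0]
beta = [0.5, -0.25]                   # eta = 0.5 + 2.0 * (-0.25) = 0.0
print(glm_mean(x, beta, "identity"))  # 0.0
print(glm_mean(x, beta, "log"))       # 1.0
print(glm_mean(x, beta, "logit"))     # 0.5
```

The same linear predictor yields different means depending on the link, which is what lets one model family cover both linear and logistic regression.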

Fitting a GLM consists of finding the $`\boldsymbol{\beta}`$ that maximizes the likelihood of the training dataset.
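As a minimal sketch of what maximizing the likelihood involves, the snippet below fits a Poisson GLM with log link by iteratively reweighted least squares (IRLS), a standard Newton-type algorithm for GLMs. It assumes NumPy and synthetic data; `fit_poisson_glm` is a hypothetical helper, and real tools add safeguards (step halving, regularization, convergence diagnostics) that are omitted here.

```python
import numpy as np

def fit_poisson_glm(X, y, n_iter=25, tol=1e-10):
    """Maximum-likelihood fit of a Poisson GLM with log link using
    iteratively reweighted least squares (equivalent to Newton's method
    for this canonical link)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = X @ beta                 # linear predictor
        mu = np.exp(eta)               # inverse of the log link
        # Working response and weights for the log link: W = mu.
        z = eta + (y - mu) / mu
        XtW = X.T * mu                 # X^T W without building diag(W)
        beta_new = np.linalg.solve(XtW @ X, XtW @ z)
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta

# Synthetic data with known coefficients (illustrative only).
rng = np.random.default_rng(0)
n = 20_000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
true_beta = np.array([-1.0, 0.3])
y = rng.poisson(np.exp(X @ true_beta))

beta_hat = fit_poisson_glm(X, y)
print(beta_hat)  # close to [-1.0, 0.3]
```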

# Claim Frequency

Claim frequency is defined as the number of claims per year:

```ClaimFrequency = ClaimNb / Exposure```

Where ```ClaimNb``` is the number of claims reported for a given policyholder, and ```Exposure``` is the duration of the policy, in years. So when we model the claim frequency, we want to predict the expected number of claims a policyholder will report in a year. In our dataset, exposures vary because customers did not all start their contracts at the same time, so normalizing by the exposure makes the responses comparable.
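The normalization is plain arithmetic; a minimal sketch on a few hypothetical policies (the values are made up, not taken from the dataset):

```python
# Hypothetical policies: (ClaimNb, Exposure in years).
policies = [
    (0, 1.0),   # full year, no claim
    (1, 0.5),   # half a year, one claim
    (2, 0.25),  # one quarter, two claims
]

# ClaimFrequency = ClaimNb / Exposure, i.e. claims per policy-year.
claim_frequency = [nb / exposure for nb, exposure in policies]
print(claim_frequency)  # [0.0, 2.0, 8.0]
```

Short exposures can produce extreme ratios (8 claims per year for the last policy), which is one reason exposure is used as a weight when these frequencies are analyzed.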

It is often modeled with a [Poisson](https://en.wikipedia.org/wiki/Poisson_distribution) distribution. We plotted the distribution of claim numbers, using exposure as weights, in the following graph: [Claim Frequency on claim_train](insight:kMwNWuA).
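Why exposure weights fit naturally with the Poisson assumption can be seen with a worked example. If $`N_i \sim \text{Poisson}(\lambda E_i)`$, the log-likelihood in $`\lambda`$ is, up to constants, $`\sum_i E_i (f_i \log\lambda - \lambda)`$ with $`f_i = N_i / E_i`$, i.e. an exposure-weighted Poisson likelihood of the frequencies. For an intercept-only model its maximizer is total claims over total exposure. The sketch below checks this on hypothetical data; `weighted_poisson_loglik` is an illustrative helper.

```python
import math

# Hypothetical (ClaimNb, Exposure) pairs.
policies = [(0, 1.0), (1, 0.5), (2, 0.25), (0, 0.75)]

def weighted_poisson_loglik(lam, policies):
    """Exposure-weighted Poisson log-likelihood of the frequencies
    f_i = ClaimNb_i / Exposure_i, dropping terms that do not depend on lam."""
    return sum(e * ((nb / e) * math.log(lam) - lam) for nb, e in policies)

# Closed-form maximizer: total claims divided by total exposure.
lam_hat = sum(nb for nb, _ in policies) / sum(e for _, e in policies)
print(lam_hat)  # 1.2

# Sanity check: nearby values give a lower log-likelihood.
for lam in (lam_hat * 0.9, lam_hat * 1.1):
    assert weighted_poisson_loglik(lam, policies) < weighted_poisson_loglik(lam_hat, policies)
```

For comparison, the unweighted mean of the four individual frequencies is (0 + 2 + 8 + 0) / 4 = 2.5, far above the weighted estimate, because the short-exposure policy dominates it.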

# Claim Severity

Claim severity is defined as the claim amount per claim:

```ClaimSeverity = ClaimAmount / ClaimNb```

Where ```ClaimAmount``` is the total amount of claims reported by a given policyholder. As with claim frequency, this normalization makes claim amounts comparable across policyholders. Claim severity is only defined when the policyholder has reported at least one claim.
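A minimal sketch of the severity computation on hypothetical policies, showing how policyholders without claims are excluded rather than divided by zero:

```python
# Hypothetical policies: (ClaimNb, ClaimAmount).
policies = [(0, 0.0), (1, 1200.0), (3, 900.0)]

# ClaimSeverity = ClaimAmount / ClaimNb is only defined when ClaimNb > 0,
# so policies without claims are filtered out.
claim_severity = [amount / nb for nb, amount in policies if nb > 0]
print(claim_severity)  # [1200.0, 300.0]
```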

It is often modeled with a [gamma](https://en.wikipedia.org/wiki/Gamma_distribution) distribution. We plotted the distribution of claim amounts, restricted to amounts above zero and using the number of claims as weights, in the following graph: [Claim Severity on claim_train](insight:kmPkEBi).
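Using the number of claims as weights has a convenient consequence for a gamma model: for an intercept-only fit, the weighted gamma deviance is minimized by total claim amount divided by total number of claims. The sketch below checks this numerically on hypothetical claimants, using the standard gamma unit deviance $`d(y,\mu) = 2\left(\frac{y-\mu}{\mu} - \log\frac{y}{\mu}\right)`$; the helper names are illustrative.

```python
import math

# Hypothetical claimants: (ClaimNb, ClaimAmount), all with ClaimNb > 0.
claimants = [(1, 1200.0), (3, 900.0), (2, 600.0)]

def gamma_deviance(y, mu):
    """Unit deviance of the gamma distribution."""
    return 2.0 * ((y - mu) / mu - math.log(y / mu))

def weighted_gamma_deviance(mu, claimants):
    # Severity y_i = ClaimAmount_i / ClaimNb_i, weighted by ClaimNb_i.
    return sum(nb * gamma_deviance(amount / nb, mu) for nb, amount in claimants)

# Closed-form minimizer: total amount divided by total number of claims.
mu_hat = sum(a for _, a in claimants) / sum(nb for nb, _ in claimants)
print(mu_hat)  # 450.0

# Sanity check: nearby values give a larger deviance.
for mu in (mu_hat * 0.9, mu_hat * 1.1):
    assert weighted_gamma_deviance(mu, claimants) > weighted_gamma_deviance(mu_hat, claimants)
```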

# Pure Premium

Pure premium is the expected annual claim cost for a given policyholder. It is the final output of the modeling and will later be used for pricing. It combines claim frequency and claim severity as follows:

```PurePremium = ClaimAmount / Exposure = ClaimFrequency * ClaimSeverity```
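A quick numeric check of the identity on one hypothetical policy: the ```ClaimNb``` terms cancel, so both expressions give the same value.

```python
# Hypothetical policy: 2 claims totalling 3000 over half a year of exposure.
claim_nb, claim_amount, exposure = 2, 3000.0, 0.5

claim_frequency = claim_nb / exposure      # 4.0 claims per year
claim_severity = claim_amount / claim_nb   # 1500.0 per claim
pure_premium = claim_amount / exposure     # 6000.0 per year

assert pure_premium == claim_frequency * claim_severity
print(pure_premium)  # 6000.0
```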

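As this equation shows, pure premium is the product of claim frequency and claim severity. Hence we can either model claim frequency and claim severity separately and multiply their predictions to obtain the pure premium, or model the pure premium directly. Multiplying the two predictions implicitly assumes that, given the features, the number of claims and the claim amounts are independent.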

Pure premium usually follows a distribution with a large mass at 0 (policyholders without claims), followed by a distribution of positive amounts resembling the gamma distribution. This mix of a Poisson number of claims with gamma-distributed amounts is the compound Poisson-gamma distribution, a special case of the [Tweedie](https://en.wikipedia.org/wiki/Tweedie_distribution) distribution family (with power parameter between 1 and 2). We plotted the distribution of pure premium, using exposure as weights, in the following graph: [Pure Premium on claim_train](insight:rMkR48N).
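The shape described above can be reproduced by simulation. The sketch below draws a compound Poisson-gamma variable (the Tweedie case with power parameter between 1 and 2) with NumPy; the frequency and severity parameters are hypothetical, chosen only to illustrate the point mass at zero and the positive tail.

```python
import numpy as np

rng = np.random.default_rng(42)
n_policies = 100_000
lam = 0.1                     # hypothetical claim frequency (claims per year)
shape, scale = 2.0, 500.0     # hypothetical gamma severity: mean 1000 per claim

# Number of claims per policy over one year of exposure.
claim_nb = rng.poisson(lam, size=n_policies)

# The sum of n i.i.d. Gamma(shape, scale) amounts is Gamma(n * shape, scale).
# Draw with a dummy shape for policies without claims, then zero them out.
total = rng.gamma(np.maximum(claim_nb, 1) * shape, scale)
pure_premium = np.where(claim_nb > 0, total, 0.0)

print((pure_premium == 0).mean())  # about exp(-0.1), roughly 0.905
print(pure_premium.mean())         # about lam * shape * scale = 100
```

Roughly 90% of the simulated policies have a pure premium of exactly zero, while the mean stays close to frequency times severity.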

# Going Beyond GLMs

GLMs allow actuaries to model phenomena that could not be modeled with simple linear regression. However, the linear predictor of the GLM imposes strong constraints on the dependencies between the features and the response variable. To circumvent that limitation, analysts must engineer their features carefully and avoid collinearity between the variables.
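One common way to screen for collinearity before fitting a GLM (a generic technique, not one prescribed here) is the variance inflation factor, $`\text{VIF}_j = 1 / (1 - R_j^2)`$, where $`R_j^2`$ comes from regressing feature $`j`$ on all the others. A minimal NumPy sketch on hypothetical features; `variance_inflation_factors` is an illustrative helper.

```python
import numpy as np

def variance_inflation_factors(X):
    """VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing column j
    on all other columns plus an intercept. Large values flag collinearity."""
    n, p = X.shape
    vifs = []
    for j in range(p):
        target = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ coef
        r2 = 1.0 - resid.var() / target.var()
        vifs.append(1.0 / (1.0 - r2))
    return np.array(vifs)

# Hypothetical features: the third column is almost a copy of the first.
rng = np.random.default_rng(0)
a = rng.normal(size=1_000)
b = rng.normal(size=1_000)
X = np.column_stack([a, b, a + 0.05 * rng.normal(size=1_000)])
vifs = variance_inflation_factors(X)
print(vifs.round(1))  # first and third values are large, second is near 1
```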

To capture non-linearities and more complex dependencies between the covariates and the response variable, other models are being explored, including Decision Trees, Gradient Boosting Machines (GBM) and Neural Networks. To take the response distribution into account during fitting, these models can use the log-likelihood or the deviance of most Exponential family distributions as their objective function.
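Gradient boosting libraries typically consume an objective through its gradient and Hessian with respect to the raw score. As a sketch, assuming a log link ($`\mu = e^f`$) and half the unit deviance as the loss, the derivatives for the three distributions discussed above are shown below. The function names are hypothetical; libraries such as XGBoost (`count:poisson`, `reg:gamma`, `reg:tweedie`) and LightGBM (`poisson`, `gamma`, `tweedie`) ship built-in versions whose exact scaling may differ.

```python
import numpy as np

# Gradient and Hessian of half the unit deviance with respect to the raw
# score f, assuming a log link (mu = exp(f)).

def poisson_grad_hess(y, f):
    mu = np.exp(f)
    return mu - y, mu

def gamma_grad_hess(y, f):
    mu = np.exp(f)
    return 1.0 - y / mu, y / mu

def tweedie_grad_hess(y, f, p=1.5):
    # Intended for 1 < p < 2 (compound Poisson-gamma).
    mu = np.exp(f)
    grad = -y * mu ** (1 - p) + mu ** (2 - p)
    hess = -(1 - p) * y * mu ** (1 - p) + (2 - p) * mu ** (2 - p)
    return grad, hess

# At the optimum (prediction equal to the observation) every gradient vanishes.
y = np.array([2.0])
f = np.log(y)
for fn in (poisson_grad_hess, gamma_grad_hess, tweedie_grad_hess):
    grad, hess = fn(y, f)
    print(fn.__name__, grad, hess)
```

Deriving these quantities from the deviance, rather than using squared error, is what lets tree ensembles respect the Poisson, gamma or Tweedie assumptions used by the GLMs.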