Pure Premium Modeling is the direct prediction of claim amount, unconditional on the existence of a claim.

![Pure Premium Modeling.png](A8nqxEVO05me)

The target is ClaimAmount, as it was for [Claim Severity](article:17), but the exposure is Exposure instead of ClaimNb. The distribution of the target is similar to the one in the Claim Severity model, except that it also has a significant peak at 0, like [Claim Frequency](article:16). Therefore, the distribution chosen to model the response is the Tweedie distribution, and the evaluation metric is the Tweedie deviance. The Tweedie distribution has a variance power parameter that affects its shape: setting it to 1 gives a Poisson distribution and setting it to 2 gives a gamma distribution. Values strictly between 1 and 2 give a compound Poisson-gamma distribution, which has a point mass at 0 and a continuous positive part, so we try values in that range.
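To make the evaluation metric concrete, here is a minimal sketch of the Tweedie deviance for a variance power strictly between 1 and 2. The function names and the optional `weights` argument are illustrative assumptions, not part of the project; the formula is the standard Tweedie unit deviance.

```python
def tweedie_unit_deviance(y, mu, power=1.5):
    """Tweedie unit deviance for one observation, valid for 1 < power < 2."""
    if not 1 < power < 2:
        raise ValueError("this sketch only covers 1 < power < 2")
    if y < 0 or mu <= 0:
        raise ValueError("need y >= 0 and mu > 0")
    p = power
    return 2 * (
        y ** (2 - p) / ((1 - p) * (2 - p))
        - y * mu ** (1 - p) / (1 - p)
        + mu ** (2 - p) / (2 - p)
    )


def mean_tweedie_deviance(y_true, y_pred, power=1.5, weights=None):
    """Weighted mean of the unit deviances (weights default to 1)."""
    if weights is None:
        weights = [1.0] * len(y_true)  # assumption: unweighted by default
    total = sum(
        w * tweedie_unit_deviance(y, mu, power)
        for y, mu, w in zip(y_true, y_pred, weights)
    )
    return total / sum(weights)
```

The deviance is 0 when a prediction matches the observed value exactly and grows as the two diverge. Note that a policy with no claim (`y = 0`) still contributes a positive deviance, since the prediction must be strictly positive.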

Another option for the evaluation metric would be the [Gini coefficient](https://en.wikipedia.org/wiki/Gini_coefficient). In this context it measures how well the model ranks policies from low to high risk. For a model that ranks at least as well as chance, its normalized value ranges between 0 and 1, where 0 corresponds to a random ranking and 1 to a perfect one.
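As an illustration, the sketch below computes one common variant, the normalized Gini coefficient: policies are sorted by predicted amount, the cumulative share of actual claim amounts is accumulated, and the result is divided by the value obtained with a perfect ranking. The function names are hypothetical, and this version ignores exposure weighting, which is a simplifying assumption.

```python
def gini(actual, predicted):
    """Unnormalized Gini: how well `predicted` ranks the `actual` amounts."""
    n = len(actual)
    # Sort indices from highest to lowest prediction; ties keep input order.
    order = sorted(range(n), key=lambda i: (-predicted[i], i))
    total = sum(actual)
    cumulative = 0.0
    cumulative_share_sum = 0.0
    for i in order:
        cumulative += actual[i]
        cumulative_share_sum += cumulative / total
    # Subtract the value expected from a random ordering.
    return (cumulative_share_sum - (n + 1) / 2) / n


def normalized_gini(actual, predicted):
    """Gini of the model divided by the Gini of a perfect ranking."""
    return gini(actual, predicted) / gini(actual, actual)
```

With this definition a perfect ranking gives 1, and a ranking that is worse than random gives a negative value.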

The GLM is set up in the following way:
- Elastic Net Penalty: 0
- Distribution: Tweedie
- Link function: Log
- Offset mode: Offsets/Exposures
- Training dataset: [claim_train](dataset:claim_train)
- Offset columns: None
- Exposure columns: Exposure, as we are predicting the claim amounts directly, normalized by exposure.
- Variance Power: a value between 1 and 2; 1.5 is a good starting point.
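The sketch below shows how a log link and an exposure column combine at prediction time. It assumes the usual convention for log-link GLMs, where exposure enters as an offset of log(Exposure). The feature name, coefficient values, and function name are hypothetical and only serve the illustration.

```python
import math


def predict_claim_amount(features, coefficients, intercept, exposure):
    """Expected claim amount under a log link with log(exposure) as offset."""
    linear_predictor = intercept + sum(
        coefficients[name] * value for name, value in features.items()
    )
    # exp(linear_predictor + log(exposure)) == exposure * exp(linear_predictor)
    return exposure * math.exp(linear_predictor)


# Hypothetical fitted values, for illustration only.
intercept = math.log(200.0)                      # base annual claim amount
coefficients = {"young_driver": math.log(1.5)}   # multiplicative effect of 1.5

full_year = predict_claim_amount({"young_driver": 1}, coefficients, intercept, 1.0)
half_year = predict_claim_amount({"young_driver": 1}, coefficients, intercept, 0.5)
```

Under this assumption the prediction scales linearly with exposure: a policy covered for half a year is expected to cost half as much as the same policy covered for a full year.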

We then score [claim_test](dataset:claim_test) to produce a dataset that we will compare with the scoring from the compound model, which combines the Claim Frequency and Claim Severity models.
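For reference, a compound prediction is usually obtained by multiplying the expected number of claims by the expected cost per claim. The sketch below assumes that convention; the function and argument names are hypothetical and the numbers are made up for illustration.

```python
def compound_pure_premium(expected_claim_count, expected_cost_per_claim):
    """Expected claim amount as frequency times severity (assumed convention)."""
    return expected_claim_count * expected_cost_per_claim


# Illustrative values only: 0.08 expected claims at 1500 per claim.
compound_prediction = compound_pure_premium(0.08, 1500.0)
```

This product can then be set side by side with the direct Tweedie prediction for the same policy.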