In this zone, the dataset is transformed so that the response can be plotted against each variable and patterns can be visualized.

![Univariate Analysis.png](oikm8VtQjkgo)

To avoid building one grouping recipe per variable we want to analyze, we first fold all the variables using a prepare step. This yields a much longer dataset, with one row per original row and folded variable, on which a single grouping recipe can compute all the aggregates.
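As a rough illustration of the fold, here is a minimal pandas sketch. It is not the prepare step itself: the variable names `Area` and `VehGas` and the sample values are assumptions chosen for the example, and `melt` only stands in for the fold processor.

```python
import pandas as pd

# Hypothetical policy-level data; column names and values are illustrative only.
policies = pd.DataFrame({
    "ClaimNb": [0, 1, 2],
    "ClaimAmount": [0.0, 500.0, 1200.0],
    "Exposure": [1.0, 0.5, 1.0],
    "Area": ["A", "B", "A"],
    "VehGas": ["Diesel", "Regular", "Diesel"],
})

# Variables to analyze (assumed names).
variables = ["Area", "VehGas"]

# Fold: each original row is repeated once per variable. The variable name
# goes to "grouping" and its value to "group_value".
folded = policies.melt(
    id_vars=["ClaimNb", "ClaimAmount", "Exposure"],
    value_vars=variables,
    var_name="grouping",
    value_name="group_value",
)
```

The folded dataset has as many rows as the original dataset multiplied by the number of folded variables, which is why a single pair of keys is enough downstream.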

The window recipe that follows builds the initial aggregates needed by the grouping recipe. These aggregates are partitioned by **grouping** and **group_value**, the two columns produced by the preceding fold. We compute the sums of claim number, claim amount and exposure.
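The window aggregates can be pictured with the following pandas sketch. It assumes a folded dataset with the columns `grouping`, `group_value`, `ClaimNb`, `ClaimAmount` and `Exposure`; the `_sum` suffix is a naming choice made for this example, not necessarily the one used in the flow.

```python
import pandas as pd

# Hypothetical folded data (illustrative values only).
folded = pd.DataFrame({
    "grouping": ["Area", "Area", "Area", "VehGas", "VehGas", "VehGas"],
    "group_value": ["A", "B", "A", "Diesel", "Regular", "Diesel"],
    "ClaimNb": [0, 1, 2, 0, 1, 2],
    "ClaimAmount": [0.0, 500.0, 1200.0, 0.0, 500.0, 1200.0],
    "Exposure": [1.0, 0.5, 1.0, 1.0, 0.5, 1.0],
})

keys = ["grouping", "group_value"]

# Window-style aggregation: the partition sums are broadcast back to every
# row, so the number of rows is unchanged.
sums = folded.groupby(keys)[["ClaimNb", "ClaimAmount", "Exposure"]].transform("sum")
windowed = folded.join(sums.add_suffix("_sum"))
```

Every row of a given partition now carries the same totals, which the row-level deviations computed later can refer to.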

In the grouping recipe, the keys are again **grouping** and **group_value**. As aggregations, we take the minimum of the claim number sum, claim amount sum and exposure sum. Because these sums were already computed per group in the window recipe, they are constant within each group, so the minimum simply returns their actual value (taking the maximum would give the same result). To quantify the uncertainty of our metric estimates, we use custom aggregations to compute the standard deviations of claim frequency, claim severity and pure premium. These standard deviations are weighted by exposure, number of claims and exposure, respectively.

```math
{\displaystyle sd(ClaimFrequency) = \sqrt{\frac{\sum{Exposure * (\frac{ClaimNb}{Exposure} - \frac{\sum{ClaimNb}}{\sum{Exposure}})^2}}{\sum{Exposure}}}}
```

```math
{\displaystyle sd(ClaimSeverity) = \sqrt{\frac{\sum{ClaimNb * (\frac{ClaimAmount}{ClaimNb} - \frac{\sum{ClaimAmount}}{\sum{ClaimNb}})^2}}{\sum{ClaimNb}}}}
```

```math
{\displaystyle sd(PurePremium) = \sqrt{\frac{\sum{Exposure * (\frac{ClaimAmount}{Exposure} - \frac{\sum{ClaimAmount}}{\sum{Exposure}})^2}}{\sum{Exposure}}}}
```
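The three formulas share the same shape: a weighted root mean squared deviation of a row-level ratio around the group-level ratio. The sketch below reproduces them for a single group in pandas and NumPy. The helper names `weighted_sd` and `group_deviations` are hypothetical, and skipping rows without claims for severity is an assumption of this sketch (their weight is zero in the formula, but the ratio itself would be undefined).

```python
import numpy as np
import pandas as pd


def weighted_sd(values, weights, center):
    """Square root of the weighted mean squared deviation from `center`."""
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights, dtype=float)
    total = weights.sum()
    if total == 0:
        return float("nan")  # no weight at all: the deviation is undefined
    return float(np.sqrt(np.sum(weights * (values - center) ** 2) / total))


def group_deviations(group):
    """Standard deviations for one (grouping, group_value) group."""
    exposure = group["Exposure"]
    claim_nb = group["ClaimNb"]
    amount = group["ClaimAmount"]

    # Group-level ratios used as centers, as in the formulas above.
    frequency = claim_nb.sum() / exposure.sum()
    pure_premium = amount.sum() / exposure.sum()

    # Assumption: rows without claims are left out of the severity term.
    with_claims = claim_nb > 0
    severity = amount.sum() / claim_nb.sum() if with_claims.any() else float("nan")

    return {
        "sd_claim_frequency": weighted_sd(claim_nb / exposure, exposure, frequency),
        "sd_claim_severity": weighted_sd(
            amount[with_claims] / claim_nb[with_claims],
            claim_nb[with_claims],
            severity,
        ),
        "sd_pure_premium": weighted_sd(amount / exposure, exposure, pure_premium),
    }


# Hypothetical group with two policies (illustrative values only).
group = pd.DataFrame({
    "ClaimNb": [0, 2],
    "ClaimAmount": [0.0, 1200.0],
    "Exposure": [1.0, 1.0],
})
deviations = group_deviations(group)
```

This sketch assumes strictly positive exposure on every row. In the flow itself the same quantities are produced by custom aggregations in the grouping recipe.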

Finally, a prepare recipe unfolds the folded **grouping** and **group_value** columns: a Python prepare step creates a dummy column for each **grouping** and assigns it the value from **group_value**. Claim Frequency, Claim Severity and Pure Premium are then computed as ratios of the aggregated sums (claim number over exposure, claim amount over claim number, and claim amount over exposure, respectively). Lastly, upper and lower bounds of these estimates are derived from the standard deviations computed previously. For example, for Claim Frequency:

```math
{\displaystyle ClaimFrequencyUpperBound = ClaimFrequency + 1.96 \frac{sd(ClaimFrequency)}{\sqrt{\sum{Exposure}}}}
```
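The final prepare recipe can be approximated with the pandas sketch below. The input column names (`ClaimNb_sum`, `sd_claim_frequency`, and so on) are assumptions of this example, and the lower bound is assumed to mirror the upper bound with a minus sign, which the formula above suggests but does not state.

```python
import numpy as np
import pandas as pd

# Hypothetical output of the grouping recipe (illustrative values only).
aggregated = pd.DataFrame({
    "grouping": ["Area", "Area", "VehGas"],
    "group_value": ["A", "B", "Diesel"],
    "ClaimNb_sum": [2, 1, 2],
    "ClaimAmount_sum": [1200.0, 500.0, 1200.0],
    "Exposure_sum": [2.0, 0.5, 2.0],
    "sd_claim_frequency": [1.0, 0.0, 1.0],
})

# Unfold: one column per grouping, filled with group_value on the rows that
# belong to that grouping and left empty elsewhere.
for name in aggregated["grouping"].unique():
    aggregated[name] = aggregated["group_value"].where(aggregated["grouping"] == name)

# Metrics as ratios of the aggregated sums.
aggregated["ClaimFrequency"] = aggregated["ClaimNb_sum"] / aggregated["Exposure_sum"]
aggregated["ClaimSeverity"] = aggregated["ClaimAmount_sum"] / aggregated["ClaimNb_sum"]
aggregated["PurePremium"] = aggregated["ClaimAmount_sum"] / aggregated["Exposure_sum"]

# Bounds for Claim Frequency: estimate +/- 1.96 * sd / sqrt(sum of exposure).
half_width = 1.96 * aggregated["sd_claim_frequency"] / np.sqrt(aggregated["Exposure_sum"])
aggregated["ClaimFrequencyUpperBound"] = aggregated["ClaimFrequency"] + half_width
aggregated["ClaimFrequencyLowerBound"] = aggregated["ClaimFrequency"] - half_width
```

With this symmetric construction the lower bound can fall below zero for small groups, so a chart may need to clip it at zero depending on how the bounds are displayed.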

The graphs built on the final dataset of this zone are displayed in the [dashboard](dashboard:DbWZIwg) and are discussed further in [Claim Frequency](article:24), [Claim Severity](article:23) and [Pure Premium](article:25).