# Concept Summary: Statistical Testing

Before we move on to the hands-on lessons in this section, let’s summarize what we just learned in each of the concept videos.

* Hypothesis Testing

* Test Categories

* Grouping Variable

* Adjustment Method

## Hypothesis Testing

Recall that a hypothesis test allows you to evaluate a pair of competing hypotheses (the null and alternative hypotheses) about the properties of a population, such as its distribution or parameters of the distribution.

Hypothesis test cards in Dataiku DSS display the purpose of each test, to help you choose the correct test for your use case.

To create a card, specify one or more test variables, and in some cases, specify additional parameters.

Dataiku DSS uses an alpha value (or significance level) of 0.05 to determine how rare an event must be for us to reject the null hypothesis. DSS also computes an appropriate test statistic value and calculates the *p* value. By comparing the *p* value and significance level, DSS can arrive at one of these conclusions:

* Reject the null hypothesis, if \(p \leq \alpha\)

* Determine that the test is inconclusive (that is, fail to reject the null hypothesis), if \(p > \alpha\)
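The decision rule above can be sketched in a few lines of Python. This is an illustrative helper, not DSS code: the function name `conclude` and the returned strings are assumptions made for this example.

```python
def conclude(p_value: float, alpha: float = 0.05) -> str:
    """Return the conclusion of a hypothesis test given its p value."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must be between 0 and 1")
    # Reject the null hypothesis when p <= alpha; otherwise the test is inconclusive.
    if p_value <= alpha:
        return "reject the null hypothesis"
    return "inconclusive"


print(conclude(0.03))  # reject the null hypothesis
print(conclude(0.20))  # inconclusive
```

Note that the boundary case \(p = \alpha\) leads to rejection, matching the \(p \leq \alpha\) condition.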

The header of any hypothesis test card includes a question icon that provides detailed information about the test, such as: what kind of test it is, the underlying assumptions for the test to be meaningful, and so on.

## Test Categories

Dataiku DSS groups hypothesis tests into categories based on different test attributes.

### 1. One-sample, Two-sample, and N-sample Tests

* **One-sample** tests consider one population from which a random sample is used to make inferences.

* **Two-sample** tests consider two populations from which independent random samples are used to make inferences.

* **N-sample** tests consider more than two populations with independent random samples that are used to make inferences.

### 2. Location or Distribution Tests

* **Location** tests evaluate hypotheses about location parameters. Examples include the mean of a population (tested by the one-sample **Student’s t-test**) and the median of a population (tested by the **Sign test**).

* **Distribution** tests evaluate hypotheses about population distributions. For example, one-sample distribution tests compare the distribution of a population to a hypothesized one, and two-sample distribution tests compare the distributions of two populations.
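To make the idea of a location test concrete, here is a minimal sketch of a two-sided exact sign test for a hypothesized median, using only the Python standard library. It illustrates the general technique and is not necessarily how DSS computes its result; the function name `sign_test` and the sample values are hypothetical.

```python
from math import comb


def sign_test(sample, hypothesized_median):
    """Two-sided exact sign test; returns (n_used, p_value)."""
    # Values equal to the hypothesized median carry no sign and are dropped.
    above = sum(1 for x in sample if x > hypothesized_median)
    below = sum(1 for x in sample if x < hypothesized_median)
    n = above + below
    if n == 0:
        raise ValueError("all values equal the hypothesized median")
    k = min(above, below)
    # Under the null hypothesis, the number of values above the median
    # follows a Binomial(n, 0.5) distribution.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return n, min(1.0, 2 * tail)


# Hypothetical sample: 9 values lie above 10.0 and 1 lies below.
sample = [12.1, 9.8, 11.4, 13.0, 10.9, 12.7, 11.8, 12.3, 10.2, 12.9]
n_used, p_value = sign_test(sample, hypothesized_median=10.0)
print(n_used, round(p_value, 4))  # 10 0.0215
```

With a significance level of 0.05, this *p* value would lead to rejecting the null hypothesis that the population median is 10.0.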

### 3. Categorical Tests

DSS provides the **Chi-square Independence** test, to evaluate whether two categorical variables are independent.
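As a rough illustration of what a chi-square independence test measures, the sketch below computes the Pearson chi-square statistic and its degrees of freedom for a contingency table of counts. It assumes no continuity correction and makes no claim about the exact computation DSS performs; the function name and the counts are hypothetical.

```python
def chi_square_independence(table):
    """Return (statistic, degrees_of_freedom) for a contingency table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the two variables were independent.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, dof


# Hypothetical counts: rows are the values of one categorical variable,
# columns are the values of the other.
observed = [[30, 10], [20, 40]]
statistic, dof = chi_square_independence(observed)
print(round(statistic, 3), dof)  # 16.667 1
```

The *p* value would then be obtained from the chi-square distribution with `dof` degrees of freedom; a larger statistic means the observed counts are further from what independence would predict.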

### Additional Test Details

Finally, the question icon in the header of a hypothesis test card provides additional details about the test. In particular, it indicates whether the test is **parametric** or **nonparametric**.

## Grouping Variable

Grouping variables split a dataset into disjoint groups, with one group for each unique value of the grouping variable. In statistical testing, grouping variables can be used to define populations.

If you’re familiar with the **Group** Recipe, you may recall that it uses a group key, based on the unique values of a particular column (or a combination of columns), to perform aggregations. This group key has similar functionality to the grouping variable.

In Dataiku DSS, when performing two-sample tests, you specify a numerical “Test Variable” and a “Grouping Variable” with two modalities (or groups). For N-sample tests, you specify a numerical “Test Variable” and a “Grouping Variable” with multiple modalities.

Furthermore, you can define test populations by manually specifying values from the Grouping Variable.

You can also define test populations by building groups from the most frequent values of the grouping variable, and specifying a value for the “Maximum number of groups”. This value limits the number of modalities, if your grouping variable is categorical, or limits the number of bins if your grouping variable is numerical.
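The following sketch shows one way to build test populations from the most frequent values of a categorical grouping variable. It is a simplified illustration, not DSS behavior: the function `build_groups`, the column names, and the choice to drop rows outside the retained groups are all assumptions, and binning of numerical grouping variables is not covered.

```python
from collections import Counter, defaultdict


def build_groups(rows, test_variable, grouping_variable, max_groups):
    """Split the test variable into groups keyed by the most frequent grouping values."""
    counts = Counter(row[grouping_variable] for row in rows)
    # Keep only the `max_groups` most frequent values of the grouping variable.
    kept = {value for value, _ in counts.most_common(max_groups)}
    groups = defaultdict(list)
    for row in rows:
        if row[grouping_variable] in kept:
            groups[row[grouping_variable]].append(row[test_variable])
    return dict(groups)


# Hypothetical dataset: "spend" is the test variable, "plan" the grouping variable.
rows = [
    {"plan": "basic", "spend": 10.0},
    {"plan": "basic", "spend": 12.0},
    {"plan": "basic", "spend": 11.0},
    {"plan": "pro", "spend": 20.0},
    {"plan": "pro", "spend": 22.0},
    {"plan": "trial", "spend": 1.0},
]
groups = build_groups(rows, "spend", "plan", max_groups=2)
print(groups)
```

With `max_groups=2`, the two resulting groups (`basic` and `pro`) are the two populations that a two-sample test would compare.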

## Adjustment Method

Dataiku DSS provides an **Adjustment Method** parameter for hypothesis test cards that perform several comparisons simultaneously. Examples include the **Pairwise Student’s t-test** and the **Pairwise Median Mood Test**.

When testing a hypothesis, a significance level such as `0.05` or `0.01` is typically used. This significance level corresponds to the probability of making a type I error, that is, of incorrectly rejecting a true null hypothesis. However, when several statistical tests are performed simultaneously, applying that same significance level to each individual comparison increases the probability of making at least one type I error across the whole set of comparisons. Using the Adjustment Method parameter in Dataiku DSS helps control this inflation.
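To see how quickly the error rate grows, the sketch below computes the probability of at least one type I error across several tests, under the simplifying assumption that the tests are independent and every null hypothesis is true. The function name is hypothetical, and pairwise comparisons are generally not independent in practice, so treat the numbers as indicative only.

```python
def family_wise_error_rate(alpha: float, n_tests: int) -> float:
    """Probability of at least one type I error across independent tests."""
    # Each test avoids a type I error with probability (1 - alpha).
    return 1.0 - (1.0 - alpha) ** n_tests


for n_tests in (1, 3, 10):
    print(n_tests, round(family_wise_error_rate(0.05, n_tests), 4))
# 1 0.05
# 3 0.1426
# 10 0.4013
```

For instance, comparing three groups pairwise involves three tests, which under these assumptions raises the chance of at least one false rejection from 5% to roughly 14%.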

If you choose to use the Adjustment Method parameter, you can specify either the **Bonferroni** or **Holm-Bonferroni** adjustment method. For each hypothesis that is tested, these methods adjust the observed *p* value, which is then compared to the pre-specified significance level.
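The two adjustment methods can be sketched as follows, using their standard textbook definitions. This is an illustration of the techniques and not DSS's implementation; the function names and the example *p* values are hypothetical.

```python
def bonferroni(p_values):
    """Multiply each p value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def holm_bonferroni(p_values):
    """Step-down Holm adjustment; returns adjusted p values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # The smallest p value is scaled by m, the next by m - 1, and so on.
        scaled = min(1.0, (m - rank) * p_values[i])
        # Enforce monotonicity so adjusted values never decrease with rank.
        running_max = max(running_max, scaled)
        adjusted[i] = running_max
    return adjusted


# Hypothetical observed p values from three pairwise comparisons.
p_values = [0.01, 0.04, 0.03]
alpha = 0.05
for name, method in (("Bonferroni", bonferroni), ("Holm-Bonferroni", holm_bonferroni)):
    adjusted = method(p_values)
    print(name, [round(p, 4) for p in adjusted], [p <= alpha for p in adjusted])
```

In this example both methods reject only the first null hypothesis at a significance level of 0.05. Holm-Bonferroni never produces a larger adjusted *p* value than Bonferroni, so it is at least as powerful while still controlling the family-wise error rate.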
