# Hands-On Tutorial: Top N Recipe

Recall that the Explore tab of a dataset only includes a sample of the actual dataset. Accordingly, sorting a column in that view only sorts the rows in the current sample. The true minimum or maximum value for a column might not be included.
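
The effect of sorting only a sample can be sketched in a few lines of plain Python. This is an illustration rather than a description of how Dataiku samples data: the amounts are made up, and the "sample" is simply assumed to be the first five rows.

```python
# Hypothetical purchase amounts; the largest value sits late in the dataset.
amounts = [12.5, 40.0, 7.25, 19.99, 3.5, 88.0, 250.0, 15.0]

# Assumption for illustration: the sample is just the first five rows.
sample = amounts[:5]

# Sorting or taking the max of the sample only sees the sampled rows.
sample_max = max(sample)
true_max = max(amounts)

print("max in sample:", sample_max)
print("max in full dataset:", true_max)
```

The two values differ because the true maximum falls outside the sampled rows, which is the situation the Top N recipe avoids by working on the whole dataset.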

The **Top N** recipe allows you to retrieve records from a dataset based on the top and bottom values of a given column, or groups within a column. This can be especially helpful when publishing results to a dashboard.

## Let’s Get Started!

Using the same credit card transactions project from other Advanced Designer tutorials, you’ll learn how to:

* retrieve records of a dataset based on the top and/or bottom values of a column using the visual **Top N** recipe

* create new columns with the **Computed Columns** step of the Top N recipe

This lesson assumes that you have basic knowledge of working with Dataiku DSS datasets and recipes.

Note

If not already on the Advanced Designer learning path, completing the Core Designer Certificate is recommended.

To complete the Advanced Designer learning path, you’ll need access to an instance of Dataiku DSS (version 8.0 or above) with the following plugins installed:

* Census USA (minimum version 0.3)

* Reverse geocoding

These plugins are available through the Dataiku Plugin store, and you can find the instructions for installing plugins in the reference documentation. To check whether the plugin is already installed on your instance, go to the **Installed** tab in the Plugin Store to see a list of all installed plugins.

Note

If your goal is to complete **only** the tutorials in Visual Recipes 102, the Census USA plugin is not required.

Tip

Users of Dataiku Online should note that plugin installation follows a different path compared to on-premises or local instances.

* Navigate to the **Plugins** tab of your launchpad.

* Click **Add a Plugin**.

* Search for the plugin by name, in this case `US Census`. (“Reverse geocoding” is already available by default, so it does not need to be installed.)

* Because these tutorials use only a Design node, click **Install on Design**.

* Click **Close**.

After installation, it may take a few minutes before the plugin’s components appear, depending on the number of existing plugins and code environments on the instance.

The following lessons explain the concepts you’ll be working with in this hands-on lesson:

* Concept: Top N Recipe

* Concept: Common Steps in Recipes

### Workflow Overview

By the end of this lesson, the final Flow will include a Top N recipe.

## Create Your Project

* Click **+New Project > DSS Tutorials > Advanced Designer > Visual Recipes & Plugins (Tutorial)**.

Note

If you’ve already completed the Advanced Formula & Regex hands-on tutorials, you can use the same project.

Note

You can also download the starter project from this website and import it as a zip file.

Aside from the input datasets, all of the others are empty managed filesystem datasets.

You are welcome to leave the storage connection of these datasets in place, but you can also use another storage system depending on the infrastructure available to you.

To use another connection, such as a SQL database, follow these steps:

* Select the empty datasets from the Flow. (On a Mac, hold Shift to select multiple datasets.)

* Click **Change connection** in the “Other actions” section of the Actions sidebar.

* Use the dropdown menu to select the new connection.

* Click **Save**.

Note

For a dataset that is already built, changing to a new connection clears the dataset, so it will need to be rebuilt.

Note

Another way to select datasets is from the **Datasets** page (G+D). There are also programmatic ways of doing operations like this that you’ll learn about in the Developer learning path.

The screenshots below demonstrate using a PostgreSQL database.

* Whether starting from an existing or fresh project, ensure that the dataset *transactions\_known\_prepared* is built and that its schema includes the columns created in the Window recipe.

* From the Flow, select the end dataset required for this tutorial: *transactions\_known\_prepared*

* Choose **Build** from the Actions sidebar.

* Choose **Recursive > Smart reconstruction**.

* Click **Build** to start the job, or click **Preview** to view the suggested job.

* If previewing, in the **Jobs** tab, you can see all the activities that Dataiku will perform.

* Click **Run**, and observe how Dataiku progresses through the list of activities.

If the *transactions\_known\_prepared* dataset does not include columns like *card\_purchase\_amount\_min*, then you need to propagate the schema changes downstream. If you completed the Advanced Formula & Regex tutorial, this should already be done.

* Enter the *compute\_transactions\_known\_prepared* recipe.

* Click **Run** from inside the recipe editor.

* Accept the schema change update, dropping and recreating the output.

* Confirm the output dataset includes the Window-generated columns.

Note

See the product documentation on schema propagation to learn more.

## Retrieve Top and Bottom Values

Let’s find the largest and smallest purchases for every *card\_id*.

* From the **Actions** menu of the *transactions\_known\_prepared* dataset, choose **Top N**.

* Name the output dataset `top_purchase_amt_by_card`, and click **Create Recipe**.

The first step is to choose how many top and/or bottom rows to return, and which column(s) determine the ordering.

* In the **Top N** step, retrieve the top `5` and bottom `5` rows.

* Select *purchase\_amount* as the column to sort by.

* Click the icon to the right of the selected column to sort in descending order.

By default, Dataiku retrieves and sorts the top and bottom values from the whole dataset.

* Change this behavior in the “from” section by selecting **each group of rows identified by…** and specifying *card\_id* as the column to use as key.

* In addition, for each row, choose to compute the **count of rows in its group** and the **rank of row within its group**.
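
The configuration above can be approximated in plain Python to make the logic concrete. This is a minimal sketch under stated assumptions, not Dataiku's implementation: the function name `top_bottom_n`, the output keys `rank` and `group_count`, and the sample rows are all hypothetical. The rank here is simply the position after sorting in descending order (ties get no special treatment), and a row that falls in both the top and the bottom of a small group is kept only once, which is a choice made for this sketch.

```python
from collections import defaultdict


def top_bottom_n(rows, key, sort_by, n):
    """Keep the top n and bottom n rows of each group, sorted descending."""
    # Group rows by the key column (card_id in the tutorial).
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)

    result = []
    for group_rows in groups.values():
        # Sort each group from the largest to the smallest value.
        ordered = sorted(group_rows, key=lambda r: r[sort_by], reverse=True)
        count = len(ordered)
        # Positions of the top n and bottom n rows, without duplicates.
        top = set(range(min(n, count)))
        bottom = set(range(max(count - n, 0), count))
        for i in sorted(top | bottom):
            # Attach the row's rank in its group and the size of the group.
            result.append({**ordered[i], "rank": i + 1, "group_count": count})
    return result


# Hypothetical transactions for two cards.
rows = [
    {"card_id": "A", "purchase_amount": 10.0},
    {"card_id": "A", "purchase_amount": 50.0},
    {"card_id": "A", "purchase_amount": 30.0},
    {"card_id": "B", "purchase_amount": 5.0},
]

out = top_bottom_n(rows, key="card_id", sort_by="purchase_amount", n=1)
for row in out:
    print(row)
```

With `n=1`, card A contributes its largest and smallest purchases, while card B, which has a single transaction, contributes one row.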

## Retrieve Columns

The Top N recipe provides the option of returning all or a selection of columns in the output dataset.

* On the **Retrieve columns** step, choose to retrieve “A selection of columns.”

* Select to retrieve the following columns, and then **Run** the recipe:

+ *transaction\_id*

+ *card\_id*

+ *merchant\_id*

+ *purchase\_amount*

+ *card\_purchase\_amount\_min*

+ *card\_purchase\_amount\_max*

+ *card\_purchase\_amount\_avg*

For every *card\_id*, the resulting dataset displays the top and bottom purchases according to the sort column (*purchase\_amount*), as well as the other retrieved columns.

In addition, the *\_rank* column shows how each transaction ranks from highest to lowest within its group, and the *\_duplicate\_count* column shows the total number of transactions made with a given card. Applying a filter to a single *card\_id* makes this easier to see.
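
As a rough illustration of retrieving a selection of columns and then filtering to one card, the following sketch uses plain Python dictionaries. The rows, the identifier values, and the extra `is_weekend` column are hypothetical and only stand in for the real dataset.

```python
# Hypothetical output rows; is_weekend stands in for any column not retrieved.
rows = [
    {"transaction_id": "T1", "card_id": "C_1", "purchase_amount": 42.0, "is_weekend": True},
    {"transaction_id": "T2", "card_id": "C_2", "purchase_amount": 7.5, "is_weekend": False},
    {"transaction_id": "T3", "card_id": "C_1", "purchase_amount": 3.0, "is_weekend": False},
]

# Keep only a selection of columns, as in the Retrieve columns step.
selected = ["transaction_id", "card_id", "purchase_amount"]
projected = [{col: row[col] for col in selected} for row in rows]

# Filter to a single card to inspect its rows more easily.
one_card = [row for row in projected if row["card_id"] == "C_1"]

for row in one_card:
    print(row)
```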

## Compute Additional Columns

Certain visual recipes also have the option of computing additional columns within the recipe instead of having to add a separate Prepare recipe.

Let’s compute the range of *purchase\_amount* for the rows and groups specified in the recipe.

* On the **Computed columns** step of the Top N recipe, click **+Add a Computed Column**.

* Name the new computed column `card_purchase_amount_range`.

* In the **Mode** dropdown menu, you can choose between **DSS formula** and **SQL Expression**. Keep the default selection, **DSS formula**.

* To compute the difference between the min and max of the card purchase amount, type the following DSS formula expression into the formula editor:

`(card_purchase_amount_max - card_purchase_amount_min)`

The correct storage type in this case (double) should already be specified.

* **Run** the recipe again, updating the schema.

The output dataset contains the newly computed column *card\_purchase\_amount\_range*.
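
For readers who think in code, the same calculation can be sketched in plain Python. This only mirrors the arithmetic of the formula; the helper name `add_range` and the sample values are hypothetical.

```python
def add_range(row):
    """Return a copy of the row with the max-minus-min range added."""
    value_range = row["card_purchase_amount_max"] - row["card_purchase_amount_min"]
    return {**row, "card_purchase_amount_range": value_range}


# Hypothetical row with per-card min and max already computed.
example = {
    "card_id": "C_1",
    "card_purchase_amount_min": 20.5,
    "card_purchase_amount_max": 100.0,
}

with_range = add_range(example)
print(with_range["card_purchase_amount_range"])
```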

Note

Instead of using the **Computed columns** step in the Top N recipe, we could also have used the Formula processor in a Prepare recipe. However, the Computed columns step offers another option for how and when to calculate this column.

## What’s Next?

In this lesson, we used the **Top N** recipe in Dataiku to filter a dataset based on the top and bottom values of one of its columns.

We also learned to display aggregated row statistics and to create additional columns in a dataset using the **Computed columns** step in the Top N recipe.

Now you can take your advanced data preparation skills to the next level with other Academy courses such as Plugin Store.
