Tutorial | Top N recipe

Let’s try out the Top N recipe to isolate the biggest purchases found in a practice dataset.

Objectives

In this tutorial, you will:

  • Find the five most expensive purchases recorded in a dataset.

  • Find the five most expensive purchases per item category in the dataset.

Prerequisites

To reproduce the steps in this tutorial, you’ll need:

  • Access to an instance of Dataiku 12+.

  • Basic knowledge of Dataiku (Core Designer level or equivalent).

Create the project

  1. From the Dataiku Design homepage, click + New Project > DSS tutorials > Advanced Designer > Top N recipe.

  2. From the project homepage, click Go to Flow.

Note

You can also download the starter project from this website and import it as a zip file.

Next, build the Flow.

  1. Click Flow Actions at the bottom right of the Flow.

  2. Click Build all.

  3. Keep the default settings and click Build.

Create the Top N recipe

We’ll create a Top N recipe from the tx_prepared dataset.

  1. Select the tx_prepared dataset and choose the Top N recipe from the Actions tab.

  2. Change the output name to tx_topn and click Create Recipe.

A screenshot of the "New topn recipe" dialogue window.

Find most expensive purchases

To find the five largest purchases in a dataset:

  1. Retrieve the top 5 rows and the bottom 0 rows.

  2. Select purchase_amount as the column to sort by.

  3. Change the sort order to descending so that the most expensive purchases rank at the top.

  4. Run the recipe and open the output dataset.

A screenshot of the Top N step in the Top N recipe.
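To make the logic of these settings concrete, here is a minimal sketch in plain Python of what "retrieve the top 5 rows, sorted by purchase_amount in descending order" amounts to. This is not how Dataiku implements the recipe. The sample rows are hypothetical stand-ins for tx_prepared, and top_n is a helper name invented for this illustration.

```python
import heapq

# Hypothetical sample rows standing in for tx_prepared (not the real data).
transactions = [
    {"item_category": "A", "purchase_amount": 120.50},
    {"item_category": "B", "purchase_amount": 15.00},
    {"item_category": "C", "purchase_amount": 310.25},
    {"item_category": "A", "purchase_amount": 89.99},
    {"item_category": "D", "purchase_amount": 450.00},
    {"item_category": "B", "purchase_amount": 205.10},
    {"item_category": "C", "purchase_amount": 12.75},
    {"item_category": "D", "purchase_amount": 76.40},
]

def top_n(rows, n, key):
    """Return the n rows with the largest `key` value, largest first (hypothetical helper)."""
    # Same result as sorting by `key` in descending order and keeping the first n rows.
    return heapq.nlargest(n, rows, key=lambda row: row[key])

top5 = top_n(transactions, 5, "purchase_amount")
for row in top5:
    print(row["item_category"], row["purchase_amount"])
```

heapq.nlargest(n, rows, key=...) gives the same rows as sorted(rows, key=..., reverse=True)[:n], but it avoids sorting the entire list when n is small compared to the number of rows.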

As you can see, the output dataset contains just five records: the five most expensive purchases in the whole dataset. Let’s add a little more complexity.

Group by item category

Now we’ll find the five biggest purchases in each item category, rather than across the whole dataset.

  1. Reopen the Top N recipe.

  2. Beneath from, select each group of rows identified by….

  3. In the dropdown that appears, choose item_category as the key column.

  4. Select the row number within its group checkbox. This adds a _row_number column to the output.

A screenshot of the Top N step showing how to group top purchases by item category.
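Grouping turns one ranking into one ranking per item_category. The following sketch, again in plain Python with hypothetical sample rows and a hypothetical helper named top_n_per_group, keeps the five largest purchases in each category and numbers them starting at 1. That numbering is what the row number within its group option adds as _row_number. The sketch only illustrates the idea and is not Dataiku’s implementation. In particular, it does not try to reproduce how the recipe handles ties.

```python
from collections import defaultdict

# Hypothetical purchase amounts per item category (not real tx_prepared data).
amounts_by_category = {
    "A": [12.0, 250.0, 48.5, 99.9, 310.0, 7.25, 150.0],
    "B": [60.0, 15.5, 420.0, 88.0, 130.0, 5.0],
    "C": [75.0, 33.3],
}
transactions = [
    {"item_category": category, "purchase_amount": amount}
    for category, amounts in amounts_by_category.items()
    for amount in amounts
]

def top_n_per_group(rows, n, group_key, sort_key):
    """Keep the n rows with the largest sort_key in each group, numbered from 1."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append(row)

    result = []
    for group_value in sorted(groups):
        # Sort this group in descending order and keep at most its first n rows.
        ranked = sorted(groups[group_value], key=lambda r: r[sort_key], reverse=True)[:n]
        for position, row in enumerate(ranked, start=1):
            # _row_number plays the role of the "row number within its group" option.
            result.append({**row, "_row_number": position})
    return result

top5_per_category = top_n_per_group(transactions, 5, "item_category", "purchase_amount")
for row in top5_per_category:
    print(row["item_category"], row["_row_number"], row["purchase_amount"])
```

In this sketch, a category with fewer than five purchases keeps all of its rows. Category C shows this, ending up with only _row_number 1 and 2.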

Retrieve columns

To make the output easier to interpret:

  1. Navigate to the Retrieve columns step.

  2. Change the Mode to Select columns.

  3. Move all of the columns to the left using the double arrow.

  4. Move item_category and purchase_amount back to the right using the single arrow.

  5. Run the recipe and then open the output dataset.

The output should have three columns: item_category, purchase_amount, and _row_number.

Screenshot of the output dataset showing top purchases grouped by item category.

You’ll see that each category (A, B, C, and D) has five purchase amounts, sorted in descending order within the group. The _row_number values, which run from 1 to 5 in every group, confirm that each group has five rows.
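You can also sketch the Retrieve columns step and the check described above in plain Python. The rows below are hypothetical and shaped like the grouped output before column selection. customer_id is a made-up column that stands in for the columns you drop, and the groups hold three rows instead of five only to keep the sample short. select_columns and groups_are_complete are illustrative helpers, not Dataiku functions. The check assumes that the rows in each group appear in _row_number order.

```python
from collections import defaultdict

# Hypothetical grouped Top N output before Retrieve columns (n = 3 for brevity).
# "customer_id" is a made-up column standing in for the columns you drop.
ranked_rows = [
    {"customer_id": "c07", "item_category": "A", "purchase_amount": 310.0, "_row_number": 1},
    {"customer_id": "c02", "item_category": "A", "purchase_amount": 250.0, "_row_number": 2},
    {"customer_id": "c11", "item_category": "A", "purchase_amount": 150.0, "_row_number": 3},
    {"customer_id": "c04", "item_category": "B", "purchase_amount": 420.0, "_row_number": 1},
    {"customer_id": "c09", "item_category": "B", "purchase_amount": 130.0, "_row_number": 2},
    {"customer_id": "c01", "item_category": "B", "purchase_amount": 88.0, "_row_number": 3},
]

KEEP = ("item_category", "purchase_amount", "_row_number")

def select_columns(rows, columns):
    """Keep only the listed columns, similar to Select columns mode."""
    return [{col: row[col] for col in columns} for row in rows]

def groups_are_complete(rows, n):
    """True if every group has row numbers 1..n and non-increasing purchase amounts."""
    by_group = defaultdict(list)
    for row in rows:
        by_group[row["item_category"]].append(row)
    for group in by_group.values():
        numbers = [r["_row_number"] for r in group]
        amounts = [r["purchase_amount"] for r in group]
        if numbers != list(range(1, n + 1)):
            return False
        # Each amount must be >= the next one (descending within the group).
        if any(a < b for a, b in zip(amounts, amounts[1:])):
            return False
    return True

output = select_columns(ranked_rows, KEEP)
print(list(output[0]))
print(groups_are_complete(output, 3))
```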

What’s next?

You just practiced using the Top N recipe to find the most expensive transactions in the dataset.

To try out more visual recipes, visit our page on Visual Recipes!