# Hands-On Tutorial: Visual Logic for Data Preparation[¶](https://knowledge.dataiku.com/latest/kb/data-prep/prepare-recipe/visual-logic.html#hands-on-tutorial-visual-logic-for-data-preparation "Permalink to this headline")

A common data preparation task is defining complex rules based on several conditions or input columns. Think of cases where you needed to recode categorical values or define logic in the style of “if-then” statements.

Dataiku offers many tools to solve these kinds of problems. Here, we’ll focus on four tools in particular, each with its strengths and weaknesses.

## Let’s Get Started[¶](https://knowledge.dataiku.com/latest/kb/data-prep/prepare-recipe/visual-logic.html#let-s-get-started "Permalink to this headline")

In this tutorial, you will practice defining rules and complex visual filters using four different methods:

* Switch Formula function

* Switch case processor

* “Create if, then, else statement” processor

* filtering conditions in visual recipes and other screens throughout Dataiku

### Prerequisites[¶](https://knowledge.dataiku.com/latest/kb/data-prep/prepare-recipe/visual-logic.html#prerequisites "Permalink to this headline")

* A Dataiku 11 instance

* A basic level of knowledge about Dataiku is helpful. If you’ve never used Dataiku before, try the Core Designer learning path or a Quick Start tutorial!

## Create the Project[¶](https://knowledge.dataiku.com/latest/kb/data-prep/prepare-recipe/visual-logic.html#create-the-project "Permalink to this headline")

We’ll reuse the Flow from the AI Consumer quick start.

* From the Dataiku homepage, click **+New Project > DSS Tutorials > Quick Start > AI Consumer Quick Start (Tutorial)**.

Note

You can also download the starter project from this website and import it as a zip file.

### Use Case Summary[¶](https://knowledge.dataiku.com/latest/kb/data-prep/prepare-recipe/visual-logic.html#use-case-summary "Permalink to this headline")

This project is a simplified credit card fraud use case. Using data about transactions, merchants, and cardholders, we have a model that predicts which transactions should be authorized and which are potentially fraudulent.

A score of 1 for the target variable, *authorized\_flag*, represents an authorized transaction. A score of 0, on the other hand, is a transaction that failed authorization.

## Switch Formula Function[¶](https://knowledge.dataiku.com/latest/kb/data-prep/prepare-recipe/visual-logic.html#id1 "Permalink to this headline")

If you’re familiar with the `SWITCH()` function in Microsoft Excel, you’ll be glad to know the same functionality can be found in Dataiku’s Formula language.

Let’s use the **switch function** in Dataiku’s Formula language to remap values in a single categorical column.

* Open the Prepare recipe that computes *transactions\_joined\_prepared*.

* Add a new **Formula** step to the script.

* Name the output column `item\_category\_recoded`.

* Open the editor panel, and paste the following expression below to recode the first three values.

* Observe the sample output, and then click **Apply**.

§ switch(item\_category, "A", "Domestic", "B", "International", "C", "Unknown")

Note

Note that any values not specified, such as category “D” in this case, remain empty in the output column.

## Switch Case Processor[¶](https://knowledge.dataiku.com/latest/kb/data-prep/prepare-recipe/visual-logic.html#id2 "Permalink to this headline")

In some cases, the switch function will be the perfect tool—particularly when creating a formula to be copied and pasted from a spreadsheet outside of Dataiku.

On the other hand, it may be difficult to read and maintain this kind of remapping for a column like *merchant\_subsector\_description*, which has 38 distinct values.

The **Switch case** processor step provides a more robust definition of rules for remapping a single variable.

* In the same Prepare recipe, add a **Switch case** step to the script.

* Choose *merchant\_subsector\_description* as the column.

* Name the output `spending\_type`.

* Click **+Add Rule** repeatedly to create each of the following sets of key-value pairs:

>

>

> 	+ `gas` -> `essential`

> 	+ `internet` -> `essential`

> 	+ `luxury goods` -> `discretionary`

> 	+ `consumer electronics` -> `discretionary`

>

* Add `unknown` as the value for unmatched rows, and observe the step preview.

Tip

Here we used the default case-sensitive “Exact” mode, but the processor also has the mode “Lowercase” (match “GAS”, “Gas”, and “gas” for example) and the accent-insensitive “Normalize” mode.

Compared to the Formula above, this solution makes it much easier to review the table of rules.

Note

This processor has similar functionality to the Find and Replace processor, but has a few differences, such as the ability to set a value for unmatched rows.

## Create if, then, else statements[¶](https://knowledge.dataiku.com/latest/kb/data-prep/prepare-recipe/visual-logic.html#create-if-then-else-statements "Permalink to this headline")

Like with the Switch formula function, the Switch case processor might be the perfect tool for your task. For more complicated logic, you can try the **Create if, then, else statements** processor. This processor allows you to implement if, else if, and else statements in a visual manner.

Let’s use this processor to create rules that define categories of risk.

* In the same Prepare recipe, add a **Create if, then, else statements** step to the script.

### Start With an If Statement[¶](https://knowledge.dataiku.com/latest/kb/data-prep/prepare-recipe/visual-logic.html#start-with-an-if-statement "Permalink to this headline")

Let’s start by defining parameters based on the minimum and maximum possible values for FICO scores, a common measure of creditworthiness in the United States.

* For the first condition, if the column *card\_fico\_score* is `<` the value `300`, then a new column (type `risk`) equals the value `invalid`.

* Click **Preview**. All rows in the preview of the *risk* column should be empty since none satisfy this condition.

This is only the minimum bound. Let’s also add the maximum.

* In the editor panel, add a second condition to the if statement.

* In the if block, click **Add a Condition**:

>

>

> 	+ if *card\_fico\_score* is `>` the value `850`, then the *risk* column should also be set to `invalid`.

>

When multiple conditions in an if or else if statement are present, we need to be mindful of the **conjunction** connecting them.

* Set the conjunction connecting these conditions to **OR** so that if either one of the two conditions is satisfied, then the row will be marked as `invalid`.

* Click **Preview** again. All rows remain empty.

### Add an “Else If” Block[¶](https://knowledge.dataiku.com/latest/kb/data-prep/prepare-recipe/visual-logic.html#add-an-else-if-block "Permalink to this headline")

Once we have the first if-then statement verifying valid FICO scores, let’s define more rules to help categorize the level of a transaction’s risk.

* In the editor panel, click **+ Add Else If Group**.

* Provide the condition *days\_active* (the number of days a credit card is active) is `<` the value `30`.

* If this condition is satisfied, **then** set the new column `risk` to the value `high`.

* Click **Preview**, and observe that some rows in the *risk* column now have the value “high”.

### Add an Else Statement[¶](https://knowledge.dataiku.com/latest/kb/data-prep/prepare-recipe/visual-logic.html#add-an-else-statement "Permalink to this headline")

Let’s use a final **else statement** to define a rule for any value that does not satisfy any of the conditions above it.

* In the editor panel, click **+ Add Else**.

* If none of the above conditions are met, set the new column `risk` to the value `low`.

* Click **Preview** to see all of the previously empty rows now filled with this new value.

### Add More “Else If” Blocks[¶](https://knowledge.dataiku.com/latest/kb/data-prep/prepare-recipe/visual-logic.html#add-more-else-if-blocks "Permalink to this headline")

We can imagine needing many more possible outputs than just “invalid”, “high”, and “low”. We can define them with additional **else if** blocks.

* Beneath the first “else if” block (above the else block), click **+ Add Else If Group**.

* Provide the condition *days\_active* is `<` the value `60`.

* If this condition is satisfied, **then** set the new column `risk` to the value `medium`.

* Click **Preview** to see that some values in the *risk* column now are labeled as “medium”.

Note

The order of if and “else if” blocks matters! Consider if the order of the two “else if” blocks in this example were reversed. A transaction on a card active for 15 days would satisfy the first “else if” block and be labeled as a medium risk transaction–even though it also satisfies the more stringent (but subsequent) condition for “high” risk transactions.

### Turn Conditions into Groups for Nested Logic[¶](https://knowledge.dataiku.com/latest/kb/data-prep/prepare-recipe/visual-logic.html#turn-conditions-into-groups-for-nested-logic "Permalink to this headline")

The current structure of the rules in place could be summarized by:

§ if (A or B) then "invalid"

§ else if C then "high"

§ else if D then "medium"

§ else "low"

We often need much more complex, nested logic. For example, the actual rules for a “medium” value might be `if D and (E or F)`.

For this kind of logic, you can organize conditions into **groups** of conditions. Let’s try this now.

* In the editor panel, return to the second “else if” block that defines the “medium” *risk* values.

* Click the dropdown arrow next to “+ Add a Condition” in order to **Add a group**.

* In addition to the first condition, “medium” *risk* values must also satisfy at least one of the following two conditions:

>

>

> 	+ *signature\_provided* `==` the value `0`.

> 	+ **OR**

> 	+ *purchase\_amount* `>` the value `200`.

>

* Click **Preview** to observe the new input, and then **Apply** to save the logic in this step.

* Finally, save the recipe and update the schema.

Note

Although we can define a quite complex set of rules with this processor, Dataiku provides a readable summary of this step in the script on the left.

### Going Further[¶](https://knowledge.dataiku.com/latest/kb/data-prep/prepare-recipe/visual-logic.html#going-further "Permalink to this headline")

This simple example only shows the beginning of what’s possible with this quite powerful processor. Two additional features are noteworthy:

#### More Operators[¶](https://knowledge.dataiku.com/latest/kb/data-prep/prepare-recipe/visual-logic.html#more-operators "Permalink to this headline")

This processor offers many types of **operators** when creating conditions that vary depending on the storage type of the column.

This example used only numeric operators (greater than, less than, etc.), but strings and dates, for example, have many others, such as “Contains” or “Is before date”.

#### Comparing Columns[¶](https://knowledge.dataiku.com/latest/kb/data-prep/prepare-recipe/visual-logic.html#comparing-columns "Permalink to this headline")

All of the conditions in this example are compared against a numeric value, but you can also evaluate conditions against a column. You can also use a column (or a formula) for the output of a then statement.

This is particularly useful when you have windowed or lagged features. For example, if we had the customer’s average purchase amount we could compare the *purchase\_amount* of the transaction to the column *purchase\_amount\_avg* instead of a fixed value.

## Use Filtering Conditions Everywhere[¶](https://knowledge.dataiku.com/latest/kb/data-prep/prepare-recipe/visual-logic.html#use-filtering-conditions-everywhere "Permalink to this headline")

Instead of checking for invalid FICO scores in the Prepare recipe, we might want to perform this kind of check earlier in the data pipeline, for example in the Pre-filter step of the Join recipe.

We can use the same style of conditions available in the “Create if, then, else statement” processor in many of Dataiku’s filtering screens, such as the Pre- and Post-filter steps of most visual recipes.

* Navigate to the Pre-filter step of the **Join** recipe.

* Turn **On** the Filter for the *cardholder\_info* dataset.

* Before performing the Join, keep only rows from this dataset that satisfy all of the following conditions:

>

>

> 	+ *age* `<= 100` (an arbitrary upper bound).

>

* Click the dropdown next to “+ Add a Condition” to add a group of conditions.

* In addition to the upper bound on age, set lower and upper bounds on FICO score:

>

>

> 	+ *fico\_score* must be `>=` to `300` **AND** *fico\_score* must be `<=` to `850`.

>

* Close the dialog, and review the summary of the filter. Then save and **Run** the recipe.

Note

In addition to visual recipes, you also have the power of this same filtering logic when configuring samples in the Explore and Charts tabs, defining cell value metrics in the Status tab, or creating an assertion in the Design tab of a prediction model.

## What’s Next?[¶](https://knowledge.dataiku.com/latest/kb/data-prep/prepare-recipe/visual-logic.html#what-s-next "Permalink to this headline")

Congratulations! Your data preparation toolkit for defining rules and complex logic without code in Dataiku has now expanded.

You can learn more about data preparation in Dataiku in the product documentation and the Knowledge Base.
