Tutorial | In-database data visualization and preparation #
Get started #
Once you have an SQL connection available in Dataiku, you can use it as a storage and computation layer underneath native charts and visual data preparation in Dataiku Flows.
Objectives #
In this tutorial, you will:
-
Build a chart with the in-database engine.
-
Create a visual recipe that runs in-database.
Prerequisites #
-
You will need access to a Dataiku instance with an available SQL connection. If you don’t have one yet, see Tutorial | Configure a connection between Dataiku and a SQL database .
Create the project #
-
From the Dataiku Design homepage, click + New Project > DSS tutorials > Core Designer > In-database Operations .
-
From the project homepage, click Go to Flow .
Note
You can also download the starter project from this website and import it as a zip file.
Change connections #
The datasets in the Flow are not yet synced to your specific SQL database. Let’s fix that first.
-
From the Flow, multi-select all datasets computed by Dataiku (all except the two leftmost uploaded files).
-
Near the bottom of the Actions sidebar, select Change connection .
-
For the new connection field, select an available SQL connection.
-
Click Save .
Note
For more information, see Concept | Connection Changes .
You’ll next want to build the Flow.
-
Click Flow Actions at the bottom right of the Flow.
-
Click Build all .
-
Keep the default settings and click Build .
Use case summary #
The Flow begins from two uploaded CSV files:
Dataset |
Description |
---|---|
orders |
Order information for t-shirts, including the order’s date, the type of t-shirt, and the customer who placed the order. |
customers_stacked |
Additional information about the customers who have placed a t-shirt order. |
Build charts in-database #
Like the dataset preview in the Explore tab, charts by default represent a sample of the dataset. When this sample is representative, a chart based on a sample may be adequate.
However, in other cases, you may need a complete representation of the entire data. If a dataset is stored in a SQL database, you can efficiently compute a chart on the complete dataset using the in-database engine.
Note
You may also wish to review Concept | In-database charts .
Create a chart #
Let’s start with a simple chart.
-
Open the customers_stacked_prepared dataset, and navigate to the Charts tab.
-
In the default bar chart, drag Count of records from the column on the left to the Y axis.
-
Drag campaign to the X axis.

Note
You can find more resources about charts in the Knowledge Base or reference documentation .
Use the in-database engine #
At the moment, the chart is based on a sample of the data. Let’s switch to the in-database engine.
-
Click on the Sampling & Engine tab of the left pane.
-
For the execution engine, select In-database .
-
Click Save at the bottom and see the chart update to the full set of records.

Note
Learn more in the reference documentation about sampling and engines for charts .
Run visual recipes in-database #
In addition to building charts using the in-database engine, we can also run many visual recipes in-database, provided certain conditions are met — most importantly, having the input and output datasets stored in the same database.
Note
For more information on running recipes in-database, see the reference documentation .
Create a Join recipe #
Let’s start with a typical Join recipe.
-
From the Flow, select the orders_copy and then customers_stacked_prepared datasets.
-
In the Actions sidebar on the right, select Join from the menu of visual recipes.
-
Click Create Recipe .

Set a join condition #
The next step is to define the join condition. A simple left join will do.
-
On the Join step, click Add a Condition .
-
On the left side of the dialog, select customer_id as the join column from orders_copy .
-
Click OK to close the dialog.

View the SQL query for a Join recipe #
Before running it, let’s see the actual SQL query generated by the Join recipe.
-
Navigate to the Output step on the left.
-
Click View Query to see the actual SQL query that the Join recipe will run, and then close the dialog.
-
Click Run at the bottom left, noting how the in-database SQL engine is selected by default.

Important
With a Join recipe, we always have the option of computing additional columns pre- or post-join. In this case though, because the recipe meets the conditions for the SQL engine, we could also create such columns using SQL in addition to DSS formulas .
Note
To understand how Dataiku selects the execution engine for a visual recipe, see Tutorial | Recipe engines .
What’s next? #
Congratulations! You executed routine tasks like building charts and running visual recipes, but did so with a more efficient in-database engine.
Using the in-database engine with visual recipes is convenient for the most common data transformations. However, when you need a more custom approach, you can turn to SQL notebooks and SQL recipes.
Try that in Tutorial | SQL notebooks and recipes !
Tip
You can find this content (and more) by registering for the Dataiku Academy course, Dataiku & SQL . When ready, challenge yourself to earn a certification !