Variables in scenarios
When run, scenarios set up an additional level of variables, which can define new variables or redefine instance- or project-wide ones. These definitions and redefinitions are then accessible to all actions during the scenario run, and to all reporters executed at the end of the run.
(Re)defining new variables
In a step-based scenario, the user can insert a Define scenario variables step to add new variables. This type of step evaluates a DSS formula and stores the result as a variable. Subsequent steps and variable definitions can then use the newly defined variable. The formulas in a Define scenario variables step have access to all instance- and project-wide variables, and to the parameters of the trigger that initiated the scenario run.
In a Python script scenario, variables are made available through a Scenario object, both for getting and setting.
Usage in partition identifiers
In a step-based scenario, steps commonly act on specific datasets. If a dataset is partitioned, the step must specify which partition it targets. This is done by setting a partition identifier in the corresponding parameter of the step. Variables are available in these fields, making it possible to use expressions to pick a specific partition.
For time partitioning, you can use special keywords. For example, “CURRENT_DAY” will be replaced by the current date when the scheduler runs the task. The complete list of time partitioning keywords is:
- CURRENT_HOUR
- CURRENT_DAY
- CURRENT_MONTH
- CURRENT_YEAR
- PREVIOUS_HOUR
- PREVIOUS_DAY
- PREVIOUS_MONTH
- PREVIOUS_YEAR
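To make the meaning of these keywords concrete, here is a minimal Python sketch of how each one could map to a partition identifier. It is an illustration only, not the way DSS resolves keywords internally: the helper name resolve_keyword is hypothetical, and the identifier formats (YYYY-MM-DD-HH, YYYY-MM-DD, YYYY-MM, YYYY) are assumptions about the time dimension in use.

```python
from datetime import datetime, timedelta

# Assumed identifier formats per time dimension (illustrative)
FORMATS = {"HOUR": "%Y-%m-%d-%H", "DAY": "%Y-%m-%d", "MONTH": "%Y-%m", "YEAR": "%Y"}


def resolve_keyword(keyword: str, now: datetime) -> str:
    """Hypothetical helper: turn e.g. 'PREVIOUS_DAY' into a partition identifier."""
    period, unit = keyword.split("_")
    moment = now
    if period == "PREVIOUS":
        if unit == "HOUR":
            moment = now - timedelta(hours=1)
        elif unit == "DAY":
            moment = now - timedelta(days=1)
        elif unit == "MONTH":
            # The day before the first of this month falls in the previous month
            moment = now.replace(day=1) - timedelta(days=1)
        else:  # YEAR
            moment = datetime(now.year - 1, 1, 1)
    return moment.strftime(FORMATS[unit])


now = datetime(2024, 3, 1, 0, 30)
print(resolve_keyword("CURRENT_DAY", now))     # 2024-03-01
print(resolve_keyword("PREVIOUS_HOUR", now))   # 2024-02-29-23
print(resolve_keyword("PREVIOUS_MONTH", now))  # 2024-02
```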
Examples
Using the current date to refresh a partition
Selecting the partition corresponding to the current date can be done by using the CURRENT_DAY or CURRENT_MONTH keyword as the partition identifier. In the context of scenarios, a more flexible approach is to build the partition identifier with variables. For example, to build the partition identifier corresponding to the previous month:
- add a Define scenario variables step
- add a first variable today whose expression is now()
- add a second variable last_month whose expression is today.inc(-1, 'month')
- finally, prepare the date as a partition identifier with a third variable last_month_id whose expression is last_month.substring(0,7)
Then, in a Build step, the partition for a dataset can be set to ${last_month_id}. A natural extension is to launch the building of several partitions at once, i.e. dynamic partitioning. The list of partitions to build is then a comma-separated list of partition identifiers. For more advanced usage of partitions, see Partition identifiers.
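For readers who prefer to reason about this in Python, for instance in a Python script scenario, the same computation can be sketched with the standard library alone. The function names below are hypothetical, and the YYYY-MM identifier format is assumed from the substring(0,7) expression above.

```python
from datetime import date


def previous_month_id(today: date) -> str:
    """Return the 'YYYY-MM' identifier of the month before `today`."""
    year, month = today.year, today.month - 1
    if month == 0:  # January rolls back to December of the previous year
        year, month = year - 1, 12
    return f"{year:04d}-{month:02d}"


def last_months_ids(today: date, count: int) -> str:
    """Return the `count` months before `today` as a comma-separated list, oldest first."""
    ids = []
    current = today
    for _ in range(count):
        identifier = previous_month_id(current)
        ids.append(identifier)
        # Step back to the first day of that month for the next iteration
        current = date(int(identifier[:4]), int(identifier[5:7]), 1)
    return ",".join(reversed(ids))


print(previous_month_id(date(2024, 1, 15)))   # 2023-12
print(last_months_ids(date(2024, 2, 10), 3))  # 2023-11,2023-12,2024-01
```

The second helper illustrates the comma-separated list of identifiers used when several partitions are built at once.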
Using the date from a time-based trigger
Triggers produce parameters that can be used in expressions. To build a variable whose value is the date of the scenario launch minus 5 days:
- add a Define scenario variables step
- add a first variable start whose expression is asDate(scenarioTriggerParam_expectedStart)
- add a second variable five_days_before_start whose expression is start.trunc('day').inc(-5, 'day')
If the scenario can also be launched manually, start may be null, so the second variable should guard against it with the expression:
if(isNull(start), now(), start).trunc('day').inc(-5, 'day')
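The logic of this expression (fall back to the current time when there is no trigger date, truncate to the day, then subtract five days) can be sketched in plain Python as follows. The function name and its now parameter are hypothetical and exist only to make the example deterministic.

```python
from datetime import datetime, timedelta
from typing import Optional


def five_days_before_start(expected_start: Optional[datetime],
                           now: Optional[datetime] = None) -> datetime:
    """Mirror if(isNull(start), now(), start).trunc('day').inc(-5, 'day')."""
    # Fall back to the current time when no trigger date is available
    start = expected_start if expected_start is not None else (now or datetime.now())
    # Truncate to the start of the day
    midnight = start.replace(hour=0, minute=0, second=0, microsecond=0)
    return midnight - timedelta(days=5)


print(five_days_before_start(datetime(2024, 3, 10, 14, 30)))  # 2024-03-05 00:00:00
```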
Using the results of a previous SQL step
Steps produce results, which can be used to define variables. To access a step’s result, the step must have a name. For example, if the Define scenario variables step comes after a step named the_sql of type Execute SQL, whose query is select max(order_date) as m from orders, then a variable holding the maximum order date can be built as follows:
- add a Define scenario variables step
- add a first variable max_orderdate whose expression is parseJson(stepOutput_the_sql)['rows'][0][0].asDate()
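As a rough Python equivalent of this expression, the sketch below parses a step output and reads the first cell of the first row. The sample payload is made up for illustration: only the rows key and its list-of-rows shape are taken from the expression above, and the ISO date string is an assumption about how the database value is serialized.

```python
import json
from datetime import datetime


def max_order_date(step_output: str) -> datetime:
    """Parse the step output JSON and convert rows[0][0] to a datetime."""
    payload = json.loads(step_output)
    first_cell = payload["rows"][0][0]  # first column of the first row
    return datetime.fromisoformat(first_cell)


# Hypothetical output of the the_sql step, for illustration only
sample_output = json.dumps({"rows": [["2024-03-01"]]})
print(max_order_date(sample_output))  # 2024-03-01 00:00:00
```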
Retrieving the message of a check
The Run checks step keeps the results of the checks it has run for subsequent steps. A typical use is to insert check results into reports sent at the end of the run.
For example, after a Run checks step named the_checks, the variable ${stepOutput_the_checks} contains the JSON of the checks’ output. To get the results of the checks on a dataset named checked in the project PROJ, use the expression parseJson(stepOutput_the_checks)['PROJ.checked'].results.
To retrieve the status of a check named checkX, filter these results: ${filter(parseJson(stepOutput_the_checks)['PROJ.checked'].results, x, x.check.meta.label == 'checkX')[0].value.outcome}
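The filtering done by this expression can be sketched in Python as shown below. The sample payload is hypothetical and contains only the fields the expression navigates (results, check.meta.label, value.outcome); a real step output may contain more, and the outcome strings here are placeholders.

```python
import json
from typing import Optional


def check_outcome(step_output: str, dataset_key: str, label: str) -> Optional[str]:
    """Return the outcome of the first check with the given label, or None."""
    results = json.loads(step_output)[dataset_key]["results"]
    matches = [r for r in results if r["check"]["meta"]["label"] == label]
    return matches[0]["value"]["outcome"] if matches else None


# Hypothetical output of the the_checks step, for illustration only
sample_checks = json.dumps({
    "PROJ.checked": {
        "results": [
            {"check": {"meta": {"label": "checkX"}}, "value": {"outcome": "OK"}},
            {"check": {"meta": {"label": "checkY"}}, "value": {"outcome": "ERROR"}},
        ]
    }
})
print(check_outcome(sample_checks, "PROJ.checked", "checkX"))  # OK
```

Unlike the formula, which indexes [0] directly, this sketch returns None when no check matches the label.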
Retrieving the value of a metric
The Compute metrics step keeps the results of the metrics it has run for subsequent steps.
For example, after a Compute metrics step named the_metrics, the variable ${stepOutput_the_metrics} contains the JSON of the metrics’ computation output, indicating which metrics were computed, their values, and which metrics were skipped. To get the results of the metrics on a dataset named computed in the project PROJ, use the expression parseJson(stepOutput_the_metrics)['PROJ.computed'].results.
To retrieve the value of the metric col_stats:MIN:cost, filter the computed metrics: ${filter(parseJson(stepOutput_the_metrics)['PROJ.computed'].computed, x, x.metricId == 'col_stats:MIN:cost')[0].value}
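A Python sketch of the same lookup is given below, again on a made-up payload. Only the computed list and its metricId and value fields come from the expression above; the second metric identifier and the values themselves are hypothetical.

```python
import json


def metric_value(step_output: str, dataset_key: str, metric_id: str):
    """Return the value of the first computed metric with the given id, or None."""
    computed = json.loads(step_output)[dataset_key]["computed"]
    for entry in computed:
        if entry["metricId"] == metric_id:
            return entry["value"]
    return None


# Hypothetical output of the the_metrics step, for illustration only
sample_metrics = json.dumps({
    "PROJ.computed": {
        "computed": [
            {"metricId": "col_stats:MIN:cost", "value": "12.5"},
            {"metricId": "col_stats:MAX:cost", "value": "980.0"},
        ]
    }
})
print(metric_value(sample_metrics, "PROJ.computed", "col_stats:MIN:cost"))  # 12.5
```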