This feature is not available in DSS Free Edition.

Partitioning is the ability to split a dataset along meaningful dimensions. Each partition contains a subset of the dataset.

For example:

  • A database of customers can be partitioned by country.
  • Web logs can be partitioned by day and by the server that generated the log line.
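
As a rough illustration of the web-log example, the sketch below groups rows by the values they take on two dimensions. It is plain Python and does not use any DSS API; the field names (`day`, `server`, `path`) and the helper `partition_rows` are hypothetical.

```python
from collections import defaultdict

# Hypothetical web-log rows; the field names are illustrative only.
log_lines = [
    {"day": "2024-01-01", "server": "web1", "path": "/home"},
    {"day": "2024-01-01", "server": "web2", "path": "/cart"},
    {"day": "2024-01-02", "server": "web1", "path": "/home"},
    {"day": "2024-01-01", "server": "web1", "path": "/about"},
]


def partition_rows(rows, dimensions):
    """Group rows by the tuple of values they take on the given dimensions."""
    partitions = defaultdict(list)
    for row in rows:
        key = tuple(row[d] for d in dimensions)
        partitions[key].append(row)
    return dict(partitions)


# Each key identifies one partition; each value is that partition's subset of rows.
partitions = partition_rows(log_lines, ["day", "server"])
for key, rows in sorted(partitions.items()):
    print(key, len(rows))
```

Every row lands in exactly one partition, so the partitions together make up the full dataset.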

Partitions can be used for:

  • Incrementality: When a dataset is partitioned, you don't need to rebuild the full dataset each time. You can build it partition by partition, and only out-of-date partitions need to be rebuilt.
  • Advanced dependencies: Partitioning a dataset gives you partition-level dependency management. Instead of the recipe only specifying that an output dataset depends on an input dataset, you can define which partitions of the input are required to compute a given partition of the output.
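
The two uses above can be sketched together in plain Python. This is only a conceptual illustration, not how DSS implements or exposes partition dependencies. It assumes a hypothetical rule in which each daily output partition needs the input partitions for that day and the day before, and it uses made-up modification timestamps to decide which partitions are out of date.

```python
from datetime import date, timedelta


def required_inputs(output_day, window=2):
    """Hypothetical dependency rule: the output partition for a day needs the
    input partitions for that day and the (window - 1) preceding days."""
    return [output_day - timedelta(days=i) for i in range(window)]


def out_of_date(output_day, input_mtime, output_mtime, window=2):
    """An output partition is stale if it was never built, or if any input
    partition it depends on was modified after it was last built."""
    built_at = output_mtime.get(output_day)
    if built_at is None:
        return True
    return any(
        input_mtime.get(day, 0) > built_at
        for day in required_inputs(output_day, window)
    )


# Made-up modification times (arbitrary integer clock) for each partition.
input_mtime = {
    date(2024, 1, 1): 50,
    date(2024, 1, 2): 150,  # modified after the outputs were built
    date(2024, 1, 3): 60,
    date(2024, 1, 4): 70,
}
output_mtime = {date(2024, 1, 3): 100, date(2024, 1, 4): 100}

candidates = [date(2024, 1, 3), date(2024, 1, 4)]
to_rebuild = [
    day for day in candidates if out_of_date(day, input_mtime, output_mtime)
]
print(to_rebuild)
```

Here only the 2024-01-03 output partition is selected for rebuilding, because it depends on the 2024-01-02 input partition that changed. The 2024-01-04 output depends only on unchanged inputs and is left alone.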

More information is available on the Concepts page.