“Files in folder” dataset

A managed folder is a generic object in the DSS flow that contains files. It can be read and written by Python or R recipes, and files can be added to it either manually or via an API.

In some setups, it is useful to recreate a dataset from such a managed folder. To do this, use the Create Dataset action of the managed folder, or use the “Files in folder” dataset, which is available from the menu +DATASET / Internal / Files from Folder.
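To make the idea concrete, here is a minimal conceptual sketch of what "recreating a dataset from a folder" amounts to: several files with the same format are written to a folder, then read back and concatenated into one set of rows. It does not use the DSS API. A temporary local directory stands in for the managed folder, and the helper names (`write_part`, `read_folder_as_rows`), the CSV format, and the file names are hypothetical choices made for illustration only.

```python
import csv
import tempfile
from pathlib import Path


def write_part(folder, name, rows):
    """Write one CSV file into the folder, as a recipe might do (hypothetical helper)."""
    with open(folder / name, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["id", "value"])
        writer.writeheader()
        writer.writerows(rows)


def read_folder_as_rows(folder, pattern="*.csv"):
    """Read every matching file, in name order, and concatenate the rows (hypothetical helper)."""
    rows = []
    for path in sorted(folder.glob(pattern)):
        with open(path, newline="", encoding="utf-8") as f:
            rows.extend(csv.DictReader(f))
    return rows


with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)  # assumption: a local directory stands in for the managed folder
    write_part(folder, "part-1.csv", [{"id": 1, "value": "a"}, {"id": 2, "value": "b"}])
    write_part(folder, "part-2.csv", [{"id": 3, "value": "c"}])
    dataset_rows = read_folder_as_rows(folder)

print(len(dataset_rows))
```

In DSS itself, the files would be parsed according to the format settings of the dataset rather than by hand-written code; the sketch only shows the shape of the operation.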

The “Files in folder” dataset enables advanced setups like:

Note

Even when the folder backing the “Files in folder” dataset is on HDFS, the dataset can’t be used as a Hive table. Use a normal HDFS dataset pointing at the same location as the folder instead.