The initial flow zone contains the workflow data provided by the user. The input can be either a flat file dragged and dropped into the input dataset or a connection to an existing dataset, set up through the Dataiku application. Its data model is described in [this article](article:12).

![Data Input.png](g3Ibb3wXit14)

The first recipe, a Sync recipe, simply copies the input data to a managed dataset, which serves as the starting point of the whole analysis.

The second recipe applies the configuration defined in the [Dataiku application](article:7) and creates the three datasets used in the webapp:

- [workflow_clean](dataset:workflow_clean): the event logs, with columns renamed according to the Dataiku application configuration and some cleaning applied.
- [dropped_cases](dataset:dropped_cases): all the cases that were dropped because their activities could not be ordered. To avoid dropping cases whose activities share the same timestamp, specify a sorting column.
- [dropped_cases_number](dataset:dropped_cases_number): the number of dropped cases.
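The relationship between these three outputs, and the role of the sorting column, can be illustrated with a minimal, stdlib-only sketch. It assumes that a case is dropped whenever two of its activities share the same ordering key (the timestamp, plus the sorting value when a sorting column is configured). The function `split_orderable_cases`, the field names, and the `seq` sorting column are hypothetical; this is not the recipe's actual code.

```python
from collections import defaultdict
from datetime import datetime


def split_orderable_cases(events, sort_col=None):
    """Return (kept_events, dropped_events, n_dropped_cases).

    Hypothetical rule: a case is dropped when two of its activities have the
    same ordering key, i.e. the same timestamp and, if a sorting column is
    given, the same sorting value.
    """
    def order_key(ev):
        # Timestamp first; the optional sorting column breaks ties.
        return (ev["timestamp"], ev[sort_col]) if sort_col else (ev["timestamp"],)

    # Group events by case identifier.
    by_case = defaultdict(list)
    for ev in events:
        by_case[ev["case"]].append(ev)

    kept, dropped = [], []
    for case_events in by_case.values():
        keys = [order_key(ev) for ev in case_events]
        if len(set(keys)) < len(keys):
            # At least two activities cannot be ordered: drop the whole case.
            dropped.extend(case_events)
        else:
            # Keep the case with its activities in chronological order.
            kept.extend(sorted(case_events, key=order_key))

    n_dropped_cases = len({ev["case"] for ev in dropped})
    return kept, dropped, n_dropped_cases


events = [
    {"case": "A", "activity": "Create order", "timestamp": datetime(2023, 1, 1, 9, 0), "seq": 1},
    {"case": "A", "activity": "Approve", "timestamp": datetime(2023, 1, 1, 9, 0), "seq": 2},
    {"case": "B", "activity": "Create order", "timestamp": datetime(2023, 1, 2, 9, 0), "seq": 1},
    {"case": "B", "activity": "Approve", "timestamp": datetime(2023, 1, 2, 10, 0), "seq": 2},
]

# Without a sorting column, case A is dropped because of the timestamp tie.
kept, dropped, n_dropped = split_orderable_cases(events)
# With the hypothetical "seq" sorting column, the tie is broken and nothing is dropped.
kept_sorted, dropped_sorted, n_dropped_sorted = split_orderable_cases(events, sort_col="seq")
```

In this sketch, `kept`, `dropped`, and `n_dropped` loosely correspond to `workflow_clean`, `dropped_cases`, and `dropped_cases_number`.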

The [Python recipe](recipe:compute_workflow_filtered) creates the above datasets through the following pre-processing steps:

 1. The three mandatory columns (case, activity, and timestamp), plus the end timestamp when available, are renamed from the source. Additional specified attributes are kept in the dataset; all other columns are dropped.
 2. Timestamps are parsed. Cases in which activities share the same timestamp, and therefore cannot be ordered, are dropped unless a sorting column breaks the tie.
 3. Dummy start and end activities are added to each case to make the processes easier to visualize.
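Steps 1 and 3, together with the timestamp parsing from step 2, can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not the recipe's implementation: the column mapping, the `Owner` attribute, the `START`/`END` labels, and the choice to stamp each dummy activity with the first and last real timestamps of its case are all hypothetical. Timestamps are assumed to be ISO 8601 strings.

```python
from datetime import datetime
from itertools import groupby


def rename_and_select(rows, column_mapping, extra_attributes=()):
    """Step 1: rename the mandatory columns and keep only configured attributes."""
    cleaned = []
    for row in rows:
        # column_mapping maps source column names to standard names (hypothetical).
        new_row = {std: row[src] for src, std in column_mapping.items()}
        for attr in extra_attributes:
            new_row[attr] = row[attr]
        cleaned.append(new_row)  # any other source column is dropped
    return cleaned


def parse_timestamps(rows, fields=("timestamp",)):
    """Step 2 (parsing only): convert ISO 8601 strings to datetime objects."""
    return [
        {k: datetime.fromisoformat(v) if k in fields else v for k, v in row.items()}
        for row in rows
    ]


def add_dummy_activities(rows, start_label="START", end_label="END"):
    """Step 3: frame each case with dummy start and end activities."""
    result = []
    ordered = sorted(rows, key=lambda r: (r["case"], r["timestamp"]))
    for case_id, group in groupby(ordered, key=lambda r: r["case"]):
        case_events = list(group)
        # Assumption: dummies reuse the first and last timestamps of the case.
        result.append({"case": case_id, "activity": start_label,
                       "timestamp": case_events[0]["timestamp"]})
        result.extend(case_events)
        result.append({"case": case_id, "activity": end_label,
                       "timestamp": case_events[-1]["timestamp"]})
    return result


raw = [
    {"CaseID": "A", "Task": "Create order", "Time": "2023-01-01T09:00:00", "Owner": "ann", "Noise": 1},
    {"CaseID": "A", "Task": "Approve", "Time": "2023-01-01T10:00:00", "Owner": "bob", "Noise": 2},
]
mapping = {"CaseID": "case", "Task": "activity", "Time": "timestamp"}

log = add_dummy_activities(
    parse_timestamps(rename_and_select(raw, mapping, extra_attributes=("Owner",)))
)
```

Sorting by `(case, timestamp)` before `itertools.groupby` matters: `groupby` only groups consecutive rows, so unsorted input would split a case into several groups.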



