# Flow zone presentation
The [preprocessing_zone](flow_zone:hB8R0Xc) is dedicated to preprocessing the datasources.
It is the backstage of the *Preprocessing* section of the Dataiku application ([See the corresponding wiki article](article:38)).

![preprocessing_zone.png](URCHoDuUHHmy)

This flow zone has two branches: a *locations* branch and a *customers* branch.
The outputs of these two branches are [locations_prepared](dataset:locations_prepared) (generated by [compute_locations_prepared](recipe:compute_locations_prepared)) and [customers_prepared](dataset:customers_prepared) (generated by [compute_customers_prepared](recipe:compute_customers_prepared)), respectively.

# Flow zone specificity
Depending on how the input datasources define locations ([See datasets requirements](article:4)), geocoding is applied to the datasets or not. The scenarios [switch_locations_input_dataset](scenario:SWITCH_LOCATIONS_INPUT_DATASET) and [switch_customers_input_dataset](scenario:SWITCH_CUSTOMERS_INPUT_DATASET) are then responsible for switching the inputs of [compute_locations_prepared](recipe:compute_locations_prepared) and [compute_customers_prepared](recipe:compute_customers_prepared) accordingly.

## Input locations defined with ***latitude*** and ***longitude*** columns
If the input datasources locate rows with ***latitude*** and ***longitude*** columns, there is no need to geocode the data. In that case, the recipes [compute_locations_prepared](recipe:compute_locations_prepared) and [compute_customers_prepared](recipe:compute_customers_prepared) directly take the datasets [locations_dataset](dataset:locations_dataset) and [customers_dataset](dataset:customers_dataset) as inputs.

### Examples: 
*locations* branch example: 
![locations_lat_lon.png](dzsHvJMCftKA)

*customers* branch example: 
![customers_lat_lon.png](zvbyQvZwIwXx)


## Input locations defined with an ***address*** column
If the input datasources locate rows with an ***address*** column, the addresses are geocoded first. Then the split recipes [split_locations_geocoded](recipe:split_locations_geocoded) and [split_customers_dataset](recipe:split_customers_dataset) isolate the rows where geocoding succeeded, producing the datasets [locations_well_geocoded](dataset:locations_well_geocoded) and [customers_well_geocoded](dataset:customers_well_geocoded). In that case, the recipes [compute_locations_prepared](recipe:compute_locations_prepared) and [compute_customers_prepared](recipe:compute_customers_prepared) take these datasets as inputs.
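The "isolate rows where geocoding succeeded" step can be pictured with the plain-Python sketch below. It is only an illustration, not the configuration of the split recipes: it assumes that geocoding writes ***latitude*** and ***longitude*** columns and leaves them empty (`None` or `""`) when it fails, and the function names `is_well_geocoded` and `split_geocoded` are hypothetical.

```python
from typing import Any

Row = dict[str, Any]


def is_well_geocoded(row: Row) -> bool:
    # Assumption: a failed geocoding leaves latitude/longitude empty (None or "")
    lat, lon = row.get("latitude"), row.get("longitude")
    return lat not in (None, "") and lon not in (None, "")


def split_geocoded(rows: list[Row]) -> tuple[list[Row], list[Row]]:
    """Split rows into (well geocoded, failed), mimicking what a split recipe outputs."""
    well, failed = [], []
    for row in rows:
        (well if is_well_geocoded(row) else failed).append(row)
    return well, failed


rows = [
    {"address": "1 example street", "latitude": 48.85, "longitude": 2.35},
    {"address": "unknown place", "latitude": None, "longitude": None},
]
well, failed = split_geocoded(rows)
print(len(well), len(failed))  # 1 1
```

Only the first list plays the role of *locations_well_geocoded* / *customers_well_geocoded*; the failed rows are simply kept out of the preparation recipes.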

### Examples: 
*locations* branch example: 
![locations_addresses.png](9CIMIMz44EUO)

*customers* branch example: 
![customers_addresses.png](glEk1NKmgK91)

## Wrapup
The two datasets, [locations_dataset](dataset:locations_dataset) and [customers_dataset](dataset:customers_dataset), do not have to define locations in the same way.
Below is an example where [locations_dataset](dataset:locations_dataset) rows are located with addresses while [customers_dataset](dataset:customers_dataset) rows are located with latitudes and longitudes: 
![locations_addresses&customers_lat_lon.png](ZAhJeaAQ3RCq)
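Because each branch is switched on its own, the input choice can be summarized per branch. The Python sketch below is a hypothetical illustration of that decision, not the code of the *SWITCH_…_INPUT_DATASET* scenarios: it assumes that the location type is detected from exact column names only, and `select_prepared_input` is an invented name. It returns the dataset that the corresponding *compute_…_prepared* recipe reads.

```python
def select_prepared_input(branch: str, columns: list[str]) -> str:
    """Return the dataset name the *_prepared recipe of `branch` should read.

    Hypothetical sketch: detection is assumed to rely on column names only.
    """
    # Dataset names as they appear in the preprocessing zone
    raw_inputs = {"locations": "locations_dataset", "customers": "customers_dataset"}
    geocoded_inputs = {
        "locations": "locations_well_geocoded",
        "customers": "customers_well_geocoded",
    }
    if branch not in raw_inputs:
        raise ValueError(f"unknown branch: {branch!r}")
    cols = set(columns)
    if {"latitude", "longitude"} <= cols:
        # Rows are already located: no geocoding, the raw dataset is used directly
        return raw_inputs[branch]
    if "address" in cols:
        # Rows are located by address: only rows where geocoding succeeded are used
        return geocoded_inputs[branch]
    raise ValueError("rows must be located with latitude/longitude or address columns")


# Mixed case of the example above: addresses for locations, coordinates for customers
print(select_prepared_input("locations", ["id", "address"]))
print(select_prepared_input("customers", ["id", "latitude", "longitude"]))
```

With this input, the sketch prints `locations_well_geocoded` then `customers_dataset`, matching the mixed flow shown in the picture.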


# When geocoding is not needed
When geocoding is not needed (e.g. one of your input datasets has *latitude* and *longitude* columns), flow elements are put in the [Default](flow_zone:default) zone. [See the article about the Default zone](article:29).

