# Project Goal

The goal of this project is to extract points of interest (POI) from open data sources (OpenStreetMap and Foursquare) in order to build a meaningful geographical segmentation of Paris.

# How we do it

<p  class="text-center"> <a href="/projects/DKU_GEO_CLUS_PARIS/flow/" class="btn
btn-datasets-color btn-cta-big-mod"><i class="icon-dku-sample_project"
class="btn-cta-big-mod-icon" />FLOW</a> </p>

We use two data sources: OpenStreetMap (OSM) and Foursquare.

- We assume that the OSM tables **[Ways](/projects/DKU_GEO_CLUS_PARIS/datasets/ways/explore/)** and **[Nodes](/projects/DKU_GEO_CLUS_PARIS/datasets/nodes/explore/)** are already available. They contain the streets and buildings of Paris along with their locations.

- We retrieve **[Foursquare data](/projects/DKU_GEO_CLUS_PARIS/datasets/foursquare_iris/explore/)** from the Foursquare public API.

We use the **[IRIS dataset](https://www.data.gouv.fr/fr/datasets/contours-iris/)** from the French statistics institute (INSEE) as a grid over Paris. IRIS zones are small neighborhoods of roughly 1,800 to 5,000 inhabitants.

Every POI retrieved from OSM and Foursquare can be associated with a specific **[IRIS zone](/projects/DKU_GEO_CLUS_PARIS/datasets/CONTOURS_IRIS_PARIS/explore/)** of Paris.

We then compute **[features](/projects/DKU_GEO_CLUS_PARIS/datasets/heatmap_iris_osm_joined/explore/)** for each IRIS zone by aggregating the POI it contains by type. For example, each zone gets the number of food-related places found in OSM and the number found in Foursquare.

Finally, we build the segmentation with **[K-means](/projects/DKU_GEO_CLUS_PARIS/savedmodels/IdalnavA/versions/)** clustering.

# Explore the project
- Start with the flow to get an overview of how the project is structured.

<p  class="text-center"> <a href="/projects/DKU_GEO_CLUS_PARIS/flow/" class="btn
btn-datasets-color btn-cta-big-mod"><i class="icon-dku-sample_project"
class="btn-cta-big-mod-icon" />FLOW</a> </p>

**Tag: IRIS**
- The IRIS data is uploaded, then a prepare recipe keeps only the zones located in Paris, cleans them, and generates geopoints.
<p  class="text-center"> <a href="/projects/DKU_GEO_CLUS_PARIS/datasets/CONTOURS_IRIS_PARIS/explore/" class="btn
btn-datasets-color btn-cta-big-mod"><i class="icon-dku-sample_project"
class="btn-cta-big-mod-icon" />Paris IRIS data</a> </p>
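
The Paris filter and the geopoint generation can be sketched in plain Python. This is a minimal illustration, not the prepare recipe itself. It assumes that each IRIS row exposes the commune code as `INSEE_COM`, the zone code as `CODE_IRIS`, and the outer boundary as a list of (longitude, latitude) pairs under a hypothetical `ring` key. It also assumes that the zone's geopoint is the polygon centroid. Paris is the only commune in département 75, so a prefix test on the commune code is enough to filter it.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # (longitude, latitude)


def is_paris(insee_com: str) -> bool:
    """Paris is the only commune of departement 75, so its codes start with '75'."""
    return len(insee_com) == 5 and insee_com.startswith("75")


def polygon_centroid(ring: List[Point]) -> Point:
    """Area-weighted centroid of a simple polygon (shoelace formula).

    Treats lon/lat as planar coordinates, a reasonable approximation
    at the scale of a single IRIS zone.
    """
    if ring[0] != ring[-1]:
        ring = ring + [ring[0]]  # close the ring
    area2 = 0.0  # twice the signed area
    cx = cy = 0.0
    for (x0, y0), (x1, y1) in zip(ring, ring[1:]):
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if area2 == 0:
        raise ValueError("degenerate polygon")
    # Centroid = sum / (6 * area) = sum / (3 * area2)
    return cx / (3 * area2), cy / (3 * area2)


def paris_geopoints(rows: List[Dict]) -> List[Dict]:
    """Keep Paris zones only and attach a centroid geopoint to each."""
    out = []
    for row in rows:
        if is_paris(row["INSEE_COM"]):
            lon, lat = polygon_centroid(row["ring"])
            out.append({"CODE_IRIS": row["CODE_IRIS"], "lon": lon, "lat": lat})
    return out
```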

## Foursquare part of the Flow
**Tag: foursquare_data**

- You can see how we extract Foursquare data through its public API. At the top of the code, set your Foursquare client ID and client secret, along with the directory where the data should be stored.

<p  class="text-center"> <a href="/projects/DKU_GEO_CLUS_PARIS/recipes/compute_foursquare_iris/" class="btn
btn-datasets-color btn-cta-big-mod"><i class="icon-dku-sample_project"
class="btn-cta-big-mod-icon" />Get Foursquare data</a> </p>
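
The following rough sketch shows one way such an extraction could work. It builds a request to the Foursquare v2 `venues/search` endpoint around a single point, such as an IRIS centroid, and stores the raw JSON response. Several details are assumptions: the endpoint choice, the radius, the version date, and the one-file-per-zone layout. The credentials and the output folder are placeholders for the values set at the top of the real recipe. Foursquare's newer Places API uses different endpoints and authentication, so check the current documentation before reusing this.

```python
import json
import os
from urllib.parse import urlencode
from urllib.request import urlopen

# Placeholders (assumptions): replace with your own credentials and folder.
CLIENT_ID = "YOUR_CLIENT_ID"
CLIENT_SECRET = "YOUR_CLIENT_SECRET"
OUTPUT_DIR = "/path/to/foursquare_data"

SEARCH_URL = "https://api.foursquare.com/v2/venues/search"


def build_search_url(lat: float, lon: float, radius_m: int = 250,
                     limit: int = 50, version: str = "20180323") -> str:
    """Build a venue-search request around one point (e.g. an IRIS centroid)."""
    params = {
        "client_id": CLIENT_ID,
        "client_secret": CLIENT_SECRET,
        "v": version,  # API version date, YYYYMMDD
        "ll": f"{lat},{lon}",
        "radius": radius_m,
        "limit": limit,
        "intent": "browse",  # all venues within the radius
    }
    return f"{SEARCH_URL}?{urlencode(params)}"


def fetch_and_save(iris_code: str, lat: float, lon: float,
                   output_dir: str = OUTPUT_DIR) -> dict:
    """Call the API once and store the raw JSON as <output_dir>/<iris_code>.json."""
    with urlopen(build_search_url(lat, lon)) as resp:
        payload = json.load(resp)
    os.makedirs(output_dir, exist_ok=True)
    with open(os.path.join(output_dir, f"{iris_code}.json"), "w", encoding="utf-8") as f:
        json.dump(payload, f)
    return payload
```

Querying once per zone centroid keeps each request small. The radius has to be large enough to cover the zone, and overlapping circles can return the same venue more than once.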

- You can [look at the raw data](/projects/DKU_GEO_CLUS_PARIS/datasets/foursquare_iris/explore/) or review the steps we use to clean it in a prepare recipe:

<p  class="text-center"> <a href="/projects/DKU_GEO_CLUS_PARIS/recipes/compute_foursquare_iris_prepared/" class="btn
btn-datasets-color btn-cta-big-mod"><i class="icon-dku-sample_project"
class="btn-cta-big-mod-icon" />Cleaning</a> </p>
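
The actual cleaning steps live in the prepare recipe. As an illustration only, a typical first pass does four things:

- flattens each venue;
- keeps a single category;
- drops rows without coordinates;
- removes duplicates, since searches for neighboring zones can return the same venue.

The field names used here are `id`, `location.lat`, `location.lng`, and `categories[].primary`. They follow the v2 venue format as we understand it and should be treated as assumptions.

```python
from typing import Dict, List, Optional


def flatten_venue(venue: dict) -> Optional[Dict]:
    """Turn one venue record into a flat row; return None if it lacks coordinates."""
    loc = venue.get("location", {})
    if "lat" not in loc or "lng" not in loc:
        return None
    cats = venue.get("categories", [])
    # Prefer the category flagged as primary, else the first one, else None.
    primary = next((c for c in cats if c.get("primary")), cats[0] if cats else None)
    return {
        "venue_id": venue["id"],
        "name": venue.get("name", "").strip(),
        "lat": float(loc["lat"]),
        "lon": float(loc["lng"]),
        "category": primary["name"] if primary else None,
    }


def clean_venues(venues: List[dict]) -> List[Dict]:
    """Flatten venues, skip unusable ones and keep the first copy of each venue id."""
    seen = set()
    rows = []
    for venue in venues:
        row = flatten_venue(venue)
        if row is None or row["venue_id"] in seen:
            continue
        seen.add(row["venue_id"])
        rows.append(row)
    return rows
```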

- We then aggregate the data per IRIS zone with a visual grouping recipe.

<p  class="text-center"> <a href="/projects/DKU_GEO_CLUS_PARIS/recipes/group_poi_foursquare/" class="btn
btn-datasets-color btn-cta-big-mod"><i class="icon-dku-sample_project"
class="btn-cta-big-mod-icon" />Aggregation: grouping recipe </a> </p>
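
The grouping step amounts to counting POI per (zone, category) pair and pivoting the counts into one row per zone. Here is a minimal standard-library sketch, assuming the input is a sequence of `(iris_code, category)` pairs:

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def count_by_zone(rows: Iterable[Tuple[str, str]]) -> Dict[str, Dict[str, int]]:
    """Group (iris_code, category) pairs and pivot to one feature dict per zone.

    Categories absent from a zone are filled with 0 so every zone has the same columns.
    """
    rows = list(rows)
    counts = Counter(rows)  # (zone, category) -> number of POI
    zones = sorted({zone for zone, _ in rows})
    categories = sorted({cat for _, cat in rows})
    return {
        zone: {cat: counts.get((zone, cat), 0) for cat in categories}
        for zone in zones
    }
```

Filling absent categories with 0 matters for the clustering step, because every zone needs the same set of columns.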

## OSM part of the Flow
**Tag: osm_data**

- You can see how we extract POI from the OSM data. This is a fairly complex SQL script that inspects the tags of the Ways and Nodes tables and assigns each element to one of several predefined categories.

<p  class="text-center"> <a href="/projects/DKU_GEO_CLUS_PARIS/recipes/compute_poi_osm/" class="btn
btn-datasets-color btn-cta-big-mod"><i class="icon-dku-sample_project"
class="btn-cta-big-mod-icon" />Extract POI from OSM</a> </p>
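
The SQL itself is not reproduced here. Its logic, which reads an element's tags and returns the first matching category, can be sketched in Python. The rule table below is hypothetical: the keys and values are real OSM tags, but the category names and the choice of tags are illustrative, not the project's.

```python
from typing import Dict, Optional

# Hypothetical rules: (OSM key, accepted values or None for "any value", category).
# Rules are checked in order, so more specific ones come first.
RULES = [
    ("amenity", {"restaurant", "cafe", "fast_food", "bar", "pub"}, "food_and_drink"),
    ("amenity", {"nightclub"}, "nightlife"),
    ("tourism", {"museum", "attraction", "gallery"}, "culture_tourism"),
    ("railway", {"station"}, "transport"),
    ("amenity", {"parking", "bicycle_parking"}, "parking"),
    ("leisure", {"park", "garden"}, "green_space"),
    ("shop", None, "shop"),
    ("building", {"residential", "apartments", "house"}, "residential"),
]


def categorize(tags: Dict[str, str]) -> Optional[str]:
    """Return the category of the first rule matching the element's tags, else None."""
    for key, values, category in RULES:
        value = tags.get(key)
        if value is None:
            continue
        if values is None or value in values:
            return category
    return None  # not a POI we keep
```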

- Then, we perform a spatial join (Geo-Join) between the extracted POI and the IRIS zones, and apply the same aggregation as for the Foursquare data. All of this is done in a single SQL query using **[PostGIS](http://www.postgis.net/)**, a spatial extension for PostgreSQL that adds geographic types and functions.

<p  class="text-center"> <a href="/projects/DKU_GEO_CLUS_PARIS/recipes/compute_heatmap_iris_osm/" class="btn
btn-datasets-color btn-cta-big-mod"><i class="icon-dku-sample_project"
class="btn-cta-big-mod-icon" />Geo Join and Aggregation</a> </p>
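
PostGIS performs this kind of join with spatial predicates such as `ST_Contains` or `ST_Within`, backed by a spatial index. To show what the predicate checks, here is a minimal ray-casting point-in-polygon test in Python. It assumes simple polygons given as (longitude, latitude) rings, ignores holes and points exactly on a boundary, and scans zones linearly instead of using an index.

```python
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]  # (lon, lat)


def point_in_ring(pt: Point, ring: Sequence[Point]) -> bool:
    """Ray casting: count how many polygon edges a rightward ray from pt crosses."""
    x, y = pt
    inside = False
    n = len(ring)
    for i in range(n):
        x0, y0 = ring[i]
        x1, y1 = ring[(i + 1) % n]
        # The edge straddles the ray's height (this also rules out horizontal edges).
        if (y0 > y) != (y1 > y):
            x_cross = x0 + (y - y0) * (x1 - x0) / (y1 - y0)
            if x < x_cross:  # crossing lies to the right of pt
                inside = not inside
    return inside


def assign_zone(pt: Point, zones: List[Tuple[str, Sequence[Point]]]) -> Optional[str]:
    """Return the code of the first zone whose polygon contains pt, or None."""
    for code, ring in zones:
        if point_in_ring(pt, ring):
            return code
    return None
```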

## Segmentation
**Tag: clustering data and model**

- We start with a visual join between our Foursquare and OSM data.

<p  class="text-center"> <a href="/projects/DKU_GEO_CLUS_PARIS/recipes/join_heatmap_iris_osm/" class="btn
btn-datasets-color btn-cta-big-mod"><i class="icon-dku-sample_project"
class="btn-cta-big-mod-icon" />Visual join</a> </p>
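
Conceptually, the visual join matches the two per-zone feature tables on the IRIS code. The sketch below assumes an outer join, missing counts set to 0, and `osm_`/`fsq_` column prefixes. The join type and column naming used in the actual recipe may differ.

```python
from typing import Dict

Features = Dict[str, Dict[str, int]]  # iris_code -> {feature_name: count}


def join_features(osm: Features, foursquare: Features) -> Features:
    """Outer-join two per-zone feature tables on the IRIS code.

    Columns are prefixed by source so identically named categories stay
    distinct, and zones missing from one source get 0 for its columns.
    """
    osm_cols = sorted({c for feats in osm.values() for c in feats})
    fsq_cols = sorted({c for feats in foursquare.values() for c in feats})
    joined = {}
    for zone in sorted(set(osm) | set(foursquare)):
        row = {}
        for col in osm_cols:
            row[f"osm_{col}"] = osm.get(zone, {}).get(col, 0)
        for col in fsq_cols:
            row[f"fsq_{col}"] = foursquare.get(zone, {}).get(col, 0)
        joined[zone] = row
    return joined
```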

- The data we will use for clustering is now complete:
<p  class="text-center"> <a href="/projects/DKU_GEO_CLUS_PARIS/datasets/heatmap_iris_osm_joined/explore/" class="btn
btn-datasets-color btn-cta-big-mod"><i class="icon-dku-sample_project"
class="btn-cta-big-mod-icon" />See the final dataset</a> </p>

- We then cluster these data. After several attempts, we settle on a K-means model with 5 clusters. The Numerical Heatmap lets us characterize each cluster and rename it accordingly.
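
As a reference point, K-means alternates two steps until the assignments stop changing. First it assigns each zone to its nearest centroid, then it moves each centroid to the mean of its zones. A minimal pure-Python version of this procedure (Lloyd's algorithm) is sketched below. The z-score scaling is our own assumption, added because raw POI counts can have very different ranges. The saved model's preprocessing and initialization may differ.

```python
import random
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def _sq_dist(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def standardize(rows: List[Vector]) -> List[List[float]]:
    """Scale each feature to zero mean and unit variance (constant features become 0)."""
    n = len(rows)
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    stds = [(sum((v - m) ** 2 for v in c) / n) ** 0.5 for c, m in zip(cols, means)]
    return [[(v - m) / s if s > 0 else 0.0 for v, m, s in zip(row, means, stds)]
            for row in rows]


def kmeans(rows: List[Vector], k: int, n_iter: int = 100,
           seed: int = 0) -> Tuple[List[int], List[List[float]]]:
    """Lloyd's algorithm with k distinct rows picked at random as initial centroids."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(rows, k)]
    labels = [-1] * len(rows)
    for _ in range(n_iter):
        # Assignment step: index of the nearest centroid for every row.
        new_labels = [min(range(k), key=lambda j: _sq_dist(p, centroids[j])) for p in rows]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(rows, labels) if lab == j]
            if members:  # keep the previous centroid if a cluster empties out
                centroids[j] = [sum(c) / len(members) for c in zip(*members)]
    return labels, centroids
```

Calling `labels, centroids = kmeans(standardize(feature_rows), k=5)` would mirror the 5-cluster setup. The result depends on the initial centroids, which is why K-means is usually run with several seeds.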

We obtain the following clusters:
1. Residential: mostly residential buildings
2. Residential with services: residential areas with a good number of shops
3. Parks or low density: very few points of interest on average, apart from parking lots and commuting facilities
4. Touristic areas & hubs: locations with unusually high activity (museums, train stations...)
5. Getting out: many restaurants and clubs

<p  class="text-center"> <a href="/projects/DKU_GEO_CLUS_PARIS/savedmodels/IdalnavA/versions/" class="btn
btn-datasets-color btn-cta-big-mod"><i class="icon-dku-sample_project"
class="btn-cta-big-mod-icon" />Model</a> </p>
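
The Numerical Heatmap compares each cluster's feature values with the rest of the data. Something similar in spirit can be computed by expressing each cluster's mean as a number of global standard deviations above or below the global mean. This is not necessarily the exact statistic the heatmap displays.

```python
from typing import Dict, List, Sequence


def cluster_profile(rows: List[Sequence[float]], labels: List[int],
                    feature_names: List[str]) -> Dict[int, Dict[str, float]]:
    """For each cluster: (cluster mean - global mean) / global std, per feature."""
    n = len(rows)
    cols = list(zip(*rows))
    g_mean = [sum(c) / n for c in cols]
    g_std = [(sum((v - m) ** 2 for v in c) / n) ** 0.5 for c, m in zip(cols, g_mean)]
    profile = {}
    for cluster in sorted(set(labels)):
        members = [r for r, lab in zip(rows, labels) if lab == cluster]
        means = [sum(c) / len(members) for c in zip(*members)]
        profile[cluster] = {
            name: (m - gm) / gs if gs > 0 else 0.0
            for name, m, gm, gs in zip(feature_names, means, g_mean, g_std)
        }
    return profile
```

A cluster whose restaurant and nightlife columns are strongly positive would be a natural candidate for a name like "Getting out".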

## Visualization

- After deploying the model in the flow and scoring the data, we can visualize the results on a map. If you know Paris a little, look at the map published on the dashboard to judge how relevant this clustering is.

<p  class="text-center"> <a href="/projects/DKU_GEO_CLUS_PARIS/dashboards/IPqiwYn_paris-clustering-maps/view/ZOXUuqw" class="btn
btn-datasets-color btn-cta-big-mod"><i class="icon-dku-sample_project"
class="btn-cta-big-mod-icon" />Dashboard</a> <br><br> We believe you'll find it to be pretty accurate!</p>