# Hands-On Tutorial: Geo Join[¶](https://knowledge.dataiku.com/latest/kb/data-prep/geospatial/geojoin-handson.html#hands-on-tutorial-geo-join "Permalink to this headline")

## Getting Started[¶](https://knowledge.dataiku.com/latest/kb/data-prep/geospatial/geojoin-handson.html#getting-started "Permalink to this headline")

In this tutorial, you’ll get hands-on practice with geospatial features by preparing and joining geospatial data to visualize credit card fraud activity on a map.

### Prerequisites[¶](https://knowledge.dataiku.com/latest/kb/data-prep/geospatial/geojoin-handson.html#prerequisites "Permalink to this headline")

You’ll need access to Dataiku version 11.1 or above (the free edition is enough). To get the free edition, visit the Get Started page, and click **Install Now** on the bottom left.

### Geospatial Features Used in This Tutorial[¶](https://knowledge.dataiku.com/latest/kb/data-prep/geospatial/geojoin-handson.html#geospatial-features-used-in-this-tutorial "Permalink to this headline")

When you’ve completed this tutorial, you’ll have a better understanding of the following features:

* **Create area around geopoint** processor

* **Geo Join** recipe

* **Density** map

* **geoMakeValid** formula

### What We’re Building[¶](https://knowledge.dataiku.com/latest/kb/data-prep/geospatial/geojoin-handson.html#what-were-building "Permalink to this headline")

We’ll start by building out an existing project by creating new geographic features. Then, we’ll further build out the pipeline by joining on these features. Then we’ll create a density map that can be published to a dashboard.

### How We’ll Build The Project[¶](https://knowledge.dataiku.com/latest/kb/data-prep/geospatial/geojoin-handson.html#how-well-build-the-project "Permalink to this headline")

Our goal is to build a density map of merchants at risk of credit card fraud. To do this, we’ll create geographic features using the Prepare recipe and then join datasets using the Geo Join recipe.

We’ll limit our analysis to a specific merchant subsector, “gas”; and only those transactions that are flagged as fraudulent.

By creating geographic features, we’ll be able to focus on visualizing where the fraudulent transactions are occurring. This will allow us to identify “merchants at risk”. In other words, we’ll be able to visualize the density of merchants at risk of being targeted by the user of the stolen credit card.

## Create and Explore the Project[¶](https://knowledge.dataiku.com/latest/kb/data-prep/geospatial/geojoin-handson.html#create-and-explore-the-project "Permalink to this headline")

In this section, we’ll create and explore the project. To create our project, we’ll open a tutorial that already has the input datasets we want to transform.

To open the tutorial:

* Sign in to your instance of Dataiku.

* From the Dataiku homepage, click on **+New Project > DSS Tutorials > General Topics > Geospatial Analysis: CC Fraud Data**.

Note

You can also download the starter project from this website and import it as a zip file.

Dataiku opens the Summary tab of the project, also known as the project homepage.

* Click **Go To Flow**.

* In the lower right corner, click **Flow Actions**, then select **Build all**.

* Click **Build**.

* Wait while Dataiku builds the Flow, then refresh your browser window.

## Explore the Flow[¶](https://knowledge.dataiku.com/latest/kb/data-prep/geospatial/geojoin-handson.html#explore-the-flow "Permalink to this headline")

You’ll notice that the project contains more than one flow zone. For this lesson, we’ll focus on the following datasets in the Geo Join flow zone:

* *gas\_sector\_unauth\_transactions*: This dataset has been prepared from *transactions\_joined* and it contains the geopoints for all unauthorized transactions belonging to the gas merchant subsector. We’ll further prepare this dataset to create areas around the geopoints.

* *gas\_sector\_merchants*: This dataset has been prepared from *merchants\_info* and it contains the geopoints for all merchants belonging to the gas merchant subsector.

## Create Geographic Areas Around the Unauthorized Transactions[¶](https://knowledge.dataiku.com/latest/kb/data-prep/geospatial/geojoin-handson.html#create-geographic-areas-around-the-unauthorized-transactions "Permalink to this headline")

To visualize the number of merchants at risk around each unauthorized transaction, we’ll first need to create a geographic area around each unauthorized transaction’s geopoint. For this, we’ll use the **Create area around geopoint** processor in the **Prepare recipe** to operate on the existing geopoints in the dataset.

Let’s define a 20-mile geographic area.

* Go to the **Flow**.

* In the Geo Join flow zone, click the *gas\_sector\_unauth\_transactions* dataset once to select it.

* Open the right-side panel to see the available **Actions**, then choose the **Prepare** recipe from Visual recipes.

* Keep the default output dataset name and click **Create Recipe**.

Now, we’ll add a step to the script.

* In the Script, click **+ Add a New Step**.

* In the processors library, search for `geo` and select **Create area around geopoint**.

* Select *transaction\_geopoint* as the Input column.

* Name the Output column `20\_miles`.

* Set the Shape to “Circle”.

* Define a Radius of `20`, and a Distance unit of “Miles”.

* **Save** your changes.

### Set Column Storage Type[¶](https://knowledge.dataiku.com/latest/kb/data-prep/geospatial/geojoin-handson.html#set-column-storage-type "Permalink to this headline")

Dataiku has inferred the storage type of our new geographic area column as “string”. Let’s set the storage type of the column to “Geometry” to match the values in the column.

* At the top of the *20\_miles* column, click the storage type, **string**, and select **Geometry / Geography** from the drop-down list.

* Likewise, set the storage type of *transactions\_geopoint* to **Geo Point**.

* **Save** and **Run** the recipe.

* Explore the output dataset.

### Optional: Ensure Valid Geometry[¶](https://knowledge.dataiku.com/latest/kb/data-prep/geospatial/geojoin-handson.html#optional-ensure-valid-geometry "Permalink to this headline")

We can apply a formula, “geoMakeValid”, to ensure the values in our geometry column are valid according to the Open Geospatial Consortium (OGC). To do this:

* In the Script, click **+ Add a New Step**.

* In the processors library, search for and select **Formula**.

* Name the Output column `20\_miles`.

* Open the editor panel and type `geoMakeValid(20\_miles)`.

* Apply the formula and **Run** the recipe.

Now that we have defined geographic areas around the geopoints of unauthorized transactions, we can use the Geo Join recipe to map merchants to the areas.

## Join Two Datasets Using Geospatial Data[¶](https://knowledge.dataiku.com/latest/kb/data-prep/geospatial/geojoin-handson.html#join-two-datasets-using-geospatial-data "Permalink to this headline")

In this section, we’ll create a dataset that includes all of the merchants located within each 20-mile geographic area. Later, we’ll create a chart to visualize the density of the merchants within these areas.

To create our dataset, we’ll use the Geo Join recipe.

* Return to the **Flow**.

* In the Geo Join flow zone, click the newly created *gas\_sector\_unauth\_transactions\_prepared* dataset to select it.

* Hold down the *Shift* key then click the *gas\_sector\_merchants* dataset to select it, too.

* Open the right-side panel and choose the **Geo join** recipe from Visual recipes.

Dataiku displays the **New geojoin recipe** window.

* Name the output dataset `merchants\_contained\_in\_20\_miles` instead of the default name.

* Click **Create Recipe**.

Note

If Dataiku displays “No available join columns”, return to your input datasets and ensure that the storage types of the geospatial columns (*transaction\_geopoint*, *merchant\_gas\_geopoint*, and *geometry*) are set to type Geo Point or Geometry / Geography.

Dataiku displays the Join step of the recipe. By default, Dataiku selects available geospatial columns to use for the join and specifies a join condition. Here, Dataiku has selected the *transaction\_geopoint* column from our left dataset as the join key. However, we’ll perform this join using the polygon we created.

Let’s configure the Join conditions.

* Click **transaction\_geopoint** to open the join conditions.

* Keep the default **Match when all conditions are satisfied**.

* Click **transaction\_geopoint** to open the configuration options.

* Set “Column from” to **20\_miles** and the join condition to **Contains**.

* Click **OK** to close the *Join conditions* window.

Let’s configure the columns in the output dataset, removing any that we don’t need.

* In the left panel, go to the **Selected columns** step.

* In *gas\_sector\_unauth\_transactions*, remove the *transaction\_geopoint* column.

* In *gas\_sector\_merchants*, remove the *merchant\_subsector\_description* column.

* **Save** and **Run** the recipe.

Let’s explore the output dataset.

* Open the output dataset to explore it.

* In the *20\_miles* column, right-click one of the values and choose **Preview**.

This feature lets you preview a polygon or point on a map.

## Create a Density Map[¶](https://knowledge.dataiku.com/latest/kb/data-prep/geospatial/geojoin-handson.html#create-a-density-map "Permalink to this headline")

In this section, we’ll build a type of chart known as a density map. The map takes as an input a column containing geo points and represents them in the form of a heat map. We can customize our map by selecting a color palette, configuring the intensity, and setting the radius of the circles for each point.

While a density map and scatter map may seem similar, we’ll soon discover how valuable a density map can be when plotting a large set of points on a map. Since the density map allows us to show where geopoints are concentrated on a map, we can quickly understand where, on a map, there are areas with a lot of points, or where there are areas with very few points.

Let’s create our map!

* Return to the **Flow**.

* Open the *merchants\_contained\_in\_20\_miles* dataset.

* Click on the **Charts** tab.

* Click the **Chart symbol** in the upper left to view the chart options and types.

* Choose the **Density map**.

* From the **Columns** panel on the left, drag and drop *merchant\_gas\_geopoint* to **Geo**.

* Drag and drop *purchase\_amount* to **Details** to set the heatmap’s intensity.

Let’s configure the radius of our circles, the color palette, and add a tooltip.

* Open the Color panel, then set the **Radius** to approximately “0.25”.

* Select **OrRd** from the Palette.

* Open the Tooltip panel, then drag and drop the *signature\_provided* column to set the tooltip.

You can zoom in on the map to see more details. You can also configure the sampling settings by clicking on **Sample**.

* Edit the title of the chart to `Density of Merchants at Risk`.

Here’s what the chart looks like when zoomed in.

Using our Density Map, we can now visualize the geographic relationship between merchant location and transactions flagged as fraudulent. More specifically, we can visualize the concentration of merchants at risk in each geographic area.
