**Goal**: Process Mining untangles raw process data to reveal how a process actually behaves, identify inefficiencies, and inform improvement actions.

This walkthrough demonstrates the approach using real-world data from a **loan application process** within a Dutch financial institution. The dataset is sourced from the BPIC 2012 Challenge (see [Resources](article:9)).

We’ll walk through the typical steps to analyze a process using the **Dataiku Process Mining** solution:

- [Prepare process data](#prepare-process-data-1)
- [Instantiate the solution](#instantiate-the-solution-1)
- [Customize to your needs](#customize-to-your-needs-1)
- [Uncover business insights](#uncover-business-insights-1)
- [Analyze conformance](#analyze-conformance-1)

We assume that the solution is already installed and that the [Technical Requirements](article:10) are met.

# Prepare Process Data

Each row in the event log corresponds to a single activity performed by a resource (human or system) on a case. Event logs and case attributes must be merged into a single file or table for use with the Process Mining solution. See more in the [Data Model](article:12).

We describe typical data preparation steps, followed by specific steps for the loan process. We recommend implementing this in a dedicated Dataiku project, separate from the Process Mining solution instances. Dataiku’s Visual Recipes greatly facilitate these steps, and the Visual Flow helps non-technical users understand the data preparation process.

## Typical Steps

- **Merge logs and attributes**: Stack multiple event log sources and join them with case-level attributes.
- **Filter incomplete cases**: Remove executions missing initial or final activities, often caused by time-window extraction.
- **Merge activity start/end rows**: If `start` and `complete` timestamps are stored in separate rows of the event logs, merge them to retain both start and end times.
- **Cap overlapping activities**: Prevent overlaps by adjusting end timestamps based on the next activity's start.
- **Remove noisy activities**: Strip system-generated or irrelevant events.
- **Add a `sequence_id` column**: Used to disambiguate activities with identical timestamps, especially if timestamp granularity is coarse.
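
Three of the steps above (merging start/end rows, capping overlaps, and adding `sequence_id`) can be sketched in plain Python. This is a minimal illustration, not the solution's implementation: the input column names `transition` and `timestamp`, both function names, and the rule that a `COMPLETE` row without a `START` counts as instantaneous are all assumptions. In Dataiku, the same logic would typically be built with Visual Recipes.

```python
from collections import defaultdict
from datetime import datetime


def merge_start_complete(rows):
    """Pair each START row with the next COMPLETE row of the same case and step."""
    pending = defaultdict(list)  # (case_id, step) -> start timestamps not yet closed
    merged = []
    for row in sorted(rows, key=lambda r: r["timestamp"]):
        key = (row["case_id"], row["step"])
        if row["transition"] == "START":
            pending[key].append(row["timestamp"])
        elif row["transition"] == "COMPLETE":
            # Assumption: a COMPLETE row without a START is treated as instantaneous.
            start = pending[key].pop(0) if pending[key] else row["timestamp"]
            merged.append({
                "case_id": row["case_id"],
                "step": row["step"],
                "start_timestamp": start,
                "end_timestamp": row["timestamp"],
            })
    return merged


def cap_overlaps_and_number(events):
    """Cap each end at the next start within the case, then add sequence_id."""
    by_case = defaultdict(list)
    for event in events:
        by_case[event["case_id"]].append(event)
    result = []
    for case_events in by_case.values():
        # sorted() is stable, so events with equal start times keep their input order.
        ordered = sorted(case_events, key=lambda e: e["start_timestamp"])
        for position, event in enumerate(ordered):
            capped = dict(event)  # copy so the input rows are left untouched
            if position + 1 < len(ordered):
                next_start = ordered[position + 1]["start_timestamp"]
                capped["end_timestamp"] = min(event["end_timestamp"], next_start)
            capped["sequence_id"] = position + 1
            result.append(capped)
    return result


# Toy input: step B starts before step A completes, so A's end gets capped.
raw_rows = [
    {"case_id": "c1", "step": "A", "transition": "START", "timestamp": datetime(2012, 1, 1, 9, 0)},
    {"case_id": "c1", "step": "B", "transition": "START", "timestamp": datetime(2012, 1, 1, 9, 20)},
    {"case_id": "c1", "step": "A", "transition": "COMPLETE", "timestamp": datetime(2012, 1, 1, 9, 30)},
    {"case_id": "c1", "step": "B", "transition": "COMPLETE", "timestamp": datetime(2012, 1, 1, 9, 50)},
]
prepared = cap_overlaps_and_number(merge_start_complete(raw_rows))
```

In the toy input, activity `A` originally ends at 09:30 but is capped at 09:20, when `B` starts.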

## Loan Process: Specific Preparation

- Convert the original XES file to CSV using Python, so the data can be used in Dataiku.
- Translate activity names from Dutch to English.
- Merge start and end rows of the event log, corresponding to values of `lifecycle:transition` being `START` and `COMPLETE`.
- Remove rows where `lifecycle:transition` equals `SCHEDULE`, then drop the `lifecycle:transition` column.
- Exclude activities: `A_SUBMITTED`, `A_PARTLYSUBMITTED`, `A_REGISTERED`, `A_ACTIVATED`.
- Remove rework events to simplify analysis.
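
As an illustration of the first steps, the sketch below flattens an XES file into CSV rows using only the Python standard library, dropping `SCHEDULE` rows and the excluded activities along the way. It relies on the standard XES attribute keys (`concept:name`, `lifecycle:transition`, `time:timestamp`, `org:resource`). The function names and output column names are our own choices, and the script actually used for the walkthrough may differ. Translation and rework removal are not covered here, and the `transition` column is kept so that start and end rows can still be merged afterwards.

```python
import csv
import io
import xml.etree.ElementTree as ET

EXCLUDED_STEPS = {"A_SUBMITTED", "A_PARTLYSUBMITTED", "A_REGISTERED", "A_ACTIVATED"}
COLUMNS = ["case_id", "step", "transition", "timestamp", "resource"]


def local_name(tag):
    """Strip the XML namespace, e.g. '{http://...}trace' -> 'trace'."""
    return tag.rsplit("}", 1)[-1]


def xes_attributes(element):
    """Return the key/value pairs stored as direct attribute children."""
    return {child.get("key"): child.get("value")
            for child in element if child.get("key") is not None}


def xes_to_rows(source):
    """Flatten an XES log (path or file object) into one dict per kept event."""
    rows = []
    for trace in ET.parse(source).getroot():
        if local_name(trace.tag) != "trace":
            continue
        case_id = xes_attributes(trace).get("concept:name")
        for event in trace:
            if local_name(event.tag) != "event":
                continue
            attrs = xes_attributes(event)
            step = attrs.get("concept:name")
            transition = (attrs.get("lifecycle:transition") or "").upper()
            # Drop SCHEDULE rows and the excluded activities.
            if transition == "SCHEDULE" or step in EXCLUDED_STEPS:
                continue
            rows.append({
                "case_id": case_id,
                "step": step,
                "transition": transition,
                "timestamp": attrs.get("time:timestamp"),
                "resource": attrs.get("org:resource"),
            })
    return rows


def write_csv(rows, out_file):
    """Write the flattened rows to an open text file as CSV."""
    writer = csv.DictWriter(out_file, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(rows)
```

A typical call would be `write_csv(xes_to_rows("log.xes"), open("log.csv", "w", newline=""))`, where both file names are placeholders.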

## Resulting Dataset

| Attribute Type  | Columns                                                               |
| --------------- | --------------------------------------------------------------------- |
| Case-level      | `case_id`, `amount_requested`                                         |
| Execution-level | `execution_touch_time`, `execution_touches`                           |
| Event-level     | `step`, `start_timestamp`, `end_timestamp`, `resource`, `sequence_id` |

![Process Data.png](S5b0RU5LNmoM)

# Instantiate the Solution

The [Dataiku Application](article:7) acts as the configuration UI for launching a new instance of the solution. We use the built-in `workflow.csv` (containing the loan process data) under _Input Dataset - Option 1_.

In the configuration step:

- **Case ID** → `case_id`
- **Activity** → `step`
- **Timestamp** → `start_timestamp`
- **End Timestamp** → `end_timestamp`
- **Sorting Column** → `sequence_id`
- **Case Attributes** → `amount_requested`, `execution_touches`
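
Before building the instance, it can help to check that every mapped column exists in the input file. The sketch below is only a sanity check under our own assumptions: `COLUMN_MAPPING` is a hypothetical dictionary mirroring the list above, not the application's configuration format, and `missing_columns` is an illustrative helper.

```python
import csv
import io

# Hypothetical mirror of the configuration above (not the application's own format).
COLUMN_MAPPING = {
    "Case ID": ["case_id"],
    "Activity": ["step"],
    "Timestamp": ["start_timestamp"],
    "End Timestamp": ["end_timestamp"],
    "Sorting Column": ["sequence_id"],
    "Case Attributes": ["amount_requested", "execution_touches"],
}


def read_header(csv_file):
    """Return the column names found on the first line of an open CSV file."""
    return next(csv.reader(csv_file))


def missing_columns(header, mapping=COLUMN_MAPPING):
    """List mapped columns that are absent from the file header."""
    expected = [column for columns in mapping.values() for column in columns]
    return [column for column in expected if column not in header]


# Toy header with sequence_id left out on purpose.
toy_file = io.StringIO(
    "case_id,amount_requested,execution_touches,step,start_timestamp,end_timestamp,resource\n"
)
missing = missing_columns(read_header(toy_file))
```

Here `missing` contains only `sequence_id`, the column deliberately left out of the toy header.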

Build the instance and open the [Data Quality dashboard](article:25) to validate data integrity.

Then, navigate to the **Process Mining dashboard** to view the **top 5 process variants**, which in this case represent ~75% of all executions. If the Process Graph is slow to load, create a representative sample of cases, together with all of their events, and use it as input to the solution. Insights found by exploring the Process Graph interactively on the sample can then be confirmed by generating static images of the graph on the full dataset.
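
When sampling, draw whole cases rather than individual rows, so that no execution is truncated. The sketch below shows one simple way to do this with a seeded random draw. The function name `sample_cases` is our own, and a uniform draw is an assumption: depending on the process, stratifying by a case attribute may give a more representative sample.

```python
import random


def sample_cases(events, fraction, seed=0):
    """Keep all events of a random subset of cases (at least one case)."""
    case_ids = sorted({event["case_id"] for event in events})
    sample_size = max(1, round(len(case_ids) * fraction))
    # A dedicated, seeded generator makes the sample reproducible.
    kept = set(random.Random(seed).sample(case_ids, sample_size))
    return [event for event in events if event["case_id"] in kept]
```

For example, `sample_cases(events, 0.1)` keeps every event of roughly 10% of the cases.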

# Customize to Your Needs

## Dashboard

The Process Mining dashboard is built to be shared with subject matter experts. It will help them find insights that can then be passed to management or operations teams and lead to process improvement efforts. Tailor it to match your process KPIs and business context:

- **Customize the Flow**: Compute additional execution attributes (e.g. execution cost, SLA breach, profitability). Learn more about the existing [Flow](article:6).
- **Enhance dashboards**:
  - Add metrics in [Process Explorer](article:13) tied to execution attributes.
  - Add filters using case attributes (e.g. `amount_requested` in the loan process).
- **Update Help Pages**: Document your changes to the dashboard and to the Flow for business users.

> ⚠️ **Important**: Changes applicable to all Process Mining instances (e.g. default data connections) should be made in the source project (`SOL_PROCESS_MINING`) rather than in the instance projects.

## Process Graph

- The Process Graph is a Dash webapp whose code can be edited in [Process Mining Insight](web_app:MS4bzy9). 
- It leverages functions defined in the project's Libraries, under `lib/python/process_mining/`. This directory contains a `README.md` file with more information on the implementation of these functions and how to customize them.
- One example of such a customization is computing and displaying median (instead of average) times.
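
To see why the median can be preferable, consider transition times with a single long outlier. The sketch below is a standalone illustration with made-up numbers. `transition_times` is a hypothetical helper, not one of the functions in `lib/python/process_mining/`; consult the `README.md` there for the actual functions to change.

```python
from collections import defaultdict
from statistics import mean, median


def transition_times(transitions, aggregate=median):
    """Aggregate durations (in minutes) per (source, target) transition."""
    grouped = defaultdict(list)
    for source, target, minutes in transitions:
        grouped[(source, target)].append(minutes)
    return {edge: aggregate(values) for edge, values in grouped.items()}


# Made-up durations: four typical transitions and one 4-hour outlier.
observed = [("A", "B", 10), ("A", "B", 12), ("A", "B", 11), ("A", "B", 13), ("A", "B", 240)]

median_times = transition_times(observed)                  # {("A", "B"): 12}
average_times = transition_times(observed, aggregate=mean)  # {("A", "B"): 57.2}
```

The single outlier pulls the average to 57.2 minutes, while the median stays at 12 minutes, closer to the typical transition.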

# Uncover Business Insights

Business insights can emerge within minutes by exploring the **Process Explorer**. Use filters and toggle between **Frequency** and **Time Views** of the Top Variants graph.
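
To make the notion of a variant concrete: a variant is the ordered sequence of activities followed by a case, and its frequency is the number of cases sharing that sequence. The sketch below counts variants and computes the share of cases covered by the top N. It uses the column names of the prepared dataset; the function names are our own, and the solution may compute variants differently.

```python
from collections import Counter, defaultdict
from datetime import datetime


def variant_counts(events):
    """Count cases per variant, i.e. per ordered tuple of steps."""
    by_case = defaultdict(list)
    for event in events:
        by_case[event["case_id"]].append(event)
    variants = Counter()
    for case_events in by_case.values():
        # sequence_id breaks ties between events with identical timestamps.
        ordered = sorted(case_events, key=lambda e: (e["start_timestamp"], e["sequence_id"]))
        variants[tuple(e["step"] for e in ordered)] += 1
    return variants


def top_variant_coverage(variants, n=5):
    """Share of cases that follow one of the n most frequent variants."""
    total = sum(variants.values())
    return sum(count for _, count in variants.most_common(n)) / total


t0 = datetime(2012, 1, 1, 9, 0)
toy_events = [
    {"case_id": "c1", "step": "A", "start_timestamp": t0, "sequence_id": 1},
    {"case_id": "c1", "step": "B", "start_timestamp": t0, "sequence_id": 2},
    {"case_id": "c2", "step": "B", "start_timestamp": t0, "sequence_id": 2},
    {"case_id": "c2", "step": "A", "start_timestamp": t0, "sequence_id": 1},
    {"case_id": "c3", "step": "A", "start_timestamp": t0, "sequence_id": 1},
    {"case_id": "c3", "step": "C", "start_timestamp": t0, "sequence_id": 2},
]
toy_variants = variant_counts(toy_events)
```

In the toy data, two of the three cases follow the variant `A → B`, so `top_variant_coverage(toy_variants, n=1)` is 2/3.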

## Example: Declined Applications

Filter on `activity_end = A_DECLINED`. We observe:

![Declined Applications.png](DdsHA4QzBraJ)

- **Top manual activities**:
  - Fixing Incoming Lead – 2.92K cases
  - Filling in application – 1.08K cases
  - Assessing Fraud – 52 cases
- **Time insights**:
  - Fixing Incoming Lead: starts after 5h, lasts 18 min
  - Filling in info: starts after 5h, lasts 14 min
  - Assessing Fraud: starts after 1d 8h, lasts 25 min

This highlights bottlenecks for rejected applications and helps prioritize process improvement efforts.
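
The same kind of summary can be reproduced outside the dashboard. The sketch below keeps the cases whose last activity matches a given step, which is our reading of the `activity_end` filter, and then reports per activity the number of distinct cases and the average duration. Function names are illustrative and the toy data is made up.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def cases_ending_with(events, final_step):
    """Return {case_id: ordered events} for cases whose last step is final_step."""
    by_case = defaultdict(list)
    for event in events:
        by_case[event["case_id"]].append(event)
    selected = {}
    for case_id, case_events in by_case.items():
        ordered = sorted(case_events, key=lambda e: (e["start_timestamp"], e["sequence_id"]))
        if ordered[-1]["step"] == final_step:
            selected[case_id] = ordered
    return selected


def step_summary(selected):
    """Per step: number of distinct cases and average duration."""
    case_ids = defaultdict(set)
    durations = defaultdict(list)
    for case_id, ordered in selected.items():
        for event in ordered:
            case_ids[event["step"]].add(case_id)
            durations[event["step"]].append(event["end_timestamp"] - event["start_timestamp"])
    return {
        step: {
            "cases": len(case_ids[step]),
            "avg_duration": sum(durations[step], timedelta()) / len(durations[step]),
        }
        for step in case_ids
    }


def _event(case_id, step, start_minute, length, seq):
    """Build a toy event starting start_minute minutes after 09:00."""
    start = datetime(2012, 1, 1, 9, 0) + timedelta(minutes=start_minute)
    return {"case_id": case_id, "step": step, "sequence_id": seq,
            "start_timestamp": start, "end_timestamp": start + timedelta(minutes=length)}


toy_events = [
    _event("c1", "Fixing Incoming Lead", 0, 10, 1), _event("c1", "A_DECLINED", 30, 0, 2),
    _event("c2", "Fixing Incoming Lead", 0, 20, 1), _event("c2", "A_DECLINED", 40, 0, 2),
    _event("c3", "Fixing Incoming Lead", 0, 60, 1), _event("c3", "A_ACCEPTED", 90, 0, 2),
]
declined_summary = step_summary(cases_ending_with(toy_events, "A_DECLINED"))
```

In the toy data, only `c1` and `c2` end with `A_DECLINED`, so the summary reports 2 cases and an average duration of 15 minutes for the first activity.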

## Slice by Case Attributes

Focus on high-value applications (top 50% by `amount_requested`):

![Declined Applications 2.png](wnOmk4D91j7b)

- Execution time **doubles**.
- The top 5 variants become more complex.
- New costly activity: **Assessing the application**, avg. 31 min, starts after 4d 12h.
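
One way to select the top 50% of cases by `amount_requested` is a median split, sketched below. Keeping cases at or above the median is an assumption; with tied amounts, the selection may cover slightly more than half of the cases. The function name is illustrative.

```python
from statistics import median


def high_value_case_ids(amount_by_case):
    """Return the ids of cases whose requested amount is at or above the median."""
    threshold = median(amount_by_case.values())
    return {case_id for case_id, amount in amount_by_case.items() if amount >= threshold}


# Made-up amounts for four cases.
toy_amounts = {"c1": 1000, "c2": 5000, "c3": 20000, "c4": 30000}
high_value = high_value_case_ids(toy_amounts)
```

With the toy amounts the median is 12,500, so `c3` and `c4` are selected.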

Learn more in [Process Explorer](article:13), and don’t miss deeper insights via [Case Explorer](article:15) and conformance scoring.

# Analyze Conformance

## Define a Reference Process

Work with business users to define the reference process model. If filters alone cannot cleanly isolate all reference variants, follow the guide on [Representing reference variants in the event log](article:34).

Use the [Run Conformance Checks](scenario:RUNCONFORMANCECHECKS) scenario to update conformance scores.

→ For loans: split into accepted and declined cases, create separate Process Mining instances, and define a reference model for each.

## Add Conformance KPIs and Charts

Add the following to the Process Mining dashboard:

- Metrics:
  - % of perfectly conforming cases
  - Average conformance score
- Charts:
  - Time series plot to track both metrics over time

The average score is always at least as high as the share of perfectly conforming cases, because partially conforming cases still score above 0. A historical view of both metrics helps monitor compliance trends and identify periods of improvement or decline.
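
The two metrics and their monthly time series can be sketched as follows. This assumes conformance scores between 0 and 1, with 1 meaning perfect conformance; the scale, the function names, and the toy values are assumptions rather than the solution's own definitions.

```python
from collections import defaultdict
from datetime import date


def conformance_kpis(scores):
    """Return (share of perfectly conforming cases, average conformance score)."""
    perfect_share = sum(1 for score in scores if score >= 1.0) / len(scores)
    average_score = sum(scores) / len(scores)
    return perfect_share, average_score


def monthly_kpis(cases):
    """Compute both KPIs per month from (end_date, score) pairs."""
    by_month = defaultdict(list)
    for end_date, score in cases:
        by_month[end_date.strftime("%Y-%m")].append(score)
    return {month: conformance_kpis(scores) for month, scores in sorted(by_month.items())}


# Made-up scores: two perfect cases, one partial match, one non-conforming case.
toy_cases = [
    (date(2012, 1, 5), 1.0), (date(2012, 1, 20), 0.5),
    (date(2012, 2, 3), 1.0), (date(2012, 2, 17), 0.0),
]
overall = conformance_kpis([score for _, score in toy_cases])
trend = monthly_kpis(toy_cases)
```

Over the four toy cases, half conform perfectly while the average score is 0.625: the partial match (0.5) raises the average above the share of perfectly conforming cases.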

# Advanced Analytics

- **Root Cause Analysis**:
  - Have business users select an anomalous activity or transition in the Process Graph and export related process data.
  - Identify case attribute combinations that are most likely to lead to this anomaly.
- **Predictive Analytics**:
  - Model total execution time
  - Understand performance drivers by reviewing feature importance
  - Predict time remaining on ongoing cases
  - Optimize case routing for faster outcomes
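
As a starting point for root cause analysis, one simple approach is to compare the anomaly rate of each combination of case attributes with the overall rate (the lift). The sketch below is a minimal illustration rather than the method used by the Dataiku team: the `anomalous` flag, the `channel` and `amount_band` attributes, and the function name are all hypothetical.

```python
from collections import Counter


def anomaly_lift(cases, attributes):
    """Return {attribute combination: anomaly rate / overall anomaly rate}."""
    overall_rate = sum(1 for case in cases if case["anomalous"]) / len(cases)
    if overall_rate == 0:
        return {}  # no anomalies at all, so no lift to report
    totals, hits = Counter(), Counter()
    for case in cases:
        combination = tuple(case[name] for name in attributes)
        totals[combination] += 1
        hits[combination] += 1 if case["anomalous"] else 0
    return {combo: (hits[combo] / totals[combo]) / overall_rate for combo in totals}


# Made-up cases with hypothetical attributes.
toy_cases = [
    {"channel": "web", "amount_band": "high", "anomalous": True},
    {"channel": "web", "amount_band": "high", "anomalous": True},
    {"channel": "web", "amount_band": "low", "anomalous": False},
    {"channel": "branch", "amount_band": "low", "anomalous": False},
]
lifts = anomaly_lift(toy_cases, ["channel", "amount_band"])
```

A lift above 1 means the combination is over-represented among anomalous cases. In the toy data, `("web", "high")` has a lift of 2.0. On real data, combinations with very few cases should be read with caution.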

→ Reach out to the Dataiku team for implementation support or demos.

# Automation

Keep dashboards up to date and enable continuous process monitoring: use [Scenarios](article:19) to automate data refresh, rebuilds, and conformance scoring.

- Use **Scenario Triggers** to automatically refresh the dashboard when new data arrives.
- Let operations managers observe the results of process optimization efforts.
- Set up **Scenario Reporters** to alert them when KPIs degrade, and help them proactively address any issues that arise in the process.
