# Process Graph

This repository contains code for the web application that powers the Process Graph in Dataiku Process Mining. It leverages Dash and Dash Bootstrap Components.

In this document, we provide an overview of how this repository is used, file descriptions, and examples of customizations.

## Usage

The repository is added to the SOL_PROCESS_MINING project's **Libraries** as the `process_mining/` directory under `lib/python/`. It provides Python classes which are imported into the project's **Process Mining Insight** Dash webapp.

## File Descriptions

- **assets/app.css**: Contains the CSS styles for the application.
- **data_management.py**: Functions for managing data, including reading and writing pickled data.
- **process_mining/**: Functions for processing and analyzing workflow data.
  - **filtering.py**: Filtering workflow data based on various criteria.
  - **graph_creation.py**: Creating graphs with Graphviz's [`Digraph`](https://graphviz.readthedocs.io/en/stable/api.html#graphviz.Digraph) class.
  - **process_mining.py**: Mining process data from workflow logs.
  - **variants.py**: Handling process variants.
- **webapp/**: Functions to use processed data in the Dash web application.
  - **attribute_selectors.py**: Creating attribute filters.
  - **dashboard_filtering.py**: Filtering event logs based on Dashboard Filters.
  - **explanation.py**: Explanations for different sections of the application.
  - **export_modal.py**: Modals for the Export Dataset functionality.
  - **filter_processing.py**: Processing filters applied to the workflow data.
  - **formatting.py**: Utility functions for formatting data.
  - **header.py**: Application header.
  - **layout.py**: Application layout.
  - **legend.py**: Legends for the process graphs.
  - **process_management.py**: Managing processes, including saving and clearing datasets.
  - **save_modal.py**: Modals for the Save Reference Process and Export Visualization functionalities.
  - **selected_elements.py**: Creating cards for selected elements in the process graph.
  - **selectors.py**
  - **sink.py**: Creating the SINK meta-activity in the graph.
  - **styles.py**: Style definitions for various components in the application.
  - **variants.py**: Creating variant filters and containers.

For more information, please refer to the individual files and their respective docstrings and comments.

## Example Customizations

### Computing Median instead of Average Times

To compute and show median times in the Time View of the Process Graph (instead of average times), head over to the `mine_process` function of [`process_mining.py`](process_mining/process_mining.py) and replace the functions that calculate averages with functions that compute medians.

When `mine_process` is called for a "time" view of transitions, it computes time values attached to transitions and to activities:

- Transitions -> Replace `"mean"` with `"median"` in the code used to aggregate `diff_time` values (time difference between an activity and the previous one):

```python
dfg = (
  workflow_df[["origin", "activity", "diff_time"]]
  .groupby(["origin", "activity"])
  .agg(weight=("diff_time", "mean"), frequency=("activity", "count"))
  .reset_index()
)
```
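As a rough illustration of the effect, the sketch below runs the same aggregation with `"median"` on a tiny, made-up `workflow_df`. The rows and the use of plain seconds for `diff_time` are assumptions for demonstration only; the real event log carries more columns and may use a different time type.

```python
import pandas as pd

# Hypothetical toy event log; diff_time is expressed in seconds here.
workflow_df = pd.DataFrame(
    {
        "origin": ["A", "A", "A", "B"],
        "activity": ["B", "B", "B", "C"],
        "diff_time": [10.0, 20.0, 90.0, 5.0],
    }
)

# Same aggregation as above, with "mean" swapped for "median".
dfg = (
    workflow_df[["origin", "activity", "diff_time"]]
    .groupby(["origin", "activity"])
    .agg(weight=("diff_time", "median"), frequency=("activity", "count"))
    .reset_index()
)
print(dfg)
```

For the A -> B transition in this toy data, the median (20 seconds) is far less sensitive to the single slow case than the mean (40 seconds) would be.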

- Activities; we distinguish between two cases:

  - End timestamps available -> Replace `.mean()` with `.median()` in the code that aggregates the `process_time` values (time spent in an activity):

  ```python
  activity_count = workflow_df.groupby('activity')['process_time'].mean().reset_index()
  ```

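  - End timestamps NOT available -> The average time spent in an activity is computed as the weighted average of the time between the start of this activity and the start of the next one, weighted by the transition frequency. To show a median instead, the weighted average in the code below has to be replaced with a weighted median. Note that `np.median` does not accept a `weights` argument, so it cannot simply be swapped in for `np.average` with the same call: either drop the weights (which yields an unweighted median of the per-transition times) or provide a custom weighted-median function.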

  ```python
  weighted_average = lambda x: np.average(
    x, weights=dfg.loc[x.index, "frequency"]
  )
  activity_count = (
    dfg.groupby("source")
    .agg(weighted_time=("weight", weighted_average))
    .reset_index()
  )
  ```
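One possible way to fill that gap is sketched below. The `weighted_median` helper is hypothetical (it is not part of this repository or of NumPy) and uses the "lower" weighted median convention: the smallest value whose cumulative weight reaches half of the total weight. The toy `dfg` and its values are made up for illustration, and the sketch assumes the `weight` column is sortable and `frequency` is numeric.

```python
import numpy as np
import pandas as pd


def weighted_median(values, weights):
    """Hypothetical helper: smallest value whose cumulative weight
    reaches at least half of the total weight."""
    values = np.asarray(values)
    weights = np.asarray(weights, dtype=float)
    order = np.argsort(values)
    values, weights = values[order], weights[order]
    cumulative = np.cumsum(weights)
    cutoff = 0.5 * cumulative[-1]
    # First position where the cumulative weight is >= the cutoff.
    return values[np.searchsorted(cumulative, cutoff)]


# Toy stand-in for the transitions frame (made-up numbers).
dfg = pd.DataFrame(
    {
        "source": ["A", "A", "A", "B"],
        "weight": [10.0, 20.0, 90.0, 5.0],
        "frequency": [1, 1, 5, 2],
    }
)

# Drop-in counterpart of the weighted_average lambda above.
weighted_med = lambda x: weighted_median(x, dfg.loc[x.index, "frequency"])
activity_count = (
    dfg.groupby("source")
    .agg(weighted_time=("weight", weighted_med))
    .reset_index()
)
print(activity_count)
```

Other weighted-median conventions exist (for example, interpolating between the two middle values), so the choice should be checked against how the times are meant to be read in the graph.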
