This case study walks through the solution using gas compressor data. It shows how to create an [interactive dashboard](dashboard:xCtFJN3) that leverages your maintenance data and documents to provide insights using the [Visual Machine Learning](https://knowledge.dataiku.com/latest/ml-analytics/custom-models/concept-custom-modeling.html) and [LLM Mesh](https://blog.dataiku.com/llm-mesh) features of Dataiku.

# User Story

Equipment reliability is crucial for manufacturers, ensuring responsible, safe, and consistent production. Maintenance is key to mitigating unplanned downtime and ensuring safe, continuous operation. However, timing maintenance to balance operational and cost objectives is a challenge. In many industries, maintenance is either reactive or driven by excessive time-based preventative routines. Both approaches erode asset performance and lower operational efficiency, costing billions each year.

Time-based preventative maintenance and reactive firefighting are two default strategies that no longer need to be the norm. Using AI and ML, manufacturers can refine their maintenance tactics by leveraging service history and equipment attributes. Techniques such as survival analysis transform static time-based maintenance schedules into tailored plans that reflect the true risk of mechanical failure for each asset.
 
With Dataiku’s Maintenance Performance and Planning solution, organizations quickly turn vast volumes of maintenance history into optimized maintenance plans. Thanks to standard performance metrics like MTBF, MTTR and task paretos, Reliability Engineers easily explore their fleet behaviors with descriptive analytics. ML algorithms provide remaining useful life based on maintenance history and a recommended maintenance schedule per asset, allowing Service Managers to adjust strategies. Whether for internal equipment maintenance or improving customer service, Dataiku’s Maintenance Performance and Planning solution enables organizations to promptly revisit their manufacturing strategies. 

# Data

- **[Maintenance Operations Dataset](dataset:maintenance_operations)** spans a year and covers around 40 distinct maintenance operations such as "Cleaning", "Lubrication", "Sensor Check", and more. Each log entry records the date, the type of maintenance, the duration, and whether the operation was planned or unplanned.
    - `equipment_id` (_string_): Equipment's unique identifier
    - `equipment_stop_time` (_date_): Start of maintenance
    - `equipment_restart_time` (_date_): End of maintenance
    - `is_planned` (_boolean_): Indicates planned maintenance
    - `maintenance_operation` (_string_): Maintenance category. Entries should be predefined types or part names, not raw text.

- **[Equipment Information Dataset](dataset:equipment_information)** provides static details on each of the 200+ compressors, including attributes like location, OS version, manufacturer, maximum pressure, flow rate, and the date of engine installation.
    - `equipment_id` (_string_): Equipment's unique ID
    - `XXX` (_string_ / _date_ / _boolean_ / _float_): Include any additional columns essential for maintenance analysis. Feel free to add multiple columns as necessary.

**[Context Documents](dataset:context_documents)** contains all context documents necessary to understand the determinants of your maintenance performance.
- `aging.txt`: A generic text document regarding issues arising with aging industrial equipment and their maintenance consequences.
- `equipment_observations.csv`: A CSV file consisting of the equipment information dataset enriched with operators' observations. This showcases the ability of a retrieval-augmented LLM to fetch information from datasets in file formats like CSV.
- `flow_rate.pdf`: A PDF file containing general information about how flow rate impacts maintenance performance for gas compressors.
- `manufacturer.pdf`: A competitive intelligence report on gas compressor manufacturers (these are fictitious companies).
- `os_version.pdf`: A report about the impact of OS versions on gas compressor maintenance, noting specific bugs, stability issues, etc.

# Insights

## Key Performance Indicators (KPIs)
![KPIs.png](qnWuTCyDFzkG)

In this instance, we see an average of 22 days between maintenance operations (both planned and unplanned), and an 8-hour resolution time, leading to 98% availability of our compressors. The number of maintenance operations typically ranges from 800 to 1100 throughout the year, but a substantial drop toward the end suggests potential missing data that warrants further investigation.
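As a rough illustration of how these three figures relate, the sketch below computes the mean uptime between maintenance operations, the mean resolution time, and availability from rows shaped like the maintenance operations dataset. The sample records are hypothetical, and the availability formula (mean uptime divided by mean uptime plus mean resolution time) is a common definition assumed here, not necessarily the exact one used by the solution.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical sample rows reusing the column names of the maintenance operations dataset.
operations = [
    {"equipment_id": "C-001", "equipment_stop_time": datetime(2024, 1, 1, 8),
     "equipment_restart_time": datetime(2024, 1, 1, 16), "is_planned": True},
    {"equipment_id": "C-001", "equipment_stop_time": datetime(2024, 1, 23, 16),
     "equipment_restart_time": datetime(2024, 1, 24, 0), "is_planned": False},
    {"equipment_id": "C-002", "equipment_stop_time": datetime(2024, 2, 1, 0),
     "equipment_restart_time": datetime(2024, 2, 1, 8), "is_planned": True},
    {"equipment_id": "C-002", "equipment_stop_time": datetime(2024, 2, 23, 8),
     "equipment_restart_time": datetime(2024, 2, 23, 16), "is_planned": True},
]


def hours_between(start, end):
    return (end - start).total_seconds() / 3600


def compute_kpis(rows):
    """Return (mean uptime in days, mean resolution time in hours, availability)."""
    by_equipment = defaultdict(list)
    for row in rows:
        by_equipment[row["equipment_id"]].append(row)

    uptimes_h, repairs_h = [], []
    for ops in by_equipment.values():
        ops.sort(key=lambda r: r["equipment_stop_time"])
        # Resolution time: from stop to restart of each operation.
        for op in ops:
            repairs_h.append(hours_between(op["equipment_stop_time"], op["equipment_restart_time"]))
        # Uptime: from one restart to the next stop of the same equipment.
        for prev, nxt in zip(ops, ops[1:]):
            uptimes_h.append(hours_between(prev["equipment_restart_time"], nxt["equipment_stop_time"]))

    mean_uptime_h = sum(uptimes_h) / len(uptimes_h)
    mean_repair_h = sum(repairs_h) / len(repairs_h)
    availability = mean_uptime_h / (mean_uptime_h + mean_repair_h)
    return mean_uptime_h / 24, mean_repair_h, availability


mtbf_days, mttr_hours, availability = compute_kpis(operations)
print(f"{mtbf_days:.1f} days, {mttr_hours:.1f} h, {availability:.1%}")
```

Under this definition, 22 days of uptime (528 hours) and an 8-hour resolution time give 528 / 536, or about 98.5%, which is consistent with the availability reported above.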

## Evolution of Targets
![targets.png](yjoaeL90zFXJ)

Our goal of a 75% **Planned Maintenance Ratio** was not met during the year; the ratio averaged around 55%. We will therefore follow the solution's maintenance recommendations to reach the 75% target. The upward trend in our **Planned Maintenance Interval** is encouraging because it did not reduce the Planned Maintenance Ratio, which indicates that the longer interval did not lead to more unplanned service calls. With the optimized survival model, the **Optimized Planned Maintenance Interval** reaches 41 days. This suggests that a custom maintenance schedule for each compressor can achieve a 75% Planned Maintenance Ratio while slightly decreasing the Planned Maintenance Frequency, showcasing the power of schedule optimization tailored to each machine's configuration and history.

**Note:** This is the optimized default frequency. It should not be confused with the Remaining Useful Life computed in the subsequent section, which is based on the same model but also considers each equipment's current uptime to adjust the prediction.
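To make the distinction concrete, here is a minimal sketch under two loud assumptions that the source does not confirm: the survival curve is modeled as a Weibull function with hypothetical `scale` and `shape` parameters, and the interval is chosen as the uptime at which the survival probability drops to a chosen `target` threshold. The default interval applies to an asset that has just been serviced, while the remaining useful life conditions the same curve on the asset's current uptime.

```python
import math


def weibull_survival(t, scale, shape):
    """Probability of running t days without unplanned maintenance (assumed Weibull model)."""
    return math.exp(-((t / scale) ** shape))


def default_interval(target, scale, shape):
    """Uptime t at which S(t) equals the target, for a freshly serviced asset."""
    return scale * (-math.log(target)) ** (1.0 / shape)


def remaining_useful_life(current_uptime, target, scale, shape):
    """Additional time t such that S(current_uptime + t) / S(current_uptime) equals the target."""
    total = scale * ((current_uptime / scale) ** shape - math.log(target)) ** (1.0 / shape)
    return total - current_uptime


# Hypothetical parameters for one compressor (shape > 1 models wear-out behavior).
SCALE, SHAPE, TARGET = 60.0, 2.0, 0.75

interval = default_interval(TARGET, SCALE, SHAPE)
rul = remaining_useful_life(20.0, TARGET, SCALE, SHAPE)
print(f"default interval: {interval:.1f} days, remaining useful life after 20 days: {rul:.1f} days")
```

With a shape above 1, an asset that has already run for some time gets a shorter remaining useful life than the default interval, which is why the two numbers differ even though they come from the same model.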

## Maintenance Schedule
![planning.png](cvOjKYBXLZLa)

The final section displays a calendar outlining optimized maintenance schedules for each compressor to meet the target ratio. The table on the right enumerates the remaining useful life and current uptime for each compressor.

## Understanding Failure Patterns
The second slide of the dashboard helps you understand the failure patterns of each piece of equipment.

![univariate.png](6q9dPERS8zsx)

The first chart presents the survival curve of each piece of equipment, which models the likelihood that the equipment operates without any unplanned maintenance as a function of its uptime since the last operation. Naturally, this likelihood decreases over time. The curve's shape can provide significant insight into the failure pattern. A curve that drops early indicates early-life failures, while a flat curve that plunges after a long period indicates wear-out failures.

This analysis is available for every type of maintenance operation, aiding in understanding the various maintenance patterns.
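For readers unfamiliar with survival curves, the sketch below estimates one with the Kaplan-Meier method. This is a standard non-parametric estimator used here as a stand-in; the source does not state which estimator the solution uses, and the sample durations are hypothetical. An interval that ends with unplanned maintenance counts as an event, and an interval that ends any other way (for example with planned maintenance) is treated as censored.

```python
def kaplan_meier(durations, observed):
    """Return [(time, survival_probability)] at each time with at least one event.

    durations: uptime in days until each interval ended.
    observed:  True if the interval ended with unplanned maintenance (event),
               False if it was censored (e.g. ended by planned maintenance).
    """
    event_times = sorted({d for d, e in zip(durations, observed) if e})
    survival = 1.0
    curve = []
    for t in event_times:
        # Intervals still running just before time t.
        at_risk = sum(1 for d in durations if d >= t)
        # Unplanned maintenance events occurring exactly at time t.
        events = sum(1 for d, e in zip(durations, observed) if e and d == t)
        survival *= 1.0 - events / at_risk
        curve.append((t, survival))
    return curve


# Hypothetical uptimes (days) for one compressor.
durations = [5, 10, 10, 20, 30]
observed = [True, True, False, True, False]

curve = kaplan_meier(durations, observed)
for t, s in curve:
    print(f"day {t}: {s:.2f}")
```

In this toy example the estimated probability of running without unplanned maintenance falls to roughly 0.80 at day 5, 0.60 at day 10, and 0.30 at day 20.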

## Identifying Risk Factors

The survival model can analyze each variable of the equipment information and discern how different values influence the risk of failure. This can be invaluable when aiming to enhance machine reliability.

![risk_multipliers.png](JJN7efmOs2y9)

In this example, the OS version has a significant impact on the hazard, with some versions (like v2.1 Vanguard and v3.0 Hydra) increasing the risk compared to the baseline (v1.0 Optis), while others (v1.1 Aegis) slightly reduce it.

Among the numerical factors, only the maximum flow rate shows a modest increase in risk, while max pressure and days since the last inspection do not appear to affect the risk.

This type of analysis can help prioritize updates or interventions, focusing on systems running riskier OS versions or operating under higher flow rates.

**Note:** The risk multipliers should not be interpreted as raw values but as relative risk compared to each other. They are always computed relative to an arbitrary baseline value.
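Risk multipliers of this kind are commonly obtained as hazard ratios from a proportional hazards model, where each multiplier is the exponential of a fitted coefficient. Whether the solution uses exactly this model is an assumption, and the coefficients below are hypothetical. The sketch only illustrates why the values are meaningful relative to a baseline: changing the baseline rescales every multiplier but leaves the ratios between them unchanged.

```python
import math

# Hypothetical log-hazard coefficients; the baseline level has coefficient 0.
coefficients = {
    "v1.0 Optis": 0.0,
    "v1.1 Aegis": -0.1,
    "v2.1 Vanguard": 0.4,
    "v3.0 Hydra": 0.3,
}


def risk_multipliers(coefs, baseline):
    """Hazard ratio of each level relative to the chosen baseline level."""
    ref = coefs[baseline]
    return {level: math.exp(beta - ref) for level, beta in coefs.items()}


vs_optis = risk_multipliers(coefficients, "v1.0 Optis")
vs_aegis = risk_multipliers(coefficients, "v1.1 Aegis")

for level in coefficients:
    print(f"{level}: {vs_optis[level]:.2f} vs Optis, {vs_aegis[level]:.2f} vs Aegis")
```

A multiplier above 1 means a higher risk than the baseline and a multiplier below 1 means a lower risk, whichever baseline is chosen.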

## Automatic Reporting using GenAI

The solution showcases the [LLM Mesh capabilities](https://blog.dataiku.com/llm-mesh) by producing an automatic report to help interpret the machine learning results and establish links with your documents.

![genAI.png](gjE3TXuznG5s)

In this example, the statistical model established that the location "Houston" was significant. The LLM capabilities of the solution detected in the CSV file that most of the equipment stationed in Houston mentioned humidity and/or corrosion issues in the observations column.

![humidity.png](5GCbFDsM8sBD)
<center><i>First lines of the CSV file</i></center>
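A finding like this can be sanity-checked directly against the CSV file. The sketch below counts, per location, the share of observations mentioning humidity or corrosion. The column names `location` and `observations`, the sample rows, and the second location are hypothetical; the real `equipment_observations.csv` may be structured differently. This is a plain keyword check, not a reproduction of what the LLM does.

```python
import csv
import io
from collections import defaultdict

# Hypothetical excerpt standing in for equipment_observations.csv.
SAMPLE_CSV = """equipment_id,location,observations
C-001,Houston,Corrosion visible on intake valve
C-002,Houston,High humidity in enclosure
C-003,Houston,No issue reported
C-004,Denver,Routine check completed
C-005,Denver,Minor humidity alarm
"""

KEYWORDS = ("humidity", "corrosion")


def keyword_share_by_location(csv_text, keywords=KEYWORDS):
    """Share of rows per location whose observation mentions any keyword."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row["location"]] += 1
        text = row["observations"].lower()
        if any(keyword in text for keyword in keywords):
            hits[row["location"]] += 1
    return {location: hits[location] / totals[location] for location in totals}


shares = keyword_share_by_location(SAMPLE_CSV)
for location, share in shares.items():
    print(f"{location}: {share:.0%} of observations mention humidity or corrosion")
```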

In this example, the LLM capabilities also detect other potential factors hindering maintenance performance, such as calibration issues with OS version 2.1 (mentioned in a PDF document) or transportation damage to specific equipment.

# Conclusion

This case study demonstrates how Dataiku's Visual Machine Learning and LLM Mesh capabilities can leverage maintenance data and related documents to optimize maintenance schedules and enhance operational reliability. By combining survival analysis with LLM-based document analysis, organizations can identify failure patterns, assess risk factors, and improve planned maintenance ratios. The insights provided here not only help prioritize updates and interventions but also lay the groundwork for long-term machine reliability improvements based on tailored maintenance strategies. This approach can ultimately increase equipment availability, reduce downtime, and ensure more efficient maintenance planning.