# Data Used in the Project 
This sample project uses two datasets containing measurements from a batch-wise clean-in-place (CIP) process. CIP systems are automated cleaning systems used in industries such as food, pharmaceuticals, and biotechnology. They clean the interior surfaces of manufacturing equipment, including pipes, vessels, tanks, bioreactors, fermenters, and associated fittings, without requiring disassembly of the process setup. 
The data comes from a real company's manufacturing process, so further details about the industry and process cannot be shared for confidentiality reasons. 

The following datasets are used:
## batch_data
[Wide format](https://knowledge.dataiku.com/latest/ml-analytics/time-series/concept-data-types-formats.html#wide-format) dataset downloaded from an SQL database using Dataiku and then converted to CSV to make it accessible for this sample project. For details on Dataiku's SQL connection capabilities, see [SQL connections](https://knowledge.dataiku.com/latest/data-sourcing/connections/concept-sql-connections.html). The dataset contains the start and end timestamps of each cleaning run (batch), which machine was cleaned, and whether it failed the cleanliness inspection.

| **Column Name** | **Type**       | **Description**                                                                                             |
|------------------|---------------|-------------------------------------------------------------------------------------------------------------|
| batch_id         | `String`      | Identifier for cleaning batch or process run.                                                              |
| equipment_id     | `String`      | Identifier for the machine that was processed.                                                             |
| start_time       | `String`      | Start time of the batch.                                                                                   |
| end_time         | `String`      | End time of the batch.                                                                                     |
| recipe           | `String`      | Type of cleaning performed, depending on the contents in the machine before cleaning, and other factors.   |
| failure          | `Integer`     | Indicator of whether the cleaning process passed inspection (`1 = fail`, `0 = pass`).                      |
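
As a rough illustration of how this table might be used outside Dataiku, the sketch below loads a CSV with the columns listed above into pandas, converts the `String` timestamps to datetimes, and derives a batch duration and a per-machine failure rate. The sample rows, the timestamp format, and the names `SAMPLE_BATCH_CSV`, `load_batch_data`, and `duration_min` are assumptions made for this example; only the column names come from the table.

```python
import io

import pandas as pd

# Hypothetical rows invented for illustration; the real values and the
# timestamp format in batch_data may differ.
SAMPLE_BATCH_CSV = """batch_id,equipment_id,start_time,end_time,recipe,failure
B001,EQ1,2024-01-01 08:00:00,2024-01-01 09:30:00,R1,0
B002,EQ2,2024-01-01 10:00:00,2024-01-01 11:00:00,R2,1
B003,EQ1,2024-01-02 08:00:00,2024-01-02 09:00:00,R1,0
"""


def load_batch_data(source):
    """Read batch_data from a path or file-like object and parse its timestamps."""
    df = pd.read_csv(
        source,
        dtype={"batch_id": str, "equipment_id": str, "recipe": str},
    )
    # start_time and end_time are stored as strings, so convert them explicitly.
    for col in ("start_time", "end_time"):
        df[col] = pd.to_datetime(df[col])
    # Derived column (not part of the dataset): batch duration in minutes.
    df["duration_min"] = (df["end_time"] - df["start_time"]).dt.total_seconds() / 60
    return df


batches = load_batch_data(io.StringIO(SAMPLE_BATCH_CSV))

# Share of failed batches per machine (failure: 1 = fail, 0 = pass).
failure_rate = batches.groupby("equipment_id")["failure"].mean()
print(batches[["batch_id", "duration_min"]])
print(failure_rate)
```

To run this against the actual file, pass its path to `load_batch_data` instead of the in-memory sample.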

## sensor_data
[Long format](https://knowledge.dataiku.com/latest/ml-analytics/time-series/concept-data-types-formats.html#long-format) dataset manually extracted from a historian database as a CSV and then imported into Dataiku. Dataiku cannot connect directly to historians; the data must first be extracted from the historian and then imported. The exception is OSIsoft PI, for which a [connection](https://www.dataiku.com/product/plugins/pi-system/) is available. The dataset contains continuous measurements from 9 types of sensors installed in each equipment unit (two machines in total for this process).

| **Column Name**   | **Type**    | **Description**                                                                                                                                            |
|--------------------|-------------|------------------------------------------------------------------------------------------------------------------------------------------------------------|
| equipment_id       | `String`   | Identifier for the machine that was processed.                                                                                                             |
| sensor_id          | `String`   | Identifier for the sensor whose measurement is contained in the respective row (there are 9 different sensors, measuring various parameters as indicated). |
| timestamp          | `String`   | Timestamp of the sensor reading.                                                                                                                           |
| sensor_value       | `Double`   | Measurement of the sensor.                                                                                                                                 |
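
Because each row holds a single reading from a single sensor, analysis often starts by reshaping the long format into one column per sensor. The sketch below shows one way to do that with pandas `pivot_table`. It is a minimal example under assumptions: the sample rows, the sensor names `temperature` and `flow`, and the helper name `long_to_wide` are hypothetical and are not taken from the actual dataset.

```python
import io

import pandas as pd

# Hypothetical readings invented for illustration; the real sensor_id values
# are not listed in this document.
SAMPLE_SENSOR_CSV = """equipment_id,sensor_id,timestamp,sensor_value
EQ1,temperature,2024-01-01 08:00:00,60.5
EQ1,flow,2024-01-01 08:00:00,12.0
EQ1,temperature,2024-01-01 08:01:00,61.0
EQ1,flow,2024-01-01 08:01:00,12.4
EQ2,temperature,2024-01-01 08:00:00,58.2
"""


def long_to_wide(sensor_df):
    """Pivot long-format readings into one row per machine and timestamp."""
    wide = sensor_df.pivot_table(
        index=["equipment_id", "timestamp"],
        columns="sensor_id",
        values="sensor_value",
        aggfunc="mean",  # averages duplicate readings, if any exist
    )
    wide.columns.name = None  # drop the "sensor_id" label on the column axis
    return wide.reset_index()


sensors = pd.read_csv(io.StringIO(SAMPLE_SENSOR_CSV), parse_dates=["timestamp"])
wide = long_to_wide(sensors)
print(wide)
```

A sensor that has no reading for a given machine and timestamp appears as `NaN` in the wide table, so a resampling or interpolation step may be needed when sensors report at different rates.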

See the [documentation](https://doc.dataiku.com/dss/latest/connecting/connections.html) for the full list of supported database connections. 


## Other learning resources

Depending on your objectives, there are many ways to learn about Dataiku.

- [Dataiku Academy](https://academy.dataiku.com/) for guided learning paths and certifications
- [Dataiku Community](https://community.dataiku.com/) for discussions and user programs
- [Reference Documentation](https://doc.dataiku.com/dss/latest/) for comprehensive specifications of Dataiku
- [Knowledge Base](https://knowledge.dataiku.com/latest/) for articles and tutorials on Dataiku features
- [Developer Guide](https://developer.dataiku.com/latest/) for API reference and other resources for coders