# Data source requirements

This solution relies on **purchase order and delivery history** to analyze supplier spend, monitor supplier performance, and score delay risk on open orders. One dataset is mandatory (the orders history); a second, optional dataset adds duty/tariff context.

## Purchase Orders & Delivery Dataset (mandatory)

This solution requires a primary dataset containing historical purchase orders and their delivery outcomes, referred to as the **orders dataset**.

**[Orders Dataset schema](dataset:input_supplier_orders)**

### Required columns

- `TRANSACTION_ID` (_string_): Unique identifier for the order line / transaction.
- `PURCHASE_ORDER_ID` (_string_): Purchase order identifier.
- `PART_NUMBER` (_string_): Part identifier.
- `SUPPLIER_NAME` (_string_): Supplier name.
- `PO_ORIGINAL_QTY` (_integer_): Ordered quantity.
- `PO_DELIVERED_QTY` (_integer_): Delivered quantity.
- `PO_OPEN_QTY` (_integer_): Remaining open quantity.
- `PO_CONFIRMED_QTY` (_integer_): Confirmed quantity (if applicable).
- `PO_UN_CONFIRMED_QTY` (_integer_): Unconfirmed quantity (if applicable).
- `PO_REQUEST_DELIVERY_DATE` (_date/timestamp_): Requested delivery date.
- `PO_ACTUAL_DELIVERY_DATE` (_date/timestamp_): Actual delivery date (may be null for open orders).
- `ORDER_STATUS` (_string_): Order status (used to identify open orders and for filtering).
- `IS_DELAY` (_boolean_): Whether the order is delayed (training target / KPI).
- `DELAY` (_integer_): Delay duration (e.g., days), used for analysis and KPIs.
- `ITEM_UNIT_PRICE` (_float_): Unit price.
- `TRANSACTION_TOTAL_COST` (_float_): Transaction cost (used for spend analysis).
- `SAFETY_STOCK_QTY` (_integer_): Safety stock quantity.
- `ON_HAND_STOCK` (_integer_): Current stock on hand.
- `COUNTRY_OF_ORIGIN` (_string_): Supplier country of origin (also used for tariff enrichment).
- `CITY` (_string_): Location metadata (optional).
- `POSTAL_CODE` (_string_): Location metadata (optional).
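
Before loading data, it can help to check that a file's header contains every column listed above. The following is a minimal sketch in plain Python, not part of the solution itself: the helper name `missing_order_columns` is hypothetical, and the check is case-sensitive. The list does not say whether `CITY` and `POSTAL_CODE` may be left out entirely, so this sketch checks for them as well.

```python
# Column names copied from the "Required columns" list above.
REQUIRED_ORDER_COLUMNS = frozenset({
    "TRANSACTION_ID", "PURCHASE_ORDER_ID", "PART_NUMBER", "SUPPLIER_NAME",
    "PO_ORIGINAL_QTY", "PO_DELIVERED_QTY", "PO_OPEN_QTY",
    "PO_CONFIRMED_QTY", "PO_UN_CONFIRMED_QTY",
    "PO_REQUEST_DELIVERY_DATE", "PO_ACTUAL_DELIVERY_DATE",
    "ORDER_STATUS", "IS_DELAY", "DELAY",
    "ITEM_UNIT_PRICE", "TRANSACTION_TOTAL_COST",
    "SAFETY_STOCK_QTY", "ON_HAND_STOCK",
    "COUNTRY_OF_ORIGIN", "CITY", "POSTAL_CODE",
})


def missing_order_columns(header):
    """Return the required column names absent from `header`, sorted.

    Hypothetical helper: the comparison is exact and case-sensitive,
    so `transaction_id` does not satisfy `TRANSACTION_ID`.
    """
    return sorted(REQUIRED_ORDER_COLUMNS - set(header))


# A header with one lower-cased name is reported as missing that column.
header = [name for name in REQUIRED_ORDER_COLUMNS if name != "TRANSACTION_ID"]
header.append("transaction_id")
print(missing_order_columns(header))  # ['TRANSACTION_ID']
```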

### Notes

- Each row should represent a purchase order line / transaction.
- Open orders are typically identified from a combination of `ORDER_STATUS`, `PO_OPEN_QTY`, and/or a missing `PO_ACTUAL_DELIVERY_DATE`.
- Date columns should be parseable as timestamps and consistent in timezone/format.
- **Column names must match the names listed above exactly.**
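
The open-order rule in the notes above is deliberately loose, so the sketch below shows only one possible reading of it. The function names, the `closed_statuses` parameter, and the status value `"OPEN"` are assumptions for illustration; `"DELIVERED"` is the only status value that appears in this document. Dates are assumed to be ISO-8601 strings.

```python
from datetime import datetime


def parse_date(value):
    """Parse an ISO-8601 date or timestamp string; return None if empty."""
    if value is None or str(value).strip() == "":
        return None
    return datetime.fromisoformat(str(value).strip())


def is_open_order(row, closed_statuses=frozenset({"DELIVERED"})):
    """One possible open-order rule (an assumption, not the solution's own).

    A row counts as open when its status is not a closed status AND it
    either has remaining open quantity or has no actual delivery date.
    """
    status = (row.get("ORDER_STATUS") or "").strip().upper()
    open_qty = int(row.get("PO_OPEN_QTY") or 0)
    delivered_on = parse_date(row.get("PO_ACTUAL_DELIVERY_DATE"))
    return status not in closed_statuses and (open_qty > 0 or delivered_on is None)


# Hypothetical rows: only the three columns used by the rule are shown.
open_row = {"ORDER_STATUS": "OPEN", "PO_OPEN_QTY": 40, "PO_ACTUAL_DELIVERY_DATE": ""}
closed_row = {"ORDER_STATUS": "DELIVERED", "PO_OPEN_QTY": 0,
              "PO_ACTUAL_DELIVERY_DATE": "2025-01-20"}
print(is_open_order(open_row), is_open_order(closed_row))  # True False
```

Adjust `closed_statuses` to whatever status values your own data uses.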

Example row (subset of columns):

| TRANSACTION_ID | PURCHASE_ORDER_ID | PART_NUMBER | SUPPLIER_NAME | PO_ORIGINAL_QTY | PO_REQUEST_DELIVERY_DATE | PO_ACTUAL_DELIVERY_DATE | IS_DELAY | DELAY | ORDER_STATUS | TRANSACTION_TOTAL_COST |
|---|---|---|---|---:|---|---|---:|---:|---|---:|
| TX_001 | PO_987 | RADIATOR_089 | SPEEDLINE | 120 | 2025-01-15 | 2025-01-20 | true | 5 | DELIVERED | 15420.0 |
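
In the row above, `DELAY` equals the number of days between the requested and actual delivery dates, and `IS_DELAY` is true because that number is positive. The sketch below checks a row against that reading. It is an assumption drawn from this single example, not a documented rule, and `delay_days` is a hypothetical helper.

```python
from datetime import date


def delay_days(requested, actual):
    """Days between requested and actual delivery (positive means late)."""
    return (actual - requested).days


# Values taken from the example row above.
row = {
    "PO_REQUEST_DELIVERY_DATE": "2025-01-15",
    "PO_ACTUAL_DELIVERY_DATE": "2025-01-20",
    "IS_DELAY": True,
    "DELAY": 5,
}

computed = delay_days(
    date.fromisoformat(row["PO_REQUEST_DELIVERY_DATE"]),
    date.fromisoformat(row["PO_ACTUAL_DELIVERY_DATE"]),
)

# Assumed convention: DELAY is in days and IS_DELAY means DELAY > 0.
print(computed == row["DELAY"], (computed > 0) == row["IS_DELAY"])  # True True
```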

---

## Duty / Tariff Dataset (optional)

This dataset adds duty/tariff context to the historical orders analysis and makes supplier comparisons easier to explain.

**[Duty/Tariff Dataset schema](dataset:Updated_duty_tariffs)**

### Required columns

- `PART_NUMBER` (_string_): Part identifier (must align with orders dataset).
- `SUPPLIER_NAME` (_string_): Supplier name (must align with orders dataset).
- `COUNTRY_OF_ORIGIN` (_string_): Country of origin (must align with orders dataset).
- `HTS_NUMBER` (_string_): Harmonized Tariff Schedule / classification code.
- `IMPORT_DUTY_RATE` (_float_): Duty rate applied to the part, as a decimal fraction or a percentage. Use one convention consistently across all rows.

### Notes

- This dataset is typically joined to the orders history using a combination of `PART_NUMBER`, `SUPPLIER_NAME`, and `COUNTRY_OF_ORIGIN`.

Example row:

| PART_NUMBER | SUPPLIER_NAME | COUNTRY_OF_ORIGIN | HTS_NUMBER | IMPORT_DUTY_RATE |
|---|---|---|---|---:|
| RADIATOR_089 | SPEEDLINE | CN | 8708.91 | 0.045 |
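
The join described in the notes above can be sketched as a left join on the three shared columns, so that orders without a matching tariff row are kept with empty enrichment fields. This is an illustrative sketch in plain Python, not the solution's actual join logic. The function names are hypothetical, the `COUNTRY_OF_ORIGIN` value on the first order is assumed to be `CN`, and the second order (`TX_002`, `GASKET_001`, `ACME`) is invented to show the unmatched case.

```python
def tariff_key(row):
    """Composite join key shared by the orders and duty/tariff datasets."""
    return (row["PART_NUMBER"], row["SUPPLIER_NAME"], row["COUNTRY_OF_ORIGIN"])


def enrich_with_tariffs(orders, tariffs):
    """Left-join tariff fields onto orders; unmatched orders get None."""
    lookup = {tariff_key(t): t for t in tariffs}
    enriched = []
    for order in orders:
        tariff = lookup.get(tariff_key(order))
        enriched.append({
            **order,
            "HTS_NUMBER": tariff["HTS_NUMBER"] if tariff else None,
            "IMPORT_DUTY_RATE": tariff["IMPORT_DUTY_RATE"] if tariff else None,
        })
    return enriched


# Tariff row taken from the example row above.
tariffs = [{"PART_NUMBER": "RADIATOR_089", "SUPPLIER_NAME": "SPEEDLINE",
            "COUNTRY_OF_ORIGIN": "CN", "HTS_NUMBER": "8708.91",
            "IMPORT_DUTY_RATE": 0.045}]

# Orders reduced to the join columns; the second row is hypothetical.
orders = [
    {"TRANSACTION_ID": "TX_001", "PART_NUMBER": "RADIATOR_089",
     "SUPPLIER_NAME": "SPEEDLINE", "COUNTRY_OF_ORIGIN": "CN"},
    {"TRANSACTION_ID": "TX_002", "PART_NUMBER": "GASKET_001",
     "SUPPLIER_NAME": "ACME", "COUNTRY_OF_ORIGIN": "DE"},
]

enriched = enrich_with_tariffs(orders, tariffs)
print([row["IMPORT_DUTY_RATE"] for row in enriched])  # [0.045, None]
```

Because the key is an exact match on all three columns, differences in casing or whitespace between the two datasets will silently produce unmatched rows.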