Input dataset

{{getTaskTypeHelper("inputColumn")}}
{{getTaskTypeHelper("outputColumn")}}
{{getTaskTypeHelper("groundTruthColumn")}}
{{getTaskTypeHelper("contextColumn")}}

Outputs

If empty, defaults to a random value. Limited to a-z, 0-9 and _. Specify the same id as another Model Evaluation to overwrite it.
If empty, defaults to the date and time of evaluation
Can contain variables in ${} notation, which are expanded when the recipe is run.
A Model Evaluation will include dynamically generated evaluation:date and evaluationDataset:dataset-name labels, which you may override here.
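As an illustration of the ${} variable notation mentioned above, the example below uses hypothetical variable names (`env`, `modelVersion`) and values; the actual variables available depend on your project:

```
Evaluation id (as configured): eval_${env}_${modelVersion}
Evaluation id (at run time):   eval_prod_3
```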

Evaluation metrics will be stored in the evaluation store and the metrics dataset. The highlighted metrics below are the recommended metrics associated with your task. A summary table on the right provides details on the required data for each metric. Learn more about metrics

Required columns by metric
Input Output Ground Truth Context
{{ contextWarning }}

LLM-as-a-judge metrics computation

Define the LLM connection to use for metrics computation (see the metrics help for when each is required).

Add custom evaluation metrics to score the model on.
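For illustration, a custom metric typically compares model outputs against ground-truth values row by row and returns a score. The sketch below computes a simple exact-match rate; the function name and signature are assumptions for this example, not the recipe's actual custom-metric interface:

```python
# Hypothetical custom metric: exact match between output and ground truth.
# The signature is illustrative; the interface expected by the recipe
# may differ.
def exact_match_score(outputs, ground_truths):
    """Return the fraction of rows where the output equals the ground
    truth, comparing case-insensitively and ignoring surrounding
    whitespace."""
    if not outputs:
        return 0.0
    matches = sum(
        1
        for out, ref in zip(outputs, ground_truths)
        if out.strip().lower() == ref.strip().lower()
    )
    return matches / len(outputs)

print(exact_match_score(["Paris", "london "], ["paris", "Berlin"]))  # 0.5
```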

Warning: You do not have permission to run arbitrary code. The recipe will fail if it includes custom metrics and is run by a user without this permission.

Python environment

Warning: Running this recipe with code-env {{ codeEnvWarning.envName }} may fail. {{ codeEnvWarning.reason }}

Container configuration

Execution configuration

Stop recipe execution if any metric produces an error. If disabled, metrics in error only produce empty values.

Completion LLM parameters

Metric configuration

If empty, defaults to bert-base-uncased
If empty, defaults to 13a
Sets the maximum number of parallel requests for RAGAS metrics. If empty, defaults to {{defaultRagasMaxWorkers}}