Parameter governing the randomness of responses: 0 for more predictable output, 0.5 to 1.0 for balanced creativity.
Number of highest-probability tokens to pick from when sampling.
Sample only from the smallest set of top tokens whose cumulative probability reaches p.
Maximum number of tokens to output per row (in English, 1 word ~ 1.37 tokens).
Penalty applied to the next token, proportional to how many times that token has already appeared in the response and prompt.
Penalty applied to repeated tokens, regardless of how many times the token has already appeared in the response and prompt.
Request that the model generate a valid JSON response; only some models support this.
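To make the interplay of these parameters concrete, here is a minimal sketch of a single sampling step. It is not the implementation behind these settings, just an illustration under common assumptions: the frequency penalty scales with a token's prior count while the presence penalty is a flat offset, temperature 0 means greedy argmax, and top-k filtering is applied before top-p. All names (`sample_next_token`, `logits`, `counts`) are hypothetical.

```python
import math
import random

def sample_next_token(logits, temperature=1.0, top_k=40, top_p=0.95,
                      counts=None, frequency_penalty=0.0, presence_penalty=0.0):
    """One sampling step. `logits` maps token -> raw score;
    `counts` maps token -> times it already appeared in prompt + response."""
    counts = counts or {}
    # Frequency penalty grows with the count; presence penalty is flat.
    adjusted = {
        t: s - counts.get(t, 0) * frequency_penalty
             - (presence_penalty if t in counts else 0.0)
        for t, s in logits.items()
    }
    # Temperature 0: fully predictable, always pick the best token.
    if temperature == 0:
        return max(adjusted, key=adjusted.get)
    # Higher temperature flattens the distribution (more randomness).
    weights = {t: math.exp(s / temperature) for t, s in adjusted.items()}
    z = sum(weights.values())
    probs = {t: w / z for t, w in weights.items()}
    # Top-k: keep only the k most probable tokens.
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    # Top-p: keep the smallest prefix whose probabilities sum to >= p.
    kept, cum = [], 0.0
    for t, p in ranked:
        kept.append((t, p))
        cum += p
        if cum >= top_p:
            break
    tokens, probs_kept = zip(*kept)
    return random.choices(tokens, weights=probs_kept)[0]
```

For example, `temperature=0` always returns the highest-scoring token, and raising `frequency_penalty` pushes the choice away from tokens that have already appeared.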