• Model settings
  • Inference settings
  • Deployment settings
  • In $/1k tokens
    Set the maximum number of tokens that can be processed in a single request, including both input and output. Reducing the context length can help avoid out-of-memory errors.
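As an illustration, here is a minimal sketch of capping the context length with vLLM's offline Python API. The model name is an assumption; `max_model_len` is the vLLM option that bounds combined input and output tokens.

```python
from vllm import LLM, SamplingParams

# Cap the context window at 8192 tokens (prompt + generated output).
# A smaller max_model_len shrinks the KV cache and helps avoid
# out-of-memory errors on GPUs with limited memory.
llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct", max_model_len=8192)

params = SamplingParams(max_tokens=256)
outputs = llm.generate(["Explain KV-cache memory usage in one paragraph."], params)
print(outputs[0].outputs[0].text)
```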
    Set the maximum number of images allowed in a single request. Lowering this limit can help avoid out-of-memory errors with multimodal models.
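A sketch of the corresponding vLLM option, `limit_mm_per_prompt`, which bounds how many images a single request may attach (the model name is illustrative; the option is available in recent vLLM releases):

```python
from vllm import LLM

# Requests attaching more than two images are rejected up front,
# which keeps the vision encoder's memory usage bounded.
llm = LLM(
    model="llava-hf/llava-1.5-7b-hf",
    limit_mm_per_prompt={"image": 2},
)
```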
    Tool calling is not supported by all models. Refer to the vLLM documentation for details.
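For a model that does support tools, the deployed endpoint can be exercised through its OpenAI-compatible API. This is a hedged sketch: the endpoint URL, model name, and `get_weather` tool definition are all illustrative assumptions.

```python
from openai import OpenAI

# Point the client at the deployment's OpenAI-compatible endpoint.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

# A hypothetical tool definition, used only for illustration.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools,
)
# If the model chose to call the tool, the call appears here.
print(response.choices[0].message.tool_calls)
```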
    Overriding the default chat template is generally not recommended; however, some models require a custom chat template to support tool calling.
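As a sketch of what an override looks like: the toy Jinja template and model name below are assumptions, and passing `chat_template` to `LLM.chat` assumes a recent vLLM release. A real override should follow the conversation format the model was trained on.

```python
from vllm import LLM

llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")

# A deliberately minimal Jinja chat template, for illustration only.
custom_template = (
    "{% for message in messages %}"
    "{{ message['role'] }}: {{ message['content'] }}\n"
    "{% endfor %}"
    "assistant:"
)

outputs = llm.chat(
    [{"role": "user", "content": "Hello!"}],
    chat_template=custom_template,
)
print(outputs[0].outputs[0].text)
```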