interface ChatTogetherAIInput extends Omit<OpenAIChatInput, "openAIApiKey" | TogetherAIUnsupportedArgs>, BaseChatModelParams
Whether to include the raw OpenAI response in the output message's "additional_kwargs" field. Currently in experimental beta.
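As a sketch of constructing a model with this input (the model string and field values are illustrative; apiKey falls back to the TOGETHER_AI_API_KEY environment variable when omitted):

import { ChatTogetherAI } from "@langchain/community/chat_models/togetherai";

const chat = new ChatTogetherAI({
  model: "mistralai/Mixtral-8x7B-Instruct-v0.1", // illustrative model string
  apiKey: process.env.TOGETHER_AI_API_KEY,
  maxTokens: 256,
});
const res = await chat.invoke("Say hello in one sentence.");
console.log(res.content);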
Parameters for audio output. Required when audio output is requested with modalities: ["audio"]. A combined sketch appears after the modalities description below.
Whether to return log probabilities of the output tokens or not. If true, returns the log probabilities of each output token returned in the content of the message.
The maximum number of tokens that can be generated in the chat completion. The total length of input tokens and generated tokens is limited by the model's context length. Set to 0 to use the model's configured maximum number of generated tokens.
The maximum number of concurrent calls that can be made. Defaults to Infinity, which means no limit.
The maximum number of retries that can be made for a single call, with an exponential backoff between each attempt. Defaults to 6.
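A sketch combining the two limits above (values are illustrative; both fields come from BaseChatModelParams):

import { ChatTogetherAI } from "@langchain/community/chat_models/togetherai";

const model = new ChatTogetherAI({
  model: "mistralai/Mixtral-8x7B-Instruct-v0.1",
  maxConcurrency: 2, // at most two requests in flight at once
  maxRetries: 3,     // exponential backoff between attempts, instead of the default 6
});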
Output types that you would like the model to generate for this request. Most models are capable of generating text, which is the default: ["text"]. The gpt-4o-audio-preview model can also be used to generate audio. To request that this model generate both text and audio responses, you can use: ["text", "audio"].
Model name to use. Available options are: qwen-turbo, qwen-plus, qwen-max, or other compatible models.
Additional kwargs to pass to the model.
Model name to use. Available options are: qwen-turbo, qwen-plus, qwen-max, or other compatible models.
Alias for model.
Number of completions to generate for each prompt.
Custom handler to handle failed attempts. Takes the originally thrown error object as input, and should itself throw an error if the input error is not retryable.
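A sketch of such a handler (the 401 check is an illustrative heuristic, not part of the library):

import { ChatTogetherAI } from "@langchain/community/chat_models/togetherai";

const model = new ChatTogetherAI({
  model: "mistralai/Mixtral-8x7B-Instruct-v0.1",
  maxRetries: 4,
  onFailedAttempt: (error) => {
    // Rethrow errors that should not be retried; returning normally lets the retry continue.
    if (error.message.includes("401")) throw error;
    console.warn("Retrying after error:", error.message);
  },
});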
Messages to pass as a prefix to the prompt.
Used by OpenAI to cache responses for similar requests to optimize your cache hit rates.
Used by OpenAI to set cache retention time.
Options for reasoning models.
Note that some options, like reasoning summaries, are only available when using the Responses API. This option is ignored when not using a reasoning model.
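A sketch of this inherited option, assuming an OpenAI reasoning model via the Responses API; the field shape follows recent @langchain/openai releases and should be treated as an assumption:

import { ChatOpenAI } from "@langchain/openai";

const reasoner = new ChatOpenAI({
  model: "o4-mini",
  useResponsesApi: true, // reasoning summaries require the Responses API
  reasoning: { effort: "medium", summary: "auto" },
});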
Service tier to use for this request. Can be "auto", "default", "flex", or "priority". Specifies the service tier for prioritization and latency optimization.
Stop tokens to use for this call. If not provided, the default stop tokens for the model will be used.
List of stop words to use when generating.
Whether to stream the results or not. Defaults to false.
Whether or not to include token usage data in streamed chunks.
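A streaming sketch with usage reporting enabled (model string is illustrative):

import { ChatTogetherAI } from "@langchain/community/chat_models/togetherai";

const model = new ChatTogetherAI({
  model: "mistralai/Mixtral-8x7B-Instruct-v0.1",
  streaming: true,
  streamUsage: true, // token usage data arrives in the streamed chunks
});
for await (const chunk of await model.stream("Count to five.")) {
  process.stdout.write(String(chunk.content));
}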
Whether the model supports the strict argument when passing in tools. If undefined, the strict argument will not be passed to OpenAI.
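A sketch of strict tool calling; strict can also be passed per bind, as here (the tool definition is illustrative):

import { ChatTogetherAI } from "@langchain/community/chat_models/togetherai";
import { tool } from "@langchain/core/tools";
import { z } from "zod";

const getWeather = tool(async ({ city }) => `Sunny in ${city}`, {
  name: "get_weather",
  description: "Get the weather for a city",
  schema: z.object({ city: z.string() }),
});

const model = new ChatTogetherAI({
  model: "mistralai/Mixtral-8x7B-Instruct-v0.1",
}).bindTools([getWeather], { strict: true }); // strict requires model support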
Amount of randomness injected into the response. Ranges from 0 to 1 (0 is not included). Use a temperature closer to 0 for analytical / multiple-choice tasks, and closer to 1 for creative and generative tasks. Defaults to 0.95.
Timeout for this call in milliseconds.
The TogetherAI API key to use for requests.
Alias for apiKey.
An integer between 0 and 5 specifying the number of most likely tokens to return at each token position, each with an associated log probability. logprobs must be set to true if this parameter is used.
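A sketch combining logprobs and topLogprobs; where the raw log probabilities surface on the result is an assumption based on @langchain/openai's behavior:

import { ChatTogetherAI } from "@langchain/community/chat_models/togetherai";

const model = new ChatTogetherAI({
  model: "mistralai/Mixtral-8x7B-Instruct-v0.1",
  logprobs: true, // required whenever topLogprobs is set
  topLogprobs: 3, // up to 3 alternatives per token position
});
const res = await model.invoke("Hi");
console.log(res.response_metadata.logprobs); // assumption: surfaced here as in ChatOpenAI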
Total probability mass of tokens to consider at each step. Ranges from 0 to 1.0. Defaults to 0.8.
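A sketch of the sampling guidance above (values are illustrative):

import { ChatTogetherAI } from "@langchain/community/chat_models/togetherai";

// Closer to 0 for analytical / multiple-choice tasks:
const analytical = new ChatTogetherAI({
  model: "mistralai/Mixtral-8x7B-Instruct-v0.1",
  temperature: 0.1,
});
// Closer to 1 for creative tasks; topP narrows the sampled probability mass:
const creative = new ChatTogetherAI({
  model: "mistralai/Mixtral-8x7B-Instruct-v0.1",
  temperature: 0.9,
  topP: 0.8,
});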
Unique string identifier representing your end-user, which can help OpenAI to monitor and detect abuse.
Whether to print out response text.
The verbosity of the model's response.
Must be set to true in tenancies with Zero Data Retention. Setting to true will disable output storage in the Responses API, but this DOES NOT enable Zero Data Retention in your OpenAI organization or project. This must be configured directly with OpenAI. See: https://platform.openai.com/docs/guides/your-data and https://platform.openai.com/docs/api-reference/responses/create#responses-create-store
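A sketch, assuming the field is named zdrEnabled as in @langchain/openai and applies when using the Responses API:

import { ChatOpenAI } from "@langchain/openai";

const model = new ChatOpenAI({
  model: "gpt-4o-mini",
  useResponsesApi: true,
  zdrEnabled: true, // disables output storage; does NOT itself enable Zero Data Retention
});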