ChatLiteLLM

Bases: BaseChatModel

Chat model that uses the LiteLLM API.
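A minimal usage sketch. It assumes `langchain-community` and `litellm` are installed and that an `OPENAI_API_KEY` is exported; the model name is illustrative.

```python
# Minimal sketch: instantiate the model and send one message.
# Assumes `pip install langchain-community litellm` and OPENAI_API_KEY set.
from langchain_community.chat_models import ChatLiteLLM

llm = ChatLiteLLM(model="gpt-3.5-turbo", temperature=0.2)
reply = llm.invoke("Say hello in one short sentence.")
print(reply.content)  # reply is an AIMessage; .content holds the text
```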
Parameters:

model: Model name to use.
temperature: Run inference with this temperature. Must be in the closed interval [0.0, 2.0].
model_kwargs: Holds any model parameters valid for the API call that are not explicitly specified.
top_p: Decode using nucleus sampling: consider the smallest set of tokens whose cumulative probability is at least top_p. Must be in the closed interval [0.0, 1.0].
top_k: Decode using top-k sampling: consider only the top_k most probable tokens. Must be positive.
n: Number of chat completions to generate for each prompt. Note that the API may not return all n completions if duplicates are generated.
max_tokens: The maximum number of tokens to generate in the reply.
num_ctx: Context window size (e.g. for Ollama models).
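A constructor sketch exercising the parameters above. The keyword names are reconstructed from the descriptions; `num_ctx` in particular is an assumption borrowed from the Ollama convention, and the model string is illustrative.

```python
from langchain_community.chat_models import ChatLiteLLM

llm = ChatLiteLLM(
    model="ollama/llama3",  # LiteLLM routes on the "provider/model" prefix
    temperature=0.7,        # must lie in [0.0, 2.0]
    top_p=0.9,              # nucleus-sampling cutoff, in [0.0, 1.0]
    top_k=40,               # keep only the 40 most probable tokens
    n=1,                    # chat completions to generate per prompt
    max_tokens=256,         # cap on tokens generated in the reply
    num_ctx=8192,           # context window size (assumed field name)
)
```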
Methods:

completion_with_retry: Use tenacity to retry the completion call.
acompletion_with_retry: Use tenacity to retry the async completion call.
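Both helpers wrap the underlying LiteLLM call with tenacity. A simplified sketch of that pattern; the backoff numbers here are illustrative, not the class's actual settings.

```python
from tenacity import retry, stop_after_attempt, wait_exponential
import litellm

@retry(stop=stop_after_attempt(6), wait=wait_exponential(min=1, max=60))
def completion_with_retry(**kwargs):
    # Re-invokes litellm.completion with exponential backoff until it
    # succeeds or the attempt budget is exhausted.
    return litellm.completion(**kwargs)
```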
validate_environment: Validate that the API key is set and the litellm Python package is installed, and that temperature, top_p, and top_k are within their allowed ranges.
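A sketch of the range checks this implies. This is assumed logic mirroring the bounds documented above, not the class's actual implementation.

```python
def _check_sampling_params(temperature, top_p, top_k):
    # Hypothetical helper mirroring validate_environment's documented checks.
    if temperature is not None and not 0.0 <= temperature <= 2.0:
        raise ValueError("temperature must be in [0.0, 2.0]")
    if top_p is not None and not 0.0 <= top_p <= 1.0:
        raise ValueError("top_p must be in [0.0, 1.0]")
    if top_k is not None and top_k <= 0:
        raise ValueError("top_k must be positive")
```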
bind_tools: Bind tool-like objects to this chat model. LiteLLM expects the tools argument in OpenAI format.
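A sketch of tool binding, assuming a `@tool`-decorated function; `bind_tools` converts such objects to the OpenAI tools schema before the call. The tool itself is a stand-in.

```python
from langchain_community.chat_models import ChatLiteLLM
from langchain_core.tools import tool

@tool
def get_weather(city: str) -> str:
    """Return a short weather report for a city."""
    return f"It is sunny in {city}."

llm = ChatLiteLLM(model="gpt-3.5-turbo")
llm_with_tools = llm.bind_tools([get_weather])
msg = llm_with_tools.invoke("What's the weather in Paris?")
print(msg.tool_calls)  # parsed tool calls, if the model issued any
```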