Callback to handle rate limiting based on the number of requests or the number of tokens in the input. It uses Upstash Ratelimit to track the rate limit, which in turn uses Upstash Redis to track the state.

The handler should not be passed to the chain when the chain is initialised, because the handler carries state that must be fresh for every invocation. Instead, initialise a new handler and pass it in every time you invoke.

| Name | Type | Description |
|---|---|---|
| identifier | Union[int, str]* | The identifier the rate limits are applied to. |
| token_ratelimit | Optional[Ratelimit]* | Ratelimit to limit the number of tokens. Only works with OpenAI models, since only these models report the number of tokens in their output. |
| request_ratelimit | Optional[Ratelimit]* | Ratelimit to limit the number of requests. |
| include_output_tokens | bool* | Whether to also count output tokens when rate limiting based on the number of tokens. Only used when token_ratelimit is passed. False by default. |
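The real Ratelimit objects come from the Upstash Ratelimit library and store their counters in Upstash Redis. As a minimal sketch of the idea only, a fixed-window limiter counting requests per identifier could look like this (class and attribute names here are illustrative, not the real API):

```python
import time

class FixedWindowLimiter:
    """Minimal in-memory stand-in for an Upstash-style fixed-window rate
    limiter. The real Ratelimit keeps its counters in Upstash Redis, so the
    limit is shared across processes; this sketch only illustrates the
    per-identifier counting."""

    def __init__(self, max_requests: int, window_seconds: int):
        self.max_requests = max_requests
        self.window = window_seconds
        self._counts: dict = {}  # (identifier, window index) -> request count

    def limit(self, identifier) -> bool:
        """Record one request; return True while the identifier is allowed."""
        key = (identifier, int(time.time() // self.window))
        self._counts[key] = self._counts.get(key, 0) + 1
        return self._counts[key] <= self.max_requests

limiter = FixedWindowLimiter(max_requests=2, window_seconds=60)
results = [limiter.limit("user-1") for _ in range(3)]
# the third request by "user-1" inside the same window is denied
```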
Run when the chain starts running.

on_chain_start runs multiple times during a single chain execution. To make sure the check only happens once, we keep a bool state _checked. If not self._checked, we call limit with request_ratelimit and raise UpstashRatelimitError if the identifier is rate limited.
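The guard described above can be sketched with a stand-in class (only the UpstashRatelimitError name matches the real handler; everything else is illustrative):

```python
class UpstashRatelimitError(Exception):
    """Stand-in for the error raised when the identifier is rate limited."""

class ChainStartGuard:
    """Illustrative stand-in for the handler's on_chain_start bookkeeping."""

    def __init__(self, remaining: int):
        self.remaining = remaining  # requests left under request_ratelimit
        self._checked = False       # ensures the check runs once per invoke

    def on_chain_start(self) -> None:
        if self._checked:
            return  # on_chain_start fires for every sub-chain; skip repeats
        self._checked = True
        if self.remaining <= 0:
            raise UpstashRatelimitError("request limit reached")
        self.remaining -= 1

guard = ChainStartGuard(remaining=1)
guard.on_chain_start()  # consumes the last allowed request
guard.on_chain_start()  # no-op: already checked during this invocation
```

Because _checked is never reset on the same object, reusing a handler across invocations would silently skip the limit check — which is why a fresh handler is needed per invoke.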
Run when the LLM starts running.
Run when the LLM ends running.

If include_output_tokens is set to True, the number of tokens in the LLM completion is also counted for rate limiting.
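The counting rule can be illustrated with a hypothetical helper (the function name and dict shape are assumptions, modelled on OpenAI-style token usage, since only OpenAI models report token counts):

```python
def tokens_to_count(llm_output: dict, include_output_tokens: bool = False) -> int:
    """Hypothetical helper mirroring the rule above: prompt tokens always
    count toward the token rate limit; completion (output) tokens are added
    only when include_output_tokens is True."""
    usage = llm_output["token_usage"]
    total = usage["prompt_tokens"]
    if include_output_tokens:
        total += usage["completion_tokens"]
    return total

output = {"token_usage": {"prompt_tokens": 12, "completion_tokens": 30}}
# tokens_to_count(output)        -> 12 (input tokens only)
# tokens_to_count(output, True)  -> 42 (input + output tokens)
```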
Creates a new UpstashRatelimitHandler object with the same ratelimit configurations, but with a new identifier if one is provided.
Also resets the state of the handler.
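A sketch of the reset contract, using an illustrative stand-in class (not the real implementation): the returned handler keeps the ratelimit configuration, optionally swaps the identifier, and starts with clean per-invocation state.

```python
class RatelimitHandlerSketch:
    """Illustrative stand-in showing the reset() contract described above."""

    def __init__(self, identifier, token_ratelimit=None, request_ratelimit=None):
        self.identifier = identifier
        self.token_ratelimit = token_ratelimit
        self.request_ratelimit = request_ratelimit
        self._checked = False  # per-invocation state

    def reset(self, identifier=None):
        # Return a brand-new handler instead of mutating self, so every
        # invocation starts with _checked == False.
        return RatelimitHandlerSketch(
            identifier if identifier is not None else self.identifier,
            token_ratelimit=self.token_ratelimit,
            request_ratelimit=self.request_ratelimit,
        )

used = RatelimitHandlerSketch("user-1", request_ratelimit="10/minute")
used._checked = True            # pretend one invocation already ran
fresh = used.reset("user-2")    # same config, new identifier, clean state
```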