count_tokens_approximately(
messages: Iterable[MessageLikeRepresentation],
*,
chars_per_token: float = 4.0,| Name | Type | Description |
|---|---|---|
messages* | Iterable[MessageLikeRepresentation] | List of messages to count tokens for. |
chars_per_token | float | Default: 4.0Number of characters per token to use for the approximation.
One token corresponds to ~4 chars for common English text.
You can also specify |
extra_tokens_per_message | float | Default: 3.0 |
count_name | bool | Default: True |
tokens_per_image | int | Default: 85 |
use_usage_metadata_scaling | bool | Default: False |
tools | list[BaseTool | dict[str, Any]] | None | Default: None |
Approximate the total number of tokens in messages.
The token count includes stringified message content, role, and (optionally) name.
Note:
This is a simple approximation that may not match the exact token count used by specific models. For accurate counts, use model-specific tokenizers.
For multimodal messages containing images, a fixed token penalty is applied per image instead of counting base64-encoded characters, which provides a more realistic approximation.
Number of extra tokens to add per message, e.g.
special tokens, including beginning/end of message.
You can also specify float values for more fine-grained control.
See more here.
Whether to include message names in the count.
Fixed token cost per image (default: 85, aligned with OpenAI's low-resolution image token cost).
If True, and all AI messages have consistent
response_metadata['model_provider'], scale the approximate token count
using the most recent AI message that has
usage_metadata['total_tokens']. The scaling factor is:
AI_total_tokens / approx_tokens_up_to_that_AI_message
List of tools to include in the token count. Each tool can be either
a BaseTool instance or a dict representing a tool schema. BaseTool
instances are converted to OpenAI tool format before counting.