LangChain Reference
    Python · langchain-core · messages · utils · count_tokens_approximately
    Function · Since v0.3

    count_tokens_approximately

    Approximate the total number of tokens in messages.

    The token count includes stringified message content, role, and (optionally) name.

    • For AI messages, the token count also includes stringified tool calls.
    • For tool messages, the token count also includes the tool call ID.
    • For multimodal messages with images, a fixed token penalty is applied per image instead of counting base64-encoded characters.
    • If tools are provided, the token count also includes stringified tool schemas.
    count_tokens_approximately(
      messages: Iterable[MessageLikeRepresentation],
      *,
      chars_per_token: float = 4.0,
      extra_tokens_per_message: float = 3.0,
      count_name: bool = True,
      tokens_per_image: int = 85,
      use_usage_metadata_scaling: bool = False,
      tools: list[BaseTool | dict[str, Any]] | None = None
    ) -> int

    Note:

    This is a simple approximation that may not match the exact token count used by specific models. For accurate counts, use model-specific tokenizers.

    For multimodal messages containing images, a fixed token penalty is applied per image instead of counting base64-encoded characters, which provides a more realistic approximation.
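    As a rough illustration of how a character-based approximation like this works, here is a plain-Python sketch. It is not the langchain-core implementation; the helper and the `(role, content)` message shape are invented for the example:

    ```python
    import math

    # Plain-Python sketch of a character-based token approximation
    # (illustrative only, not the langchain-core implementation).
    def approx_tokens(messages, chars_per_token=4.0, extra_tokens_per_message=3.0):
        total = 0.0
        for role, content in messages:
            # Fixed per-message overhead for special tokens.
            total += extra_tokens_per_message
            # Count role and content characters at ~chars_per_token chars/token.
            total += (len(role) + len(content)) / chars_per_token
        return math.ceil(total)

    msgs = [("system", "You are a helpful assistant."), ("human", "Hello!")]
    print(approx_tokens(msgs))  # 18
    ```

    With the defaults, each message costs 3 tokens of overhead plus one token per ~4 characters of role and content, and the total is rounded up.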

    Parameters

    messages (Iterable[MessageLikeRepresentation], required)

    List of messages to count tokens for.

    chars_per_token (float), default 4.0

    Number of characters per token to use for the approximation. One token corresponds to roughly four characters of common English text; fractional values allow finer-grained control.

    extra_tokens_per_message (float), default 3.0

    Number of extra tokens to add per message, e.g. special tokens such as beginning/end-of-message markers; fractional values allow finer-grained control.

    count_name (bool), default True

    Whether to include message names in the count.

    tokens_per_image (int), default 85

    Fixed token cost per image, aligned with OpenAI's low-resolution image token cost.

    use_usage_metadata_scaling (bool), default False

    If True, and all AI messages have a consistent response_metadata['model_provider'], scale the approximate token count using the most recent AI message that has usage_metadata['total_tokens']. The scaling factor is AI_total_tokens / approx_tokens_up_to_that_AI_message.
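    The scaling factor can be illustrated with a small sketch (the helper is invented for the example and is not part of langchain-core):

    ```python
    # Hypothetical illustration of usage-metadata scaling: later
    # approximations are multiplied by the ratio of the provider-reported
    # token count to the approximation at that point.
    def scaled_count(approx_total, approx_up_to_ai, ai_total_tokens):
        # scaling factor = AI_total_tokens / approx_tokens_up_to_that_AI_message
        scale = ai_total_tokens / approx_up_to_ai
        return round(approx_total * scale)

    # The approximation gave 100 tokens up to the last AI message, but the
    # provider reported 120 actual tokens, so a later estimate of 150 is
    # scaled by 1.2.
    print(scaled_count(150, 100, 120))  # 180
    ```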

    tools (list[BaseTool | dict[str, Any]] | None), default None

    List of tools to include in the token count. Each tool can be either a BaseTool instance or a dict representing a tool schema. BaseTool instances are converted to OpenAI tool format before counting.
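    A sketch of how stringified tool schemas might contribute to the count, assuming schemas are serialized and counted at chars_per_token characters per token (the helper is invented for the example, not the library API):

    ```python
    import json
    import math

    # Hypothetical helper (not the library API): count tokens for tool
    # schemas by stringifying each schema and dividing by chars_per_token.
    def tool_schema_tokens(tool_schemas, chars_per_token=4.0):
        chars = sum(len(json.dumps(schema)) for schema in tool_schemas)
        return math.ceil(chars / chars_per_token)

    # A dict tool schema in OpenAI tool format.
    weather_tool = {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Look up the weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
            },
        },
    }
    print(tool_schema_tokens([weather_tool]))
    ```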
