Class · Since v1.0

ModelCallLimitMiddleware

Tracks model call counts and enforces limits.

This middleware monitors the number of model calls made during agent execution and can terminate the agent when specified limits are reached. It supports both thread-level and run-level call counting with configurable exit behaviors.

Thread-level: The middleware tracks the number of model calls and persists the call count across multiple runs (invocations) of the agent.

Run-level: The middleware tracks the number of model calls made during a single run (invocation) of the agent.
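
The difference between the two scopes can be illustrated with a toy counter. This is a minimal sketch in plain Python, not the middleware's actual implementation; `ToyCallCounter`, `start_run`, and `record_model_call` are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass
class ToyCallCounter:
    """Hypothetical stand-in that mimics the two counting scopes."""

    thread_count: int = 0  # persists across runs on the same thread
    run_count: int = 0     # reset at the start of every run

    def start_run(self) -> None:
        # A new run (invocation) begins: only the run-level count resets.
        self.run_count = 0

    def record_model_call(self) -> None:
        # Every model call increments both counters.
        self.thread_count += 1
        self.run_count += 1


counter = ToyCallCounter()

# First run makes 3 model calls.
counter.start_run()
for _ in range(3):
    counter.record_model_call()
first_run = (counter.thread_count, counter.run_count)

# Second run on the same thread makes 2 more model calls.
counter.start_run()
for _ in range(2):
    counter.record_model_call()
second_run = (counter.thread_count, counter.run_count)

print(first_run)   # (3, 3)
print(second_run)  # (5, 2)
```

In this sketch the thread-level count keeps growing across runs, so a `thread_limit` bounds the total for a conversation, while a `run_limit` bounds each invocation separately.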

ModelCallLimitMiddleware(
  self,
  *,
  thread_limit: int | None = None,
  run_limit: int | None = None,
  exit_behavior: Literal['end', 'error'] = 'end'
)

Bases

AgentMiddleware[ModelCallLimitState[ResponseT], ContextT, ResponseT]

Example:

from langchain.agents import create_agent
from langchain.agents.middleware import ModelCallLimitMiddleware
from langchain_core.messages import HumanMessage

# Create middleware with limits: at most 10 model calls per thread, 5 per run
call_tracker = ModelCallLimitMiddleware(thread_limit=10, run_limit=5, exit_behavior="end")

agent = create_agent("openai:gpt-4o", middleware=[call_tracker])

# Agent will automatically jump to end when limits are exceeded
# (invoke is synchronous; in async code use `await agent.ainvoke(...)` instead)
result = agent.invoke({"messages": [HumanMessage("Help me with a task")]})

Parameters

Name	Type	Description
`thread_limit`	`int \| None`	Default: `None`. Maximum number of model calls allowed per thread. `None` means no limit.
`run_limit`	`int \| None`	Default: `None`. Maximum number of model calls allowed per run. `None` means no limit.
`exit_behavior`	`Literal['end', 'error']`	Default: `'end'`. What to do when limits are exceeded. `'end'`: Jump to the end of the agent execution and inject an artificial AI message indicating that the limit was exceeded. `'error'`: Raise a `ModelCallLimitExceededError`.
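
How the three parameters interact can be sketched with a small stand-alone function. This is an illustrative model only, not the library's code: `check_limits` and `ToyLimitExceededError` are hypothetical names, the returned string merely stands in for the injected AI message, and the sketch assumes a limit counts as hit once the count has reached it (so one more call would exceed the allowed maximum).

```python
from typing import Literal, Optional


class ToyLimitExceededError(Exception):
    """Hypothetical stand-in for ModelCallLimitExceededError."""


def check_limits(
    thread_count: int,
    run_count: int,
    *,
    thread_limit: Optional[int] = None,
    run_limit: Optional[int] = None,
    exit_behavior: Literal["end", "error"] = "end",
) -> Optional[str]:
    """Return None if another model call is allowed, else apply exit_behavior."""
    # A limit of None disables the corresponding check.
    thread_hit = thread_limit is not None and thread_count >= thread_limit
    run_hit = run_limit is not None and run_count >= run_limit
    if not (thread_hit or run_hit):
        return None

    message = (
        f"Model call limit reached "
        f"(thread: {thread_count}/{thread_limit}, run: {run_count}/{run_limit})"
    )
    if exit_behavior == "error":
        # 'error': surface the condition to the caller as an exception.
        raise ToyLimitExceededError(message)
    # 'end': stop gracefully; the string stands in for the artificial AI message.
    return message


# Under both limits: another call is allowed.
allowed = check_limits(2, 2, thread_limit=10, run_limit=5)

# Run limit hit with the default 'end' behavior: a message is returned.
ended = check_limits(5, 5, thread_limit=10, run_limit=5)

# Thread limit hit with 'error' behavior: an exception is raised.
try:
    check_limits(10, 1, thread_limit=10, exit_behavior="error")
    raised = False
except ToyLimitExceededError:
    raised = True
```

With `'end'`, callers get a normal result and can inspect the final message; with `'error'`, they need a `try`/`except` around the invocation.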