Tracks model call counts and enforces limits.
This middleware monitors the number of model calls made during agent execution and can terminate the agent when specified limits are reached. It supports both thread-level and run-level call counting with configurable exit behaviors.
- Thread-level: the middleware tracks the number of model calls and persists the count across multiple runs (invocations) of the agent.
- Run-level: the middleware tracks the number of model calls made during a single run (invocation) of the agent.
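The two counting scopes can be sketched with a minimal counter in plain Python. This mirrors the behavior described above but is an illustrative sketch, not the middleware's actual implementation:

```python
class CallCounter:
    """Sketch of thread-level vs. run-level model call counting."""

    def __init__(self, thread_limit=None, run_limit=None):
        self.thread_limit = thread_limit
        self.run_limit = run_limit
        self.thread_calls = 0  # persists across runs in the same thread
        self.run_calls = 0     # reset at the start of each run

    def start_run(self):
        # Run-level count starts fresh on every invocation
        self.run_calls = 0

    def record_call(self):
        self.thread_calls += 1
        self.run_calls += 1

    def limit_exceeded(self):
        return (
            (self.thread_limit is not None and self.thread_calls >= self.thread_limit)
            or (self.run_limit is not None and self.run_calls >= self.run_limit)
        )

# Two runs in the same thread: the thread count accumulates, the run count resets
counter = CallCounter(thread_limit=10, run_limit=5)
for _ in range(2):
    counter.start_run()
    for _ in range(3):
        counter.record_call()

print(counter.thread_calls)  # 6 — accumulated across both runs
print(counter.run_calls)     # 3 — only the latest run
```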
```python
ModelCallLimitMiddleware(
    *,
    thread_limit: int | None = None,
    run_limit: int | None = None,
    exit_behavior: Literal['end', 'error'] = 'end',
)
```

Example:
```python
from langchain.agents import create_agent
from langchain.agents.middleware.call_tracking import ModelCallLimitMiddleware
from langchain_core.messages import HumanMessage

# Create middleware with limits
call_tracker = ModelCallLimitMiddleware(
    thread_limit=10,
    run_limit=5,
    exit_behavior="end",
)

agent = create_agent("openai:gpt-4o", middleware=[call_tracker])

# The agent will automatically jump to the end when a limit is exceeded
result = agent.invoke({"messages": [HumanMessage("Help me with a task")]})
```

| Name | Type | Description |
|---|---|---|
| `thread_limit` | `int \| None` | Maximum number of model calls allowed per thread. Default: `None` |
| `run_limit` | `int \| None` | Maximum number of model calls allowed per run. Default: `None` |
| `exit_behavior` | `Literal['end', 'error']` | What to do when limits are exceeded. Default: `'end'` |
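The difference between the two exit behaviors can be sketched in plain Python. The function and exception names below are illustrative assumptions, not the library's actual internals:

```python
class CallLimitExceeded(Exception):
    """Hypothetical error raised under exit_behavior='error'."""


def check_limit(calls_made, run_limit, exit_behavior="end"):
    """Sketch of the limit check: continue, end gracefully, or raise."""
    if calls_made < run_limit:
        return "continue"
    if exit_behavior == "error":
        # exit_behavior='error': surface the violation as an exception
        raise CallLimitExceeded(
            f"{calls_made} calls reached run limit {run_limit}"
        )
    # exit_behavior='end': jump gracefully to the end of the agent graph
    return "end"


print(check_limit(3, 5))  # continue
print(check_limit(5, 5))  # end
try:
    check_limit(5, 5, exit_behavior="error")
except CallLimitExceeded as exc:
    print(f"caught: {exc}")
```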
- Start the shell session and run startup commands.
- Async start the shell session and run startup commands.
- Update the system message to include the todo system prompt.
- Run shutdown commands and release resources when an agent completes.
- Async run shutdown commands and release resources when an agent completes.
- Intercept tool execution for retries, monitoring, or modification.
- Intercept and control async tool execution via handler callback.