Tracks model call counts and enforces limits.
This middleware monitors the number of model calls made during agent execution and can terminate the agent when specified limits are reached. It supports both thread-level and run-level call counting with configurable exit behaviors.
- **Thread-level**: tracks the number of model calls and persists the call count across multiple runs (invocations) of the agent.
- **Run-level**: tracks the number of model calls made during a single run (invocation) of the agent.
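The difference between the two scopes can be sketched with a plain counter. This is an illustrative sketch of the counting semantics only, not the middleware's actual internals; the `CallCounter` class and its method names are hypothetical.

```python
class CallCounter:
    """Tracks model calls at thread scope (persists across runs) and run scope."""

    def __init__(self) -> None:
        self.thread_count = 0  # persists for the lifetime of the thread
        self.run_count = 0     # reset at the start of every run

    def start_run(self) -> None:
        # A new run (invocation) resets only the run-level counter.
        self.run_count = 0

    def record_model_call(self) -> None:
        # Every model call increments both counters.
        self.thread_count += 1
        self.run_count += 1


counter = CallCounter()

# First run: two model calls.
counter.start_run()
counter.record_model_call()
counter.record_model_call()

# Second run on the same thread: one model call.
counter.start_run()
counter.record_model_call()

print(counter.thread_count)  # 3 — accumulated across both runs
print(counter.run_count)     # 1 — scoped to the current run only
```

With `thread_limit=3`, this thread would hit its limit here even though no single run exceeded `run_limit=5`.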
```python
ModelCallLimitMiddleware(
    *,
    thread_limit: int | None = None,
    run_limit: int | None = None,
    exit_behavior: Literal['end', 'error'] = 'end'
)
```

Example:

```python
from langchain.agents.middleware.call_tracking import ModelCallLimitMiddleware
from langchain.agents import create_agent
from langchain_core.messages import HumanMessage

# Create middleware with limits
call_tracker = ModelCallLimitMiddleware(
    thread_limit=10,
    run_limit=5,
    exit_behavior="end",
)

agent = create_agent("openai:gpt-4o", middleware=[call_tracker])

# The agent will automatically jump to the end when a limit is exceeded
result = await agent.ainvoke({"messages": [HumanMessage("Help me with a task")]})
```

| Name | Type | Description |
|---|---|---|
| `thread_limit` | `int \| None` | Maximum number of model calls allowed per thread. Default: `None` (no limit). |
| `run_limit` | `int \| None` | Maximum number of model calls allowed per run. Default: `None` (no limit). |
| `exit_behavior` | `Literal['end', 'error']` | What to do when limits are exceeded. Default: `'end'`. |
Hook methods inherited from the middleware base class:

- Logic to run before the agent execution starts.
- Async logic to run before the agent execution starts.
- Intercept and control model execution via a handler callback.
- Intercept and control async model execution via a handler callback.
- Logic to run after the agent execution completes.
- Async logic to run after the agent execution completes.
- Intercept tool execution for retries, monitoring, or modification.
- Intercept and control async tool execution via a handler callback.