Logic to run before the agent execution starts.
Async logic to run before the agent execution starts.
Intercept and control model execution via handler callback.
Intercept and control async model execution via handler callback.
Logic to run after the agent execution completes.
Async logic to run after the agent execution completes.
Intercept tool execution for retries, monitoring, or modification.
Intercept and control async tool execution via handler callback.
Tracks model call counts and enforces limits.
This middleware monitors the number of model calls made during agent execution and can terminate the agent when specified limits are reached. It supports both thread-level and run-level call counting with configurable exit behaviors.
Thread-level: The middleware tracks the number of model calls and persists call count across multiple runs (invocations) of the agent.
Run-level: The middleware tracks the number of model calls made during a single run (invocation) of the agent.
Example:
from langchain.agents.middleware import ModelCallLimitMiddleware
from langchain.agents import create_agent
# Create middleware with limits
call_tracker = ModelCallLimitMiddleware(thread_limit=10, run_limit=5, exit_behavior="end")
agent = create_agent("openai:gpt-5.5", middleware=[call_tracker])
# Agent will automatically jump to end when limits are exceeded
result = await agent.invoke({"messages": [HumanMessage("Help me with a task")]})What to do when limits are exceeded.
'end': Jump to the end of the agent execution and
inject an artificial AI message indicating that the limit was
exceeded.'error': Raise a ModelCallLimitExceededError