Creates a middleware to limit the number of model calls at both thread and run levels.
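This middleware helps prevent excessive model API calls by enforcing limits on how many times the model can be invoked. It supports two types of limits: a thread-level limit (threadLimit), which caps model calls per thread, and a run-level limit (runLimit), which caps model calls per run.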
The middleware intercepts model requests before they are sent and checks the current call counts
against the configured limits. If either limit is exceeded, it throws a ModelCallLimitMiddlewareError
to stop execution and prevent further API calls.
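The check described above can be sketched in a few lines of standalone TypeScript. This is only an illustration under stated assumptions, not the library's implementation: CallLimitExceededError, CallLimits, CallCounts, and checkBeforeModelCall are hypothetical names, and the sketch assumes that a limit of N permits exactly N calls.

```typescript
// Hypothetical sketch: these names are illustrative, not part of langchain.
class CallLimitExceededError extends Error {
  scope: "thread" | "run";

  constructor(scope: "thread" | "run", count: number, limit: number) {
    super(`Model call limit reached for ${scope}: ${count} of ${limit} calls used`);
    this.name = "CallLimitExceededError";
    this.scope = scope;
  }
}

interface CallLimits {
  threadLimit?: number; // maximum calls per thread (unset means no limit)
  runLimit?: number; // maximum calls per run (unset means no limit)
}

interface CallCounts {
  threadCount: number; // calls already made in this thread
  runCount: number; // calls already made in this run
}

// Runs before each model request. Throws if the request would go past a
// limit (assumption: a limit of N allows exactly N calls); otherwise
// returns the updated counts.
function checkBeforeModelCall(counts: CallCounts, limits: CallLimits): CallCounts {
  if (limits.threadLimit !== undefined && counts.threadCount >= limits.threadLimit) {
    throw new CallLimitExceededError("thread", counts.threadCount, limits.threadLimit);
  }
  if (limits.runLimit !== undefined && counts.runCount >= limits.runLimit) {
    throw new CallLimitExceededError("run", counts.runCount, limits.runLimit);
  }
  return { threadCount: counts.threadCount + 1, runCount: counts.runCount + 1 };
}

// Simulate one run that tries to make five model calls.
const limits: CallLimits = { threadLimit: 10, runLimit: 3 };
let counts: CallCounts = { threadCount: 0, runCount: 0 };
let allowed = 0;
let blockedBy: string | null = null;

try {
  for (let i = 0; i < 5; i++) {
    counts = checkBeforeModelCall(counts, limits);
    allowed++;
  }
} catch (err) {
  if (err instanceof CallLimitExceededError) {
    blockedBy = err.scope;
  } else {
    throw err;
  }
}

console.log(allowed, blockedBy); // logs: 3 run
```

In this sketch the run limit trips first because it is the smaller of the two. Keeping the two counters separate is what lets a fresh run start again at zero while the thread total keeps accumulating.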
modelCallLimitMiddleware(
  middlewareOptions: Partial<__type>
): AgentMiddleware<ZodObject<__type, "strip", ZodTypeAny, __type, __type>, ZodObject<__type, "strip", ZodTypeAny, __type, __type>, __type, readonly ClientTool | ServerTool[]>

import { createAgent, modelCallLimitMiddleware } from "langchain";

// Limit to 10 calls per thread and 3 calls per run
const agent = createAgent({
  model: "openai:gpt-4o-mini",
  tools: [myTool],
  middleware: [
    modelCallLimitMiddleware({
      threadLimit: 10,
      runLimit: 3
    })
  ]
});

// Limits can also be configured at runtime via context
const result = await agent.invoke(
  { messages: ["Hello"] },
  {
    configurable: {
      threadLimit: 5 // Override the default limit for this run
    }
  }
);