Functionā—Since v1.1

modelCallLimitMiddleware

Creates a middleware to limit the number of model calls at both thread and run levels.

This middleware helps prevent excessive model API calls by enforcing limits on how many times the model can be invoked. It supports two types of limits:

  • Thread-level limit: Restricts the total number of model calls across an entire conversation thread
  • Run-level limit: Restricts the number of model calls within a single agent run/invocation

How It Works

The middleware intercepts model requests before they are sent and checks the current call counts against the configured limits. If either limit is exceeded, it throws a ModelCallLimitMiddlewareError to stop execution and prevent further API calls.
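If you want to handle the limit gracefully rather than let the run fail, you can catch the error around invoke. A minimal sketch, assuming the thrown error's name property matches its class name (the class itself may not be exported):

import { createAgent, modelCallLimitMiddleware } from "langchain";

const agent = createAgent({
  model: "openai:gpt-4o-mini",
  tools: [],
  middleware: [modelCallLimitMiddleware({ runLimit: 2 })],
});

try {
  await agent.invoke({ messages: ["Summarize these documents"] });
} catch (err) {
  // Assumption: the error surfaces with name "ModelCallLimitMiddlewareError";
  // matching on the name avoids relying on the class being exported.
  if (err instanceof Error && err.name === "ModelCallLimitMiddlewareError") {
    console.warn("Stopped: model call limit reached.", err.message);
  } else {
    throw err;
  }
}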

Use Cases

  • Cost Control: Prevent runaway costs from excessive model calls in production
  • Testing: Ensure agents don't make too many calls during development/testing
  • Safety: Limit potential infinite loops or recursive agent behaviors
  • Rate Limiting: Enforce organizational policies on model usage per conversation
modelCallLimitMiddleware(
  middlewareOptions: Partial<__type>
): AgentMiddleware<ZodObject<__type, "strip", ZodTypeAny, __type, __type>, ZodObject<__type, "strip", ZodTypeAny, __type, __type>, __type, readonly (ClientTool | ServerTool)[]>

Used in Docs

  • Prebuilt middleware

Parameters

Name                Type              Description
middlewareOptions   Partial<__type>   Configuration options for the call limits
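The options type is anonymous in the generated signature (hence Partial<__type>). Judging from the examples below, it accepts at least the two limit fields; a hedged sketch of the shape:

// Inferred from the examples below; the real anonymous type may carry
// additional fields beyond these two.
interface ModelCallLimitOptions {
  threadLimit?: number; // max model calls across the whole conversation thread
  runLimit?: number;    // max model calls within a single run/invocation
}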

Example 1

import { createAgent, modelCallLimitMiddleware } from "langchain";

// Limit to 10 calls per thread and 3 calls per run
const agent = createAgent({
  model: "openai:gpt-4o-mini",
  tools: [myTool], // "myTool" is a placeholder for any tool you've defined
  middleware: [
    modelCallLimitMiddleware({
      threadLimit: 10,
      runLimit: 3
    })
  ]
});

Example 2

// Limits can also be configured at runtime via context
const result = await agent.invoke(
  { messages: ["Hello"] },
  {
    configurable: {
      threadLimit: 5  // Override the default thread limit for this invocation
    }
  }
);