Unique symbol used to brand middleware instances. This prevents functions from being accidentally assignable to AgentMiddleware, since functions have a 'name' property that would otherwise make them structurally compatible.
Creates a prompt caching middleware for Anthropic models to optimize API usage.
This middleware automatically adds cache control headers to the last messages when using Anthropic models, enabling their prompt caching feature. This can significantly reduce costs for applications with repetitive prompts, long system messages, or extensive conversation histories.
The middleware intercepts model requests and adds cache control metadata that tells Anthropic's API to cache processed prompt prefixes. On subsequent requests with matching prefixes, the cached representations are reused, skipping redundant token processing.
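For illustration, a minimal sketch of enabling the middleware (the export name anthropicPromptCachingMiddleware, the ttl option, and the createAgent wiring shown here are assumptions):

```ts
import { createAgent, anthropicPromptCachingMiddleware } from "langchain";

// Marks the trailing prompt prefix as cacheable so Anthropic can reuse it
// across calls with matching prefixes (option name is assumed).
const agent = createAgent({
  model: "anthropic:claude-3-5-sonnet-latest",
  tools: [],
  middleware: [anthropicPromptCachingMiddleware({ ttl: "5m" })],
});
```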
Apply strategy to content based on matches
LangChain utilities
Middleware that automatically prunes tool results to manage context size.
This middleware applies a sequence of edits when the total input token count
exceeds configured thresholds. By default, it uses the ClearToolUsesEdit strategy
which mirrors Anthropic's clear_tool_uses_20250919 behaviour by clearing older
tool results once the conversation exceeds 100,000 tokens.
Use the middleware with default settings to automatically manage context:
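A minimal sketch (assuming the export is named contextEditingMiddleware and that createAgent accepts a middleware array; the option-free call uses the defaults described above):

```ts
import { createAgent, contextEditingMiddleware } from "langchain";

// With no options, the default ClearToolUsesEdit strategy clears older
// tool results once the conversation exceeds roughly 100,000 input tokens.
const agent = createAgent({
  model: "anthropic:claude-3-5-sonnet-latest",
  tools: [/* your tools */],
  middleware: [contextEditingMiddleware()],
});
```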
Default token counter that approximates based on character count.
If tools are provided, the token count also includes stringified tool schemas.
Creates a production-ready ReAct (Reasoning + Acting) agent that combines a language model with tools and middleware.
The agent follows the ReAct pattern, interleaving reasoning steps with tool calls to iteratively work towards solutions. It can handle multiple tool calls in sequence or parallel, maintain state across interactions, and provide auditable decision processes.
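For illustration, a minimal sketch of creating such an agent (assuming createAgent and tool are exported from "langchain" and that the option names model/tools/systemPrompt shown here are correct; the configuration options are described below):

```ts
import { createAgent, tool } from "langchain";
import { z } from "zod";

// Hypothetical tool for the sake of the example.
const getWeather = tool(
  async ({ city }) => `It is sunny in ${city}.`,
  {
    name: "get_weather",
    description: "Look up the weather for a city",
    schema: z.object({ city: z.string() }),
  }
);

const agent = createAgent({
  model: "openai:gpt-4o",                                // model identifier string
  tools: [getWeather],                                   // tools created with tool()
  systemPrompt: "You are a concise weather assistant.",  // assumed option name
});

const result = await agent.invoke({
  messages: [{ role: "user", content: "What's the weather in Paris?" }],
});
```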
The reasoning engine can be specified as a model identifier string such as "openai:gpt-4o" for simple setup. Tools give agents the ability to take actions: create them with the tool function, or use a ToolNode for custom error handling. A prompt shapes how your agent approaches tasks, middleware allows you to extend the agent's behavior, and responseFormat with a Zod schema gets you typed responses.

Creates a middleware instance with automatic schema inference.
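A sketch of the middleware factory (assuming createMiddleware is exported from "langchain" and accepts a name plus lifecycle hooks such as beforeModel; the hook signature shown is an assumption):

```ts
import { createMiddleware } from "langchain";

// Hypothetical logging middleware.
const loggingMiddleware = createMiddleware({
  name: "LoggingMiddleware",
  beforeModel: async (state) => {
    console.log(`Calling model with ${state.messages.length} messages`);
    return undefined; // no state update
  },
});
```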
Detect credit card numbers in content (validated with Luhn algorithm)
Detect email addresses in content
Detect IP addresses in content (validated)
Detect MAC addresses in content
Detect URLs in content
Dynamic System Prompt Middleware
Allows setting the system prompt dynamically right before each model invocation. Useful when the prompt depends on the current agent state or per-invocation context.
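A sketch of the idea (the export name dynamicSystemPromptMiddleware and the callback signature are assumptions):

```ts
import { createAgent, dynamicSystemPromptMiddleware } from "langchain";

// The callback runs right before each model invocation and can derive
// the prompt from the current agent state (here, the message count).
const agent = createAgent({
  model: "openai:gpt-4o",
  tools: [],
  middleware: [
    dynamicSystemPromptMiddleware((state) =>
      state.messages.length > 20
        ? "Be brief; this conversation is getting long."
        : "You are a helpful, detailed assistant."
    ),
  ],
});
```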
Middleware for selecting tools using an LLM-based strategy.
When an agent has many tools available, this middleware filters them down to only the most relevant ones for the user's query. This reduces token usage and helps the main model focus on the right tools.
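A sketch (the export name llmToolSelectorMiddleware and its options, model and maxTools, are assumptions):

```ts
import { createAgent, llmToolSelectorMiddleware } from "langchain";

// A smaller model pre-filters the tool list before the main model runs.
declare const manyTools: any[]; // assume a large array of tools defined elsewhere

const agent = createAgent({
  model: "openai:gpt-4o",
  tools: manyTools,
  middleware: [
    llmToolSelectorMiddleware({ model: "openai:gpt-4o-mini", maxTools: 3 }),
  ],
});
```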
Creates a middleware to limit the number of model calls at both thread and run levels.
This middleware helps prevent excessive model API calls by enforcing limits on how many times the model can be invoked. It supports two types of limits: a thread-level limit that persists across multiple runs of the agent, and a run-level limit that applies to the current invocation.
The middleware intercepts model requests before they are sent and checks the current call counts
against the configured limits. If either limit is exceeded, it throws a ModelCallLimitMiddlewareError
to stop execution and prevent further API calls.
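A sketch (the export name modelCallLimitMiddleware and the threadLimit/runLimit option names are assumptions):

```ts
import { createAgent, modelCallLimitMiddleware } from "langchain";

// Allow at most 10 model calls per thread and 5 per run; exceeding either
// limit stops execution before another request is sent.
const agent = createAgent({
  model: "openai:gpt-4o",
  tools: [],
  middleware: [modelCallLimitMiddleware({ threadLimit: 10, runLimit: 5 })],
});
```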
Middleware that provides automatic model fallback on errors.
This middleware attempts to retry failed model calls with alternative models in sequence. When a model call fails, it tries the next model in the fallback list until either a call succeeds or all models have been exhausted.
Middleware that automatically retries failed model calls with configurable backoff.
Supports retrying on specific exceptions and exponential backoff.
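A sketch combining the two (the export names modelRetryMiddleware and modelFallbackMiddleware, and their option shapes, are assumptions):

```ts
import {
  createAgent,
  modelFallbackMiddleware,
  modelRetryMiddleware,
} from "langchain";

const agent = createAgent({
  model: "openai:gpt-4o",
  tools: [],
  middleware: [
    // Retry the current model a few times with exponential backoff...
    modelRetryMiddleware({ maxRetries: 3, backoffFactor: 2 }),
    // ...then fall back to alternative models if it still fails.
    modelFallbackMiddleware(
      "openai:gpt-4o-mini",
      "anthropic:claude-3-5-sonnet-latest"
    ),
  ],
});
```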
Provider-specific middleware
Creates a middleware that detects and handles personally identifiable information (PII) in conversations.
This middleware detects common PII types and applies configurable strategies to handle them. It can detect emails, credit cards, IP addresses, MAC addresses, and URLs in both user input and agent output.
Built-in PII types:
- email: Email addresses
- credit_card: Credit card numbers (validated with Luhn algorithm)
- ip: IP addresses (validated)
- mac_address: MAC addresses
- url: URLs (both http/https and bare URLs)

Strategies:
- block: Raise an exception when PII is detected
- redact: Replace PII with [REDACTED_TYPE] placeholders
- mask: Partially mask PII (e.g., ****-****-****-1234 for a credit card)
- hash: Replace PII with a deterministic hash (e.g., <email_hash:a1b2c3d4>)

Strategy Selection Guide:
| Strategy | Preserves Identity? | Best For |
|---|---|---|
| block | N/A | Avoid PII completely |
| redact | No | General compliance, log sanitization |
| mask | No | Human readability, customer service UIs |
| hash | Yes (pseudonymous) | Analytics, debugging |
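A sketch of configuring per-type strategies (the export name piiMiddleware and the option shape, a map from PII type to strategy, are assumptions):

```ts
import { createAgent, piiMiddleware } from "langchain";

const agent = createAgent({
  model: "openai:gpt-4o",
  tools: [],
  middleware: [
    piiMiddleware({
      email: "redact",      // [REDACTED_EMAIL]
      credit_card: "mask",  // ****-****-****-1234
      ip: "hash",           // deterministic pseudonym, useful for analytics
      url: "block",         // throw if a URL appears
    }),
  ],
});
```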
Creates a provider strategy for structured output using native JSON schema support.
This function is used to configure structured output for agents when the underlying model
supports native JSON schema output (e.g., OpenAI's gpt-4o, gpt-4o-mini, and newer models).
Unlike toolStrategy, which uses function calling to extract structured output, providerStrategy
leverages the provider's native structured output capabilities, resulting in more efficient
and reliable schema enforcement.
When used with a model that supports JSON schema output, the model will return responses that directly conform to the provided schema without requiring tool calls. This is the recommended approach for structured output when your model supports it.
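A sketch (assuming providerStrategy is exported from "langchain" and that createAgent accepts it via a responseFormat option):

```ts
import { createAgent, providerStrategy } from "langchain";
import { z } from "zod";

const ContactInfo = z.object({
  name: z.string(),
  email: z.string(),
});

// The provider's native JSON-schema mode enforces the shape directly.
const agent = createAgent({
  model: "openai:gpt-4o",
  tools: [],
  responseFormat: providerStrategy(ContactInfo),
});

const result = await agent.invoke({
  messages: [{ role: "user", content: "Extract: John Doe, john@example.com" }],
});
// result.structuredResponse conforms to { name: string; email: string }
```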
Resolve a redaction rule to a concrete detector function
Summarization middleware that automatically summarizes conversation history when token limits are approached.
This middleware monitors message token counts and automatically summarizes older messages when a threshold is reached, preserving recent messages and maintaining context continuity by ensuring AI/Tool message pairs remain together.
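A sketch (the export name summarizationMiddleware and the option names shown are assumptions):

```ts
import { createAgent, summarizationMiddleware } from "langchain";

const agent = createAgent({
  model: "openai:gpt-4o",
  tools: [],
  middleware: [
    summarizationMiddleware({
      model: "openai:gpt-4o-mini",   // cheaper model writes the summary
      maxTokensBeforeSummary: 4000,  // threshold that triggers summarization
      messagesToKeep: 20,            // recent messages preserved verbatim
    }),
  ],
});
```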
Creates a middleware that provides todo list management capabilities to agents.
This middleware adds a write_todos tool that allows agents to create and manage
structured task lists for complex multi-step operations. It's designed to help
agents track progress, organize complex tasks, and provide users with visibility
into task completion status.
The middleware automatically injects system prompts that guide the agent on when
and how to use the todo functionality effectively. It also enforces that the
write_todos tool is called at most once per model turn, since the tool replaces
the entire todo list and parallel calls would create ambiguity about precedence.
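A sketch (the export name todoListMiddleware is an assumption):

```ts
import { createAgent, todoListMiddleware } from "langchain";

// Adds the write_todos tool plus guidance in the system prompt; the agent
// can then surface a structured task list while working through a request.
const agent = createAgent({
  model: "openai:gpt-4o",
  tools: [/* your domain tools */],
  middleware: [todoListMiddleware()],
});
```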
LangChain Tools
Middleware that tracks tool call counts and enforces limits.
This middleware monitors the number of tool calls made during agent execution and can terminate the agent when specified limits are reached. It supports both thread-level and run-level call counting with configurable exit behaviors.
Thread-level: The middleware counts all tool calls in the entire message history and persists this count across multiple runs (invocations) of the agent.
Run-level: The middleware counts tool calls made after the last HumanMessage, representing the current run (invocation) of the agent.
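A sketch (the export name toolCallLimitMiddleware and its options are assumptions):

```ts
import { createAgent, toolCallLimitMiddleware } from "langchain";

const agent = createAgent({
  model: "openai:gpt-4o",
  tools: [/* your tools */],
  middleware: [
    toolCallLimitMiddleware({
      threadLimit: 30,      // counted across the whole message history
      runLimit: 10,         // counted since the last HumanMessage
      exitBehavior: "end",  // stop gracefully instead of throwing
    }),
  ],
});
```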
Middleware that emulates specified tools using an LLM instead of executing them.
This middleware allows selective emulation of tools for testing purposes.
By default (when tools is undefined), all tools are emulated. You can specify
which tools to emulate by passing a list of tool names or tool instances.
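A sketch (the export name llmToolEmulatorMiddleware and its option shape are assumptions):

```ts
import { createAgent, llmToolEmulatorMiddleware } from "langchain";

// Emulate only the side-effecting tool during tests; leaving `tools`
// undefined would emulate all of them.
declare const searchTool: any, sendEmailTool: any; // assumed, defined elsewhere

const agent = createAgent({
  model: "openai:gpt-4o",
  tools: [searchTool, sendEmailTool],
  middleware: [llmToolEmulatorMiddleware({ tools: ["send_email"] })],
});
```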
Middleware that automatically retries failed tool calls with configurable backoff.
Supports retrying on specific exceptions and exponential backoff.
Creates a tool strategy for structured output using function calling.
This function configures structured output by converting schemas into function tools that
the model calls. Unlike providerStrategy, which uses native JSON schema support,
toolStrategy works with any model that supports function calling, making it more
widely compatible across providers and model versions.
The model will call a function with arguments matching your schema, and the agent will extract and validate the structured output from the tool call. This approach is automatically used when your model doesn't support native JSON schema output.
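A sketch (assuming toolStrategy is exported from "langchain" and accepts a Zod schema):

```ts
import { createAgent, toolStrategy } from "langchain";
import { z } from "zod";

const Sentiment = z.object({
  label: z.enum(["positive", "negative", "neutral"]),
  confidence: z.number(),
});

// The schema is exposed to the model as a function it must call; this works
// with any provider that supports function calling.
const agent = createAgent({
  model: "anthropic:claude-3-5-sonnet-latest",
  tools: [],
  responseFormat: toolStrategy(Sentiment),
});
```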
Initialize a ChatModel from the model name and provider. The integration package corresponding to the model provider must be installed.
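For example (the import path is an assumption; some versions export initChatModel from "langchain/chat_models/universal"):

```ts
import { initChatModel } from "langchain/chat_models/universal";

// Requires @langchain/openai to be installed for the "openai" provider.
const model = await initChatModel("gpt-4o", {
  modelProvider: "openai",
  temperature: 0,
});
const reply = await model.invoke("Say hello in one word.");
```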
LangChain Messages
Represents a chunk of an AI message, which can be concatenated with other AI message chunks.
Base class for all types of messages in a conversation. It includes
properties like content, name, and additional_kwargs. It also
includes methods like toDict() and _getType().
Represents a chunk of a message, which can be concatenated with other
message chunks. It provides a concat() method that merges the content and
additional keyword arguments of another BaseMessageChunk into this one.
Strategy for clearing tool outputs when token limits are exceeded.
This strategy mirrors Anthropic's clear_tool_uses_20250919 behavior by
replacing older tool results with a placeholder text when the conversation
grows too large. It preserves the most recent tool results and can exclude
specific tools from being cleared.
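A sketch of overriding the defaults (the constructor options shown, trigger and excludeTools, are assumptions):

```ts
import {
  createAgent,
  contextEditingMiddleware,
  ClearToolUsesEdit,
} from "langchain";

const agent = createAgent({
  model: "anthropic:claude-3-5-sonnet-latest",
  tools: [/* your tools */],
  middleware: [
    contextEditingMiddleware({
      edits: [
        new ClearToolUsesEdit({
          trigger: 50_000,              // clear once input exceeds ~50k tokens
          excludeTools: ["get_policy"], // never clear results from this tool
        }),
      ],
    }),
  ],
});
```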
Interface for interacting with a document.
A tool that can be created dynamically from a function, name, and description, designed to work with structured data. It extends the StructuredTool class and overrides the _call method to execute the provided function when the tool is called.
The schema can be passed as a Zod schema or a JSON schema. The tool will not validate input if a JSON schema is passed.
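For example, creating such a tool with the tool helper and a Zod schema (imported here from @langchain/core/tools):

```ts
import { tool } from "@langchain/core/tools";
import { z } from "zod";

// Input is validated against the Zod schema before the function runs.
const add = tool(
  async ({ a, b }) => String(a + b),
  {
    name: "add",
    description: "Add two numbers",
    schema: z.object({ a: z.number(), b: z.number() }),
  }
);

const result = await add.invoke({ a: 2, b: 3 }); // "5"
```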
A tool that can be created dynamically from a function, name, and description.
Fake chat model for testing tool calling functionality
Represents a human message in a conversation.
Represents a chunk of a human message, which can be concatenated with other human message chunks.
In-memory implementation of the BaseStore using a dictionary. Used for storing key-value pairs in memory.
Error thrown when a middleware fails.
Use MiddlewareError.wrap() to create instances. The constructor is private
to ensure that GraphBubbleUp errors (like GraphInterrupt) are never wrapped.
Raised when model returns multiple structured output tool calls when only one is expected.
Error thrown when PII is detected and strategy is 'block'
Raised when structured output tool call arguments fail to parse according to the schema.
Base class for Tools that accept input of any shape defined by a Zod schema.
Represents a system message in a conversation.
Represents a chunk of a system message, which can be concatenated with other system message chunks.
Base class for Tools that accept input as a string.
Exception raised when tool call limits are exceeded.
This exception is raised when the configured exit behavior is 'error' and either the thread or run tool call limit has been exceeded.
Raised when a tool call throws an error.
Represents a tool message in a conversation.
Represents a chunk of a tool message, which can be concatenated with other tool message chunks.
Information for tracking structured output tool metadata. This contains all necessary information to handle structured responses generated via tool calls, including the original schema, its type classification, and the corresponding tool implementation used by the tools strategy.
Creates a Human-in-the-Loop (HITL) middleware for tool approval and oversight.
This middleware intercepts tool calls made by an AI agent and provides human oversight capabilities before execution. It enables selective approval workflows where certain tools require human intervention while others can execute automatically.
An invocation result that has been interrupted by the middleware will have a __interrupt__
property that contains the interrupt request.
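Before the resume flow below can run, the middleware has to be attached to the agent. A minimal sketch (the export name humanInTheLoopMiddleware and the interruptOn option shape are assumptions):

```ts
import { createAgent, humanInTheLoopMiddleware } from "langchain";

declare const calculatorTool: any, writeFileTool: any; // assumed, defined elsewhere

const agent = createAgent({
  model: "openai:gpt-4o",
  tools: [calculatorTool, writeFileTool],
  middleware: [
    humanInTheLoopMiddleware({
      interruptOn: {
        write_file: true,  // pause and ask a human before writing files
        calculator: false, // safe to run automatically
      },
    }),
  ],
  // A checkpointer is required so the run can pause and later resume.
});
```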
import { type HITLRequest, type HITLResponse } from "langchain";
import { type Interrupt } from "langchain";
import { Command } from "@langchain/langgraph";
const result = await agent.invoke(request);
const interruptRequest = result.__interrupt__?.[0] as Interrupt<HITLRequest>;
// Examine the action requests and review configs
const actionRequests = interruptRequest.value.actionRequests;
const reviewConfigs = interruptRequest.value.reviewConfigs;
// Create decisions for each action
const resume: HITLResponse = {
decisions: actionRequests.map((action, i) => {
if (action.name === "calculator") {
return { type: "approve" };
} else if (action.name === "write_file") {
return {
type: "edit",
editedAction: { name: "write_file", args: { filename: "safe.txt", content: "Safe content" } }
};
}
return { type: "reject", message: "Action not allowed" };
})
};
// Resume with decisions
await agent.invoke(new Command({ resume }), config);
When a tool requires approval, the human operator can respond with:
- approve: Execute the tool with the original arguments
- edit: Modify the tool name and/or arguments before execution
- reject: Provide a manual response instead of executing the tool

Creates a middleware that detects and redacts personally identifiable information (PII) from messages before they are sent to model providers, and restores original values in model responses for tool execution.
The middleware intercepts agent execution at two points:
- Before the model call (wrapModelCall): detected PII values are replaced with placeholders. Each value is assigned an identifier via generateRedactionId() → "abc123" and rendered as [REDACTED_{RULE_NAME}_{ID}] → "[REDACTED_SSN_abc123]", while a mapping from identifiers to original values is retained, e.g. { "abc123": "123-45-6789" }.
- After the model call (afterModel): placeholders matching /\[REDACTED_[A-Z_]+_(\w+)\]/g are restored to their original values in tool calls and in the structuredResponse state field.

Example flow:
User Input: "My SSN is 123-45-6789"
↓ [beforeModel]
Model Request: "My SSN is [REDACTED_SSN_abc123]"
↓ [model invocation]
Model Response: tool_call({ "ssn": "[REDACTED_SSN_abc123]" })
↓ [afterModel]
Tool Execution: tool({ "ssn": "123-45-6789" })
This middleware provides model provider isolation only; PII may still be present elsewhere in the system.
For comprehensive PII protection, implement additional controls at the application, network, and storage layers.