Unique symbol used to brand middleware instances. This prevents functions from being accidentally assignable to AgentMiddleware, since functions have a 'name' property that would otherwise make them structurally compatible.
Creates a prompt caching middleware for Anthropic models to optimize API usage.
This middleware automatically adds cache control headers to the last messages when using Anthropic models, enabling their prompt caching feature. This can significantly reduce costs for applications with repetitive prompts, long system messages, or extensive conversation histories.
The middleware intercepts model requests and adds cache control metadata that tells Anthropic's API to cache processed prompt prefixes. On subsequent requests with matching prefixes, the cached representations are reused, skipping redundant token processing.
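For illustration, a minimal sketch of enabling the middleware (the export name anthropicPromptCachingMiddleware, the ttl option, and the createAgent wiring shown here are assumptions):

```ts
import { createAgent, anthropicPromptCachingMiddleware } from "langchain";

// Marks the trailing prompt prefix as cacheable so Anthropic can reuse it
// across calls with matching prefixes (option name is assumed).
const agent = createAgent({
  model: "anthropic:claude-3-5-sonnet-latest",
  tools: [],
  middleware: [anthropicPromptCachingMiddleware({ ttl: "5m" })],
});
```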
Apply strategy to content based on matches
LangChain utilities
Middleware that automatically prunes tool results to manage context size.
This middleware applies a sequence of edits when the total input token count
exceeds configured thresholds. By default, it uses the ClearToolUsesEdit strategy
which mirrors Anthropic's clear_tool_uses_20250919 behaviour by clearing older
tool results once the conversation exceeds 100,000 tokens.
Use the middleware with default settings to automatically manage context:
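A minimal sketch (assuming the export is named contextEditingMiddleware and that createAgent accepts a middleware array; the option-free call uses the defaults described above):

```ts
import { createAgent, contextEditingMiddleware } from "langchain";

// With no options, the default ClearToolUsesEdit strategy clears older
// tool results once the conversation exceeds roughly 100,000 input tokens.
const agent = createAgent({
  model: "anthropic:claude-3-5-sonnet-latest",
  tools: [/* your tools */],
  middleware: [contextEditingMiddleware()],
});
```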
Default token counter that approximates based on character count.
If tools are provided, the token count also includes stringified tool schemas.
Creates a production-ready ReAct (Reasoning + Acting) agent that combines a language model with tools and middleware.
The agent follows the ReAct pattern, interleaving reasoning steps with tool calls to iteratively work towards solutions. It can handle multiple tool calls in sequence or parallel, maintain state across interactions, and provide auditable decision processes.
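For illustration, a minimal sketch of creating such an agent (assuming createAgent and tool are exported from "langchain" and that the option names model/tools/systemPrompt shown here are correct; the configuration options are described below):

```ts
import { createAgent, tool } from "langchain";
import { z } from "zod";

// Hypothetical tool for the sake of the example.
const getWeather = tool(
  async ({ city }) => `It is sunny in ${city}.`,
  {
    name: "get_weather",
    description: "Look up the weather for a city",
    schema: z.object({ city: z.string() }),
  }
);

const agent = createAgent({
  model: "openai:gpt-4o",                                // model identifier string
  tools: [getWeather],                                   // tools created with tool()
  systemPrompt: "You are a concise weather assistant.",  // assumed option name
});

const result = await agent.invoke({
  messages: [{ role: "user", content: "What's the weather in Paris?" }],
});
```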
The reasoning engine can be specified as a model identifier string such as "openai:gpt-4o" for simple setup. Tools give agents the ability to take actions: create them with the tool function, or use a ToolNode for custom error handling. A prompt shapes how your agent approaches tasks, middleware allows you to extend the agent's behavior, and responseFormat with a Zod schema gets you typed responses.

Creates a middleware instance with automatic schema inference.
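A sketch of the middleware factory (assuming createMiddleware is exported from "langchain" and accepts a name plus lifecycle hooks such as beforeModel; the hook signature shown is an assumption):

```ts
import { createMiddleware } from "langchain";

// Hypothetical logging middleware.
const loggingMiddleware = createMiddleware({
  name: "LoggingMiddleware",
  beforeModel: async (state) => {
    console.log(`Calling model with ${state.messages.length} messages`);
    return undefined; // no state update
  },
});
```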
Detect credit card numbers in content (validated with Luhn algorithm)
Detect email addresses in content
Detect IP addresses in content (validated)
Detect MAC addresses in content
Detect URLs in content
Dynamic System Prompt Middleware
Allows setting the system prompt dynamically right before each model invocation. Useful when the prompt depends on the current agent state or per-invocation context.
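A sketch of the idea (the export name dynamicSystemPromptMiddleware and the callback signature are assumptions):

```ts
import { createAgent, dynamicSystemPromptMiddleware } from "langchain";

// The callback runs right before each model invocation and can derive
// the prompt from the current agent state (here, the message count).
const agent = createAgent({
  model: "openai:gpt-4o",
  tools: [],
  middleware: [
    dynamicSystemPromptMiddleware((state) =>
      state.messages.length > 20
        ? "Be brief; this conversation is getting long."
        : "You are a helpful, detailed assistant."
    ),
  ],
});
```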
Middleware for selecting tools using an LLM-based strategy.
When an agent has many tools available, this middleware filters them down to only the most relevant ones for the user's query. This reduces token usage and helps the main model focus on the right tools.
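A sketch (the export name llmToolSelectorMiddleware and its options, model and maxTools, are assumptions):

```ts
import { createAgent, llmToolSelectorMiddleware } from "langchain";

// A smaller model pre-filters the tool list before the main model runs.
declare const manyTools: any[]; // assume a large array of tools defined elsewhere

const agent = createAgent({
  model: "openai:gpt-4o",
  tools: manyTools,
  middleware: [
    llmToolSelectorMiddleware({ model: "openai:gpt-4o-mini", maxTools: 3 }),
  ],
});
```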
Creates a middleware to limit the number of model calls at both thread and run levels.
This middleware helps prevent excessive model API calls by enforcing limits on how many times the model can be invoked. It supports two types of limits: a thread-level limit that persists across multiple runs of the agent, and a run-level limit that applies to the current invocation.
The middleware intercepts model requests before they are sent and checks the current call counts
against the configured limits. If either limit is exceeded, it throws a ModelCallLimitMiddlewareError
to stop execution and prevent further API calls.
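A sketch (the export name modelCallLimitMiddleware and the threadLimit/runLimit option names are assumptions):

```ts
import { createAgent, modelCallLimitMiddleware } from "langchain";

// Allow at most 10 model calls per thread and 5 per run; exceeding either
// limit stops execution before another request is sent.
const agent = createAgent({
  model: "openai:gpt-4o",
  tools: [],
  middleware: [modelCallLimitMiddleware({ threadLimit: 10, runLimit: 5 })],
});
```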
Middleware that provides automatic model fallback on errors.
This middleware attempts to retry failed model calls with alternative models in sequence. When a model call fails, it tries the next model in the fallback list until either a call succeeds or all models have been exhausted.
Middleware that automatically retries failed model calls with configurable backoff.
Supports retrying on specific exceptions and exponential backoff.
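A sketch combining the two (the export names modelRetryMiddleware and modelFallbackMiddleware, and their option shapes, are assumptions):

```ts
import {
  createAgent,
  modelFallbackMiddleware,
  modelRetryMiddleware,
} from "langchain";

const agent = createAgent({
  model: "openai:gpt-4o",
  tools: [],
  middleware: [
    // Retry the current model a few times with exponential backoff...
    modelRetryMiddleware({ maxRetries: 3, backoffFactor: 2 }),
    // ...then fall back to alternative models if it still fails.
    modelFallbackMiddleware(
      "openai:gpt-4o-mini",
      "anthropic:claude-3-5-sonnet-latest"
    ),
  ],
});
```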
Provider-specific middleware
Creates a middleware that detects and handles personally identifiable information (PII) in conversations.
This middleware detects common PII types and applies configurable strategies to handle them. It can detect emails, credit cards, IP addresses, MAC addresses, and URLs in both user input and agent output.
Built-in PII types:
- email: Email addresses
- credit_card: Credit card numbers (validated with Luhn algorithm)
- ip: IP addresses (validated)
- mac_address: MAC addresses
- url: URLs (both http/https and bare URLs)

Strategies:
- block: Raise an exception when PII is detected
- redact: Replace PII with [REDACTED_TYPE] placeholders
- mask: Partially mask PII (e.g., ****-****-****-1234 for a credit card)
- hash: Replace PII with a deterministic hash (e.g., <email_hash:a1b2c3d4>)

Strategy Selection Guide:
| Strategy | Preserves Identity? | Best For |
|---|---|---|
| block | N/A | Avoid PII completely |
| redact | No | General compliance, log sanitization |
| mask | No | Human readability, customer service UIs |
| hash | Yes (pseudonymous) | Analytics, debugging |
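A sketch of configuring per-type strategies (the export name piiMiddleware and the option shape, a map from PII type to strategy, are assumptions):

```ts
import { createAgent, piiMiddleware } from "langchain";

const agent = createAgent({
  model: "openai:gpt-4o",
  tools: [],
  middleware: [
    piiMiddleware({
      email: "redact",      // [REDACTED_EMAIL]
      credit_card: "mask",  // ****-****-****-1234
      ip: "hash",           // deterministic pseudonym, useful for analytics
      url: "block",         // throw if a URL appears
    }),
  ],
});
```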
Creates a provider strategy for structured output using native JSON schema support.
This function is used to configure structured output for agents when the underlying model
supports native JSON schema output (e.g., OpenAI's gpt-4o, gpt-4o-mini, and newer models).
Unlike toolStrategy, which uses function calling to extract structured output, providerStrategy
leverages the provider's native structured output capabilities, resulting in more efficient
and reliable schema enforcement.
When used with a model that supports JSON schema output, the model will return responses that directly conform to the provided schema without requiring tool calls. This is the recommended approach for structured output when your model supports it.
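A sketch (assuming providerStrategy is exported from "langchain" and that createAgent accepts it via a responseFormat option):

```ts
import { createAgent, providerStrategy } from "langchain";
import { z } from "zod";

const ContactInfo = z.object({
  name: z.string(),
  email: z.string(),
});

// The provider's native JSON-schema mode enforces the shape directly.
const agent = createAgent({
  model: "openai:gpt-4o",
  tools: [],
  responseFormat: providerStrategy(ContactInfo),
});

const result = await agent.invoke({
  messages: [{ role: "user", content: "Extract: John Doe, john@example.com" }],
});
// result.structuredResponse conforms to { name: string; email: string }
```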
Resolve a redaction rule to a concrete detector function
Summarization middleware that automatically summarizes conversation history when token limits are approached.
This middleware monitors message token counts and automatically summarizes older messages when a threshold is reached, preserving recent messages and maintaining context continuity by ensuring AI/Tool message pairs remain together.
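A sketch (the export name summarizationMiddleware and the option names shown are assumptions):

```ts
import { createAgent, summarizationMiddleware } from "langchain";

const agent = createAgent({
  model: "openai:gpt-4o",
  tools: [],
  middleware: [
    summarizationMiddleware({
      model: "openai:gpt-4o-mini",   // cheaper model writes the summary
      maxTokensBeforeSummary: 4000,  // threshold that triggers summarization
      messagesToKeep: 20,            // recent messages preserved verbatim
    }),
  ],
});
```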
Creates a middleware that provides todo list management capabilities to agents.
This middleware adds a write_todos tool that allows agents to create and manage
structured task lists for complex multi-step operations. It's designed to help
agents track progress, organize complex tasks, and provide users with visibility
into task completion status.
The middleware automatically injects system prompts that guide the agent on when
and how to use the todo functionality effectively. It also enforces that the
write_todos tool is called at most once per model turn, since the tool replaces
the entire todo list and parallel calls would create ambiguity about precedence.
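A sketch (the export name todoListMiddleware is an assumption):

```ts
import { createAgent, todoListMiddleware } from "langchain";

// Adds the write_todos tool plus guidance in the system prompt; the agent
// can then surface a structured task list while working through a request.
const agent = createAgent({
  model: "openai:gpt-4o",
  tools: [/* your domain tools */],
  middleware: [todoListMiddleware()],
});
```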
LangChain Tools
Middleware that tracks tool call counts and enforces limits.
This middleware monitors the number of tool calls made during agent execution and can terminate the agent when specified limits are reached. It supports both thread-level and run-level call counting with configurable exit behaviors.
Thread-level: The middleware counts all tool calls in the entire message history and persists this count across multiple runs (invocations) of the agent.
Run-level: The middleware counts tool calls made after the last HumanMessage, representing the current run (invocation) of the agent.
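A sketch (the export name toolCallLimitMiddleware and its options are assumptions):

```ts
import { createAgent, toolCallLimitMiddleware } from "langchain";

const agent = createAgent({
  model: "openai:gpt-4o",
  tools: [/* your tools */],
  middleware: [
    toolCallLimitMiddleware({
      threadLimit: 30,      // counted across the whole message history
      runLimit: 10,         // counted since the last HumanMessage
      exitBehavior: "end",  // stop gracefully instead of throwing
    }),
  ],
});
```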
Middleware that emulates specified tools using an LLM instead of executing them.
This middleware allows selective emulation of tools for testing purposes.
By default (when tools is undefined), all tools are emulated. You can specify
which tools to emulate by passing a list of tool names or tool instances.
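A sketch (the export name llmToolEmulatorMiddleware and its option shape are assumptions):

```ts
import { createAgent, llmToolEmulatorMiddleware } from "langchain";

// Emulate only the side-effecting tool during tests; leaving `tools`
// undefined would emulate all of them.
declare const searchTool: any, sendEmailTool: any; // assumed, defined elsewhere

const agent = createAgent({
  model: "openai:gpt-4o",
  tools: [searchTool, sendEmailTool],
  middleware: [llmToolEmulatorMiddleware({ tools: ["send_email"] })],
});
```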
Middleware that automatically retries failed tool calls with configurable backoff.
Supports retrying on specific exceptions and exponential backoff.
Creates a tool strategy for structured output using function calling.
This function configures structured output by converting schemas into function tools that
the model calls. Unlike providerStrategy, which uses native JSON schema support,
toolStrategy works with any model that supports function calling, making it more
widely compatible across providers and model versions.
The model will call a function with arguments matching your schema, and the agent will extract and validate the structured output from the tool call. This approach is automatically used when your model doesn't support native JSON schema output.
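A sketch (assuming toolStrategy is exported from "langchain" and accepts a Zod schema):

```ts
import { createAgent, toolStrategy } from "langchain";
import { z } from "zod";

const Sentiment = z.object({
  label: z.enum(["positive", "negative", "neutral"]),
  confidence: z.number(),
});

// The schema is exposed to the model as a function it must call; this works
// with any provider that supports function calling.
const agent = createAgent({
  model: "anthropic:claude-3-5-sonnet-latest",
  tools: [],
  responseFormat: toolStrategy(Sentiment),
});
```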
Initialize a ChatModel from the model name and provider. The integration package corresponding to the model provider must be installed.
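For example (the import path is an assumption; some versions export initChatModel from "langchain/chat_models/universal"):

```ts
import { initChatModel } from "langchain/chat_models/universal";

// Requires @langchain/openai to be installed for the "openai" provider.
const model = await initChatModel("gpt-4o", {
  modelProvider: "openai",
  temperature: 0,
});
const reply = await model.invoke("Say hello in one word.");
```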
LangChain Messages
Represents a chunk of an AI message, which can be concatenated with other AI message chunks.
Base class for all types of messages in a conversation. It includes
properties like content, name, and additional_kwargs. It also
includes methods like toDict() and _getType().
Represents a chunk of a message, which can be concatenated with other
message chunks. It provides a concat() method that merges the content and
additional keyword arguments of another BaseMessageChunk into this one.
Strategy for clearing tool outputs when token limits are exceeded.
This strategy mirrors Anthropic's clear_tool_uses_20250919 behavior by
replacing older tool results with a placeholder text when the conversation
grows too large. It preserves the most recent tool results and can exclude
specific tools from being cleared.
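A sketch of overriding the defaults (the constructor options shown, trigger and excludeTools, are assumptions):

```ts
import {
  createAgent,
  contextEditingMiddleware,
  ClearToolUsesEdit,
} from "langchain";

const agent = createAgent({
  model: "anthropic:claude-3-5-sonnet-latest",
  tools: [/* your tools */],
  middleware: [
    contextEditingMiddleware({
      edits: [
        new ClearToolUsesEdit({
          trigger: 50_000,              // clear once input exceeds ~50k tokens
          excludeTools: ["get_policy"], // never clear results from this tool
        }),
      ],
    }),
  ],
});
```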
Interface for interacting with a document.
A tool that can be created dynamically from a function, name, and description, designed to work with structured data. It extends the StructuredTool class and overrides the _call method to execute the provided function when the tool is called.
The schema can be passed as a Zod schema or a JSON schema. The tool will not validate input if a JSON schema is passed.
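For example, creating such a tool with the tool helper and a Zod schema (imported here from @langchain/core/tools):

```ts
import { tool } from "@langchain/core/tools";
import { z } from "zod";

// Input is validated against the Zod schema before the function runs.
const add = tool(
  async ({ a, b }) => String(a + b),
  {
    name: "add",
    description: "Add two numbers",
    schema: z.object({ a: z.number(), b: z.number() }),
  }
);

const result = await add.invoke({ a: 2, b: 3 }); // "5"
```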
A tool that can be created dynamically from a function, name, and description.
Fake chat model for testing tool calling functionality
Represents a human message in a conversation.
Represents a chunk of a human message, which can be concatenated with other human message chunks.
In-memory implementation of the BaseStore using a dictionary. Used for storing key-value pairs in memory.
Error thrown when a middleware fails.
Use MiddlewareError.wrap() to create instances. The constructor is private
to ensure that GraphBubbleUp errors (like GraphInterrupt) are never wrapped.
Raised when model returns multiple structured output tool calls when only one is expected.
Error thrown when PII is detected and strategy is 'block'
Raised when structured output tool call arguments fail to parse according to the schema.
Base class for Tools that accept input of any shape defined by a Zod schema.
Represents a system message in a conversation.
Represents a chunk of a system message, which can be concatenated with other system message chunks.
Base class for Tools that accept input as a string.
Exception raised when tool call limits are exceeded.
This exception is raised when the configured exit behavior is 'error' and either the thread or run tool call limit has been exceeded.
Raised when a tool call throws an error.
Represents a tool message in a conversation.
Represents a chunk of a tool message, which can be concatenated with other tool message chunks.
Information for tracking structured output tool metadata. This contains all necessary information to handle structured responses generated via tool calls, including the original schema, its type classification, and the corresponding tool implementation used by the tools strategy.
Creates a Human-in-the-Loop (HITL) middleware for tool approval and oversight.
This middleware intercepts tool calls made by an AI agent and provides human oversight capabilities before execution. It enables selective approval workflows where certain tools require human intervention while others can execute automatically.
An invocation result that has been interrupted by the middleware will have a __interrupt__
property that contains the interrupt request.
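Before the resume flow below can run, the middleware has to be attached to the agent. A minimal sketch (the export name humanInTheLoopMiddleware and the interruptOn option shape are assumptions):

```ts
import { createAgent, humanInTheLoopMiddleware } from "langchain";

declare const calculatorTool: any, writeFileTool: any; // assumed, defined elsewhere

const agent = createAgent({
  model: "openai:gpt-4o",
  tools: [calculatorTool, writeFileTool],
  middleware: [
    humanInTheLoopMiddleware({
      interruptOn: {
        write_file: true,  // pause and ask a human before writing files
        calculator: false, // safe to run automatically
      },
    }),
  ],
  // A checkpointer is required so the run can pause and later resume.
});
```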
import { type HITLRequest, type HITLResponse } from "langchain";
import { type Interrupt } from "langchain";
import { Command } from "@langchain/langgraph";
const result = await agent.invoke(request);
const interruptRequest = result.__interrupt__?.[0] as Interrupt<HITLRequest>;
// Examine the action requests and review configs
const actionRequests = interruptRequest.value.actionRequests;
const reviewConfigs = interruptRequest.value.reviewConfigs;
// Create decisions for each action
const resume: HITLResponse = {
decisions: actionRequests.map((action, i) => {
if (action.name === "calculator") {
return { type: "approve" };
} else if (action.name === "write_file") {
return {
type: "edit",
editedAction: { name: "write_file", args: { filename: "safe.txt", content: "Safe content" } }
};
}
return { type: "reject", message: "Action not allowed" };
})
};
// Resume with decisions
await agent.invoke(new Command({ resume }), config);
When a tool requires approval, the human operator can respond with:
- approve: Execute the tool with the original arguments
- edit: Modify the tool name and/or arguments before execution
- reject: Provide a manual response instead of executing the tool

Creates a middleware that detects and redacts personally identifiable information (PII) from messages before they are sent to model providers, and restores original values in model responses for tool execution.
The middleware intercepts agent execution at two points:
- Before the model call (wrapModelCall): detected PII values are replaced with placeholders. Each value is assigned an identifier via generateRedactionId() → "abc123" and rendered as [REDACTED_{RULE_NAME}_{ID}] → "[REDACTED_SSN_abc123]", while a mapping from identifiers to original values is retained, e.g. { "abc123": "123-45-6789" }.
- After the model call (afterModel): placeholders matching /\[REDACTED_[A-Z_]+_(\w+)\]/g are restored to their original values in tool calls and in the structuredResponse state field.

Example flow:
User Input: "My SSN is 123-45-6789"
↓ [beforeModel]
Model Request: "My SSN is [REDACTED_SSN_abc123]"
↓ [model invocation]
Model Response: tool_call({ "ssn": "[REDACTED_SSN_abc123]" })
↓ [afterModel]
Tool Execution: tool({ "ssn": "123-45-6789" })
This middleware provides model provider isolation only; PII may still be present elsewhere in the system.
For comprehensive PII protection, implement additional controls at the application, network, and storage layers.