Middleware¶
langchain.agents.middleware
¶
Entrypoint to using Middleware plugins with Agents.
Reference docs
This page contains reference documentation for Middleware. See the docs for conceptual guides, tutorials, and examples on using Middleware.
| CLASS | DESCRIPTION |
|---|---|
| `ContextEditingMiddleware` | Middleware that automatically prunes tool results to manage context size. |
| `HumanInTheLoopMiddleware` | Human in the loop middleware. |
| `LLMToolSelectorMiddleware` | Uses an LLM to select relevant tools before calling the main model. |
| `LLMToolEmulator` | Middleware that emulates specified tools using an LLM instead of executing them. |
| `ModelCallLimitMiddleware` | Middleware that tracks model call counts and enforces limits. |
| `ModelFallbackMiddleware` | Automatic fallback to alternative models on errors. |
| `PIIMiddleware` | Detect and handle Personally Identifiable Information (PII) in agent conversations. |
| `PIIDetectionError` | Raised when configured to block on detected sensitive values. |
| `SummarizationMiddleware` | Middleware that summarizes conversation history when token limits are approached. |
| `ToolCallLimitMiddleware` | Middleware that tracks tool call counts and enforces limits. |
| `AgentMiddleware` | Base middleware class for an agent. |
| `AgentState` | State schema for the agent. |
| `ClearToolUsesEdit` | Configuration for clearing tool outputs when token limits are exceeded. |
| `InterruptOnConfig` | Configuration for an action requiring human in the loop. |
| `ModelRequest` | Model request information for the agent. |
| `ModelResponse` | Response from model execution including messages and optional structured output. |
ContextEditingMiddleware
¶
Bases: AgentMiddleware
Middleware that automatically prunes tool results to manage context size.
The middleware applies a sequence of edits when the total input token count exceeds configured thresholds. Currently the `ClearToolUsesEdit` strategy is supported, aligning with Anthropic's `clear_tool_uses_20250919` behaviour.
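For example, a minimal setup might look like the sketch below (it assumes `ClearToolUsesEdit` takes a `trigger` token threshold; the value shown is illustrative):

    from langchain.agents import create_agent
    from langchain.agents.middleware import ClearToolUsesEdit, ContextEditingMiddleware

    # Clear old tool outputs once the prompt grows past the trigger threshold.
    middleware = ContextEditingMiddleware(
        edits=[ClearToolUsesEdit(trigger=100_000)],
    )
    agent = create_agent("openai:gpt-4o", middleware=[middleware])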
state_schema
class-attribute
instance-attribute
¶
state_schema: type[StateT] = cast('type[StateT]', AgentState)
The schema for state passed to the middleware nodes.
name
property
¶
name: str
The name of the middleware instance.
Defaults to the class name, but can be overridden for custom naming.
before_agent
¶
Logic to run before the agent execution starts.
abefore_agent
async
¶
Async logic to run before the agent execution starts.
before_model
¶
Logic to run before the model is called.
abefore_model
async
¶
Async logic to run before the model is called.
after_model
¶
Logic to run after the model is called.
aafter_model
async
¶
Async logic to run after the model is called.
after_agent
¶
Logic to run after the agent execution completes.
aafter_agent
async
¶
Async logic to run after the agent execution completes.
wrap_tool_call
¶
wrap_tool_call(
request: ToolCallRequest,
handler: Callable[[ToolCallRequest], ToolMessage | Command],
) -> ToolMessage | Command
Intercept tool execution for retries, monitoring, or modification.
Multiple middleware compose automatically (first defined = outermost).
Exceptions propagate unless `handle_tool_errors` is configured on `ToolNode`.

| PARAMETER | DESCRIPTION |
|---|---|
| `request` | Tool call request with call details. TYPE: `ToolCallRequest` |
| `handler` | Callable to execute the tool (can be called multiple times). TYPE: `Callable[[ToolCallRequest], ToolMessage \| Command]` |

| RETURNS | DESCRIPTION |
|---|---|
| `ToolMessage \| Command` | The final tool result or command. |
The handler callable can be invoked multiple times for retry logic. Each call to handler is independent and stateless.
Examples:
Modify request before execution:
    def wrap_tool_call(self, request, handler):
        request.tool_call["args"]["value"] *= 2
        return handler(request)

Retry on error (call handler multiple times):

    def wrap_tool_call(self, request, handler):
        for attempt in range(3):
            try:
                result = handler(request)
                if is_valid(result):
                    return result
            except Exception:
                if attempt == 2:
                    raise
        return result
Conditional retry based on response:
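A minimal sketch of what this might look like; the `status` check on `ToolMessage` is one way to detect a failed call:

    def wrap_tool_call(self, request, handler):
        result = handler(request)
        # Retry once if the tool reported an error (illustrative condition).
        if isinstance(result, ToolMessage) and result.status == "error":
            result = handler(request)
        return result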
awrap_tool_call
async
¶
awrap_tool_call(
request: ToolCallRequest,
handler: Callable[[ToolCallRequest], Awaitable[ToolMessage | Command]],
) -> ToolMessage | Command
Intercept and control async tool execution via handler callback.
The handler callback executes the tool call and returns a `ToolMessage` or `Command`. Middleware can call the handler multiple times for retry logic, skip calling it to short-circuit, or modify the request/response. Multiple middleware compose with the first in the list as the outermost layer.

| PARAMETER | DESCRIPTION |
|---|---|
| `request` | Tool call request with call details. TYPE: `ToolCallRequest` |
| `handler` | Async callable that executes the tool and returns a `ToolMessage` or `Command`. TYPE: `Callable[[ToolCallRequest], Awaitable[ToolMessage \| Command]]` |

| RETURNS | DESCRIPTION |
|---|---|
| `ToolMessage \| Command` | The final tool result or command. |
The handler callable can be invoked multiple times for retry logic. Each call to handler is independent and stateless.
Examples:
Async retry on error:
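A minimal async sketch, mirroring the sync retry example above:

    async def awrap_tool_call(self, request, handler):
        for attempt in range(3):
            try:
                return await handler(request)
            except Exception:
                if attempt == 2:
                    raise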
__init__
¶
__init__(
*,
edits: Iterable[ContextEdit] | None = None,
token_count_method: Literal["approximate", "model"] = "approximate"
) -> None
Initializes a context editing middleware instance.
| PARAMETER | DESCRIPTION |
|---|---|
| `edits` | Sequence of edit strategies to apply. Defaults to a single `ClearToolUsesEdit`. TYPE: `Iterable[ContextEdit] \| None` |
| `token_count_method` | Whether to use approximate token counting (faster, less accurate) or exact counting implemented by the chat model (potentially slower, more accurate). TYPE: `Literal["approximate", "model"]` |
wrap_model_call
¶
wrap_model_call(
request: ModelRequest, handler: Callable[[ModelRequest], ModelResponse]
) -> ModelCallResult
Apply context edits before invoking the model via handler.
awrap_model_call
async
¶
awrap_model_call(
request: ModelRequest, handler: Callable[[ModelRequest], Awaitable[ModelResponse]]
) -> ModelCallResult
Apply context edits before invoking the model via handler (async version).
HumanInTheLoopMiddleware
¶
Bases: AgentMiddleware
Human in the loop middleware.
state_schema
class-attribute
instance-attribute
¶
state_schema: type[StateT] = cast('type[StateT]', AgentState)
The schema for state passed to the middleware nodes.
name
property
¶
name: str
The name of the middleware instance.
Defaults to the class name, but can be overridden for custom naming.
before_agent
¶
Logic to run before the agent execution starts.
abefore_agent
async
¶
Async logic to run before the agent execution starts.
before_model
¶
Logic to run before the model is called.
abefore_model
async
¶
Async logic to run before the model is called.
aafter_model
async
¶
Async logic to run after the model is called.
wrap_model_call
¶
wrap_model_call(
request: ModelRequest, handler: Callable[[ModelRequest], ModelResponse]
) -> ModelCallResult
Intercept and control model execution via handler callback.
The handler callback executes the model request and returns a `ModelResponse`. Middleware can call the handler multiple times for retry logic, skip calling it to short-circuit, or modify the request/response. Multiple middleware compose with the first in the list as the outermost layer.

| PARAMETER | DESCRIPTION |
|---|---|
| `request` | Model request to execute (includes state and runtime). TYPE: `ModelRequest` |
| `handler` | Callback that executes the model request and returns a `ModelResponse`. TYPE: `Callable[[ModelRequest], ModelResponse]` |

| RETURNS | DESCRIPTION |
|---|---|
| `ModelCallResult` | The model call result. |
Examples:
Retry on error:
    def wrap_model_call(self, request, handler):
        for attempt in range(3):
            try:
                return handler(request)
            except Exception:
                if attempt == 2:
                    raise

Rewrite response:

    def wrap_model_call(self, request, handler):
        response = handler(request)
        ai_msg = response.result[0]
        return ModelResponse(
            result=[AIMessage(content=f"[{ai_msg.content}]")],
            structured_response=response.structured_response,
        )

Error to fallback:

    def wrap_model_call(self, request, handler):
        try:
            return handler(request)
        except Exception:
            return ModelResponse(result=[AIMessage(content="Service unavailable")])

Cache/short-circuit:

    def wrap_model_call(self, request, handler):
        if cached := get_cache(request):
            return cached  # Short-circuit with cached result
        response = handler(request)
        save_cache(request, response)
        return response
Simple AIMessage return (converted automatically):
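A sketch; per the label above, a bare `AIMessage` is converted to a `ModelCallResult` automatically (`should_skip` is a hypothetical guard):

    def wrap_model_call(self, request, handler):
        if should_skip(request):  # hypothetical guard
            return AIMessage(content="Model call skipped.")
        return handler(request)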
awrap_model_call
async
¶
awrap_model_call(
request: ModelRequest, handler: Callable[[ModelRequest], Awaitable[ModelResponse]]
) -> ModelCallResult
Intercept and control async model execution via handler callback.
The handler callback executes the model request and returns a `ModelResponse`. Middleware can call the handler multiple times for retry logic, skip calling it to short-circuit, or modify the request/response. Multiple middleware compose with the first in the list as the outermost layer.

| PARAMETER | DESCRIPTION |
|---|---|
| `request` | Model request to execute (includes state and runtime). TYPE: `ModelRequest` |
| `handler` | Async callback that executes the model request and returns a `ModelResponse`. TYPE: `Callable[[ModelRequest], Awaitable[ModelResponse]]` |

| RETURNS | DESCRIPTION |
|---|---|
| `ModelCallResult` | The model call result. |
Examples:
Retry on error:
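A minimal async sketch, mirroring the sync retry example:

    async def awrap_model_call(self, request, handler):
        for attempt in range(3):
            try:
                return await handler(request)
            except Exception:
                if attempt == 2:
                    raise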
after_agent
¶
Logic to run after the agent execution completes.
aafter_agent
async
¶
Async logic to run after the agent execution completes.
wrap_tool_call
¶
wrap_tool_call(
request: ToolCallRequest,
handler: Callable[[ToolCallRequest], ToolMessage | Command],
) -> ToolMessage | Command
Intercept tool execution for retries, monitoring, or modification.
Multiple middleware compose automatically (first defined = outermost).
Exceptions propagate unless `handle_tool_errors` is configured on `ToolNode`.

| PARAMETER | DESCRIPTION |
|---|---|
| `request` | Tool call request with call details. TYPE: `ToolCallRequest` |
| `handler` | Callable to execute the tool (can be called multiple times). TYPE: `Callable[[ToolCallRequest], ToolMessage \| Command]` |

| RETURNS | DESCRIPTION |
|---|---|
| `ToolMessage \| Command` | The final tool result or command. |
The handler callable can be invoked multiple times for retry logic. Each call to handler is independent and stateless.
Examples:
Modify request before execution:
    def wrap_tool_call(self, request, handler):
        request.tool_call["args"]["value"] *= 2
        return handler(request)

Retry on error (call handler multiple times):

    def wrap_tool_call(self, request, handler):
        for attempt in range(3):
            try:
                result = handler(request)
                if is_valid(result):
                    return result
            except Exception:
                if attempt == 2:
                    raise
        return result
Conditional retry based on response:
awrap_tool_call
async
¶
awrap_tool_call(
request: ToolCallRequest,
handler: Callable[[ToolCallRequest], Awaitable[ToolMessage | Command]],
) -> ToolMessage | Command
Intercept and control async tool execution via handler callback.
The handler callback executes the tool call and returns a `ToolMessage` or `Command`. Middleware can call the handler multiple times for retry logic, skip calling it to short-circuit, or modify the request/response. Multiple middleware compose with the first in the list as the outermost layer.

| PARAMETER | DESCRIPTION |
|---|---|
| `request` | Tool call request with call details. TYPE: `ToolCallRequest` |
| `handler` | Async callable that executes the tool and returns a `ToolMessage` or `Command`. TYPE: `Callable[[ToolCallRequest], Awaitable[ToolMessage \| Command]]` |

| RETURNS | DESCRIPTION |
|---|---|
| `ToolMessage \| Command` | The final tool result or command. |
The handler callable can be invoked multiple times for retry logic. Each call to handler is independent and stateless.
Examples:
Async retry on error:
__init__
¶
__init__(
interrupt_on: dict[str, bool | InterruptOnConfig],
*,
description_prefix: str = "Tool execution requires approval"
) -> None
Initialize the human in the loop middleware.
| PARAMETER | DESCRIPTION |
|---|---|
| `interrupt_on` | Mapping of tool name to allowed actions. If a tool doesn't have an entry, it's auto-approved by default. TYPE: `dict[str, bool \| InterruptOnConfig]` |
| `description_prefix` | The prefix to use when constructing action requests. This is used to provide context about the tool call and the action being requested. Not used if a tool provides its own description via `InterruptOnConfig`. TYPE: `str` |
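A minimal usage sketch (the tool names are illustrative; a `bool` requests the default approval flow, while an `InterruptOnConfig` customizes it):

    from langchain.agents import create_agent
    from langchain.agents.middleware import HumanInTheLoopMiddleware

    middleware = HumanInTheLoopMiddleware(
        interrupt_on={
            "send_email": True,    # pause for human approval
            "search_docs": False,  # auto-approve
        },
    )
    agent = create_agent(
        model="openai:gpt-4o",
        tools=[send_email, search_docs],  # illustrative tools
        middleware=[middleware],
    )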
LLMToolSelectorMiddleware
¶
Bases: AgentMiddleware
Uses an LLM to select relevant tools before calling the main model.
When an agent has many tools available, this middleware filters them down to only the most relevant ones for the user's query. This reduces token usage and helps the main model focus on the right tools.
Examples:
Limit to 3 tools:
    from langchain.agents import create_agent
    from langchain.agents.middleware import LLMToolSelectorMiddleware

    middleware = LLMToolSelectorMiddleware(max_tools=3)
    agent = create_agent(
        model="openai:gpt-4o",
        tools=[tool1, tool2, tool3, tool4, tool5],
        middleware=[middleware],
    )
Use a smaller model for selection:
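A sketch (the selection-model choice is illustrative):

    middleware = LLMToolSelectorMiddleware(
        model="openai:gpt-4o-mini",
        max_tools=3,
    )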
state_schema
class-attribute
instance-attribute
¶
state_schema: type[StateT] = cast('type[StateT]', AgentState)
The schema for state passed to the middleware nodes.
name
property
¶
name: str
The name of the middleware instance.
Defaults to the class name, but can be overridden for custom naming.
before_agent
¶
Logic to run before the agent execution starts.
abefore_agent
async
¶
Async logic to run before the agent execution starts.
before_model
¶
Logic to run before the model is called.
abefore_model
async
¶
Async logic to run before the model is called.
after_model
¶
Logic to run after the model is called.
aafter_model
async
¶
Async logic to run after the model is called.
after_agent
¶
Logic to run after the agent execution completes.
aafter_agent
async
¶
Async logic to run after the agent execution completes.
wrap_tool_call
¶
wrap_tool_call(
request: ToolCallRequest,
handler: Callable[[ToolCallRequest], ToolMessage | Command],
) -> ToolMessage | Command
Intercept tool execution for retries, monitoring, or modification.
Multiple middleware compose automatically (first defined = outermost).
Exceptions propagate unless `handle_tool_errors` is configured on `ToolNode`.

| PARAMETER | DESCRIPTION |
|---|---|
| `request` | Tool call request with call details. TYPE: `ToolCallRequest` |
| `handler` | Callable to execute the tool (can be called multiple times). TYPE: `Callable[[ToolCallRequest], ToolMessage \| Command]` |

| RETURNS | DESCRIPTION |
|---|---|
| `ToolMessage \| Command` | The final tool result or command. |
The handler callable can be invoked multiple times for retry logic. Each call to handler is independent and stateless.
Examples:
Modify request before execution:
    def wrap_tool_call(self, request, handler):
        request.tool_call["args"]["value"] *= 2
        return handler(request)

Retry on error (call handler multiple times):

    def wrap_tool_call(self, request, handler):
        for attempt in range(3):
            try:
                result = handler(request)
                if is_valid(result):
                    return result
            except Exception:
                if attempt == 2:
                    raise
        return result
Conditional retry based on response:
awrap_tool_call
async
¶
awrap_tool_call(
request: ToolCallRequest,
handler: Callable[[ToolCallRequest], Awaitable[ToolMessage | Command]],
) -> ToolMessage | Command
Intercept and control async tool execution via handler callback.
The handler callback executes the tool call and returns a `ToolMessage` or `Command`. Middleware can call the handler multiple times for retry logic, skip calling it to short-circuit, or modify the request/response. Multiple middleware compose with the first in the list as the outermost layer.

| PARAMETER | DESCRIPTION |
|---|---|
| `request` | Tool call request with call details. TYPE: `ToolCallRequest` |
| `handler` | Async callable that executes the tool and returns a `ToolMessage` or `Command`. TYPE: `Callable[[ToolCallRequest], Awaitable[ToolMessage \| Command]]` |

| RETURNS | DESCRIPTION |
|---|---|
| `ToolMessage \| Command` | The final tool result or command. |
The handler callable can be invoked multiple times for retry logic. Each call to handler is independent and stateless.
Examples:
Async retry on error:
__init__
¶
__init__(
*,
model: str | BaseChatModel | None = None,
system_prompt: str = DEFAULT_SYSTEM_PROMPT,
max_tools: int | None = None,
always_include: list[str] | None = None
) -> None
Initialize the tool selector.
| PARAMETER | DESCRIPTION |
|---|---|
| `model` | Model to use for selection. If not provided, uses the agent's main model. Can be a model identifier string or `BaseChatModel` instance. TYPE: `str \| BaseChatModel \| None` |
| `system_prompt` | Instructions for the selection model. TYPE: `str` |
| `max_tools` | Maximum number of tools to select. If the model selects more, only the first `max_tools` will be used. No limit if not specified. TYPE: `int \| None` |
| `always_include` | Tool names to always include regardless of selection. These do not count against the `max_tools` limit. TYPE: `list[str] \| None` |
wrap_model_call
¶
wrap_model_call(
request: ModelRequest, handler: Callable[[ModelRequest], ModelResponse]
) -> ModelCallResult
Filter tools based on LLM selection before invoking the model via handler.
awrap_model_call
async
¶
awrap_model_call(
request: ModelRequest, handler: Callable[[ModelRequest], Awaitable[ModelResponse]]
) -> ModelCallResult
Filter tools based on LLM selection before invoking the model via handler.
LLMToolEmulator
¶
Bases: AgentMiddleware
Middleware that emulates specified tools using an LLM instead of executing them.
This middleware allows selective emulation of tools for testing purposes. By default (when tools=None), all tools are emulated. You can specify which tools to emulate by passing a list of tool names or BaseTool instances.
Examples:
Emulate all tools (default behavior):
    from langchain.agents import create_agent
    from langchain.agents.middleware import LLMToolEmulator

    middleware = LLMToolEmulator()
    agent = create_agent(
        model="openai:gpt-4o",
        tools=[get_weather, get_user_location, calculator],
        middleware=[middleware],
    )
Emulate specific tools by name:
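A sketch, following the constructor parameters documented below:

    middleware = LLMToolEmulator(tools=["get_weather", "get_user_location"])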
Use a custom model for emulation:
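A sketch (any model identifier string or `BaseChatModel` instance works; the default is shown):

    middleware = LLMToolEmulator(model="anthropic:claude-3-5-sonnet-latest")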
Emulate specific tools by passing tool instances:
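A sketch (assumes `get_weather` is a `BaseTool` instance defined elsewhere):

    middleware = LLMToolEmulator(tools=[get_weather])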
state_schema
class-attribute
instance-attribute
¶
state_schema: type[StateT] = cast('type[StateT]', AgentState)
The schema for state passed to the middleware nodes.
name
property
¶
name: str
The name of the middleware instance.
Defaults to the class name, but can be overridden for custom naming.
before_agent
¶
Logic to run before the agent execution starts.
abefore_agent
async
¶
Async logic to run before the agent execution starts.
before_model
¶
Logic to run before the model is called.
abefore_model
async
¶
Async logic to run before the model is called.
after_model
¶
Logic to run after the model is called.
aafter_model
async
¶
Async logic to run after the model is called.
wrap_model_call
¶
wrap_model_call(
request: ModelRequest, handler: Callable[[ModelRequest], ModelResponse]
) -> ModelCallResult
Intercept and control model execution via handler callback.
The handler callback executes the model request and returns a `ModelResponse`. Middleware can call the handler multiple times for retry logic, skip calling it to short-circuit, or modify the request/response. Multiple middleware compose with the first in the list as the outermost layer.

| PARAMETER | DESCRIPTION |
|---|---|
| `request` | Model request to execute (includes state and runtime). TYPE: `ModelRequest` |
| `handler` | Callback that executes the model request and returns a `ModelResponse`. TYPE: `Callable[[ModelRequest], ModelResponse]` |

| RETURNS | DESCRIPTION |
|---|---|
| `ModelCallResult` | The model call result. |
Examples:
Retry on error:
    def wrap_model_call(self, request, handler):
        for attempt in range(3):
            try:
                return handler(request)
            except Exception:
                if attempt == 2:
                    raise

Rewrite response:

    def wrap_model_call(self, request, handler):
        response = handler(request)
        ai_msg = response.result[0]
        return ModelResponse(
            result=[AIMessage(content=f"[{ai_msg.content}]")],
            structured_response=response.structured_response,
        )

Error to fallback:

    def wrap_model_call(self, request, handler):
        try:
            return handler(request)
        except Exception:
            return ModelResponse(result=[AIMessage(content="Service unavailable")])

Cache/short-circuit:

    def wrap_model_call(self, request, handler):
        if cached := get_cache(request):
            return cached  # Short-circuit with cached result
        response = handler(request)
        save_cache(request, response)
        return response
Simple AIMessage return (converted automatically):
awrap_model_call
async
¶
awrap_model_call(
request: ModelRequest, handler: Callable[[ModelRequest], Awaitable[ModelResponse]]
) -> ModelCallResult
Intercept and control async model execution via handler callback.
The handler callback executes the model request and returns a `ModelResponse`. Middleware can call the handler multiple times for retry logic, skip calling it to short-circuit, or modify the request/response. Multiple middleware compose with the first in the list as the outermost layer.

| PARAMETER | DESCRIPTION |
|---|---|
| `request` | Model request to execute (includes state and runtime). TYPE: `ModelRequest` |
| `handler` | Async callback that executes the model request and returns a `ModelResponse`. TYPE: `Callable[[ModelRequest], Awaitable[ModelResponse]]` |

| RETURNS | DESCRIPTION |
|---|---|
| `ModelCallResult` | The model call result. |
Examples:
Retry on error:
after_agent
¶
Logic to run after the agent execution completes.
aafter_agent
async
¶
Async logic to run after the agent execution completes.
__init__
¶
__init__(
*,
tools: list[str | BaseTool] | None = None,
model: str | BaseChatModel | None = None
) -> None
Initialize the tool emulator.
| PARAMETER | DESCRIPTION |
|---|---|
| `tools` | List of tool names (`str`) or `BaseTool` instances to emulate. If `None` (default), ALL tools will be emulated. If empty list, no tools will be emulated. TYPE: `list[str \| BaseTool] \| None` |
| `model` | Model to use for emulation. Defaults to `"anthropic:claude-3-5-sonnet-latest"`. Can be a model identifier string or `BaseChatModel` instance. TYPE: `str \| BaseChatModel \| None` |
wrap_tool_call
¶
wrap_tool_call(
request: ToolCallRequest,
handler: Callable[[ToolCallRequest], ToolMessage | Command],
) -> ToolMessage | Command
Emulate tool execution using LLM if tool should be emulated.
| PARAMETER | DESCRIPTION |
|---|---|
| `request` | Tool call request to potentially emulate. TYPE: `ToolCallRequest` |
| `handler` | Callback to execute the tool (can be called multiple times). TYPE: `Callable[[ToolCallRequest], ToolMessage \| Command]` |

| RETURNS | DESCRIPTION |
|---|---|
| `ToolMessage \| Command` | `ToolMessage` with emulated response if the tool should be emulated, otherwise calls handler for normal execution. |
awrap_tool_call
async
¶
awrap_tool_call(
request: ToolCallRequest,
handler: Callable[[ToolCallRequest], Awaitable[ToolMessage | Command]],
) -> ToolMessage | Command
Async version of wrap_tool_call.
Emulate tool execution using LLM if tool should be emulated.
| PARAMETER | DESCRIPTION |
|---|---|
| `request` | Tool call request to potentially emulate. TYPE: `ToolCallRequest` |
| `handler` | Async callback to execute the tool (can be called multiple times). TYPE: `Callable[[ToolCallRequest], Awaitable[ToolMessage \| Command]]` |

| RETURNS | DESCRIPTION |
|---|---|
| `ToolMessage \| Command` | `ToolMessage` with emulated response if the tool should be emulated, otherwise calls handler for normal execution. |
ModelCallLimitMiddleware
¶
Bases: AgentMiddleware[ModelCallLimitState, Any]
Middleware that tracks model call counts and enforces limits.
This middleware monitors the number of model calls made during agent execution and can terminate the agent when specified limits are reached. It supports both thread-level and run-level call counting with configurable exit behaviors.
Thread-level: The middleware tracks the number of model calls and persists call count across multiple runs (invocations) of the agent.
Run-level: The middleware tracks the number of model calls made during a single run (invocation) of the agent.
Example
    from langchain.agents import create_agent
    from langchain.agents.middleware.call_tracking import ModelCallLimitMiddleware
    from langchain_core.messages import HumanMessage

    # Create middleware with limits
    call_tracker = ModelCallLimitMiddleware(thread_limit=10, run_limit=5, exit_behavior="end")

    agent = create_agent("openai:gpt-4o", middleware=[call_tracker])

    # Agent will automatically jump to end when limits are exceeded
    result = agent.invoke({"messages": [HumanMessage("Help me with a task")]})
name
property
¶
name: str
The name of the middleware instance.
Defaults to the class name, but can be overridden for custom naming.
before_agent
¶
Logic to run before the agent execution starts.
abefore_agent
async
¶
Async logic to run before the agent execution starts.
abefore_model
async
¶
Async logic to run before the model is called.
aafter_model
async
¶
Async logic to run after the model is called.
wrap_model_call
¶
wrap_model_call(
request: ModelRequest, handler: Callable[[ModelRequest], ModelResponse]
) -> ModelCallResult
Intercept and control model execution via handler callback.
The handler callback executes the model request and returns a `ModelResponse`. Middleware can call the handler multiple times for retry logic, skip calling it to short-circuit, or modify the request/response. Multiple middleware compose with the first in the list as the outermost layer.

| PARAMETER | DESCRIPTION |
|---|---|
| `request` | Model request to execute (includes state and runtime). TYPE: `ModelRequest` |
| `handler` | Callback that executes the model request and returns a `ModelResponse`. TYPE: `Callable[[ModelRequest], ModelResponse]` |

| RETURNS | DESCRIPTION |
|---|---|
| `ModelCallResult` | The model call result. |
Examples:
Retry on error:
    def wrap_model_call(self, request, handler):
        for attempt in range(3):
            try:
                return handler(request)
            except Exception:
                if attempt == 2:
                    raise

Rewrite response:

    def wrap_model_call(self, request, handler):
        response = handler(request)
        ai_msg = response.result[0]
        return ModelResponse(
            result=[AIMessage(content=f"[{ai_msg.content}]")],
            structured_response=response.structured_response,
        )

Error to fallback:

    def wrap_model_call(self, request, handler):
        try:
            return handler(request)
        except Exception:
            return ModelResponse(result=[AIMessage(content="Service unavailable")])

Cache/short-circuit:

    def wrap_model_call(self, request, handler):
        if cached := get_cache(request):
            return cached  # Short-circuit with cached result
        response = handler(request)
        save_cache(request, response)
        return response
Simple AIMessage return (converted automatically):
awrap_model_call
async
¶
awrap_model_call(
request: ModelRequest, handler: Callable[[ModelRequest], Awaitable[ModelResponse]]
) -> ModelCallResult
Intercept and control async model execution via handler callback.
The handler callback executes the model request and returns a `ModelResponse`. Middleware can call the handler multiple times for retry logic, skip calling it to short-circuit, or modify the request/response. Multiple middleware compose with the first in the list as the outermost layer.

| PARAMETER | DESCRIPTION |
|---|---|
| `request` | Model request to execute (includes state and runtime). TYPE: `ModelRequest` |
| `handler` | Async callback that executes the model request and returns a `ModelResponse`. TYPE: `Callable[[ModelRequest], Awaitable[ModelResponse]]` |

| RETURNS | DESCRIPTION |
|---|---|
| `ModelCallResult` | The model call result. |
Examples:
Retry on error:
after_agent
¶
Logic to run after the agent execution completes.
aafter_agent
async
¶
Async logic to run after the agent execution completes.
wrap_tool_call
¶
wrap_tool_call(
request: ToolCallRequest,
handler: Callable[[ToolCallRequest], ToolMessage | Command],
) -> ToolMessage | Command
Intercept tool execution for retries, monitoring, or modification.
Multiple middleware compose automatically (first defined = outermost).
Exceptions propagate unless `handle_tool_errors` is configured on `ToolNode`.

| PARAMETER | DESCRIPTION |
|---|---|
| `request` | Tool call request with call details. TYPE: `ToolCallRequest` |
| `handler` | Callable to execute the tool (can be called multiple times). TYPE: `Callable[[ToolCallRequest], ToolMessage \| Command]` |

| RETURNS | DESCRIPTION |
|---|---|
| `ToolMessage \| Command` | The final tool result or command. |
The handler callable can be invoked multiple times for retry logic. Each call to handler is independent and stateless.
Examples:
Modify request before execution:
    def wrap_tool_call(self, request, handler):
        request.tool_call["args"]["value"] *= 2
        return handler(request)

Retry on error (call handler multiple times):

    def wrap_tool_call(self, request, handler):
        for attempt in range(3):
            try:
                result = handler(request)
                if is_valid(result):
                    return result
            except Exception:
                if attempt == 2:
                    raise
        return result
Conditional retry based on response:
awrap_tool_call
async
¶
awrap_tool_call(
request: ToolCallRequest,
handler: Callable[[ToolCallRequest], Awaitable[ToolMessage | Command]],
) -> ToolMessage | Command
Intercept and control async tool execution via handler callback.
The handler callback executes the tool call and returns a `ToolMessage` or `Command`. Middleware can call the handler multiple times for retry logic, skip calling it to short-circuit, or modify the request/response. Multiple middleware compose with the first in the list as the outermost layer.

| PARAMETER | DESCRIPTION |
|---|---|
| `request` | Tool call request with call details. TYPE: `ToolCallRequest` |
| `handler` | Async callable that executes the tool and returns a `ToolMessage` or `Command`. TYPE: `Callable[[ToolCallRequest], Awaitable[ToolMessage \| Command]]` |

| RETURNS | DESCRIPTION |
|---|---|
| `ToolMessage \| Command` | The final tool result or command. |
The handler callable can be invoked multiple times for retry logic. Each call to handler is independent and stateless.
Examples:
Async retry on error:
state_schema
class-attribute
instance-attribute
¶
The schema for state passed to the middleware nodes.
__init__
¶
__init__(
*,
thread_limit: int | None = None,
run_limit: int | None = None,
exit_behavior: Literal["end", "error"] = "end"
) -> None
Initialize the call tracking middleware.
| PARAMETER | DESCRIPTION |
|---|---|
| `thread_limit` | Maximum number of model calls allowed per thread. `None` means no limit. TYPE: `int \| None` |
| `run_limit` | Maximum number of model calls allowed per run. `None` means no limit. TYPE: `int \| None` |
| `exit_behavior` | What to do when limits are exceeded. `"end"`: jump to the end of the agent execution and inject an artificial AI message indicating that the limit was exceeded. `"error"`: raise a `ModelCallLimitExceededError`. TYPE: `Literal["end", "error"]` |

| RAISES | DESCRIPTION |
|---|---|
| `ValueError` | If both limits are `None`. |
before_model
¶
Check model call limits before making a model call.
| PARAMETER | DESCRIPTION |
|---|---|
| `state` | The current agent state containing call counts. |
| `runtime` | The langgraph runtime. |

| RETURNS | DESCRIPTION |
|---|---|
| `dict[str, Any] \| None` | If limits are exceeded and `exit_behavior` is `"end"`, returns a `Command` to jump to the end with a limit exceeded message. Otherwise returns `None`. |

| RAISES | DESCRIPTION |
|---|---|
| `ModelCallLimitExceededError` | If limits are exceeded and `exit_behavior` is `"error"`. |
after_model
¶
Increment model call counts after a model call.
ModelFallbackMiddleware
¶
Bases: AgentMiddleware
Automatic fallback to alternative models on errors.
Retries failed model calls with alternative models in sequence until one succeeds or all models are exhausted. The primary model is specified in create_agent().
Example
    from langchain.agents import create_agent
    from langchain.agents.middleware.model_fallback import ModelFallbackMiddleware
    from langchain_core.messages import HumanMessage

    fallback = ModelFallbackMiddleware(
        "openai:gpt-4o-mini",  # Try first on error
        "anthropic:claude-3-5-sonnet-20241022",  # Then this
    )
    agent = create_agent(
        model="openai:gpt-4o",  # Primary model
        middleware=[fallback],
    )

    # If primary fails: tries gpt-4o-mini, then claude-3-5-sonnet
    result = agent.invoke({"messages": [HumanMessage("Hello")]})
state_schema
class-attribute
instance-attribute
¶
state_schema: type[StateT] = cast('type[StateT]', AgentState)
The schema for state passed to the middleware nodes.
name
property
¶
name: str
The name of the middleware instance.
Defaults to the class name, but can be overridden for custom naming.
before_agent
¶
Logic to run before the agent execution starts.
abefore_agent
async
¶
Async logic to run before the agent execution starts.
before_model
¶
Logic to run before the model is called.
abefore_model
async
¶
Async logic to run before the model is called.
after_model
¶
Logic to run after the model is called.
aafter_model
async
¶
Async logic to run after the model is called.
after_agent
¶
Logic to run after the agent execution completes.
aafter_agent
async
¶
Async logic to run after the agent execution completes.
wrap_tool_call
¶
wrap_tool_call(
request: ToolCallRequest,
handler: Callable[[ToolCallRequest], ToolMessage | Command],
) -> ToolMessage | Command
Intercept tool execution for retries, monitoring, or modification.
Multiple middleware compose automatically (first defined = outermost).
Exceptions propagate unless `handle_tool_errors` is configured on `ToolNode`.

| PARAMETER | DESCRIPTION |
|---|---|
| `request` | Tool call request with call details. TYPE: `ToolCallRequest` |
| `handler` | Callable to execute the tool (can be called multiple times). TYPE: `Callable[[ToolCallRequest], ToolMessage \| Command]` |

| RETURNS | DESCRIPTION |
|---|---|
| `ToolMessage \| Command` | The final tool result or command. |
The handler callable can be invoked multiple times for retry logic. Each call to handler is independent and stateless.
Examples:
Modify request before execution:
    def wrap_tool_call(self, request, handler):
        request.tool_call["args"]["value"] *= 2
        return handler(request)

Retry on error (call handler multiple times):

    def wrap_tool_call(self, request, handler):
        for attempt in range(3):
            try:
                result = handler(request)
                if is_valid(result):
                    return result
            except Exception:
                if attempt == 2:
                    raise
        return result
Conditional retry based on response:
awrap_tool_call
async
¶
awrap_tool_call(
request: ToolCallRequest,
handler: Callable[[ToolCallRequest], Awaitable[ToolMessage | Command]],
) -> ToolMessage | Command
Intercept and control async tool execution via handler callback.
The handler callback executes the tool call and returns a `ToolMessage` or `Command`. Middleware can call the handler multiple times for retry logic, skip calling it to short-circuit, or modify the request/response. Multiple middleware compose with the first in the list as the outermost layer.

| PARAMETER | DESCRIPTION |
|---|---|
| `request` | Tool call request with call details. TYPE: `ToolCallRequest` |
| `handler` | Async callable that executes the tool and returns a `ToolMessage` or `Command`. TYPE: `Callable[[ToolCallRequest], Awaitable[ToolMessage \| Command]]` |

| RETURNS | DESCRIPTION |
|---|---|
| `ToolMessage \| Command` | The final tool result or command. |
The handler callable can be invoked multiple times for retry logic. Each call to handler is independent and stateless.
Examples:
Async retry on error:
__init__
¶
__init__(
first_model: str | BaseChatModel, *additional_models: str | BaseChatModel
) -> None
Initialize model fallback middleware.
| PARAMETER | DESCRIPTION |
|---|---|
| `first_model` | First fallback model (string name or instance). TYPE: `str \| BaseChatModel` |
| `*additional_models` | Additional fallbacks in order. TYPE: `str \| BaseChatModel` |
wrap_model_call
¶
wrap_model_call(
request: ModelRequest, handler: Callable[[ModelRequest], ModelResponse]
) -> ModelCallResult
Try fallback models in sequence on errors.
| PARAMETER | DESCRIPTION |
|---|---|
| `request` | Initial model request. TYPE: `ModelRequest` |
| `handler` | Callback to execute the model. TYPE: `Callable[[ModelRequest], ModelResponse]` |

| RETURNS | DESCRIPTION |
|---|---|
| `ModelCallResult` | AIMessage from successful model call. |

| RAISES | DESCRIPTION |
|---|---|
| `Exception` | If all models fail, re-raises last exception. |
awrap_model_call
async
¶
awrap_model_call(
request: ModelRequest, handler: Callable[[ModelRequest], Awaitable[ModelResponse]]
) -> ModelCallResult
Try fallback models in sequence on errors (async version).
| PARAMETER | DESCRIPTION |
|---|---|
| `request` | Initial model request. TYPE: `ModelRequest` |
| `handler` | Async callback to execute the model. TYPE: `Callable[[ModelRequest], Awaitable[ModelResponse]]` |

| RETURNS | DESCRIPTION |
|---|---|
| `ModelCallResult` | AIMessage from successful model call. |

| RAISES | DESCRIPTION |
|---|---|
| `Exception` | If all models fail, re-raises last exception. |
PIIMiddleware
¶
Bases: AgentMiddleware
Detect and handle Personally Identifiable Information (PII) in agent conversations.
This middleware detects common PII types and applies configurable strategies to handle them. It can detect emails, credit cards, IP addresses, MAC addresses, and URLs in both user input and agent output.
Built-in PII types:

- `email`: Email addresses
- `credit_card`: Credit card numbers (validated with Luhn algorithm)
- `ip`: IP addresses (validated with stdlib)
- `mac_address`: MAC addresses
- `url`: URLs (both `http`/`https` and bare URLs)

Strategies:

- `block`: Raise an exception when PII is detected
- `redact`: Replace PII with `[REDACTED_TYPE]` placeholders
- `mask`: Partially mask PII (e.g., `****-****-****-1234` for credit card)
- `hash`: Replace PII with deterministic hash (e.g., `<email_hash:a1b2c3d4>`)
Strategy Selection Guide:
| Strategy | Preserves Identity? | Best For |
|---|---|---|
| `block` | N/A | Avoid PII completely |
| `redact` | No | General compliance, log sanitization |
| `mask` | No | Human readability, customer service UIs |
| `hash` | Yes (pseudonymous) | Analytics, debugging |
Example
    from langchain.agents.middleware import PIIMiddleware
    from langchain.agents import create_agent

    # Redact all emails in user input
    agent = create_agent(
        "openai:gpt-5",
        middleware=[
            PIIMiddleware("email", strategy="redact"),
        ],
    )

    # Use different strategies for different PII types
    agent = create_agent(
        "openai:gpt-4o",
        middleware=[
            PIIMiddleware("credit_card", strategy="mask"),
            PIIMiddleware("url", strategy="redact"),
            PIIMiddleware("ip", strategy="hash"),
        ],
    )

    # Custom PII type with regex
    agent = create_agent(
        "openai:gpt-5",
        middleware=[
            PIIMiddleware("api_key", detector=r"sk-[a-zA-Z0-9]{32}", strategy="block"),
        ],
    )
state_schema
class-attribute
instance-attribute
¶
state_schema: type[StateT] = cast('type[StateT]', AgentState)
The schema for state passed to the middleware nodes.
before_agent
¶
Logic to run before the agent execution starts.
abefore_agent
async
¶
Async logic to run before the agent execution starts.
abefore_model
async
¶
Async logic to run before the model is called.
aafter_model
async
¶
Async logic to run after the model is called.
wrap_model_call
¶
wrap_model_call(
request: ModelRequest, handler: Callable[[ModelRequest], ModelResponse]
) -> ModelCallResult
Intercept and control model execution via handler callback.
The handler callback executes the model request and returns a `ModelResponse`. Middleware can call the handler multiple times for retry logic, skip calling it to short-circuit, or modify the request/response. Multiple middleware compose with the first in the list as the outermost layer.

| PARAMETER | DESCRIPTION |
|---|---|
| `request` | Model request to execute (includes state and runtime). TYPE: `ModelRequest` |
| `handler` | Callback that executes the model request and returns a `ModelResponse`. TYPE: `Callable[[ModelRequest], ModelResponse]` |

| RETURNS | DESCRIPTION |
|---|---|
| `ModelCallResult` | The model call result. |
Examples:
Retry on error:
    def wrap_model_call(self, request, handler):
        for attempt in range(3):
            try:
                return handler(request)
            except Exception:
                if attempt == 2:
                    raise

Rewrite response:

    def wrap_model_call(self, request, handler):
        response = handler(request)
        ai_msg = response.result[0]
        return ModelResponse(
            result=[AIMessage(content=f"[{ai_msg.content}]")],
            structured_response=response.structured_response,
        )

Error to fallback:

    def wrap_model_call(self, request, handler):
        try:
            return handler(request)
        except Exception:
            return ModelResponse(result=[AIMessage(content="Service unavailable")])

Cache/short-circuit:

    def wrap_model_call(self, request, handler):
        if cached := get_cache(request):
            return cached  # Short-circuit with cached result
        response = handler(request)
        save_cache(request, response)
        return response
Simple AIMessage return (converted automatically):
awrap_model_call
async
¶
awrap_model_call(
request: ModelRequest, handler: Callable[[ModelRequest], Awaitable[ModelResponse]]
) -> ModelCallResult
Intercept and control async model execution via handler callback.
The handler callback executes the model request and returns a `ModelResponse`. Middleware can call the handler multiple times for retry logic, skip calling it to short-circuit, or modify the request/response. Multiple middleware compose with the first in the list as the outermost layer.

| PARAMETER | DESCRIPTION |
|---|---|
| `request` | Model request to execute (includes state and runtime). TYPE: `ModelRequest` |
| `handler` | Async callback that executes the model request and returns a `ModelResponse`. TYPE: `Callable[[ModelRequest], Awaitable[ModelResponse]]` |

| RETURNS | DESCRIPTION |
|---|---|
| `ModelCallResult` | The model call result. |
Examples:
Retry on error:
after_agent
¶
Logic to run after the agent execution completes.
aafter_agent
async
¶
Async logic to run after the agent execution completes.
wrap_tool_call
¶
wrap_tool_call(
request: ToolCallRequest,
handler: Callable[[ToolCallRequest], ToolMessage | Command],
) -> ToolMessage | Command
Intercept tool execution for retries, monitoring, or modification.
Multiple middleware compose automatically (first defined = outermost).
Exceptions propagate unless `handle_tool_errors` is configured on `ToolNode`.

| PARAMETER | DESCRIPTION |
|---|---|
| `request` | Tool call request with call details. TYPE: `ToolCallRequest` |
| `handler` | Callable to execute the tool (can be called multiple times). TYPE: `Callable[[ToolCallRequest], ToolMessage \| Command]` |

| RETURNS | DESCRIPTION |
|---|---|
| `ToolMessage \| Command` | The final tool result or command. |
The handler callable can be invoked multiple times for retry logic. Each call to handler is independent and stateless.
Examples:
Modify request before execution:
    def wrap_tool_call(self, request, handler):
        request.tool_call["args"]["value"] *= 2
        return handler(request)

Retry on error (call handler multiple times):

    def wrap_tool_call(self, request, handler):
        for attempt in range(3):
            try:
                result = handler(request)
                if is_valid(result):
                    return result
            except Exception:
                if attempt == 2:
                    raise
        return result
Conditional retry based on response:
awrap_tool_call
async
¶
awrap_tool_call(
request: ToolCallRequest,
handler: Callable[[ToolCallRequest], Awaitable[ToolMessage | Command]],
) -> ToolMessage | Command
Intercept and control async tool execution via handler callback.
The handler callback executes the tool call and returns a `ToolMessage` or `Command`. Middleware can call the handler multiple times for retry logic, skip calling it to short-circuit, or modify the request/response. Multiple middleware compose with the first in the list as the outermost layer.

| PARAMETER | DESCRIPTION |
|---|---|
| `request` | Tool call request with call details. TYPE: `ToolCallRequest` |
| `handler` | Async callable that executes the tool and returns a `ToolMessage` or `Command`. TYPE: `Callable[[ToolCallRequest], Awaitable[ToolMessage \| Command]]` |

| RETURNS | DESCRIPTION |
|---|---|
| `ToolMessage \| Command` | The final tool result or command. |
The handler callable can be invoked multiple times for retry logic. Each call to handler is independent and stateless.
Examples:
Async retry on error:
__init__
¶
__init__(
pii_type: Literal["email", "credit_card", "ip", "mac_address", "url"] | str,
*,
strategy: Literal["block", "redact", "mask", "hash"] = "redact",
detector: Callable[[str], list[PIIMatch]] | str | None = None,
apply_to_input: bool = True,
apply_to_output: bool = False,
apply_to_tool_results: bool = False
) -> None
Initialize the PII detection middleware.
| PARAMETER | DESCRIPTION |
|---|---|
| `pii_type` | Type of PII to detect. Can be a built-in type (`email`, `credit_card`, `ip`, `mac_address`, `url`) or a custom type name (which requires a `detector`). TYPE: `Literal["email", "credit_card", "ip", "mac_address", "url"] \| str` |
| `strategy` | How to handle detected PII: one of `"block"`, `"redact"`, `"mask"`, or `"hash"`. TYPE: `Literal["block", "redact", "mask", "hash"]` |
| `detector` | Custom detector function or regex pattern. TYPE: `Callable[[str], list[PIIMatch]] \| str \| None` |
| `apply_to_input` | Whether to check user messages before model call. TYPE: `bool` |
| `apply_to_output` | Whether to check AI messages after model call. TYPE: `bool` |
| `apply_to_tool_results` | Whether to check tool result messages after tool execution. TYPE: `bool` |

| RAISES | DESCRIPTION |
|---|---|
| `ValueError` | If `pii_type` is not built-in and no `detector` is provided. |
before_model
¶
before_model(state: AgentState, runtime: Runtime) -> dict[str, Any] | None
Check user messages and tool results for PII before model invocation.
| PARAMETER | DESCRIPTION |
|---|---|
| `state` | The current agent state. TYPE: `AgentState` |
| `runtime` | The langgraph runtime. TYPE: `Runtime` |

| RETURNS | DESCRIPTION |
|---|---|
| `dict[str, Any] \| None` | Updated state with PII handled according to strategy, or `None` if no PII detected. |

| RAISES | DESCRIPTION |
|---|---|
| `PIIDetectionError` | If PII is detected and strategy is `"block"`. |
after_model
¶
after_model(state: AgentState, runtime: Runtime) -> dict[str, Any] | None
Check AI messages for PII after model invocation.
| PARAMETER | DESCRIPTION |
|---|---|
| `state` | The current agent state. TYPE: `AgentState` |
| `runtime` | The langgraph runtime. TYPE: `Runtime` |

| RETURNS | DESCRIPTION |
|---|---|
| `dict[str, Any] \| None` | Updated state with PII handled according to strategy, or `None` if no PII detected. |

| RAISES | DESCRIPTION |
|---|---|
| `PIIDetectionError` | If PII is detected and strategy is `"block"`. |
SummarizationMiddleware
¶
Bases: AgentMiddleware
Middleware that summarizes conversation history when token limits are approached.
This middleware monitors message token counts and automatically summarizes older messages when a threshold is reached, preserving recent messages and maintaining context continuity by ensuring AI/Tool message pairs remain together.
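A minimal usage sketch (the threshold and keep-count values are illustrative; the parameters are documented under `__init__` below):

    from langchain.agents import create_agent
    from langchain.agents.middleware import SummarizationMiddleware

    middleware = SummarizationMiddleware(
        model="openai:gpt-4o-mini",           # model used to write summaries
        max_tokens_before_summary=4000,       # summarize once history exceeds this
        messages_to_keep=20,                  # recent messages preserved verbatim
    )
    agent = create_agent(model="openai:gpt-4o", middleware=[middleware])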
state_schema
class-attribute
instance-attribute
¶
state_schema: type[StateT] = cast('type[StateT]', AgentState)
The schema for state passed to the middleware nodes.
name
property
¶
name: str
The name of the middleware instance.
Defaults to the class name, but can be overridden for custom naming.
before_agent
¶
Logic to run before the agent execution starts.
abefore_agent
async
¶
Async logic to run before the agent execution starts.
abefore_model
async
¶
Async logic to run before the model is called.
after_model
¶
Logic to run after the model is called.
aafter_model
async
¶
Async logic to run after the model is called.
wrap_model_call
¶
wrap_model_call(
request: ModelRequest, handler: Callable[[ModelRequest], ModelResponse]
) -> ModelCallResult
Intercept and control model execution via handler callback.
The handler callback executes the model request and returns a `ModelResponse`. Middleware can call the handler multiple times for retry logic, skip calling it to short-circuit, or modify the request/response. Multiple middleware compose with the first in the list as the outermost layer.

| PARAMETER | DESCRIPTION |
|---|---|
| `request` | Model request to execute (includes state and runtime). TYPE: `ModelRequest` |
| `handler` | Callback that executes the model request and returns a `ModelResponse`. TYPE: `Callable[[ModelRequest], ModelResponse]` |

| RETURNS | DESCRIPTION |
|---|---|
| `ModelCallResult` | The model call result. |
Examples:
Retry on error:
    def wrap_model_call(self, request, handler):
        for attempt in range(3):
            try:
                return handler(request)
            except Exception:
                if attempt == 2:
                    raise

Rewrite response:

    def wrap_model_call(self, request, handler):
        response = handler(request)
        ai_msg = response.result[0]
        return ModelResponse(
            result=[AIMessage(content=f"[{ai_msg.content}]")],
            structured_response=response.structured_response,
        )

Error to fallback:

    def wrap_model_call(self, request, handler):
        try:
            return handler(request)
        except Exception:
            return ModelResponse(result=[AIMessage(content="Service unavailable")])

Cache/short-circuit:

    def wrap_model_call(self, request, handler):
        if cached := get_cache(request):
            return cached  # Short-circuit with cached result
        response = handler(request)
        save_cache(request, response)
        return response
Simple AIMessage return (converted automatically):
awrap_model_call
async
¶
awrap_model_call(
request: ModelRequest, handler: Callable[[ModelRequest], Awaitable[ModelResponse]]
) -> ModelCallResult
Intercept and control async model execution via handler callback.
The handler callback executes the model request and returns a `ModelResponse`. Middleware can call the handler multiple times for retry logic, skip calling it to short-circuit, or modify the request/response. Multiple middleware compose with the first in the list as the outermost layer.

| PARAMETER | DESCRIPTION |
|---|---|
| `request` | Model request to execute (includes state and runtime). TYPE: `ModelRequest` |
| `handler` | Async callback that executes the model request and returns a `ModelResponse`. TYPE: `Callable[[ModelRequest], Awaitable[ModelResponse]]` |

| RETURNS | DESCRIPTION |
|---|---|
| `ModelCallResult` | The model call result. |
Examples:
Retry on error:
after_agent
¶
Logic to run after the agent execution completes.
aafter_agent
async
¶
Async logic to run after the agent execution completes.
wrap_tool_call
¶
wrap_tool_call(
request: ToolCallRequest,
handler: Callable[[ToolCallRequest], ToolMessage | Command],
) -> ToolMessage | Command
Intercept tool execution for retries, monitoring, or modification.
Multiple middleware compose automatically (first defined = outermost).
Exceptions propagate unless `handle_tool_errors` is configured on `ToolNode`.

| PARAMETER | DESCRIPTION |
|---|---|
| `request` | Tool call request with call details. TYPE: `ToolCallRequest` |
| `handler` | Callable to execute the tool (can be called multiple times). TYPE: `Callable[[ToolCallRequest], ToolMessage \| Command]` |

| RETURNS | DESCRIPTION |
|---|---|
| `ToolMessage \| Command` | The final tool result or command. |
The handler callable can be invoked multiple times for retry logic. Each call to handler is independent and stateless.
Examples:
Modify request before execution:
    def wrap_tool_call(self, request, handler):
        request.tool_call["args"]["value"] *= 2
        return handler(request)

Retry on error (call handler multiple times):

    def wrap_tool_call(self, request, handler):
        for attempt in range(3):
            try:
                result = handler(request)
                if is_valid(result):
                    return result
            except Exception:
                if attempt == 2:
                    raise
        return result
Conditional retry based on response:
awrap_tool_call
async
¶
awrap_tool_call(
request: ToolCallRequest,
handler: Callable[[ToolCallRequest], Awaitable[ToolMessage | Command]],
) -> ToolMessage | Command
Intercept and control async tool execution via handler callback.
The handler callback executes the tool call and returns a `ToolMessage` or `Command`. Middleware can call the handler multiple times for retry logic, skip calling it to short-circuit, or modify the request/response. Multiple middleware compose with the first in the list as the outermost layer.

| PARAMETER | DESCRIPTION |
|---|---|
| `request` | Tool call request with call details. TYPE: `ToolCallRequest` |
| `handler` | Async callable that executes the tool and returns a `ToolMessage` or `Command`. TYPE: `Callable[[ToolCallRequest], Awaitable[ToolMessage \| Command]]` |

| RETURNS | DESCRIPTION |
|---|---|
| `ToolMessage \| Command` | The final tool result or command. |
The handler callable can be invoked multiple times for retry logic. Each call to handler is independent and stateless.
Examples:
Async retry on error:
__init__
¶
__init__(
model: str | BaseChatModel,
max_tokens_before_summary: int | None = None,
messages_to_keep: int = _DEFAULT_MESSAGES_TO_KEEP,
token_counter: TokenCounter = count_tokens_approximately,
summary_prompt: str = DEFAULT_SUMMARY_PROMPT,
summary_prefix: str = SUMMARY_PREFIX,
) -> None
Initialize the summarization middleware.
| PARAMETER | DESCRIPTION |
|---|---|
| `model` | The language model to use for generating summaries. TYPE: `str \| BaseChatModel` |
| `max_tokens_before_summary` | Token threshold to trigger summarization. If `None`, summarization is never triggered. TYPE: `int \| None` |
| `messages_to_keep` | Number of recent messages to preserve after summarization. TYPE: `int` |
| `token_counter` | Function to count tokens in messages. TYPE: `TokenCounter` |
| `summary_prompt` | Prompt template for generating summaries. TYPE: `str` |
| `summary_prefix` | Prefix added to system message when including summary. TYPE: `str` |
ToolCallLimitMiddleware
¶
Bases: AgentMiddleware[ToolCallLimitState, Any]
Middleware that tracks tool call counts and enforces limits.
This middleware monitors the number of tool calls made during agent execution and can terminate the agent when specified limits are reached. It supports both thread-level and run-level call counting with configurable exit behaviors.
Thread-level: The middleware tracks the total number of tool calls and persists call count across multiple runs (invocations) of the agent.
Run-level: The middleware tracks the number of tool calls made during a single run (invocation) of the agent.
Example
    from langchain.agents import create_agent
    from langchain.agents.middleware.tool_call_limit import ToolCallLimitMiddleware
    from langchain_core.messages import HumanMessage

    # Limit all tool calls globally
    global_limiter = ToolCallLimitMiddleware(thread_limit=20, run_limit=10, exit_behavior="end")

    # Limit a specific tool
    search_limiter = ToolCallLimitMiddleware(
        tool_name="search", thread_limit=5, run_limit=3, exit_behavior="end"
    )

    # Use both in the same agent
    agent = create_agent("openai:gpt-4o", middleware=[global_limiter, search_limiter])

    result = agent.invoke({"messages": [HumanMessage("Help me with a task")]})
before_agent
¶
Logic to run before the agent execution starts.
abefore_agent
async
¶
Async logic to run before the agent execution starts.
abefore_model
async
¶
Async logic to run before the model is called.
aafter_model
async
¶
Async logic to run after the model is called.
wrap_model_call
¶
wrap_model_call(
request: ModelRequest, handler: Callable[[ModelRequest], ModelResponse]
) -> ModelCallResult
Intercept and control model execution via handler callback.
The handler callback executes the model request and returns a `ModelResponse`. Middleware can call the handler multiple times for retry logic, skip calling it to short-circuit, or modify the request/response. Multiple middleware compose with the first in the list as the outermost layer.

| PARAMETER | DESCRIPTION |
|---|---|
| `request` | Model request to execute (includes state and runtime). TYPE: `ModelRequest` |
| `handler` | Callback that executes the model request and returns a `ModelResponse`. TYPE: `Callable[[ModelRequest], ModelResponse]` |

| RETURNS | DESCRIPTION |
|---|---|
| `ModelCallResult` | The model call result. |
Examples:
Retry on error:
    def wrap_model_call(self, request, handler):
        for attempt in range(3):
            try:
                return handler(request)
            except Exception:
                if attempt == 2:
                    raise

Rewrite response:

    def wrap_model_call(self, request, handler):
        response = handler(request)
        ai_msg = response.result[0]
        return ModelResponse(
            result=[AIMessage(content=f"[{ai_msg.content}]")],
            structured_response=response.structured_response,
        )

Error to fallback:

    def wrap_model_call(self, request, handler):
        try:
            return handler(request)
        except Exception:
            return ModelResponse(result=[AIMessage(content="Service unavailable")])

Cache/short-circuit:

    def wrap_model_call(self, request, handler):
        if cached := get_cache(request):
            return cached  # Short-circuit with cached result
        response = handler(request)
        save_cache(request, response)
        return response
Simple AIMessage return (converted automatically):
awrap_model_call
async
¶
awrap_model_call(
request: ModelRequest, handler: Callable[[ModelRequest], Awaitable[ModelResponse]]
) -> ModelCallResult
Intercept and control async model execution via handler callback.
The handler callback executes the model request and returns a `ModelResponse`. Middleware can call the handler multiple times for retry logic, skip calling it to short-circuit, or modify the request/response. Multiple middleware compose with the first in the list as the outermost layer.

| PARAMETER | DESCRIPTION |
|---|---|
| `request` | Model request to execute (includes state and runtime). TYPE: `ModelRequest` |
| `handler` | Async callback that executes the model request and returns a `ModelResponse`. TYPE: `Callable[[ModelRequest], Awaitable[ModelResponse]]` |

| RETURNS | DESCRIPTION |
|---|---|
| `ModelCallResult` | The model call result. |
Examples:
Retry on error:
after_agent
¶
Logic to run after the agent execution completes.
aafter_agent
async
¶
Async logic to run after the agent execution completes.
wrap_tool_call
¶
wrap_tool_call(
request: ToolCallRequest,
handler: Callable[[ToolCallRequest], ToolMessage | Command],
) -> ToolMessage | Command
Intercept tool execution for retries, monitoring, or modification.
Multiple middleware compose automatically (first defined = outermost).
Exceptions propagate unless `handle_tool_errors` is configured on `ToolNode`.

| PARAMETER | DESCRIPTION |
|---|---|
| `request` | Tool call request with call details. TYPE: `ToolCallRequest` |
| `handler` | Callable to execute the tool (can be called multiple times). TYPE: `Callable[[ToolCallRequest], ToolMessage \| Command]` |

| RETURNS | DESCRIPTION |
|---|---|
| `ToolMessage \| Command` | The final tool result or command. |
The handler callable can be invoked multiple times for retry logic. Each call to handler is independent and stateless.
Examples:
Modify request before execution:
    def wrap_tool_call(self, request, handler):
        request.tool_call["args"]["value"] *= 2
        return handler(request)

Retry on error (call handler multiple times):

    def wrap_tool_call(self, request, handler):
        for attempt in range(3):
            try:
                result = handler(request)
                if is_valid(result):
                    return result
            except Exception:
                if attempt == 2:
                    raise
        return result
Conditional retry based on response:
awrap_tool_call
async
¶
awrap_tool_call(
request: ToolCallRequest,
handler: Callable[[ToolCallRequest], Awaitable[ToolMessage | Command]],
) -> ToolMessage | Command
Intercept and control async tool execution via handler callback.
The handler callback executes the tool call and returns a `ToolMessage` or `Command`. Middleware can call the handler multiple times for retry logic, skip calling it to short-circuit, or modify the request/response. Multiple middleware compose with the first in the list as the outermost layer.

| PARAMETER | DESCRIPTION |
|---|---|
| `request` | Tool call request with call details. TYPE: `ToolCallRequest` |
| `handler` | Async callable that executes the tool and returns a `ToolMessage` or `Command`. TYPE: `Callable[[ToolCallRequest], Awaitable[ToolMessage \| Command]]` |

| RETURNS | DESCRIPTION |
|---|---|
| `ToolMessage \| Command` | The final tool result or command. |
The handler callable can be invoked multiple times for retry logic. Each call to handler is independent and stateless.
Examples:
Async retry on error:
state_schema
class-attribute
instance-attribute
¶
The schema for state passed to the middleware nodes.
__init__
¶
__init__(
*,
tool_name: str | None = None,
thread_limit: int | None = None,
run_limit: int | None = None,
exit_behavior: Literal["end", "error"] = "end"
) -> None
Initialize the tool call limit middleware.
| PARAMETER | DESCRIPTION |
|---|---|
| `tool_name` | Name of the specific tool to limit. If `None`, limits apply to all tool calls. TYPE: `str \| None` |
| `thread_limit` | Maximum number of tool calls allowed per thread. `None` means no limit. Defaults to `None`. TYPE: `int \| None` |
| `run_limit` | Maximum number of tool calls allowed per run. `None` means no limit. Defaults to `None`. TYPE: `int \| None` |
| `exit_behavior` | What to do when limits are exceeded. `"end"`: jump to the end of the agent execution and inject an artificial AI message indicating that the limit was exceeded. `"error"`: raise a `ToolCallLimitExceededError`. Defaults to `"end"`. TYPE: `Literal["end", "error"]` |

| RAISES | DESCRIPTION |
|---|---|
| `ValueError` | If both limits are `None`. |
name
property
¶
name: str
The name of the middleware instance.
Includes the tool name if specified to allow multiple instances of this middleware with different tool names.
before_model
¶
Check tool call limits before making a model call.
| PARAMETER | DESCRIPTION |
|---|---|
| `state` | The current agent state containing tool call counts. |
| `runtime` | The langgraph runtime. |

| RETURNS | DESCRIPTION |
|---|---|
| `dict[str, Any] \| None` | If limits are exceeded and `exit_behavior` is `"end"`, returns a `Command` to jump to the end with a limit exceeded message. Otherwise returns `None`. |

| RAISES | DESCRIPTION |
|---|---|
| `ToolCallLimitExceededError` | If limits are exceeded and `exit_behavior` is `"error"`. |
after_model
¶
Increment tool call counts after a model call (when tool calls are made).
| PARAMETER | DESCRIPTION |
|---|---|
| `state` | The current agent state. |
| `runtime` | The langgraph runtime. |

| RETURNS | DESCRIPTION |
|---|---|
| `dict[str, Any] \| None` | State updates with incremented tool call counts if tool calls were made. |
AgentMiddleware
¶
Bases: Generic[StateT, ContextT]
Base middleware class for an agent.
Subclass this and implement any of the defined methods to customize agent behavior between steps in the main agent loop.
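For example, a minimal custom middleware might override a single hook (a sketch; the logging is illustrative):

    from langchain.agents.middleware import AgentMiddleware

    class LoggingMiddleware(AgentMiddleware):
        def before_model(self, state, runtime):
            # Inspect state before each model call.
            print(f"Calling model with {len(state['messages'])} messages")
            return None  # no state updates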
state_schema
class-attribute
instance-attribute
¶
state_schema: type[StateT] = cast('type[StateT]', AgentState)
The schema for state passed to the middleware nodes.
name
property
¶
name: str
The name of the middleware instance.
Defaults to the class name, but can be overridden for custom naming.
before_agent
¶
Logic to run before the agent execution starts.
abefore_agent
async
¶
Async logic to run before the agent execution starts.
before_model
¶
Logic to run before the model is called.
abefore_model
async
¶
Async logic to run before the model is called.
after_model
¶
Logic to run after the model is called.
aafter_model
async
¶
Async logic to run after the model is called.
wrap_model_call
¶
wrap_model_call(
request: ModelRequest, handler: Callable[[ModelRequest], ModelResponse]
) -> ModelCallResult
Intercept and control model execution via handler callback.
The handler callback executes the model request and returns a ModelResponse. Middleware can call the handler multiple times for retry logic, skip calling it to short-circuit, or modify the request/response. Multiple middleware compose with first in list as outermost layer.
PARAMETER | DESCRIPTION |
---|---|
`request` | Model request to execute (includes state and runtime). TYPE: `ModelRequest` |
`handler` | Callback that executes the model request and returns a `ModelResponse`. TYPE: `Callable[[ModelRequest], ModelResponse]` |
RETURNS | DESCRIPTION |
---|---|
`ModelCallResult` | The final `ModelCallResult` for this model call. |
Examples:
Retry on error:
def wrap_model_call(self, request, handler):
    for attempt in range(3):
        try:
            return handler(request)
        except Exception:
            if attempt == 2:
                raise
Rewrite response:
def wrap_model_call(self, request, handler):
    response = handler(request)
    ai_msg = response.result[0]
    return ModelResponse(
        result=[AIMessage(content=f"[{ai_msg.content}]")],
        structured_response=response.structured_response,
    )
Error to fallback:
def wrap_model_call(self, request, handler):
    try:
        return handler(request)
    except Exception:
        return ModelResponse(result=[AIMessage(content="Service unavailable")])
Cache/short-circuit:
def wrap_model_call(self, request, handler):
    if cached := get_cache(request):
        return cached  # Short-circuit with cached result
    response = handler(request)
    save_cache(request, response)
    return response
Simple AIMessage return (converted automatically):
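A minimal sketch; returning a bare AIMessage (rather than a full ModelResponse) is accepted and converted for you:
def wrap_model_call(self, request, handler):
    response = handler(request)
    ai_msg = response.result[0]
    # Bare AIMessage is converted to a ModelCallResult automatically
    return AIMessage(content=f"[checked] {ai_msg.content}")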
awrap_model_call
async
¶
awrap_model_call(
request: ModelRequest, handler: Callable[[ModelRequest], Awaitable[ModelResponse]]
) -> ModelCallResult
Intercept and control async model execution via handler callback.
The handler callback executes the model request and returns a ModelResponse. Middleware can call the handler multiple times for retry logic, skip calling it to short-circuit, or modify the request/response. Multiple middleware compose with first in list as outermost layer.
PARAMETER | DESCRIPTION |
---|---|
`request` | Model request to execute (includes state and runtime). TYPE: `ModelRequest` |
`handler` | Async callback that executes the model request and returns a `ModelResponse`. TYPE: `Callable[[ModelRequest], Awaitable[ModelResponse]]` |
RETURNS | DESCRIPTION |
---|---|
`ModelCallResult` | The final `ModelCallResult` for this model call. |
Examples:
Retry on error:
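A minimal sketch mirroring the sync retry example above, with the handler awaited:
async def awrap_model_call(self, request, handler):
    for attempt in range(3):
        try:
            return await handler(request)
        except Exception:
            if attempt == 2:  # Last attempt: re-raise
                raise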
after_agent
¶
Logic to run after the agent execution completes.
aafter_agent
async
¶
Async logic to run after the agent execution completes.
wrap_tool_call
¶
wrap_tool_call(
request: ToolCallRequest,
handler: Callable[[ToolCallRequest], ToolMessage | Command],
) -> ToolMessage | Command
Intercept tool execution for retries, monitoring, or modification.
Multiple middleware compose automatically (first defined = outermost).
Exceptions propagate unless handle_tool_errors is configured on ToolNode.
PARAMETER | DESCRIPTION |
---|---|
`request` | Tool call request with the call to execute. TYPE: `ToolCallRequest` |
`handler` | Callable to execute the tool (can be called multiple times). TYPE: `Callable[[ToolCallRequest], ToolMessage \| Command]` |
RETURNS | DESCRIPTION |
---|---|
`ToolMessage \| Command` | The final `ToolMessage` or `Command` for this tool call. |
The handler callable can be invoked multiple times for retry logic. Each call to handler is independent and stateless.
Examples:
Modify request before execution:
def wrap_tool_call(self, request, handler):
    request.tool_call["args"]["value"] *= 2
    return handler(request)
Retry on error (call handler multiple times):
def wrap_tool_call(self, request, handler):
    for attempt in range(3):
        try:
            result = handler(request)
            if is_valid(result):
                return result
        except Exception:
            if attempt == 2:
                raise
    return result
Conditional retry based on response:
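A minimal sketch (the `status` check is an illustrative condition, not part of the API):
def wrap_tool_call(self, request, handler):
    result = handler(request)
    # Re-invoke the handler once when the tool reports failure
    if isinstance(result, ToolMessage) and result.status == "error":
        result = handler(request)
    return result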
awrap_tool_call
async
¶
awrap_tool_call(
request: ToolCallRequest,
handler: Callable[[ToolCallRequest], Awaitable[ToolMessage | Command]],
) -> ToolMessage | Command
Intercept and control async tool execution via handler callback.
The handler callback executes the tool call and returns a ToolMessage or Command. Middleware can call the handler multiple times for retry logic, skip calling it to short-circuit, or modify the request/response. Multiple middleware compose with first in list as outermost layer.
PARAMETER | DESCRIPTION |
---|---|
`request` | Tool call request with the call to execute. TYPE: `ToolCallRequest` |
`handler` | Async callable to execute the tool, returning a `ToolMessage` or `Command`. TYPE: `Callable[[ToolCallRequest], Awaitable[ToolMessage \| Command]]` |
RETURNS | DESCRIPTION |
---|---|
`ToolMessage \| Command` | The final `ToolMessage` or `Command` for this tool call. |
The handler callable can be invoked multiple times for retry logic. Each call to handler is independent and stateless.
Examples:
Async retry on error:
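A minimal sketch; each awaited handler call is an independent, stateless attempt:
async def awrap_tool_call(self, request, handler):
    for attempt in range(3):
        try:
            return await handler(request)
        except Exception:
            if attempt == 2:
                raise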
ClearToolUsesEdit
dataclass
¶
Bases: ContextEdit
Configuration for clearing tool outputs when token limits are exceeded.
trigger
class-attribute
instance-attribute
¶
trigger: int = 100000
Token count that triggers the edit.
clear_at_least
class-attribute
instance-attribute
¶
clear_at_least: int = 0
Minimum number of tokens to reclaim when the edit runs.
keep
class-attribute
instance-attribute
¶
keep: int = 3
Number of most recent tool results that must be preserved.
clear_tool_inputs
class-attribute
instance-attribute
¶
clear_tool_inputs: bool = False
Whether to clear the originating tool call parameters on the AI message.
exclude_tools
class-attribute
instance-attribute
¶
exclude_tools: Sequence[str] = ()
List of tool names to exclude from clearing.
placeholder
class-attribute
instance-attribute
¶
placeholder: str = DEFAULT_TOOL_PLACEHOLDER
Placeholder text inserted for cleared tool outputs.
apply
¶
apply(messages: list[AnyMessage], *, count_tokens: TokenCounter) -> None
Apply the clear-tool-uses strategy.
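As a usage sketch, a ClearToolUsesEdit can be passed to ContextEditingMiddleware (the `edits` keyword is assumed from the class description above; the threshold values are arbitrary):
middleware = ContextEditingMiddleware(
    edits=[
        ClearToolUsesEdit(
            trigger=50_000,            # Run the edit once input tokens exceed 50k
            keep=3,                    # Always preserve the 3 most recent tool results
            exclude_tools=["search"],  # Never clear results from this tool
        )
    ],
)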
InterruptOnConfig
¶
Bases: TypedDict
Configuration for an action requiring human in the loop.
This is the configuration format used in the HumanInTheLoopMiddleware.__init__ method.
allowed_decisions
instance-attribute
¶
allowed_decisions: list[DecisionType]
The decisions that are allowed for this action.
description
instance-attribute
¶
description: NotRequired[str | _DescriptionFactory]
The description attached to the request for human input.
Can be either:
- A static string describing the approval request
- A callable that dynamically generates the description based on agent state, runtime, and tool call information
Example
# Static string description
config = InterruptOnConfig(
    allowed_decisions=["approve", "reject"],
    description="Please review this tool execution",
)

# Dynamic callable description
def format_tool_description(
    tool_call: ToolCall,
    state: AgentState,
    runtime: Runtime,
) -> str:
    import json

    return (
        f"Tool: {tool_call['name']}\n"
        f"Arguments:\n{json.dumps(tool_call['args'], indent=2)}"
    )

config = InterruptOnConfig(
    allowed_decisions=["approve", "edit", "reject"],
    description=format_tool_description,
)
args_schema
instance-attribute
¶
args_schema: NotRequired[dict[str, Any]]
JSON schema for the args associated with the action, if edits are allowed.
ModelRequest
dataclass
¶
Model request information for the agent.
override
¶
override(**overrides: Unpack[_ModelRequestOverrides]) -> ModelRequest
Replace the request with a new request with the given overrides.
Returns a new ModelRequest instance with the specified attributes replaced.
This follows an immutable pattern, leaving the original request unchanged.
PARAMETER | DESCRIPTION |
---|---|
`**overrides` | Keyword arguments for attributes to override. Supported keys: `model` (BaseChatModel instance), `system_prompt` (optional system prompt string), `messages` (list of messages), `tool_choice` (tool choice configuration), `tools` (list of available tools), `response_format` (response format specification), `model_settings` (additional model settings). TYPE: `Unpack[_ModelRequestOverrides]` |
RETURNS | DESCRIPTION |
---|---|
`ModelRequest` | New ModelRequest instance with specified overrides applied. |
Examples:
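A usage sketch (`fallback_model` is a placeholder for any BaseChatModel instance):
# Replace the system prompt, leaving the original request unchanged
new_request = request.override(system_prompt="Answer concisely.")

# Several attributes can be overridden in one call
new_request = request.override(
    model=fallback_model,
    tool_choice="auto",
)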
ModelResponse
dataclass
¶
Response from model execution including messages and optional structured output.
The result will usually contain a single AIMessage, but may include an additional ToolMessage if the model used a tool for structured output.
before_model
¶
before_model(
func: _CallableWithStateAndRuntime[StateT, ContextT] | None = None,
*,
state_schema: type[StateT] | None = None,
tools: list[BaseTool] | None = None,
can_jump_to: list[JumpTo] | None = None,
name: str | None = None
) -> (
Callable[
[_CallableWithStateAndRuntime[StateT, ContextT]],
AgentMiddleware[StateT, ContextT],
]
| AgentMiddleware[StateT, ContextT]
)
Decorator used to dynamically create a middleware with the before_model hook.
PARAMETER | DESCRIPTION |
---|---|
`func` | The function to be decorated. Must accept `state` and `runtime` arguments. TYPE: `_CallableWithStateAndRuntime[StateT, ContextT] \| None` |
`state_schema` | Optional custom state schema type. If not provided, uses the default `AgentState`. TYPE: `type[StateT] \| None` |
`tools` | Optional list of additional tools to register with this middleware. |
`can_jump_to` | Optional list of valid jump destinations for conditional edges. Valid values are `"tools"`, `"model"`, and `"end"`. TYPE: `list[JumpTo] \| None` |
`name` | Optional name for the generated middleware class. If not provided, uses the decorated function's name. TYPE: `str \| None` |
RETURNS | DESCRIPTION |
---|---|
`Callable[[_CallableWithStateAndRuntime[StateT, ContextT]], AgentMiddleware[StateT, ContextT]] \| AgentMiddleware[StateT, ContextT]` | Either an `AgentMiddleware` instance (if used directly on a function) or a decorator function that can be applied to the function it wraps. |
The decorated function should return one of:
- dict[str, Any] - State updates to merge into the agent state
- Command - A command to control flow (e.g., jump to different node)
- None - No state updates or flow control
Examples:
Basic usage:
@before_model
def log_before_model(state: AgentState, runtime: Runtime) -> None:
    print(f"About to call model with {len(state['messages'])} messages")
With conditional jumping:
@before_model(can_jump_to=["end"])
def conditional_before_model(state: AgentState, runtime: Runtime) -> dict[str, Any] | None:
    if some_condition(state):
        return {"jump_to": "end"}
    return None
With custom state schema:
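A minimal sketch, assuming a custom AgentState subclass with one extra key (`CallCountState` and `model_calls` are illustrative names):
class CallCountState(AgentState):
    model_calls: int

@before_model(state_schema=CallCountState)
def count_model_calls(state: CallCountState, runtime: Runtime) -> dict[str, Any]:
    # Merge an incremented counter back into the agent state
    return {"model_calls": state.get("model_calls", 0) + 1}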
after_model
¶
after_model(
func: _CallableWithStateAndRuntime[StateT, ContextT] | None = None,
*,
state_schema: type[StateT] | None = None,
tools: list[BaseTool] | None = None,
can_jump_to: list[JumpTo] | None = None,
name: str | None = None
) -> (
Callable[
[_CallableWithStateAndRuntime[StateT, ContextT]],
AgentMiddleware[StateT, ContextT],
]
| AgentMiddleware[StateT, ContextT]
)
Decorator used to dynamically create a middleware with the after_model hook.
PARAMETER | DESCRIPTION |
---|---|
`func` | The function to be decorated. Must accept `state` and `runtime` arguments. TYPE: `_CallableWithStateAndRuntime[StateT, ContextT] \| None` |
`state_schema` | Optional custom state schema type. If not provided, uses the default `AgentState`. TYPE: `type[StateT] \| None` |
`tools` | Optional list of additional tools to register with this middleware. |
`can_jump_to` | Optional list of valid jump destinations for conditional edges. Valid values are `"tools"`, `"model"`, and `"end"`. TYPE: `list[JumpTo] \| None` |
`name` | Optional name for the generated middleware class. If not provided, uses the decorated function's name. TYPE: `str \| None` |
RETURNS | DESCRIPTION |
---|---|
`Callable[[_CallableWithStateAndRuntime[StateT, ContextT]], AgentMiddleware[StateT, ContextT]] \| AgentMiddleware[StateT, ContextT]` | Either an `AgentMiddleware` instance (if used directly on a function) or a decorator function that can be applied to a function. |
The decorated function should return one of:
- dict[str, Any] - State updates to merge into the agent state
- Command - A command to control flow (e.g., jump to different node)
- None - No state updates or flow control
Examples:
Basic usage for logging model responses:
@after_model
def log_latest_message(state: AgentState, runtime: Runtime) -> None:
    print(state["messages"][-1].content)
With custom state schema:
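A minimal sketch, assuming an AgentState subclass that records the latest response (`AuditState` and `last_response` are illustrative names):
class AuditState(AgentState):
    last_response: str

@after_model(state_schema=AuditState)
def record_response(state: AuditState, runtime: Runtime) -> dict[str, Any]:
    # Stash the most recent message content in state
    return {"last_response": str(state["messages"][-1].content)}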
wrap_model_call
¶
wrap_model_call(
func: _CallableReturningModelResponse[StateT, ContextT] | None = None,
*,
state_schema: type[StateT] | None = None,
tools: list[BaseTool] | None = None,
name: str | None = None
) -> (
Callable[
[_CallableReturningModelResponse[StateT, ContextT]],
AgentMiddleware[StateT, ContextT],
]
| AgentMiddleware[StateT, ContextT]
)
Create middleware with wrap_model_call hook from a function.
Converts a function with handler callback into middleware that can intercept model calls, implement retry logic, handle errors, and rewrite responses.
PARAMETER | DESCRIPTION |
---|---|
`func` | Function accepting (request, handler) that calls handler(request) to execute the model and returns the final `ModelResponse`. TYPE: `_CallableReturningModelResponse[StateT, ContextT] \| None` |
`state_schema` | Custom state schema. Defaults to `AgentState`. TYPE: `type[StateT] \| None` |
`tools` | Additional tools to register with this middleware. |
`name` | Middleware class name. Defaults to function name. TYPE: `str \| None` |
RETURNS | DESCRIPTION |
---|---|
`Callable[[_CallableReturningModelResponse[StateT, ContextT]], AgentMiddleware[StateT, ContextT]] \| AgentMiddleware[StateT, ContextT]` | Either an `AgentMiddleware` instance (if used directly) or a decorator function. |
Examples:
Basic retry logic:
@wrap_model_call
def retry_on_error(request, handler):
    max_retries = 3
    for attempt in range(max_retries):
        try:
            return handler(request)
        except Exception:
            if attempt == max_retries - 1:
                raise
Model fallback:
@wrap_model_call
def fallback_model(request, handler):
    # Try primary model
    try:
        return handler(request)
    except Exception:
        pass

    # Try fallback model
    request.model = fallback_model_instance
    return handler(request)
Rewrite response content (full ModelResponse):
@wrap_model_call
def uppercase_responses(request, handler):
    response = handler(request)
    ai_msg = response.result[0]
    return ModelResponse(
        result=[AIMessage(content=ai_msg.content.upper())],
        structured_response=response.structured_response,
    )
Simple AIMessage return (converted automatically):
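A minimal sketch; a bare AIMessage return is wrapped into a ModelCallResult automatically, and skipping the handler short-circuits the model call entirely:
@wrap_model_call
def static_answer(request, handler):
    return AIMessage(content="Standard reply")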
wrap_tool_call
¶
wrap_tool_call(
func: _CallableReturningToolResponse | None = None,
*,
tools: list[BaseTool] | None = None,
name: str | None = None
) -> Callable[[_CallableReturningToolResponse], AgentMiddleware] | AgentMiddleware
Create middleware with wrap_tool_call hook from a function.
Converts a function with handler callback into middleware that can intercept tool calls, implement retry logic, monitor execution, and modify responses.
PARAMETER | DESCRIPTION |
---|---|
`func` | Function accepting (request, handler) that calls handler(request) to execute the tool and returns the final `ToolMessage` or `Command`. TYPE: `_CallableReturningToolResponse \| None` |
`tools` | Additional tools to register with this middleware. |
`name` | Middleware class name. Defaults to function name. TYPE: `str \| None` |
RETURNS | DESCRIPTION |
---|---|
`Callable[[_CallableReturningToolResponse], AgentMiddleware] \| AgentMiddleware` | Either an `AgentMiddleware` instance (if used directly) or a decorator function. |
Examples:
Retry logic:
@wrap_tool_call
def retry_on_error(request, handler):
    max_retries = 3
    for attempt in range(max_retries):
        try:
            return handler(request)
        except Exception:
            if attempt == max_retries - 1:
                raise
Async retry logic:
@wrap_tool_call
async def async_retry(request, handler):
    for attempt in range(3):
        try:
            return await handler(request)
        except Exception:
            if attempt == 2:
                raise
Modify request:
@wrap_tool_call
def modify_args(request, handler):
    request.tool_call["args"]["value"] *= 2
    return handler(request)
Short-circuit with cached result:
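A minimal sketch mirroring the cache pattern shown for wrap_model_call above (`get_cache` and `save_cache` are hypothetical helpers):
@wrap_tool_call
def cached_tool(request, handler):
    if (cached := get_cache(request)) is not None:
        return cached  # Short-circuit: skip tool execution
    result = handler(request)
    save_cache(request, result)
    return result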