Summarization middleware for automatic and tool-based conversation compaction.
This module provides two middleware classes and a convenience factory:
- SummarizationMiddleware: automatically compacts the conversation when token
  usage exceeds a configurable threshold. Older messages are summarized via an
  LLM call, and the full history is offloaded to a backend for later retrieval.
- SummarizationToolMiddleware: exposes a compact_conversation tool that lets
  the agent (or a human-in-the-loop approval flow) trigger compaction on
  demand. It composes with a SummarizationMiddleware instance and reuses its
  summarization engine.
- create_summarization_tool_middleware: convenience factory that builds a
  SummarizationMiddleware with model-aware defaults and wraps it in a
  SummarizationToolMiddleware.
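Example:

```python
from deepagents import create_deep_agent
from deepagents.middleware.summarization import (
    SummarizationMiddleware,
    SummarizationToolMiddleware,
)
from deepagents.backends import FilesystemBackend

# Evicted history is offloaded to this backend.
backend = FilesystemBackend(root_dir="/data")

# Summarize at 85% of the model's input window; keep roughly the most
# recent 10% of it verbatim.
summ = SummarizationMiddleware(
    model="gpt-5.4-mini",
    backend=backend,
    trigger=("fraction", 0.85),
    keep=("fraction", 0.10),
)

# Expose compact_conversation, reusing summ's model, backend, and thresholds.
tool_mw = SummarizationToolMiddleware(summ)

agent = create_deep_agent(middleware=[summ, tool_mw])
```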
Offloaded messages are stored as markdown at /conversation_history/{thread_id}.md.
Each summarization event appends a new section to this file, creating a running log of all evicted messages.
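The append-only offload log can be sketched as below. This is a minimal illustration, not the module's implementation: InMemoryBackend, offload_messages, the (role, text) message tuples, and the "## Summarized at ..." section layout are all hypothetical; only the /conversation_history/{thread_id}.md path comes from the text above.

```python
from datetime import datetime, timezone


class InMemoryBackend:
    """Hypothetical stand-in for a storage backend: maps paths to text."""

    def __init__(self) -> None:
        self.files: dict[str, str] = {}

    def read(self, path: str) -> str:
        return self.files.get(path, "")

    def write(self, path: str, content: str) -> None:
        self.files[path] = content


def offload_messages(
    backend: InMemoryBackend, thread_id: str, messages: list[tuple[str, str]]
) -> str:
    """Append evicted (role, text) messages as a new markdown section; return the path."""
    path = f"/conversation_history/{thread_id}.md"
    stamp = datetime.now(timezone.utc).isoformat()
    # Section layout is an assumption made for this sketch.
    lines = [f"## Summarized at {stamp}", ""]
    lines += [f"**{role}**: {text}" for role, text in messages]
    section = "\n".join(lines) + "\n"
    existing = backend.read(path)
    # Append rather than overwrite, so earlier evictions stay in the running log.
    backend.write(path, existing + ("\n" if existing else "") + section)
    return path
```

Because each event only appends, the file grows into a chronological record that the agent can re-read later, while the live context holds only the summary.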
Public alias for _DeepAgentsSummarizationMiddleware.
This is the name external callers should import and reference.
Emit a deprecation warning with caller-controlled stack attribution.
langchain_core.warn_deprecated formats a standard message but hardcodes
stacklevel=4 in its internal warnings.warn call. That value targets a
decorator-wrapped frame layout; when called directly from a deprecated
method's body the warning is attributed one frame too high (above the
user's call site). This wrapper captures the formatted upstream warning
and re-emits it with an explicit stacklevel, so the warning points at
the user's call site.
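The capture-and-re-emit technique can be sketched with the standard warnings module alone. upstream_warn_deprecated below is a hypothetical stand-in for langchain_core's helper (its hardcoded stacklevel is illustrative), and warn_deprecated_at and LegacyAPI are illustrative names, not this module's API.

```python
import warnings


def upstream_warn_deprecated(message: str) -> None:
    # Hypothetical stand-in for the upstream helper: the stacklevel is hardcoded.
    # The wrapper below does not depend on its value.
    warnings.warn(message, DeprecationWarning, stacklevel=4)


def warn_deprecated_at(message: str, stacklevel: int = 2) -> None:
    """Re-emit the upstream warning with a caller-controlled stacklevel.

    stacklevel means the same as in warnings.warn, measured from the function
    that calls this wrapper (2 = that function's caller).
    """
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # ensure the upstream warning is captured
        upstream_warn_deprecated(message)
    for w in caught:
        # +1 skips this wrapper's own frame. Passing the Warning instance keeps
        # the original category and formatted text.
        warnings.warn(w.message, stacklevel=stacklevel + 1)


class LegacyAPI:
    def old_method(self) -> int:
        # Called directly from the deprecated method's body: stacklevel=2
        # attributes the warning to whoever called old_method.
        warn_deprecated_at("LegacyAPI.old_method is deprecated", stacklevel=2)
        return 1
```

The upstream warning is swallowed inside the inner catch_warnings block, so the user sees exactly one warning, pointing at their own line.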
Append text to a system message.
Compute default summarization settings based on model profile.
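A rough sketch of profile-based defaults follows. sketch_summarization_defaults and resolve_tokens are hypothetical helpers, and every number is a placeholder (the fraction values mirror the module example; the fallback values are invented), not the real output of compute_summarization_defaults. The only behavior taken from this document is "fraction-based when max_input_tokens is known, fixed counts otherwise".

```python
from typing import Literal, Optional, Tuple

# A (kind, value) pair such as ("fraction", 0.85) or ("messages", 20).
ContextSize = Tuple[Literal["fraction", "tokens", "messages"], float]


def sketch_summarization_defaults(max_input_tokens: Optional[int]) -> dict:
    """Hypothetical stand-in; the concrete numbers are placeholders."""
    if max_input_tokens:
        # The model profile exposes a context-window size, so thresholds can
        # be expressed relative to it.
        return {"trigger": ("fraction", 0.85), "keep": ("fraction", 0.10)}
    # No usable profile: fall back to fixed counts (placeholder values).
    return {"trigger": ("tokens", 100_000), "keep": ("messages", 20)}


def resolve_tokens(size: ContextSize, max_input_tokens: int) -> int:
    """Convert a fraction- or token-based size into an absolute token budget."""
    kind, value = size
    if kind == "fraction":
        return int(value * max_input_tokens)
    if kind == "tokens":
        return int(value)
    raise ValueError(f"{kind!r} sizes are counted in messages, not tokens")
```

Fraction-based settings scale automatically when the same agent runs against models with different context windows, which fixed token counts cannot do.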
Create a Deep Agents SummarizationMiddleware with model-aware defaults.

The Deep Agents SummarizationMiddleware wraps
langchain.agents.middleware.SummarizationMiddleware to add behavior that
long-running, file-aware agents need. Prefer LangChain's middleware
directly if none of the following apply:

- History offload. Evicted messages are written to
  /conversation_history/{thread_id}.md (default path) on the configured
  backend before the summary replaces them, and the summary embeds that
  path so the agent can re-open it via read_file when FilesystemMiddleware
  is registered. LangChain drops evicted messages with no recovery path.
- Argument truncation. write_file / edit_file arguments in older messages
  are clipped at a lower threshold than full compaction, often reclaiming
  enough context to skip summarizing. Configured via truncate_args_settings.
- ContextOverflowError fallback. When a provider rejects a request as over
  budget, the middleware summarizes and retries instead of bubbling the
  error up.
- Non-destructive state. Summarization events are recorded in a private
  _summarization_event field via wrap_model_call, leaving state["messages"]
  intact. LangChain instead rewrites state["messages"] with
  RemoveMessage(id=REMOVE_ALL_MESSAGES) from before_model. Preserving the
  raw log enables replay, evals, and shared state with
  SummarizationToolMiddleware's compact_conversation tool.
- Model-aware defaults. LangChain defaults to trigger=None and
  keep=("messages", 20). This factory picks fraction-based defaults from
  the model's profile when max_input_tokens is exposed, falling back to
  fixed counts otherwise; see compute_summarization_defaults.

Create a SummarizationToolMiddleware with model-aware defaults.
Convenience factory: builds a SummarizationMiddleware via
create_summarization_middleware
and wraps it in a SummarizationToolMiddleware. Saves a step and
accepts a model string.
Only the tool layer is registered: the wrapped SummarizationMiddleware
is the engine the tool calls into, not a middleware that runs on its
own. The agent gains:

- a compact_conversation tool to compact its own context window.

For automatic summarization at the trigger threshold, also register
a SummarizationMiddleware. create_deep_agent adds one by default,
so dropping create_summarization_tool_middleware(...) into its
middleware=[...] gives you both layers; they share state via the
_summarization_event key.
Routes file operations to different backends by path prefix.
Matches paths against route prefixes (longest first) and delegates to the corresponding backend. Unmatched paths use the default backend.
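Longest-prefix routing can be sketched as follows. PrefixRouter, DictBackend, and the single read method are hypothetical illustrations, not the real router's interface; in particular, this sketch passes the full path through unchanged, and whether the real router strips the matched prefix is not stated here.

```python
from typing import Protocol


class Backend(Protocol):
    def read(self, path: str) -> str: ...


class DictBackend:
    """Toy backend used only for this illustration."""

    def __init__(self, files: dict[str, str]) -> None:
        self.files = files

    def read(self, path: str) -> str:
        return self.files[path]


class PrefixRouter:
    """Route each path to the backend with the longest matching prefix."""

    def __init__(self, default: Backend, routes: dict[str, Backend]) -> None:
        self.default = default
        # Sort longest-first so "/memories/team/" wins over "/memories/".
        self.routes = sorted(routes.items(), key=lambda kv: len(kv[0]), reverse=True)

    def resolve(self, path: str) -> Backend:
        for prefix, backend in self.routes:
            if path.startswith(prefix):
                return backend
        return self.default  # unmatched paths fall through to the default

    def read(self, path: str) -> str:
        return self.resolve(path).read(path)
```

Checking longer prefixes first is what lets a nested route override a broader one without any special-casing.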
Input schema for the compact_conversation tool.
Represents a summarization event.
Settings for truncating large tool-call arguments in older messages.
This is a lightweight, pre-summarization optimization that fires at a lower
token threshold than full conversation compaction. When triggered, only the
args values on AIMessage.tool_calls in messages before the keep window
are shortened — recent messages are left intact.
Typical large arguments include write_file content, edit_file patches,
and verbose execute outputs.
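The clipping step can be sketched on plain dicts standing in for AIMessage objects. truncate_old_tool_args, the keep_last / max_chars parameters, and the "[truncated N chars]" marker are hypothetical; the real knobs live in truncate_args_settings, and the token-threshold check that decides whether this runs at all is omitted.

```python
import copy


def truncate_old_tool_args(
    messages: list[dict], keep_last: int, max_chars: int
) -> list[dict]:
    """Clip long string args in tool calls of messages older than the keep window."""
    cutoff = max(len(messages) - keep_last, 0)
    out = []
    for i, msg in enumerate(messages):
        if i >= cutoff or not msg.get("tool_calls"):
            out.append(msg)  # recent, or no tool calls: leave untouched
            continue
        msg = copy.deepcopy(msg)  # don't mutate the caller's history
        for call in msg["tool_calls"]:
            for key, value in list(call["args"].items()):
                if isinstance(value, str) and len(value) > max_chars:
                    dropped = len(value) - max_chars
                    call["args"][key] = value[:max_chars] + f"... [truncated {dropped} chars]"
        out.append(msg)
    return out
```

Short args such as file paths survive intact, so the old tool calls remain readable while the bulky payloads stop consuming context.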
State for the summarization middleware.
Extends AgentState with a private field for tracking summarization events.
Default settings computed from model profile.
Middleware that provides a compact_conversation tool for manual compaction.
This middleware composes with a SummarizationMiddleware instance, reusing
its summarization engine (model, backend, trigger thresholds) to let the
agent compact its own context window.
This middleware never compacts automatically. Compaction only occurs when
compact_conversation is called as a normal tool call (by the model or by
an explicit user action, e.g. as implemented in the deepagents-cli).
To avoid compacting too early, compact tool execution is gated by
_is_eligible_for_compaction, which requires reported usage to reach about
50% of the configured auto-summarization trigger.
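The eligibility gate can be sketched as a simple threshold check. eligible_for_manual_compaction is a hypothetical stand-in for _is_eligible_for_compaction, whose actual inputs are not shown here; the treatment of missing usage and of message-count triggers is an assumption made for the sketch.

```python
from typing import Optional


def eligible_for_manual_compaction(
    reported_tokens: Optional[int],
    trigger: tuple[str, float],
    max_input_tokens: int,
    ratio: float = 0.5,
) -> bool:
    """Allow manual compaction once usage reaches `ratio` of the auto trigger."""
    if reported_tokens is None:
        return False  # no usage reported yet: stay conservative (assumption)
    kind, value = trigger
    if kind == "fraction":
        trigger_tokens = value * max_input_tokens
    elif kind == "tokens":
        trigger_tokens = value
    else:
        # Message-count triggers are out of scope for this sketch.
        raise ValueError(f"unsupported trigger kind: {kind!r}")
    return reported_tokens >= ratio * trigger_tokens
```

With trigger=("fraction", 0.8) on a 100,000-token model, auto-summarization fires near 80,000 tokens, so this gate opens the tool at about 40,000.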
The tool and auto-summarization share the same _summarization_event state
key, so they interoperate correctly.
For a simpler setup, use create_summarization_tool_middleware which
handles both steps.
Protocol for pluggable memory backends (single, unified).
Backends can store files in different locations (state, filesystem, database, etc.) and provide a uniform interface for file operations.
All file data is represented as dicts with the following structure:

```python
{
    "content": str,      # Text content (utf-8) or base64-encoded binary
    "encoding": str,     # "utf-8" for text, "base64" for binary data
    "created_at": str,   # ISO format timestamp
    "modified_at": str,  # ISO format timestamp
}
```
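A record in that shape can be built and decoded as below. FileData, make_file_data, and read_bytes are hypothetical helpers written for illustration; the field names and encodings follow the structure above, and the use of UTC timestamps is an assumption.

```python
import base64
from datetime import datetime, timezone
from typing import Optional, TypedDict, Union


class FileData(TypedDict):
    content: str
    encoding: str
    created_at: str
    modified_at: str


def make_file_data(data: Union[str, bytes], created_at: Optional[str] = None) -> FileData:
    """Build a file record: text is stored as-is, binary is base64-encoded."""
    now = datetime.now(timezone.utc).isoformat()
    if isinstance(data, bytes):
        content, encoding = base64.b64encode(data).decode("ascii"), "base64"
    else:
        content, encoding = data, "utf-8"
    # Keep the original creation time on rewrites; always refresh modified_at.
    return {
        "content": content,
        "encoding": encoding,
        "created_at": created_at or now,
        "modified_at": now,
    }


def read_bytes(record: FileData) -> bytes:
    """Decode a record back to raw bytes regardless of its encoding."""
    if record["encoding"] == "base64":
        return base64.b64decode(record["content"])
    return record["content"].encode("utf-8")
```

Keeping content as a string in both cases lets every backend (state, filesystem, database) store the record without special handling for binary data.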