Summarizes conversation history when token limits are approached.
This middleware monitors message token counts and automatically summarizes older messages once a threshold is reached. Recent messages are preserved, and context continuity is maintained by ensuring AI/Tool message pairs stay together.
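The behavior described above can be sketched in plain Python. This is a simplified illustration, not the library's implementation: the dict-based messages, the character-count token counter, and the `summarize` stub are all stand-ins.

```python
# Simplified sketch of threshold-triggered summarization (not the library's code).
# Messages are plain dicts here; the real middleware works on LangChain messages.

def count_tokens(messages):
    # Crude stand-in for a token counter: ~1 token per 4 characters.
    return sum(len(m["content"]) // 4 for m in messages)

def summarize(messages):
    # Stand-in for a model-generated summary message.
    return {"role": "system", "content": f"Summary of {len(messages)} earlier messages."}

def maybe_summarize(messages, max_tokens, keep_last):
    if count_tokens(messages) <= max_tokens:
        return messages
    cutoff = len(messages) - keep_last
    # Preserve AI/Tool pairs: if the cutoff would split a tool result from the
    # AI message that requested it, move the cutoff back past the tool message.
    while cutoff > 0 and messages[cutoff]["role"] == "tool":
        cutoff -= 1
    old, recent = messages[:cutoff], messages[cutoff:]
    if not old:
        return messages
    return [summarize(old)] + recent
```

Older messages collapse into a single summary message while the `keep_last` most recent messages survive verbatim, mirroring the `keep` policy described below.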
```python
SummarizationMiddleware(
    model: str | BaseChatModel,
    *,
    trigger: ContextSize | list[ContextSize] | None = None,
    keep: ContextSize = ('messages', _DEFAULT_MESSAGES_TO_KEEP),
    token_counter: TokenCounter = count_tokens_approximately,
    summary_prompt: str = DEFAULT_SUMMARY_PROMPT,
    trim_tokens_to_summarize: int | None = _DEFAULT_TRIM_TOKEN_LIMIT,
    **deprecated_kwargs: Any,
)
```

| Name | Type | Description |
|---|---|---|
| `model`* | `str \| BaseChatModel` | The language model used to generate summaries. |
| `trigger` | `ContextSize \| list[ContextSize] \| None` | Default: `None`. One or more thresholds that trigger summarization. Provide a single `ContextSize` or a list of them; summarization runs when any threshold is reached. |
| `keep` | `ContextSize` | Default: `('messages', _DEFAULT_MESSAGES_TO_KEEP)`. Context retention policy applied after summarization. Provide a single `ContextSize`; defaults to keeping the most recent `_DEFAULT_MESSAGES_TO_KEEP` messages. Unlike `trigger`, does not support multiple values. |
| `token_counter` | `TokenCounter` | Default: `count_tokens_approximately`. Function used to count tokens in messages. |
| `summary_prompt` | `str` | Default: `DEFAULT_SUMMARY_PROMPT`. Prompt template for generating summaries. |
| `trim_tokens_to_summarize` | `int \| None` | Default: `_DEFAULT_TRIM_TOKEN_LIMIT`. Maximum number of tokens to keep when preparing messages for the summarization call. Pass `None` to disable trimming. |
| Name | Type |
|---|---|
| `model` | `str \| BaseChatModel` |
| `trigger` | `ContextSize \| list[ContextSize] \| None` |
| `keep` | `ContextSize` |
| `token_counter` | `TokenCounter` |
| `summary_prompt` | `str` |
| `trim_tokens_to_summarize` | `int \| None` |
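The signature's default for `keep` shows that a `ContextSize` takes the tuple form `('messages', N)`. The following sketch of how a `trigger` value might be evaluated is an assumption for illustration (the `'tokens'` kind and the evaluation logic are not taken from the library's code):

```python
def should_summarize(messages, triggers, token_counter):
    """Hypothetical evaluator: return True when any (kind, limit) threshold is met.

    'messages' compares the message count; 'tokens' compares the counted total.
    """
    if triggers is None:
        return False
    if isinstance(triggers, tuple):  # a single ContextSize
        triggers = [triggers]
    for kind, limit in triggers:
        if kind == "messages" and len(messages) >= limit:
            return True
        if kind == "tokens" and token_counter(messages) >= limit:
            return True
    return False
```

Passing a list means summarization fires as soon as any one threshold is crossed, which matches the "one or more thresholds" wording above.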
Methods:

- Logic to run before the agent execution starts.
- Async logic to run before the agent execution starts.
- Logic to run after the model is called.
- Async logic to run after the model is called.
- Intercept and control model execution via handler callback.
- Intercept and control async model execution via handler callback.
- Logic to run after the agent execution completes.
- Async logic to run after the agent execution completes.
- Intercept tool execution for retries, monitoring, or modification.
- Intercept and control async tool execution via handler callback.
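The "intercept via handler callback" hooks above share a common shape: the middleware receives the request plus a `handler` that performs the underlying model or tool call, and may run logic before, after, or instead of invoking it. A minimal sketch of that pattern for the retry use case (function and parameter names here are illustrative, not the library's API):

```python
def retry_on_error(request, handler, attempts=3):
    # Middleware-style wrapper: invoke the underlying handler, retrying
    # a transient failure up to `attempts` times before giving up.
    last_exc = None
    for _ in range(attempts):
        try:
            return handler(request)
        except RuntimeError as exc:  # illustrative: retry only transient errors
            last_exc = exc
    raise last_exc
```

Because the wrapper controls whether and how often `handler` runs, the same shape supports monitoring (time the call), modification (rewrite `request` first), or short-circuiting (return without calling `handler` at all).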