Prompt Caching Middleware for ChatBedrock (InvokeModel API).
Optimizes API usage by caching conversation prefixes for Anthropic models on AWS Bedrock. The middleware adds `cache_control` to the last message, so the entire conversation prefix (including the system prompt) is cached, improving cache hits in multi-turn conversations.
Requires both 'langchain' and 'langchain-aws' packages to be installed.
See Anthropic's prompt caching documentation to learn more.
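As a rough illustration of the behavior described above — marking the final message so the whole conversation prefix up to that point becomes cacheable — the request gains a `cache_control` entry on the last content block. This is a minimal sketch of that transformation, not the middleware's actual source; the helper `mark_last_for_caching` is hypothetical:

```python
# Sketch: attach an Anthropic-style cache_control marker to the last
# message of a conversation. Illustrative only; the middleware applies an
# equivalent transformation internally before the Bedrock InvokeModel call.

def mark_last_for_caching(messages: list[dict], ttl: str = "5m") -> list[dict]:
    """Attach cache_control to the last content block of the last message.

    Anthropic caches the entire conversation prefix up to (and including)
    the marked block, so later turns sharing this prefix are cache hits.
    """
    if not messages:
        return messages
    marked = [dict(m) for m in messages]          # shallow-copy each message
    last = marked[-1]
    blocks = [dict(b) for b in last["content"]]   # copy content blocks
    blocks[-1]["cache_control"] = {"type": "ephemeral", "ttl": ttl}
    last["content"] = blocks
    return marked

messages = [
    {"role": "user", "content": [{"type": "text", "text": "Summarize this report."}]},
    {"role": "assistant", "content": [{"type": "text", "text": "Here is a summary..."}]},
    {"role": "user", "content": [{"type": "text", "text": "Now list the action items."}]},
]
cached = mark_last_for_caching(messages, ttl="5m")
```

Because the marker sits on the *last* message, every earlier turn is inside the cached prefix, which is what makes multi-turn conversations benefit.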
BedrockPromptCachingMiddleware(
    type: Literal['ephemeral'] = 'ephemeral',
    ttl: Literal['5m', '1h'] = '5m',
    min_messages_to_cache: int = 0,
    unsupported_model_behavior: Literal['ignore', 'warn', 'raise'] = 'warn'
)

| Name | Type | Description |
|---|---|---|
| `type` | `Literal['ephemeral']` | Default: `'ephemeral'`. The type of cache to use; only `"ephemeral"` is supported. |
| `ttl` | `Literal['5m', '1h']` | Default: `'5m'`. The time to live for the cache; only `"5m"` and `"1h"` are supported. |
| `min_messages_to_cache` | `int` | Default: `0`. The minimum number of messages before the cache is used. |
| `unsupported_model_behavior` | `Literal['ignore', 'warn', 'raise']` | Default: `'warn'`. Behavior when an unsupported model is used: `"ignore"` continues without caching, `"warn"` warns the user and continues without caching, and `"raise"` raises an error and stops the agent. |
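The interplay of `min_messages_to_cache` and `unsupported_model_behavior` can be sketched as a small decision function. This is illustrative pseudologic mirroring the documented behavior, not the middleware's actual implementation; `model_supports_caching` is a hypothetical stand-in for the middleware's internal model check:

```python
import warnings

def should_apply_caching(
    n_messages: int,
    model_supports_caching: bool,
    min_messages_to_cache: int = 0,
    unsupported_model_behavior: str = "warn",
) -> bool:
    """Decide whether to add cache_control, mirroring the documented options."""
    if not model_supports_caching:
        if unsupported_model_behavior == "raise":
            # Stop the agent outright on an unsupported model.
            raise ValueError("Model does not support prompt caching")
        if unsupported_model_behavior == "warn":
            warnings.warn("Model does not support prompt caching; continuing without it")
        # Both 'ignore' and 'warn' continue without caching.
        return False
    # Cache only once the conversation has reached the configured length.
    return n_messages >= min_messages_to_cache
```

With the default `min_messages_to_cache=0`, caching applies from the very first turn; raising it defers caching until the conversation is long enough for the cached prefix to be worthwhile.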