Function · Since v1.4

bedrockPromptCachingMiddleware

bedrockPromptCachingMiddleware(
  middlewareOptions: Partial<__type>
): AgentMiddleware<undefined, ZodObject<__type, "strip", ZodTypeAny


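Parameters

Name	Type	Description
`middlewareOptions`	`Partial<__type>`	Optional configuration. The options referenced on this page are enableCaching, ttl, minMessagesToCache, and unsupportedModelBehavior; the middleware can also be called with no arguments.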

Creates a prompt caching middleware for AWS Bedrock Converse models to optimize API usage.

This middleware automatically enables Bedrock's prompt caching when using AWS Bedrock Converse models. This can significantly reduce costs for applications with repetitive prompts, long system messages, or extensive conversation histories.

How It Works

The middleware intercepts model requests and sets a cache control signal that ChatBedrockConverse translates into Bedrock cachePoint breakpoints. Cache points are inserted after the system prompt, after the tool definitions, and after the final message, so the stable prefix of each request is cached. On subsequent requests with a matching prefix, the cached representations are reused, skipping redundant token processing. Exact placement varies by model (e.g. Amazon Nova models cache fewer breakpoints and ignore the "1h" TTL).
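
The placement described above can be sketched as a small, self-contained transformation. This is an illustration only, not the middleware's or ChatBedrockConverse's actual code: the `ConverseRequest` shape is a simplified, hypothetical stand-in for a Bedrock Converse request (the real API nests tools under `toolConfig`), and `addCachePoints` is an assumed helper name. It also ignores TTL and per-model limits on the number of breakpoints.

```typescript
// Hypothetical, simplified shapes modeled on the Bedrock Converse request format.
type CachePoint = { cachePoint: { type: "default" } };
type TextBlock = { text: string };
type ToolSpec = { toolSpec: { name: string; description: string } };
type ContentBlock = TextBlock | CachePoint;

interface Message {
  role: "user" | "assistant";
  content: ContentBlock[];
}

interface ConverseRequest {
  system: ContentBlock[];
  tools: Array<ToolSpec | CachePoint>;
  messages: Message[];
}

const CACHE_POINT: CachePoint = { cachePoint: { type: "default" } };

// Returns a copy of the request with a cache point appended after the system
// prompt, after the tool definitions, and after the final message.
// Empty sections are left untouched.
function addCachePoints(request: ConverseRequest): ConverseRequest {
  const lastIndex = request.messages.length - 1;
  const messages: Message[] = request.messages.map((message, index) =>
    index === lastIndex
      ? { ...message, content: [...message.content, CACHE_POINT] }
      : message
  );
  return {
    system: request.system.length > 0 ? [...request.system, CACHE_POINT] : request.system,
    tools: request.tools.length > 0 ? [...request.tools, CACHE_POINT] : request.tools,
    messages,
  };
}

const exampleRequest: ConverseRequest = {
  system: [{ text: "You are a helpful assistant." }],
  tools: [{ toolSpec: { name: "search", description: "Search the knowledge base" } }],
  messages: [
    { role: "user", content: [{ text: "Hello" }] },
    { role: "assistant", content: [{ text: "Hi, how can I help?" }] },
    { role: "user", content: [{ text: "Where is my order?" }] },
  ],
};

console.log(JSON.stringify(addCachePoints(exampleRequest), null, 2));
```

Because the markers sit at the end of each stable section, a follow-up request that repeats the same system prompt, tools, and earlier messages shares the same prefix up to each marker.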

Benefits

Cost Reduction: Avoid reprocessing the same tokens repeatedly
Lower Latency: On a cache hit the cached prefix is not reprocessed, which shortens the time to the first response token
Better Scalability: Reduced computational load enables handling more requests
Consistent Performance: Stable response times for repetitive queries
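
The cost claim can be made concrete with a little arithmetic. The sketch below is an assumption-laden estimate, not Bedrock pricing: `estimatePrefixCost` is a hypothetical helper, the write and read multipliers are placeholder inputs you would replace with the rates for your model, and it assumes every request after the first hits the cache before the TTL expires.

```typescript
// All numeric inputs are placeholders supplied by the caller, not Bedrock pricing.
interface PrefixCostInputs {
  prefixTokens: number;         // tokens in the stable prefix (system prompt, tools, history)
  requests: number;             // number of requests that share that prefix
  cacheWriteMultiplier: number; // assumed cost of writing the cache, relative to a normal input token
  cacheReadMultiplier: number;  // assumed cost of reading the cache, relative to a normal input token
}

// Returns the prefix cost in "normal input token" equivalents, with and without caching.
function estimatePrefixCost(inputs: PrefixCostInputs): { uncached: number; cached: number } {
  const { prefixTokens, requests, cacheWriteMultiplier, cacheReadMultiplier } = inputs;
  const uncached = prefixTokens * requests;
  const cached =
    requests === 0
      ? 0
      : prefixTokens * cacheWriteMultiplier + // first request writes the cache
        prefixTokens * cacheReadMultiplier * (requests - 1); // later requests read it
  return { uncached, cached };
}

// Hypothetical scenario: a 10,000-token prefix reused across 10 requests.
const estimate = estimatePrefixCost({
  prefixTokens: 10_000,
  requests: 10,
  cacheWriteMultiplier: 1.25, // placeholder
  cacheReadMultiplier: 0.1,   // placeholder
});
console.log(estimate);
```

With a single request the cached path costs more than the uncached one whenever the write multiplier exceeds 1, which is why caching pays off only when the prefix is actually reused.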

Bedrock Converse Only: This middleware only applies caching to AWS Bedrock Converse models. Other providers are handled per unsupportedModelBehavior
Supported Families: Bedrock prompt caching is only available on the Anthropic Claude and Amazon Nova model families. Other Bedrock Converse models (e.g. Mistral, Cohere, Meta) reject cache points at request time, so they are treated as unsupported and routed through unsupportedModelBehavior
Automatic Application: Caching is applied automatically when the message count reaches minMessagesToCache
TTL Options: Only supports "5m" (5 minutes) and "1h" (1 hour) as TTL values; actual support varies by model
Best Use Cases: Long system prompts, multi-turn conversations, repetitive queries, RAG applications
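
Taken together, the notes above amount to a small decision procedure. The following is a minimal sketch of that logic under stated assumptions, not the middleware's implementation: `decideCaching` and `supportsPromptCaching` are hypothetical names, the `"ignore" | "warn" | "raise"` values for unsupportedModelBehavior are assumed rather than taken from this page, the model-family check is a guess based on model ID prefixes, and the message count here covers only conversation messages.

```typescript
// Assumed values; this page names the option but does not list what it accepts.
type UnsupportedModelBehavior = "ignore" | "warn" | "raise";

interface CachingOptions {
  enableCaching: boolean;
  ttl: "5m" | "1h";
  minMessagesToCache: number;
  unsupportedModelBehavior: UnsupportedModelBehavior;
}

type Decision = "cache" | "skip";

// Assumed heuristic: detect the Anthropic Claude and Amazon Nova families from
// the model ID, allowing a leading region prefix such as "us.".
function supportsPromptCaching(modelId: string): boolean {
  return /(^|\.)anthropic\.claude/.test(modelId) || /(^|\.)amazon\.nova/.test(modelId);
}

function decideCaching(modelId: string, messageCount: number, options: CachingOptions): Decision {
  if (!options.enableCaching) {
    return "skip";
  }
  if (!supportsPromptCaching(modelId)) {
    const reason = `Prompt caching is not supported for model "${modelId}"`;
    if (options.unsupportedModelBehavior === "raise") {
      throw new Error(reason);
    }
    if (options.unsupportedModelBehavior === "warn") {
      console.warn(reason);
    }
    return "skip";
  }
  // Caching starts once the conversation reaches the configured threshold.
  return messageCount >= options.minMessagesToCache ? "cache" : "skip";
}

const sketchOptions: CachingOptions = {
  enableCaching: true,
  ttl: "5m",
  minMessagesToCache: 5,
  unsupportedModelBehavior: "ignore",
};

console.log(decideCaching("anthropic.claude-haiku-4-5-20251001-v1:0", 6, sketchOptions));
```

Checking model support before the message threshold means an unsupported model is reported (or rejected) on the first request rather than only after the conversation grows.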

Example 1

import { createAgent, bedrockPromptCachingMiddleware } from "langchain";

const agent = createAgent({
  model: "bedrock:anthropic.claude-haiku-4-5-20251001-v1:0",
  middleware: [
    bedrockPromptCachingMiddleware()
  ]
});

Example 2

const cachingMiddleware = bedrockPromptCachingMiddleware({
  ttl: "1h",  // Cache for 1 hour instead of default 5 minutes
  minMessagesToCache: 5  // Only cache after 5 messages
});

const agent = createAgent({
  model: "bedrock:anthropic.claude-haiku-4-5-20251001-v1:0",
  systemPrompt: "You are a helpful assistant with deep knowledge of...", // Long system prompt
  middleware: [cachingMiddleware]
});

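Example 3

// HumanMessage is used in the invoke call below
import { HumanMessage } from "@langchain/core/messages";

const agent = createAgent({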
  model: "bedrock:anthropic.claude-haiku-4-5-20251001-v1:0",
  middleware: [
    bedrockPromptCachingMiddleware({
      enableCaching: true,
      ttl: "5m"
    })
  ]
});

// Disable caching for specific requests
await agent.invoke(
  { messages: [new HumanMessage("Process this without caching")] },
  {
    configurable: {
      middleware_context: { enableCaching: false }
    }
  }
);

Example 4

const supportAgent = createAgent({
  model: "bedrock:anthropic.claude-haiku-4-5-20251001-v1:0",
  systemPrompt: `You are a customer support agent for ACME Corp.

    Company policies:
    - Always be polite and professional
    - Refer to knowledge base for product information
    - Escalate billing issues to human agents
    ... (extensive policies and guidelines)
  `,
  tools: [searchKnowledgeBase, createTicket, checkOrderStatus],
  middleware: [
    bedrockPromptCachingMiddleware({
      ttl: "1h",  // Long TTL for stable system prompt
      minMessagesToCache: 1  // Cache immediately due to large system prompt
    })
  ]
});
