Function • Since v1.1

anthropicPromptCachingMiddleware

Creates a prompt caching middleware for Anthropic models to optimize API usage.

This middleware automatically adds cache control headers to the last messages when using Anthropic models, enabling their prompt caching feature. This can significantly reduce costs for applications with repetitive prompts, long system messages, or extensive conversation histories.

How It Works

The middleware intercepts model requests and adds cache control metadata that tells Anthropic's API to cache processed prompt prefixes. On subsequent requests with matching prefixes, the cached representations are reused, skipping redundant token processing.
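
As a rough sketch of the effect (not the middleware's actual implementation), the request that ultimately reaches Anthropic carries cache_control markers on the content blocks to be cached, along these lines; the field values below are placeholders:

// Illustrative shape of an Anthropic Messages API request body once cache
// control has been applied to the last message; values are placeholders,
// not the middleware's real output.
const requestBody = {
  model: "claude-sonnet-4-5",
  system: "You are a helpful assistant with deep knowledge of...",
  messages: [
    { role: "user", content: "Earlier question..." },
    { role: "assistant", content: "Earlier answer..." },
    {
      role: "user",
      content: [
        {
          type: "text",
          text: "Latest question...",
          // Everything up to and including this block becomes cacheable.
          cache_control: { type: "ephemeral" },
        },
      ],
    },
  ],
  max_tokens: 1024,
};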

Benefits

  • Cost Reduction: Avoid reprocessing the same tokens repeatedly (up to 90% savings on cached portions)
  • Lower Latency: Cached prompt prefixes skip reprocessing, so responses begin sooner
  • Better Scalability: Reduced computational load enables handling more requests
  • Consistent Performance: Stable response times for repetitive queries
anthropicPromptCachingMiddleware(
  middlewareOptions: Partial<__type>
): AgentMiddleware<undefined, ZodObject<__type, "strip", ZodTypeAny, __type, __type>, __type, readonly (ClientTool | ServerTool)[]>
  • Anthropic Only: This middleware only works with Anthropic models and will throw an error if used with other providers
  • Automatic Application: Caching is applied automatically when message count exceeds minMessagesToCache
  • Cache Scope: Caches are isolated per API key and cannot be shared across different keys
  • TTL Options: Only supports "5m" (5 minutes) and "1h" (1 hour) as TTL values per Anthropic's API
  • Best Use Cases: Long system prompts, multi-turn conversations, repetitive queries, RAG applications
  • Cost Impact: Cache reads are billed at roughly 10% of the base input token price, while cache writes carry a premium over the base price (about 25% extra for the default 5-minute TTL); see the worked cost sketch after this list
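
To make the pricing concrete, here is a back-of-the-envelope sketch; the per-million-token base price is an illustrative assumption, not a quote of current Anthropic pricing:

// Illustrative cost arithmetic. $3 per million input tokens is an assumed
// base price; the 0.10x read and 1.25x write multipliers follow the note above.
const basePricePerMTok = 3.0;   // assumed USD per million input tokens
const promptTokens = 10_000;    // e.g. a long, stable system prompt

const uncached = (promptTokens / 1e6) * basePricePerMTok; // $0.03 per request
const cacheWrite = uncached * 1.25;                       // $0.0375 on the first request
const cacheRead = uncached * 0.10;                        // $0.003 on later requests within the TTL

console.log({ uncached, cacheWrite, cacheRead });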

Used in Docs

  • Anthropic integration

Parameters

middlewareOptions: Partial<__type>
Configuration options for the caching behavior.
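
The anonymous options type is not expanded above. As an assumption inferred from the notes and examples on this page (not the exported type definition), the options cover roughly these fields:

// Assumed shape of the options object, based on the fields used on this page;
// the actual exported type in langchain may differ.
interface PromptCachingOptions {
  // Cache lifetime accepted by Anthropic's API: "5m" (default) or "1h".
  ttl?: "5m" | "1h";
  // Minimum number of messages before cache control is applied.
  minMessagesToCache?: number;
  // Turn caching on or off; Example 3 shows overriding this at invocation time.
  enableCaching?: boolean;
}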

Example 1

Basic usage with default settings

import { createAgent, anthropicPromptCachingMiddleware } from "langchain";

const agent = createAgent({
  model: "anthropic:claude-sonnet-4-5",
  middleware: [
    anthropicPromptCachingMiddleware()
  ]
});

Example 2

Custom configuration for longer conversations

const cachingMiddleware = anthropicPromptCachingMiddleware({
  ttl: "1h",  // Cache for 1 hour instead of default 5 minutes
  minMessagesToCache: 5  // Only cache after 5 messages
});

const agent = createAgent({
  model: "anthropic:claude-sonnet-4-5",
  systemPrompt: "You are a helpful assistant with deep knowledge of...", // Long system prompt
  middleware: [cachingMiddleware]
});

Example 3

Conditional caching based on runtime context

import { HumanMessage } from "@langchain/core/messages";

const agent = createAgent({
  model: "anthropic:claude-sonnet-4-5",
  middleware: [
    anthropicPromptCachingMiddleware({
      enableCaching: true,
      ttl: "5m"
    })
  ]
});

// Disable caching for specific requests
await agent.invoke(
  { messages: [new HumanMessage("Process this without caching")] },
  {
    configurable: {
      middleware_context: { enableCaching: false }
    }
  }
);

Example 4

Optimal setup for customer support chatbot

const supportAgent = createAgent({
  model: "anthropic:claude-sonnet-4-5",
  systemPrompt: `You are a customer support agent for ACME Corp.

    Company policies:
    - Always be polite and professional
    - Refer to knowledge base for product information
    - Escalate billing issues to human agents
    ... (extensive policies and guidelines)
  `,
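  // searchKnowledgeBase, createTicket, and checkOrderStatus are assumed tool definitions (not shown here).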
  tools: [searchKnowledgeBase, createTicket, checkOrderStatus],
  middleware: [
    anthropicPromptCachingMiddleware({
      ttl: "1h",  // Long TTL for stable system prompt
      minMessagesToCache: 1  // Cache immediately due to large system prompt
    })
  ]
});