Creates a prompt caching middleware for Anthropic models to optimize API usage.

This middleware automatically adds cache control metadata to the last messages when using Anthropic models, enabling their prompt caching feature. This can significantly reduce costs for applications with repetitive prompts, long system messages, or extensive conversation histories.

Parameters

middlewareOptions: any (Optional)
Configuration options for the caching behavior.

Returns

A middleware instance that can be passed to createAgent.
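The options object is typed as any, but the examples below suggest it accepts roughly the following shape. This is a sketch inferred from those examples, not the library's published type:

interface AnthropicPromptCachingOptions {
  // Cache lifetime for the marked prefix; the examples use "5m" (the default) and "1h"
  ttl?: "5m" | "1h";
  // Minimum number of conversation messages before cache control is applied
  minMessagesToCache?: number;
  // Whether caching is active; overridable per request via middleware_context
  enableCaching?: boolean;
}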
Basic usage with default settings
import { createAgent, anthropicPromptCachingMiddleware } from "langchain";

const agent = createAgent({
  model: "anthropic:claude-3-5-sonnet",
  middleware: [anthropicPromptCachingMiddleware()],
});
Custom configuration for longer conversations
const cachingMiddleware = anthropicPromptCachingMiddleware({
  ttl: "1h", // Cache for 1 hour instead of the default 5 minutes
  minMessagesToCache: 5, // Only start caching after 5 messages
});

const agent = createAgent({
  model: "anthropic:claude-3-5-sonnet",
  systemPrompt: "You are a helpful assistant with deep knowledge of...", // Long system prompt
  middleware: [cachingMiddleware],
});
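As of this writing, Anthropic's prompt caching offers exactly the two TTL values used in these examples, "5m" and "1h". The longer TTL costs more per cache write but keeps the cached prefix alive across longer gaps between requests, which suits long-running conversations.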
Conditional caching based on runtime context
import { HumanMessage } from "langchain";

const agent = createAgent({
  model: "anthropic:claude-3-5-sonnet",
  middleware: [
    anthropicPromptCachingMiddleware({
      enableCaching: true,
      ttl: "5m",
    }),
  ],
});

// Disable caching for specific requests
await agent.invoke(
  { messages: [new HumanMessage("Process this without caching")] },
  {
    configurable: {
      middleware_context: { enableCaching: false },
    },
  }
);
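The middleware_context entry overrides the enableCaching option for this invocation only; other calls continue to use the configuration supplied when the middleware was created.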
Optimal setup for customer support chatbot
const supportAgent = createAgent({
  model: "anthropic:claude-3-5-sonnet",
  systemPrompt: `You are a customer support agent for ACME Corp.
Company policies:
- Always be polite and professional
- Refer to the knowledge base for product information
- Escalate billing issues to human agents
... (extensive policies and guidelines)
`,
  tools: [searchKnowledgeBase, createTicket, checkOrderStatus],
  middleware: [
    anthropicPromptCachingMiddleware({
      ttl: "1h", // Long TTL for a stable system prompt
      minMessagesToCache: 1, // Cache immediately because the system prompt is large
    }),
  ],
});
How It Works
The middleware intercepts model requests and adds cache control metadata that tells Anthropic's API to cache processed prompt prefixes. On subsequent requests with matching prefixes, the cached representations are reused, skipping redundant token processing.
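Concretely, Anthropic's prompt caching works by placing a cache_control marker on a content block; everything in the request up to and including that block (system prompt, tool definitions, earlier turns) becomes the cached prefix. The sketch below illustrates the shape of that transformation; it is a simplified illustration, not the middleware's actual implementation:

// Before: the last message as the agent would normally send it
const lastMessage = {
  role: "user",
  content: "Where is my order?",
};

// After: the middleware marks the block as a cache breakpoint, so
// Anthropic caches everything in the request up to this point
const lastMessageWithCache = {
  role: "user",
  content: [
    {
      type: "text",
      text: "Where is my order?",
      cache_control: { type: "ephemeral" }, // 5-minute TTL by default
    },
  ],
};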
Benefits