Interface for a caching layer for LLMs and Chat models.
The cache interface consists of the following methods:
- lookup: Look up based on prompt and llm_string.
- update: Update cache based on prompt and llm_string.
- clear: Clear the cache.

In addition, the cache interface provides an async version of each method.
The default implementation of the async methods is to run the synchronous method in an executor. It's recommended to override the async methods and provide async implementations to avoid unnecessary overhead.
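Below is a minimal sketch of a concrete cache, assuming the BaseCache ABC from langchain_core.caches and Generation outputs from langchain_core.outputs; it implements only the synchronous methods, so the inherited async methods fall back to running them in an executor.

```python
# Minimal sketch, not the library's own implementation: a dict-backed cache
# assuming langchain_core.caches.BaseCache and langchain_core.outputs.Generation.
from typing import Any, Optional, Sequence

from langchain_core.caches import BaseCache
from langchain_core.outputs import Generation


class InMemoryDictCache(BaseCache):
    """Cache generations in a plain in-process dict."""

    def __init__(self) -> None:
        # Keyed directly by the (prompt, llm_string) 2-tuple.
        self._store: dict[tuple[str, str], Sequence[Generation]] = {}

    def lookup(self, prompt: str, llm_string: str) -> Optional[Sequence[Generation]]:
        # Return the cached generations, or None on a cache miss.
        return self._store.get((prompt, llm_string))

    def update(self, prompt: str, llm_string: str, return_val: Sequence[Generation]) -> None:
        # Store under the same key that lookup uses.
        self._store[(prompt, llm_string)] = return_val

    def clear(self, **kwargs: Any) -> None:
        # Extra keyword arguments are accepted but ignored in this sketch.
        self._store.clear()
```

Because only the sync methods are overridden here, alookup, aupdate, and aclear use the inherited defaults and run these methods in an executor.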
BaseCache()

lookup(prompt, llm_string)
Look up based on prompt and llm_string.
A cache implementation is expected to generate a key from the 2-tuple
of prompt and llm_string (e.g., by concatenating them with a delimiter).
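For backends that need a single string key rather than a tuple, one common approach is to join the 2-tuple with a delimiter and hash the result; the delimiter and hash choice below are illustrative, not mandated by the interface.

```python
import hashlib


def make_key(prompt: str, llm_string: str) -> str:
    # Illustrative only: a record-separator delimiter plus SHA-256 keeps the
    # key fixed-length and unambiguous for any prompt/llm_string contents.
    return hashlib.sha256(f"{prompt}\x1e{llm_string}".encode("utf-8")).hexdigest()
```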
update(prompt, llm_string, return_val)
Update cache based on prompt and llm_string.
The prompt and llm_string are used to generate a key for the cache. The key
should match that of the lookup method.
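A quick round trip with the InMemoryDictCache sketch above (the llm_string value is illustrative; in practice it is typically a string representation of the model and its parameters):

```python
from langchain_core.outputs import Generation

cache = InMemoryDictCache()
llm_string = "fake-model|temperature=0.0"  # illustrative, not a real serialization

cache.update("What is 2 + 2?", llm_string, [Generation(text="4")])

# The same (prompt, llm_string) pair is a hit; any other pair is a miss.
assert cache.lookup("What is 2 + 2?", llm_string)[0].text == "4"
assert cache.lookup("What is 2 + 2?", "another-llm-string") is None
```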
clear(**kwargs)
Clear the cache. Can take additional keyword arguments.
alookup(prompt, llm_string)
Async look up based on prompt and llm_string.
A cache implementation is expected to generate a key from the 2-tuple
of prompt and llm_string (e.g., by concatenating them with a delimiter).
aupdate(prompt, llm_string, return_val)
Async update cache based on prompt and llm_string.
The prompt and llm_string are used to generate a key for the cache. The key should match that of the lookup method.
aclear(**kwargs)
Async clear the cache. Can take additional keyword arguments.
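When the backing store is natively asynchronous, overriding the async methods avoids the executor round trip of the defaults. A sketch, assuming an in-process dict guarded by an asyncio.Lock stands in for a real async client:

```python
import asyncio
from typing import Any, Optional, Sequence

from langchain_core.caches import BaseCache
from langchain_core.outputs import Generation


class AsyncDictCache(BaseCache):
    """Sketch of a cache that overrides the async methods directly."""

    def __init__(self) -> None:
        self._store: dict[tuple[str, str], Sequence[Generation]] = {}
        self._lock = asyncio.Lock()

    # Sync methods are still required by the ABC.
    def lookup(self, prompt: str, llm_string: str) -> Optional[Sequence[Generation]]:
        return self._store.get((prompt, llm_string))

    def update(self, prompt: str, llm_string: str, return_val: Sequence[Generation]) -> None:
        self._store[(prompt, llm_string)] = return_val

    def clear(self, **kwargs: Any) -> None:
        self._store.clear()

    # Async overrides: no executor round trip.
    async def alookup(self, prompt: str, llm_string: str) -> Optional[Sequence[Generation]]:
        async with self._lock:
            return self._store.get((prompt, llm_string))

    async def aupdate(self, prompt: str, llm_string: str, return_val: Sequence[Generation]) -> None:
        async with self._lock:
            self._store[(prompt, llm_string)] = return_val

    async def aclear(self, **kwargs: Any) -> None:
        async with self._lock:
            self._store.clear()
```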