InMemoryVectorStore(
self,
redis_url: str,
index_name: str,
embedding: Embeddings,
InMemoryVectorStore vector database.
To use, you should have the redis Python package installed for AWS MemoryDB:
pip install redis
Once running, you can connect to the MemoryDB server with a URL such as redis://cluster_endpoint:6379 or, with credentials, redis://username:password@cluster_endpoint:6379.
Examples:
The following examples show various ways to use the InMemoryVectorStore with LangChain.
For all the following examples assume we have the following imports:
from langchain_aws.vectorstores import InMemoryVectorStore
Initialize, create index, and load Documents:
from langchain_aws.vectorstores import InMemoryVectorStore
rds = InMemoryVectorStore.from_documents(
documents, # a list of Document objects from loaders or created
embeddings, # an Embeddings object
redis_url="redis://cluster_endpoint:6379",
)
Initialize, create index, and load Documents with metadata:
rds = InMemoryVectorStore.from_texts(
texts, # a list of strings
embeddings, # an Embeddings object (second positional argument)
metadatas=metadata, # a list of metadata dicts
redis_url="redis://cluster_endpoint:6379",
)
Initialize, create index, and load Documents with metadata and return keys:
rds, keys = InMemoryVectorStore.from_texts_return_keys(
texts, # a list of strings
embeddings, # an Embeddings object (second positional argument)
metadatas=metadata, # a list of metadata dicts
redis_url="redis://cluster_endpoint:6379",
)
For use cases where the index needs to stay alive, you can initialize with an index name so that it is easier to reference later:
rds = InMemoryVectorStore.from_texts(
texts, # a list of strings
embeddings, # an Embeddings object (second positional argument)
metadatas=metadata, # a list of metadata dicts
index_name="my-index",
redis_url="redis://cluster_endpoint:6379",
)
Initialize and connect to an existing index (from above):
# must pass in schema and key_prefix from another index
existing_rds = InMemoryVectorStore.from_existing_index(
embeddings, # an Embeddings object
index_name="my-index",
schema=rds.schema, # schema dumped from another index
key_prefix=rds.key_prefix, # key prefix from another index
redis_url="redis://username:password@cluster_endpoint:6379",
)
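The connection URL in the example above carries the scheme, optional credentials, host, and port in one string. As a small sanity check before handing such a URL to the vector store, it can be split apart with the Python standard library. This is only an illustrative sketch: the URL is the placeholder from the example, not a real endpoint, and nothing here calls MemoryDB.

```python
from urllib.parse import urlparse

# Placeholder URL copied from the example above; not a real endpoint.
url = "redis://username:password@cluster_endpoint:6379"
parts = urlparse(url)

# Collect the pieces a client would need to open the connection.
connection = {
    "scheme": parts.scheme,
    "username": parts.username,
    "password": parts.password,
    "host": parts.hostname,
    "port": parts.port,
}

for key, value in connection.items():
    print(f"{key}: {value}")
```

Checking the parsed pieces this way can catch a missing port or a misplaced "@" before a connection attempt fails with a less obvious error.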
Advanced examples:
A custom vector schema can be supplied to change the way that MemoryDB creates the underlying vector schema. This is useful for production use cases where you want to optimize the vector schema for your workload, for example by using HNSW instead of FLAT (knn), which is the default.
vector_schema = {
"algorithm": "HNSW"
}
rds = InMemoryVectorStore.from_texts(
texts, # a list of strings
embeddings, # an Embeddings object (second positional argument)
metadatas=metadata, # a list of metadata dicts
vector_schema=vector_schema,
redis_url="redis://cluster_endpoint:6379",
)
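To make the FLAT versus HNSW choice more concrete, the following is a minimal pure-Python sketch of what an exact, FLAT-style k-nearest-neighbor search does conceptually: it scores the query against every stored vector and keeps the best k. It is not MemoryDB's implementation. The function names, the sample keys, and the choice of cosine similarity are assumptions made for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def flat_knn(query, vectors, k=2):
    """Exhaustive search: score every stored vector, return the top-k keys."""
    scored = [(cosine_similarity(query, vec), key) for key, vec in vectors.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [key for _, key in scored[:k]]


# Hypothetical stored embeddings keyed by document id.
stored = {
    "doc:1": [1.0, 0.0],
    "doc:2": [0.0, 1.0],
    "doc:3": [0.9, 0.1],
}

nearest = flat_knn([1.0, 0.0], stored, k=2)
print(nearest)
```

Because every stored vector is scored, the cost of this approach grows linearly with the size of the index. HNSW is an approximate graph-based method that avoids scanning every vector, which is why it is often considered for larger collections.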
A custom index schema can be supplied to change the way that the metadata is indexed. This is useful when you would like to use the hybrid querying (filtering) capability of MemoryDB.
By default, this implementation will automatically generate the index schema according to the following rules:
- All strings are indexed as text fields
- All numbers are indexed as numeric fields
- All lists of strings are indexed as tag fields (joined by langchain_aws.vectorstores.inmemorydb.constants.INMEMORYDB_TAG_SEPARATOR)
- All None values are not indexed but still stored in MemoryDB; these are not retrievable through the interface here, but the raw MemoryDB client can be used to retrieve them
- All other types are not indexed
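The default rules above can be read as a simple type dispatch over one metadata dict. The sketch below restates them in plain Python. The helper name infer_field_types and the sample metadata are hypothetical, and this is not the library's own schema generator, whose handling of edge cases (booleans, empty lists, numeric-looking strings) may differ.

```python
def infer_field_types(metadata):
    """Group metadata keys by the field type the stated rules would assign."""
    schema = {"text": [], "numeric": [], "tag": []}
    for name, value in metadata.items():
        if isinstance(value, str):
            schema["text"].append(name)
        elif isinstance(value, (int, float)) and not isinstance(value, bool):
            # Assumption: booleans are treated as "other" and left unindexed.
            schema["numeric"].append(name)
        elif isinstance(value, list) and value and all(isinstance(v, str) for v in value):
            schema["tag"].append(name)
        # None values and all other types are left out of the index.
    return schema


# Hypothetical metadata for a single document.
example = {
    "user": "john",
    "age": 18,
    "job": "engineer",
    "credit_score": "high",
    "skills": ["python", "aws"],
    "nickname": None,
}

inferred = infer_field_types(example)
print(inferred)
```

Note that credit_score lands under text here because its value is a string, which is the behavior a custom index schema is meant to override.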
To override these rules, you can pass in a custom index schema like the following:
tag:
- name: credit_score
text:
- name: user
- name: job
Typically, the credit_score field would be a text field since it's a string.
However, we can override this behavior by specifying the field type as shown with
the YAML config (can also be a dictionary) above and the code below.
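Since the schema can also be given as a dictionary, the YAML config above could be written in dictionary form roughly as follows. This sketch assumes the dictionary mirrors the YAML structure one-to-one (field type mapped to a list of field definitions); the variable name index_schema is chosen only to match the keyword argument.

```python
# Dictionary equivalent of the YAML config: field type -> list of field definitions.
index_schema = {
    "tag": [{"name": "credit_score"}],
    "text": [{"name": "user"}, {"name": "job"}],
}

# List which metadata fields each type covers.
for field_type, fields in index_schema.items():
    names = [field["name"] for field in fields]
    print(f"{field_type}: {names}")
```

Such a dictionary would then be passed as index_schema=index_schema in place of the YAML file path.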
rds = InMemoryVectorStore.from_texts(
texts, # a list of strings
embeddings, # an Embeddings object (second positional argument)
metadatas=metadata, # a list of metadata dicts
index_schema="path/to/index_schema.yaml", # can also be a dictionary
redis_url="redis://cluster_endpoint:6379",
)
When connecting to an existing index where a custom schema has been applied, it's
important to pass in the same schema to the from_existing_index method.
Otherwise, the schema for newly added samples will be incorrect and metadata
will not be returned.