AmazonS3Vectors(
    self,
    *,
    vector_bucket_name: str,
    index_name: str,
    data_type: Literal['float32'] = 'float32',
    distance_metric: Literal['euclidean', 'cosine'] = 'cosine',
    non_filterable_metadata_keys: list[str] | None = None,
    page_content_metadata_key: Optional[str] = '_page_content',
    create_index_if_not_exist: bool = True,
    relevance_score_fn: Optional[Callable[[float], float]] = None,
    embedding: Optional[Embeddings] = None,
    query_embedding: Optional[Embeddings] = None,
    region_name: Optional[str] = None,
    credentials_profile_name: Optional[str] = None,
    aws_access_key_id: Optional[str] = None,
    aws_secret_access_key: Optional[str] = None,
    aws_session_token: Optional[str] = None,
    endpoint_url: Optional[str] = None,
    config: Any = None,
    client: Any = None,
    **kwargs: Any
)
AmazonS3Vectors is a vector store backed by the Amazon S3 Vectors database.
To use it, you MUST first manually create an S3 vector bucket. There is no need to create a vector index. See: https://docs.aws.amazon.com/AmazonS3/latest/userguide/s3-vectors-getting-started.html
Pay attention to the S3 Vectors limitations and restrictions. By default, the metadata stored for each vector includes both the page_content and the metadata of the Document. See: https://docs.aws.amazon.com/AmazonS3/latest/userguide/s3-vectors-limitations.html
Examples:
The following examples show various ways to use AmazonS3Vectors with LangChain.
All of the following examples assume this setup:
from langchain_aws.embeddings import BedrockEmbeddings
from langchain_aws.vectorstores.s3_vectors import AmazonS3Vectors
embedding = BedrockEmbeddings()
Initialize, create vector index if it does not exist, and add texts:
vector_store = AmazonS3Vectors.from_texts(
    ["hello", "developer", "wife"],
    vector_bucket_name="<vector bucket name>",
    index_name="<vector index name>",
    embedding=embedding,
)
Initialize, create vector index if it does not exist, and add Documents:
from langchain_core.documents import Document
vector_store = AmazonS3Vectors(
    vector_bucket_name="<vector bucket name>",
    index_name="<vector index name>",
    embedding=embedding,
)
vector_store.add_documents(
    [
        Document("Star Wars", id="key1", metadata={"genre": "scifi"}),
        Document("Jurassic Park", id="key2", metadata={"genre": "scifi"}),
        Document("Finding Nemo", id="key3", metadata={"genre": "family"}),
    ]
)
Search with score (distance) and metadata filter:
vector_store.similarity_search_with_score(
    "adventures in space", filter={"genre": {"$eq": "family"}}
)
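The filter passed to similarity_search_with_score above is a plain dictionary, so it can be built programmatically. The following is a minimal sketch, not part of langchain_aws: eq_filter is a hypothetical helper, and combining several clauses with "$and" is an assumption about the S3 Vectors filter syntax that should be checked against the AWS documentation.

```python
from typing import Any


def eq_filter(**conditions: Any) -> dict[str, Any]:
    """Build an equality filter from keyword arguments (hypothetical helper)."""
    clauses = [{key: {"$eq": value}} for key, value in conditions.items()]
    if not clauses:
        raise ValueError("at least one condition is required")
    if len(clauses) == 1:
        # Same shape as the filter used in the example above.
        return clauses[0]
    # Assumption: the service accepts "$and" to combine several clauses.
    return {"$and": clauses}


single = eq_filter(genre="family")
combined = eq_filter(genre="scifi", year=1977)
```

The result can then be passed as the filter argument, for example filter=eq_filter(genre="family").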
vector_bucket_name (str): The name of an existing S3 vector bucket.
index_name (str): The name of the S3 vector index. Index names must be 3 to 63 characters long, start and end with a letter or number, and contain only lowercase letters, numbers, hyphens, and dots.
data_type (Literal['float32']): The data type of the vectors to be inserted into the vector index. Default is "float32".
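The index name rules above can be checked locally before any request is sent. This is a minimal sketch under one assumption: "letter" at the start and end is read as a lowercase letter, since only lowercase letters are allowed anywhere in the name. is_valid_index_name is a hypothetical helper, not a langchain_aws function.

```python
import re

# 3 to 63 characters in total: the first and last are a lowercase letter or
# a digit, and the 1 to 61 characters in between may also be hyphens or dots.
_INDEX_NAME_RE = re.compile(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]")


def is_valid_index_name(name: str) -> bool:
    """Return True if name satisfies the documented index name rules."""
    return _INDEX_NAME_RE.fullmatch(name) is not None
```

Validating early gives a clearer error than waiting for the service to reject the name when the index is created.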
Access the query embedding object if available.
Add more texts to the VectorStore.
Delete by vector ID or delete index.
Get documents by their IDs.
Return docs most similar to query.
Run similarity search with score (distance).
Return docs most similar to embedding vector.
Return AmazonS3VectorsRetriever initialized from this AmazonS3Vectors.
Return AmazonS3Vectors initialized from texts and embeddings.
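distance_metric (Literal['euclidean', 'cosine']): The distance metric to be used for similarity search. Default is "cosine".
non_filterable_metadata_keys (list[str] | None): Metadata keys that are stored with each vector but cannot be used in query filters.
page_content_metadata_key (Optional[str]): The metadata key under which the Document's page_content is stored. If None, page_content is still embedded but is stored as an empty string. Default is "_page_content".
create_index_if_not_exist (bool): Automatically create the vector index if it does not exist. Default is True.
relevance_score_fn (Optional[Callable[[float], float]]): The 'correct' relevance function: a callable that maps a raw score (distance) to a relevance score.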
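As an illustration of the Callable[[float], float] shape that relevance_score_fn expects, here is a minimal sketch of a custom function. It is not the library's default: it assumes the store returns a cosine distance in the range [0, 2], and cosine_distance_to_relevance is a hypothetical name.

```python
def cosine_distance_to_relevance(distance: float) -> float:
    """Map a cosine distance to a score in [0, 1]; higher means more relevant."""
    # Assumption: distance lies in [0, 2] (0 = same direction, 2 = opposite).
    score = 1.0 - distance / 2.0
    # Clamp so that small numerical overshoots stay inside [0, 1].
    return max(0.0, min(1.0, score))
```

Such a function would be passed as relevance_score_fn=cosine_distance_to_relevance when constructing the store.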
embedding (Optional[Embeddings]): Embedding function to use for indexing documents.
query_embedding (Optional[Embeddings]): Separate embedding function to use for queries. If not provided, the embedding parameter is used for both indexing and querying. This is useful for embedding providers that require different task types for documents and queries.
region_name (Optional[str]): The AWS region to use for the S3 Vectors client, e.g. us-west-2.
credentials_profile_name (Optional[str]): The name of the profile in the ~/.aws/credentials or ~/.aws/config files, which has either access keys or role information specified. If not specified, the default credential profile or, if on an EC2 instance, credentials from IMDS will be used. See: https://boto3.amazonaws.com/v1/documentation/api/latest/guide/credentials.html
aws_access_key_id (Optional[str]): AWS access key ID. If provided, aws_secret_access_key must also be provided. If not specified, the default credential profile or, if on an EC2 instance, credentials from IMDS will be used. See: https://boto3.amazonaws.com/v1/documentation/api/latest/guide/credentials.html If not provided, it will be read from the AWS_ACCESS_KEY_ID environment variable.
aws_secret_access_key (Optional[str]): AWS secret access key. If provided, aws_access_key_id must also be provided. If not specified, the default credential profile or, if on an EC2 instance, credentials from IMDS will be used. See: https://boto3.amazonaws.com/v1/documentation/api/latest/guide/credentials.html If not provided, it will be read from the AWS_SECRET_ACCESS_KEY environment variable.
aws_session_token (Optional[str]): AWS session token. If provided, aws_access_key_id and aws_secret_access_key must also be provided. Not required unless using temporary credentials. See: https://boto3.amazonaws.com/v1/documentation/api/latest/guide/credentials.html If not provided, it will be read from the AWS_SESSION_TOKEN environment variable.
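The pairing rules for the three explicit credential arguments can be checked before constructing the store. The following is a rough sketch only: credential_errors is a hypothetical helper, it does not reproduce the library's own validation, and it deliberately ignores values that might come from environment variables.

```python
from typing import Optional


def credential_errors(
    aws_access_key_id: Optional[str] = None,
    aws_secret_access_key: Optional[str] = None,
    aws_session_token: Optional[str] = None,
) -> list[str]:
    """Return messages for violated pairing rules; an empty list means OK."""
    errors = []
    # The access key ID and the secret access key must be given together.
    if (aws_access_key_id is None) != (aws_secret_access_key is None):
        errors.append(
            "aws_access_key_id and aws_secret_access_key must be provided together"
        )
    # A session token is only meaningful alongside both keys.
    if aws_session_token is not None and (
        aws_access_key_id is None or aws_secret_access_key is None
    ):
        errors.append(
            "aws_session_token requires aws_access_key_id and aws_secret_access_key"
        )
    return errors
```

Passing no explicit credentials is valid under these rules; in that case the default credential profile or IMDS is used, as described above.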
endpoint_url (Optional[str]): Needed if you do not want to default to the us-east-1 endpoint.
config (Any): An optional botocore.config.Config instance to pass to the client.
client (Any): Boto3 client for s3vectors.
kwargs (Any): Additional keyword arguments.