PGVector(
self,
embeddings: Embeddings,
*,
connection: Union[None, DBConnection, Engine, AsyncEngine, str] = None,
...
)

| Name | Type | Description |
|---|---|---|
| connection | Union[None, DBConnection, Engine, AsyncEngine, str] | Default: None. Postgres connection string or (async) engine. |
| embeddings* | Embeddings | Any embedding function implementing the Embeddings interface. |
| embedding_length | Optional[int] | Default: None |
| collection_name | str | Default: _LANGCHAIN_DEFAULT_COLLECTION_NAME |
| distance_strategy | DistanceStrategy | Default: DEFAULT_DISTANCE_STRATEGY |
| pre_delete_collection | bool | Default: False |
| engine_args | Optional[dict[str, Any]] | Default: None |
| use_jsonb | bool | Default: True |
| create_extension | bool | Default: True |

| Name | Type |
|---|---|
| embeddings | Embeddings |
| connection | Union[None, DBConnection, Engine, AsyncEngine, str] |
| embedding_length | Optional[int] |
| collection_name | str |
| collection_metadata | Optional[dict] |
| distance_strategy | DistanceStrategy |
| pre_delete_collection | bool |
| logger | Optional[logging.Logger] |
| relevance_score_fn | Optional[Callable[[float], float]] |
| engine_args | Optional[dict[str, Any]] |
| use_jsonb | bool |
| create_extension | bool |
| async_mode | bool |
Postgres vector store integration.
Setup:
Install langchain-postgres and run the pgvector Docker container.
.. code-block:: bash
pip install -qU langchain-postgres
docker run --name pgvector-container -e POSTGRES_USER=langchain -e POSTGRES_PASSWORD=langchain -e POSTGRES_DB=langchain -p 6024:5432 -d pgvector/pgvector:pg16
Key init args (indexing params):
collection_name: str
    Name of the collection.
embeddings: Embeddings
    Embedding function to use.
Key init args (client params):
connection: Union[None, DBConnection, Engine, AsyncEngine, str]
    Connection string or engine.
Instantiate:
.. code-block:: python
from langchain_postgres.vectorstores import PGVector
from langchain_openai import OpenAIEmbeddings

connection = "postgresql+psycopg://langchain:langchain@localhost:6024/langchain"  # Uses psycopg3!
collection_name = "my_docs"

vector_store = PGVector(
    embeddings=OpenAIEmbeddings(model="text-embedding-3-large"),
    collection_name=collection_name,
    connection=connection,
    use_jsonb=True,
)
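If the user name or password contains characters such as "@" or "/", the connection string needs URL escaping before it is passed as connection. The following is a minimal sketch using only the standard library; build_connection_string is a hypothetical helper (not part of langchain-postgres), and its defaults simply mirror the Docker command in the Setup section.

```python
from urllib.parse import quote_plus


def build_connection_string(
    user: str,
    password: str,
    host: str = "localhost",
    port: int = 6024,
    database: str = "langchain",
) -> str:
    """Hypothetical helper: assemble a psycopg3-style SQLAlchemy URL."""
    # quote_plus escapes reserved characters such as "@", ":" and "/"
    # so they cannot be mistaken for URL delimiters.
    return (
        f"postgresql+psycopg://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{database}"
    )


connection = build_connection_string("langchain", "langchain")
print(connection)
```

The resulting string can be passed as the connection argument in place of the hand-written literal.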
Add Documents:
.. code-block:: python
from langchain_core.documents import Document
document_1 = Document(page_content="foo", metadata={"baz": "bar"})
document_2 = Document(page_content="thud", metadata={"bar": "baz"})
document_3 = Document(page_content="i will be deleted :(")

documents = [document_1, document_2, document_3]
ids = ["1", "2", "3"]
vector_store.add_documents(documents=documents, ids=ids)
Delete Documents:
.. code-block:: python
vector_store.delete(ids=["3"])
Search:
.. code-block:: python
results = vector_store.similarity_search(query="thud", k=1)
for doc in results:
    print(f"* {doc.page_content} [{doc.metadata}]")
.. code-block:: python
* thud [{'bar': 'baz'}]
Search with filter:
.. code-block:: python
results = vector_store.similarity_search(query="thud", k=1, filter={"bar": "baz"})
for doc in results:
    print(f"* {doc.page_content} [{doc.metadata}]")
.. code-block:: python
* thud [{'bar': 'baz'}]
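Conceptually, the filter in the example above keeps only documents whose metadata has the key bar set to baz. The pure-Python sketch below illustrates that equality semantics on plain dictionaries. It is an illustration only: matches_filter is a hypothetical helper, the actual filtering is assumed to happen in SQL inside the store, and the store may support operators that are not shown here.

```python
from typing import Any


def matches_filter(metadata: dict[str, Any], flt: dict[str, Any]) -> bool:
    """Hypothetical: True if every filter key is present with an equal value."""
    return all(
        key in metadata and metadata[key] == value
        for key, value in flt.items()
    )


# Toy stand-ins for the stored documents (plain dicts, not Document objects).
docs = [
    {"page_content": "foo", "metadata": {"baz": "bar"}},
    {"page_content": "thud", "metadata": {"bar": "baz"}},
]

kept = [d["page_content"] for d in docs if matches_filter(d["metadata"], {"bar": "baz"})]
print(kept)
```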
Search with score:
.. code-block:: python
results = vector_store.similarity_search_with_score(query="qux", k=1)
for doc, score in results:
    print(f"* [SIM={score:3f}] {doc.page_content} [{doc.metadata}]")
.. code-block:: python
* [SIM=0.499243] foo [{'baz': 'bar'}]
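To help interpret the number, the sketch below assumes that, with the COSINE distance strategy, the score is a cosine distance (1 minus cosine similarity), so lower values would mean closer matches. That reading is an assumption, not something this page states; cosine_distance is a hypothetical helper and the vectors are toy values, not real embeddings.

```python
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    """Hypothetical helper: 1 - cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


print(cosine_distance([1.0, 0.0], [1.0, 0.0]))   # same direction
print(cosine_distance([1.0, 0.0], [0.0, 1.0]))   # orthogonal
print(cosine_distance([1.0, 0.0], [-1.0, 0.0]))  # opposite direction
```

Under this assumption, the distance ranges from 0 (same direction) through 1 (orthogonal) to 2 (opposite direction).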
Async:
.. code-block:: python
# add documents
# await vector_store.aadd_documents(documents=documents, ids=ids)
# delete documents
# await vector_store.adelete(ids=["3"])
# search
# results = await vector_store.asimilarity_search(query="thud", k=1)
# search with score
results = await vector_store.asimilarity_search_with_score(query="qux", k=1)
for doc, score in results:
    print(f"* [SIM={score:3f}] {doc.page_content} [{doc.metadata}]")
.. code-block:: python
* [SIM=0.499243] foo [{'baz': 'bar'}]
Use as Retriever:
.. code-block:: python
retriever = vector_store.as_retriever(
    search_type="mmr",
    search_kwargs={"k": 1, "fetch_k": 2, "lambda_mult": 0.5},
)
retriever.invoke("thud")
.. code-block:: python
[Document(metadata={'bar': 'baz'}, page_content='thud')]
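The mmr search type refers to maximal marginal relevance: from fetch_k candidates, k results are chosen so that each is relevant to the query but not redundant with those already picked, with lambda_mult weighting the two goals. The following is a simplified sketch of the general technique in plain Python, not the library's implementation; mmr_select, cosine_similarity, and the toy vectors are hypothetical.

```python
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def mmr_select(query, candidates, k, lambda_mult=0.5):
    """Return the indices of k candidates chosen by maximal marginal relevance."""
    selected = []
    remaining = list(range(len(candidates)))
    while remaining and len(selected) < k:

        def mmr_score(i):
            relevance = cosine_similarity(query, candidates[i])
            # Redundancy: highest similarity to anything already selected.
            redundancy = max(
                (cosine_similarity(candidates[i], candidates[j]) for j in selected),
                default=0.0,
            )
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy

        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return selected


query = [1.0, 0.0]
# Candidates 0 and 1 are near-duplicates; candidate 2 points elsewhere.
candidates = [[1.0, 0.1], [1.0, 0.12], [1.0, -1.0]]

print(mmr_select(query, candidates, k=2, lambda_mult=0.5))  # balances diversity
print(mmr_select(query, candidates, k=2, lambda_mult=1.0))  # pure relevance
```

In this sketch, lambda_mult=1.0 reduces to plain similarity ranking, while lower values favor diversity: the second pick skips the near-duplicate candidate at lambda_mult=0.5.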
embedding_length: The length of the embedding vector. (default: None) NOTE: This is optional. Setting it prevents vectors of any other size from being added to the embeddings table; without it, the embeddings cannot be indexed.
collection_name: The name of the collection to use. (default: langchain) NOTE: This is the name of the collection, not of the table. The tables are created when the store is initialized (if they do not already exist), so make sure the user has permission to create tables.
distance_strategy: The distance strategy to use. (default: COSINE)
pre_delete_collection: If True, deletes the collection if it exists. (default: False) Useful for testing.
engine_args: Arguments for SQLAlchemy's create_engine.
use_jsonb: Use JSONB instead of JSON for metadata. (default: True) Using JSON is strongly discouraged because it is less efficient to query. It is provided only for backwards compatibility with older versions and will be removed in the future.
create_extension: If True, creates the vector extension if it doesn't exist. Disabling creation is useful when using read-only databases.