Module●Since v0.3

graph_vectorstores

.. title:: Graph Vector Store

Graph Vector Store

Sometimes embedding models don't capture all the important relationships between documents. Graph vector stores extend both vector stores and retrievers so that documents can be explicitly connected to each other.

Graph vector store retrievers use both vector similarity and links to find documents related to an unstructured query.

Graphs allow linking between documents. Each document identifies tags that link to and from it. For example, a paragraph of text may be linked to URLs based on the anchor tags in its content and linked from the URL(s) it is published at.

:class:`Link extractors <langchain_community.graph_vectorstores.extractors.link_extractor.LinkExtractor>` can be used to extract links from documents.

Example::

graph_vector_store = CassandraGraphVectorStore()
link_extractor = HtmlLinkExtractor()
links = link_extractor.extract_one(HtmlInput(document.page_content, "http://mysite"))
add_links(document, links)
graph_vector_store.add_documents([document])

.. seealso::

- :class:`How to use a graph vector store as a retriever <langchain_community.graph_vectorstores.base.GraphVectorStoreRetriever>`
- :class:`How to create links between documents <langchain_community.graph_vectorstores.links.Link>`
- :class:`How to link Documents on hyperlinks in HTML <langchain_community.graph_vectorstores.extractors.html_link_extractor.HtmlLinkExtractor>`
- :class:`How to link Documents on common keywords (using KeyBERT) <langchain_community.graph_vectorstores.extractors.keybert_link_extractor.KeybertLinkExtractor>`
- :class:`How to link Documents on common named entities (using GLiNER) <langchain_community.graph_vectorstores.extractors.gliner_link_extractor.GLiNERLinkExtractor>`
- `langchain-jieba: link extraction tailored for the Chinese language <https://github.com/cqzyys/langchain-jieba>`_

Get started

We chunk the State of the Union text and split it into documents::

from langchain_community.document_loaders import TextLoader
from langchain_text_splitters import CharacterTextSplitter

raw_documents = TextLoader("state_of_the_union.txt").load()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
documents = text_splitter.split_documents(raw_documents)

Links can be added to documents manually, but it's easier to use a :class:`~langchain_community.graph_vectorstores.extractors.link_extractor.LinkExtractor`. Several common link extractors are available, and you can build your own. For this guide, we'll use the :class:`~langchain_community.graph_vectorstores.extractors.keybert_link_extractor.KeybertLinkExtractor`, which uses the KeyBERT model to tag documents with keywords and uses these keywords to create links between documents::

from langchain_community.graph_vectorstores.extractors import KeybertLinkExtractor
from langchain_community.graph_vectorstores.links import add_links

extractor = KeybertLinkExtractor()

for doc in documents:
    add_links(doc, extractor.extract_one(doc))

Create the graph vector store and add documents

We'll use an Apache Cassandra or Astra DB database as an example. We create a :class:`~langchain_community.graph_vectorstores.cassandra.CassandraGraphVectorStore` from the documents and an :class:`~langchain_openai.embeddings.base.OpenAIEmbeddings` model::

import cassio
from langchain_community.graph_vectorstores import CassandraGraphVectorStore
from langchain_openai import OpenAIEmbeddings

# Initialize cassio and the Cassandra session from the environment variables
cassio.init(auto=True)

store = CassandraGraphVectorStore.from_documents(
    embedding=OpenAIEmbeddings(),
    documents=documents,
)

Similarity search

If we don't traverse the graph, a graph vector store behaves like a regular vector store, so all methods available in a vector store are also available in a graph vector store. The :meth:`~langchain_community.graph_vectorstores.base.GraphVectorStore.similarity_search` method returns documents similar to a query without considering the links between documents::

docs = store.similarity_search(
    "What did the president say about Ketanji Brown Jackson?"
)

Traversal search

The :meth:`~langchain_community.graph_vectorstores.base.GraphVectorStore.traversal_search` method returns documents similar to a query while also considering the links between documents. It first does a similarity search and then traverses the graph to find linked documents::

docs = list(
    store.traversal_search("What did the president say about Ketanji Brown Jackson?")
)

Async methods

The graph vector store has async versions of its methods, prefixed with ``a``::

# ``async for`` must run inside a coroutine (or a notebook cell that allows top-level await).
docs = [
    doc
    async for doc in store.atraversal_search(
        "What did the president say about Ketanji Brown Jackson?"
    )
]

Graph vector store retriever

The graph vector store can be converted to a retriever. It is similar to the vector store retriever, but it also supports traversal search types such as ``traversal`` and ``mmr_traversal``::

retriever = store.as_retriever(search_type="mmr_traversal")
docs = retriever.invoke("What did the president say about Ketanji Brown Jackson?")

Classes

class

CassandraGraphVectorStore

class

Link

A link to/from a tag of a given kind.

Documents in a :class:`graph vector store <langchain_community.graph_vectorstores.base.GraphVectorStore>` are connected via "links". Links form a bipartite graph between documents and tags: documents are connected to tags, and tags are connected to other documents. When documents are retrieved from a graph vector store, two documents are connected with a depth of one if both are connected to the same tag.

Links have a ``kind`` property, used to namespace different tag identifiers. For example, a link to a keyword might use kind ``kw``, while a link to a URL might use kind ``url``. This allows the same tag value to be used in different contexts without causing name collisions.
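The bipartite structure and the role of ``kind`` can be illustrated without a database. The sketch below is a toy model, not the library's implementation: it represents each link as a ``(kind, tag)`` pair, ignores direction, and treats two documents as connected at depth one when they share a pair. The document ids, the tag values, and the ``neighbors`` helper are all hypothetical.

```python
from collections import defaultdict

# Hypothetical documents, each with a set of (kind, tag) pairs.
doc_links = {
    "doc1": {("kw", "paris"), ("url", "https://example.com/a")},
    "doc2": {("kw", "paris")},
    "doc3": {("location", "paris")},  # same tag value, different kind
}


def neighbors(doc_id, doc_links):
    """Return ids of documents sharing at least one (kind, tag) pair."""
    # Index documents by the (kind, tag) pairs they are connected to.
    by_tag = defaultdict(set)
    for other_id, links in doc_links.items():
        for link in links:
            by_tag[link].add(other_id)
    # Collect every document reachable through one shared pair.
    found = set()
    for link in doc_links[doc_id]:
        found |= by_tag[link]
    found.discard(doc_id)
    return found


print(sorted(neighbors("doc1", doc_links)))  # ['doc2']
print(sorted(neighbors("doc3", doc_links)))  # []
```

Because the pair includes the kind, ``doc3`` is not connected to the other documents even though its tag value is also ``paris``.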

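Links are directed. The directionality of links controls how the graph is traversed at retrieval time. For example, given documents A and B, both connected by links to tag T, retrieval can traverse from A to B when A's link to T is outgoing and B's link to T is incoming.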

Directed links make it possible to describe relationships such as term references / definitions: term definitions are generally relevant to any documents that use the term, but the full set of documents using a term generally aren't relevant to the term's definition.
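The direction rule can be sketched as a small predicate. This is an illustration under stated assumptions, not the library's code: it assumes an edge runs from a document with an outgoing link to a document with a matching incoming link, and that a ``bidir`` link counts as both. The ``can_traverse`` helper is hypothetical.

```python
from itertools import product

DIRECTIONS = ("incoming", "outgoing", "bidir")


def can_traverse(a_to_t, b_to_t):
    """Return True if retrieval can step from A to B through shared tag T."""
    # Assumption: a bidir link behaves as both an outgoing and an incoming link.
    a_points_out = a_to_t in ("outgoing", "bidir")
    b_accepts_in = b_to_t in ("incoming", "bidir")
    return a_points_out and b_accepts_in


# Enumerate every combination of link directions for A and B.
for a_dir, b_dir in product(DIRECTIONS, repeat=2):
    verdict = "A -> B" if can_traverse(a_dir, b_dir) else "no traversal"
    print(f"A {a_dir:<8} | B {b_dir:<8} | {verdict}")
```

For the term and definition case, a definition would carry an incoming link and each document using the term an outgoing link, so retrieval moves from usages to the definition but not back.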

.. seealso::

- :mod:`How to use a graph vector store <langchain_community.graph_vectorstores>`
- :class:`How to link Documents on hyperlinks in HTML <langchain_community.graph_vectorstores.extractors.html_link_extractor.HtmlLinkExtractor>`
- :class:`How to link Documents on common keywords (using KeyBERT) <langchain_community.graph_vectorstores.extractors.keybert_link_extractor.KeybertLinkExtractor>`
- :class:`How to link Documents on common named entities (using GLiNER) <langchain_community.graph_vectorstores.extractors.gliner_link_extractor.GLiNERLinkExtractor>`

How to add links to a Document

How to create links

You can create links using the ``Link`` class's constructors :meth:`incoming`, :meth:`outgoing`, and :meth:`bidir`::

from langchain_community.graph_vectorstores.links import Link

print(Link.bidir(kind="location", tag="Paris"))

.. code-block:: output

Link(kind='location', direction='bidir', tag='Paris')

Extending documents with links

Now that we know how to create links, let's associate them with some documents. These edges will strengthen the connection between documents that share a keyword when using a graph vector store to retrieve documents.

First, we'll load some text and chunk it into smaller pieces. Then we'll add a link to each document to link them all together::

from langchain_community.document_loaders import TextLoader
from langchain_community.graph_vectorstores.links import add_links
from langchain_text_splitters import CharacterTextSplitter

loader = TextLoader("state_of_the_union.txt")

raw_documents = loader.load()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
documents = text_splitter.split_documents(raw_documents)

for doc in documents:
    add_links(doc, Link.bidir(kind="genre", tag="oratory"))

print(documents[0].metadata)

.. code-block:: output

{'source': 'state_of_the_union.txt', 'links': [Link(kind='genre', direction='bidir', tag='oratory')]}

As we can see, each document's metadata now includes a bidirectional link to the genre oratory.

The documents can then be added to a graph vector store::

from langchain_community.graph_vectorstores import CassandraGraphVectorStore

graph_vectorstore = CassandraGraphVectorStore.from_documents(
    documents=documents, embedding=...
)

class

MmrHelper

Helper for executing an MMR traversal query.

deprecated class

GraphVectorStore

A hybrid vector-and-graph store.

Document chunks support vector-similarity search as well as edges linking chunks based on structural and semantic properties.

.. versionadded:: 0.3.1

deprecated class

GraphVectorStoreRetriever

Retriever for GraphVectorStore.

A graph vector store retriever is a retriever that uses a graph vector store to retrieve documents. It is similar to a vector store retriever, except that it uses both vector similarity and graph connections to retrieve documents. It uses the search methods implemented by a graph vector store, like traversal search and MMR traversal search, to query the texts in the graph vector store.

Example::

store = CassandraGraphVectorStore(...)
retriever = store.as_retriever()
retriever.invoke("What is ...")

.. seealso::

:mod:`How to use a graph vector store <langchain_community.graph_vectorstores>`

How to use a graph vector store as a retriever

Creating a retriever from a graph vector store

You can build a retriever from a graph vector store using its :meth:`~langchain_community.graph_vectorstores.base.GraphVectorStore.as_retriever` method.

First we instantiate a graph vector store. We will use :class:`~langchain_community.graph_vectorstores.cassandra.CassandraGraphVectorStore`, a graph vector store backed by Cassandra::

from langchain_community.document_loaders import TextLoader
from langchain_community.graph_vectorstores import CassandraGraphVectorStore
from langchain_community.graph_vectorstores.extractors import (
    KeybertLinkExtractor,
    LinkExtractorTransformer,
)
from langchain_openai import OpenAIEmbeddings
from langchain_text_splitters import CharacterTextSplitter

loader = TextLoader("state_of_the_union.txt")
documents = loader.load()

text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
texts = text_splitter.split_documents(documents)

pipeline = LinkExtractorTransformer([KeybertLinkExtractor()])
pipeline.transform_documents(texts)
embeddings = OpenAIEmbeddings()
graph_vectorstore = CassandraGraphVectorStore.from_documents(texts, embeddings)

We can then instantiate a retriever::

retriever = graph_vectorstore.as_retriever()

This creates a retriever (specifically a GraphVectorStoreRetriever), which we can use in the usual way::

docs = retriever.invoke("what did the president say about ketanji brown jackson?")

Maximum marginal relevance traversal retrieval

By default, the graph vector store retriever uses similarity search, then expands the retrieved set by following a fixed number of graph edges. If the underlying graph vector store supports maximum marginal relevance traversal, you can specify that as the search type.

MMR traversal is a retrieval method that combines MMR and graph traversal. The strategy first retrieves the top ``fetch_k`` results by similarity to the question. It then iteratively expands the set of fetched documents by following ``adjacent_k`` graph edges and selects the top ``k`` results based on maximum marginal relevance using the given ``lambda_mult``::

retriever = graph_vectorstore.as_retriever(search_type="mmr_traversal")
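To make the strategy concrete, here is a minimal in-memory sketch of the steps described above. It is a toy under stated assumptions rather than the library's implementation: the ``mmr_traversal`` and ``cosine`` helpers, the two-dimensional embeddings, and the edge map are hypothetical, the score is assumed to be ``lambda_mult * relevance - (1 - lambda_mult) * redundancy``, and neighbors are taken in list order instead of being ranked.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def mmr_traversal(query, vectors, edges, k=2, fetch_k=2, adjacent_k=2, lambda_mult=0.5):
    """Toy MMR traversal: vectors maps id -> embedding, edges maps id -> adjacent ids."""
    # Step 1: seed the candidate set with the fetch_k most similar documents.
    ranked = sorted(vectors, key=lambda d: cosine(query, vectors[d]), reverse=True)
    candidates = set(ranked[:fetch_k])
    selected = []

    def score(doc):
        relevance = cosine(query, vectors[doc])
        # Redundancy is the highest similarity to anything already selected.
        redundancy = max((cosine(vectors[doc], vectors[s]) for s in selected), default=0.0)
        return lambda_mult * relevance - (1 - lambda_mult) * redundancy

    # Step 2: pick the best-scoring candidate, then expand along its edges.
    while candidates and len(selected) < k:
        best = max(sorted(candidates), key=score)
        candidates.remove(best)
        selected.append(best)
        for neighbor in edges.get(best, [])[:adjacent_k]:
            if neighbor not in selected:
                candidates.add(neighbor)
    return selected


query = [1.0, 0.0]
vectors = {
    "a": [1.0, 0.0],
    "b": [0.9, 0.1],  # near-duplicate of "a"
    "c": [0.0, 1.0],  # dissimilar and unlinked, never fetched
    "d": [0.6, 0.8],  # less similar, reachable only through an edge from "a"
}
edges = {"a": ["d"]}

print(mmr_traversal(query, vectors, edges, lambda_mult=0.25))  # ['a', 'd']
print(mmr_traversal(query, vectors, edges, lambda_mult=0.9))   # ['a', 'b']
```

A low ``lambda_mult`` favors diversity, so the linked but less similar document ``d`` beats the near-duplicate ``b``; a high value favors similarity to the query and keeps ``b``.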

Passing search parameters

We can pass parameters to the underlying graph vector store's search methods using search_kwargs.

Specifying graph traversal depth ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

For example, we can set the graph traversal depth to only return documents reachable through a given number of graph edges::

retriever = graph_vectorstore.as_retriever(search_kwargs={"depth": 3})

Specifying MMR parameters ^^^^^^^^^^^^^^^^^^^^^^^^^

When using search type mmr_traversal, several parameters of the MMR algorithm can be configured.

The ``fetch_k`` parameter determines how many documents are fetched using vector similarity, and the ``adjacent_k`` parameter determines how many documents are fetched using graph edges. The ``lambda_mult`` parameter controls how the MMR re-ranking weights similarity to the query string against diversity among the retrieved documents as fetched documents are selected for the set of ``k`` final results::

retriever = graph_vectorstore.as_retriever(
    search_type="mmr_traversal",
    search_kwargs={"fetch_k": 20, "adjacent_k": 20, "lambda_mult": 0.25},
)

Specifying top k
^^^^^^^^^^^^^^^^

We can also limit the number of documents k returned by the retriever.

Note that if depth is greater than zero, the retriever may return more documents than is specified by k, since both the original k documents retrieved using vector similarity and any documents connected via graph edges will be returned::

retriever = graph_vectorstore.as_retriever(search_kwargs={"k": 1})
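The interaction between ``k`` and ``depth`` can be seen in a small stand-alone sketch. It is a simplified model, not the library's implementation: the ``traversal_search`` helper here is hypothetical, similarity scores are given directly instead of being computed from embeddings, and edges are a plain adjacency map.

```python
def traversal_search(scores, edges, k=1, depth=1):
    """Toy traversal: take the k best-scoring docs, then follow edges up to `depth` hops."""
    # The k seeds come from similarity alone.
    seeds = sorted(scores, key=scores.get, reverse=True)[:k]
    visited = list(seeds)
    frontier = list(seeds)
    # Each hop adds every unseen neighbor of the current frontier.
    for _ in range(depth):
        next_frontier = []
        for doc in frontier:
            for neighbor in edges.get(doc, []):
                if neighbor not in visited:
                    visited.append(neighbor)
                    next_frontier.append(neighbor)
        frontier = next_frontier
    return visited


scores = {"a": 0.9, "b": 0.4, "c": 0.2, "d": 0.1}  # hypothetical query similarities
edges = {"a": ["b"], "b": ["c"], "c": ["d"]}

print(traversal_search(scores, edges, k=1, depth=0))  # ['a']
print(traversal_search(scores, edges, k=1, depth=2))  # ['a', 'b', 'c']
```

With ``k=1`` and ``depth=2``, three documents come back: the single seed plus the documents one and two hops away.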

Similarity score threshold retrieval ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

For example, we can set a similarity score threshold and only return documents with a score above that threshold::

retriever = graph_vectorstore.as_retriever(search_kwargs={"score_threshold": 0.5})

deprecated class

Node

Node in the GraphVectorStore.

Edges exist from nodes with an outgoing link to nodes with a matching incoming link.

For instance, two nodes ``a`` and ``b`` connected over a hyperlink ``https://some-url`` would look like:

.. code-block:: python

[
    Node(
        id="a",
        text="some text a",
        links=[
            Link(kind="hyperlink", tag="https://some-url", direction="incoming")
        ],
    ),
    Node(
        id="b",
        text="some text b",
        links=[
            Link(kind="hyperlink", tag="https://some-url", direction="outgoing")
        ],
    )
]