Module (since v0.3)

gliner_link_extractor

Classes

Interface for extracting links (incoming, outgoing, bidirectional).

A link to/from a tag of a given kind.

Documents in a :class:`graph vector store <langchain_community.graph_vectorstores.base.GraphVectorStore>` are connected via "links". Links form a bipartite graph between documents and tags: documents are connected to tags, and tags are connected to other documents. When documents are retrieved from a graph vector store, two documents are connected at a depth of one if both are linked to the same tag.

Links have a ``kind`` property, used to namespace different tag identifiers. For example, a link to a keyword might use kind ``kw``, while a link to a URL might use kind ``url``. This allows the same tag value to be used in different contexts without causing name collisions.
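
The tag model described above can be sketched in plain Python. This is only an illustration, not the library's implementation: the ``Tag`` tuple, the ``index_by_tag`` and ``neighbors`` helpers, and the sample document IDs are all hypothetical, and link direction is ignored here.

```python
from collections import defaultdict
from typing import NamedTuple


class Tag(NamedTuple):
    """Hypothetical stand-in for the (kind, tag) pair carried by a link."""

    kind: str
    tag: str


def index_by_tag(doc_tags: dict[str, set[Tag]]) -> dict[Tag, set[str]]:
    """Invert document -> tags into tag -> documents."""
    index: dict[Tag, set[str]] = defaultdict(set)
    for doc_id, tags in doc_tags.items():
        for tag in tags:
            index[tag].add(doc_id)
    return index


def neighbors(doc_id: str, doc_tags: dict[str, set[Tag]]) -> set[str]:
    """Documents at depth one: those sharing a (kind, tag) pair with doc_id."""
    index = index_by_tag(doc_tags)
    found: set[str] = set()
    for tag in doc_tags[doc_id]:
        found |= index[tag]
    found.discard(doc_id)  # a document is not its own neighbor
    return found


doc_tags = {
    "a": {Tag("kw", "python")},
    "b": {Tag("kw", "python"), Tag("url", "https://example.com")},
    "c": {Tag("topic", "python")},  # same value, different kind: no collision
}

print(sorted(neighbors("a", doc_tags)))  # ['b']
```

Because the lookup key is the whole ``(kind, tag)`` pair, document ``c`` is not treated as a neighbor of ``a`` even though both carry the value ``python``.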

Links are directed. The directionality of links controls how the graph is traversed at retrieval time. For example, given documents A and B that are both linked to tag T, whether retrieval can traverse from A to B depends on the direction of each document's link to T.

Directed links make it possible to describe relationships such as term references / definitions: term definitions are generally relevant to any documents that use the term, but the full set of documents using a term generally aren't relevant to the term's definition.
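
One way to read the direction rule is as a predicate on the two links that meet at tag T. The sketch below assumes that traversal from A to B requires A's link to point out to T (outgoing or bidirectional) and B's link to point in from T (incoming or bidirectional). That rule, the ``can_traverse`` helper, and the ``"in"`` / ``"out"`` direction strings are assumptions to check against the library, not its actual code.

```python
from typing import Literal

# Assumed direction values; only "bidir" appears in the examples on this page.
Direction = Literal["in", "out", "bidir"]


def can_traverse(a_to_t: Direction, b_to_t: Direction) -> bool:
    """Assumed rule: A must link out to T, and B must accept links in from T."""
    return a_to_t in ("out", "bidir") and b_to_t in ("in", "bidir")


# A uses a term (outgoing) and B defines it (incoming):
# A reaches B, but B does not reach A.
print(can_traverse("out", "in"))  # True
print(can_traverse("in", "out"))  # False
print(can_traverse("bidir", "bidir"))  # True
```

Under this reading, a document that uses a term can reach the term's definition, while the definition does not pull in every document that uses the term.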

.. seealso::

    - :mod:`How to use a graph vector store <langchain_community.graph_vectorstores>`
    - :class:`How to link Documents on hyperlinks in HTML <langchain_community.graph_vectorstores.extractors.html_link_extractor.HtmlLinkExtractor>`
    - :class:`How to link Documents on common keywords (using KeyBERT) <langchain_community.graph_vectorstores.extractors.keybert_link_extractor.KeybertLinkExtractor>`
    - :class:`How to link Documents on common named entities (using GliNER) <langchain_community.graph_vectorstores.extractors.gliner_link_extractor.GLiNERLinkExtractor>`

How to add links to a Document

How to create links

You can create links using the Link class's constructors :meth:`incoming`, :meth:`outgoing`, and :meth:`bidir`::

from langchain_community.graph_vectorstores.links import Link

print(Link.bidir(kind="location", tag="Paris"))

.. code-block:: output

Link(kind='location', direction='bidir', tag='Paris')

Extending documents with links

Now that we know how to create links, let's associate them with some documents. These edges will strengthen the connection between documents that share a keyword when using a graph vector store to retrieve documents.

First, we'll load some text and chunk it into smaller pieces. Then we'll add a link to each document to link them all together::

from langchain_community.document_loaders import TextLoader
from langchain_community.graph_vectorstores.links import Link, add_links
from langchain_text_splitters import CharacterTextSplitter

loader = TextLoader("state_of_the_union.txt")

raw_documents = loader.load()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
documents = text_splitter.split_documents(raw_documents)

for doc in documents:
    add_links(doc, Link.bidir(kind="genre", tag="oratory"))

print(documents[0].metadata)

.. code-block:: output

{'source': 'state_of_the_union.txt', 'links': [Link(kind='genre', direction='bidir', tag='oratory')]}

As we can see, each document's metadata now includes a bidirectional link to the genre oratory.
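
To make the metadata change concrete, here is a rough stand-in for what attaching links does, judging only from the printed output above: links accumulate in a list under the ``links`` metadata key. ``LinkSketch`` and ``add_links_sketch`` are hypothetical names for illustration, and the real ``add_links`` may differ in details.

```python
from dataclasses import dataclass
from typing import Iterable, Union


@dataclass(frozen=True)
class LinkSketch:
    """Hypothetical stand-in mirroring the fields shown in the printed Link."""

    kind: str
    direction: str
    tag: str


def add_links_sketch(
    metadata: dict, *links: Union[LinkSketch, Iterable[LinkSketch]]
) -> None:
    """Append single links or iterables of links to metadata["links"]."""
    stored = metadata.setdefault("links", [])
    for link in links:
        if isinstance(link, LinkSketch):
            stored.append(link)
        else:
            stored.extend(link)


metadata = {"source": "state_of_the_union.txt"}
add_links_sketch(metadata, LinkSketch("genre", "bidir", "oratory"))
add_links_sketch(metadata, [LinkSketch("kw", "bidir", "economy")])

print(len(metadata["links"]))  # 2
```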

The documents can then be added to a graph vector store::

from langchain_community.graph_vectorstores import CassandraGraphVectorStore

graph_vectorstore = CassandraGraphVectorStore.from_documents(
    documents=documents, embedding=...
)

class

GLiNERLinkExtractor

Link documents with common named entities using GLiNER_.

GLiNER_ is a Named Entity Recognition (NER) model capable of identifying any entity type using a bidirectional transformer encoder (BERT-like).

The GLiNERLinkExtractor uses GLiNER to create links between documents that have named entities in common.

Example::

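from langchain_community.graph_vectorstores.extractors import GLiNERLinkExtractor

# The labels are the entity types GLiNER should look for.
extractor = GLiNERLinkExtractor(
    labels=["Person", "Award", "Date", "Competitions", "Teams"]
)
results = extractor.extract_one("some long text...")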

.. _GLiNER: https://github.com/urchade/GLiNER

.. seealso::

    - :mod:`How to use a graph vector store <langchain_community.graph_vectorstores>`
    - :class:`How to create links between documents <langchain_community.graph_vectorstores.links.Link>`

How to link Documents on common named entities

Preliminaries

Install the gliner package:

.. code-block:: bash

pip install -q langchain_community gliner

Usage

We load the state_of_the_union.txt file, chunk it, then for each chunk we extract named entity links and add them to the chunk.

Using extract_one()
^^^^^^^^^^^^^^^^^^^

We can use :meth:`extract_one` on a document to get its links, then add them to the document metadata with :meth:`~langchain_community.graph_vectorstores.links.add_links`::

from langchain_community.document_loaders import TextLoader
from langchain_community.graph_vectorstores import CassandraGraphVectorStore
from langchain_community.graph_vectorstores.extractors import GLiNERLinkExtractor
from langchain_community.graph_vectorstores.links import add_links
from langchain_text_splitters import CharacterTextSplitter

loader = TextLoader("state_of_the_union.txt")
raw_documents = loader.load()

text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
documents = text_splitter.split_documents(raw_documents)

ner_extractor = GLiNERLinkExtractor(["Person", "Topic"])
for document in documents:
    links = ner_extractor.extract_one(document)
    add_links(document, links)

print(documents[0].metadata)

.. code-block:: output

{'source': 'state_of_the_union.txt', 'links': [Link(kind='entity:Person', direction='bidir', tag='President Zelenskyy'), Link(kind='entity:Person', direction='bidir', tag='Vladimir Putin')]}
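
The output above suggests how recognized entities map to links: the link ``kind`` is ``entity:`` followed by the entity label, the ``tag`` is the entity text, and the direction is ``bidir``. The sketch below reproduces that mapping on hand-written predictions. The ``LinkSketch`` class, the ``entities_to_links`` helper, and the ``text`` / ``label`` dictionary keys are assumptions for illustration, not the extractor's actual code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinkSketch:
    """Hypothetical stand-in mirroring the fields shown in the printed Link."""

    kind: str
    direction: str
    tag: str


def entities_to_links(entities: list[dict], kind: str = "entity") -> set[LinkSketch]:
    """Turn NER predictions into bidirectional links, dropping duplicates."""
    return {
        LinkSketch(kind=f"{kind}:{e['label']}", direction="bidir", tag=e["text"])
        for e in entities
    }


# Hand-written predictions; a real model would produce these from the text.
predictions = [
    {"text": "President Zelenskyy", "label": "Person"},
    {"text": "Vladimir Putin", "label": "Person"},
    {"text": "Vladimir Putin", "label": "Person"},  # repeated mention
]

links = entities_to_links(predictions)
print(len(links))  # 2
```

Collecting the links in a set means an entity mentioned several times in one chunk yields a single link.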

Using LinkExtractorTransformer ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Using the :class:`~langchain_community.graph_vectorstores.extractors.link_extractor_transformer.LinkExtractorTransformer`, we can simplify the link extraction::

from langchain_community.document_loaders import TextLoader
from langchain_community.graph_vectorstores.extractors import (
    GLiNERLinkExtractor,
    LinkExtractorTransformer,
)
from langchain_text_splitters import CharacterTextSplitter

loader = TextLoader("state_of_the_union.txt")
raw_documents = loader.load()

text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
documents = text_splitter.split_documents(raw_documents)

ner_extractor = GLiNERLinkExtractor(["Person", "Topic"])
transformer = LinkExtractorTransformer([ner_extractor])
documents = transformer.transform_documents(documents)

print(documents[0].metadata)

.. code-block:: output

{'source': 'state_of_the_union.txt', 'links': [Link(kind='entity:Person', direction='bidir', tag='President Zelenskyy'), Link(kind='entity:Person', direction='bidir', tag='Vladimir Putin')]}

The documents with named entity links can then be added to a :class:`~langchain_community.graph_vectorstores.base.GraphVectorStore`::

from langchain_community.graph_vectorstores import CassandraGraphVectorStore

store = CassandraGraphVectorStore.from_documents(documents=documents, embedding=...)

Type Aliases

typeAlias

GLiNERInput: Union[str, Document]
