GraphRAG DataStore
GraphRAG is a ChatModel that answers semantic queries
using a Knowledge Graph that an LLM builds from documents.
As in Vector RAG, we supplement the Chat Model's training data
with relevant information collected from documents, which is passed
to it as context alongside the query.
In Vector RAG, one uses an "Embedding" model that converts both
the query and the potentially relevant documents into vectors.
These vectors are compared, and the most similar documents are supplied
to the Chat Model as context for the query.
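To make the comparison step concrete, here is a minimal sketch that ranks documents by cosine similarity, one common way to compare embeddings. The vectors are made-up three-dimensional stand-ins for real Embedding-model output, and the helpers `cosine_similarity` and `top_k` are hypothetical, not part of langchain_mongodb.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], doc_vecs: dict[str, list[float]], k: int = 2) -> list[str]:
    """Return the ids of the k documents most similar to the query vector."""
    ranked = sorted(
        doc_vecs.items(),
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [doc_id for doc_id, _ in ranked[:k]]


# Toy "embeddings" invented for illustration; real models use hundreds of dimensions.
docs = {
    "solar_report": [0.9, 0.1, 0.0],
    "hr_policy": [0.0, 0.2, 0.9],
    "wind_report": [0.7, 0.3, 0.1],
}
query = [0.8, 0.2, 0.0]
print(top_k(query, docs))  # → ['solar_report', 'wind_report']
```

The top-ranked documents would then be inserted into the prompt as context.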
In Graph RAG, one uses an "Entity-Extraction" model that converts
text into a Knowledge Graph of entities and their relationships.
Comparison is done by graph traversal, finding entities connected
to those in the query. These are then supplied to the Chat Model as context.
The main difference is that the context GraphRAG retrieves is structured:
entities and their relationships rather than passages of text.
GraphRAG excels at finding links and common entities,
even if these come from different articles. By combining information from
distinct sources, it can provide richer context than Vector RAG in certain cases.
Here are a few examples of so-called multi-hop questions where GraphRAG excels:
- What is the connection between ACME Corporation and GreenTech Ltd.?
- Who is leading the SolarGrid Initiative, and what is their role?
- Which organizations are participating in the SolarGrid Initiative?
- What is John Doe’s role in ACME’s renewable energy projects?
- Which company is headquartered in San Francisco and involved in the SolarGrid Initiative?
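Answering the last question amounts to finding an entity that satisfies two relationships at once. The sketch below assumes the extracted graph is available as (subject, relation, object) triples; the facts themselves, the relation names, and the `common_subjects` helper are all invented for illustration.

```python
from collections import defaultdict

# Hypothetical extracted facts as (subject, relation, object) triples.
triples = [
    ("ACME Corporation", "headquartered_in", "San Francisco"),
    ("GreenTech Ltd.", "headquartered_in", "London"),
    ("ACME Corporation", "participates_in", "SolarGrid Initiative"),
    ("GreenTech Ltd.", "participates_in", "SolarGrid Initiative"),
    ("John Doe", "leads", "SolarGrid Initiative"),
]

# Index subjects by (relation, object) to answer "who has relation R to X?".
subjects_by_edge = defaultdict(set)
for subject, relation, obj in triples:
    subjects_by_edge[(relation, obj)].add(subject)


def common_subjects(*edges: tuple[str, str]) -> set[str]:
    """Entities that satisfy every given (relation, object) constraint."""
    sets = [subjects_by_edge[edge] for edge in edges]
    return set.intersection(*sets) if sets else set()


answer = common_subjects(
    ("headquartered_in", "San Francisco"),
    ("participates_in", "SolarGrid Initiative"),
)
print(answer)  # → {'ACME Corporation'}
```

The same index also hints at the first question: both companies share the "participates_in SolarGrid Initiative" edge.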
The Entity-Extraction model interprets the text documents it is given,
as well as the query, and extracts graphs from them. These are
composed of nodes, which are entities (nouns), and edges, which are relationships.
Because the graph links entities across documents, traversal can answer
questions that require following more than one connection.
In MongoDB, Knowledge Graphs are stored in a single Collection.
Each MongoDB Document represents a single entity (node),
and its relationships (edges) are defined in a nested field named
"relationships". The schema, and an example, are described in the
:data:`~langchain_mongodb.graphrag.prompts.entity_context` prompts module.
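The sketch below shows what one entity document might look like and how MongoDB's `$graphLookup` aggregation stage can follow the nested relationships. The field layout inside "relationships" (`target_ids`, `types`), the collection name, the connection URI, and the `graph_lookup_pipeline` helper are assumptions for illustration; this is not necessarily the query langchain_mongodb itself issues.

```python
# Hypothetical entity document: one document per entity, edges nested under
# "relationships". Check entity_context for the schema actually used.
john_doe = {
    "_id": "John Doe",
    "type": "Person",
    "relationships": {
        "target_ids": ["Jane Smith"],
        "types": ["works_with"],
    },
}


def graph_lookup_pipeline(collection_name: str, start_id: str, max_depth: int = 1) -> list[dict]:
    """Aggregation pipeline that walks outgoing relationships from start_id.

    $graphLookup first matches documents whose _id appears in the start
    document's relationships.target_ids, then keeps following
    relationships.target_ids from each match. maxDepth=0 stops after the
    first hop, so max_depth=1 reaches entities up to two hops away.
    """
    return [
        {"$match": {"_id": start_id}},
        {
            "$graphLookup": {
                "from": collection_name,
                "startWith": "$relationships.target_ids",
                "connectFromField": "relationships.target_ids",
                "connectToField": "_id",
                "as": "connections",
                "maxDepth": max_depth,
                "depthField": "depth",  # hop count recorded on each match
            }
        },
    ]


pipeline = graph_lookup_pipeline("entities", john_doe["_id"])

# Running it needs pymongo and a reachable deployment (placeholder names):
# from pymongo import MongoClient
# coll = MongoClient("mongodb://localhost:27017")["demo"]["entities"]
# coll.insert_one(john_doe)
# for doc in coll.aggregate(pipeline):
#     print(doc["connections"])
```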
When a query is made, the model extracts the entities in it,
then traverses the graph to find connections.
The closest entities and their relationships form the context
that is included with the query to the Chat Model.
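This query flow can be sketched as a breadth-first traversal with a depth limit. Here simple name matching stands in for the LLM extraction step, and the graph, relation names, `max_depth` parameter, and helper functions are hypothetical; the sketch only illustrates how nearby entities and relationships become context text.

```python
from collections import deque

# Hypothetical graph: entity -> list of (relation, target) edges.
GRAPH = {
    "SolarGrid Initiative": [("led_by", "John Doe")],
    "John Doe": [("works_at", "ACME Corporation")],
    "ACME Corporation": [("headquartered_in", "San Francisco")],
    "San Francisco": [],
}


def extract_entities(query: str) -> list[str]:
    """Stand-in for LLM entity extraction: known names that appear in the query."""
    return [name for name in GRAPH if name in query]


def traverse(start: list[str], max_depth: int) -> list[tuple[str, str, str]]:
    """Breadth-first search from the query entities, at most max_depth hops.

    Returns every (source, relation, target) edge followed along the way.
    """
    seen = set(start)
    edges = []
    frontier = deque((name, 0) for name in start)
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_depth:
            continue  # depth limit reached; do not expand this node
        for relation, target in GRAPH.get(node, []):
            edges.append((node, relation, target))
            if target not in seen:
                seen.add(target)
                frontier.append((target, depth + 1))
    return edges


def build_context(query: str, max_depth: int = 2) -> str:
    """Format nearby relationships as plain-text context for the Chat Model."""
    edges = traverse(extract_entities(query), max_depth)
    return "\n".join(f"{src} --{rel}--> {dst}" for src, rel, dst in edges)


print(build_context("Who is leading the SolarGrid Initiative, and what is their role?"))
```

Raising `max_depth` pulls in more distant entities at the cost of a longer, noisier context.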
Consider this example query: "Does John Doe work at MongoDB?"
GraphRAG can answer this question even if the following two statements come
from completely different sources:
- "Jane Smith works with John Doe."
- "Jane Smith works at MongoDB."
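The following sketch merges the two statements, each tagged with a hypothetical source label, into one graph and searches for the shortest chain of facts linking John Doe to MongoDB. The source labels and the `find_path` helper are assumptions for illustration.

```python
from collections import deque

# The two statements as (subject, relation, object, source) facts.
facts = [
    ("Jane Smith", "works_with", "John Doe", "source_a"),
    ("Jane Smith", "works_at", "MongoDB", "source_b"),
]

# Merge both sources into one undirected adjacency list; the shared entity
# "Jane Smith" is what links them. Each edge keeps its original fact.
graph: dict[str, list[tuple[str, tuple[str, str, str, str]]]] = {}
for fact in facts:
    subj, _, obj, _ = fact
    graph.setdefault(subj, []).append((obj, fact))
    graph.setdefault(obj, []).append((subj, fact))


def find_path(start: str, goal: str):
    """Shortest chain of facts from start to goal (breadth-first), or None."""
    queue = deque([(start, [])])
    visited = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for neighbor, fact in graph.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append((neighbor, path + [fact]))
    return None


for fact in find_path("John Doe", "MongoDB"):
    print(fact)
```

The chain is evidence rather than proof: working with someone at MongoDB does not by itself mean working at MongoDB, so the Chat Model receives both facts and reasons over them.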