GraphRAG DataStore
GraphRAG is a ChatModel that answers semantic queries using a Knowledge Graph created by an LLM. As in Vector RAG, we augment the Chat Model's training data with relevant information that we collect from documents.
In Vector RAG, one uses an "Embedding" model that converts both the query and the potentially relevant documents into vectors. These vectors are then compared, and the most similar documents are supplied to the Chat Model as context for the query.
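The comparison step in Vector RAG can be sketched with cosine similarity. This is a toy illustration only: the three-dimensional vectors stand in for real embeddings, and the names cosine_similarity and top_k are hypothetical helpers, not part of any library.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k=2):
    """Return the ids of the k documents most similar to the query."""
    ranked = sorted(
        doc_vecs,
        key=lambda doc_id: cosine_similarity(query_vec, doc_vecs[doc_id]),
        reverse=True,
    )
    return ranked[:k]


# Toy "embeddings"; a real embedding model would produce these vectors.
docs = {
    "doc_a": [1.0, 0.0, 0.0],
    "doc_b": [0.6, 0.8, 0.0],
    "doc_c": [0.0, 0.0, 1.0],
}
query = [1.0, 0.05, 0.0]

print(top_k(query, docs))  # ['doc_a', 'doc_b']
```

The documents returned by top_k are what would be passed to the Chat Model as context.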
In Graph RAG, one uses an "Entity-Extraction" model that converts text into Entities and their relationships, that is, a Knowledge Graph. Comparison is done by Graph traversal, finding entities connected to those named in the query. These are then supplied to the Chat Model as context. The main difference is that GraphRAG's output is typically in a structured format.
GraphRAG excels at finding links and common entities, even if these come from different articles. It can combine information from distinct sources, providing richer context than Vector RAG in certain cases.
GraphRAG is particularly suited to so-called multi-hop questions.
In Graph RAG, an Entity-Extraction model interprets the text documents it is given and converts both the query and the potentially relevant documents into graphs. These are composed of nodes that are entities (nouns) and edges that are relationships. The graph can then find connections between entities and hence answer questions that require more than one connection.
In MongoDB, Knowledge Graphs are stored in a single Collection.
Each MongoDB Document represents a single entity (node),
and its relationships (edges) are defined in a nested field named
"relationships". The schema, and an example, are described in the
langchain_mongodb.graphrag.prompts.entity_context prompts module.
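As a rough illustration of the one-document-per-entity layout, the sketch below models an entity as a Python dict. Only the _id, type, and "relationships" fields are taken from this page; the inner layout of "relationships" (parallel target_ids and types lists), the entity "Team A", and the helper edges are assumptions made for the example. The authoritative schema is the one in the prompts module.

```python
# Hypothetical entity document. The inner shape of "relationships"
# is an assumption for illustration, not the confirmed schema.
entity = {
    "_id": "John Doe",       # the entity name serves as the document _id
    "type": "Person",        # entity type
    "relationships": {
        "target_ids": ["Team A"],   # _id values of connected entities
        "types": ["member of"],     # relationship label for each target
    },
}


def edges(doc):
    """List (target, relationship_type) pairs stored on one entity document."""
    rel = doc.get("relationships", {})
    return list(zip(rel.get("target_ids", []), rel.get("types", [])))


print(edges(entity))  # [('Team A', 'member of')]
```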
When a query is made, the model extracts the entities in it, then traverses the graph to find connections. The closest entities and their relationships form the context that is included with the query to the Chat Model.
Consider this example query: "Does John Doe work at MongoDB?" GraphRAG can answer this question even if the statements that connect John Doe to MongoDB come from completely different sources.
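The multi-hop idea can be sketched in plain Python with a breadth-first traversal over an in-memory graph. This is a minimal illustration and not the library's implementation: the intermediate entity "Team A", the relationship labels, the function name connected_entities, and the inner layout of "relationships" are all hypothetical.

```python
from collections import deque

# Hypothetical facts, imagined as coming from two different sources:
#   source 1: "John Doe is a member of Team A"
#   source 2: "Team A is part of MongoDB"
graph = {
    "John Doe": {
        "type": "Person",
        "relationships": {"target_ids": ["Team A"], "types": ["member of"]},
    },
    "Team A": {
        "type": "Team",
        "relationships": {"target_ids": ["MongoDB"], "types": ["part of"]},
    },
    "MongoDB": {
        "type": "Organization",
        "relationships": {"target_ids": [], "types": []},
    },
}


def connected_entities(graph, start, max_depth=3):
    """Breadth-first traversal; returns {entity name: hops from start}."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        name = queue.popleft()
        depth = seen[name]
        if depth >= max_depth:
            continue  # do not expand beyond the depth limit
        targets = graph.get(name, {}).get("relationships", {}).get("target_ids", [])
        for target in targets:
            if target not in seen:
                seen[target] = depth + 1
                queue.append(target)
    return seen


reachable = connected_entities(graph, "John Doe")
print("MongoDB" in reachable)  # True
print(reachable["MongoDB"])    # 2
```

Neither source mentions both John Doe and MongoDB, yet the traversal links them in two hops. The connected entities and their relationships are what would be handed to the Chat Model as context.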
MongoDBGraphStore(
self,
*,
connection_string: Optional[str] = None,
database_name: Optional[str] = None,
collection_name: Optional[str] = None,
collection: Optional[Collection] = None,
entity_extraction_model: BaseChatModel,
entity_prompt: Optional[ChatPromptTemplate] = None,
query_prompt: Optional[ChatPromptTemplate] = None,
max_depth: int = 3,
allowed_entity_types: Optional[List[str]] = None,
allowed_relationship_types: Optional[List[str]] = None,
entity_examples: Optional[str] = None,
entity_name_examples: str = '',
validate: bool = False,
validation_action: str = 'warn'
)

| Name | Type | Default | Description |
|---|---|---|---|
| connection_string | Optional[str] | None | A valid MongoDB connection URI. |
| database_name | Optional[str] | None | The name of the database to connect to. |
| collection_name | Optional[str] | None | The name of the collection to connect to. |
| collection | Optional[Collection] | None | A Collection that will represent a Knowledge Graph. One may pass a Collection in lieu of connection_string, database_name, and collection_name. |
| entity_extraction_model | BaseChatModel | required | LLM for converting documents into a Graph of Entities and Relationships. |
| entity_prompt | Optional[ChatPromptTemplate] | None | Prompt to fill the graph store with entities following the schema. Defaults to .prompts.ENTITY_EXTRACTION_INSTRUCTIONS. |
| query_prompt | Optional[ChatPromptTemplate] | None | Prompt that extracts entities and relationships as search starting points. Defaults to .prompts.NAME_EXTRACTION_INSTRUCTIONS. |
| max_depth | int | 3 | Maximum recursion depth in graph traversal. |
| allowed_entity_types | Optional[List[str]] | None | If provided, constrains search to these types. |
| allowed_relationship_types | Optional[List[str]] | None | If provided, constrains search to these types. |
| entity_examples | Optional[str] | None | A string containing any number of additional examples to provide as context for entity extraction. |
| entity_name_examples | str | '' | A string appended to prompts.NAME_EXTRACTION_INSTRUCTIONS containing examples. |
| validate | bool | False | If True, entity schema will be validated on every insert or update. |
| validation_action | str | 'warn' | One of {"warn", "error"}. |
JSON Schema Object of Entities. Will be applied if validate is True.
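To give a feel for how a JSON Schema and a validation action fit together in MongoDB, the sketch below builds a collMod command document of the kind accepted by MongoDB's schema validation feature. It is an assumption that the store applies its schema in this way. The schema shown is a simplified, hypothetical stand-in for the real entity schema, and build_validation_command is an illustrative helper name.

```python
def build_validation_command(collection_name, schema, validation_action="warn"):
    """Build a collMod command that attaches a $jsonSchema validator."""
    if validation_action not in {"warn", "error"}:
        raise ValueError('validation_action must be one of {"warn", "error"}')
    return {
        "collMod": collection_name,            # command name must be the first key
        "validator": {"$jsonSchema": schema},
        # "warn" logs violations but accepts the write; "error" rejects it.
        "validationAction": validation_action,
    }


# Simplified, hypothetical schema: every entity needs a string _id and type.
schema = {
    "bsonType": "object",
    "required": ["_id", "type"],
    "properties": {
        "_id": {"bsonType": "string"},
        "type": {"bsonType": "string"},
        "relationships": {"bsonType": "object"},
    },
}

command = build_validation_command("knowledge_graph", schema, "warn")
print(command["validationAction"])  # warn
```

With PyMongo, a command document like this could be sent through Database.command. The collection name "knowledge_graph" is a placeholder.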
Construct a MongoDB Knowledge Graph for RAG
from a MongoDB connection URI.
Close the resources used by the MongoDBGraphStore.
Extract entities and upsert into the collection.
Each entity is represented by a single MongoDB Document. Existing entities identified in documents will be updated.
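The insert-or-update behavior can be pictured with an in-memory sketch. The merge policy below (append relationship pairs that are not yet stored) is an assumption chosen for illustration; the library may merge entities differently. upsert_entity and the parallel-list layout of "relationships" are likewise hypothetical.

```python
import copy


def upsert_entity(store, entity):
    """Insert a new entity, or merge relationships into an existing one."""
    name = entity["_id"]
    if name not in store:
        store[name] = copy.deepcopy(entity)
        return "inserted"
    current = store[name].setdefault("relationships", {})
    targets = current.setdefault("target_ids", [])
    types = current.setdefault("types", [])
    known = set(zip(targets, types))
    incoming = entity.get("relationships", {})
    pairs = zip(incoming.get("target_ids", []), incoming.get("types", []))
    for target, rel_type in pairs:
        if (target, rel_type) not in known:  # skip edges already stored
            targets.append(target)
            types.append(rel_type)
            known.add((target, rel_type))
    return "updated"


store = {}
first = {"_id": "John Doe", "type": "Person",
         "relationships": {"target_ids": ["Team A"], "types": ["member of"]}}
second = {"_id": "John Doe", "type": "Person",
          "relationships": {"target_ids": ["Team A", "Jane Smith"],
                            "types": ["member of", "works with"]}}

print(upsert_entity(store, first))   # inserted
print(upsert_entity(store, second))  # updated
print(store["John Doe"]["relationships"]["target_ids"])  # ['Team A', 'Jane Smith']
```

The store still holds a single document for "John Doe" after both calls, mirroring the one-document-per-entity rule.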
Extract entities and their relations using chosen prompt and LLM.
Extract entity names from a document for similarity_search.
The second entity extraction has a different form and purpose than the first as we are looking for starting points of our search and paths to follow. We aim to find source nodes, but no target nodes or edges.
Utility to get Entity dicts from the Knowledge Graph / Collection. Takes name, the _id string to look for, and returns a list of Entity dicts if any match name.
Traverse Graph along relationship edges to find connected entities.
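One way to traverse relationship edges inside MongoDB is the $graphLookup aggregation stage, which follows references recursively up to a maxDepth. The sketch below only builds such a pipeline. Whether the store uses exactly this pipeline is an assumption, as are the field path relationships.target_ids, the output field names, and the helper build_traversal_pipeline.

```python
def build_traversal_pipeline(collection_name, start_names, max_depth=3):
    """Build an aggregation pipeline that follows edges from the start entities."""
    return [
        # Select the starting entities by name (_id).
        {"$match": {"_id": {"$in": list(start_names)}}},
        # Recursively follow target ids to other entity documents.
        {
            "$graphLookup": {
                "from": collection_name,
                "startWith": "$relationships.target_ids",
                "connectFromField": "relationships.target_ids",
                "connectToField": "_id",
                "as": "connections",
                "maxDepth": max_depth,
                "depthField": "depth",  # records how many hops away each match is
            }
        },
    ]


pipeline = build_traversal_pipeline("knowledge_graph", ["John Doe"], max_depth=2)
print(pipeline[1]["$graphLookup"]["maxDepth"])  # 2
```

With PyMongo, a pipeline like this would be passed to Collection.aggregate. Note that in $graphLookup a maxDepth of 0 already performs one lookup from the starting documents.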
Retrieve list of connected Entities found via traversal of KnowledgeGraph.
Responds to a query given information found in Knowledge Graph.
Utility that converts the Entity Collection to a NetworkX DiGraph (https://networkx.org/documentation/stable/index.html).
NOTE: Requires optional-dependency "viz", i.e. pip install "langchain-mongodb[viz]".
Draws a Knowledge Graph as Holoviews/Bokeh interactive plot.
We first convert the entity collection to a NetworkX Graph, and then convert it to a Holoviews Graph via their API.
The default layout chosen is the spring_layout.
This maximizes the distance between nodes. As our entities have a type field,
however, another good layout choice might be
layout=nx.multipartite_layout, nx_opts["subset_key"] = "type"
as multipartite layout positions nodes in straight lines by subset key.
NOTE: Requires optional-dependency "viz", i.e. pip install "langchain-mongodb[viz]".
You can save the view, as with any HoloViews object, using .save.
The output type is inferred from the filename's suffix
(e.g., hv.save(graph, "graph.html")). Alternatively, click the download widget
on the Bokeh plot in a Jupyter notebook.