BaseRetriever(
self,
*args: Any

Abstract base class for a document retrieval system.
A retrieval system is defined as something that can take string queries and return the most 'relevant' documents from some source.
Usage:
A retriever follows the standard Runnable interface, and should be used via the
standard Runnable methods of invoke, ainvoke, batch, abatch.
Implementation:
When implementing a custom retriever, the class should implement the
_get_relevant_documents method to define the logic for retrieving documents.
Optionally, an async-native implementation can be provided by overriding the
_aget_relevant_documents method.

from langchain_core.documents import Document
from langchain_core.retrievers import BaseRetriever

class SimpleRetriever(BaseRetriever):
    docs: list[Document]
    k: int = 5

    def _get_relevant_documents(self, query: str) -> list[Document]:
        """Return the first k documents from the list of documents."""
        return self.docs[: self.k]

    async def _aget_relevant_documents(self, query: str) -> list[Document]:
        """(Optional) async native implementation."""
        return self.docs[: self.k]

from typing import Any

from pydantic import BaseModel
from sklearn.metrics.pairwise import cosine_similarity

class TFIDFRetriever(BaseRetriever, BaseModel):
    vectorizer: Any
    docs: list[Document]
    tfidf_array: Any
    k: int = 4

    class Config:
        arbitrary_types_allowed = True

    def _get_relevant_documents(self, query: str) -> list[Document]:
        # Vectorize the query: shape (1, n_features).
        query_vec = self.vectorizer.transform([query])
        # Cosine similarity of every document with the query:
        # shape (n_docs, 1), flattened to (n_docs,).
        results = cosine_similarity(self.tfidf_array, query_vec).reshape((-1,))
        # Indices of the k highest scores, best match first.
        return [self.docs[i] for i in results.argsort()[-self.k :][::-1]]

The type of input this Runnable accepts specified as a Pydantic model.
Get a JSON schema that represents the input to the Runnable.
Get a JSON schema that represents the output of the Runnable.
The type of config this Runnable accepts specified as a Pydantic model.
Get a JSON schema that represents the config of the Runnable.
Return a list of prompts used by this Runnable.
Pipe Runnable objects.
Pick keys from the output dict of this Runnable.
Merge the Dict input with the output produced by the mapping argument.
Run invoke in parallel on a list of inputs.
Run ainvoke in parallel on a list of inputs.
Stream all output from a Runnable, as reported to the callback system.
Generate a stream of events.
Bind arguments to a Runnable, returning a new Runnable.
Bind lifecycle listeners to a Runnable, returning a new Runnable.
Bind async lifecycle listeners to a Runnable.
Bind input and output types to a Runnable, returning a new Runnable.
Create a new Runnable that retries the original Runnable on exceptions.
Map a function to multiple iterables.
Add fallbacks to a Runnable, returning a new Runnable.
Create a BaseTool from a Runnable.
Optional list of tags associated with the retriever.
These tags will be associated with each call to this retriever,
and passed as arguments to the handlers defined in callbacks.
You can use these to, for example, identify a specific instance of a retriever with its use case.
Optional metadata associated with the retriever.
This metadata will be associated with each call to this retriever,
and passed as arguments to the handlers defined in callbacks.
You can use this to, for example, identify a specific instance of a retriever with its use case.
Invoke the retriever to get relevant documents.
Main entry point for synchronous retriever invocations.
Asynchronously invoke the retriever to get relevant documents.
Main entry point for asynchronous retriever invocations.