    Class · Since v0.1

    BaseDocumentTransformer

    Abstract base class for document transformation.

    A document transformation takes a sequence of Document objects and returns a sequence of transformed Document objects.
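
As a small illustration of the contract described above, the sketch below defines a hypothetical `DropEmptyDocuments` transformer that removes documents whose `page_content` is blank. The class name and the filtering rule are invented for this example. The `try`/`except` fallback defines minimal stand-ins only so the snippet also runs where langchain-core is not installed; it is not how the library defines these classes.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Any, Sequence

try:
    from langchain_core.documents import BaseDocumentTransformer, Document
except ImportError:
    # Stand-ins for illustration only (used when langchain-core is missing).
    @dataclass
    class Document:
        page_content: str
        metadata: dict = field(default_factory=dict)

    class BaseDocumentTransformer(ABC):
        @abstractmethod
        def transform_documents(
            self, documents: Sequence[Document], **kwargs: Any
        ) -> Sequence[Document]: ...


class DropEmptyDocuments(BaseDocumentTransformer):
    """Hypothetical transformer: keep only documents with non-blank text."""

    def transform_documents(
        self, documents: Sequence[Document], **kwargs: Any
    ) -> Sequence[Document]:
        # Input order is preserved; blank or whitespace-only documents are dropped.
        return [doc for doc in documents if doc.page_content.strip()]


docs = [
    Document(page_content="hello"),
    Document(page_content="   "),
    Document(page_content="world"),
]
kept = DropEmptyDocuments().transform_documents(docs)
print([doc.page_content for doc in kept])
```

The transformer returns a new list and leaves the input sequence untouched, which keeps it safe to reuse the original documents elsewhere.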

    BaseDocumentTransformer()

    Bases

    ABC

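    Example:

    from typing import Any, Callable, Sequence

    from langchain_core.documents import BaseDocumentTransformer, Document
    from langchain_core.embeddings import Embeddings
    from pydantic import BaseModel

    # cosine_similarity, get_stateful_documents,
    # _get_embeddings_from_stateful_docs, and _filter_similar_embeddings are
    # helper functions assumed to be in scope; they are not part of
    # BaseDocumentTransformer.

    class EmbeddingsRedundantFilter(BaseDocumentTransformer, BaseModel):
        embeddings: Embeddings
        similarity_fn: Callable = cosine_similarity
        similarity_threshold: float = 0.95

        class Config:
            # Allow non-pydantic field types such as Embeddings.
            arbitrary_types_allowed = True

        def transform_documents(
            self, documents: Sequence[Document], **kwargs: Any
        ) -> Sequence[Document]:
            # Wrap the documents so per-document state (embeddings) can be attached.
            stateful_documents = get_stateful_documents(documents)
            embedded_documents = _get_embeddings_from_stateful_docs(
                self.embeddings, stateful_documents
            )
            # Indices of the documents that survive the similarity filter.
            included_idxs = _filter_similar_embeddings(
                embedded_documents,
                self.similarity_fn,
                self.similarity_threshold,
            )
            # Return the kept documents in their original order.
            return [stateful_documents[i] for i in sorted(included_idxs)]

        async def atransform_documents(
            self, documents: Sequence[Document], **kwargs: Any
        ) -> Sequence[Document]:
            # This example does not provide an async variant.
            raise NotImplementedError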

    Methods

    method
    transform_documents

    Transform a sequence of documents and return the transformed sequence. This method is abstract; subclasses must implement it.

    method
    atransform_documents

    Asynchronously transform a sequence of documents and return the transformed sequence.
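
A sketch of calling the asynchronous method, assuming a hypothetical `TagSource` transformer that adds a `source` key to each document's metadata. The class, its `source` argument, and the file name are invented for illustration, and the class overrides `atransform_documents` itself rather than relying on inherited behavior. As above, the fallback stand-ins exist only so the snippet runs without langchain-core installed.

```python
import asyncio
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Any, Sequence

try:
    from langchain_core.documents import BaseDocumentTransformer, Document
except ImportError:
    # Stand-ins for illustration only (used when langchain-core is missing).
    @dataclass
    class Document:
        page_content: str
        metadata: dict = field(default_factory=dict)

    class BaseDocumentTransformer(ABC):
        @abstractmethod
        def transform_documents(
            self, documents: Sequence[Document], **kwargs: Any
        ) -> Sequence[Document]: ...


class TagSource(BaseDocumentTransformer):
    """Hypothetical transformer: copy each document and tag its metadata."""

    def __init__(self, source: str) -> None:
        self.source = source

    def transform_documents(
        self, documents: Sequence[Document], **kwargs: Any
    ) -> Sequence[Document]:
        # Build new Document objects so the inputs are not mutated.
        return [
            Document(
                page_content=doc.page_content,
                metadata={**doc.metadata, "source": self.source},
            )
            for doc in documents
        ]

    async def atransform_documents(
        self, documents: Sequence[Document], **kwargs: Any
    ) -> Sequence[Document]:
        # No real async I/O is needed here, so reuse the synchronous logic.
        return self.transform_documents(documents, **kwargs)


async def main() -> Sequence[Document]:
    docs = [
        Document(page_content="a"),
        Document(page_content="b", metadata={"page": 2}),
    ]
    return await TagSource("notes.txt").atransform_documents(docs)


tagged = asyncio.run(main())
print([doc.metadata for doc in tagged])
```

A transformer that performs network or disk I/O would typically await that work inside `atransform_documents` instead of delegating to the synchronous method.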
