langchain.js

    A transformer that uses the Mozilla Readability library to extract the main content from a web page.

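    // Import paths assume recent @langchain/community and @langchain/textsplitters releases.
    import { HTMLWebBaseLoader } from "@langchain/community/document_loaders/web/html";
    import { MozillaReadabilityTransformer } from "@langchain/community/document_transformers/mozilla_readability";
    import { RecursiveCharacterTextSplitter } from "@langchain/textsplitters";

    // Load the raw HTML of the page.
    const loader = new HTMLWebBaseLoader("https://example.com/article");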
    const docs = await loader.load();

    // chunkSize caps the length of each chunk produced by the splitter.
    const splitter = new RecursiveCharacterTextSplitter({
      chunkSize: 5000,
    });
    const transformer = new MozillaReadabilityTransformer();

    // The sequence runs the loaded documents through the transformer first, then the splitter.
    const sequence = transformer.pipe(splitter);

    // Invoke the sequence to extract the readable content and split it into chunks.
    const newDocuments = await sequence.invoke(docs);

    console.log(newDocuments);
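
    The order in `transformer.pipe(splitter)` matters: the output of the left-hand step becomes the input of the right-hand step. The following is a minimal, library-free sketch of that composition, not LangChain's actual pipe implementation. `Step`, `pipe`, `extractMain`, and `splitEvery` are hypothetical names introduced only for illustration, and the toy extractor and splitter are stand-ins for the real transformer and splitter.

```typescript
// Hypothetical stand-in for a pipeable step; not LangChain's Runnable.
type Step<I, O> = (input: I) => Promise<O>;

// Compose two steps so that `first` runs before `second`.
function pipe<A, B, C>(first: Step<A, B>, second: Step<B, C>): Step<A, C> {
  return async (input: A) => second(await first(input));
}

// Toy "transformer": keeps only the text between <main> tags (assumption for the demo).
const extractMain: Step<string, string> = async (html) => {
  const match = /<main>([\s\S]*?)<\/main>/.exec(html);
  return match ? match[1].trim() : "";
};

// Toy "splitter": cuts text into chunks of at most `size` characters.
const splitEvery = (size: number): Step<string, string[]> => async (text) => {
  const chunks: string[] = [];
  for (let i = 0; i < text.length; i += size) {
    chunks.push(text.slice(i, i + size));
  }
  return chunks;
};

// Extraction runs first, then splitting, mirroring transformer.pipe(splitter).
const sequence = pipe(extractMain, splitEvery(5));

sequence("<nav>menu</nav><main>Hello world</main>").then((chunks) => {
  console.log(chunks); // [ 'Hello', ' worl', 'd' ]
});
```

    Reversing the arguments would split the raw HTML before extraction, so chunk boundaries could fall inside markup and the extractor would see only fragments of the page.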

    Constructors

    • new MozillaReadabilityTransformer(options?: Options): MozillaReadabilityTransformer

      Parameters

      • options: Options = {}

      Returns MozillaReadabilityTransformer

    Properties

    options: Options = {}

    Methods

    • Parameters

      • document: Document

      Returns Promise<Document>
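
    The signature above takes one document and resolves to one document, which suggests a per-document step that is mapped over a batch. The sketch below illustrates that shape only. `Doc`, `crudeExtract`, `transformDocument`, and `transformDocuments` are hypothetical names, the regex-based extractor is a crude stand-in for Mozilla's Readability, and copying the metadata unchanged is an assumption rather than documented behavior.

```typescript
// Hypothetical minimal document shape; the real Document class has more to it.
interface Doc {
  pageContent: string;
  metadata: Record<string, unknown>;
}

// Crude stand-in for Readability: drops script/style/nav blocks, then all tags.
// The real transformer delegates content extraction to Mozilla's Readability.
function crudeExtract(html: string): string {
  return html
    .replace(/<(script|style|nav)[^>]*>[\s\S]*?<\/\1>/gi, " ")
    .replace(/<[^>]+>/g, " ")
    .replace(/\s+/g, " ")
    .trim();
}

// Per-document step: one document in, one document out.
// Metadata is copied unchanged (an assumption made for this sketch).
async function transformDocument(doc: Doc): Promise<Doc> {
  return {
    pageContent: crudeExtract(doc.pageContent),
    metadata: { ...doc.metadata },
  };
}

// Map the per-document step over a batch of documents.
async function transformDocuments(docs: Doc[]): Promise<Doc[]> {
  return Promise.all(docs.map((doc) => transformDocument(doc)));
}

transformDocuments([
  {
    pageContent: "<nav>Home</nav><p>Hello <b>world</b></p>",
    metadata: { source: "https://example.com/article" },
  },
]).then((out) => {
  console.log(out[0].pageContent); // Hello world
});
```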

    • Returns string