langchain.js

    Class that extends the BaseDocumentLoader class and implements the DocumentLoader interface. It represents a document loader for scraping web pages using Puppeteer.

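    import { PuppeteerWebBaseLoader } from "@langchain/community/document_loaders/web/puppeteer";

    // Configure how the browser is launched and how navigation is awaited.
    const loader = new PuppeteerWebBaseLoader("https://exampleurl.com", {
      launchOptions: {
        headless: true,
      },
      gotoOptions: {
        waitUntil: "domcontentloaded",
      },
    });
    // Capture the page as a Document whose pageContent is base64-encoded.
    const screenshot = await loader.screenshot();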

    Properties

    options: undefined | PuppeteerWebBaseLoaderOptions
    webPath: string

    Methods

    • Method that calls the scrape method and returns the scraped HTML content as an array of Document objects.

      Returns Promise<Document[]>

      Promise that resolves to an array of Document objects.
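Because the result is an array, callers usually iterate over it. The following is a minimal sketch of inspecting that array, not part of the library: `LoadedDocument` and `previewDocuments` are hypothetical names, and the `source` metadata key is an assumption about what the loader records.

```typescript
// Hypothetical structural type: only the fields this sketch reads.
interface LoadedDocument {
  pageContent: string;
  metadata: Record<string, unknown>;
}

// Build one human-readable summary line per document.
function previewDocuments(docs: LoadedDocument[], maxChars = 40): string[] {
  return docs.map((doc, i) => {
    // Assumption: the loader stores the page URL under metadata.source.
    const source =
      typeof doc.metadata.source === "string" ? doc.metadata.source : "unknown";
    const snippet = doc.pageContent.slice(0, maxChars);
    return `#${i} [${source}] ${doc.pageContent.length} chars: ${snippet}`;
  });
}
```

With a real loader this could be called as `previewDocuments(await loader.load())`, provided the returned Document objects are structurally compatible with the hypothetical type above.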

    • Method that calls the _scrape method to perform the scraping of the web page specified by the webPath property.

      Returns Promise<string>

      Promise that resolves to the scraped HTML content of the web page.
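Since the resolved value is raw HTML, a caller that wants plain text has to post-process it. The sketch below is a deliberately crude, regex-based approach and not something the loader provides: `htmlToText` is a hypothetical helper, it does not decode HTML entities, and a real HTML parser would be more robust on malformed markup.

```typescript
// Hypothetical helper: reduce an HTML string to whitespace-normalized text.
function htmlToText(html: string): string {
  return html
    // Drop <script> and <style> elements together with their contents.
    .replace(/<(script|style)\b[^>]*>[\s\S]*?<\/\1\s*>/gi, " ")
    // Replace every remaining tag with a space so adjacent words stay separated.
    .replace(/<[^>]+>/g, " ")
    // Collapse runs of whitespace and trim the ends.
    .replace(/\s+/g, " ")
    .trim();
}
```

It could be applied to the scraped string directly, for example `htmlToText(await loader.scrape())`.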

    • Screenshot a web page and return it as a Document object where the pageContent property is the screenshot encoded in base64.

      Returns Promise<Document>

      A document object containing the screenshot of the page encoded in base64.
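To use the screenshot as an image, the base64 string in pageContent has to be decoded back into bytes. A minimal sketch follows, assuming a Node.js runtime and that the screenshot is in PNG format (Puppeteer's default, though the options passed to the loader could change this). `ScreenshotDocument`, `screenshotToBuffer`, and `looksLikePng` are hypothetical names, not library APIs.

```typescript
import { Buffer } from "node:buffer";

// Hypothetical structural type: only the field this sketch reads.
interface ScreenshotDocument {
  pageContent: string;
}

// The eight-byte signature that starts every PNG file.
const PNG_SIGNATURE = Buffer.from([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a]);

// Decode the base64 pageContent back into raw image bytes.
function screenshotToBuffer(doc: ScreenshotDocument): Buffer {
  return Buffer.from(doc.pageContent, "base64");
}

// Sanity check that the decoded bytes begin with the PNG signature.
function looksLikePng(bytes: Buffer): boolean {
  return bytes.length >= 8 && bytes.subarray(0, 8).equals(PNG_SIGNATURE);
}
```

The decoded buffer could then be written to disk, for example with `writeFileSync("page.png", screenshotToBuffer(await loader.screenshot()))` from `node:fs`.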

    • Static class method used to screenshot a web page and return it as a Document object where the pageContent property is the screenshot encoded in base64.

      Parameters

      Returns Promise<Document>

      A document object containing the screenshot of the page encoded in base64.

    • Static method that imports the necessary Puppeteer modules. It returns a Promise that resolves to an object containing the imported modules.

      Returns Promise<{
          connect: (options: ConnectOptions) => Promise<Browser>;
          launch: (options?: PuppeteerLaunchOptions) => Promise<Browser>;
      }>

      Promise that resolves to an object containing the imported Puppeteer modules.