Class PuppeteerWebBaseLoader

Class that extends the BaseDocumentLoader class and implements the DocumentLoader interface. It represents a document loader for scraping web pages using Puppeteer.

Example

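import { PuppeteerWebBaseLoader } from "@langchain/community/document_loaders/web/puppeteer";

const loader = new PuppeteerWebBaseLoader("https://exampleurl.com", {
  launchOptions: {
    headless: true,
  },
  gotoOptions: {
    waitUntil: "domcontentloaded",
  },
});
// Resolves to a Document whose pageContent is the base64-encoded screenshot.
const screenshot = await loader.screenshot();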

Hierarchy

BaseDocumentLoader
- PuppeteerWebBaseLoader

Implements

DocumentLoader

Constructors

constructor

new PuppeteerWebBaseLoader(
webPath: string,
options?: PuppeteerWebBaseLoaderOptions,
): PuppeteerWebBaseLoader
Parameters
- webPath: string
- `Optional` options: PuppeteerWebBaseLoaderOptions
Returns PuppeteerWebBaseLoader
Overrides BaseDocumentLoader.constructor
- Defined in remotes/langchain-ai/langchainjs/main/libs/langchain-community/src/document_loaders/web/puppeteer.ts:61

Properties

options

options: undefined | PuppeteerWebBaseLoaderOptions

webPath

webPath: string

Methods

load

load(): Promise<Document[]>
Method that calls the scrape method and returns the scraped HTML content as a Document object, wrapped in an array.

Returns Promise<Document[]>
Promise that resolves to an array of Document objects.
- Defined in remotes/langchain-ai/langchainjs/main/libs/langchain-community/src/document_loaders/web/puppeteer.ts:116
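
The shape of the value that load resolves to can be sketched without Puppeteer. This is a minimal sketch, assuming the scraped HTML becomes the pageContent of a single document and that the page URL is recorded under `metadata.source`; `SketchDocument` and `wrapHtml` are hypothetical names used for illustration, not the library's implementation.

```typescript
// Hypothetical stand-in for the library's Document type (illustration only).
interface SketchDocument {
  pageContent: string;
  metadata: Record<string, unknown>;
}

// Assumption: one document per page, with the URL stored as metadata.source.
function wrapHtml(html: string, webPath: string): SketchDocument[] {
  return [{ pageContent: html, metadata: { source: webPath } }];
}

const docs = wrapHtml("<p>Hello</p>", "https://exampleurl.com");
console.log(docs.length, docs[0].metadata.source);
```

Callers that expect a single page can therefore read the first element of the array, while code written against the array type keeps working with loaders that return several documents.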

scrape

scrape(): Promise<string>
Method that calls the _scrape method to perform the scraping of the web page specified by the webPath property.

Returns Promise<string>
Promise that resolves to the scraped HTML content of the web page.
- Defined in remotes/langchain-ai/langchainjs/main/libs/langchain-community/src/document_loaders/web/puppeteer.ts:107
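
The delegation from the instance method to the static method can be sketched as follows. The names `SketchLoader` and `SketchOptions` are hypothetical, and the static `_scrape` here returns placeholder HTML instead of driving a browser, so the sketch illustrates only the call path and not the library's implementation.

```typescript
// Hypothetical options type; the real PuppeteerWebBaseLoaderOptions is not reproduced here.
interface SketchOptions {
  gotoOptions?: { waitUntil?: string };
}

class SketchLoader {
  webPath: string;
  options?: SketchOptions;

  constructor(webPath: string, options?: SketchOptions) {
    this.webPath = webPath;
    this.options = options;
  }

  // Placeholder for the browser work: returns fake HTML that echoes its inputs.
  static async _scrape(url: string, options?: SketchOptions): Promise<string> {
    const waitUntil = options?.gotoOptions?.waitUntil ?? "default";
    return `<html data-url="${url}" data-wait="${waitUntil}"></html>`;
  }

  // The instance method forwards its own webPath and options to the static method.
  scrape(): Promise<string> {
    return SketchLoader._scrape(this.webPath, this.options);
  }
}

const sketch = new SketchLoader("https://exampleurl.com", {
  gotoOptions: { waitUntil: "domcontentloaded" },
});
sketch.scrape().then((html) => console.log(html));
```

Keeping the browser logic in a static method means it can also be called directly with a URL and options, without constructing a loader first.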

screenshot

screenshot(): Promise<Document>
Screenshot a web page and return it as a Document object where the pageContent property is the screenshot encoded in base64.

Returns Promise<Document>
A document object containing the screenshot of the page encoded in base64.
- Defined in remotes/langchain-ai/langchainjs/main/libs/langchain-community/src/document_loaders/web/puppeteer.ts:170
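
Because pageContent holds base64 text, it has to be decoded before it can be used as image bytes. The following is a minimal sketch, assuming the screenshot is a PNG (Puppeteer's default image type) and that a global `atob` is available (browsers and Node.js 16 or later). `decodeBase64` and `looksLikePng` are hypothetical helpers, and the sample string stands in for a real pageContent value.

```typescript
// Decode the base64 string held in a screenshot document's pageContent.
function decodeBase64(pageContent: string): Uint8Array {
  const binary = atob(pageContent);
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i += 1) {
    bytes[i] = binary.charCodeAt(i);
  }
  return bytes;
}

// PNG files start with this fixed 8-byte signature.
const PNG_SIGNATURE = [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a];

function looksLikePng(bytes: Uint8Array): boolean {
  return PNG_SIGNATURE.every((value, index) => bytes[index] === value);
}

// Stand-in for doc.pageContent: the base64 encoding of the PNG signature only.
const samplePageContent = "iVBORw0KGgo=";
const decoded = decodeBase64(samplePageContent);
console.log(decoded.length, looksLikePng(decoded));
```

The decoded bytes can then be written to a file or passed to any API that accepts binary image data.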

`Static` _scrape

_scrape(url: string, options?: PuppeteerWebBaseLoaderOptions): Promise<string>
Parameters
- url: string
- `Optional` options: PuppeteerWebBaseLoaderOptions
Returns Promise<string>
- Defined in remotes/langchain-ai/langchainjs/main/libs/langchain-community/src/document_loaders/web/puppeteer.ts:66

`Static` _screenshot

_screenshot(
url: string,
options?: PuppeteerWebBaseLoaderOptions,
): Promise<Document>
Static class method used to screenshot a web page and return it as a Document object where the pageContent property is the screenshot encoded in base64.
Parameters
- url: string
- `Optional` options: PuppeteerWebBaseLoaderOptions
Returns Promise<Document>
A document object containing the screenshot of the page encoded in base64.
- Defined in remotes/langchain-ai/langchainjs/main/libs/langchain-community/src/document_loaders/web/puppeteer.ts:132

`Static` imports

imports(): Promise<
    {
        connect: (options: ConnectOptions) => Promise<Browser>;
        launch: (options?: PuppeteerLaunchOptions) => Promise<Browser>;
    },
>
Static method that imports the Puppeteer functions the loader needs and returns them as an object with `connect` and `launch` entries.

Returns Promise<
    {
        connect: (options: ConnectOptions) => Promise<Browser>;
        launch: (options?: PuppeteerLaunchOptions) => Promise<Browser>;
    },
>
Promise that resolves to an object containing the imported Puppeteer modules.
- Defined in remotes/langchain-ai/langchainjs/main/libs/langchain-community/src/document_loaders/web/puppeteer.ts:179
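
A method like this is often built on a lazy dynamic import, so that the dependency is only required when the loader is actually used. The following sketches that general pattern under that assumption; it is not the library's source, and `lazyImport`, `failingImport`, and the error message are hypothetical.

```typescript
// Run a loader function and convert any failure into an actionable error.
async function lazyImport<T>(load: () => Promise<T>, hint: string): Promise<T> {
  try {
    return await load();
  } catch {
    throw new Error(hint);
  }
}

// With the real package this might be: lazyImport(() => import("puppeteer"), "...").
// Here a stand-in that always fails keeps the sketch free of dependencies.
const failingImport = (): Promise<{ launch: () => void }> =>
  Promise.reject(new Error("Cannot find module"));

lazyImport(failingImport, "Install puppeteer to use this loader.").catch(
  (error: Error) => console.log(error.message),
);
```

Replacing the low-level module resolution error with an installation hint tells the caller what to do next, which matters when the dependency is optional.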
