Class PuppeteerWebBaseLoader, which extends the BaseDocumentLoader class and implements the DocumentLoader interface. It is a document loader that scrapes web pages using Puppeteer.
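The extends/implements relationship described above can be pictured with a simplified sketch. The `Doc`, `DocumentLoader` and `BaseDocumentLoader` definitions here are minimal stand-ins written for illustration, not LangChain's actual types, and `StaticTextLoader` is a hypothetical subclass. A Puppeteer-based loader would launch a browser and read the page inside `load()` instead of returning a fixed string.

```typescript
// Simplified stand-in for a loaded document (not LangChain's Document class).
interface Doc {
  pageContent: string;
  metadata: Record<string, unknown>;
}

// Contract every loader satisfies: load() resolves to an array of documents.
interface DocumentLoader {
  load(): Promise<Doc[]>;
}

// Abstract base class; concrete loaders supply their own load().
abstract class BaseDocumentLoader implements DocumentLoader {
  abstract load(): Promise<Doc[]>;
}

// Hypothetical concrete loader that returns a fixed string for a URL.
// A Puppeteer-based loader would launch a browser and scrape the page here.
class StaticTextLoader extends BaseDocumentLoader {
  private readonly url: string;
  private readonly text: string;

  constructor(url: string, text: string) {
    super();
    this.url = url;
    this.text = text;
  }

  async load(): Promise<Doc[]> {
    return [{ pageContent: this.text, metadata: { source: this.url } }];
  }
}
```

Callers can depend only on the `DocumentLoader` interface, so any subclass of the base class can be swapped in without changing calling code.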
Additional options to pass to the underlying axios request.
A method that loads the text file or blob and returns a promise that
resolves to an array of Document instances. It reads the text from
the file path using the readFile function from the
node:fs/promises module, or from the blob using its text() method. It then
parses the text using the parse() method and creates a Document
instance for each parsed page. The metadata includes the source of the
text (file path or blob) and, if there are multiple pages, the line
number of each page.
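The load flow above (read the text, parse it into pages, then attach the source and, for multi-page results, a line number) can be sketched as follows. This is a minimal approximation under stated assumptions: `toDocuments`, `loadText` and `parseAsSinglePage` are hypothetical names, the parser is passed in as a function rather than being an overridable `parse()` method, and the `"blob"` source label is an assumed placeholder.

```typescript
// Simplified stand-in for a loaded document.
interface TextDoc {
  pageContent: string;
  metadata: Record<string, unknown>;
}

// Hypothetical default parser: the whole text is treated as one page.
function parseAsSinglePage(raw: string): string[] {
  return [raw];
}

// Builds one document per parsed page. A `line` entry (1-based) is added
// to the metadata only when the parser produced more than one page.
function toDocuments(
  text: string,
  source: string,
  parse: (raw: string) => string[] = parseAsSinglePage,
): TextDoc[] {
  const pages = parse(text);
  return pages.map((pageContent, i) => ({
    pageContent,
    metadata: pages.length === 1 ? { source } : { source, line: i + 1 },
  }));
}

// Reads a file path via node:fs/promises, or a Blob via its text() method.
async function loadText(
  filePathOrBlob: string | Blob,
  parse?: (raw: string) => string[],
): Promise<TextDoc[]> {
  let text: string;
  let source: string;
  if (typeof filePathOrBlob === "string") {
    const { readFile } = await import("node:fs/promises");
    text = await readFile(filePathOrBlob, "utf8");
    source = filePathOrBlob;
  } else {
    text = await filePathOrBlob.text();
    source = "blob"; // assumed label for blob input
  }
  return toDocuments(text, source, parse);
}
```

For example, `toDocuments("x\ny", "b.txt", (raw) => raw.split("\n"))` yields two documents, the second with `metadata.line` equal to 2.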
Fetches the web document from the webPath and loads it using Cheerio. It returns a CheerioAPI instance.
Takes a screenshot of a web page and returns it as a Document object whose pageContent property is the screenshot encoded in base64.
Static class method that takes a screenshot of a web page and returns it as a Document object whose pageContent property is the screenshot encoded in base64.
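Both screenshot methods boil down to encoding image bytes as base64 and wrapping them in a document. In this sketch the bytes are passed in directly instead of being captured with Puppeteer, `ScreenshotSketch` and its methods are hypothetical, and the `source` metadata key is an assumption. It mainly shows how a static method can serve callers that have no instance, while the instance method delegates to it with the loader's own URL.

```typescript
// Simplified stand-in for a loaded document.
interface ScreenshotDoc {
  pageContent: string;
  metadata: Record<string, unknown>;
}

// Hypothetical loader-like class; a real loader would capture the bytes with Puppeteer.
class ScreenshotSketch {
  readonly url: string;

  constructor(url: string) {
    this.url = url;
  }

  // Static form: callers supply both the bytes and the URL, no instance needed.
  static toDocument(bytes: Uint8Array, url: string): ScreenshotDoc {
    return {
      // base64-encode the raw image bytes so they fit in a string field
      pageContent: Buffer.from(bytes).toString("base64"),
      metadata: { source: url },
    };
  }

  // Instance form: delegates to the static method using this object's URL.
  toDocument(bytes: Uint8Array): ScreenshotDoc {
    return ScreenshotSketch.toDocument(bytes, this.url);
  }
}
```

For instance, `new ScreenshotSketch("https://example.com").toDocument(new Uint8Array([1, 2, 3])).pageContent` is `"AQID"`.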
A static method that imports the readFile function from the
node:fs/promises module. It is used to dynamically import the
function when needed. If the import fails, it throws an error
indicating that the fs/promises module is not available in the
current environment.
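A lazy dynamic import with a descriptive failure might look like the sketch below. `importReadFile` and the exact error message are assumptions for illustration. Because `node:fs/promises` is only resolved when the function runs, code paths that never call it do not need the module at all.

```typescript
// Hypothetical helper: resolves node:fs/promises only when called, and
// replaces a low-level import failure with a clearer error message.
async function importReadFile(): Promise<{
  readFile: typeof import("node:fs/promises").readFile;
}> {
  try {
    const { readFile } = await import("node:fs/promises");
    return { readFile };
  } catch (e) {
    throw new Error(
      `Failed to load fs/promises; this loader requires a Node.js-compatible environment. (${String(e)})`,
    );
  }
}
```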
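import { PuppeteerWebBaseLoader } from "@langchain/community/document_loaders/web/puppeteer";

const loader = new PuppeteerWebBaseLoader("https://exampleurl.com", {
  // Passed to Puppeteer when launching the browser
  launchOptions: {
    headless: true,
  },
  // Passed to page.goto(); navigation resolves once the DOM content has loaded
  gotoOptions: {
    waitUntil: "domcontentloaded",
  },
});
// Resolves to a Document whose pageContent is the base64-encoded screenshot
const screenshot = await loader.screenshot();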
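Because `pageContent` holds base64 text, recovering the image only requires decoding it. A minimal sketch, assuming the screenshot bytes are PNG data (Puppeteer's default screenshot format); `decodeScreenshot`, `saveScreenshot` and the output path are hypothetical.

```typescript
import { writeFileSync } from "node:fs";

// Hypothetical helper: turns base64 text (e.g. screenshot.pageContent) back into raw bytes.
function decodeScreenshot(base64: string): Buffer {
  return Buffer.from(base64, "base64");
}

// Hypothetical helper: decodes the base64 text and writes the bytes to outPath.
function saveScreenshot(base64: string, outPath: string): void {
  writeFileSync(outPath, decodeScreenshot(base64));
}

// With the loader example, one might call:
// saveScreenshot(screenshot.pageContent, "screenshot.png");
```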