The size to chunk the sitemap URLs into for scraping.
headers (optional): The headers to use in the fetch request.
selector (optional): The selector to use to extract the text from the document. Defaults to "body".
text (optional): The text decoder to use to decode the response. Defaults to UTF-8.
The timeout in milliseconds for the fetch request. Defaults to 10 seconds (10000 ms).
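To illustrate what chunking the sitemap URLs for scraping might look like, here is a minimal sketch. The helper name `chunkUrls` and the example URLs are assumptions made for illustration and are not part of the documented API.

```typescript
// Hypothetical helper (not part of the library): split a list of URLs into
// batches of at most `chunkSize` entries, preserving the original order.
function chunkUrls(urls: string[], chunkSize: number): string[][] {
  if (!Number.isInteger(chunkSize) || chunkSize <= 0) {
    throw new RangeError("chunkSize must be a positive integer");
  }
  const chunks: string[][] = [];
  for (let i = 0; i < urls.length; i += chunkSize) {
    chunks.push(urls.slice(i, i + chunkSize));
  }
  return chunks;
}

// Example URLs are placeholders.
const batches = chunkUrls(
  ["https://example.com/a", "https://example.com/b", "https://example.com/c"],
  2
);
console.log(batches.length); // 2
```

Each batch can then be scraped before moving on to the next, which bounds the number of requests in flight at any one time.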
Extracts the text content from the loaded document using the selector and creates a Document instance with the extracted text and metadata. It returns an array of Document instances.
A Promise that resolves to an array of Document instances.
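The extraction step described above can be sketched roughly as follows. This is not the library's implementation: `SimpleDocument`, its field names, and `toDocuments` are hypothetical, and `SelectorFn` is a minimal structural stand-in for the one Cheerio call used here, `$(selector).text()`.

```typescript
// Minimal structural type covering the single Cheerio call used below.
type SelectorFn = (selector: string) => { text(): string };

// Hypothetical document shape; the field names are assumptions for illustration.
interface SimpleDocument {
  pageContent: string;
  metadata: { source: string };
}

// Extract the text matched by `selector` and wrap it, with its source path,
// in a one-element array.
function toDocuments(
  $: SelectorFn,
  webPath: string,
  selector = "body"
): SimpleDocument[] {
  const text = $(selector).text();
  return [{ pageContent: text, metadata: { source: webPath } }];
}
```

A loaded Cheerio instance can be passed as `$`, since it is callable with a selector and the result exposes `text()`.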
Fetches the web document from the webPath and loads it using Cheerio. It returns a CheerioAPI instance.
A Promise that resolves to a CheerioAPI instance.
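As a rough sketch of the fetch half of this step, the helper below requests a page with a timeout and decodes the response bytes with a `TextDecoder`. It stops short of parsing the HTML with Cheerio. The name `fetchText`, its parameter order, and the injectable `fetchImpl` argument are assumptions for illustration only.

```typescript
// Hypothetical helper, not the library's implementation.
type FetchLike = (url: string, init?: RequestInit) => Promise<Response>;

async function fetchText(
  url: string,
  timeoutMs = 10_000,
  textDecoder: TextDecoder = new TextDecoder("utf-8"),
  headers?: HeadersInit,
  fetchImpl: FetchLike = fetch
): Promise<string> {
  // Abort the request if it has not completed within `timeoutMs`.
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    const response = await fetchImpl(url, { headers, signal: controller.signal });
    if (!response.ok) {
      throw new Error(`Request failed with status ${response.status}`);
    }
    // Decode the raw bytes explicitly so a non-default decoder can be supplied.
    const bytes = await response.arrayBuffer();
    return textDecoder.decode(bytes);
  } finally {
    clearTimeout(timer);
  }
}
```

The returned HTML string could then be handed to Cheerio's `load` function to obtain a CheerioAPI instance.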
Static _
textDecoder (optional): TextDecoder
options (optional): CheerioOptions & { headers?: HeadersInit }
Static imports
A static method that dynamically imports the Cheerio library and returns the load function. If the import fails, it throws an error.
A Promise that resolves to an object containing the load function from the Cheerio library.
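The dynamic-import pattern described here can be sketched as below. `importOptional` is a hypothetical generic helper, not the library's own method, and the wording of the error message is an assumption.

```typescript
// Hypothetical helper: dynamically import an optional dependency and fail
// with an actionable message if it cannot be loaded.
async function importOptional<T = unknown>(moduleName: string): Promise<T> {
  try {
    return (await import(moduleName)) as T;
  } catch (cause) {
    throw new Error(
      `Could not import "${moduleName}". Make sure it is installed. Cause: ${String(cause)}`
    );
  }
}

// Usage sketch (assumes the cheerio package is installed):
//   const { load } = await importOptional<{ load: (html: string) => unknown }>("cheerio");
```

Deferring the import this way means the dependency is only required when the loader is actually used.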
Static scrape
Fetches web documents from the given array of URLs and loads them using Cheerio. It returns an array of CheerioAPI instances.
An array of URLs to fetch and load.
textDecoder (optional): TextDecoder
options (optional): CheerioOptions & { headers?: HeadersInit }
A Promise that resolves to an array of CheerioAPI instances.
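One way to fetch and load several URLs is to apply a single-URL scraper to each entry concurrently. The sketch below assumes that approach; the source does not say how the library schedules requests. `scrapeEach` and its `scrapeOne` callback are hypothetical names.

```typescript
// Hypothetical helper: run `scrapeOne` for every URL concurrently and collect
// the results in the same order as the input array.
async function scrapeEach<T>(
  urls: string[],
  scrapeOne: (url: string) => Promise<T>
): Promise<T[]> {
  return Promise.all(urls.map((url) => scrapeOne(url)));
}
```

`Promise.all` preserves input order and rejects as soon as any single scrape fails, so callers that want partial results may prefer `Promise.allSettled`.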
SitemapLoaderParams: Interface representing the parameters for initializing a SitemapLoader.