Class SitemapLoader

The parameters for initializing a SitemapLoader are described by the SitemapLoaderParams interface.

Hierarchy

CheerioWebBaseLoader
- SitemapLoader

Implements

SitemapLoaderParams

Constructors

constructor

new SitemapLoader(webPath: string, params?: SitemapLoaderParams): SitemapLoader
Parameters
- webPath: string
- params: SitemapLoaderParams = {}
Returns SitemapLoader
Overrides CheerioWebBaseLoader.constructor
- Defined in remotes/langchain-ai/langchainjs/main/libs/langchain-community/src/document_loaders/web/sitemap.ts:40

Properties

allowUrlPatterns

allowUrlPatterns: undefined | (string | RegExp)[]

caller

caller: AsyncCaller

chunkSize

chunkSize: number

The size to chunk the sitemap URLs into for scraping.

Defaults to 300.
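
To make the chunking behaviour concrete, here is a minimal sketch of splitting a list of sitemap URLs into batches. `chunkUrls` is a hypothetical helper written for illustration, not a function exported by the library, and the loader's actual batching code may differ.

```typescript
// Hypothetical helper (not part of the library): split a list into
// consecutive batches of at most `size` items.
function chunkUrls<T>(items: T[], size: number): T[][] {
  // Reject zero, negative, fractional, and NaN sizes.
  if (size < 1 || size % 1 !== 0) {
    throw new RangeError("size must be a positive integer");
  }
  const batches: T[][] = [];
  for (let start = 0; start < items.length; start += size) {
    batches.push(items.slice(start, start + size));
  }
  return batches;
}

// 650 placeholder URLs, batched with the documented default of 300.
const urls: string[] = [];
for (let i = 0; i < 650; i += 1) {
  urls.push(`https://example.com/page-${i}`);
}
const batches = chunkUrls(urls, 300);
console.log(batches.map((batch) => batch.length)); // batch sizes: 300, 300, 50
```

A smaller chunk size keeps fewer requests in flight per batch, which may help with rate-limited sites.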

`Optional` headers

headers?: HeadersInit

The headers to use in the fetch request.

`Optional` selector

selector?: SelectorType

The selector to use to extract the text from the document. Defaults to "body".

`Optional` textDecoder

textDecoder?: TextDecoder

The text decoder to use to decode the response. Defaults to UTF-8.

timeout

timeout: number

The timeout in milliseconds for the fetch request. Defaults to 10s.

webPath

webPath: string

Methods

_checkUrlPatterns

_checkUrlPatterns(url: string): boolean
Parameters
- url: string
Returns boolean
- Defined in remotes/langchain-ai/langchainjs/main/libs/langchain-community/src/document_loaders/web/sitemap.ts:52
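
This page gives no description for `_checkUrlPatterns`, so the following is only a sketch of one way a URL could be tested against a `(string | RegExp)[]` list such as `allowUrlPatterns`. `matchesAnyPattern` is a hypothetical name, and two behaviours are assumptions rather than documented facts: string patterns are treated as regular-expression source, and a missing pattern list accepts every URL. Whether the library's method returns `true` for URLs to keep or to skip should be checked against the source file referenced above.

```typescript
type UrlPattern = string | RegExp;

// Hypothetical helper, not the library's _checkUrlPatterns.
// Assumption: a string pattern is used as regular-expression source.
// Assumption: with no patterns configured, every URL is accepted.
function matchesAnyPattern(url: string, patterns?: UrlPattern[]): boolean {
  if (patterns === undefined || patterns.length === 0) {
    return true;
  }
  return patterns.some((pattern) => {
    const re = typeof pattern === "string" ? new RegExp(pattern) : pattern;
    // Reset state so global or sticky regexes behave the same on every call.
    re.lastIndex = 0;
    return re.test(url);
  });
}

const patterns: UrlPattern[] = ["/docs/", /\/blog\/\d{4}\//];
console.log(matchesAnyPattern("https://example.com/docs/intro", patterns)); // true
console.log(matchesAnyPattern("https://example.com/blog/2024/post", patterns)); // true
console.log(matchesAnyPattern("https://example.com/pricing", patterns)); // false
```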

_loadSitemapUrls

_loadSitemapUrls(elements: SiteMapElement[]): Promise<DocumentInterface[]>
Parameters
- elements: SiteMapElement[]
Returns Promise<DocumentInterface[]>
- Defined in remotes/langchain-ai/langchainjs/main/libs/langchain-community/src/document_loaders/web/sitemap.ts:107

load

load(): Promise<Document[]>
Extracts the text content from the loaded document using the selector and creates a Document instance with the extracted text and metadata. It returns an array of Document instances.

Returns Promise<Document[]>
A Promise that resolves to an array of Document instances.
Overrides CheerioWebBaseLoader.load
- Defined in remotes/langchain-ai/langchainjs/main/libs/langchain-community/src/document_loaders/web/sitemap.ts:142

parseSitemap

parseSitemap(): Promise<SiteMapElement[]>
Returns Promise<SiteMapElement[]>
- Defined in remotes/langchain-ai/langchainjs/main/libs/langchain-community/src/document_loaders/web/sitemap.ts:61
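
As a rough illustration of what turning sitemap XML into element records can look like, the sketch below extracts `<url>` entries from a sitemap string. It is not the library's implementation: it uses regular expressions instead of Cheerio to stay dependency-free, `parseSitemapXml` and `SitemapEntry` are hypothetical names, and the field set (`loc`, `lastmod`, `changefreq`, `priority`) is taken from the sitemaps.org protocol because the `SiteMapElement` fields are not listed on this page.

```typescript
// Hypothetical record shape; the real SiteMapElement fields are not shown on this page.
interface SitemapEntry {
  loc: string;
  lastmod?: string;
  changefreq?: string;
  priority?: string;
}

// Return the trimmed text of the first <tag>...</tag> inside `xml`, if any.
function tagText(xml: string, tag: string): string | undefined {
  const match = new RegExp(`<${tag}>([\\s\\S]*?)</${tag}>`).exec(xml);
  return match ? match[1].trim() : undefined;
}

// Collect one entry per <url> block that contains a non-empty <loc>.
function parseSitemapXml(xml: string): SitemapEntry[] {
  const entries: SitemapEntry[] = [];
  const urlBlock = /<url>([\s\S]*?)<\/url>/g;
  let match: RegExpExecArray | null;
  while ((match = urlBlock.exec(xml)) !== null) {
    const body = match[1];
    const loc = tagText(body, "loc");
    if (!loc) {
      continue; // skip malformed entries without a location
    }
    entries.push({
      loc,
      lastmod: tagText(body, "lastmod"),
      changefreq: tagText(body, "changefreq"),
      priority: tagText(body, "priority"),
    });
  }
  return entries;
}

const xml = `<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc><lastmod>2024-01-01</lastmod></url>
  <url><loc>https://example.com/docs</loc><priority>0.8</priority></url>
</urlset>`;
console.log(parseSitemapXml(xml).map((entry) => entry.loc));
// [ 'https://example.com/', 'https://example.com/docs' ]
```

This sketch ignores CDATA sections, namespace-prefixed tags, and sitemap index files; a real XML or HTML parser handles those cases more reliably.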

scrape

scrape(): Promise<CheerioAPI>
Fetches the web document from the webPath and loads it using Cheerio. It returns a CheerioAPI instance.

Returns Promise<CheerioAPI>
A Promise that resolves to a CheerioAPI instance.
Inherited from CheerioWebBaseLoader.scrape
- Defined in remotes/langchain-ai/langchainjs/main/libs/langchain-community/src/document_loaders/web/cheerio.ts:108

`Static` _scrape

_scrape(
    url: string,
    caller: AsyncCaller,
    timeout: undefined | number,
    textDecoder?: TextDecoder,
    options?: CheerioOptions & { headers?: HeadersInit },
): Promise<CheerioAPI>
Parameters
- url: string
- caller: AsyncCaller
- timeout: undefined | number
- `Optional` textDecoder: TextDecoder
- `Optional` options: CheerioOptions & { headers?: HeadersInit }
Returns Promise<CheerioAPI>
Inherited from CheerioWebBaseLoader._scrape
- Defined in remotes/langchain-ai/langchainjs/main/libs/langchain-community/src/document_loaders/web/cheerio.ts:82

`Static` imports

imports(): Promise<
    {
        load: (
            content: string | Buffer<ArrayBufferLike> | AnyNode | AnyNode[],
            options?: null | CheerioOptions,
            isDocument?: boolean,
        ) => CheerioAPI;
    },
>
A static method that dynamically imports the Cheerio library and returns the load function. If the import fails, it throws an error.

Returns Promise<
    {
        load: (
            content: string | Buffer<ArrayBufferLike> | AnyNode | AnyNode[],
            options?: null | CheerioOptions,
            isDocument?: boolean,
        ) => CheerioAPI;
    },
>
A Promise that resolves to an object containing the load function from the Cheerio library.
Inherited from CheerioWebBaseLoader.imports
- Defined in remotes/langchain-ai/langchainjs/main/libs/langchain-community/src/document_loaders/web/cheerio.ts:138

`Static` scrapeAll

scrapeAll(
    urls: string[],
    caller: AsyncCaller,
    timeout: undefined | number,
    textDecoder?: TextDecoder,
    options?: CheerioOptions & { headers?: HeadersInit },
): Promise<CheerioAPI[]>
Fetches web documents from the given array of URLs and loads them using Cheerio. It returns an array of CheerioAPI instances.
Parameters
- urls: string[]
  An array of URLs to fetch and load.
- caller: AsyncCaller
- timeout: undefined | number
- `Optional` textDecoder: TextDecoder
- `Optional` options: CheerioOptions & { headers?: HeadersInit }
Returns Promise<CheerioAPI[]>
A Promise that resolves to an array of CheerioAPI instances.
Inherited from CheerioWebBaseLoader.scrapeAll
- Defined in remotes/langchain-ai/langchainjs/main/libs/langchain-community/src/document_loaders/web/cheerio.ts:66
