langchain.js

    Interface representing the parameters for initializing a SitemapLoader. SitemapLoaderParams

    Properties

    allowUrlPatterns: undefined | (string | RegExp)[]
    caller: AsyncCaller
    chunkSize: number

    The size to chunk the sitemap URLs into for scraping. Defaults to 300.
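
    As a rough illustration of what chunking the sitemap URLs means, the sketch below splits a URL list into batches of at most chunkSize entries. The helper name chunkUrls and the example.com URLs are hypothetical and not part of the library; the loader's internal batching may differ.

```typescript
// Hypothetical helper (not a library API): split URLs into batches of at most `chunkSize`.
function chunkUrls(urls: string[], chunkSize = 300): string[][] {
  if (!Number.isInteger(chunkSize) || chunkSize <= 0) {
    throw new RangeError("chunkSize must be a positive integer");
  }
  const chunks: string[][] = [];
  for (let i = 0; i < urls.length; i += chunkSize) {
    // slice() clamps the end index, so the last batch may be shorter.
    chunks.push(urls.slice(i, i + chunkSize));
  }
  return chunks;
}

// Seven placeholder URLs with a chunk size of 3 give batches of 3, 3 and 1.
const exampleUrls = Array.from({ length: 7 }, (_, i) => `https://example.com/page-${i}`);
const batchSizes = chunkUrls(exampleUrls, 3).map((batch) => batch.length);
console.log(batchSizes); // [ 3, 3, 1 ]
```

    Each batch can then be scraped before moving to the next one, which bounds how many pages are in flight at once.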
    
    headers?: HeadersInit

    The headers to use in the fetch request.

    selector?: SelectorType

    The selector to use to extract the text from the document. Defaults to "body".

    textDecoder?: TextDecoder

    The text decoder to use to decode the response. Defaults to UTF-8.
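
    A small sketch of how a custom TextDecoder changes the decoded text, using only the standard TextDecoder and TextEncoder globals. The function decodeBody is a hypothetical stand-in for whatever the loader does with the response bytes.

```typescript
// Hypothetical helper: decode raw response bytes, defaulting to UTF-8
// when no decoder is supplied.
function decodeBody(
  bytes: Uint8Array,
  textDecoder: TextDecoder = new TextDecoder("utf-8"),
): string {
  return textDecoder.decode(bytes);
}

// UTF-8 bytes round-trip with the default decoder.
const utf8Bytes = new TextEncoder().encode("café");
const utf8Text = decodeBody(utf8Bytes);

// The bytes 0x68 0x00 0x69 0x00 spell "hi" in UTF-16LE, so a page served in
// that encoding needs a matching decoder.
const utf16Bytes = new Uint8Array([0x68, 0x00, 0x69, 0x00]);
const utf16Text = decodeBody(utf16Bytes, new TextDecoder("utf-16le"));

console.log(utf8Text, utf16Text);
```

    Passing a decoder only matters for sites that do not serve UTF-8; for everything else the default is sufficient.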

    timeout: number

    The timeout in milliseconds for the fetch request. Defaults to 10s.
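
    One common way to combine custom headers with a millisecond timeout on a fetch request is an AbortController, sketched below. This is an assumption about how such a timeout can be enforced, not a description of the loader's internals; timeoutSignal and fetchWithTimeout are hypothetical names.

```typescript
// Hypothetical helper: an AbortSignal that fires after `timeoutMs`,
// plus a cancel function that clears the pending timer.
function timeoutSignal(timeoutMs: number): { signal: AbortSignal; cancel: () => void } {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  return { signal: controller.signal, cancel: () => clearTimeout(timer) };
}

// Hypothetical wrapper: fetch `url` with optional headers, aborting after
// `timeoutMs` milliseconds (10 seconds by default, mirroring the documented default).
async function fetchWithTimeout(
  url: string,
  headers: Record<string, string> = {},
  timeoutMs = 10_000,
): Promise<Response> {
  const { signal, cancel } = timeoutSignal(timeoutMs);
  try {
    return await fetch(url, { headers, signal });
  } finally {
    // Always clear the timer so it cannot fire after the request settles.
    cancel();
  }
}
```

    A call such as fetchWithTimeout("https://example.com/sitemap.xml", { "User-Agent": "my-crawler" }, 5_000) would reject with an abort error if no response arrives within five seconds; the URL and header value here are placeholders.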

    webPath: string

    Methods

    • Parameters

      • url: string

      Returns boolean

    • Parameters

      • elements: SiteMapElement[]

      Returns Promise<DocumentInterface[]>

    • Extracts the text content from the loaded document using the selector and creates a Document instance with the extracted text and metadata. It returns an array of Document instances.

      Returns Promise<Document[]>

      A Promise that resolves to an array of Document instances.

    • Returns Promise<SiteMapElement[]>

    • Fetches the web document from the webPath and loads it using Cheerio. It returns a CheerioAPI instance.

      Returns Promise<CheerioAPI>

      A Promise that resolves to a CheerioAPI instance.

    • Parameters

      • url: string
      • caller: AsyncCaller
      • timeout: undefined | number
      • Optional textDecoder: TextDecoder
      • Optional options: CheerioOptions & { headers?: HeadersInit }

      Returns Promise<CheerioAPI>

    • A static method that dynamically imports the Cheerio library and returns the load function. If the import fails, it throws an error.

      Returns Promise<
          {
              load: (
                  content: string | Buffer<ArrayBufferLike> | AnyNode | AnyNode[],
                  options?: null | CheerioOptions,
                  isDocument?: boolean,
              ) => CheerioAPI;
          },
      >

      A Promise that resolves to an object containing the load function from the Cheerio library.

    • Fetches web documents from the given array of URLs and loads them using Cheerio. It returns an array of CheerioAPI instances.

      Parameters

      • urls: string[]

        An array of URLs to fetch and load.

      • caller: AsyncCaller
      • timeout: undefined | number
      • Optional textDecoder: TextDecoder
      • Optional options: CheerioOptions & { headers?: HeadersInit }

      Returns Promise<CheerioAPI[]>

      A Promise that resolves to an array of CheerioAPI instances.