    Function in langchain_core.utils.html (package langchain-core). Since v0.1.

    extract_sub_links

    Extract all links from a raw HTML string and convert them into absolute URLs.

    extract_sub_links(
      raw_html: str,
      url: str,
      *,
      base_url: str | None = None,
      pattern: str | re.Pattern | None = None,
      prevent_outside: bool = True,
      exclude_prefixes: Sequence[str] = (),
      continue_on_failure: bool = False
    ) -> list[str]
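Converting links "into absolute paths" amounts to resolving each raw href value against the page's url. The sketch below illustrates that resolution with the standard library's urllib.parse.urljoin. It is an illustration of the idea, not code taken from langchain-core, and the helper name resolve_links is hypothetical.

```python
from urllib.parse import urljoin


def resolve_links(page_url: str, raw_links: list[str]) -> list[str]:
    """Hypothetical helper: resolve each raw href value against page_url."""
    return [urljoin(page_url, link) for link in raw_links]


# Typical href values as they might appear in raw HTML.
raw_links = [
    "/about",                    # root-relative
    "guide/setup.html",          # relative to the page's directory
    "//cdn.example.com/lib.js",  # scheme-relative: inherits https from the page
    "https://other.org/page",    # already absolute, left unchanged
]

for absolute in resolve_links("https://example.com/docs/intro.html", raw_links):
    print(absolute)
# https://example.com/about
# https://example.com/docs/guide/setup.html
# https://cdn.example.com/lib.js
# https://other.org/page
```

Note that https://other.org/page survives this step; discarding links outside the base URL is a separate filter controlled by prevent_outside.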

    Parameters

    raw_html: str (required)
        The original HTML.

    url: str (required)
        The URL of the page the HTML came from. Relative links are resolved against it.

    base_url: str | None, default None
        The base URL against which outside links are checked.

    pattern: str | re.Pattern | None, default None
        Regex used to extract links from the raw HTML. If None, a default href pattern is used.

    prevent_outside: bool, default True
        If True, ignore links that are not children of the base URL, such as links to external sites.

    exclude_prefixes: Sequence[str], default ()
        Skip any URL that starts with one of these prefixes.

    continue_on_failure: bool, default False
        If True, skip any link whose parsing raises an exception and continue with the rest. Otherwise, raise the exception.
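To make the interaction between base_url, prevent_outside, exclude_prefixes, pattern and continue_on_failure concrete, here is a simplified, standard-library-only approximation. It is a hypothetical sketch (sketch_extract_sub_links is not part of langchain-core) resting on several assumptions, each marked in the code: a simplified default href regex instead of the library's own pattern, base_url falling back to url when omitted, "child of the base URL" read as same host plus string-prefix match, and only ValueError (which urllib raises for malformed URLs such as an unclosed IPv6 bracket) treated as a per-link failure. The output is sorted only so the example is deterministic; do not rely on any particular order from the library.

```python
import re
from typing import Optional, Sequence, Union
from urllib.parse import urljoin, urlparse

# ASSUMPTION: simplified default pattern with one capture group for the href value.
DEFAULT_HREF_PATTERN = r"""href=["']([^"'#]+)"""


def sketch_extract_sub_links(
    raw_html: str,
    url: str,
    *,
    base_url: Optional[str] = None,
    pattern: Union[str, re.Pattern, None] = None,
    prevent_outside: bool = True,
    exclude_prefixes: Sequence[str] = (),
    continue_on_failure: bool = False,
) -> list[str]:
    """Hypothetical approximation of extract_sub_links, for illustration only."""
    # ASSUMPTION: with no base_url, the page url itself is the base.
    base = base_url if base_url is not None else url
    base_netloc = urlparse(base).netloc

    absolute: set[str] = set()
    for link in re.findall(pattern or DEFAULT_HREF_PATTERN, raw_html):
        try:
            absolute.add(urljoin(url, link))  # resolve against the page url
        except ValueError:
            if continue_on_failure:
                continue  # skip the malformed link, keep processing the rest
            raise

    results = []
    for path in sorted(absolute):  # sorted only for deterministic output
        if any(path.startswith(prefix) for prefix in exclude_prefixes):
            continue
        # ASSUMPTION: "child of the base URL" means same host and a string prefix match.
        if prevent_outside and (
            urlparse(path).netloc != base_netloc or not path.startswith(base)
        ):
            continue
        results.append(path)
    return results


page = "https://example.com/docs/intro.html"
html = """
<a href="/docs/guide.html">Guide</a>
<a href="/docs/api/index.html">API</a>
<a href="/blog/post.html">Blog</a>
<a href="https://other.org/page">Other</a>
"""
print(sketch_extract_sub_links(html, page, base_url="https://example.com/docs/"))
# ['https://example.com/docs/api/index.html', 'https://example.com/docs/guide.html']
print(sketch_extract_sub_links(
    html, page,
    base_url="https://example.com/docs/",
    exclude_prefixes=["https://example.com/docs/api/"],
))
# ['https://example.com/docs/guide.html']
```

Under these assumptions, calling the sketch with only url and prevent_outside=True keeps just the links that start with the full page URL, which is why the examples pass the directory https://example.com/docs/ as base_url.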
