Function ● Since v0.1

extract_sub_links

extract_sub_links(
  raw_html: str,
  url: str,
  *,
  base_url: str | None = None,
  pattern: str | re.Pattern | None = None,
  prevent_outside: bool = True,
  exclude_prefixes: Sequence[str] = (),
  continue_on_failure: bool = False
) -> list[str]

Extract all links from a raw HTML string and convert them into absolute paths.

Parameters (* = required)

raw_html* : str
    Original HTML.

url* : str
    The URL of the HTML.

base_url : str | None, default None
    The base URL to check for outside links against.

pattern : str | re.Pattern | None, default None
    Regex to use for extracting links from raw HTML.

prevent_outside : bool, default True
    If True, ignore external links which are not children of the base URL.

exclude_prefixes : Sequence[str], default ()
    Exclude any URLs that start with one of these prefixes.

continue_on_failure : bool, default False
    If True, continue if parsing a specific link raises an exception. Otherwise, raise the exception.
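
To make the parameter descriptions concrete, here is a minimal standard-library sketch of the behavior they describe: find link targets, resolve them against `url`, drop excluded prefixes, and optionally drop links outside the base URL. It is not the library's implementation. The helper name `sketch_sub_links`, the simplified `SIMPLE_HREF` pattern, and the sorted return order are assumptions made for illustration, and the library's default pattern and edge-case handling may differ.

```python
from __future__ import annotations

import re
from collections.abc import Sequence
from urllib.parse import urljoin

# Assumption: a deliberately simple pattern that captures the value of
# href="..." or href='...', stopping at a quote or a "#" fragment.
SIMPLE_HREF = r"""href=["']([^"'#]+)"""


def sketch_sub_links(
    raw_html: str,
    url: str,
    *,
    base_url: str | None = None,
    pattern: str | re.Pattern | None = None,
    prevent_outside: bool = True,
    exclude_prefixes: Sequence[str] = (),
    continue_on_failure: bool = False,
) -> list[str]:
    """Illustrative stand-in for the documented behavior (hypothetical helper)."""
    # Outside links are checked against base_url, falling back to url.
    base = base_url if base_url is not None else url
    absolute_links = set()
    for link in set(re.findall(pattern or SIMPLE_HREF, raw_html)):
        try:
            # Resolve relative links against the page URL.
            absolute = urljoin(url, link)
        except ValueError:
            if continue_on_failure:
                continue
            raise
        if any(absolute.startswith(prefix) for prefix in exclude_prefixes):
            continue
        if prevent_outside and not absolute.startswith(base):
            continue
        absolute_links.add(absolute)
    # Sorted only to make this sketch deterministic.
    return sorted(absolute_links)


html = (
    '<a href="/docs/a">A</a> <a href="b.html">B</a> '
    '<a href="https://other.example/x">X</a> '
    '<a href="/docs/private/c">C</a> <a href="#top">top</a>'
)
links = sketch_sub_links(
    html,
    "https://example.com/docs/index.html",
    base_url="https://example.com/docs/",
    exclude_prefixes=("https://example.com/docs/private",),
)
print(links)
# ['https://example.com/docs/a', 'https://example.com/docs/b.html']
```

With langchain-core installed, the corresponding call would be `extract_sub_links(html, url, base_url=..., exclude_prefixes=...)`, imported from `langchain_core.utils.html`. The documentation above does not state an ordering for the returned list, so sorting the result before comparing it in tests is a reasonable precaution.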