Extract all links from a raw HTML string and convert them into absolute paths.
extract_sub_links(
raw_html: str,
url: str,
*,
base_url: str | None = None,
pattern: str | re.Pattern | None = None,
prevent_outside: bool = True,
exclude_prefixes: Sequence[str] = (),
continue_on_failure: bool = False
) -> list[str]

| Name | Type | Description |
|---|---|---|
| raw_html* | str | Original HTML. |
| url* | str | The URL of the HTML. |
| base_url | str \| None | Default: None. The base URL to check for outside links against. |
| pattern | str \| re.Pattern \| None | Default: None. Regex to use for extracting links from raw HTML. |
| prevent_outside | bool | Default: True. If True, ignore external links that are not children of the base URL. |
| exclude_prefixes | Sequence[str] | Default: (). Exclude any URLs that start with one of these prefixes. |
| continue_on_failure | bool | Default: False. If True, continue if parsing a specific link raises an exception; otherwise, raise the exception. |
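
To make the interaction between `url`, `base_url`, `prevent_outside`, and `exclude_prefixes` concrete, here is a minimal standard-library sketch of the kind of filtering the parameters describe. It is not the library's implementation: `sketch_extract_sub_links`, `HREF_PATTERN`, and the example URLs are hypothetical names chosen for illustration, and the real function's default regex, edge-case handling, and result ordering may differ.

```python
import re
from urllib.parse import urljoin, urlparse

# Hypothetical default pattern for this sketch; the library's own regex may differ.
HREF_PATTERN = re.compile(r'href=["\'](.*?)["\']', re.IGNORECASE)


def sketch_extract_sub_links(raw_html, url, *, base_url=None, pattern=None,
                             prevent_outside=True, exclude_prefixes=(),
                             continue_on_failure=False):
    """Illustrative stand-in that mirrors the documented parameters."""
    # Assumption: when no base_url is given, the page URL acts as the base.
    base = base_url if base_url is not None else url
    # re.compile accepts either a string or an already compiled pattern.
    regex = HREF_PATTERN if pattern is None else re.compile(pattern)

    results = []
    for href in regex.findall(raw_html):
        try:
            # Resolve relative links against the URL of the page.
            absolute = urljoin(url, href)
        except ValueError:
            if continue_on_failure:
                continue
            raise
        if absolute in results:
            continue  # skip duplicates
        if any(absolute.startswith(prefix) for prefix in exclude_prefixes):
            continue
        if prevent_outside:
            same_host = urlparse(absolute).netloc == urlparse(base).netloc
            if not same_host or not absolute.startswith(base):
                continue  # drop links that are not children of the base URL
        results.append(absolute)
    return results


html = (
    '<a href="/docs/intro">Intro</a> '
    '<a href="guide.html">Guide</a> '
    '<a href="https://other.example.org/x">External</a>'
)
links = sketch_extract_sub_links(
    html, "https://example.com/docs/", base_url="https://example.com/"
)
print(links)
# ['https://example.com/docs/intro', 'https://example.com/docs/guide.html']
```

In this sketch, the external link is dropped because `prevent_outside` defaults to `True`; passing `prevent_outside=False` would keep it, and adding a prefix to `exclude_prefixes` would remove any matching URL regardless of host.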