    Python > langchain-classic > chains > natbot > crawler > Crawler
    Class, since v1.0

    Crawler

    Crawler(
        self,
    )

    Constructors

    __init__

    Attributes

    browser: Browser
    page: Page
    page_element_buffer: dict[int, ElementInViewPort]
    client: CDPSession

    Methods

    go_to_page
    scroll
    click
    type
    enter
    crawl

    A crawler for web pages.

    Security note: this crawler drives a browser via Playwright.

    It can load arbitrary web pages, INCLUDING content from the local file
    system.

    Control who can submit crawling requests and what network access the
    crawler has.

    Scope permissions to the minimum the application needs.

    See https://docs.langchain.com/oss/python/security-policy for more information.
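
One way to act on the security note is to validate URLs before they reach the crawler. The sketch below is not part of the library: `ALLOWED_HOSTS`, `is_url_allowed`, and `safe_go_to_page` are hypothetical names, and the host list is a placeholder. It only assumes that the crawler object exposes `go_to_page` taking a URL string, as listed on this page.

```python
from urllib.parse import urlparse

ALLOWED_SCHEMES = {"http", "https"}
# Hypothetical allow-list; replace with the hosts your application needs.
ALLOWED_HOSTS = {"example.com", "docs.example.com"}


def is_url_allowed(url: str) -> bool:
    """Return True only for http(s) URLs whose host is on the allow-list."""
    parsed = urlparse(url)
    if parsed.scheme not in ALLOWED_SCHEMES:
        # Rejects file://, other browser-internal schemes, and scheme-less input.
        return False
    host = (parsed.hostname or "").lower()
    return host in ALLOWED_HOSTS


def safe_go_to_page(crawler, url: str) -> None:
    """Navigate only if the URL passes the allow-list check."""
    if not is_url_allowed(url):
        raise ValueError(f"URL not permitted: {url!r}")
    crawler.go_to_page(url)
```

A check like this does not cover redirects or requests issued by the loaded page itself, so network-level restrictions on the crawler's environment are still needed.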
    

    go_to_page: Navigate to the given URL.

    scroll: Scroll the page in the given direction.

    click: Click on an element with the given id.

    type: Type text into an element with the given id.

    enter: Press the Enter key.

    crawl: Crawl the current page.
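
The methods above might be combined as in the following sketch. The parameter names, parameter types, and the return type of `crawl` are assumptions inferred from the one-line descriptions, not confirmed signatures. The sketch also assumes that the element ids passed to `click` and `type` are the ones produced by the most recent `crawl` call. `CrawlerLike`, `RecordingCrawler`, and `search_site` are hypothetical names; the recording stand-in lets the flow run without Playwright or a browser.

```python
from typing import Protocol


class CrawlerLike(Protocol):
    # Signatures are assumptions inferred from the method descriptions.
    def go_to_page(self, url: str) -> None: ...
    def scroll(self, direction: str) -> None: ...
    def click(self, id: int) -> None: ...
    def type(self, id: int, text: str) -> None: ...
    def enter(self) -> None: ...
    def crawl(self) -> list[str]: ...


class RecordingCrawler:
    """Hypothetical stand-in that records calls instead of driving a browser."""

    def __init__(self) -> None:
        self.calls: list[tuple] = []

    def go_to_page(self, url: str) -> None:
        self.calls.append(("go_to_page", url))

    def scroll(self, direction: str) -> None:
        self.calls.append(("scroll", direction))

    def click(self, id: int) -> None:
        self.calls.append(("click", id))

    def type(self, id: int, text: str) -> None:
        self.calls.append(("type", id, text))

    def enter(self) -> None:
        self.calls.append(("enter",))

    def crawl(self) -> list[str]:
        self.calls.append(("crawl",))
        return ["<input id=0 />"]  # placeholder element listing


def search_site(crawler: CrawlerLike, url: str, box_id: int, query: str) -> list[str]:
    """Open a page, type a query into one element, submit, and re-crawl."""
    crawler.go_to_page(url)
    crawler.crawl()  # assumed to refresh element ids before interacting
    crawler.type(box_id, query)
    crawler.enter()
    return crawler.crawl()


fake = RecordingCrawler()
elements = search_site(fake, "https://example.com", 0, "langchain")
```

Writing the flow against a protocol rather than the concrete class keeps it testable; passing a real `Crawler` instead of the stand-in would require Playwright and a launched browser.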