Converts HTML documents to Markdown using the markdownify library, with configurable handling of links, images, other tags, and heading styles.
Example:
.. code-block:: python

    from langchain_community.document_transformers import MarkdownifyTransformer

    markdownify = MarkdownifyTransformer()
    docs_transform = markdownify.transform_documents(docs)
More configuration options can be found at the markdownify GitHub page: https://github.com/matthewwithanm/python-markdownify
``convert``: A list of tags to convert. This option can't be used with the ``strip`` option.
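To make the relationship between the two tag filters concrete, here is a minimal standard-library sketch of the selection rule. The helper name ``should_convert_tag`` and its signature are hypothetical and are not part of the transformer's API. The sketch assumes that ``strip`` acts as a deny-list, that ``convert`` acts as an allow-list, and that supplying both is an error.

```python
from typing import Optional, Sequence


def should_convert_tag(
    tag: str,
    strip: Optional[Sequence[str]] = None,
    convert: Optional[Sequence[str]] = None,
) -> bool:
    """Illustrative helper (hypothetical): decide whether a tag becomes Markdown."""
    # The two filters are mutually exclusive, so reject a call that sets both.
    if strip is not None and convert is not None:
        raise ValueError("Specify either strip or convert, not both.")
    tag = tag.lower()
    if strip is not None:
        # Deny-list: every tag except the listed ones is converted.
        return tag not in strip
    if convert is not None:
        # Allow-list: only the listed tags are converted.
        return tag in convert
    # No filter given: every tag is converted.
    return True
```

Under these assumptions, ``should_convert_tag("a", strip=["a"])`` is ``False`` while ``should_convert_tag("a", convert=["a"])`` is ``True``.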
``autolinks``: A boolean indicating whether the "automatic link" style should be used when an ``a`` tag's contents match its ``href``. Defaults to True.
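The difference between the two link forms can be sketched with a few lines of plain Python. The function ``render_link`` is a hypothetical stand-in written for this illustration, not markdownify's implementation, and it deliberately ignores details such as link titles and character escaping.

```python
def render_link(text: str, href: str, autolinks: bool = True) -> str:
    """Illustrative helper (hypothetical): render an anchor as Markdown."""
    if autolinks and text == href:
        # The contents equal the target, so the short automatic-link form suffices.
        return f"<{href}>"
    # Otherwise fall back to the inline form with explicit link text.
    return f"[{text}]({href})"


print(render_link("https://example.com", "https://example.com"))
# <https://example.com>
print(render_link("Example", "https://example.com"))
# [Example](https://example.com)
print(render_link("https://example.com", "https://example.com", autolinks=False))
# [https://example.com](https://example.com)
```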
``heading_style``: Defines how headings should be converted. Accepted values are ATX, ATX_CLOSED, SETEXT, and UNDERLINED (which is an alias for SETEXT). Defaults to ATX.
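As a rough picture of what each heading style produces, the following standard-library sketch renders one heading per style. The function ``render_heading`` is hypothetical and only mimics the shape of the output. Because Setext syntax defines only two heading levels, the sketch assumes that deeper levels fall back to ATX.

```python
def render_heading(text: str, level: int, heading_style: str = "ATX") -> str:
    """Illustrative helper (hypothetical): render a heading in a given style."""
    style = heading_style.upper()
    if style == "UNDERLINED":
        # UNDERLINED is treated as an alias for SETEXT.
        style = "SETEXT"
    if style not in {"ATX", "ATX_CLOSED", "SETEXT"}:
        raise ValueError(f"Unknown heading style: {heading_style}")
    if style == "SETEXT" and level <= 2:
        # Setext headings underline the text: '=' for level 1, '-' for level 2.
        underline = ("=" if level == 1 else "-") * len(text)
        return f"{text}\n{underline}"
    hashes = "#" * level
    if style == "ATX_CLOSED":
        # Closed ATX repeats the hashes after the heading text.
        return f"{hashes} {text} {hashes}"
    # Plain ATX, also used here for Setext levels 3 and deeper (an assumption).
    return f"{hashes} {text}"


for style in ("ATX", "ATX_CLOSED", "SETEXT"):
    print(render_heading("Title", 2, style))
    print()
```

For a level-2 heading this prints ``## Title``, ``## Title ##``, and ``Title`` underlined with five hyphens, in that order.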
``**kwargs``: Additional options to pass to markdownify.