An experimental text splitter for handling Markdown syntax.
This splitter aims to retain the exact whitespace of the original text while
extracting structured metadata, such as headers. It is a re-implementation of the
MarkdownHeaderTextSplitter with notable changes to the approach and additional
features.
Key Features:

- Splits text on horizontal rules (`---`) as well.
- Default splitting behavior can be overridden using the `headers_to_split_on` parameter.

```python
ExperimentalMarkdownSyntaxTextSplitter(
    self,
    headers_to_split_on: list[tuple[str, str]] | None = None,
    return_each_line: bool = False,
    strip_headers: bool = True
)
```

Example:
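```python
from langchain_text_splitters import ExperimentalMarkdownSyntaxTextSplitter

# Sample Markdown input (illustrative)
text = "# Title\n\nIntro paragraph.\n\n## Section\n\nSection body.\n"

# Map each Markdown header marker to the metadata key it populates
headers_to_split_on = [
    ("#", "Header 1"),
    ("##", "Header 2"),
]
splitter = ExperimentalMarkdownSyntaxTextSplitter(
    headers_to_split_on=headers_to_split_on
)
chunks = splitter.split_text(text)
for chunk in chunks:
    print(chunk)
```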
This class is currently experimental and subject to change based on feedback and further development.

| Name | Type | Default | Description |
|---|---|---|---|
| `headers_to_split_on` | `list[tuple[str, str]] \| None` | `None` | A list of tuples, where each tuple contains a Markdown header marker (e.g., `"#"`) and its corresponding metadata key. If `None`, the default headers are used. |
| `return_each_line` | `bool` | `False` | Whether to return each line as an individual chunk. |
| `strip_headers` | `bool` | `True` | Whether to exclude headers from the resulting chunks. |
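
To make the roles of `headers_to_split_on` and `strip_headers` more concrete, here is a minimal standard-library sketch of header-based splitting that keeps the original whitespace. It is not the library's implementation: the function `split_on_headers` and the plain-dict chunk format are hypothetical, chosen only for illustration, and the sketch ignores code blocks and horizontal rules (so a `#` comment inside a fenced block would be misread as a header).

```python
import re

# Hypothetical helper, not part of any library: illustrates the idea only.
def split_on_headers(text, headers_to_split_on, strip_headers=True):
    """Split Markdown on ATX headers, leaving all other whitespace untouched."""
    keys = dict(headers_to_split_on)  # e.g. {"#": "Header 1", "##": "Header 2"}
    chunks = []
    metadata = {}  # header metadata active for the current chunk
    buffer = []    # raw lines (with line endings) of the current chunk

    def flush():
        content = "".join(buffer)
        if content:
            chunks.append({"content": content, "metadata": dict(metadata)})
        buffer.clear()

    # keepends=True preserves each line's original line ending
    for line in text.splitlines(keepends=True):
        match = re.match(r"^(#{1,6})[ \t]+(.*?)\s*$", line)
        if match and match.group(1) in keys:
            flush()
            level = len(match.group(1))
            # A new header resets metadata for headers at the same or deeper level
            for marker, key in keys.items():
                if len(marker) >= level:
                    metadata.pop(key, None)
            metadata[keys[match.group(1)]] = match.group(2)
            if not strip_headers:
                buffer.append(line)
        else:
            buffer.append(line)
    flush()
    return chunks


sample = "# Intro\nHello  world\n\n## Details\n  indented line\n"
for chunk in split_on_headers(sample, [("#", "Header 1"), ("##", "Header 2")]):
    print(repr(chunk["content"]), chunk["metadata"])
```

With `strip_headers=False`, the header lines stay in the chunk text, so concatenating the chunks of this sketch reproduces the input exactly.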