Class ● Since v0.0

SpacyTextSplitter

SpacyTextSplitter(
  self,
  separator: str = '\n\n',
  pipeline: str = 'en_core_web_sm',
  max_length:

Bases

TextSplitter

Constructors

Methods

Inherited from TextSplitter

Methods

create_documents

—

Create a list of Document objects from a list of texts.

split_documents

—

Split documents.

from_huggingface_tokenizer

—

Text splitter that uses Hugging Face tokenizer to count length.

—

Text splitter that uses tiktoken encoder to count length.

Name	Type
separator	str
pipeline	str
max_length	int
strip_whitespace	bool

Splits text using the spaCy package.

By default, spaCy's en_core_web_sm model is used, and its default max_length is 1000000 (the maximum number of characters the model accepts; it can be increased for large files). For faster but potentially less accurate splitting, you can use pipeline='sentencizer'.
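The following is a minimal sketch of the faster pipeline='sentencizer' option described above. It assumes the class is importable from the langchain_text_splitters package and that spacy is installed; the helper name split_into_chunks, the sample text, and the chunk_size and chunk_overlap values are illustrative and not taken from this page. The helper returns None when the optional dependencies are missing, so the snippet still runs in a bare environment.

```python
from typing import List, Optional


def split_into_chunks(text: str, chunk_size: int = 80) -> Optional[List[str]]:
    """Split text into sentence-based chunks (hypothetical helper).

    Returns None if langchain-text-splitters or spacy is not installed.
    """
    try:
        # Assumption: the class lives in the langchain_text_splitters package.
        from langchain_text_splitters import SpacyTextSplitter

        # 'sentencizer' is rule-based, so no model download is needed.
        # chunk_size and chunk_overlap are forwarded to the TextSplitter base.
        splitter = SpacyTextSplitter(
            pipeline="sentencizer",
            chunk_size=chunk_size,
            chunk_overlap=0,
        )
    except ImportError:
        return None

    # Sentences are detected by spaCy, then merged into chunks that are
    # joined with the separator (default '\n\n').
    return splitter.split_text(text)


sample = (
    "spaCy detects sentence boundaries. "
    "The splitter then groups sentences into chunks. "
    "Each chunk stays near the requested size."
)

chunks = split_into_chunks(sample)
if chunks is None:
    print("Install langchain-text-splitters and spacy to run this example.")
else:
    for chunk in chunks:
        print(repr(chunk))
```

To use the default model instead, pass pipeline='en_core_web_sm' (the model must be downloaded separately) and raise max_length if the input exceeds 1000000 characters.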
