NLTKTextSplitter(self, separator: str = '\n\n', language: str = 'english', *, use_span_tokenize
TextSplitter
Create a list of Document objects from a list of json objects (dict).
Split documents.
Text splitter that uses Hugging Face tokenizer to count length.
Text splitter that uses tiktoken encoder to count length.
Transform sequence of documents by splitting them.
Splits text using the NLTK package.
separator (str, default '\n\n'): The separator to use when combining splits.
language (str, default 'english'): The language to use.
use_span_tokenize (bool, default False): Whether to use span_tokenize instead of sent_tokenize.
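The difference between the two tokenization modes can be illustrated with a minimal plain-Python sketch. It does not call NLTK or this class: a simple regex stands in for NLTK's sentence tokenizer, and the names sentence_spans, split_sentences, and split_with_spans are hypothetical helpers invented for illustration. The idea it shows is that a tokenizer returning sentence strings (as sent_tokenize does) loses the whitespace between sentences, so a separator is needed to recombine them, whereas a tokenizer returning character offsets (as span_tokenize does) lets the original spacing be carried along with each split.

```python
import re

# Illustrative stand-in for a sentence tokenizer: a run of text ending in
# '.', '!' or '?'. NLTK's Punkt tokenizer is far more robust than this regex.
_SENTENCE = re.compile(r"[^.!?\s][^.!?]*[.!?]+")


def sentence_spans(text: str) -> list[tuple[int, int]]:
    """Return (start, end) character offsets for each sentence."""
    return [m.span() for m in _SENTENCE.finditer(text)]


def split_sentences(text: str) -> list[str]:
    """Return sentences as strings; whitespace between them is discarded."""
    return [text[start:end] for start, end in sentence_spans(text)]


def split_with_spans(text: str) -> list[str]:
    """Return sentences, each keeping the whitespace that preceded it."""
    pieces = []
    prev_end = None
    for start, end in sentence_spans(text):
        # The first piece starts at the sentence itself; later pieces start
        # where the previous sentence ended, so the gap is preserved.
        begin = start if prev_end is None else prev_end
        pieces.append(text[begin:end])
        prev_end = end
    return pieces


text = "Hello there.  How are you?\nFine."

# String-based splits need a separator to be recombined.
recombined = "\n\n".join(split_sentences(text))

# Span-based splits already carry the original spacing, so they are
# joined with an empty separator to reproduce the input.
restored = "".join(split_with_spans(text))

print(repr(recombined))
print(repr(restored))
```

Under these assumptions, an empty separator is the natural pairing for span-based splitting, since any other separator would add characters on top of the preserved whitespace. How the class itself combines splits and whether it restricts the separator in that mode should be checked against the library's own documentation.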