Text splitter that uses a tiktoken encoder to measure text length in tokens.
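
As a rough illustration of measuring length with a tiktoken encoder, the sketch below packs words into chunks whose size is given by a pluggable length function. The helper names `tiktoken_length` and `split_by_length`, the `cl100k_base` encoding, and the greedy word-packing strategy are assumptions made for this example; they are not taken from the splitter's actual implementation.

```python
from typing import Callable, List


def tiktoken_length(text: str, encoding_name: str = "cl100k_base") -> int:
    """Count tokens with tiktoken (hypothetical helper; encoding name is assumed)."""
    import tiktoken  # imported lazily so the rest of the sketch works without it

    encoding = tiktoken.get_encoding(encoding_name)
    return len(encoding.encode(text))


def split_by_length(
    text: str, chunk_size: int, length_fn: Callable[[str], int] = len
) -> List[str]:
    """Greedily pack whitespace-separated words into chunks.

    A chunk is closed as soon as adding the next word would push
    length_fn(chunk) above chunk_size. A single word that is itself
    longer than chunk_size is kept as its own oversized chunk.
    """
    chunks: List[str] = []
    current = ""
    for word in text.split():
        candidate = f"{current} {word}" if current else word
        if current and length_fn(candidate) > chunk_size:
            chunks.append(current)
            current = word
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Passing `length_fn=tiktoken_length` makes `chunk_size` a token budget rather than a character budget, which is the point of counting with the encoder; with the default `len`, the same function splits by characters.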
Splits text using the spaCy package.
By default, spaCy's en_core_web_sm model is used, and
its default max_length is 1000000 (the maximum number of characters
the model accepts, which can be increased for large files). For faster but
potentially less accurate splitting, you can use pipeline='sentencizer'.
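
The two options described above could be sketched roughly as follows, assuming spaCy 3.x. The function name `split_sentences` and its parameters are hypothetical and only mirror the wording of this description; loading `en_core_web_sm` additionally assumes that model has been downloaded.

```python
from typing import List


def split_sentences(
    text: str, pipeline: str = "en_core_web_sm", max_length: int = 1_000_000
) -> List[str]:
    """Split text into sentences with spaCy (illustrative sketch, spaCy 3.x)."""
    import spacy  # imported lazily; requires the spaCy package

    if pipeline == "sentencizer":
        # Rule-based sentence boundaries on a blank English pipeline:
        # faster, but potentially less accurate than a trained model.
        nlp = spacy.blank("en")
        nlp.add_pipe("sentencizer")
    else:
        # Trained pipeline; the named model must be installed separately.
        nlp = spacy.load(pipeline)

    # spaCy raises a ValueError for texts longer than nlp.max_length
    # characters, so raise this limit when processing large files.
    nlp.max_length = max_length

    return [sent.text.strip() for sent in nlp(text).sents]
```

For a large file, a call such as `split_sentences(text, pipeline="sentencizer", max_length=5_000_000)` would raise the character limit while using the cheaper rule-based splitter.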