TokenTextSplitter

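TokenTextSplitter(
  self,
  encoding_name: str = 'gpt2',
  model_name: str | None = None,
  allowed_special: Literal['all'] | AbstractSet[str] | None = None,
  disallowed_special: Literal['all'] | Collection[str] = 'all',
  **kwargs: Any,
)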

Bases

TextSplitter

`create_documents`: Create a list of Document objects from a list of texts.

`split_documents`: Split documents.

`from_huggingface_tokenizer`: Create a text splitter that uses a Hugging Face tokenizer to count length.

| Name | Type | Description |
| --- | --- | --- |
| `encoding_name` | `str` | Default: `'gpt2'`. The name of the tiktoken encoding to use. |
| `model_name` | `str \| None` | Default: `None`. The name of the model to use. If provided, it overrides `encoding_name`. |
| `allowed_special` | `Literal['all'] \| AbstractSet[str] \| None` | Default: `None`. Special tokens that are permitted in the input text. |
| `disallowed_special` | `Literal['all'] \| Collection[str]` | Default: `'all'`. Special tokens that cause an error if they appear in the input text. |
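
The precedence between `encoding_name` and `model_name` can be illustrated with a small standalone sketch. It does not call the library: `MODEL_TO_ENCODING` and `resolve_encoding_name` are hypothetical names introduced here, and the two-entry mapping merely stands in for the model lookup that tiktoken performs (`tiktoken.encoding_for_model`).

```python
from typing import Optional

# Hypothetical stand-in for tiktoken's model-to-encoding lookup.
# The real mapping lives inside tiktoken and covers many more models.
MODEL_TO_ENCODING = {
    "gpt-4": "cl100k_base",
    "gpt-3.5-turbo": "cl100k_base",
}


def resolve_encoding_name(
    encoding_name: str = "gpt2",
    model_name: Optional[str] = None,
) -> str:
    """Return the encoding to use; a given model_name overrides encoding_name."""
    if model_name is not None:
        # The model decides the encoding, so encoding_name is ignored.
        return MODEL_TO_ENCODING[model_name]
    return encoding_name


print(resolve_encoding_name())                    # falls back to the default
print(resolve_encoding_name(model_name="gpt-4"))  # the model wins over the default
```

With the class itself, the two cases would correspond to `TokenTextSplitter(encoding_name="cl100k_base")` and `TokenTextSplitter(model_name="gpt-4")`.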

| Name | Type |
| --- | --- |
| `encoding_name` | `str` |
| `model_name` | `str \| None` |
| `allowed_special` | `Literal['all'] \| AbstractSet[str] \| None` |
| `disallowed_special` | `Literal['all'] \| Collection[str]` |
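
How `allowed_special` and `disallowed_special` interact is easiest to see in a toy check that mirrors the way tiktoken's `encode` treats the two arguments. This is a minimal sketch, not the library's code: `SPECIAL_TOKENS` and `check_special_tokens` are hypothetical, and treating `None` as "no special tokens allowed" is an assumption made for the illustration.

```python
from typing import AbstractSet, Collection, Literal, Union

# Hypothetical special-token vocabulary; a real encoding defines its own set.
SPECIAL_TOKENS = frozenset({"<|endoftext|>"})


def check_special_tokens(
    text: str,
    allowed_special: Union[Literal["all"], AbstractSet[str], None] = None,
    disallowed_special: Union[Literal["all"], Collection[str]] = "all",
) -> None:
    """Raise ValueError if text contains a disallowed special token."""
    # Assumption: None means that no special tokens are allowed.
    if allowed_special == "all":
        allowed = SPECIAL_TOKENS
    else:
        allowed = frozenset(allowed_special or ())

    # 'all' disallows every special token that was not explicitly allowed.
    if disallowed_special == "all":
        disallowed = SPECIAL_TOKENS - allowed
    else:
        disallowed = frozenset(disallowed_special)

    for token in disallowed:
        if token in text:
            raise ValueError(f"disallowed special token in text: {token!r}")


check_special_tokens("plain text")  # passes: no special tokens present
check_special_tokens("a <|endoftext|> b", allowed_special={"<|endoftext|>"})  # passes

try:
    check_special_tokens("a <|endoftext|> b")  # defaults reject the token
except ValueError as err:
    print(err)
```

The default `disallowed_special='all'` therefore acts as a guard: text that happens to contain a special-token string is rejected unless that token is listed in `allowed_special`.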