Text splitter that uses a tiktoken encoder to count length.
| Name | Type | Description |
|---|---|---|
| encoding_name | str | Default: 'gpt2'. The name of the tiktoken encoding to use. |
| model_name | str \| None | Default: None. The name of the model to use. If provided, this will override the |
| allowed_special | Literal['all'] \| AbstractSet[str] | Default: set() |
| disallowed_special | Literal['all'] \| Collection[str] | Default: 'all' |
Splits text into tokens using the model tokenizer.
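
As a rough illustration of token-based splitting, the sketch below encodes the text, slices the token ids into windows, and decodes each window back to a string. It is not the library's implementation: `split_by_tokens`, `chunk_size`, `chunk_overlap`, `toy_encode`, and `toy_decode` are hypothetical names, and the toy word-level tokenizer only stands in for a real tiktoken encoding so that the example runs without extra dependencies.

```python
from typing import Callable, List


def split_by_tokens(
    text: str,
    encode: Callable[[str], List[int]],
    decode: Callable[[List[int]], str],
    chunk_size: int,
    chunk_overlap: int = 0,
) -> List[str]:
    """Split text into chunks of at most chunk_size tokens (hypothetical helper)."""
    if chunk_size <= 0 or not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= chunk_overlap < chunk_size")
    ids = encode(text)
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks: List[str] = []
    for start in range(0, len(ids), step):
        chunks.append(decode(ids[start:start + chunk_size]))
        if start + chunk_size >= len(ids):
            break  # this window already reached the end of the text
    return chunks


# Toy stand-in for a tiktoken encoding (assumption): one token per word.
vocab: List[str] = []


def toy_encode(text: str) -> List[int]:
    ids = []
    for word in text.split():
        if word not in vocab:
            vocab.append(word)
        ids.append(vocab.index(word))
    return ids


def toy_decode(ids: List[int]) -> str:
    return " ".join(vocab[i] for i in ids)


chunks = split_by_tokens("a b c d e", toy_encode, toy_decode, chunk_size=2, chunk_overlap=1)
print(chunks)
```

With a real encoding, the `encode` and `decode` callables would come from the tokenizer selected by `encoding_name` or `model_name`; lengths are then counted in tokens rather than characters.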
Special tokens that are allowed during encoding.
Special tokens that are disallowed during encoding.
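
The interplay of `allowed_special` and `disallowed_special` can be sketched in plain Python. This is a simplified model of the check a tiktoken encoder is understood to perform before encoding, not the library's code: `check_special` and `SPECIAL_TOKENS` are hypothetical, and a real encoding defines its own special-token set.

```python
from typing import AbstractSet, Collection, Literal, Union

# Hypothetical special-token vocabulary, for illustration only.
SPECIAL_TOKENS = frozenset({"<|endoftext|>", "<|pad|>"})


def check_special(
    text: str,
    allowed_special: Union[Literal["all"], AbstractSet[str]] = frozenset(),
    disallowed_special: Union[Literal["all"], Collection[str]] = "all",
) -> None:
    """Raise ValueError if text contains a disallowed special token."""
    if allowed_special == "all":
        allowed_special = SPECIAL_TOKENS
    if disallowed_special == "all":
        # "all" means every special token that was not explicitly allowed.
        disallowed_special = SPECIAL_TOKENS - set(allowed_special)
    for token in disallowed_special:
        if token in text:
            raise ValueError(f"disallowed special token in text: {token!r}")


check_special("plain text")  # no special tokens, passes
check_special("a <|endoftext|> b", allowed_special={"<|endoftext|>"})  # allowed, passes

try:
    check_special("a <|endoftext|> b")  # defaults: nothing allowed, all disallowed
    raised = False
except ValueError:
    raised = True
print(raised)
```

Under the defaults shown in the table (`set()` and `'all'`), any special token appearing in the input is treated as an error rather than silently encoded, which is the conservative choice for untrusted text.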