Text splitter that uses a tiktoken encoder to measure text length in tokens.

    from_tiktoken_encoder(
        cls,
        encoding_name: str = 'gpt2',
        model_name: str | None = None,
        allowed_special: Literal['all'] | AbstractSet[str] = set(),
        disallowed_special: Literal['all'] | Collection[str] = 'all',
        **kwargs: Any = {}
    ) -> Self

| Name | Type | Description |
|---|---|---|
| `encoding_name` | `str` | Default: `'gpt2'`. The name of the tiktoken encoding to use. |
| `model_name` | `str \| None` | Default: `None`. The name of the model to use. If provided, this will override the |
| `allowed_special` | `Literal['all'] \| AbstractSet[str]` | Default: `set()`. Special tokens that are allowed during encoding. |
| `disallowed_special` | `Literal['all'] \| Collection[str]` | Default: `'all'`. Special tokens that are disallowed during encoding. |
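
To make the roles of these parameters concrete, here is a minimal, library-free sketch of the pattern: a classmethod builds a token-counting length function and hands it to the splitter. Everything in it is an assumption made for illustration. `toy_encode`, `ToySplitter`, `from_toy_encoder`, and the whitespace tokenizer are hypothetical stand-ins, not tiktoken or the actual splitter implementation. The special-token handling is modeled on how tiktoken's `encode` treats `allowed_special` and `disallowed_special`.

```python
from typing import AbstractSet, Any, Callable, Collection, List, Literal, Union

# Hypothetical stand-in for an encoding's set of special tokens.
SPECIAL_TOKENS = frozenset({"<|endoftext|>"})


def toy_encode(
    text: str,
    allowed_special: Union[Literal["all"], AbstractSet[str]] = frozenset(),
    disallowed_special: Union[Literal["all"], Collection[str]] = "all",
) -> List[str]:
    """Toy encoder: one token per whitespace-separated word (not real BPE)."""
    if allowed_special == "all":
        allowed_special = SPECIAL_TOKENS
    if disallowed_special == "all":
        # 'all' means every special token that was not explicitly allowed.
        disallowed_special = SPECIAL_TOKENS - set(allowed_special)
    for token in disallowed_special:
        if token in text:
            raise ValueError(f"Disallowed special token in text: {token!r}")
    return text.split()


class ToySplitter:
    """Hypothetical splitter that packs words into chunks of bounded length."""

    def __init__(self, chunk_size: int, length_function: Callable[[str], int]) -> None:
        self.chunk_size = chunk_size
        self.length_function = length_function

    @classmethod
    def from_toy_encoder(
        cls,
        allowed_special: Union[Literal["all"], AbstractSet[str]] = frozenset(),
        disallowed_special: Union[Literal["all"], Collection[str]] = "all",
        **kwargs: Any,
    ) -> "ToySplitter":
        # Length is measured in tokens, not characters.
        def token_length(text: str) -> int:
            return len(toy_encode(text, allowed_special, disallowed_special))

        # Remaining keyword arguments (e.g. chunk_size) go to the constructor.
        return cls(length_function=token_length, **kwargs)

    def split_text(self, text: str) -> List[str]:
        chunks: List[str] = []
        current: List[str] = []
        for word in text.split():
            candidate = " ".join(current + [word])
            if current and self.length_function(candidate) > self.chunk_size:
                chunks.append(" ".join(current))
                current = [word]
            else:
                current.append(word)
        if current:
            chunks.append(" ".join(current))
        return chunks


splitter = ToySplitter.from_toy_encoder(chunk_size=3)
chunks = splitter.split_text("one two three four five six seven")
print(chunks)  # ['one two three', 'four five six', 'seven']
```

In this sketch, text containing `<|endoftext|>` raises a `ValueError` under the default `disallowed_special='all'`, and is counted like any other token once it is listed in `allowed_special`.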