Tokenizer data class.
Tokenizer(
    self,
    chunk_overlap: int,
    tokens_per_chunk: int,
    decode: Callable[[list[int]], str],
    encode: Callable[[str], list[int]]
)
chunk_overlap: Overlap, in tokens, between consecutive chunks.
tokens_per_chunk: Maximum number of tokens per chunk.
decode: Function that decodes a list of token IDs to a string.
encode: Function that encodes a string to a list of token IDs.
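
To illustrate how the four fields might work together, here is a minimal sketch. It does not reproduce the library's own code: the `Tokenizer` class below is a stand-in dataclass that only mirrors the documented fields, `split_on_tokens` is a hypothetical helper written for this example, and the character-level `encode`/`decode` pair is a toy codec chosen so the output is easy to follow.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Tokenizer:
    """Stand-in that mirrors the four documented fields (not the library class)."""

    chunk_overlap: int
    tokens_per_chunk: int
    decode: Callable[[list[int]], str]
    encode: Callable[[str], list[int]]


def split_on_tokens(text: str, tokenizer: Tokenizer) -> list[str]:
    """Hypothetical helper: slide a token window over the text.

    Each window holds at most tokens_per_chunk tokens and shares
    chunk_overlap tokens with the previous window.
    """
    if tokenizer.chunk_overlap >= tokenizer.tokens_per_chunk:
        raise ValueError("chunk_overlap must be smaller than tokens_per_chunk")

    ids = tokenizer.encode(text)
    step = tokenizer.tokens_per_chunk - tokenizer.chunk_overlap
    chunks: list[str] = []
    for start in range(0, len(ids), step):
        window = ids[start:start + tokenizer.tokens_per_chunk]
        chunks.append(tokenizer.decode(window))
        # Stop once the current window already reaches the end of the input.
        if start + tokenizer.tokens_per_chunk >= len(ids):
            break
    return chunks


# Toy character-level codec (an assumption for illustration): one token per character.
toy = Tokenizer(
    chunk_overlap=2,
    tokens_per_chunk=5,
    decode=lambda ids: "".join(chr(i) for i in ids),
    encode=lambda s: [ord(c) for c in s],
)

chunks = split_on_tokens("abcdefghij", toy)
print(chunks)  # ['abcde', 'defgh', 'ghij']
```

With `tokens_per_chunk=5` and `chunk_overlap=2`, the window advances three tokens at a time, so each chunk repeats the last two tokens of the one before it. In practice `encode` and `decode` would typically come from a real tokenizer rather than `ord`/`chr`.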