
Tokenizer refactor #12

@rayendito

Description

Although they work alright (hopefully, lol), the tokenizers are still somewhat rudimentary. All of them rely on brute-force solutions (for example, greedy encoding in the code-point-level BPE). Also, maybe clean up the codebase a little, especially the fallback-to-unknown-char cases.
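As a rough illustration of one possible direction (a sketch, not the repository's code), code-point-level BPE is commonly encoded by applying the learned merges in rank order: start from one symbol per character and repeatedly merge the adjacent pair that was learned earliest. The example below also routes every out-of-vocabulary symbol through a single fallback. The names `bpe_merge`, `encode`, `ranks`, `vocab`, and the `<unk>` token are hypothetical and only stand in for whatever the project actually uses.

```python
from typing import Dict, List, Tuple

UNK = "<unk>"  # hypothetical unknown-token symbol


def bpe_merge(word: str, ranks: Dict[Tuple[str, str], int]) -> List[str]:
    """Split `word` into code points, then repeatedly merge the adjacent
    pair with the lowest rank (i.e. the merge that was learned first)."""
    symbols = list(word)  # code-point-level start: one symbol per character
    while len(symbols) > 1:
        best_rank, best_i = None, -1
        for i in range(len(symbols) - 1):
            r = ranks.get((symbols[i], symbols[i + 1]))
            if r is not None and (best_rank is None or r < best_rank):
                best_rank, best_i = r, i
        if best_rank is None:
            break  # no learned merge applies any more
        # replace the pair at best_i with its merged symbol
        symbols[best_i:best_i + 2] = [symbols[best_i] + symbols[best_i + 1]]
    return symbols


def encode(word: str, ranks: Dict[Tuple[str, str], int],
           vocab: Dict[str, int]) -> List[int]:
    # one centralized fallback: any symbol missing from vocab maps to UNK
    unk_id = vocab[UNK]
    return [vocab.get(s, unk_id) for s in bpe_merge(word, ranks)]


if __name__ == "__main__":
    ranks = {("l", "o"): 0, ("lo", "w"): 1, ("e", "r"): 2}
    vocab = {UNK: 0, "low": 1, "er": 2}
    print(encode("lower", ranks, vocab))  # [1, 2]
    print(encode("lowz", ranks, vocab))   # [1, 0]: "z" falls back to <unk>
```

Keeping the unknown-token handling in one place (the `vocab.get(..., unk_id)` call) is what would make the fallback cases easier to clean up; the quadratic pair scan is kept for readability and could later be replaced with a priority queue if encoding speed matters.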

Metadata

Assignees

Labels

enhancement: New feature or request

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests