feat: add unk_token property to Unigram model #1943

Open
ArthurZucker wants to merge 2 commits into main from feature/unigram-unk-token

@ArthurZucker
Collaborator

Summary

  • Adds getter/setter for unk_token on the Unigram model
  • Allows users to retrieve and modify the unknown token by its string value (in addition to the existing unk_id)

Changes

  • Add get_unk_id(), get_unk_token(), set_unk_token() methods to Rust Unigram model
  • Add Python bindings for unk_token and unk_id properties
  • Add Python type hints for new properties
  • Add comprehensive tests for Rust and Python

Example usage

from tokenizers.models import Unigram

vocab = [("<unk>", 0.0), ("hello", -1.0), ("world", -1.5)]
model = Unigram(vocab, unk_id=0)

# Get unk_token
print(model.unk_token)  # "<unk>"
print(model.unk_id)     # 0

# Set unk_token to a different token in vocab
model.unk_token = "hello"
print(model.unk_token)  # "hello"
print(model.unk_id)     # 1

Test plan

  • Rust tests for get_unk_id, get_unk_token, set_unk_token methods
  • Python tests for unk_token property getter
  • Python tests for unk_token property setter
  • Python tests for error handling (non-existent token, None value)
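
A minimal sketch of the behaviour the test plan describes, written against a hypothetical pure-Python stand-in (`ToyUnigram`) rather than the real `tokenizers.models.Unigram`, so it runs without this branch installed. The use of `ValueError`, the rejection of `None`, and the idea that a failed assignment leaves `unk_id` unchanged are assumptions; the PR only states that these cases are tested.

```python
from typing import List, Optional, Tuple


class ToyUnigram:
    """Hypothetical stand-in for the Unigram model (not the tokenizers class)."""

    def __init__(self, vocab: List[Tuple[str, float]], unk_id: Optional[int] = None):
        if unk_id is not None and not 0 <= unk_id < len(vocab):
            raise ValueError(f"unk_id {unk_id} is out of range for the vocabulary")
        self.vocab = list(vocab)
        self.unk_id = unk_id

    @property
    def unk_token(self) -> Optional[str]:
        # The string is derived from unk_id, so the two values cannot drift apart.
        return None if self.unk_id is None else self.vocab[self.unk_id][0]

    @unk_token.setter
    def unk_token(self, token: Optional[str]) -> None:
        # Assumption: None is rejected instead of clearing the unknown token.
        if token is None:
            raise ValueError("unk_token must be a string")
        for index, (piece, _score) in enumerate(self.vocab):
            if piece == token:
                self.unk_id = index
                return
        # Assumption: a token missing from the vocabulary is an error, not an insert.
        raise ValueError(f"token {token!r} is not in the vocabulary")


def check_unk_token_behaviour() -> None:
    vocab = [("<unk>", 0.0), ("hello", -1.0), ("world", -1.5)]
    model = ToyUnigram(vocab, unk_id=0)

    # Getter: the token at index unk_id.
    assert model.unk_token == "<unk>"

    # Setter: unk_id follows the token's position in the vocabulary.
    model.unk_token = "hello"
    assert (model.unk_token, model.unk_id) == ("hello", 1)

    # Error handling: a non-existent token and None are both rejected.
    for bad in ("missing", None):
        try:
            model.unk_token = bad
        except ValueError:
            pass
        else:
            raise AssertionError(f"expected ValueError for {bad!r}")
        # A failed assignment leaves the previous unknown token in place.
        assert model.unk_id == 1


if __name__ == "__main__":
    check_unk_token_behaviour()
```

Deriving `unk_token` from `unk_id` on every read, instead of storing both, is one way to keep the two properties consistent; whether the Rust implementation does the same is not stated here.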

🤖 Generated with Claude Code

Adds getter/setter for unk_token on the Unigram model, allowing users
to retrieve and modify the unknown token by its string value (in addition
to the existing unk_id).

- Add get_unk_id(), get_unk_token(), set_unk_token() methods to Rust Unigram
- Add Python bindings for unk_token and unk_id properties
- Add Python type hints for new properties
- Add comprehensive tests for Rust and Python
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
