deberta-v3 model vocab_size is 100 larger than its tokenizer's vocabulary
System Info
- transformers version: 4.22.1
- Platform: macOS-12.6-arm64-arm-64bit
- Python version: 3.8.13
- Huggingface_hub version: 0.10.0
- PyTorch version (GPU?): 1.12.1 (False)
- Tensorflow version (GPU?): 2.8.1 (False)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
Who can help?
Hi @LysandreJik @SaulLu, I think this issue needs both of you to help resolve or confirm:
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, …)
- My own task or dataset (give details below)
Reproduction
from transformers import AutoConfig, AutoModel, AutoTokenizer

model_type = "microsoft/deberta-v3-base"
tokenizer = AutoTokenizer.from_pretrained(model_type)
print(tokenizer.vocab_size)  # output: 128000
print(len(tokenizer.vocab))  # output: 128001, the extra one is padding?
config = AutoConfig.from_pretrained(model_type)
print(config.vocab_size)  # output: 128100
model = AutoModel.from_pretrained(model_type, config=config)
print(len(model.embeddings.word_embeddings.weight))  # output: 128100, consistent with the config
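For reference, the gap can be probed directly: the tokenizer never emits an id above its own vocabulary, so the extra embedding rows are unreachable from tokenizer output. A minimal check, continuing from the snippet above (this probing is mine, not part of the original report, and the exact ids assume the microsoft/deberta-v3-base checkpoint):

# Highest id the tokenizer can ever emit, special tokens included.
max_tok_id = max(tokenizer.get_vocab().values())
print(max_tok_id)  # output: 128000

# Rows above max_tok_id exist in the embedding matrix but can never be
# indexed by tokenizer output; they appear to be spare slots.
num_rows = model.embeddings.word_embeddings.weight.shape[0]
print(num_rows - (max_tok_id + 1))  # output: 99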
Expected behavior
The DeBERTa model should have the same vocab_size as its tokenizer.
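If the mismatch matters downstream (for example, code that asserts the tokenizer and model vocabularies agree), one workaround is to shrink the embedding matrix with resize_token_embeddings, a standard transformers method that also updates the config. This is a sketch of a possible workaround, not the resolution the maintainers gave in this issue:

# Truncate the embedding matrix to the tokenizer's length; existing rows
# for surviving ids are kept as-is.
model.resize_token_embeddings(len(tokenizer))
print(model.embeddings.word_embeddings.weight.shape[0])  # output: 128001
print(model.config.vocab_size)  # output: 128001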
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
@SaulLu Got another improvement again. I’m a Prize Contender now! Many thanks!
@SaulLu Got it, thanks a lot for your detailed explanation.