
Adding special tokens to the model


Hello, I am trying to use model.tokenizer.add_special_tokens(special_tokens_dict) to add some special tokens to the model. After doing that, I get an indexing error (IndexError: index out of range in self) when I try to encode a sentence. How can I learn the vector representations of the new tokens? Something like model.resize_token_embeddings(len(t))?
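The IndexError above can be reproduced without any model at all: an embedding layer is essentially a lookup table with one row per vocabulary entry, so the ids assigned to newly added tokens point past its last row until the table is resized. A minimal sketch of that failure mode (plain Python with a toy vocabulary size, not the actual transformer internals):

```python
# Toy embedding table: one row per original vocabulary entry.
# (A real model like bert-base-uncased has 30522 rows; 8 keeps this small.)
vocab_size = 8
embedding_table = [[0.0] * 4 for _ in range(vocab_size)]  # 4-dim toy embeddings

new_token_id = vocab_size  # the first id handed out to an added token
try:
    embedding_table[new_token_id]  # same failure mode as the model forward pass
except IndexError:
    print("IndexError: the new token id has no embedding row yet")

# Resizing the table (what resize_token_embeddings does) adds rows for new ids:
embedding_table.append([0.0] * 4)
row = embedding_table[new_token_id]  # now succeeds
```

This is why adding tokens to the tokenizer alone is not enough: the model's embedding matrix must grow to match the new vocabulary size.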

Issue Analytics

  • State: open
  • Created: 3 years ago
  • Comments: 5 (2 by maintainers)

Top GitHub Comments

2 reactions
nreimers commented on Feb 5, 2021

You can use this code:

tokens = ["TOK1", "TOK2"]
word_embedding_model = model._first_module()   # Your models.Transformer object
word_embedding_model.tokenizer.add_tokens(tokens, special_tokens=True)
word_embedding_model.auto_model.resize_token_embeddings(len(word_embedding_model.tokenizer))
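To see what resize_token_embeddings accomplishes conceptually, here is a hedged sketch of the resizing step using plain lists instead of tensors (the function name and initialization here are illustrative, not the actual Hugging Face implementation):

```python
import random

def resize_embeddings(table, new_size, dim=4):
    """Grow (or truncate) an embedding table to new_size rows.

    Existing rows are kept unchanged; new rows get a fresh random
    initialization. This is why newly added tokens start out with
    untrained vectors that only become meaningful after fine-tuning.
    """
    if new_size <= len(table):
        return table[:new_size]
    new_rows = [[random.gauss(0.0, 0.02) for _ in range(dim)]
                for _ in range(new_size - len(table))]
    return table + new_rows

old_table = [[1.0] * 4 for _ in range(10)]    # pretend vocab of 10 tokens
new_table = resize_embeddings(old_table, 12)  # two tokens were added
```

After resizing, encoding sentences containing the new tokens no longer raises an IndexError, but the new rows still need to be trained (or fine-tuned) before their representations are useful.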

1 reaction
nreimers commented on Mar 26, 2021

Yes, it is correct.


Top Results From Across the Web

How to add some new special tokens to a pretrained tokenizer?
Hi guys. I want to add some new special tokens like [XXX] to a pretrained ByteLevelBPETokenizer, but I can't find how to do...
Utilities for Tokenizers - Hugging Face
The model input with special tokens. Build model inputs from a sequence or a pair of sequence for sequence classification tasks by concatenating...
How to add new special token to the tokenizer? - Stack Overflow
I want to build a multi-class classification model for which I have conversational data as input for the BERT model ...
How to add new tokens to huggingface transformers vocabulary
In this short article, you'll learn how to add new tokens to the vocabulary of a huggingface transformer model.
Adding a new token to a transformer model without breaking ...
add_tokens (new_words) model.resize_token_embeddings(len(tokenizer)) tokenizer.tokenize('myword1 myword2') # result: ['myword1', 'myword2 ...
