How to ignore PAD tokens for NER
See original GitHub issue
Hi,
Thank you for such a great repo. I am trying to use the word/token embeddings from a pretrained transformer for NER. The following code is a snippet of my model. For simplicity I am using a linear decoder as opposed to a CRF decoder.
import torch.nn as nn
from transformers import BertModel, BertTokenizer

model_bert = BertModel.from_pretrained(model_dir, config=config)
tokenizer = BertTokenizer.from_pretrained('bert-base-cased')

class BERTNER(nn.Module):
    def __init__(self, model, hidden_dim, num_labels):
        """
        Torch model that uses BERT and adds a classifier at the end.
        num_labels is the number of NER labels.
        """
        super(BERTNER, self).__init__()
        self.model = model
        self.hidden_dim = hidden_dim
        self.num_labels = num_labels
        self.rnn = nn.LSTM(self.model.config.hidden_size, hidden_dim,
                           batch_first=True, bidirectional=True)
        self.classifier = nn.Linear(2 * hidden_dim, num_labels)

    def forward(self, input_ids, attention_mask):
        outputs = self.model(input_ids=input_ids, attention_mask=attention_mask)
        sequence_output = outputs[0]  # (batch_size, seq_len, hidden_size)
        out, _ = self.rnn(sequence_output)
        return self.classifier(out)

model = BERTNER(model_bert, 128, len(tag2idx))
And this is the part I am confused about. My inputs to the model are all padded to a fixed length. Generally, when sentences are padded and one uses nn.Embedding, the padding can be ignored via the padding_idx argument (https://pytorch.org/docs/stable/generated/torch.nn.Embedding.html). But here it is not clear to me how to ignore the padded tokens. Any help will be greatly appreciated. Thanks in advance.
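(For reference, a minimal sketch of the nn.Embedding behaviour referred to above; padding_idx=0 is just an assumed pad index — the embedding vector at that index stays all zeros and receives no gradient.)

import torch
import torch.nn as nn

# Sketch only: padding_idx=0 is an assumed pad index for illustration.
emb = nn.Embedding(num_embeddings=100, embedding_dim=8, padding_idx=0)
ids = torch.tensor([[5, 7, 0, 0]])   # last two positions are padding
print(emb(ids)[0, 2])                # all zeros for a padded position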
Issue Analytics
- Created 2 years ago
- Comments: 5 (1 by maintainers)
Top Results From Across the Web
Token classification - Hugging Face
Mapping all tokens to their corresponding word with the word_ids method. · Assigning the label -100 to the special tokens [CLS] and [SEP]...
Read more >
Tensorflow BERT for token-classification - exclude pad-tokens ...
Yes, this is normal. The output of BERT [batch_size, max_seq_len = 100, hidden_size] will include values or embeddings for [PAD] tokens as ...
Read more >
How to Fine-Tune BERT for NER Using HuggingFace
How to Pad the Samples. Another issue is different samples can get tokenized into different lengths, so we need to add pad tokens...
Read more >
Lessons Learned from Fine-Tuning BERT for Named Entity ...
First, NER is token-level classification, meaning that the model makes ... its predictions for [PAD] tokens' labels were essentially random!
Read more >
Named Entity Recognition with BERT in PyTorch
NER is a task in NLP to identify and extract meaningful ... padding : to pad the sequence with a special [PAD] token...
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
First, placing an LSTM on top of the final hidden states of a model like BERT is not needed. You can just place a linear layer on top. Any xxxForTokenClassification model in the library is implemented that way, and it works really well.
Second, to ignore padding tokens, you should make predictions for all tokens, but simply label pad tokens with -100, as this is the default ignore_index of the CrossEntropyLoss in PyTorch. This means that they will not be taken into account by the loss function.
Btw, I do have an example notebook for NER which you find here. There's also the official one which you can find here.
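As an illustration of that suggestion (not from the original comment), here is a minimal sketch assuming logits of shape (batch, seq_len, num_labels), integer labels of shape (batch, seq_len), and an attention_mask with 1 for real tokens; -100 is the default ignore_index of CrossEntropyLoss, so padded positions contribute nothing to the loss.

import torch
import torch.nn as nn

loss_fct = nn.CrossEntropyLoss()  # ignore_index=-100 by default

def token_classification_loss(logits, labels, attention_mask):
    # Set labels at padded positions to -100 so CrossEntropyLoss ignores them.
    active_labels = torch.where(attention_mask.bool(), labels,
                                torch.full_like(labels, -100))
    return loss_fct(logits.view(-1, logits.size(-1)), active_labels.view(-1))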
The attention_mask indicates if a token is padding or an actual token. The usual way to deal with padding in the LSTM is to pass lengths for each sequence; you can work this out by summing the attention_mask along the "time" axis, i.e. something like the sketch below. You'll have to double check the axis you want to sum over, and that attention_mask=1 for non-padded tokens (otherwise you'll have to negate it), but hopefully this will help.
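The snippet itself was not captured here; the following is a minimal sketch of one way to do it (names and the use of pack_padded_sequence are assumptions), written as a drop-in forward for the BERTNER model above.

from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

def forward(self, input_ids, attention_mask):
    outputs = self.model(input_ids=input_ids, attention_mask=attention_mask)
    sequence_output = outputs[0]               # (batch, seq_len, hidden_size)
    lengths = attention_mask.sum(dim=1).cpu()  # real-token count per sequence
    # Pack so the LSTM stops at each sequence's true length, then re-pad.
    packed = pack_padded_sequence(sequence_output, lengths,
                                  batch_first=True, enforce_sorted=False)
    packed_out, _ = self.rnn(packed)
    out, _ = pad_packed_sequence(packed_out, batch_first=True,
                                 total_length=sequence_output.size(1))
    return self.classifier(out)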