Bert Embedding Extraction
See original GitHub issue.

Hi, I’m trying to extract BERT features with extract_bert_features.sh. I find that the token features are extracted at the document level, i.e., the embeddings are generated over a sequence of sentences. Am I right?
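For readers landing here, this is roughly what document-level extraction means in practice. Below is a minimal sketch using the Hugging Face transformers library; the model name and the plain whitespace joining of sentences are illustrative assumptions, not the repository’s actual extract_bert_features.sh pipeline.

```python
# Minimal sketch of document-level BERT feature extraction:
# all sentences of a document are joined into one input sequence,
# so each token's embedding is conditioned on the whole document.
# Model name and joining strategy are illustrative assumptions.
import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-cased")
model = BertModel.from_pretrained("bert-base-cased")
model.eval()

document = [
    "John lives in New York.",
    "He works for the United Nations.",
]

# Document-level: one forward pass over the concatenated sentences.
doc_inputs = tokenizer(" ".join(document), return_tensors="pt",
                       truncation=True, max_length=512)
with torch.no_grad():
    doc_features = model(**doc_inputs).last_hidden_state  # (1, doc_len, 768)

# Sentence-level (for contrast): each sentence is encoded independently,
# so its tokens see no cross-sentence context.
sent_features = []
for sent in document:
    inputs = tokenizer(sent, return_tensors="pt")
    with torch.no_grad():
        sent_features.append(model(**inputs).last_hidden_state)
```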
Issue Analytics
- State:
- Created 3 years ago
- Reactions: 1
- Comments: 29 (15 by maintainers)
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
Certainly. I have always believed that the biaffine parser is great for nested NER, but I need to make sure it works for flat NER as well, so I mainly talk about flat NER here. For your comments:
For the embeddings, I think you can also concatenate more embedding types (though, according to my experiments, contextual character embeddings may not help parsing-based methods), or simply use BERT embeddings alone, since different embeddings have different advantages depending on the approach: Flair is significantly stronger than BERT for sequence-labeling-based approaches (Akbik et al., 2018). And in fact, the BERT embeddings for the sequence-labeling-based approach can be extracted at the document level as well. For my part, I want to see a fair comparison between the previous state of the art (https://www.aclweb.org/anthology/P19-1527/) and the biaffine parser. How beneficial an embedding is probably depends on your network; see the sketch of embedding concatenation after this paragraph.
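As a concrete illustration of concatenating several embedding types, here is a hedged sketch using the flair library’s StackedEmbeddings; the specific model choices (bert-base-cased, news-forward/backward) are assumptions for illustration, not the exact setup discussed in this thread.

```python
# Sketch of stacking several embedding types with the `flair` library.
# The model choices below are illustrative assumptions.
from flair.data import Sentence
from flair.embeddings import (FlairEmbeddings, StackedEmbeddings,
                              TransformerWordEmbeddings)

stacked = StackedEmbeddings([
    TransformerWordEmbeddings("bert-base-cased"),  # contextual subword-pooled BERT
    FlairEmbeddings("news-forward"),               # contextual character LM, forward
    FlairEmbeddings("news-backward"),              # contextual character LM, backward
])

sentence = Sentence("John lives in New York .")
stacked.embed(sentence)

# Each token now carries the concatenation of all three embeddings.
for token in sentence:
    print(token.text, token.embedding.shape)
```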
Therefore, of the two issues, the embeddings and the document-/sentence-level input, I think the latter is more important: the embeddings only change your network, not the inputs. Admittedly, this is a problem in a lot of previous work, including the great BERT work itself. Again, since we are comparing model architectures, I need a fair comparison of the two (decoding) approaches on flat NER with the same kind of input, the same kind of output, and even the same kind of embeddings (though the last is not essential). By the way, if you think using different embeddings makes the comparison unfair, then adding a different input style on top makes it even more unfair 😃
That may be one reason, but I have shown that a single-layer BiLSTM trained with SGD does better than yours here. So the sequence-labeling approach can do better in the comparison, and it is a common approach in recent work; a sketch of such a baseline follows below.
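For clarity, here is a minimal sketch of the kind of baseline meant above: a single-layer BiLSTM tagger over precomputed embeddings, trained with plain SGD. All dimensions, the tag inventory, and the random data are illustrative assumptions; real systems typically add a CRF layer on top.

```python
# Hedged sketch of a single-layer BiLSTM tagger trained with SGD.
# Dimensions and the tag count are illustrative assumptions.
import torch
import torch.nn as nn

class BiLSTMTagger(nn.Module):
    def __init__(self, emb_dim=768, hidden_dim=256, num_tags=9):
        super().__init__()
        self.lstm = nn.LSTM(emb_dim, hidden_dim, num_layers=1,
                            bidirectional=True, batch_first=True)
        self.proj = nn.Linear(2 * hidden_dim, num_tags)

    def forward(self, embeddings):             # (batch, seq_len, emb_dim)
        hidden, _ = self.lstm(embeddings)      # (batch, seq_len, 2*hidden_dim)
        return self.proj(hidden)               # per-token tag logits

model = BiLSTMTagger()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
criterion = nn.CrossEntropyLoss()

# One illustrative training step on random stand-in data.
emb = torch.randn(2, 10, 768)                  # pretend precomputed BERT features
tags = torch.randint(0, 9, (2, 10))            # pretend BIO tag ids
optimizer.zero_grad()
logits = model(emb)
loss = criterion(logits.reshape(-1, 9), tags.reshape(-1))
loss.backward()
optimizer.step()
```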
The hyper-parameters are the same across most sequence-labeling tasks. For the biaffine parser, your hyper-parameters would also work well if applied directly to dependency-parsing datasets such as the PTB, so I think tuning them (even with grid search) would not affect the accuracy significantly. Even though the biaffine parser improves CoNLL-03 NER by only 0.1, people will take that score as a new state of the art, so we must treat the scores in the comparison seriously and carefully. For reference, a sketch of the biaffine span scorer under discussion follows below.
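For readers unfamiliar with the architecture under comparison, here is a hedged sketch of a biaffine span scorer of the kind used for NER: every (start, end) token pair receives a score per label. The dimensions and initialization are assumptions; the actual parser also includes bias terms, span masking, and a BiLSTM encoder underneath.

```python
# Hedged sketch of a biaffine span scorer for NER.
# Every (start, end) token pair gets one score per label.
import torch
import torch.nn as nn

class BiaffineSpanScorer(nn.Module):
    def __init__(self, hidden_dim=256, num_labels=5):
        super().__init__()
        self.start_ffnn = nn.Linear(hidden_dim, hidden_dim)
        self.end_ffnn = nn.Linear(hidden_dim, hidden_dim)
        # One bilinear form per label: score = start^T U_label end
        self.bilinear = nn.Parameter(
            torch.randn(num_labels, hidden_dim, hidden_dim) * 0.01)

    def forward(self, hidden):                       # (batch, seq, hidden)
        start = torch.relu(self.start_ffnn(hidden))  # span-start representations
        end = torch.relu(self.end_ffnn(hidden))      # span-end representations
        # scores[b, l, i, j] = start[b, i] @ U[l] @ end[b, j]
        return torch.einsum("bih,lhd,bjd->blij", start, self.bilinear, end)

scorer = BiaffineSpanScorer()
scores = scorer(torch.randn(2, 10, 256))             # (2, num_labels, 10, 10)
```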
In conclusion, document-level and sentence-level inputs are totally different inputs for the task, and we need to state that clearly, whereas the embeddings only affect the model architecture. Though this is a problem inherited from a lot of previous work, I need to clarify it, since I want fair settings and comparisons (which I believe in) in my own work. In fact, I don’t want to find that the biaffine parser does not work on NER tasks, because I could not build further work on it if it performed badly. So currently I believe it does well on difficult flat NER tasks (OntoNotes) and on nested NER tasks.
Thank you for helping me understand your great work more clearly, and again, I’m very glad to see that it works well.
@speedcell4 OntoNotes is organized by documents, so the path to each document will be the doc_key. For simplicity you can follow https://github.com/kentonl/e2e-coref/blob/master/setup_training.sh and https://github.com/kentonl/e2e-coref/blob/master/minimize.py to create the JSON files. You will only need to convert the NER annotations to the sentence level; a sketch of that conversion follows below.
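As a rough illustration of the suggested conversion, here is a sketch assuming the jsonlines fields produced by minimize.py ("doc_key", "sentences"); the "ner" field name and the (start, end, label) token-offset convention are assumptions for illustration, not the exact format of the linked scripts.

```python
# Hedged sketch: re-index document-level NER spans per sentence.
# Field names "ner" and the (start, end, label) convention are assumed.
import json

doc = {
    "doc_key": "bn/abc/00/abc_0010",          # path-like key, as in OntoNotes
    "sentences": [["John", "lives", "here", "."],
                  ["He", "works", "hard", "."]],
    # Document-level spans: (start, end, label) over the whole token stream.
    "ner": [(0, 0, "PERSON"), (4, 4, "PERSON")],
}

# Re-index each span against the sentence that contains it.
sentence_level = []
offset = 0
for sent in doc["sentences"]:
    spans = [(s - offset, e - offset, label)
             for s, e, label in doc["ner"]
             if offset <= s and e < offset + len(sent)]
    sentence_level.append(spans)
    offset += len(sent)

print(json.dumps(sentence_level))
# [[[0, 0, "PERSON"]], [[0, 0, "PERSON"]]]
```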