Bert Embedding Extraction
See original GitHub issue.

Hi, I’m trying to extract BERT features with extract_bert_features.sh. I find that the token features are extracted at the document level, i.e., the embeddings are generated over a sequence of sentences. Am I right?
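For readers landing here, this is roughly what document-level extraction means in practice. Below is a minimal sketch using the Hugging Face transformers library; the model name and the plain whitespace joining of sentences are illustrative assumptions, not the repository’s actual extract_bert_features.sh pipeline.

```python
# Minimal sketch of document-level BERT feature extraction:
# all sentences of a document are joined into one input sequence,
# so each token's embedding is conditioned on the whole document.
# Model name and joining strategy are illustrative assumptions.
import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-cased")
model = BertModel.from_pretrained("bert-base-cased")
model.eval()

document = [
    "John lives in New York.",
    "He works for the United Nations.",
]

# Document-level: one forward pass over the concatenated sentences.
doc_inputs = tokenizer(" ".join(document), return_tensors="pt",
                       truncation=True, max_length=512)
with torch.no_grad():
    doc_features = model(**doc_inputs).last_hidden_state  # (1, doc_len, 768)

# Sentence-level (for contrast): each sentence is encoded independently,
# so its tokens see no cross-sentence context.
sent_features = []
for sent in document:
    inputs = tokenizer(sent, return_tensors="pt")
    with torch.no_grad():
        sent_features.append(model(**inputs).last_hidden_state)
```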
Issue Analytics
- State:
- Created 3 years ago
- Reactions: 1
- Comments: 29 (15 by maintainers)
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
Certainly. I have always believed that the biaffine parser is great for nested NER, but I need to make sure it works for flat NER as well, so I mainly talk about flat NER here. For your comments:
For the embeddings, I think you can also concatenate more embedding types (though, according to my experiments, contextual character embeddings may not help parsing-based methods), or simply use BERT embeddings alone, since different embeddings have different advantages depending on the approach: Flair is significantly stronger than BERT for sequence-labeling-based approaches (Akbik et al., 2018). And in fact, the BERT embeddings for the sequence-labeling-based approach can be extracted at the document level as well. For my part, I want to see a fair comparison between the previous state of the art (https://www.aclweb.org/anthology/P19-1527/) and the biaffine parser. How beneficial an embedding is probably depends on your network; see the sketch of embedding concatenation after this paragraph.
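As a concrete illustration of concatenating several embedding types, here is a hedged sketch using the flair library’s StackedEmbeddings; the specific model choices (bert-base-cased, news-forward/backward) are assumptions for illustration, not the exact setup discussed in this thread.

```python
# Sketch of stacking several embedding types with the `flair` library.
# The model choices below are illustrative assumptions.
from flair.data import Sentence
from flair.embeddings import (FlairEmbeddings, StackedEmbeddings,
                              TransformerWordEmbeddings)

stacked = StackedEmbeddings([
    TransformerWordEmbeddings("bert-base-cased"),  # contextual subword-pooled BERT
    FlairEmbeddings("news-forward"),               # contextual character LM, forward
    FlairEmbeddings("news-backward"),              # contextual character LM, backward
])

sentence = Sentence("John lives in New York .")
stacked.embed(sentence)

# Each token now carries the concatenation of all three embeddings.
for token in sentence:
    print(token.text, token.embedding.shape)
```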
Therefore, of the two issues, the embeddings and the document-/sentence-level input, I think the latter is more important: the embeddings only change your network, not the inputs. Admittedly, this is a problem in a lot of previous work, including the great BERT work itself. Again, since we are comparing model architectures, I need a fair comparison of the two (decoding) approaches on flat NER with the same kind of input, the same kind of output, and even the same kind of embeddings (though the last is not essential). By the way, if you think using different embeddings makes the comparison unfair, then adding a different input style on top makes it even more unfair 😃
That may be one reason, but I have shown that a single-layer BiLSTM trained with SGD does better than yours here. So the sequence-labeling approach can do better in the comparison, and it is a common approach in recent work; a sketch of such a baseline follows below.
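For clarity, here is a minimal sketch of the kind of baseline meant above: a single-layer BiLSTM tagger over precomputed embeddings, trained with plain SGD. All dimensions, the tag inventory, and the random data are illustrative assumptions; real systems typically add a CRF layer on top.

```python
# Hedged sketch of a single-layer BiLSTM tagger trained with SGD.
# Dimensions and the tag count are illustrative assumptions.
import torch
import torch.nn as nn

class BiLSTMTagger(nn.Module):
    def __init__(self, emb_dim=768, hidden_dim=256, num_tags=9):
        super().__init__()
        self.lstm = nn.LSTM(emb_dim, hidden_dim, num_layers=1,
                            bidirectional=True, batch_first=True)
        self.proj = nn.Linear(2 * hidden_dim, num_tags)

    def forward(self, embeddings):             # (batch, seq_len, emb_dim)
        hidden, _ = self.lstm(embeddings)      # (batch, seq_len, 2*hidden_dim)
        return self.proj(hidden)               # per-token tag logits

model = BiLSTMTagger()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
criterion = nn.CrossEntropyLoss()

# One illustrative training step on random stand-in data.
emb = torch.randn(2, 10, 768)                  # pretend precomputed BERT features
tags = torch.randint(0, 9, (2, 10))            # pretend BIO tag ids
optimizer.zero_grad()
logits = model(emb)
loss = criterion(logits.reshape(-1, 9), tags.reshape(-1))
loss.backward()
optimizer.step()
```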
The hyper-parameters are the same across most sequence-labeling tasks. For the biaffine parser, your hyper-parameters would also work well if applied directly to dependency-parsing datasets such as the PTB, so I think tuning them (even with grid search) would not affect the accuracy significantly. Even though the biaffine parser improves CoNLL-03 NER by only 0.1, people will take that score as a new state of the art, so we must treat the scores in the comparison seriously and carefully. For reference, a sketch of the biaffine span scorer under discussion follows below.
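For readers unfamiliar with the architecture under comparison, here is a hedged sketch of a biaffine span scorer of the kind used for NER: every (start, end) token pair receives a score per label. The dimensions and initialization are assumptions; the actual parser also includes bias terms, span masking, and a BiLSTM encoder underneath.

```python
# Hedged sketch of a biaffine span scorer for NER.
# Every (start, end) token pair gets one score per label.
import torch
import torch.nn as nn

class BiaffineSpanScorer(nn.Module):
    def __init__(self, hidden_dim=256, num_labels=5):
        super().__init__()
        self.start_ffnn = nn.Linear(hidden_dim, hidden_dim)
        self.end_ffnn = nn.Linear(hidden_dim, hidden_dim)
        # One bilinear form per label: score = start^T U_label end
        self.bilinear = nn.Parameter(
            torch.randn(num_labels, hidden_dim, hidden_dim) * 0.01)

    def forward(self, hidden):                       # (batch, seq, hidden)
        start = torch.relu(self.start_ffnn(hidden))  # span-start representations
        end = torch.relu(self.end_ffnn(hidden))      # span-end representations
        # scores[b, l, i, j] = start[b, i] @ U[l] @ end[b, j]
        return torch.einsum("bih,lhd,bjd->blij", start, self.bilinear, end)

scorer = BiaffineSpanScorer()
scores = scorer(torch.randn(2, 10, 256))             # (2, num_labels, 10, 10)
```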
In conclusion, document-level and sentence-level inputs are totally different inputs for the task, and we need to state that clearly, whereas the embeddings only affect the model architecture. Though this is a problem inherited from a lot of previous work, I need to clarify it, since I want fair settings and comparisons (which I believe in) in my own work. In fact, I don’t want to find that the biaffine parser does not work on NER tasks, because I could not build further work on it if it performed badly. So currently I believe it does well on difficult flat NER tasks (OntoNotes) and on nested NER tasks.
Thank you for helping me understand your great work more clearly, and again, I’m very glad to see that it works well.
@speedcell4 OntoNotes is organized by documents, so the path to each document will be the doc_key. For simplicity you can follow https://github.com/kentonl/e2e-coref/blob/master/setup_training.sh and https://github.com/kentonl/e2e-coref/blob/master/minimize.py to create the JSON files. You will only need to convert the NER annotations to the sentence level; a sketch of that conversion follows below.
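As a rough illustration of the suggested conversion, here is a sketch assuming the jsonlines fields produced by minimize.py ("doc_key", "sentences"); the "ner" field name and the (start, end, label) token-offset convention are assumptions for illustration, not the exact format of the linked scripts.

```python
# Hedged sketch: re-index document-level NER spans per sentence.
# Field names "ner" and the (start, end, label) convention are assumed.
import json

doc = {
    "doc_key": "bn/abc/00/abc_0010",          # path-like key, as in OntoNotes
    "sentences": [["John", "lives", "here", "."],
                  ["He", "works", "hard", "."]],
    # Document-level spans: (start, end, label) over the whole token stream.
    "ner": [(0, 0, "PERSON"), (4, 4, "PERSON")],
}

# Re-index each span against the sentence that contains it.
sentence_level = []
offset = 0
for sent in doc["sentences"]:
    spans = [(s - offset, e - offset, label)
             for s, e, label in doc["ner"]
             if offset <= s and e < offset + len(sent)]
    sentence_level.append(spans)
    offset += len(sent)

print(json.dumps(sentence_level))
# [[[0, 0, "PERSON"]], [[0, 0, "PERSON"]]]
```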