DPR usage of BertPooler
See original GitHub issue

Environment info
- transformers version: 4.8.2
- Platform: Linux-5.8.0-50-generic-x86_64-with-debian-bullseye-sid
- Python version: 3.7.4
- PyTorch version (GPU?): 1.5.1 (True)
- Tensorflow version (GPU?): not installed (NA)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
Who can help
RAG, DPR: @patrickvonplaten, @lhoestq
Information
DPR initializes BertModel with a BertPooler module that is never used. Although this seems consistent with the original implementation, it is confusing for the user: one would expect the pooled_output to come from the BertPooler module when it is present, rather than from the last layer of the model. It also wastes memory and compute.
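For context, here is a minimal sketch of the current behaviour, assuming the public facebook/dpr-question_encoder-single-nq-base checkpoint (whose projection_dim is 0): the embedding DPR returns under the name pooler_output is just the [CLS] slice of the last hidden layer, while the BertPooler weights sit unused.

```python
import torch
from transformers import DPRQuestionEncoder, DPRQuestionEncoderTokenizer

# Load a public DPR checkpoint (projection_dim is 0 here, so the returned
# embedding is not projected).
tokenizer = DPRQuestionEncoderTokenizer.from_pretrained(
    "facebook/dpr-question_encoder-single-nq-base"
)
model = DPRQuestionEncoder.from_pretrained(
    "facebook/dpr-question_encoder-single-nq-base"
)

inputs = tokenizer("Where is the Eiffel Tower?", return_tensors="pt")
with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)

# The embedding DPR returns is the [CLS] slice of the last hidden layer;
# the BertPooler that was initialized alongside it never contributes.
cls_embedding = out.hidden_states[-1][:, 0, :]
print(torch.allclose(out.pooler_output, cls_embedding))  # True
```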
How to fix
Simply pass the add_pooling_layer=False flag when instantiating BertModel in https://github.com/huggingface/transformers/blob/master/src/transformers/models/dpr/modeling_dpr.py#L178
Some other parts of the code then need to be adjusted as well; for example, https://github.com/huggingface/transformers/blob/master/src/transformers/models/dpr/modeling_dpr.py#L205 should become sequence_output = outputs[0]. A sketch of the resulting behaviour is given below.
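The following is a minimal sketch of what the proposed change does, using a tiny randomly initialized BertModel rather than DPR itself (so it only illustrates the flag, not the actual patch to modeling_dpr.py): with add_pooling_layer=False the BertPooler is never created, and the sequence output is read from outputs[0].

```python
import torch
from transformers import BertConfig, BertModel

# Tiny config so the example runs quickly; weights are random.
config = BertConfig(
    num_hidden_layers=2, hidden_size=64, num_attention_heads=2, intermediate_size=128
)
# add_pooling_layer=False is the flag suggested for modeling_dpr.py#L178.
bert_without_pooler = BertModel(config, add_pooling_layer=False)

input_ids = torch.tensor([[101, 2054, 2003, 102]])  # toy input ids
outputs = bert_without_pooler(input_ids, return_dict=False)

sequence_output = outputs[0]              # last hidden states, shape (1, 4, 64)
pooled_output = sequence_output[:, 0, :]  # what DPR actually uses as its embedding
print(bert_without_pooler.pooler)         # None: no unused parameters allocated
print(sequence_output.shape, pooled_output.shape)
```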
Issue Analytics
- Created: 2 years ago
- Comments: 7 (6 by maintainers)
Ok, I’ll let you know. I’m quite busy atm.
DPR has an optional projection layer in the original implementation, but it is only applied to the sequence output, not to BertPooler's output.
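To make that concrete, here is a standalone sketch (not DPR's actual classes, just the shapes involved) of how such a projection behaves when it is configured: it acts on the [CLS] slice of the sequence output, never on the BertPooler output, which is consistent with dropping the pooler entirely.

```python
import torch
from torch import nn

# Hypothetical sizes for illustration only.
hidden_size, projection_dim = 768, 128
encode_proj = nn.Linear(hidden_size, projection_dim)

sequence_output = torch.randn(2, 16, hidden_size)  # (batch, seq_len, hidden)
pooled_output = sequence_output[:, 0, :]           # [CLS] token from the last layer
pooled_output = encode_proj(pooled_output)         # optional projection
print(pooled_output.shape)                         # torch.Size([2, 128])
```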