DPR AutoModel loading incorrect architecture for DPRContextEncoders
Environment info
- transformers version: 4.10.2
- Platform: Darwin-20.6.0-x86_64-i386-64bit
- Python version: 3.7.7
- PyTorch version (GPU?): 1.9.0 (False)
- Tensorflow version (GPU?): not installed (NA)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using GPU in script?: No
- Using distributed or parallel set-up in script?: No
Who can help
Model type dpr: @LysandreJik @patrickvonplaten @lhoestq
Information
Model I am using:
- https://huggingface.co/facebook/dpr-ctx_encoder-single-nq-base
- https://huggingface.co/facebook/dpr-question_encoder-single-nq-base
To reproduce
Loading a DPR context encoder (DPRContextEncoder) using AutoModel.from_pretrained actually loads DPRQuestionEncoder instead, and later fails.
Steps to reproduce the behavior:
AutoModel.from_pretrained('facebook/dpr-ctx_encoder-single-nq-base')
File "venv/lib/python3.7/site-packages/transformers/modeling_utils.py", line 579, in _init_weights
raise NotImplementedError(f"Make sure `_init_weigths` is implemented for {self.__class__}")
NotImplementedError: Make sure `_init_weigths` is implemented for <class 'transformers.models.dpr.modeling_dpr.DPRQuestionEncoder'>
Note in the above that it’s trying to use the DPRQuestionEncoder even though the config for this context encoder is correct and points to architecture=DPRContextEncoder.
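One possible explanation, consistent with the maintainer comment further down, is that the auto factory resolves the class from the model type alone. The sketch below illustrates that idea with plain dictionaries. MODEL_TYPE_TO_CLASS, resolve_by_model_type and resolve_by_architecture are hypothetical names made up for this illustration, and the single-entry mapping is an assumption, not the actual table in transformers.

```python
# Hypothetical stand-in for an auto mapping keyed on model type only.
# Assumption: one class name per model type; the real table may differ.
MODEL_TYPE_TO_CLASS = {"dpr": "DPRQuestionEncoder"}


def resolve_by_model_type(config: dict) -> str:
    """Pick a class name from the model type alone, ignoring `architectures`."""
    return MODEL_TYPE_TO_CLASS[config["model_type"]]


def resolve_by_architecture(config: dict) -> str:
    """Prefer the architecture named in the config; fall back to the model type."""
    architectures = config.get("architectures") or []
    return architectures[0] if architectures else resolve_by_model_type(config)


# Minimal config for the context encoder, reduced to the two fields
# mentioned in this report.
ctx_config = {"model_type": "dpr", "architectures": ["DPRContextEncoder"]}

print(resolve_by_model_type(ctx_config))    # lookup keyed on model type
print(resolve_by_architecture(ctx_config))  # lookup keyed on the config's architecture
```

With a lookup keyed only on the model type, both DPR checkpoints resolve to the same class, which would match the behaviour seen in the traceback.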
Explicitly using DPRContextEncoder.from_pretrained works just fine, so it looks like the problem is somewhere in AutoModel.
DPRContextEncoder.from_pretrained('facebook/dpr-ctx_encoder-single-nq-base')
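Until AutoModel handles this case, a generic workaround could read the architecture from the config and load that class directly. This is a minimal sketch, not an official API: pick_class_name and load_by_architecture are hypothetical helpers, and it assumes the class named in the config's architectures field is exported at the top level of the transformers package.

```python
from typing import Optional, Sequence


def pick_class_name(architectures: Optional[Sequence[str]],
                    default: str = "AutoModel") -> str:
    """Return the first architecture listed in a config, or `default` if none."""
    return architectures[0] if architectures else default


def load_by_architecture(model_id: str):
    """Load a checkpoint with the class named in its config (hypothetical helper)."""
    import transformers  # imported lazily so the helper above has no dependencies

    config = transformers.AutoConfig.from_pretrained(model_id)
    class_name = pick_class_name(getattr(config, "architectures", None))
    # Assumption: the class is exported as transformers.<class_name>.
    model_class = getattr(transformers, class_name)
    return model_class.from_pretrained(model_id)


# Example call (downloads the checkpoint, so it is left commented out):
# model = load_by_architecture("facebook/dpr-ctx_encoder-single-nq-base")
```

If the config lists no architecture, the helper falls back to AutoModel, so it behaves no worse than the call in the reproduction steps.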
Expected behavior
Using AutoModel.from_pretrained should pick the correct architecture for a DPRContextEncoder.
To the best of my knowledge, this would be a major change to the auto factory, because the mapping file defines all the Auto- models together rather than for each specific model. Modifying only the DPR-related models might break their consistency.

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.