Attributes explicitly defined in model configurations are now overridden by the defaults.
Environment info
- `transformers` version: 4.11.0.dev0
- Platform: Linux-5.14.11-arch1-1-x86_64-with-glibc2.33
- Python version: 3.9.7
- PyTorch version (GPU?): 1.9.1+cu102 (True)
- Tensorflow version (GPU?): 2.6.0 (False)
- Flax version (CPU?/GPU?/TPU?): 0.3.5 (cpu)
- Jax version: 0.2.21
- JaxLib version: 0.1.71
- Using GPU in script?: <fill in>
- Using distributed or parallel set-up in script?: <fill in>
The issue
The issue became visible with the introduction of parameter setters in https://github.com/huggingface/transformers/pull/13026.
That PR moved the initialization of the parent object to be the last statement of configuration creation. While this could be benign, it isn't, because some arguments are defined both in the model configuration and in the upstream configuration.
Such an example is the FSMT configuration: it defines the generate arguments (such as `num_beams`) in its own `__init__`.
At the end of the method, it initializes the parent configuration without passing that parameter along.
Finally, in the parent configuration, `num_beams` is set once again.
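A minimal, self-contained stand-in for this pattern (toy classes for illustration only, not the actual transformers source):

```python
class ToyPretrainedConfig:
    """Stand-in for PretrainedConfig: pops generate arguments with defaults."""

    def __init__(self, **kwargs):
        # ... many other attributes with defaults ...
        self.num_beams = kwargs.pop("num_beams", 1)


class ToyFSMTConfig(ToyPretrainedConfig):
    """Stand-in for FSMTConfig: ships its own default for num_beams."""

    def __init__(self, num_beams=5, **common_kwargs):
        # the generate argument is assigned directly on the config ...
        self.num_beams = num_beams
        # ... and, since #13026, the parent is initialized last,
        # without num_beams being forwarded
        super().__init__(**common_kwargs)


print(ToyFSMTConfig().num_beams)  # prints 1, not the intended 5
```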
This is an issue because it overrides the previously set `num_beams`, resetting it to the default of 1. The problem wasn't caught before because the superclass initialization used to happen at the beginning and was then overridden by the parameters set afterwards; this is no longer the case.
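With the pre-#13026 ordering, the same toy classes behave as intended, which is why the redundant assignment went unnoticed:

```python
class ToyFSMTConfigOldOrder(ToyPretrainedConfig):
    """Same toy subclass, but calling super().__init__ first, as before #13026."""

    def __init__(self, num_beams=5, **common_kwargs):
        super().__init__(**common_kwargs)  # sets the default num_beams = 1 ...
        self.num_beams = num_beams         # ... which is then overwritten with 5


print(ToyFSMTConfigOldOrder().num_beams)  # prints 5
```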
This makes the following test fail: `tests/test_modeling_fsmt.py -k test_translation_direct_2_en_de`.
IMO the issue comes from the redefinition of arguments in the FSMT configuration, which should not be done since the superclass already defines these arguments correctly from the kwargs. The simplest patch (apart from making sure that each parameter is only set once) would be to have the superclass take all previously set attributes into account, by adding the following statement to the `__init__` of the `PretrainedConfig` superclass:
```diff
     [...]
     def __init__(self, **kwargs):
+        kwargs = {**kwargs, **self.__dict__}
         # Attributes with defaults
         self.return_dict = kwargs.pop("return_dict", True)
     [...]
```
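With that merge, attributes the subclass has already set on `self` take precedence over the incoming kwargs (and therefore over the `kwargs.pop(..., default)` fallbacks), because later entries win when dicts are merged. A quick standalone illustration of the precedence (the variable names are only for the example):

```python
incoming_kwargs = {"length_penalty": 1.1}  # kwargs actually forwarded to super().__init__
already_set = {"num_beams": 5}             # stand-in for self.__dict__ after the subclass ran

kwargs = {**incoming_kwargs, **already_set}
print(kwargs.pop("num_beams", 1))         # 5 -> the value the subclass set is kept
print(kwargs.pop("length_penalty", 1.0))  # 1.1 -> regular kwargs still behave as before
```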
WDYT? cc @sgugger @stas00 @nreimers @patrickvonplaten
The cleanest solution, however, would be to make sure that all parameters are only set once, which is slightly harder to test.
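In terms of the toy classes above, setting a parameter only once would mean forwarding it to the superclass instead of assigning it locally (an illustrative sketch, not an actual FSMT patch):

```python
class ToyFSMTConfigSetOnce(ToyPretrainedConfig):
    """The attribute is only ever written by the superclass."""

    def __init__(self, num_beams=5, **common_kwargs):
        # forward the generate argument instead of setting self.num_beams here
        super().__init__(num_beams=num_beams, **common_kwargs)


print(ToyFSMTConfigSetOnce().num_beams)  # prints 5
```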
Reproducible code sample:
```python
from transformers import AutoTokenizer, FSMTForConditionalGeneration

pair = "en-de"
text = {
    "en": "Machine learning is great, isn't it?",
    "ru": "Машинное обучение - это здорово, не так ли?",
    "de": "Maschinelles Lernen ist großartig, oder?",
}

src, tgt = pair.split("-")
print(f"Testing {src} -> {tgt}")

mname = f"facebook/wmt19-{pair}"
src_text = text[src]
tgt_text = text[tgt]

tokenizer = AutoTokenizer.from_pretrained(mname)
model = FSMTForConditionalGeneration.from_pretrained(mname)
print(model.config)

input_ids = tokenizer.encode(src_text, return_tensors="pt")
outputs = model.generate(input_ids)
decoded = tokenizer.decode(outputs[0], skip_special_tokens=True)
assert decoded == tgt_text, f"\n\ngot: {decoded}\nexp: {tgt_text}\n"
```
Comments: 7 (7 by maintainers)
I highly doubt the bug is only in FSMT, @stas00. The fact that #13026 moved all the super calls to the end of the configuration init has probably created multiple instances of it. It's just that FSMT had good tests that showed us the bug 😃
I like this proposition too, @nreimers!