[modeling] keys to ignore revisited
If possible let’s please revisit:
- `_keys_to_ignore_on_save`
- `_keys_to_ignore_on_load_unexpected`
- `_keys_to_ignore_on_load_missing`
I’m trying to debug a key that refuses to be ignored on load, and I’m not sure whether I’m setting it correctly in all those `keys_to_ignore_*` patterns.
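For context, these are class attributes on the model, each a list of regex patterns matched against `state_dict` keys. A minimal sketch of how I’m setting them, assuming a `PreTrainedModel` subclass (the pattern values here are only illustrative):

```python
from transformers import PreTrainedModel

class MyModel(PreTrainedModel):
    # each entry is a regex pattern, matched with re.search against state_dict keys
    _keys_to_ignore_on_save = [r"encoder\.embed_positions\.weights"]
    _keys_to_ignore_on_load_missing = [r"encoder\.embed_positions\.weights"]
    _keys_to_ignore_on_load_unexpected = [r"encoder\.embed_positions\.weights"]
```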
- should the keys include the model prefix or not? e.g. here it’s a mixed bunch: should they all have the `model.` prefix, or should none of them have it?
- should we consistently escape the `.` or not? Again, see the example above for a mixed bunch.
I know I was adding non-escaped keys, because there was no ambiguity in things like `encoder.embed_positions.weights`.
- do we ever need to escape it? Whatever the decision, I ask that we use one consistent way, so that when things don’t work it’s easy to know how a key should be written correctly.
- I’m not very clear about the naming of the last 2 keys. At the level of the model itself it’s hard to remember what they mean, and their explanation is really hard to understand. Could the following explanation be revised? I have a hard time parsing this text:
- I think the logic of defining which keys not to load is either completely missing or incomplete.
I’m trying to tell `m2m_100` not to load `encoder.embed_positions.weights` (and the same for the decoder). I added it to all 3 keys-to-ignore lists and it still loads them. This is a problem because the checkpoint has these weights saved, while I want to load it into a model with a different `max_position_embeddings` value, so the shapes don’t match and I can’t:
```
stderr: RuntimeError: Error(s) in loading state_dict for M2M100ForConditionalGeneration:
stderr: size mismatch for model.encoder.embed_positions.weights: copying a param with shape torch.Size([22, 16]) from checkpoint, the shape in current model is torch.Size([514, 16]).
stderr: size mismatch for model.decoder.embed_positions.weights: copying a param with shape torch.Size([22, 16]) from checkpoint, the shape in current model is torch.Size([514, 16]).
```
Either the current logic needs to be further refined, or we need a new key, e.g. `_keys_to_ignore_on_load_always`?
The current logic is here: https://github.com/huggingface/transformers/blob/69233cf03be5fbce0492f3997e139c4d05499e27/src/transformers/modeling_utils.py#L1921-L1964
It’s easy to see how it fails if `set(expected_keys) == set(loaded_keys)`, which is the case in this situation.
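A toy illustration (made-up keys, plain Python) of why the two `_keys_to_ignore_on_load_*` lists cannot help here: when the checkpoint and the model have exactly the same keys, there is nothing for them to filter:

```python
# made-up keys, just for illustration
expected_keys = {"model.encoder.embed_positions.weights", "model.shared.weight"}
loaded_keys = set(expected_keys)  # the checkpoint contains exactly the same keys

missing_keys = expected_keys - loaded_keys     # set() -> nothing to ignore
unexpected_keys = loaded_keys - expected_keys  # set() -> nothing to ignore

# _keys_to_ignore_on_load_missing / _unexpected only filter these two empty
# collections, so encoder.embed_positions.weights is still copied from the
# checkpoint and the size-mismatch errors above are raised.
print(missing_keys, unexpected_keys)
```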
I think the “bug” is here:

```python
expected_keys = list(model_state_dict.keys())
```

This list further needs to be processed to remove the `_keys_to_ignore_on_save` entries, since those keys are not expected even if they are in the model.
I think this logic is missing, and I propose to fix it with the following additional chunk (the first `if`):
```python
if cls._keys_to_ignore_on_save is not None:
    for pat in cls._keys_to_ignore_on_save:
        expected_keys = [k for k in expected_keys if re.search(pat, k) is None]

missing_keys = list(set(expected_keys) - set(loaded_keys))
unexpected_keys = list(set(loaded_keys) - set(expected_keys))
```
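Continuing the toy example from above, this is what the proposed `if` changes (assuming `_keys_to_ignore_on_save` contains `r"encoder\.embed_positions\.weights"`): the key now shows up in `unexpected_keys` instead of being treated as expected:

```python
import re

keys_to_ignore_on_save = [r"encoder\.embed_positions\.weights"]  # illustrative

expected_keys = ["model.encoder.embed_positions.weights", "model.shared.weight"]
loaded_keys = list(expected_keys)

# proposed filtering: keys we never save should not be expected from the checkpoint
for pat in keys_to_ignore_on_save:
    expected_keys = [k for k in expected_keys if re.search(pat, k) is None]

missing_keys = list(set(expected_keys) - set(loaded_keys))
unexpected_keys = list(set(loaded_keys) - set(expected_keys))
print(unexpected_keys)  # ['model.encoder.embed_positions.weights']
print(missing_keys)     # []
```

That alone is not enough, though, which brings me to the next point: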
- and it never removes the `unexpected_keys` from the `state_dict`, so all of these still get loaded in `_load_state_dict_into_model`, which doesn’t get a list of keys to load and loads everything from the `state_dict`
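A minimal sketch of the kind of filtering I have in mind, applied to the `state_dict` itself before it is handed to the loading code (plain Python; `keys_to_ignore_on_load_always` is the hypothetical new attribute proposed above, not an existing one):

```python
import re

# hypothetical new attribute: keys that must never be loaded from the checkpoint
keys_to_ignore_on_load_always = [
    r"encoder\.embed_positions\.weights",
    r"decoder\.embed_positions\.weights",
]

def filter_state_dict(state_dict, patterns):
    """Drop every checkpoint entry whose key matches one of the patterns."""
    return {
        k: v
        for k, v in state_dict.items()
        if not any(re.search(pat, k) for pat in patterns)
    }

# state_dict = filter_state_dict(state_dict, keys_to_ignore_on_load_always)
# ...and only then hand the filtered state_dict over to the loading function
```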
If I have piled up too many issues together, please let me know and I will split them up; they just all seem to be interconnected.
Thank you!
The ones that have clear regex patterns should be escaped and use `\.`; for the ones that only use plain strings, I think it’s okay to just leave the dot as is.
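To make the dot-escaping point concrete, a tiny stand-alone check (plain `re`, nothing transformers-specific) showing the difference between the two forms:

```python
import re

keys = ["encoder.embed_positions.weights", "encoderXembed_positionsXweights"]

unescaped = r"encoder.embed_positions.weights"   # "." matches any character
escaped = r"encoder\.embed_positions\.weights"   # "\." matches a literal dot

print([k for k in keys if re.search(unescaped, k)])  # matches both keys
print([k for k in keys if re.search(escaped, k)])    # matches only the real key
```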
Thank you!
OK, I will make a PR then.