
[modeling] keys to ignore revisited

If possible let’s please revisit:

  1. _keys_to_ignore_on_save
  2. _keys_to_ignore_on_load_unexpected
  3. _keys_to_ignore_on_load_missing

I’m trying to debug a key that refuses to be ignored on load, and I’m not sure whether I’m setting it correctly in all of those keys_to_ignore_* patterns.
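
For context, these are class attributes on the model class, and they hold patterns that get matched against state_dict key names. A minimal sketch of how they are typically declared (the class and the pattern here are illustrative, not the actual m2m_100 values):

    from transformers import PreTrainedModel

    class MyModel(PreTrainedModel):
        # patterns matched against state_dict key names
        _keys_to_ignore_on_save = [r"model\.encoder\.embed_positions\.weights"]
        _keys_to_ignore_on_load_missing = [r"model\.encoder\.embed_positions\.weights"]
        _keys_to_ignore_on_load_unexpected = [r"model\.encoder\.embed_positions\.weights"]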


  1. should the keys include the model prefix or not? e.g. here it’s a mixed bunch:

https://github.com/huggingface/transformers/blob/69233cf03be5fbce0492f3997e139c4d05499e27/src/transformers/models/m2m_100/modeling_m2m_100.py#L1215-L1227

Should they all have the model. prefix, or should none of them have it?
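
To illustrate what I mean (hypothetical entries, not the actual contents of the linked file), the list mixes styles like:

    _keys_to_ignore_on_load_missing = [
        r"encoder\.version",                       # no model. prefix, escaped dots
        r"model.encoder.embed_positions.weights",  # model. prefix, unescaped dots
    ]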

  2. Should we consistently escape the . or not? Again, see the example above for a mixed bunch.

I know I was adding non-escaped keys, because there was no ambiguity in things like encoder.embed_positions.weights - do we ever need to escape it? Whatever the decision, I ask that we pick one consistent way, so that when things don’t work it’s easy to know how a key should be written.
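
To show why the non-escaped form has been working so far: in a regex an unescaped . matches any character, including a literal dot, so both spellings match the key; escaping only starts to matter if the looser pattern could accidentally match some other key.

    import re

    key = "model.encoder.embed_positions.weights"

    # Unescaped dots: "." matches any character, so the pattern still matches the key.
    print(bool(re.search(r"model.encoder.embed_positions.weights", key)))     # True

    # Escaped dots: matches only literal dots - the stricter, unambiguous form.
    print(bool(re.search(r"model\.encoder\.embed_positions\.weights", key)))  # True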

  3. I’m not very clear about the naming of the last 2 keys. At the point of the model definition itself it’s hard to remember what they mean, and their explanation is really hard to understand. Could the following explanation be revised? I have a hard time parsing this text:

https://github.com/huggingface/transformers/blob/69233cf03be5fbce0492f3997e139c4d05499e27/src/transformers/modeling_utils.py#L726-L731

  4. I think the logic for defining which keys not to load is either completely missing or incomplete.

I’m trying to tell m2m_100 not to load encoder.embed_positions.weights (and the same for the decoder). I added it to all 3 keys_to_ignore_* attributes and it still gets loaded. This is a problem: the saved model contains these weights, but I want to load it with a different max_position_embeddings value, and currently I can’t.

stderr: RuntimeError: Error(s) in loading state_dict for M2M100ForConditionalGeneration:
stderr:         size mismatch for model.encoder.embed_positions.weights: copying a param with shape torch.Size([22, 16]) from checkpoint, the shape in current model is torch.Size([514, 16]).
stderr:         size mismatch for model.decoder.embed_positions.weights: copying a param with shape torch.Size([22, 16]) from checkpoint, the shape in current model is torch.Size([514, 16]).
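
For reference, a rough sketch of the kind of round trip that ends up here (the config values are chosen to mirror the shapes in the error above; whether the positional weights land in the saved file at all depends on the ignore-on-save patterns actually matching, which is part of the problem):

    from transformers import M2M100Config, M2M100ForConditionalGeneration

    # save a tiny model with one max_position_embeddings value ...
    cfg_small = M2M100Config(
        max_position_embeddings=20, d_model=16, vocab_size=128,
        encoder_layers=1, decoder_layers=1,
        encoder_attention_heads=2, decoder_attention_heads=2,
        encoder_ffn_dim=32, decoder_ffn_dim=32,
    )
    M2M100ForConditionalGeneration(cfg_small).save_pretrained("tiny-m2m100")

    # ... then try to load it into a model built with a different value
    cfg_big = M2M100Config.from_pretrained("tiny-m2m100", max_position_embeddings=512)
    model = M2M100ForConditionalGeneration.from_pretrained("tiny-m2m100", config=cfg_big)
    # -> RuntimeError: size mismatch for model.encoder.embed_positions.weights: ...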

Either the current logic needs to be further refined, or we need a new key such as _keys_to_ignore_on_load_always?

The current logic is here: https://github.com/huggingface/transformers/blob/69233cf03be5fbce0492f3997e139c4d05499e27/src/transformers/modeling_utils.py#L1921-L1964

It’s easy to see how it fails if set(expected_keys) == set(loaded_keys), which is the case in this situation:

https://github.com/huggingface/transformers/blob/69233cf03be5fbce0492f3997e139c4d05499e27/src/transformers/modeling_utils.py#L1953-L1954
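
In other words, when the checkpoint contains exactly the keys the model expects, both difference sets come out empty and the ignore patterns never get a chance to act:

    # if the checkpoint holds exactly the keys the model expects,
    # both differences are empty, so nothing is ever filtered out
    expected_keys = {"model.encoder.embed_positions.weights", "lm_head.weight"}
    loaded_keys = set(expected_keys)

    missing_keys = list(expected_keys - loaded_keys)      # []
    unexpected_keys = list(loaded_keys - expected_keys)   # []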

I think the “bug” is here:

        expected_keys = list(model_state_dict.keys())

This list needs to be further processed to remove any keys matching _keys_to_ignore_on_save, since those are not expected in the checkpoint even though they are present in the model.

I think this logic is missing, and I propose to fix it with the following additional chunk (the first if below):

        # proposed addition: remove keys matching _keys_to_ignore_on_save from the
        # expected keys, since they are not supposed to be in the checkpoint even
        # though they exist in the model
        if cls._keys_to_ignore_on_save is not None:
            for pat in cls._keys_to_ignore_on_save:
                expected_keys = [k for k in expected_keys if re.search(pat, k) is None]

        # existing logic
        missing_keys = list(set(expected_keys) - set(loaded_keys))
        unexpected_keys = list(set(loaded_keys) - set(expected_keys))
  5. And the code never removes the unexpected_keys from the state_dict, so all of these still get loaded by _load_state_dict_into_model, which doesn’t receive a list of keys to load and loads everything in the state_dict; a sketch of the missing step follows below.
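
Something along these lines would be needed as well (just a sketch, not the actual code; state_dict and unexpected_keys are the locals from the same function):

    # drop the keys we decided not to load, so that _load_state_dict_into_model,
    # which loads everything it is given, never sees them
    for key in unexpected_keys:
        state_dict.pop(key, None)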

If I piled up too many issues together, please let me know and I will split them up; they all just seem to be interconnected.

Thank you!

@LysandreJik, @sgugger, @patrickvonplaten, @patil-suraj

Top GitHub Comments

sgugger commented, Jun 15, 2022

The ones that have clear regex patterns should be escaped and use \. ; for the ones that only use strings, I think it’s okay to just leave the dot as is.

stas00 commented, Jun 15, 2022

Thank you!

OK, I will make a PR then.
