past_key_values not accepted in generate with GPTNeoX

System Info

Python 3.7.13, transformers 4.22.2

Who can help?

@LysandreJik @patrickvonplaten


  • The official example scripts
  • My own modified scripts


  • An officially supported task in the examples folder (such as GLUE/SQuAD, …)
  • My own task or dataset (give details below)


The past_key_values kwarg is not accepted when calling model.generate(..., past_key_values=pkv) on a GPTNeoXForCausalLM, even though model.forward accepts this kwarg. The same call seems to work fine with other model classes such as GPT2.

Minimal example to reproduce error:

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch
import transformers

model_id = "NinedayWang/PolyCoder-160M" # small model with GPTNeoXForCausalLM class
model = AutoModelForCausalLM.from_pretrained(model_id)
tok = AutoTokenizer.from_pretrained(model_id)
assert isinstance(model, transformers.models.gpt_neox.modeling_gpt_neox.GPTNeoXForCausalLM)
# dummy cache tensor; only the keyword name matters for the error below
pkv = torch.rand(
        1,      # batch size
        10,    # number of tokens
        2 * model.config.num_hidden_layers,
        model.config.hidden_size // model.config.num_attention_heads
)
out = model.generate(**tok("Hello world"), past_key_values=pkv)

Error message:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/st/st_us-052400/st_st175337/conda/envs/thesis/lib/python3.7/site-packages/torch/autograd/", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/home/st/st_us-052400/st_st175337/conda/envs/thesis/lib/python3.7/site-packages/transformers/", line 1146, in generate
  File "/home/st/st_us-052400/st_st175337/conda/envs/thesis/lib/python3.7/site-packages/transformers/", line 862, in _validate_model_kwargs
    f"The following `model_kwargs` are not used by the model: {unused_model_args} (note: typos in the"
ValueError: The following `model_kwargs` are not used by the model: ['past_key_values'] (note: typos in the generate arguments will also show up in this list)

I checked the error location and located the bug (“transformers/”, line 862, in _validate_model_kwargs):

        unused_model_args = []
        model_args = set(inspect.signature(self.prepare_inputs_for_generation).parameters)
        # `kwargs` if often used to handle optional forward pass inputs like `attention_mask`. If
        # `prepare_inputs_for_generation` doesn't accept `kwargs`, then a stricter check can be made ;)
        if "kwargs" in model_args:
            model_args |= set(inspect.signature(self.forward).parameters)
        for key, value in model_kwargs.items():
            if value is not None and key not in model_args:
                unused_model_args.append(key)

        if unused_model_args:
            raise ValueError(
                f"The following `model_kwargs` are not used by the model: {unused_model_args} (note: typos in the"
                " generate arguments will also show up in this list)"
            )

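The method first collects the parameter names of prepare_inputs_for_generation, and only adds the parameters of forward to the accepted list if one of those names is literally "kwargs". In GPT2, the catch-all parameter of prepare_inputs_for_generation is named kwargs, so the forward arguments are accepted. In GPTNeoX it is named model_kwargs instead, so the check fails and past_key_values is rejected.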
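The name-based check can be reproduced without loading a model. Below is a minimal sketch that uses two hypothetical stand-in classes (Gpt2Like and NeoXLike) whose signatures only mimic the naming difference described above; accepted_args is an illustrative helper that mirrors the quoted logic, not a transformers function.

```python
import inspect


class Gpt2Like:
    """Hypothetical stand-in: the catch-all parameter is named `kwargs`."""

    def prepare_inputs_for_generation(self, input_ids, past=None, **kwargs):
        return {"input_ids": input_ids}

    def forward(self, input_ids=None, past_key_values=None, attention_mask=None):
        return None


class NeoXLike:
    """Hypothetical stand-in: the catch-all parameter is named `model_kwargs`."""

    def prepare_inputs_for_generation(self, input_ids, past=None, **model_kwargs):
        return {"input_ids": input_ids}

    def forward(self, input_ids=None, past_key_values=None, attention_mask=None):
        return None


def accepted_args(model):
    """Mirror the quoted check: forward's parameters count only if a
    parameter literally named "kwargs" exists."""
    model_args = set(inspect.signature(model.prepare_inputs_for_generation).parameters)
    if "kwargs" in model_args:
        model_args |= set(inspect.signature(model.forward).parameters)
    return model_args


print("past_key_values" in accepted_args(Gpt2Like()))  # True
print("past_key_values" in accepted_args(NeoXLike()))  # False
```

Both stand-ins have an identical forward signature; only the name of the catch-all parameter differs, which is enough to flip the result.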

So either the GPTNeoX class should be adapted, or the _validate_model_kwargs method in

Expected behavior

generate should be able to pass along all valid model_kwargs
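One way the validator side could meet this expectation is to detect a catch-all parameter by its kind instead of its name. The following is only a sketch under that assumption: find_unused_kwargs, prepare, and forward are hypothetical names for illustration and are not part of transformers.

```python
import inspect


def find_unused_kwargs(prepare_fn, forward_fn, model_kwargs):
    """Return the keys in `model_kwargs` that neither function can receive."""
    prepare_params = inspect.signature(prepare_fn).parameters
    accepted = set(prepare_params)
    # Detect a `**anything` catch-all by parameter kind rather than by name.
    has_var_keyword = any(
        p.kind is inspect.Parameter.VAR_KEYWORD for p in prepare_params.values()
    )
    if has_var_keyword:
        accepted |= set(inspect.signature(forward_fn).parameters)
    return [k for k, v in model_kwargs.items() if v is not None and k not in accepted]


def prepare(input_ids, past=None, **model_kwargs):  # hypothetical signature
    return {"input_ids": input_ids}


def forward(input_ids=None, past_key_values=None):  # hypothetical signature
    return None


print(find_unused_kwargs(prepare, forward, {"past_key_values": object()}))  # []
print(find_unused_kwargs(prepare, forward, {"past_key_valeus": object()}))  # ['past_key_valeus']
```

Typos are still reported, because a misspelled key appears in neither signature. Note that passing validation does not by itself guarantee that prepare_inputs_for_generation forwards the value to forward; that would still need to be handled in the model class.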

Issue Analytics

  • State: open
  • Created 10 months ago
  • Comments: 6 (4 by maintainers)

Top GitHub Comments

patrickvonplaten commented, Dec 12, 2022

@gante @ArthurZucker I think we should rename all occurrences of "past" to "past_key_values" in prepare_inputs_for_generation and deprecate "past" if necessary.

"past" was simply the name for the past key values states before we renamed everything to past_key_values, so this is just a left-over.
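A rename with a deprecation period, as suggested here, could look roughly like the sketch below. It is a hypothetical standalone function written for illustration, not the change that was made in transformers: the new name is the real parameter, and the old name is still accepted through the catch-all with a warning.

```python
import warnings


def prepare_inputs_for_generation(input_ids, past_key_values=None, **kwargs):
    """Hypothetical sketch: accept the legacy `past` name during a transition."""
    if "past" in kwargs:
        legacy = kwargs.pop("past")
        warnings.warn(
            "`past` is deprecated; use `past_key_values` instead.",
            FutureWarning,
            stacklevel=2,
        )
        # The new name wins if both are given.
        if past_key_values is None:
            past_key_values = legacy
    return {"input_ids": input_ids, "past_key_values": past_key_values, **kwargs}


inputs = prepare_inputs_for_generation([1, 2, 3], past="cached-states")
print(inputs["past_key_values"])  # cached-states
```

Because the catch-all here is named kwargs, this signature would also pass the name-based check in _validate_model_kwargs quoted earlier.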

ArthurZucker commented, Dec 12, 2022

