Generate: deprecate the use of model `config` as a source of defaults


EDIT: Updated with the discussion up to 2022/08/20

Why?

A confusing part of generate is how the defaults are set. When a certain argument is not specified, we attempt to fetch it from the model config file. This makes generate unpredictable and hard to fully document (the default values change for each model), as well as a major source of issues 🔪

How?

We have the following requirements:

1️⃣ The existing behavior can’t be removed, i.e., we must be able to use the model config.json as a source of generation parameters by default;
2️⃣ We do need per-model defaults – some models are designed to do a certain thing (e.g. summarization), which requires a specific generation configuration;
3️⃣ Users must have full control over generate, with minimal hidden behavior.

Ideally, we also want to:

4️⃣ Have separation of concerns and use a new generate_config.json to parameterize generation.

The TL;DR of the plan is to change the paradigm from “non-specified generate arguments are overridden by the [model] configuration file” to “generate arguments will override the [generate] configuration file, which is always used”. With proper documentation changes and logging/warnings, the user will be aware of what’s being set for generate.

Step 1: Define a new generate config file and class

Similar to the model config, we want a .json file to store the generation defaults. The class itself can be a very simplified version of PretrainedConfig, also with functionality to load/store from the hub.
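
A minimal sketch of what such a class could look like, assuming a plain dataclass that round-trips through JSON. The name GenerateConfig, its fields, and the save/load methods are hypothetical and chosen for illustration only; hub upload/download is left out.

```python
import json
import os
import tempfile
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class GenerateConfig:
    """Hypothetical container for generation defaults (not an existing library class)."""

    # Illustrative fields and default values only.
    max_length: int = 20
    top_k: Optional[int] = None
    temperature: Optional[float] = None

    def save(self, path: str) -> None:
        # Store every field as a flat JSON object.
        with open(path, "w", encoding="utf-8") as f:
            json.dump(asdict(self), f, indent=2, sort_keys=True)

    @classmethod
    def load(cls, path: str) -> "GenerateConfig":
        # Unknown keys raise a TypeError, which surfaces typos early.
        with open(path, encoding="utf-8") as f:
            return cls(**json.load(f))


# Round-trip through a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "generate_config.json")
    GenerateConfig(max_length=64, top_k=50).save(path)
    reloaded = GenerateConfig.load(path)
```

A class (rather than a bare dict) makes it possible to reject misspelled parameter names at load time, which is one of the correctness checks discussed in the comments.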

Step 2: Integrate loading generate config file in `.from_pretrained()`

The generation configuration file should be loaded when initializing the model with a from_pretrained() method. A couple of things to keep in mind:

There will be a new kwarg in from_pretrained, generate_config (or generation_config? Leaning toward the former as it has the same name as the function);
It will default to generate_config.json (unlike the model config, which defaults to None). This will allow users to set this argument to None, to load a model with an empty generate config. Some users have requested a feature like this;
Because the argument can take a path, it means that users can store/load multiple generate configs if they wish to do so (e.g. to use the same model for summarization, creative generation, factual question-answering, etc) 🚀
Only models that can run generate will attempt to load it;
If there is no generate_config.json in the repo, it will attempt to initialize the generate configuration from the model config.json. This means that this solution will not change any generate behavior and will NOT need a major release 👼
To keep the user in the loop, log ALL parameters set when loading the generation config file. Something like the snippet below.
Because this happens at from_pretrained() time, logging will only happen at most once and will not be verbose.

`facebook/opt-1.3b` generate configuration loaded from `generate_config.json`. The following generation defaults were set:
- max_length: 20
- foo: bar
- baz: qux
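
The loading order described in this step could be sketched roughly as follows. This is not the library's implementation: load_generate_defaults, GENERATION_KEYS, and the logger name are hypothetical, and the sketch works on a local directory instead of the hub.

```python
import json
import logging
import os
import tempfile

logger = logging.getLogger("generate_config_sketch")

# Hypothetical subset of config.json keys that count as generation parameters.
GENERATION_KEYS = ("max_length", "top_k", "temperature")


def load_generate_defaults(model_dir, generate_config="generate_config.json"):
    """Return (defaults, source) following the fallback order described above."""
    if generate_config is None:
        # The user explicitly asked for an empty generate config.
        return {}, None

    path = os.path.join(model_dir, generate_config)
    if os.path.isfile(path):
        with open(path, encoding="utf-8") as f:
            defaults, source = json.load(f), generate_config
    else:
        # No dedicated file: fall back to the generation entries of config.json.
        with open(os.path.join(model_dir, "config.json"), encoding="utf-8") as f:
            model_config = json.load(f)
        defaults = {k: model_config[k] for k in GENERATION_KEYS if k in model_config}
        source = "config.json"

    # Log every default once, at load time.
    lines = "\n".join(f"- {k}: {v}" for k, v in defaults.items())
    logger.info(
        "Generate configuration loaded from `%s`. "
        "The following generation defaults were set:\n%s",
        source,
        lines,
    )
    return defaults, source


# Demo: a directory that only contains a model config.json.
with tempfile.TemporaryDirectory() as model_dir:
    with open(os.path.join(model_dir, "config.json"), "w", encoding="utf-8") as f:
        json.dump({"hidden_size": 8, "max_length": 20}, f)
    fallback = load_generate_defaults(model_dir)
    empty = load_generate_defaults(model_dir, generate_config=None)
```

Because the second argument is just a file name or path, the same function can load alternative generate configs stored next to the default one.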

Step 3: Generate uses the generate config class internally

Instead of using the configuration to override arguments when they are not set, overwrite a copy of the generation config at generate time. I.e. instead of:

arg = arg if arg is not None else self.config.arg
...

we would do:

generate_config = self.generate_config.copy()
if arg is not None:
    generate_config.arg = arg
...
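
To make the copy-then-overwrite idea concrete, here is a small self-contained sketch. GenerateConfig and ToyModel are hypothetical stand-ins, and generate() only returns the resolved config instead of running any decoding. It shows that arguments passed in one call do not leak into the next one.

```python
import copy
from dataclasses import dataclass
from typing import Optional


@dataclass
class GenerateConfig:
    """Hypothetical stand-in for the generate config class."""

    max_length: int = 20
    top_k: Optional[int] = None
    temperature: Optional[float] = None


class ToyModel:
    """Hypothetical model that only resolves generation parameters."""

    def __init__(self, generate_config):
        self.generate_config = generate_config

    def generate(self, **kwargs):
        # Work on a copy so per-call arguments never modify the stored defaults.
        generate_config = copy.copy(self.generate_config)
        for name, value in kwargs.items():
            if not hasattr(generate_config, name):
                raise TypeError(f"unknown generate argument: {name}")
            if value is not None:
                setattr(generate_config, name, value)
        # A real generate() would now run decoding with this config.
        return generate_config


model = ToyModel(GenerateConfig(top_k=50))
first = model.generate(top_k=5)
second = model.generate(temperature=0.7)
```

In the second call, top_k is again the stored default rather than the value passed to the first call.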

This change has three main benefits:

We can improve the readability of the code, as we gain the ability to pass configs around. E.g. this function won’t need to take a large list of arguments nor to bother with their initialization.
Argument validation for each type of generation can be built in simple functions that don’t need ~30 arguments as input 🙃
The three frameworks (PT/TF/FLAX) can share functionality like argument validation, decreasing maintenance burden.
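
As an illustration of the validation benefit, a check that takes a single config object might look like the sketch below. The class, the function name, and the specific rules are assumptions made for this example; they are not the checks the library performs.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GenerateConfig:
    """Hypothetical stand-in for the generate config class."""

    max_length: int = 20
    top_k: Optional[int] = None
    temperature: Optional[float] = None


def validate_generate_config(cfg):
    """Return a list of problems; pure Python, so any framework could reuse it."""
    problems = []
    if cfg.max_length <= 0:
        problems.append("max_length must be a positive integer")
    if cfg.top_k is not None and cfg.top_k <= 0:
        problems.append("top_k must be a positive integer")
    if cfg.temperature is not None and cfg.temperature <= 0:
        problems.append("temperature must be strictly positive")
    return problems


ok = validate_generate_config(GenerateConfig(top_k=50, temperature=0.7))
bad = validate_generate_config(GenerateConfig(top_k=0, temperature=0.0))
```

The function has one parameter instead of one per generation argument, and it contains no tensor code, so nothing ties it to a particular framework.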

Step 4: Document and open PRs with the generation config file

Rewrite part of the documentation to explain that a generation config is ALWAYS used (regardless of having defaults loaded from the hub or not). Open Hub PRs to move generate-specific parameters from config.json to generate_config.json.

Pros/Cons

Pros:

Better awareness – any generate default will be logged to the screen when loading a generate-compatible model;
Full control – the users can choose NOT to load generation parameters or easily load a set of options from an arbitrary file;
Enables more readable generate code;
Enables sharing generate-related code across frameworks;
Doesn’t need a major release.

Cons:

Pulling the generate parameters into their own files won’t happen everywhere, as merging the changes described in step 4 is not feasible for all models (e.g. due to unresponsive model owners);
Logging loaded defaults may not be enough to stop issues related to the default values, as the logs can be ignored;
Another config file (and related code) to maintain.

Top GitHub Comments

2 reactions

sgugger commented, Aug 31, 2022

The plan looks good to me, but the devil will be in the details 😉 Looking forward to the PRs actioning this!

2 reactions

gante commented, Aug 17, 2022

@patrickvonplaten Agreed, the argument name is a bit too long 😅 However, if we decide to go the GenerationMixin.__init__ route, we can’t pick config – PreTrainedModel, which inherits from GenerationMixin, uses a config argument for the model config. Perhaps generation_config? We could then do .from_pretrained(foo, generation_config=bar).

I love the ideas you gave around the config:

If it is part of the __init__ and we always attempt to load the new file format before falling back to the original config, it actually means we don’t need to do a major release to build the final version of this updated configuration handling! No need to change defaults with a new release at all ❤️ ;
The idea of “arguments write into a config that is always used” as opposed to “config is used when no arguments are passed” is much clearer to explain. We gain the ability to pass config files around (as opposed to tens of arguments), and it also opens the door to exporting generation configurations;
Despite the above, we need to be careful with the overwrites: if a user calls model.generate(top_k=top_k) and then model.generate(temperature=temperature), top_k should be the original config’s top_k. Copies of objects are needed;
Agreed, having all downloads/file paths in the same place is helpful.

Regarding dict vs class – I’d go with class (or perhaps a simpler dataclass). Much easier to document and enforce correctness, e.g. check if the right arguments are being used with a certain generation type.

It seems like we are in agreement. Are there more issues we can anticipate?
