
All Flan-T5 model configs use the incorrect activation function

See original GitHub issue

System Info

The configs for all of the Flan-T5 models say that the activation function is ‘gelu’, and yet ‘is_gated_act’ is set to true. This is an inherent contradiction.

Doing more digging, I realized that, per Google’s original Flan-T5 checkpoints, Flan-T5 is directly instantiated from the T5v1.1 LM-adapt checkpoints, all of which use gated-gelu.
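
You can see the contradiction directly by pulling a released config and inspecting the relevant fields. A minimal check (assuming Hugging Face Hub access; google/flan-t5-base is just one example checkpoint):

from transformers import T5Config

config = T5Config.from_pretrained("google/flan-t5-base")
print(config.feed_forward_proj)  # "gelu" at the time of this report
print(config.is_gated_act)       # True, contradicting the line above
print(config.dense_act_fn)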

Who can help?

@younesbelkada @arthur

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, …)
  • My own task or dataset (give details below)

Reproduction

Compare the T5v1.1+LM configs to the Flan-T5 configs.
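
A quick way to do that comparison (the checkpoint names below are illustrative; any matching size pair works):

from transformers import T5Config

# Flan-T5 was initialized from the LM-adapted T5 v1.1 checkpoints
lm_adapt = T5Config.from_pretrained("google/t5-base-lm-adapt")
flan = T5Config.from_pretrained("google/flan-t5-base")

for name, cfg in [("t5-base-lm-adapt", lm_adapt), ("flan-t5-base", flan)]:
    print(name, cfg.feed_forward_proj, cfg.is_gated_act)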

Expected behavior

“feed_forward_proj” should be “gated-gelu” and “dense_act_fn” is redundant and should be removed entirely from the config.
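
To illustrate the redundancy: T5Config parses feed_forward_proj itself, deriving both dense_act_fn and is_gated_act from it, so a separate dense_act_fn entry adds nothing. A minimal sketch (the exact derived value may differ across transformers versions):

from transformers import T5Config

cfg = T5Config(feed_forward_proj="gated-gelu")
print(cfg.is_gated_act)   # True, parsed from the "gated-" prefix
print(cfg.dense_act_fn)   # derived automatically from feed_forward_proj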

Issue Analytics

  • State: closed
  • Created: 10 months ago
  • Comments: 5 (2 by maintainers)

Top GitHub Comments

1 reaction
LiJunnan1992 commented, Nov 17, 2022

@younesbelkada I see. Thanks so much for the explanation!

0 reactions
younesbelkada commented, Nov 17, 2022

@LiJunnan1992 No, it gets overridden by the kwargs here, check this snippet:

from transformers import T5Config

# an explicit is_gated_act kwarg overrides the value derived from feed_forward_proj
config_gated = T5Config(is_gated_act=True, hidden_act="gelu")
print(config_gated.is_gated_act)
# True

# without the kwarg, the default feed_forward_proj ("relu") gives a non-gated act
config_gated = T5Config(hidden_act="gelu")
print(config_gated.is_gated_act)
# False

# a "gated-" prefix on feed_forward_proj also sets is_gated_act
config_gated = T5Config(feed_forward_proj="gated-gelu")
print(config_gated.is_gated_act)
# True
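
In other words, an explicit is_gated_act value in the kwargs (including one stored in a checkpoint’s config.json) takes precedence over the default derived from feed_forward_proj, which is presumably why the released Flan-T5 models still run the gated activation despite the misleading field.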
