
All Flan-T5 model configs use the incorrect activation function

See original GitHub issue

System Info

The configs for all of the Flan-T5 models say that the activation function is ‘gelu’, and yet ‘is_gated_act’ is set to true. This is an inherent contradiction.

Doing more digging, I realized that, per Google’s original Flan-T5 checkpoints, Flan-T5 is directly instantiated from the T5v1.1 LM-adapt checkpoints, all of which use gated-gelu.
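
You can see the contradiction directly by pulling a released config and inspecting the relevant fields. A minimal check (assuming Hugging Face Hub access; google/flan-t5-base is just one example checkpoint):

from transformers import T5Config

config = T5Config.from_pretrained("google/flan-t5-base")
print(config.feed_forward_proj)  # "gelu" at the time of this report
print(config.is_gated_act)       # True, contradicting the line above
print(config.dense_act_fn)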

Who can help?

@younesbelkada @arthur

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, …)
  • My own task or dataset (give details below)

Reproduction

Compare the T5v1.1+LM configs to the Flan-T5 configs.
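
A quick way to do that comparison (the checkpoint names below are illustrative; any matching size pair works):

from transformers import T5Config

# Flan-T5 was initialized from the LM-adapted T5 v1.1 checkpoints
lm_adapt = T5Config.from_pretrained("google/t5-base-lm-adapt")
flan = T5Config.from_pretrained("google/flan-t5-base")

for name, cfg in [("t5-base-lm-adapt", lm_adapt), ("flan-t5-base", flan)]:
    print(name, cfg.feed_forward_proj, cfg.is_gated_act)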

Expected behavior

“feed_forward_proj” should be “gated-gelu” and “dense_act_fn” is redundant and should be removed entirely from the config.
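
To illustrate the redundancy: T5Config parses feed_forward_proj itself, deriving both dense_act_fn and is_gated_act from it, so a separate dense_act_fn entry adds nothing. A minimal sketch (the exact derived value may differ across transformers versions):

from transformers import T5Config

cfg = T5Config(feed_forward_proj="gated-gelu")
print(cfg.is_gated_act)   # True, parsed from the "gated-" prefix
print(cfg.dense_act_fn)   # derived automatically from feed_forward_proj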

Issue Analytics

  • State: closed
  • Created: 10 months ago
  • Comments: 5 (2 by maintainers)

Top GitHub Comments

1 reaction
LiJunnan1992 commented, Nov 17, 2022

@younesbelkada I see. Thanks so much for the explanation!

0 reactions
younesbelkada commented, Nov 17, 2022

@LiJunnan1992 No, it gets overridden by the kwargs here, check this snippet:

from transformers import T5Config

# an explicit is_gated_act kwarg overrides the value derived from feed_forward_proj
config_gated = T5Config(is_gated_act=True, hidden_act="gelu")
print(config_gated.is_gated_act)
# True

# without the kwarg, the default feed_forward_proj ("relu") gives a non-gated act
config_gated = T5Config(hidden_act="gelu")
print(config_gated.is_gated_act)
# False

# a "gated-" prefix on feed_forward_proj also sets is_gated_act
config_gated = T5Config(feed_forward_proj="gated-gelu")
print(config_gated.is_gated_act)
# True
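
In other words, an explicit is_gated_act value in the kwargs (including one stored in a checkpoint’s config.json) takes precedence over the default derived from feed_forward_proj, which is presumably why the released Flan-T5 models still run the gated activation despite the misleading field.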
