Stuck on an issue?

Lightrun Answers was designed to reduce the constant googling that comes with debugging third-party libraries. It collects links to all the places you might be looking while hunting down a tough bug.

And, if you’re still stuck at the end, we’re happy to hop on a call to see how we can help out.

`activation_dropout` in OPT is never used

See original GitHub issue

System Info

main

Who can help?

@patil-suraj, @patrickvonplaten, @LysandreJik

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, …)
  • My own task or dataset (give details below)

Reproduction

https://github.com/huggingface/transformers/blob/ee67e7ad4fd7a766891b68f708cf03e30f609976/src/transformers/models/opt/modeling_opt.py#L279

`activation_dropout` in modeling_opt.py is assigned but never used. A model would not behave as expected if one initializes it randomly while setting `activation_dropout` to a non-zero value.
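
A quick way to observe the symptom (a sketch, not from the original report): disable every dropout that does work in a small, randomly initialized OPT model, set `activation_dropout` high, and compare two training-mode forward passes. With the bug present they come out identical, because the value is never applied.

```python
# Sketch reproducing the symptom (assumes the state of main at the time
# of the report). With activation_dropout as the only non-zero dropout,
# two forward passes in training mode should differ -- but they match.
import torch
from transformers import OPTConfig, OPTModel

config = OPTConfig(
    hidden_size=32,
    num_hidden_layers=2,
    ffn_dim=64,
    num_attention_heads=4,
    dropout=0.0,             # disable the dropouts that are applied
    attention_dropout=0.0,
    layerdrop=0.0,
    activation_dropout=0.9,  # should randomize the FFN activations
)
model = OPTModel(config).train()  # dropout is only active in training mode

input_ids = torch.randint(0, config.vocab_size, (1, 8))
out1 = model(input_ids).last_hidden_state
out2 = model(input_ids).last_hidden_state

# True while the bug is present (activation_dropout silently ignored);
# False once the value is actually applied in the feed-forward block.
print(torch.allclose(out1, out2))
```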

Expected behavior

`activation_dropout` should either be used or removed.
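
For reference, here is a sketch of what the "used" option would look like (illustrative only, assuming the BART-style decoder layer structure that OPT follows; this is not the merged patch): activation dropout conventionally sits between the activation function and the second feed-forward projection.

```python
# Illustrative sketch (not the merged patch): a BART-style feed-forward
# block in which activation_dropout is applied right after the activation,
# which is what the config value suggests should happen.
import torch.nn as nn

class FeedForward(nn.Module):
    def __init__(self, hidden_size, ffn_dim, dropout, activation_dropout):
        super().__init__()
        self.fc1 = nn.Linear(hidden_size, ffn_dim)
        self.fc2 = nn.Linear(ffn_dim, hidden_size)
        self.activation_fn = nn.ReLU()  # OPT defaults to ReLU
        self.dropout = dropout
        self.activation_dropout = activation_dropout

    def forward(self, hidden_states):
        hidden_states = self.activation_fn(self.fc1(hidden_states))
        # The step missing from modeling_opt.py at the time of the report:
        hidden_states = nn.functional.dropout(
            hidden_states, p=self.activation_dropout, training=self.training
        )
        hidden_states = self.fc2(hidden_states)
        hidden_states = nn.functional.dropout(
            hidden_states, p=self.dropout, training=self.training
        )
        return hidden_states
```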

Issue Analytics

  • State: closed
  • Created: a year ago
  • Comments: 8 (7 by maintainers)

Top GitHub Comments

2 reactions
shijie-wu commented, Jul 28, 2022

I’m happy to contribute if removing is what we want 😊

1 reaction
ArthurZucker commented, Sep 12, 2022

Gonna merge it to main 🥳

Read more comments on GitHub >

Top Results From Across the Web

Dropout Regularization in Deep Learning Models with Keras
Dropout is only used during the training of a model and is not used when evaluating the skill of the model.
Read more >
Where should I place dropout layers in a neural network?
Dropout was used after the activation function of each convolutional layer: CONV->RELU->DROP. So should they be placed after all layers, or only the...
Read more >
Dropout behavior in Keras with rate=1 (dropping all input units ...
The Dropout layer simply doesn't do anything when rate is set to 1 (or 0, see here). I guess it's because the scaling...
Read more >
InvalidArgumentError: No OpKernel was registered to ... - GitHub
I am a macintosh user. Code: `import tensorflow as tf; from tensorflow.keras.models import Sequential; from tensorflow.keras.layers import Dense, Dropout, ...`
Read more >
Dropout and Batch Normalization - Kaggle
It seems that batch normalization can be used at almost any point in a network. You can put it after a layer... layers.Dense(16,...
Read more >
