tfa.activations.mish doesn't work in Keras
System information
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Windows 10
- TensorFlow version and how it was installed (source or binary): 2.1
- TensorFlow-Addons version and how it was installed (source or binary): 7.1, pip installed
- Python version: 3.7
- Is GPU used? (yes/no): yes
Describe the bug
When using tfa.activations.mish in Keras, training halts at the beginning.
Train for 353 steps, validate for 40 steps
Learning rate: 0.001 Epoch 1/60 10/353 […] - ETA: 2:58:34 - loss: 8.9578 - dense_1_loss: 4.9835 - dense_2_loss: 2.2109 - dense_3_loss: 1.7634 - dense_1_accuracy: 0.0195 - dense_2_accuracy: 0.1937 - dense_3_accuracy: 0.4625
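For reference, a minimal sketch that isolates tfa.activations.mish as a Keras activation (the toy model, layer sizes, and random data here are illustrative assumptions, not part of the original report; it assumes TF 2.1 and a matching tensorflow-addons build). The full ResNet reproduction follows under "Code to reproduce the issue" below.

# Minimal sketch (assumed setup, not from the original report): pass the
# tfa.activations.mish callable directly to an Activation layer and train
# on random data.
import numpy as np
import tensorflow as tf
import tensorflow_addons as tfa

model = tf.keras.Sequential([
    tf.keras.layers.Dense(32, input_shape=(16,)),
    tf.keras.layers.Activation(tfa.activations.mish),  # callable activation
    tf.keras.layers.Dense(10, activation='softmax'),
])
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy')

x = np.random.rand(256, 16).astype('float32')
y = np.random.randint(0, 10, size=(256,))
model.fit(x, y, epochs=1, batch_size=32)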
Code to reproduce the issue
import tensorflow as tf
from tensorflow import keras
import tensorflow_addons as tfa
from tensorflow.keras.layers import (Dense, Conv2D, Flatten, MaxPool2D, Dropout,
                                     BatchNormalization, Input, Activation,
                                     AveragePooling2D)
from tensorflow.keras.models import Model
from tensorflow.keras.utils import get_custom_objects
from tensorflow.keras.regularizers import l2
from tensorflow.keras.optimizers import Adam
class Mish(Activation):
    def __init__(self, activation, **kwargs):
        super(Mish, self).__init__(activation, **kwargs)
        self.__name__ = 'Mish'

get_custom_objects().update({'Mish': Mish(tfa.activations.mish)})
def resnet_layer(inputs,
                 num_filters=16,
                 kernel_size=3,
                 strides=1,
                 activation='Mish',
                 batch_normalization=True,
                 conv_first=True):
    """2D Convolution-Batch Normalization-Activation stack builder

    # Arguments
        inputs (tensor): input tensor from input image or previous layer
        num_filters (int): Conv2D number of filters
        kernel_size (int): Conv2D square kernel dimensions
        strides (int): Conv2D square stride dimensions
        activation (string): activation name
        batch_normalization (bool): whether to include batch normalization
        conv_first (bool): conv-bn-activation (True) or
            bn-activation-conv (False)

    # Returns
        x (tensor): tensor as input to the next layer
    """
    conv = Conv2D(num_filters,
                  kernel_size=kernel_size,
                  strides=strides,
                  padding='same',
                  kernel_initializer='he_normal',
                  kernel_regularizer=None)  # l2(1e-4)  # change to weight decay

    x = inputs
    if conv_first:
        x = conv(x)
        if batch_normalization:
            x = BatchNormalization()(x)
        if activation is not None:
            x = Activation(activation)(x)
    else:
        if batch_normalization:
            x = BatchNormalization()(x)
        if activation is not None:
            x = Activation(activation)(x)
        x = conv(x)
    return x
def resnet_v2(input_shape, depth, num_classes=10):
    """ResNet Version 2 Model builder [b]

    Stacks of (1 x 1)-(3 x 3)-(1 x 1) BN-ReLU-Conv2D, also known as
    bottleneck layers.
    The first shortcut connection per layer is a 1 x 1 Conv2D.
    Second and onwards shortcut connections are identity.
    At the beginning of each stage, the feature map size is halved (downsampled)
    by a convolutional layer with strides=2, while the number of filter maps is
    doubled. Within each stage, the layers have the same number of filters and
    the same feature map sizes.
    Feature map sizes:
    conv1  : 32x32,  16
    stage 0: 32x32,  64
    stage 1: 16x16, 128
    stage 2:  8x8,  256

    # Arguments
        input_shape (tensor): shape of input image tensor
        depth (int): number of core convolutional layers
        num_classes (int): number of classes (CIFAR10 has 10)

    # Returns
        model (Model): Keras model instance
    """
    if (depth - 2) % 9 != 0:
        raise ValueError('depth should be 9n+2 (e.g. 56 or 110 in [b])')
    # Start model definition.
    num_filters_in = 32
    num_res_blocks = int((depth - 2) / 9)

    inputs = Input(shape=input_shape)
    # v2 performs Conv2D with BN-ReLU on input before splitting into 2 paths
    x = resnet_layer(inputs=inputs,
                     num_filters=num_filters_in,
                     conv_first=True,
                     kernel_size=5,
                     strides=2)

    # Instantiate the stack of residual units
    for stage in range(3):
        for res_block in range(num_res_blocks):
            activation = 'Mish'
            batch_normalization = True
            strides = 1
            if stage == 0:
                num_filters_out = num_filters_in * 4
                if res_block == 0:  # first layer and first stage
                    activation = None
                    batch_normalization = False
            else:
                num_filters_out = num_filters_in * 2
                if res_block == 0:  # first layer but not first stage
                    strides = 2     # downsample

            # bottleneck residual unit
            y = resnet_layer(inputs=x,
                             num_filters=num_filters_in,
                             kernel_size=1,
                             strides=strides,
                             activation=activation,
                             batch_normalization=batch_normalization,
                             conv_first=False)
            y = resnet_layer(inputs=y,
                             num_filters=num_filters_in,
                             conv_first=False)
            y = resnet_layer(inputs=y,
                             num_filters=num_filters_out,
                             kernel_size=1,
                             conv_first=False)
            if res_block == 0:
                # linear projection residual shortcut connection to match
                # changed dims
                x = resnet_layer(inputs=x,
                                 num_filters=num_filters_out,
                                 kernel_size=1,
                                 strides=strides,
                                 activation=None,
                                 batch_normalization=False)
            x = keras.layers.add([x, y])

        num_filters_in = num_filters_out

    # Add classifier on top.
    # v2 has BN-ReLU before Pooling
    x = BatchNormalization()(x)
    x = Activation('Mish')(x)
    x = AveragePooling2D(pool_size=8)(x)
    # x = keras.layers.GlobalAveragePooling2D()(x)
    y = Flatten()(x)
    y = Dense(512, activation='Mish', kernel_initializer='he_normal')(y)
    out = Dense(168, activation='softmax', kernel_initializer='he_normal',
                dtype='float32', name='dense_1')(y)

    # Instantiate model.
    model = Model(inputs=inputs, outputs=out)
    return model
# Model parameter
# ----------------------------------------------------------------------------
# | | 200-epoch | Orig Paper| 200-epoch | Orig Paper| sec/epoch
# Model | n | ResNet v1 | ResNet v1 | ResNet v2 | ResNet v2 | GTX1080Ti
# |v1(v2)| %Accuracy | %Accuracy | %Accuracy | %Accuracy | v1 (v2)
# ----------------------------------------------------------------------------
# ResNet20 | 3 (2)| 92.16 | 91.25 | ----- | ----- | 35 (---)
# ResNet32 | 5(NA)| 92.46 | 92.49 | NA | NA | 50 ( NA)
# ResNet44 | 7(NA)| 92.50 | 92.83 | NA | NA | 70 ( NA)
# ResNet56 | 9 (6)| 92.71 | 93.03 | 93.01 | NA | 90 (100)
# ResNet110 |18(12)| 92.65 | 93.39+-.16 | 93.15 | 93.63 | 165(180)
# ResNet164 |27(18)| ----- | 94.07 | ----- | 94.54 | ---(---)
# ResNet1001| (111)| ----- | 92.39 | ----- | 95.08+-.14| ---(---)
# ---------------------------------------------------------------------------
n = 2

# Model version
# Orig paper: version = 1 (ResNet v1), Improved ResNet: version = 2 (ResNet v2)
version = 2

# Computed depth from supplied model parameter n
depth = n * 9 + 2
model_type = 'ResNet%dv%d' % (depth, version)

# IMG_SIZE and N_CHANNELS are defined earlier in the notebook (not shown in this issue)
input_shape = [IMG_SIZE, IMG_SIZE, N_CHANNELS]

model = resnet_v2(input_shape=input_shape, depth=depth)
Other info / logs
When using Activation('Addons>mish') I have the same problem: training halts at the beginning.
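For reference, the registered-name path mentioned above looks roughly like the sketch below (the layer sizes are illustrative placeholders, not taken from the model in this issue; it assumes the installed Addons version registers mish under the 'Addons>mish' key, as the report implies).

import tensorflow as tf
import tensorflow_addons as tfa  # importing tfa registers its 'Addons>...' custom objects

# Sketch of the registered-name usage referred to above.
inputs = tf.keras.Input(shape=(16,))
x = tf.keras.layers.Dense(32)(inputs)
x = tf.keras.layers.Activation('Addons>mish')(x)  # resolved via Keras custom objects
outputs = tf.keras.layers.Dense(10, activation='softmax')(x)
model = tf.keras.Model(inputs, outputs)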
Hi @willbattel, no, it is still available but the implementation is going to be pure python ops (probably in Addons 0.12).
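For context, mish(x) = x * tanh(softplus(x)), so a pure-Python-ops version would look roughly like the sketch below (an illustration of the idea, not the actual Addons implementation).

import tensorflow as tf

def mish_py(x):
    """Mish written with plain TensorFlow ops: x * tanh(softplus(x))."""
    x = tf.convert_to_tensor(x)
    return x * tf.math.tanh(tf.math.softplus(x))

# Example: use the plain-ops version as a drop-in Keras activation.
layer = tf.keras.layers.Activation(mish_py)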
I mean the program code. 😄 Although remote access to an isolated virtual machine with the full-blown model deployed would be even better.