
tfa.activations.mish doesn't work in Keras

See original GitHub issue

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Windows 10
  • TensorFlow version and how it was installed (source or binary): 2.1
  • TensorFlow-Addons version and how it was installed (source or binary): 7.1, installed via pip
  • Python version: 3.7
  • Is GPU used? (yes/no): yes

Describe the bug

When using tfa.activations.mish in Keras, training halts at the beginning.

Train for 353 steps, validate for 40 steps

Learning rate: 0.001
Epoch 1/60
 10/353 […] - ETA: 2:58:34 - loss: 8.9578 - dense_1_loss: 4.9835 - dense_2_loss: 2.2109 - dense_3_loss: 1.7634 - dense_1_accuracy: 0.0195 - dense_2_accuracy: 0.1937 - dense_3_accuracy: 0.4625

Code to reproduce the issue

import tensorflow as tf
from tensorflow import keras
import tensorflow_addons as tfa
from tensorflow.keras.models import Model
from tensorflow.keras.layers import (Dense, Conv2D, Flatten, MaxPool2D, Dropout,
                                     BatchNormalization, Input, Activation,
                                     AveragePooling2D)
from tensorflow.keras.regularizers import l2
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.utils import get_custom_objects

class Mish(Activation):
    """Thin Activation wrapper so layers can reference the activation by name."""

    def __init__(self, activation, **kwargs):
        super(Mish, self).__init__(activation, **kwargs)
        self.__name__ = 'Mish'

# Register the wrapper so activation='Mish' resolves throughout the model.
get_custom_objects().update({'Mish': Mish(tfa.activations.mish)})

def resnet_layer(inputs,
                 num_filters=16,
                 kernel_size=3,
                 strides=1,
                 activation='Mish',
                 batch_normalization=True,
                 conv_first=True):
    """2D Convolution-Batch Normalization-Activation stack builder

    # Arguments
        inputs (tensor): input tensor from input image or previous layer
        num_filters (int): Conv2D number of filters
        kernel_size (int): Conv2D square kernel dimensions
        strides (int): Conv2D square stride dimensions
        activation (string): activation name
        batch_normalization (bool): whether to include batch normalization
        conv_first (bool): conv-bn-activation (True) or
            bn-activation-conv (False)

    # Returns
        x (tensor): tensor as input to the next layer
    """
    conv = Conv2D(num_filters,
                  kernel_size=kernel_size,
                  strides=strides,
                  padding='same',
                  kernel_initializer='he_normal',
                  kernel_regularizer=None)  # was l2(1e-4); change to weight decay

    x = inputs
    if conv_first:
        x = conv(x)
        if batch_normalization:
            x = BatchNormalization()(x)
        if activation is not None:
            x = Activation(activation)(x)
    else:
        if batch_normalization:
            x = BatchNormalization()(x)
        if activation is not None:
            x = Activation(activation)(x)
        x = conv(x)
    return x

def resnet_v2(input_shape, depth, num_classes=10):
    """ResNet Version 2 Model builder [b]

    Stacks of (1 x 1)-(3 x 3)-(1 x 1) BN-ReLU-Conv2D or also known as
    bottleneck layer
    First shortcut connection per layer is 1 x 1 Conv2D.
    Second and onwards shortcut connection is identity.
    At the beginning of each stage, the feature map size is halved (downsampled)
    by a convolutional layer with strides=2, while the number of filter maps is
    doubled. Within each stage, the layers have the same number of filters and
    the same feature map sizes.
    Feature map sizes:
    conv1  : 32x32,  16
    stage 0: 32x32,  64
    stage 1: 16x16, 128
    stage 2:  8x8,  256

    # Arguments
        input_shape (tensor): shape of input image tensor
        depth (int): number of core convolutional layers
        num_classes (int): number of classes (CIFAR10 has 10)

    # Returns
        model (Model): Keras model instance
    """
    if (depth - 2) % 9 != 0:
        raise ValueError('depth should be 9n+2 (e.g. 56 or 110 in [b])')
    # Start model definition.
    num_filters_in = 32
    num_res_blocks = int((depth - 2) / 9)

    inputs = Input(shape=input_shape)
    # v2 performs Conv2D with BN-ReLU on input before splitting into 2 paths
    x = resnet_layer(inputs=inputs,
                     num_filters=num_filters_in,
                     conv_first=True,
                     kernel_size=5,
                     strides=2)

    # Instantiate the stack of residual units
    for stage in range(3):
        for res_block in range(num_res_blocks):
            activation = 'Mish'
            batch_normalization = True
            strides = 1
            if stage == 0:
                num_filters_out = num_filters_in * 4
                if res_block == 0:  # first layer and first stage
                    activation = None
                    batch_normalization = False
            else:
                num_filters_out = num_filters_in * 2
                if res_block == 0:  # first layer but not first stage
                    strides = 2    # downsample

            # bottleneck residual unit
            y = resnet_layer(inputs=x,
                             num_filters=num_filters_in,
                             kernel_size=1,
                             strides=strides,
                             activation=activation,
                             batch_normalization=batch_normalization,
                             conv_first=False)
            y = resnet_layer(inputs=y,
                             num_filters=num_filters_in,
                             conv_first=False)
            y = resnet_layer(inputs=y,
                             num_filters=num_filters_out,
                             kernel_size=1,
                             conv_first=False)
            if res_block == 0:
                # linear projection residual shortcut connection to match
                # changed dims
                x = resnet_layer(inputs=x,
                                 num_filters=num_filters_out,
                                 kernel_size=1,
                                 strides=strides,
                                 activation=None,
                                 batch_normalization=False)
            x = keras.layers.add([x, y])

        num_filters_in = num_filters_out

    # Add classifier on top.
    # v2 has BN-ReLU before Pooling
    x = BatchNormalization()(x)
    x = Activation('Mish')(x)
    x = AveragePooling2D(pool_size=8)(x)
    #x = keras.layers.GlobalAveragePooling2D()(x)
    y = Flatten()(x)
    y = Dense(512, activation='Mish', kernel_initializer='he_normal')(y)

    out = Dense(168, activation='softmax', kernel_initializer='he_normal',
                dtype='float32', name='dense_1')(y)

    # Instantiate model.
    model = Model(inputs=inputs, outputs=out)
    return model

# Model parameter
# ----------------------------------------------------------------------------
#           |      | 200-epoch | Orig Paper| 200-epoch | Orig Paper| sec/epoch
# Model     |  n   | ResNet v1 | ResNet v1 | ResNet v2 | ResNet v2 | GTX1080Ti
#           |v1(v2)| %Accuracy | %Accuracy | %Accuracy | %Accuracy | v1 (v2)
# ----------------------------------------------------------------------------
# ResNet20  | 3 (2)| 92.16     | 91.25     | -----     | -----     | 35 (---)
# ResNet32  | 5(NA)| 92.46     | 92.49     | NA        | NA        | 50 ( NA)
# ResNet44  | 7(NA)| 92.50     | 92.83     | NA        | NA        | 70 ( NA)
# ResNet56  | 9 (6)| 92.71     | 93.03     | 93.01     | NA        | 90 (100)
# ResNet110 |18(12)| 92.65     | 93.39+-.16| 93.15     | 93.63     | 165(180)
# ResNet164 |27(18)| -----     | 94.07     | -----     | 94.54     | ---(---)
# ResNet1001| (111)| -----     | 92.39     | -----     | 95.08+-.14| ---(---)
# ---------------------------------------------------------------------------
n = 2

# Model version
# Orig paper: version = 1 (ResNet v1), Improved ResNet: version = 2 (ResNet v2)
version = 2

# Computed depth from supplied model parameter n
# IMG_SIZE and N_CHANNELS must be defined to match the dataset
# (e.g. 32 and 3 for CIFAR-10).
input_shape = [IMG_SIZE, IMG_SIZE, N_CHANNELS]

depth = n * 9 + 2
model_type = 'ResNet%dv%d' % (depth, version)
model = resnet_v2(input_shape=input_shape, depth=depth)
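For comparison, Keras also accepts the addon activation as a plain callable, which avoids the custom Activation subclass and registry step entirely. A minimal sketch (the model shape and layer sizes are illustrative, not from the issue):

import tensorflow as tf
import tensorflow_addons as tfa

# Passing the callable directly sidesteps get_custom_objects() registration.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation=tfa.activations.mish, input_shape=(16,)),
    tf.keras.layers.Dense(10, activation='softmax'),
])
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy')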

Other info / logs

When using Activation('Addons>mish'), I have the same problem: training halts at the beginning.
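For context, the 'Addons>mish' string resolves through the Keras custom-object registry that tensorflow_addons populates on import. A minimal sketch of that usage, assuming a tfa version that registers the 'Addons>' namespace:

import tensorflow as tf
import tensorflow_addons as tfa  # importing tfa registers 'Addons>mish' as a Keras custom object

inputs = tf.keras.Input(shape=(8,))
outputs = tf.keras.layers.Activation('Addons>mish')(inputs)  # string lookup via the registry
model = tf.keras.Model(inputs, outputs)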

Issue Analytics

  • State: closed
  • Created: 4 years ago
  • Comments: 16 (7 by maintainers)

Top GitHub Comments

1 reaction
WindQAQ commented, Jul 20, 2020

Hey @seanpmorgan can you link that RFC when it’s available?

Also, just for absolute clarity: what you’re saying is that Mish is not yet available for usage with Keras. Is that correct?

Thanks.

Hi @willbattel, no, it is still available, but the implementation is going to be pure Python ops (probably in Addons 0.12).
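For clarity, a pure-Python-ops version composes mish from standard TensorFlow ops instead of a custom compiled kernel. A minimal sketch (the standalone function is illustrative, not the Addons API):

import tensorflow as tf

def mish(x):
    # mish(x) = x * tanh(softplus(x)), built only from standard TF ops,
    # so it needs no custom compiled kernel.
    return x * tf.math.tanh(tf.math.softplus(x))

# Usable anywhere Keras accepts a callable activation:
layer = tf.keras.layers.Dense(64, activation=mish)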

1 reaction
failure-to-thrive commented, Feb 13, 2020

Sorry for the delay; the full code uses 10 GB of data, so I tried to make a CIFAR-10 case with my model. Will be back soon.

I mean the program code. 😄 Although remote access to an isolated virtual machine with the full-blown model deployed would be even better.


Top Results From Across the Web

  • Mish Activation function is not correctly displayed in Model summary: "as you can see, yes, the Mish is defined as tanh(softplus(x)), but this completely messes up with the Keras functionality! Does anyone know..."
  • tfa.activations.mish | TensorFlow Addons: "Computes mish activation: mish(x) = x ⋅ tanh(softplus(x)). See Mish: A Self Regularized Non-Monotonic Neural Activation..."
  • bifpn5 inff | Kaggle: "import numpy as np from keras.models import Sequential import keras import tensorflow as tf ... alpha=0.1) # conv = tfa.activations.mish(conv) # conv =..."
  • Activation Functions (updated) – The Code-It List: "It is useless as an activation function since we do not have 'small changes' ... y = tf.keras.activations.sigmoid(X) ... tfa.activations.mish."
  • Layer activation functions - Keras: "Activations can either be used through an Activation layer, or through the ... import layers from tensorflow.keras import activations model.add(layers."
