
ConvNeXt not compatible with mixed precision

See original GitHub issue

System information.

  • Have I written custom code (as opposed to using a stock example script provided in Keras): Yes
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 20.04
  • TensorFlow installed from (source or binary): nightly
  • TensorFlow version (use command below): 2.11.0a20220816
  • Python version: 3.8.10
  • GPU model and memory: NVIDIA Quadro RTX 6000 (24 GB VRAM)
  • Do you want to contribute a PR? (yes/no): No

Describe the problem. Starting from this issue, I observed that ConvNeXt was not compatible with TimeDistributed; this was then fixed in the nightly release (see here). With that working, I then tried to use mixed precision, where I got a new error. Note that MobileNetV3 works seamlessly with mixed precision, so I suspect only ConvNeXt is affected, but I am not sure.

I believe the model itself works fine with mixed precision, but it contains a custom layer, LayerScale, which may not (see the logs below for details).
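
For illustration, the mismatch can be reproduced in isolation (a minimal sketch of my own, not taken from the issue): under the mixed_float16 policy, layer inputs arrive as float16, while a plain tf.Variable defaults to float32, and TensorFlow does not promote dtypes across a multiply.

import tensorflow as tf

x = tf.ones((1, 4), dtype=tf.float16)  # stands in for the layer input
gamma = tf.Variable(tf.ones((4,)))     # float32, like LayerScale's gamma
y = x * gamma                          # raises the Mul dtype-mismatch error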

Describe the expected behavior. Mixed precision should work seamlessly with ConvNeXt.

Standalone code to reproduce the issue. The failure happened while initializing the ConvNeXt model after mixed precision was enabled, so running the snippet below should reproduce it (note that the logs further down are not from this exact script, but the error should be the same):

import tensorflow as tf
from tensorflow.keras.applications import ConvNeXtSmall

# Set the policy before building the model so every layer picks up
# mixed_float16 as its compute dtype.
tf.keras.mixed_precision.set_global_policy('mixed_float16')

# Fails during graph construction, inside the LayerScale layer.
model = ConvNeXtSmall(include_top=False, weights="imagenet", pooling="none")
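
Once the layer is fixed, a quick check along these lines (a sketch of my own, not part of the original report) would confirm that the backbone really computes in float16:

import numpy as np
import tensorflow as tf
from tensorflow.keras.applications import ConvNeXtSmall

tf.keras.mixed_precision.set_global_policy('mixed_float16')
model = ConvNeXtSmall(include_top=False, weights="imagenet", pooling=None)

# Layers compute in float16 under mixed_float16, so the feature map
# coming out of the backbone should be float16 as well.
features = model(np.zeros((1, 224, 224, 3), dtype="float32"))
print(features.dtype)  # expected: float16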

Source code / logs.

Traceback (most recent call last):
  File "source/main.py", line 454, in <module>
    main()
  File "source/main.py", line 216, in main
    model = get_classifier_architecture(MODEL_ARCH=ret.arch, ret=ret, instance_size=instance_size,
  File "/home/andrep/workspace/bcgrade/source/models/classifiers.py", line 371, in get_classifier_architecture
    shared_base_model = ConvNeXtSmall(include_top=False, weights="imagenet", pooling="none", input_shape=instance_size[1:])
  File "/home/andrep/workspace/bcgrade/venv/lib/python3.8/site-packages/keras/applications/convnext.py", line 610, in ConvNeXtSmall
    return ConvNeXt(
  File "/home/andrep/workspace/bcgrade/venv/lib/python3.8/site-packages/keras/applications/convnext.py", line 516, in ConvNeXt
    x = ConvNeXtBlock(
  File "/home/andrep/workspace/bcgrade/venv/lib/python3.8/site-packages/keras/applications/convnext.py", line 283, in apply
    x = LayerScale(
  File "/home/andrep/workspace/bcgrade/venv/lib/python3.8/site-packages/keras/utils/traceback_utils.py", line 70, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "/home/andrep/workspace/bcgrade/venv/lib/python3.8/site-packages/tensorflow/python/framework/op_def_library.py", line 588, in _ExtractInputsAndAttrs
    raise TypeError(
TypeError: Exception encountered when calling layer "convnext_small_stage_0_block_0_layer_scale" (type LayerScale).

Input 'y' of 'Mul' Op has type float32 that does not match type float16 of argument 'x'.

Call arguments received by layer "convnext_small_stage_0_block_0_layer_scale" (type LayerScale):
  • x=tf.Tensor(shape=(None, None, None, 96), dtype=float16)

Issue Analytics

  • State: open
  • Created: a year ago
  • Comments: 5

Top GitHub Comments

1 reaction
zibbini commented, Oct 4, 2022

I was able to fix this by adding casts to the appropriate dtype in the build and call functions (lines 219-225) of the custom LayerScale layer, as follows:

import tensorflow as tf
from tensorflow.keras import layers


class LayerScale(layers.Layer):
    """Layer scale module.

    References:
      - https://arxiv.org/abs/2103.17239

    Args:
      init_values (float): Initial value for layer scale. Should be within
        [0, 1].
      projection_dim (int): Projection dimensionality.

    Returns:
      Tensor multiplied to the scale.
    """

    def __init__(self, init_values, projection_dim, **kwargs):
        super().__init__(**kwargs)
        self.init_values = init_values
        self.projection_dim = projection_dim

    def build(self, input_shape):
        self.gamma = tf.Variable(
            self.init_values * tf.ones((self.projection_dim,))
        )
        # Cast gamma to the layer's compute dtype (float16 under mixed
        # precision) so the multiply in call() sees matching dtypes.
        if self.gamma.dtype.base_dtype != self._compute_dtype_object.base_dtype:
            self.gamma = tf.cast(self.gamma, dtype=self._compute_dtype_object)

    def call(self, x):
        # Make sure the input matches the compute dtype as well.
        if x.dtype.base_dtype != self._compute_dtype_object.base_dtype:
            x = tf.cast(x, dtype=self._compute_dtype_object)
        return x * self.gamma

    def get_config(self):
        config = super().get_config()
        config.update(
            {
                "init_values": self.init_values,
                "projection_dim": self.projection_dim,
            }
        )
        return config
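
Note that casting in build replaces self.gamma with a plain tensor, so the scale would no longer be trainable. An alternative sketch (my own illustration, not from the thread) creates gamma through self.add_weight, which Keras auto-casts to the compute dtype on access while keeping it a trainable float32 variable:

import tensorflow as tf
from tensorflow.keras import initializers, layers


class LayerScale(layers.Layer):
    """Layer scale module (alternative sketch using add_weight)."""

    def __init__(self, init_values, projection_dim, **kwargs):
        super().__init__(**kwargs)
        self.init_values = init_values
        self.projection_dim = projection_dim

    def build(self, input_shape):
        # add_weight stores the variable in float32 but auto-casts it to
        # the layer's compute dtype when read inside call(), so the
        # multiply works under mixed_float16 and gamma stays trainable.
        self.gamma = self.add_weight(
            name="gamma",
            shape=(self.projection_dim,),
            initializer=initializers.Constant(self.init_values),
            trainable=True,
        )

    def call(self, x):
        return x * self.gamma

    def get_config(self):
        config = super().get_config()
        config.update(
            {
                "init_values": self.init_values,
                "projection_dim": self.projection_dim,
            }
        )
        return config
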
1 reaction
tilakrayal commented, Aug 18, 2022

@gowthamkpr, I was able to reproduce the issue on TensorFlow v2.8, v2.9, and nightly. Kindly find the gist of it here.

Read more comments on GitHub >

Top Results From Across the Web

ConvNeXt/TRAINING.md at main - GitHub
You may need to change cluster-specific arguments in run_with_submitit.py. You can add --use_amp true to train in PyTorch's Automatic Mixed Precision (AMP)....
Read more >
How To Fit a Bigger Model and Train It Faster - Hugging Face
The idea of mixed precision training is that not all variables need to be stored in full (32-bit) floating point precision. If we...
Read more >
Create the correct variable dtype on custom layer when using ...
The model has no issues on creation and can be trained with full precision. Nevertheless, when attempting a mixed precision training, ...
Read more >
ConvNeXt Tiny, Small, Base, Large, XLarge - Keras
To get a sense of how these parameters were converted to Keras compatible ... only to be specified if include_top is True, and...
Read more >
ConvNeXt: A ConvNet for the 2020s | Paper Explained
ConvNeXt: A ConvNet for the 2020s | Paper Explained ... Guide To Scaling ML Models - Megatron-LM | ZeRO | DeepSpeed |...
Read more >
