
ConvNeXt not compatible with mixed precision

See original GitHub issue

System information.

  • Have I written custom code (as opposed to using a stock example script provided in Keras): Yes
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 20.04
  • TensorFlow installed from (source or binary): nightly
  • TensorFlow version (use command below): 2.11.0a20220816
  • Python version: 3.8.10
  • GPU model and memory: NVIDIA Quadro RTX 6000 (24 GB VRAM)
  • Do you want to contribute a PR? (yes/no): No

Describe the problem. Starting from this issue, I observed that ConvNeXt was not compatible with TimeDistributed; this was then fixed in the nightly release (see here). With that working, I then tried to use mixed precision, where I got a new error. Note that MobileNetV3 works seamlessly with mixed precision, so I suspect only ConvNeXt is affected, but I am not sure.

I believe the model itself works fine with mixed precision, but it contains a custom layer, LayerScale, which may not (see the logs below for details).
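
For illustration, the mismatch can be reproduced in isolation (a minimal sketch of my own, not taken from the issue): under the mixed_float16 policy, layer inputs arrive as float16, while a plain tf.Variable defaults to float32, and TensorFlow does not promote dtypes across a multiply.

import tensorflow as tf

x = tf.ones((1, 4), dtype=tf.float16)  # stands in for the layer input
gamma = tf.Variable(tf.ones((4,)))     # float32, like LayerScale's gamma
y = x * gamma                          # raises the Mul dtype-mismatch error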

Describe the expected behavior. Mixed precision should work seamlessly with ConvNeXt.

Standalone code to reproduce the issue. The failure happened while initializing the ConvNeXt model after mixed precision was enabled, so running the snippet below should reproduce it (note that the logs further down are not from this exact script, but the error should be the same):

import tensorflow as tf
from tensorflow.keras.applications import ConvNeXtSmall

# Set the policy before building the model so every layer picks up
# mixed_float16 as its compute dtype.
tf.keras.mixed_precision.set_global_policy('mixed_float16')

# Fails during graph construction, inside the LayerScale layer.
model = ConvNeXtSmall(include_top=False, weights="imagenet", pooling="none")
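
Once the layer is fixed, a quick check along these lines (a sketch of my own, not part of the original report) would confirm that the backbone really computes in float16:

import numpy as np
import tensorflow as tf
from tensorflow.keras.applications import ConvNeXtSmall

tf.keras.mixed_precision.set_global_policy('mixed_float16')
model = ConvNeXtSmall(include_top=False, weights="imagenet", pooling=None)

# Layers compute in float16 under mixed_float16, so the feature map
# coming out of the backbone should be float16 as well.
features = model(np.zeros((1, 224, 224, 3), dtype="float32"))
print(features.dtype)  # expected: float16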

Source code / logs.

Traceback (most recent call last):
  File "source/main.py", line 454, in <module>
    main()
  File "source/main.py", line 216, in main
    model = get_classifier_architecture(MODEL_ARCH=ret.arch, ret=ret, instance_size=instance_size,
  File "/home/andrep/workspace/bcgrade/source/models/classifiers.py", line 371, in get_classifier_architecture
    shared_base_model = ConvNeXtSmall(include_top=False, weights="imagenet", pooling="none", input_shape=instance_size[1:])
  File "/home/andrep/workspace/bcgrade/venv/lib/python3.8/site-packages/keras/applications/convnext.py", line 610, in ConvNeXtSmall
    return ConvNeXt(
  File "/home/andrep/workspace/bcgrade/venv/lib/python3.8/site-packages/keras/applications/convnext.py", line 516, in ConvNeXt
    x = ConvNeXtBlock(
  File "/home/andrep/workspace/bcgrade/venv/lib/python3.8/site-packages/keras/applications/convnext.py", line 283, in apply
    x = LayerScale(
  File "/home/andrep/workspace/bcgrade/venv/lib/python3.8/site-packages/keras/utils/traceback_utils.py", line 70, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "/home/andrep/workspace/bcgrade/venv/lib/python3.8/site-packages/tensorflow/python/framework/op_def_library.py", line 588, in _ExtractInputsAndAttrs
    raise TypeError(
TypeError: Exception encountered when calling layer "convnext_small_stage_0_block_0_layer_scale" (type LayerScale).

Input 'y' of 'Mul' Op has type float32 that does not match type float16 of argument 'x'.

Call arguments received by layer "convnext_small_stage_0_block_0_layer_scale" (type LayerScale):
  • x=tf.Tensor(shape=(None, None, None, 96), dtype=float16)

Issue Analytics

  • State: open
  • Created: a year ago
  • Comments: 5

Top GitHub Comments

1 reaction
zibbini commented, Oct 4, 2022

I was able to fix this by adding casts to the appropriate dtype in the build and call functions (lines 219-225) of the custom LayerScale layer, as follows:

import tensorflow as tf
from tensorflow.keras import layers


class LayerScale(layers.Layer):
    """Layer scale module.

    References:
      - https://arxiv.org/abs/2103.17239

    Args:
      init_values (float): Initial value for layer scale. Should be within
        [0, 1].
      projection_dim (int): Projection dimensionality.

    Returns:
      Tensor multiplied to the scale.
    """

    def __init__(self, init_values, projection_dim, **kwargs):
        super().__init__(**kwargs)
        self.init_values = init_values
        self.projection_dim = projection_dim

    def build(self, input_shape):
        self.gamma = tf.Variable(
            self.init_values * tf.ones((self.projection_dim,))
        )
        # Cast gamma to the layer's compute dtype (float16 under mixed
        # precision) so the multiply in call() sees matching dtypes.
        if self.gamma.dtype.base_dtype != self._compute_dtype_object.base_dtype:
            self.gamma = tf.cast(self.gamma, dtype=self._compute_dtype_object)

    def call(self, x):
        # Make sure the input matches the compute dtype as well.
        if x.dtype.base_dtype != self._compute_dtype_object.base_dtype:
            x = tf.cast(x, dtype=self._compute_dtype_object)
        return x * self.gamma

    def get_config(self):
        config = super().get_config()
        config.update(
            {
                "init_values": self.init_values,
                "projection_dim": self.projection_dim,
            }
        )
        return config
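
Note that casting in build replaces self.gamma with a plain tensor, so the scale would no longer be trainable. An alternative sketch (my own illustration, not from the thread) creates gamma through self.add_weight, which Keras auto-casts to the compute dtype on access while keeping it a trainable float32 variable:

import tensorflow as tf
from tensorflow.keras import initializers, layers


class LayerScale(layers.Layer):
    """Layer scale module (alternative sketch using add_weight)."""

    def __init__(self, init_values, projection_dim, **kwargs):
        super().__init__(**kwargs)
        self.init_values = init_values
        self.projection_dim = projection_dim

    def build(self, input_shape):
        # add_weight stores the variable in float32 but auto-casts it to
        # the layer's compute dtype when read inside call(), so the
        # multiply works under mixed_float16 and gamma stays trainable.
        self.gamma = self.add_weight(
            name="gamma",
            shape=(self.projection_dim,),
            initializer=initializers.Constant(self.init_values),
            trainable=True,
        )

    def call(self, x):
        return x * self.gamma

    def get_config(self):
        config = super().get_config()
        config.update(
            {
                "init_values": self.init_values,
                "projection_dim": self.projection_dim,
            }
        )
        return config
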
1 reaction
tilakrayal commented, Aug 18, 2022

@gowthamkpr, I was able to reproduce the issue on TensorFlow v2.8, v2.9, and nightly. Kindly find the gist of it here.

Read more comments on GitHub >

Top Results From Across the Web

ConvNeXt/TRAINING.md at main - GitHub
You may need to change cluster-specific arguments in run_with_submitit.py. You can add --use_amp true to train in PyTorch's Automatic Mixed Precision (AMP)....
Read more >
How To Fit a Bigger Model and Train It Faster - Hugging Face
The idea of mixed precision training is that not all variables need to be stored in full (32-bit) floating point precision. If we...
Read more >
Create the correct variable dtype on custom layer when using ...
The model has no issues on creation and can be trained with full precision. Nevertheless, when attempting a mixed precision training, ...
Read more >
ConvNeXt Tiny, Small, Base, Large, XLarge - Keras
To get a sense of how these parameters were converted to Keras compatible ... only to be specified if include_top is True, and...
Read more >
ConvNeXt: A ConvNet for the 2020s | Paper Explained
ConvNeXt: A ConvNet for the 2020s | Paper Explained ... Guide To Scaling ML Models - Megatron-LM | ZeRO | DeepSpeed |...
Read more >
