[TensorFlow] Failed to get reproducible training runs with albumentations included in the data pipeline

🐛 Bug

I could not get my training to run in a reproducible way once albumentations was added to the data pipeline. I followed this thread https://github.com/albumentations-team/albumentations/issues/93 and fixed all possible seeds. Overall, the snippet that should have made my experiments reproducible looks like this:

import os
import random

import numpy as np
import tensorflow as tf

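def set_random_seed(seed: int = 42):
    """
    Globally fix all possible sources of randomness to keep experiments reproducible.
    """
    random.seed(seed)  # Python's built-in global RNG
    np.random.seed(seed)  # NumPy's global RNG
    tf.random.set_seed(seed)  # TensorFlow's global seed
    # PYTHONHASHSEED is read at interpreter startup, so setting it here only
    # affects Python processes launched after this call
    os.environ['PYTHONHASHSEED'] = str(seed)
    # Ask TensorFlow for deterministic op implementations (incl. cuDNN)
    os.environ['TF_DETERMINISTIC_OPS'] = '1'
    os.environ['TF_CUDNN_DETERMINISTIC'] = '1'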
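
There is one caveat about the PYTHONHASHSEED line in set_random_seed. Python reads that variable only when the interpreter starts, so assigning it through os.environ inside a running process does not change how that process hashes strings. It only applies to child interpreters started afterwards. The stdlib-only check below shows this. The helper name `hash_in_subprocess` is hypothetical and illustrative, not part of the project.

```python
import os
import subprocess
import sys


def hash_in_subprocess(text: str, hash_seed: str) -> str:
    """Return hash(text) as computed by a fresh interpreter started with PYTHONHASHSEED=hash_seed."""
    env = dict(os.environ, PYTHONHASHSEED=hash_seed)
    result = subprocess.run(
        [sys.executable, "-c", f"print(hash({text!r}))"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    # Two fresh interpreters started with the same hash seed agree on str hashes
    first = hash_in_subprocess("rock-paper-scissors", "42")
    second = hash_in_subprocess("rock-paper-scissors", "42")
    print(first == second)  # True
```

If hash-dependent behavior matters, the usual approach is to export PYTHONHASHSEED in the shell before running `python train.py`.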

Unfortunately, this doesn’t make my results reproducible. I ran the training process 6 times and got a different result each time. You can see the whole picture in W&B:

https://wandb.ai/roma-glushko/rock-paper-scissors/runs/2bdgnbwx (best_val_acc: 0.7104, best_epoch: 3)
https://wandb.ai/roma-glushko/rock-paper-scissors/runs/2qo9pbls (best_val_acc: 0.7875, best_epoch: 8)
https://wandb.ai/roma-glushko/rock-paper-scissors/runs/uf6cknge (best_val_acc: 0.6771, best_epoch: 8)
https://wandb.ai/roma-glushko/rock-paper-scissors/runs/tem3umbx (best_val_acc: 0.7729, best_epoch: 6)
https://wandb.ai/roma-glushko/rock-paper-scissors/runs/czsjm7px (best_val_acc: 0.7208, best_epochs: 0 and 8)
https://wandb.ai/roma-glushko/rock-paper-scissors/runs/29dif98z (best_val_acc: 0.8, best_epoch: 9)

Mean: 0.74478
Std: 0.044726
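
The summary numbers can be recomputed from the six best_val_acc values listed above. The reported Std matches the population standard deviation (`statistics.pstdev`). The sample standard deviation (`statistics.stdev`) of the same six runs is about 0.049.

```python
from statistics import mean, pstdev, stdev

# best_val_acc of the six W&B runs listed above
best_val_acc = [0.7104, 0.7875, 0.6771, 0.7729, 0.7208, 0.8]

print(f"Mean: {mean(best_val_acc):.5f}")                # Mean: 0.74478
print(f"Std (population): {pstdev(best_val_acc):.6f}")  # Std (population): 0.044726
print(f"Std (sample): {stdev(best_val_acc):.3f}")       # Std (sample): 0.049
```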

I also tried calling random.seed() right before passing each batch into the a.Compose() pipeline. That did not help either.
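
One plausible explanation, not confirmed in this thread, rests on two assumptions. First, albumentations appears to draw its random parameters from process-wide state (Python's `random` module, and partly NumPy's global RNG). Second, `.map(..., num_parallel_calls=AUTOTUNE)` can invoke the `tf.numpy_function` callback from several threads at once. Under those assumptions, seeding once, or even right before `Compose`, fixes the overall sequence of random draws. It does not fix which batch receives which draws, because thread scheduling differs between runs. The stdlib-only sketch below shows one workaround under those assumptions: reseed from the item's index and draw under a lock, so the result no longer depends on scheduling. The `augment` stand-in, the lock and the seed scheme are hypothetical, not albumentations code.

```python
import random
import threading
from concurrent.futures import ThreadPoolExecutor

# One lock so that "reseed, then draw" is atomic with respect to other threads
_rng_lock = threading.Lock()


def augment(item_index: int, base_seed: int = 42) -> list:
    """Hypothetical stand-in for an augmentation that uses the global `random` module.

    Reseeding from (base_seed, item_index) ties each item's random parameters
    to the item itself instead of to the order in which threads happen to run.
    """
    with _rng_lock:
        # Combine base seed and item index into one integer seed (arbitrary scheme)
        random.seed(base_seed * 1_000_003 + item_index)
        # e.g. flip decision, brightness shift, number of dropout holes
        return [random.random() < 0.5, random.uniform(-0.2, 0.2), random.randint(10, 20)]


def run_pipeline(num_items: int, workers: int) -> list:
    """Augment items 0..num_items-1 on a thread pool; results come back in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(augment, range(num_items)))


if __name__ == "__main__":
    sequential = run_pipeline(100, workers=1)
    parallel = run_pipeline(100, workers=8)
    print(sequential == parallel)  # True
```

In a tf.data pipeline, the per-batch index could come from `dataset.enumerate()`. The locked body would then call `random.seed(...)` and `np.random.seed(...)` before invoking the `a.Compose` pipeline. That wiring is an untested assumption. A simpler option is `num_parallel_calls=1` in the `.map(...)` call, which serializes the callbacks at the cost of throughput. The thread does not confirm whether either change fully restores reproducibility here.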

However, when I comment albumentations out of my data pipeline or replace it with pure TF augmentations, my training becomes reproducible.

Any clues about what’s wrong here?

To Reproduce

Steps to reproduce the behavior:

Clone the project at the 0.1.0-bugrep tag:

git clone --depth 1 --branch 0.1.0-bugrep https://github.com/roma-glushko/rock-paper-scissor

Download the dataset:

cd data
kaggle datasets download --unzip frtgnn/rock-paper-scissor

Install the project dependencies:

poetry install

Uncomment any of the augmentations in the config file (they are all commented out in the repository): https://github.com/roma-glushko/rock-paper-scissor/blob/master/configs/basic_config.py
Run training a couple of times; the results differ considerably between runs:

python train.py

Expected behavior

To run experiments that analyze the impact of different ideas and changes, I need my training process to be reproducible.
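
The expected behavior can be checked mechanically: run the same seeded procedure twice and compare a fingerprint of its outputs. The stdlib-only sketch below is hypothetical. The names `fingerprint`, `is_reproducible`, `seeded_run` and `unseeded_run` are illustrative. In this project, `run` would presumably wrap a full training run (returning, say, per-epoch validation accuracy), with set_random_seed called in place of `random.seed`. A bit-exact comparison like this is strict. As one maintainer notes further down, GPU algorithms and hardware can still introduce small run-to-run differences.

```python
import hashlib
import os
import random
from typing import Callable


def fingerprint(outputs: object) -> str:
    """SHA-256 digest of repr(outputs), e.g. a list of per-epoch validation accuracies."""
    return hashlib.sha256(repr(outputs).encode("utf-8")).hexdigest()


def is_reproducible(run: Callable[[], object], seed: int = 42, repeats: int = 2) -> bool:
    """Call `run` `repeats` times, reseeding before each call, and compare fingerprints."""
    digests = set()
    for _ in range(repeats):
        random.seed(seed)  # in the real project: set_random_seed(seed) instead
        digests.add(fingerprint(run()))
    return len(digests) == 1


def seeded_run() -> list:
    # Depends only on the global `random` state, so reseeding pins it down
    return [round(random.random(), 4) for _ in range(5)]


def unseeded_run() -> bytes:
    # Reads OS entropy, which no seed controls
    return os.urandom(8)


if __name__ == "__main__":
    print(is_reproducible(seeded_run))    # True
    print(is_reproducible(unseeded_run))  # False
```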

Environment

Albumentations version (e.g., 0.1.8): 0.5.2
Python version (e.g., 3.7): 3.8.6
OS (e.g., Linux): Ubuntu 20.10
How you installed albumentations (conda, pip, source): poetry (pip-like)
tensorflow-gpu: 2.5.0 (chosen for compatibility with the RTX 3070, Ampere architecture)

Additional context

This issue reproduces in the same project that is mentioned in https://github.com/albumentations-team/albumentations/issues/905

The data pipeline is the same for both issues:

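from functools import partial
from typing import Tuple

import albumentations as a
import tensorflow as tf
from tensorflow.keras.preprocessing import image_dataset_from_directory

AUTOTUNE = tf.data.experimental.AUTOTUNE


def augment_image(inputs, labels, augmentation_pipeline: a.Compose):
    def apply_augmentation(images):
        # `images` is a whole batch as a NumPy array; Compose is called once per batch with it as `image`
        aug_data = augmentation_pipeline(image=images.astype('uint8'))
        return aug_data['image']

    # Run the albumentations pipeline as plain Python/NumPy code inside the tf.data graph
    inputs = tf.numpy_function(func=apply_augmentation, inp=[inputs], Tout=tf.uint8)

    return inputs, labels


def get_dataset(
        dataset_path: str,
        subset_type: str,  # 'training' or 'validation'
        augmentation_pipeline: a.Compose,
        validation_fraction: float = 0.2,
        batch_size: int = 32,
        image_size: Tuple[int, int] = (300, 300),
        seed: int = 42
) -> tf.data.Dataset:
    augmentation_func = partial(
        augment_image,
        augmentation_pipeline=augmentation_pipeline,
    )

    # class_names is not defined in this snippet; it is assumed to be a
    # module-level list of class folder names in the original project
    dataset = image_dataset_from_directory(
        dataset_path,
        subset=subset_type,
        class_names=class_names,
        validation_split=validation_fraction,
        image_size=image_size,
        batch_size=batch_size,
        seed=seed,
    )

    # Batches are augmented with parallel map calls (AUTOTUNE), then prefetched
    return dataset \
        .map(augmentation_func, num_parallel_calls=AUTOTUNE) \
        .prefetch(AUTOTUNE)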

Issue Analytics

Created: 2 years ago
Comments: 19 (3 by maintainers)

Top GitHub Comments

Dipet commented, May 31, 2021 (1 reaction):

Looks good. I think the remaining differences are caused by the nondeterminism of the algorithms and hardware.

BloodAxe commented, May 27, 2021 (1 reaction):

Hmm. All of a sudden, this issue looks more interesting than it did at the beginning.

On Thu, May 27, 2021 at 11:57, Roman Glushko @.***> wrote:

@Dipet sure, all tests were performed with the following augmentation pipeline configuration:

args['train_augmentation'] = a.Compose([
    a.VerticalFlip(),
    a.HorizontalFlip(),
    a.RandomBrightnessContrast(brightness_limit=0.2, contrast_limit=0.1, brightness_by_max=False),
    a.CoarseDropout(max_holes=20, max_height=8, max_width=8, min_holes=10, min_height=8, min_width=8),
    a.GaussNoise(p=1.0, var_limit=(10.0, 50.0)),
])
args['validation_augmentation'] = a.Compose([])

I kept the validation step augmentation-free, as @BloodAxe suggested above.

