
Verify Image Augmentation in Training Set


❓ What is the best way to verify or visualize that images from a custom data loader are being augmented (in Google Colab)?

Following this tutorial: https://detectron2.readthedocs.io/tutorials/data_loading.html, I wrote a custom mapper for the data loader. I am working with X-ray images, so my goal is to jitter the lighting parameters (contrast and brightness) to mimic the variation expected across X-rays. I wrote the mapper like so:

import copy
import torch
from detectron2.data import detection_utils as utils
from detectron2.data import transforms as T

def custom_mapper(dataset_dict):
    # Implement a mapper, similar to the default DatasetMapper, but with your own customizations
    dataset_dict = copy.deepcopy(dataset_dict)  # it will be modified by code below
    # X-rays are grayscale, so all three channels are identical and channel order does not
    # matter here; "BGR" matches the default cfg.INPUT.FORMAT
    image = utils.read_image(dataset_dict["file_name"], format="BGR")

    # Brightness and contrast adjustments (the part I modified)
    image, transforms = T.apply_transform_gens([T.RandomBrightness(0.9, 1.1), T.RandomContrast(0.9, 1.1)], image)

    # HWC numpy array -> CHW float tensor, as the model expects
    dataset_dict["image"] = torch.as_tensor(image.transpose(2, 0, 1).astype("float32"))

    # image is still HWC here, so (height, width) is image.shape[:2], not image.shape[1:]
    annos = [
        utils.transform_instance_annotations(obj, transforms, image.shape[:2])
        for obj in dataset_dict.pop("annotations")
        if obj.get("iscrowd", 0) == 0
    ]
    instances = utils.annotations_to_instances(annos, image.shape[:2])
    dataset_dict["instances"] = utils.filter_empty_instances(instances)
    return dataset_dict
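For reference, here is a NumPy sketch of the pixel arithmetic behind T.RandomBrightness(0.9, 1.1) and T.RandomContrast(0.9, 1.1), written from my reading of detectron2's source: both draw a weight w uniformly from the given range and apply a blend, with brightness computing w * image and contrast computing w * image + (1 - w) * mean(image), then clipping to 0..255 for uint8 input. This is a re-implementation for illustration, not detectron2's code, and the function names are my own. With factors between 0.9 and 1.1 the changes are subtle and easy to miss by eye, so comparing statistics across repeated applications of the same image is a more reliable check.

```python
import numpy as np


def random_brightness(img, low, high, rng):
    """w * img with w ~ U(low, high), i.e. a blend with a black image."""
    w = rng.uniform(low, high)
    out = w * img.astype(np.float32)
    return np.clip(out, 0, 255).astype(np.uint8), w


def random_contrast(img, low, high, rng):
    """w * img + (1 - w) * mean(img) with w ~ U(low, high): a blend with the mean intensity."""
    w = rng.uniform(low, high)
    f = img.astype(np.float32)
    out = w * f + (1.0 - w) * f.mean()
    return np.clip(out, 0, 255).astype(np.uint8), w


def jitter_lighting(img, rng, low=0.9, high=1.1):
    """Brightness then contrast, in the same order as the mapper's transform list."""
    img, _ = random_brightness(img, low, high, rng)
    img, _ = random_contrast(img, low, high, rng)
    return img


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical 8-bit grayscale "X-ray" replicated to 3 channels (HWC layout).
    gray = rng.integers(0, 256, size=(64, 64), dtype=np.uint8)
    image = np.repeat(gray[:, :, None], 3, axis=2)
    # If augmentation is active, repeated applications give different statistics.
    for _ in range(3):
        out = jitter_lighting(image, rng)
        print(f"mean={out.mean():.2f} std={out.std():.2f}")
```

The same idea carries over to the real mapper: calling it several times on one dataset dict and comparing the means of the resulting "image" tensors should show variation if the transforms are applied.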

After defining the mapper, I opened detectron2's defaults.py file and replaced line 430 with this:

return build_detection_train_loader(cfg, mapper=custom_mapper(cfg, True))

After saving this file, I built my cfg and ran the default training process:

trainer = DefaultTrainer(cfg)
trainer.resume_or_load(resume=False)
trainer.train()

I expected the training log to list the brightness and contrast augmentations under TransformGens, but it only shows ResizeShortestEdge and RandomFlip:

WARNING [02/03 03:14:26 d2.data.datasets.coco]: 
Category ids in annotations are not in [1, #categories]! We'll apply a mapping for you.

[02/03 03:14:26 d2.data.datasets.coco]: Loaded 213 images in COCO format from /gdrive/My Drive/Research/Rib Fracture/LDR1_TV/Annotations/Train.json
[02/03 03:14:26 d2.data.build]: Removed 0 images with no usable annotations. 213 images left.
[02/03 03:14:26 d2.data.detection_utils]: TransformGens used in training: [ResizeShortestEdge(short_edge_length=(640, 672, 704, 736, 768, 800), max_size=1333, sample_style='choice'), RandomFlip()]
[02/03 03:14:26 d2.data.build]: Using training sampler TrainingSampler

My first question: what am I doing wrong?

Second, how can I verify that my images are in fact being augmented? In the fast.ai course there is a built-in method that pulls training-set images through the augmentation pipeline and displays them. Is there something similar here?

Issue Analytics

Created 4 years ago
Comments: 9

Top GitHub Comments

4 reactions

ncate commented, Apr 7, 2020

Thanks for the info. I’m having a little trouble with the logger. I tried to do it the same way as the build_transform_gen function in detectron2, but it isn’t behaving as I expected.

I wrote a custom data mapper using this code:

import copy
import torch
from detectron2.data import detection_utils as utils
from detectron2.data import transforms as T

def custom_mapper(input_dict):
    dataset_dict = copy.deepcopy(input_dict)  # it will be modified by code below
    image = utils.read_image(dataset_dict["file_name"], format="BGR")
    transform_list = [T.Resize((1200, 1200)),
                      T.RandomFlip(prob=0.6, horizontal=True, vertical=False),
                      T.RandomFlip(prob=0.6, horizontal=False, vertical=True),
                      T.RandomContrast(0.7, 3.2),
                      T.RandomBrightness(0.6, 1.8),
                      ]
    # Transform the image and keep the returned transforms to reuse on the annotations
    image, transforms = T.apply_transform_gens(transform_list, image)
    # HWC numpy array -> CHW float tensor
    dataset_dict["image"] = torch.as_tensor(image.transpose(2, 0, 1).astype("float32"))
    # Apply the same resize/flips to the boxes; image.shape[:2] is (height, width)
    annos = [
        utils.transform_instance_annotations(obj, transforms, image.shape[:2])
        for obj in dataset_dict.pop("annotations")
    ]
    instances = utils.annotations_to_instances(annos, image.shape[:2])
    dataset_dict["instances"] = utils.filter_empty_instances(instances)
    return dataset_dict

However, when I run the training session I still get:

[04/07 19:07:44 d2.data.detection_utils]: TransformGens used in training: [ResizeShortestEdge(short_edge_length=(800,), max_size=1333, sample_style='choice'), RandomFlip()]

This makes it seem like the custom mapper isn’t being called. But as a test I put a print statement inside the custom mapper, and it prints during training, so the mapper does seem to be called. Then I thought the issue might be that I needed to update the log myself, so I added import logging to the top of the cell, followed by these lines:

logger = logging.getLogger(__name__)
logger.info("TransformGens used in training: " + str(transform_list))

But the log still says the only TransformGens applied are ResizeShortestEdge and RandomFlip. Am I updating the logger incorrectly? Where am I going wrong? Thanks
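A likely explanation for the silent logger, based on standard Python logging behavior: in a notebook cell, logging.getLogger(__name__) returns the logger named "__main__". A fresh logger has level NOTSET, so it defers to its ancestors and ultimately the root logger, whose default level is WARNING; unless something has configured root (for example logging.basicConfig), logger.info calls are usually discarded. Separately, as far as I can tell from detectron2's source, the "TransformGens used in training" line is printed by detectron2's own build_transform_gen for its default mapper, so logging your own list will not change that line. The sketch below configures a logger explicitly; the helper name get_notebook_logger and the placeholder transform_list are my own, for illustration.

```python
import logging


def get_notebook_logger(name="custom_mapper", level=logging.INFO):
    """Return a logger that actually emits INFO messages in a notebook.

    A fresh logger has level NOTSET and inherits the root logger's default
    WARNING level, so INFO records would otherwise be dropped.
    """
    logger = logging.getLogger(name)
    logger.setLevel(level)  # use our own threshold instead of inheriting root's
    if not logger.handlers:  # re-running the cell should not stack duplicate handlers
        handler = logging.StreamHandler()
        handler.setFormatter(logging.Formatter("[%(asctime)s %(name)s]: %(message)s"))
        logger.addHandler(handler)
    logger.propagate = False  # avoid printing each message twice via root handlers
    return logger


logger = get_notebook_logger()
# Placeholder for the real list of transform objects built in custom_mapper.
transform_list = ["RandomContrast(0.7, 3.2)", "RandomBrightness(0.6, 1.8)"]
logger.info("TransformGens used in custom_mapper: %s", transform_list)
```

Logging the list once, next to where it is defined, avoids the flood of messages that a logger.info call inside the mapper would produce, since the mapper runs once per image.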

3 reactions

ppwwyyxx commented, Feb 3, 2020

You can loop over the data loader with "for data in data_loader" and visualize each batch. tools/visualize_data.py is an example.
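A minimal sketch of that loop. detectron2's training loader uses an infinite sampler (the "Using training sampler TrainingSampler" line in the log in the question), so a bare for loop over it never ends on its own; this version takes only the first few batches with itertools.islice. The helper preview_batches and its inspect callback are hypothetical names; in detectron2, inspect could convert per_image["image"] back to an HWC array and draw it, as tools/visualize_data.py does.

```python
from itertools import islice


def preview_batches(data_loader, num_batches, inspect):
    """Apply `inspect` to every image dict in the first `num_batches` batches.

    A loader built on an infinite sampler never raises StopIteration, so the
    loop needs an explicit cut-off; islice consumes exactly `num_batches` items.
    """
    results = []
    for batch in islice(data_loader, num_batches):
        for per_image in batch:  # each batch is a list of per-image dicts
            results.append(inspect(per_image))
    return results


# Possible detectron2 usage (an untested assumption, not taken from the thread):
# data_loader = build_detection_train_loader(cfg, mapper=custom_mapper)
# stats = preview_batches(
#     data_loader, 2, lambda d: (d["file_name"], d["image"].float().mean().item())
# )
```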

My guess is that your custom code is not being executed, but without a record of everything you changed (in the form of a git diff) I cannot tell.
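One possible reason the edited line never runs in Colab: Python caches imported modules, so changes to detectron2's defaults.py on disk are not picked up by a runtime that has already imported detectron2 until it is restarted. Also note that the replacement line in the question calls the one-argument custom_mapper with two arguments, custom_mapper(cfg, True), which would raise a TypeError if it executed; training running without that error is consistent with the line never being reached. A common alternative is to leave the library untouched and subclass the trainer, passing the mapper function itself. To keep the sketch runnable without detectron2, it uses hypothetical stand-ins (build_loader, BaseTrainer and two toy mappers) that only mimic the pattern.

```python
# Hypothetical, dependency-free stand-ins that mimic how detectron2's DefaultTrainer
# obtains its training loader. None of these names are detectron2 APIs.


def build_loader(dataset, mapper):
    """Stand-in for build_detection_train_loader: applies `mapper` to each record."""
    if not callable(mapper):
        raise TypeError(f"mapper must be callable, got {type(mapper).__name__}")
    return [mapper(record) for record in dataset]


def default_mapper(record):
    return {**record, "mapped_by": "default"}


def custom_mapper(record):
    # One-argument signature, like the custom_mapper in the question.
    return {**record, "mapped_by": "custom"}


class BaseTrainer:
    def __init__(self, dataset):
        # The constructor asks the (possibly overridden) classmethod for its loader,
        # so a subclass can swap the mapper without editing this class.
        self.data = self.build_train_loader(dataset)

    @classmethod
    def build_train_loader(cls, dataset):
        return build_loader(dataset, mapper=default_mapper)


class Trainer(BaseTrainer):
    @classmethod
    def build_train_loader(cls, dataset):
        # Pass the function object itself; do not call it here.
        return build_loader(dataset, mapper=custom_mapper)


if __name__ == "__main__":
    records = [{"file_name": "a.png"}, {"file_name": "b.png"}]
    print([d["mapped_by"] for d in Trainer(records).data])  # ['custom', 'custom']
```

In detectron2 itself, DefaultTrainer's constructor likewise calls self.build_train_loader(cfg), so the equivalent is a subclass of DefaultTrainer whose build_train_loader(cls, cfg) classmethod returns build_detection_train_loader(cfg, mapper=custom_mapper), used as trainer = Trainer(cfg) in place of DefaultTrainer(cfg).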

Top Results From Across the Web

Data augmentation in test/validation set? - Stack Overflow

Only on training. Data augmentation is used to increase the size of the training set and to get more different images.

Image Data Augmentation for Deep Learning | by Wei-Meng Lee

Image data augmentation is a technique that creates new images from existing ones. To do that, you make some small changes to them,...

Data augmentation on training set only? - Cross Validated

Running the augmentation procedure against test data is not to make the test data bigger/more accurate, but just to make the input data...

Image Augmentation for Deep Learning with Keras

Think of the augmented images as randomly modified versions of your training dataset. You have a new dataset with lots of variations of...

Data Augmentation | Baeldung on Computer Science

3. Data Augmentation on Test, Validation, and Train Sets ... The most common practice is to apply data augmentation only to the training...


