Verify Image Augmentation in Training Set


❓ What is the best way to verify/visualize that images are being augmented by a custom dataloader (in Google Colab)?

Following this tutorial: https://detectron2.readthedocs.io/tutorials/data_loading.html, I wrote a custom mapper for the dataloader. I am working with X-ray images, so my goal is to jitter the lighting parameters (contrast and brightness), as would be expected in X-rays. I wrote the mapper like so:

import copy

import torch
from detectron2.data import detection_utils as utils
from detectron2.data import transforms as T


def custom_mapper(dataset_dict):
    # A mapper similar to the default DatasetMapper, but with custom augmentations.
    dataset_dict = copy.deepcopy(dataset_dict)  # it will be modified by the code below
    # Read as 3-channel RGB, although the X-rays themselves are grayscale.
    image = utils.read_image(dataset_dict["file_name"], format="RGB")

    # Brightness and contrast adjustments <<<<< MODIFIED THIS
    image, transforms = T.apply_transform_gens(
        [T.RandomBrightness(0.9, 1.1), T.RandomContrast(0.9, 1.1)], image
    )

    # The image is still H x W x C here, so (height, width) is shape[:2], not shape[1:].
    image_shape = image.shape[:2]
    dataset_dict["image"] = torch.as_tensor(image.transpose(2, 0, 1).astype("float32"))

    annos = [
        utils.transform_instance_annotations(obj, transforms, image_shape)
        for obj in dataset_dict.pop("annotations")
        if obj.get("iscrowd", 0) == 0
    ]
    instances = utils.annotations_to_instances(annos, image_shape)
    dataset_dict["instances"] = utils.filter_empty_instances(instances)
    return dataset_dict

After defining the mapper, I opened the defaults.py file, and replaced line 430 with this:

return build_detection_train_loader(cfg, mapper=custom_mapper(cfg, True))

After saving this file, building my cfg file and running the default training process:

trainer = DefaultTrainer(cfg)
trainer.resume_or_load(resume=False)
trainer.train()

I expected the training log to list the brightness and contrast augmentations under TransformGens, but it only shows ResizeShortestEdge and RandomFlip:

WARNING [02/03 03:14:26 d2.data.datasets.coco]: 
Category ids in annotations are not in [1, #categories]! We'll apply a mapping for you.

[02/03 03:14:26 d2.data.datasets.coco]: Loaded 213 images in COCO format from /gdrive/My Drive/Research/Rib Fracture/LDR1_TV/Annotations/Train.json
[02/03 03:14:26 d2.data.build]: Removed 0 images with no usable annotations. 213 images left.
[02/03 03:14:26 d2.data.detection_utils]: TransformGens used in training: [ResizeShortestEdge(short_edge_length=(640, 672, 704, 736, 768, 800), max_size=1333, sample_style='choice'), RandomFlip()]
[02/03 03:14:26 d2.data.build]: Using training sampler TrainingSampler

My first question is what am I doing wrong?
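
On the first question, one detail in the snippet above stands out: mapper=custom_mapper(cfg, True) calls the mapper instead of passing it. Since custom_mapper accepts a single dataset dict, that call would raise a TypeError if the edited line ever ran, so a clean training log with the default TransformGens suggests the edited defaults.py was probably not the copy being imported. The sketch below illustrates the difference using only the standard library. Note that build_loader and cfg_stand_in are hypothetical stand-ins, not detectron2 names. In detectron2 itself, the tutorial linked above passes the function object, as in build_detection_train_loader(cfg, mapper=custom_mapper), and this can be done from an overridden DefaultTrainer.build_train_loader rather than by editing the installed source.

```python
import copy


def custom_mapper(dataset_dict):
    """Toy mapper: takes ONE dataset dict and returns a modified copy."""
    dataset_dict = copy.deepcopy(dataset_dict)
    dataset_dict["mapped"] = True
    return dataset_dict


def build_loader(records, mapper):
    """Hypothetical stand-in for a loader builder: applies mapper to each record."""
    return (mapper(record) for record in records)


records = [{"file_name": "a.png"}, {"file_name": "b.png"}]

# Correct: pass the function object; the loader calls it once per record.
# (detectron2 analogue: build_detection_train_loader(cfg, mapper=custom_mapper))
mapped = list(build_loader(records, mapper=custom_mapper))

# Incorrect: calling the mapper up front with (cfg, True) fails immediately,
# because custom_mapper only accepts a single argument.
cfg_stand_in = object()  # hypothetical placeholder for a config object
try:
    custom_mapper(cfg_stand_in, True)
    call_error = None
except TypeError as exc:
    call_error = exc

print(all(record["mapped"] for record in mapped))
print(type(call_error).__name__)
```

If no TypeError appears during training, that is a hint that the modified line is not on the code path actually being executed.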

Second, how can I verify that my images are in fact being augmented? I took the fast.ai course and found that it has a built-in method that pulls training-set images through the augmentation pipeline and visualizes them. Is there something similar here?

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Comments: 9

Top GitHub Comments

4 reactions
ncate commented, Apr 7, 2020

Thanks for the info. I’m having a little trouble using the logger. I tried to do it the same way as in the build_transform_gen function in detectron2 but it isn’t behaving the way I expected.

I wrote a custom data mapper using this code:

import copy

import torch
from detectron2.data import detection_utils as utils
from detectron2.data import transforms as T


def custom_mapper(input_dict):
    dataset_dict = copy.deepcopy(input_dict)  # it will be modified by the code below
    image = utils.read_image(dataset_dict["file_name"], format="BGR")
    transform_list = [
        T.Resize((1200, 1200)),
        T.RandomFlip(prob=0.6, horizontal=True, vertical=False),
        T.RandomFlip(prob=0.6, horizontal=False, vertical=True),
        T.RandomContrast(0.7, 3.2),
        T.RandomBrightness(0.6, 1.8),
    ]
    image, transforms = T.apply_transform_gens(transform_list, image)
    dataset_dict["image"] = torch.as_tensor(image.transpose(2, 0, 1).astype("float32"))
    # image is H x W x C, so shape[:2] is (height, width)
    annos = [
        utils.transform_instance_annotations(obj, transforms, image.shape[:2])
        for obj in dataset_dict.pop("annotations")
    ]
    instances = utils.annotations_to_instances(annos, image.shape[:2])
    dataset_dict["instances"] = utils.filter_empty_instances(instances)
    return dataset_dict

However, when I run the training session I still get:

[04/07 19:07:44 d2.data.detection_utils]: TransformGens used in training: [ResizeShortestEdge(short_edge_length=(800,), max_size=1333, sample_style='choice'), RandomFlip()]

Which makes it seem like the custom mapper isn’t being called. But as a test I put a print statement inside the custom mapper and that prints out during the training, which makes it seem like it is being called. So then I thought I might need to update the log, maybe that was the issue. So I added import logging to the top of the cell. Then I added the lines:

logger = logging.getLogger(__name__)
logger.info("TransformGens used in training: " + str(transform_list))

But still it tells me that the TransformGens being applied are just ResizeShortestEdge and RandomFlip. Am I just updating the logger incorrectly? Where am I going wrong? Thanks
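
A note on the logging attempt: the d2.data.detection_utils line comes from detectron2's own build_transform_gen, mentioned above, which builds its list from the config. It therefore would not reflect a list created inside a custom mapper. A separate logger.info call adds a new message rather than changing that one, and the message only shows up if the logger has a handler and a level that lets INFO through. Below is a minimal standard-library sketch, assuming hypothetical logger names (custom_mapper_demo, unconfigured_demo) and an in-memory stream so the output can be inspected:

```python
import io
import logging


def make_logger(name, stream, level=logging.INFO):
    """Return a logger that writes INFO-and-above messages to the given stream."""
    logger = logging.getLogger(name)
    logger.setLevel(level)
    logger.propagate = False  # keep the demo independent of root logger settings
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("[%(name)s]: %(message)s"))
    logger.addHandler(handler)
    return logger


buffer = io.StringIO()
logger = make_logger("custom_mapper_demo", buffer)

# Stand-in for str(transform_list); the names mirror the mapper above.
transform_names = ["Resize", "RandomFlip", "RandomFlip", "RandomContrast", "RandomBrightness"]
logger.info("TransformGens used in custom mapper: %s", transform_names)

logged = buffer.getvalue()
print(logged, end="")

# A logger that was never configured inherits the root level (WARNING by default),
# in which case INFO messages are dropped before reaching any handler.
unconfigured = logging.getLogger("unconfigured_demo")
info_enabled = unconfigured.isEnabledFor(logging.INFO)
```

Logging the list once, outside the mapper, avoids repeating the message for every image that passes through it.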

3 reactions
ppwwyyxx commented, Feb 3, 2020

You can loop over the data loader with for data in data_loader and visualize them. tools/visualize_data.py is an example.

I guess your custom code is not executed. But without instructions of everything you did (in the form of git diff) I cannot tell.
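
Besides looking at the images, a quick numeric check can complement the loop suggested above: run the same record through the mapper several times and see whether a summary statistic varies between draws. The sketch below uses toy stand-ins so that it runs without detectron2. The names toy_identity_mapper, toy_brightness_mapper, mean_intensity and draw_spread are hypothetical, and images are plain lists of floats. With a real mapper, summarize could be something like lambda d: float(d["image"].mean()).

```python
import random
import statistics

_rng = random.Random(0)  # fixed seed so the demo is repeatable


def draw_spread(mapper, record, summarize, n_draws=20):
    """Map the same record n_draws times; return the spread of a summary statistic."""
    values = [summarize(mapper(record)) for _ in range(n_draws)]
    return statistics.pstdev(values)


def toy_identity_mapper(record):
    """Stand-in for a mapper that applies no augmentation."""
    return {"image": list(record["image"])}


def toy_brightness_mapper(record):
    """Stand-in for a mapper that scales brightness by a random factor in [0.9, 1.1]."""
    scale = _rng.uniform(0.9, 1.1)
    return {"image": [pixel * scale for pixel in record["image"]]}


def mean_intensity(mapped):
    """Summary statistic: average pixel value of the mapped image."""
    return statistics.fmean(mapped["image"])


record = {"image": [10.0, 50.0, 120.0, 200.0]}

spread_without = draw_spread(toy_identity_mapper, record, mean_intensity)
spread_with = draw_spread(toy_brightness_mapper, record, mean_intensity)

print("no augmentation:", spread_without)
print("random brightness:", spread_with)
```

A spread of zero means every draw produced the same statistic, while a nonzero spread indicates that something random is being applied. Mean intensity tracks brightness changes; a contrast change that preserves the mean would not move it, so a statistic such as the per-image standard deviation is a better probe for contrast.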
