Stuck on an issue?

Lightrun Answers was designed to reduce the constant googling that comes with debugging 3rd party libraries. It collects links to all the places you might be looking at while hunting down a tough bug.

And, if you’re still stuck at the end, we’re happy to hop on a call to see how we can help out.

Unable to replicate balloon training result

See original GitHub issue

Code to reproduce result:

Tried to test the training of a model via the balloon example (https://github.com/facebookresearch/d2go/blob/master/demo/d2go_beginner.ipynb)

import os
import json
import numpy as np
from detectron2.structures import BoxMode
from detectron2.data import MetadataCatalog, DatasetCatalog
import cv2

def get_balloon_dicts(img_dir):
    json_file = os.path.join(img_dir, "via_region_data.json")
    with open(json_file) as f:
        imgs_anns = json.load(f)

    dataset_dicts = []
    for idx, v in enumerate(imgs_anns.values()):
        record = {}
        
        filename = os.path.join(img_dir, v["filename"])
        height, width = cv2.imread(filename).shape[:2]
        
        record["file_name"] = filename
        record["image_id"] = idx
        record["height"] = height
        record["width"] = width
      
        annos = v["regions"]
        objs = []
        for _, anno in annos.items():
            assert not anno["region_attributes"]
            anno = anno["shape_attributes"]
            px = anno["all_points_x"]
            py = anno["all_points_y"]
            poly = [(x + 0.5, y + 0.5) for x, y in zip(px, py)]
            poly = [p for x in poly for p in x]

            obj = {
                "bbox": [np.min(px), np.min(py), np.max(px), np.max(py)],
                "bbox_mode": BoxMode.XYXY_ABS,
                "segmentation": [poly],
                "category_id": 0,
            }
            objs.append(obj)
        record["annotations"] = objs
        dataset_dicts.append(record)
    return dataset_dicts

for d in ["train", "val"]:
    DatasetCatalog.register("balloon_" + d, lambda d=d: get_balloon_dicts("balloon/" + d))
    MetadataCatalog.get("balloon_" + d).set(thing_classes=["balloon"], evaluator_type="coco")

balloon_metadata = MetadataCatalog.get("balloon_train")

from d2go.runner import Detectron2GoRunner
from d2go.model_zoo import model_zoo

def prepare_for_launch():
    runner = Detectron2GoRunner()
    cfg = runner.get_default_cfg()
    cfg.merge_from_file(model_zoo.get_config_file("faster_rcnn_fbnetv3a_C4.yaml"))
    cfg.MODEL_EMA.ENABLED = False
    cfg.DATASETS.TRAIN = ("balloon_train",)
    cfg.DATASETS.TEST = ("balloon_val",)
    cfg.DATALOADER.NUM_WORKERS = 2
    cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url("faster_rcnn_fbnetv3a_C4.yaml")  # Let training initialize from model zoo
    cfg.SOLVER.IMS_PER_BATCH = 2
    cfg.SOLVER.BASE_LR = 0.00025  # pick a good LR
    cfg.SOLVER.MAX_ITER = 600    # 600 iterations seems good enough for this toy dataset; you will need to train longer for a practical dataset
    cfg.SOLVER.STEPS = []        # do not decay learning rate
    cfg.MODEL.ROI_HEADS.BATCH_SIZE_PER_IMAGE = 128   # faster, and good enough for this toy dataset (default: 512)
    cfg.MODEL.ROI_HEADS.NUM_CLASSES = 1  # only has one class (ballon). (see https://detectron2.readthedocs.io/tutorials/datasets.html#update-the-config-for-new-datasets)
    cfg.OUTPUT_DIR = 'balloon_model'
    # NOTE: this config means the number of classes, but a few popular unofficial tutorials incorrect uses num_classes+1 here.
    os.makedirs(cfg.OUTPUT_DIR, exist_ok=True)
    return cfg, runner

cfg, runner = prepare_for_launch()
model = runner.build_model(cfg)
runner.do_train(cfg, model, resume=False)

cfg.MODEL.WEIGHTS = 'balloon_model/model_final.pth'
metrics = runner.do_test(cfg, model)
print(metrics)

Result

 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.021
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.061
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.012
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.001
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.041
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.004
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.118
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.204
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.012
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.333

And the inference results on test images are terrible.

Expected Result

 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.494
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.651
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.543
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.104
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.757
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.204
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.526
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.526
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.118
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.810

Issue Analytics

State:
Created 2 years ago
Reactions:4
Comments:14 (4 by maintainers)

Top GitHub Comments

4reactions

AaronReidCIcommented, May 17, 2021

Yes - I followed it exactly. My code is above for you to cut & paste. I get a terrible model which doesn’t exhibit the statistics of yours. The inference cases are really bad/random.

I am writing this as an issue as I believe that something is broken in the training. Perhaps it’s just a hyper-parameter setting…

I can reproduce all the other d2go pre-trained inference examples, but the newly trained models are bad.

2reactions

Yaoxingtiancommented, Jun 16, 2021

@AaronReidCI i get it ! by setting para as below: cfg.MODEL.WEIGHTS = ‘./weghts/model_0479999.pth’ # download weights directly cfg.SOLVER.IMS_PER_BATCH = 2 cfg.SOLVER.BASE_LR = 0.00025 # pick a good LR cfg.SOLVER.MAX_ITER = 3000 # set more iter times cfg.MODEL.ROI_HEADS.BATCH_SIZE_PER_IMAGE =16 # this is my computer’s capacity

result shows: The checkpoint state_dict contains keys that are not used by the model: pixel_mean pixel_std proposal_generator.anchor_generator.cell_anchors.0 Loading and preparing results… DONE (t=0.00s) creating index… index created! Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.494 Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.662 Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.572 Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000 Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.061 Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.756 Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.202 Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.526 Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.526 Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000 Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.141 Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.797

but i’am still confused, such a small datasets , train 3000 times, 16 bs, but the result is still not high!

Top Results From Across the Web

How to Perform and Interpret Balloon Expulsion Test - PMC

The balloon expulsion test is a simple and useful method for investigating a defecatory disorder assessing the subject's ability to evacuate a simulated ......

Not able to reproduce detection of balloons or shapes; stuck ...

I tried to reproduce the balloon and the shapes detection by running balloon.py and train_shapes.ipynb. ... If I just display the data with...

Intra-aortic Balloon Pump Trouble-shooting - LITFL.com

Your ICU patient with the intra-aortic balloon pump (IABP) from Cardiovascular Curveball 005 is having a very bad night. His IABP keeps playing...

Balloon Flying Handbook.indb - FAA

This book introduces the prospective pilot to the realm of balloon flight and provides information and guidance to all balloon pilots in the...

Basics of Yolo v5 - Balloon Detection - Kaggle

One thing I found confusing is that the results of each training (and ... Copy the yolov5 folder from the dataset to /kaggle/working/...