
Unexpected Stack Trace when training with EfficientDet

🐛 Bug

Unexpected stack trace when training with EfficientDet (but not with Faster-RCNN)

I was trying to check whether my previously reported training issue is fixed by #465. Unfortunately, another problem cropped up downstream.

To Reproduce

Run this gist to reproduce the behavior: https://gist.github.com/bguan/dbc36933e5a56014f8bdd19da7ede481

Expected behavior

Training should run to completion without errors.

Error

Instead, the following stack trace is raised:

---------------------------------------------------------------------------
error                                     Traceback (most recent call last)
/usr/local/lib/python3.8/dist-packages/fastai/learner.py in _with_events(self, f, event_type, ex, final)
    154     def _with_events(self, f, event_type, ex, final=noop):
--> 155         try:       self(f'before_{event_type}')       ;f()
    156         except ex: self(f'after_cancel_{event_type}')

/usr/local/lib/python3.8/dist-packages/fastai/learner.py in _do_epoch(self)
    190     def _do_epoch(self):
--> 191         self._do_epoch_train()
    192         self._do_epoch_validate()

/usr/local/lib/python3.8/dist-packages/fastai/learner.py in _do_epoch_train(self)
    182         self.dl = self.dls.train
--> 183         self._with_events(self.all_batches, 'train', CancelTrainException)
    184 

/usr/local/lib/python3.8/dist-packages/fastai/learner.py in _with_events(self, f, event_type, ex, final)
    154     def _with_events(self, f, event_type, ex, final=noop):
--> 155         try:       self(f'before_{event_type}')       ;f()
    156         except ex: self(f'after_cancel_{event_type}')

/usr/local/lib/python3.8/dist-packages/fastai/learner.py in all_batches(self)
    160         self.n_iter = len(self.dl)
--> 161         for o in enumerate(self.dl): self.one_batch(*o)
    162 

/usr/local/lib/python3.8/dist-packages/fastai/data/load.py in __iter__(self)
    101         self.__idxs=self.get_idxs() # called in context of main process (not workers/subprocesses)
--> 102         for b in _loaders[self.fake_l.num_workers==0](self.fake_l):
    103             if self.device is not None: b = to_device(b, self.device)

~/.local/lib/python3.8/site-packages/torch/utils/data/dataloader.py in __next__(self)
    362     def __next__(self):
--> 363         data = self._next_data()
    364         self._num_yielded += 1

~/.local/lib/python3.8/site-packages/torch/utils/data/dataloader.py in _next_data(self)
    970                 data = self._task_info.pop(self._rcvd_idx)[1]
--> 971                 return self._process_data(data)
    972 

~/.local/lib/python3.8/site-packages/torch/utils/data/dataloader.py in _process_data(self, data)
   1013         if isinstance(data, ExceptionWrapper):
-> 1014             data.reraise()
   1015         return data

~/.local/lib/python3.8/site-packages/torch/_utils.py in reraise(self)
    394             msg = KeyErrorMessage(msg)
--> 395         raise self.exc_type(msg)

error: Caught error in DataLoader worker process 3.
Original Traceback (most recent call last):
  File "/home/brian/.local/lib/python3.8/site-packages/torch/utils/data/_utils/worker.py", line 185, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/brian/.local/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 34, in fetch
    data = next(self.dataset_iter)
  File "/usr/local/lib/python3.8/dist-packages/fastai/data/load.py", line 111, in create_batches
    yield from map(self.do_batch, self.chunkify(res))
  File "/usr/local/lib/python3.8/dist-packages/fastcore/utils.py", line 381, in chunked
    res = list(itertools.islice(it, chunk_sz))
  File "/usr/local/lib/python3.8/dist-packages/fastai/data/load.py", line 124, in do_item
    try: return self.after_item(self.create_item(s))
  File "/usr/local/lib/python3.8/dist-packages/fastai/data/load.py", line 130, in create_item
    def create_item(self, s):  return next(self.it) if s is None else self.dataset[s]
  File "/usr/local/lib/python3.8/dist-packages/icevision/data/dataset.py", line 38, in __getitem__
    data = self.tfm(data)
  File "/usr/local/lib/python3.8/dist-packages/icevision/tfms/transform.py", line 13, in __call__
    tfmed = self.apply(**data)
  File "/usr/local/lib/python3.8/dist-packages/icevision/tfms/albumentations/tfms.py", line 110, in apply
    d = self.tfms(**params)
  File "/home/brian/.local/lib/python3.8/site-packages/albumentations/core/composition.py", line 176, in __call__
    data = t(force_apply=force_apply, **data)
  File "/home/brian/.local/lib/python3.8/site-packages/albumentations/core/composition.py", line 240, in __call__
    return self.transforms[0](force_apply=True, **data)
  File "/home/brian/.local/lib/python3.8/site-packages/albumentations/core/transforms_interface.py", line 87, in __call__
    return self.apply_with_params(params, **kwargs)
  File "/home/brian/.local/lib/python3.8/site-packages/albumentations/core/transforms_interface.py", line 100, in apply_with_params
    res[key] = target_function(arg, **dict(params, **target_dependencies))
  File "/home/brian/.local/lib/python3.8/site-packages/albumentations/augmentations/transforms.py", line 982, in apply
    return F.resize(crop, self.height, self.width, interpolation)
  File "/home/brian/.local/lib/python3.8/site-packages/albumentations/augmentations/functional.py", line 70, in wrapped_function
    result = func(img, *args, **kwargs)
  File "/home/brian/.local/lib/python3.8/site-packages/albumentations/augmentations/functional.py", line 211, in resize
    return resize_fn(img)
  File "/home/brian/.local/lib/python3.8/site-packages/albumentations/augmentations/functional.py", line 188, in __process_fn
    img = process_fn(img, **kwargs)
cv2.error: OpenCV(4.4.0) /tmp/pip-req-build-vu_aq9yd/opencv/modules/imgproc/src/resize.cpp:3929: error: (-215:Assertion failed) !ssize.empty() in function 'resize'



During handling of the above exception, another exception occurred:

IndexError                                Traceback (most recent call last)
<ipython-input-18-dba7405c6bba> in <module>
      1 min_lr, epochs, freeze_epochs = 5e-2, 300, 20
      2 print(f"Running with image size {size} for {freeze_epochs}+{epochs} epochs at min LR {min_lr}")
----> 3 learn.fine_tune(epochs, min_lr, freeze_epochs=freeze_epochs)

/usr/local/lib/python3.8/dist-packages/fastcore/logargs.py in _f(*args, **kwargs)
     54         init_args.update(log)
     55         setattr(inst, 'init_args', init_args)
---> 56         return inst if to_return else f(*args, **kwargs)
     57     return _f

/usr/local/lib/python3.8/dist-packages/fastai/callback/schedule.py in fine_tune(self, epochs, base_lr, freeze_epochs, lr_mult, pct_start, div, **kwargs)
    159     "Fine tune with `freeze` for `freeze_epochs` then with `unfreeze` from `epochs` using discriminative LR"
    160     self.freeze()
--> 161     self.fit_one_cycle(freeze_epochs, slice(base_lr), pct_start=0.99, **kwargs)
    162     base_lr /= 2
    163     self.unfreeze()

/usr/local/lib/python3.8/dist-packages/fastcore/logargs.py in _f(*args, **kwargs)
     54         init_args.update(log)
     55         setattr(inst, 'init_args', init_args)
---> 56         return inst if to_return else f(*args, **kwargs)
     57     return _f

/usr/local/lib/python3.8/dist-packages/fastai/callback/schedule.py in fit_one_cycle(self, n_epoch, lr_max, div, div_final, pct_start, wd, moms, cbs, reset_opt)
    111     scheds = {'lr': combined_cos(pct_start, lr_max/div, lr_max, lr_max/div_final),
    112               'mom': combined_cos(pct_start, *(self.moms if moms is None else moms))}
--> 113     self.fit(n_epoch, cbs=ParamScheduler(scheds)+L(cbs), reset_opt=reset_opt, wd=wd)
    114 
    115 # Cell

/usr/local/lib/python3.8/dist-packages/fastcore/logargs.py in _f(*args, **kwargs)
     54         init_args.update(log)
     55         setattr(inst, 'init_args', init_args)
---> 56         return inst if to_return else f(*args, **kwargs)
     57     return _f

/usr/local/lib/python3.8/dist-packages/fastai/learner.py in fit(self, n_epoch, lr, wd, cbs, reset_opt)
    205             self.opt.set_hypers(lr=self.lr if lr is None else lr)
    206             self.n_epoch = n_epoch
--> 207             self._with_events(self._do_fit, 'fit', CancelFitException, self._end_cleanup)
    208 
    209     def _end_cleanup(self): self.dl,self.xb,self.yb,self.pred,self.loss = None,(None,),(None,),None,None

/usr/local/lib/python3.8/dist-packages/fastai/learner.py in _with_events(self, f, event_type, ex, final)
    153 
    154     def _with_events(self, f, event_type, ex, final=noop):
--> 155         try:       self(f'before_{event_type}')       ;f()
    156         except ex: self(f'after_cancel_{event_type}')
    157         finally:   self(f'after_{event_type}')        ;final()

/usr/local/lib/python3.8/dist-packages/fastai/learner.py in _do_fit(self)
    195         for epoch in range(self.n_epoch):
    196             self.epoch=epoch
--> 197             self._with_events(self._do_epoch, 'epoch', CancelEpochException)
    198 
    199     @log_args(but='cbs')

/usr/local/lib/python3.8/dist-packages/fastai/learner.py in _with_events(self, f, event_type, ex, final)
    155         try:       self(f'before_{event_type}')       ;f()
    156         except ex: self(f'after_cancel_{event_type}')
--> 157         finally:   self(f'after_{event_type}')        ;final()
    158 
    159     def all_batches(self):

/usr/local/lib/python3.8/dist-packages/fastai/learner.py in __call__(self, event_name)
    131     def ordered_cbs(self, event): return [cb for cb in sort_by_run(self.cbs) if hasattr(cb, event)]
    132 
--> 133     def __call__(self, event_name): L(event_name).map(self._call_one)
    134 
    135     def _call_one(self, event_name):

/usr/local/lib/python3.8/dist-packages/fastcore/foundation.py in map(self, f, *args, **kwargs)
    270              else f.format if isinstance(f,str)
    271              else f.__getitem__)
--> 272         return self._new(map(g, self))
    273 
    274     def filter(self, f, negate=False, **kwargs):

/usr/local/lib/python3.8/dist-packages/fastcore/foundation.py in _new(self, items, *args, **kwargs)
    216     @property
    217     def _xtra(self): return None
--> 218     def _new(self, items, *args, **kwargs): return type(self)(items, *args, use_list=None, **kwargs)
    219     def __getitem__(self, idx): return self._get(idx) if is_indexer(idx) else L(self._get(idx), use_list=None)
    220     def copy(self): return self._new(self.items.copy())

/usr/local/lib/python3.8/dist-packages/fastcore/foundation.py in __call__(cls, x, *args, **kwargs)
    197     def __call__(cls, x=None, *args, **kwargs):
    198         if not args and not kwargs and x is not None and isinstance(x,cls): return x
--> 199         return super().__call__(x, *args, **kwargs)
    200 
    201 # Cell

/usr/local/lib/python3.8/dist-packages/fastcore/foundation.py in __init__(self, items, use_list, match, *rest)
    207         if items is None: items = []
    208         if (use_list is not None) or not _is_array(items):
--> 209             items = list(items) if use_list else _listify(items)
    210         if match is not None:
    211             if is_coll(match): match = len(match)

/usr/local/lib/python3.8/dist-packages/fastcore/foundation.py in _listify(o)
    114     if isinstance(o, list): return o
    115     if isinstance(o, str) or _is_array(o): return [o]
--> 116     if is_iter(o): return list(o)
    117     return [o]
    118 

/usr/local/lib/python3.8/dist-packages/fastcore/foundation.py in __call__(self, *args, **kwargs)
    177             if isinstance(v,_Arg): kwargs[k] = args.pop(v.i)
    178         fargs = [args[x.i] if isinstance(x, _Arg) else x for x in self.pargs] + args[self.maxi+1:]
--> 179         return self.fn(*fargs, **kwargs)
    180 
    181 # Cell

/usr/local/lib/python3.8/dist-packages/fastai/learner.py in _call_one(self, event_name)
    135     def _call_one(self, event_name):
    136         assert hasattr(event, event_name), event_name
--> 137         [cb(event_name) for cb in sort_by_run(self.cbs)]
    138 
    139     def _bn_bias_state(self, with_bias): return norm_bias_params(self.model, with_bias).map(self.opt.state)

/usr/local/lib/python3.8/dist-packages/fastai/learner.py in <listcomp>(.0)
    135     def _call_one(self, event_name):
    136         assert hasattr(event, event_name), event_name
--> 137         [cb(event_name) for cb in sort_by_run(self.cbs)]
    138 
    139     def _bn_bias_state(self, with_bias): return norm_bias_params(self.model, with_bias).map(self.opt.state)

/usr/local/lib/python3.8/dist-packages/fastai/callback/core.py in __call__(self, event_name)
     42                (self.run_valid and not getattr(self, 'training', False)))
     43         res = None
---> 44         if self.run and _run: res = getattr(self, event_name, noop)()
     45         if event_name=='after_fit': self.run=True #Reset self.run to True at each end of fit
     46         return res

/usr/local/lib/python3.8/dist-packages/fastai/callback/tracker.py in after_epoch(self)
     79         if self.every_epoch: self._save(f'{self.fname}_{self.epoch}')
     80         else: #every improvement
---> 81             super().after_epoch()
     82             if self.new_best:
     83                 print(f'Better model found at epoch {self.epoch} with {self.monitor} value: {self.best}.')

/usr/local/lib/python3.8/dist-packages/fastai/callback/tracker.py in after_epoch(self)
     37     def after_epoch(self):
     38         "Compare the last value to the best up to now"
---> 39         val = self.recorder.values[-1][self.idx]
     40         if self.comp(val - self.min_delta, self.best): self.best,self.new_best = val,True
     41         else: self.new_best = False

/usr/local/lib/python3.8/dist-packages/fastcore/foundation.py in __getitem__(self, idx)
    217     def _xtra(self): return None
    218     def _new(self, items, *args, **kwargs): return type(self)(items, *args, use_list=None, **kwargs)
--> 219     def __getitem__(self, idx): return self._get(idx) if is_indexer(idx) else L(self._get(idx), use_list=None)
    220     def copy(self): return self._new(self.items.copy())
    221 

/usr/local/lib/python3.8/dist-packages/fastcore/foundation.py in _get(self, i)
    221 
    222     def _get(self, i):
--> 223         if is_indexer(i) or isinstance(i,slice): return getattr(self.items,'iloc',self.items)[i]
    224         i = mask2idxs(i)
    225         return (self.items.iloc[list(i)] if hasattr(self.items,'iloc')

IndexError: list index out of range

Environment:

  • OS: Ubuntu 20.04
  • Python: 3.8.2 [GCC 9.3.0]
  • torch: 1.6.0, torchvision: 0.7.0
  • fastai: 2.0.13
  • icevision: 0.1.6

Additional context

If I switch to Faster-RCNN, training runs to completion.

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 6 (3 by maintainers)

Top GitHub Comments

3 reactions
bguan commented, Oct 5, 2020

It turns out that IceVision relies on Albumentations' default image augmentations, which randomly apply ShiftScaleRotate with shift_limit=0.0625 and scale_limit=0.1. As a result, small boxes sometimes end up with an effective area of 0, or boxes get pushed beyond the image borders, producing invalid or zero-area boxes.
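To illustrate the mechanism, here is a minimal pure-Python sketch, not Albumentations' actual implementation: it scales a `[x_min, y_min, x_max, y_max]` box about the image center, shifts it by a fraction of the image size, and clips the result to the image, roughly as a bbox-aware pipeline might. Rotation and rounding are ignored, and the helper names `shift_scale_box` and `box_area` and the sample box are hypothetical.

```python
def shift_scale_box(box, width, height, scale, dx, dy):
    """Scale a pascal_voc-style box about the image center, shift it by
    fractions (dx, dy) of the image size, then clip it to the image.

    Illustrative only: rotation is ignored and rounding is not modelled.
    """
    cx, cy = width / 2, height / 2
    x_min, y_min, x_max, y_max = box
    # Scale each coordinate about the image center, then translate.
    x_min = cx + (x_min - cx) * scale + dx * width
    x_max = cx + (x_max - cx) * scale + dx * width
    y_min = cy + (y_min - cy) * scale + dy * height
    y_max = cy + (y_max - cy) * scale + dy * height
    # Clip every coordinate to the image bounds.
    x_min, x_max = min(max(x_min, 0), width), min(max(x_max, 0), width)
    y_min, y_max = min(max(y_min, 0), height), min(max(y_max, 0), height)
    return [x_min, y_min, x_max, y_max]


def box_area(box):
    """Area of a pascal_voc-style box; degenerate boxes have area 0."""
    x_min, y_min, x_max, y_max = box
    return max(0.0, x_max - x_min) * max(0.0, y_max - y_min)


# A small box near the right border of a 640x480 image.
box = [630, 100, 638, 108]
# Upper end of the ranges allowed by scale_limit=0.1 and shift_limit=0.0625.
moved = shift_scale_box(box, 640, 480, scale=1.1, dx=0.0625, dy=0.0)
# Both x coordinates are pushed past 640 and clipped, so the area is 0.
print(moved, box_area(moved))
```

Even though each individual shift or scale is small, a box that starts close to a border can be clipped down to zero width, which matches the kind of invalid box described above.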

My quick fix is to add logic to my custom parser that filters out these risky boxes. I'm leaving the code here so others can adapt the idea to their own datasets if they encounter the same problem:


# Assumes icevision 0.1.x, where `icevision.all` exports Parser and the parser mixins.
# CocoDatasetStats is defined elsewhere and is not shown here.
from icevision.all import *


def box_within_bounds(x, y, w, h, width, height, min_margin_ratio, min_width_height_ratio):
    """Return True if an xywh box is large enough and its top-left corner
    keeps a minimum margin from the image borders."""
    # Reject boxes that are too small relative to the image size.
    min_width = min_width_height_ratio * width
    min_height = min_width_height_ratio * height
    if w < min_width or h < min_height:
        return False
    # Reject boxes whose top-left corner lies in the margin band along a border,
    # since augmentations may push such boxes off the image.
    top_margin = min_margin_ratio * height
    bottom_margin = height - top_margin
    left_margin = min_margin_ratio * width
    right_margin = width - left_margin
    if x < left_margin or x > right_margin:
        return False
    if y < top_margin or y > bottom_margin:
        return False
    return True


class SubCocoParser(Parser, LabelsMixin, BBoxesMixin, FilepathMixin, SizeMixin):
    def __init__(self, stats: CocoDatasetStats, min_margin_ratio=0.15,
                 min_width_height_ratio=0.1, quiet=True):
        self.stats = stats
        # List of tuples of the form (img_id, width, height, bboxes, label_ids, img_path).
        self.data = []
        skipped = 0
        for img_id, imgfname in stats.img2fname.items():
            imgf = stats.img_dir / imgfname
            width, height = stats.img2sz[img_id]
            bboxs = []
            lids = []
            for lid, x, y, w, h in stats.img2lbs[img_id]:
                # Keep only labelled boxes that pass the size and margin checks.
                if lid is not None and box_within_bounds(x, y, w, h, width, height,
                                                         min_margin_ratio, min_width_height_ratio):
                    bboxs.append([int(x), int(y), int(w), int(h)])
                    lids.append(int(lid))
                elif not quiet:
                    print(f"warning: skipping lxywh of {lid, x, y, w, h}")

            # Images left with no valid boxes are dropped entirely.
            if len(bboxs) > 0:
                self.data.append((img_id, width, height, bboxs, lids, imgf))
            else:
                skipped += 1

        print(f"Skipped {skipped} out of {stats.num_imgs} images")

...

@lgvaz is working on a solution at the IceVision level that may do something similar.
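One caveat when adapting the idea: `box_within_bounds` only tests the box's top-left corner `(x, y)` against the margins, so a box can still extend right up to the right or bottom border. A hypothetical stricter variant (not part of IceVision, and not the author's code) also checks the far edges `x + w` and `y + h`, assuming COCO-style `[x, y, w, h]` boxes in pixels:

```python
def box_fully_within_bounds(x, y, w, h, width, height,
                            min_margin_ratio=0.15, min_width_height_ratio=0.1):
    """Return True if the whole xywh box is large enough and keeps the
    requested margin from every image border (hypothetical helper)."""
    # Reject boxes that are too small relative to the image.
    if w < min_width_height_ratio * width or h < min_width_height_ratio * height:
        return False
    margin_x = min_margin_ratio * width
    margin_y = min_margin_ratio * height
    # Check both the near edges (x, y) and the far edges (x + w, y + h).
    return (margin_x <= x and x + w <= width - margin_x
            and margin_y <= y and y + h <= height - margin_y)


# 640x480 image: margins are 96 px horizontally and 72 px vertically.
print(box_fully_within_bounds(100, 100, 100, 100, 640, 480))  # True
print(box_fully_within_bounds(500, 100, 100, 100, 640, 480))  # False: right edge 600 > 544
```

The second box would pass the top-left-only check (x = 500 is inside the margins) but fails here because its right edge lies in the margin band. Whether the stricter check is worth the extra discarded boxes depends on how often objects in the dataset sit near the image borders.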

2 reactions
bguan commented, Nov 5, 2020

I think this is again due to Albumentations' scaling making boxes too small. Maybe you could enhance autofix to make sure boxes are at least 1x1 pixel?
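The suggestion could look roughly like the sketch below. This is a hypothetical helper, not IceVision's actual autofix API: it clips a `[x_min, y_min, x_max, y_max]` box to the image, grows any side shorter than `min_size` pixels around its center (shifting it back inside the image if needed), and returns `None` for boxes that lie entirely outside the image.

```python
from typing import List, Optional


def autofix_box(box: List[float], width: int, height: int,
                min_size: float = 1.0) -> Optional[List[float]]:
    """Clip a pascal_voc-style box to the image and enforce a minimum size.

    Returns None if the box does not overlap the image at all.
    """
    x_min, y_min, x_max, y_max = box
    # Boxes entirely outside the image cannot be repaired meaningfully.
    if x_max <= 0 or y_max <= 0 or x_min >= width or y_min >= height:
        return None
    # Clip to the image.
    x_min, y_min = max(x_min, 0), max(y_min, 0)
    x_max, y_max = min(x_max, width), min(y_max, height)

    def widen(lo, hi, limit):
        # Grow [lo, hi] to min_size around its center, then move it back
        # inside [0, limit] if growing pushed it past an edge.
        if hi - lo >= min_size:
            return lo, hi
        center = (lo + hi) / 2
        lo, hi = center - min_size / 2, center + min_size / 2
        if lo < 0:
            lo, hi = 0, min_size
        elif hi > limit:
            lo, hi = limit - min_size, limit
        return lo, hi

    x_min, x_max = widen(x_min, x_max, width)
    y_min, y_max = widen(y_min, y_max, height)
    return [x_min, y_min, x_max, y_max]


print(autofix_box([100, 100, 100, 150], 640, 480))  # [99.5, 100, 100.5, 150]
print(autofix_box([650, 10, 700, 20], 640, 480))    # None
```

Growing degenerate boxes keeps the annotation, while dropping boxes that fall completely outside the image avoids inventing objects that are no longer visible; which trade-off is right for a given dataset is a judgment call.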
