Unexpected Stack Trace when training with EfficientDet
🐛 Bug
Unexpected stack trace when training with EfficientDet, but not with Faster-RCNN.
I was trying to see whether my previously reported training issue is fixed by #465. Unfortunately, another problem cropped up downstream.
To Reproduce
Steps to reproduce the behavior: run this gist: https://gist.github.com/bguan/dbc36933e5a56014f8bdd19da7ede481
Expected behavior
Training should run to the end without issue.
Error
Instead, the following stack trace occurs:
---------------------------------------------------------------------------
error Traceback (most recent call last)
/usr/local/lib/python3.8/dist-packages/fastai/learner.py in _with_events(self, f, event_type, ex, final)
154 def _with_events(self, f, event_type, ex, final=noop):
--> 155 try: self(f'before_{event_type}') ;f()
156 except ex: self(f'after_cancel_{event_type}')
/usr/local/lib/python3.8/dist-packages/fastai/learner.py in _do_epoch(self)
190 def _do_epoch(self):
--> 191 self._do_epoch_train()
192 self._do_epoch_validate()
/usr/local/lib/python3.8/dist-packages/fastai/learner.py in _do_epoch_train(self)
182 self.dl = self.dls.train
--> 183 self._with_events(self.all_batches, 'train', CancelTrainException)
184
/usr/local/lib/python3.8/dist-packages/fastai/learner.py in _with_events(self, f, event_type, ex, final)
154 def _with_events(self, f, event_type, ex, final=noop):
--> 155 try: self(f'before_{event_type}') ;f()
156 except ex: self(f'after_cancel_{event_type}')
/usr/local/lib/python3.8/dist-packages/fastai/learner.py in all_batches(self)
160 self.n_iter = len(self.dl)
--> 161 for o in enumerate(self.dl): self.one_batch(*o)
162
/usr/local/lib/python3.8/dist-packages/fastai/data/load.py in __iter__(self)
101 self.__idxs=self.get_idxs() # called in context of main process (not workers/subprocesses)
--> 102 for b in _loaders[self.fake_l.num_workers==0](self.fake_l):
103 if self.device is not None: b = to_device(b, self.device)
~/.local/lib/python3.8/site-packages/torch/utils/data/dataloader.py in __next__(self)
362 def __next__(self):
--> 363 data = self._next_data()
364 self._num_yielded += 1
~/.local/lib/python3.8/site-packages/torch/utils/data/dataloader.py in _next_data(self)
970 data = self._task_info.pop(self._rcvd_idx)[1]
--> 971 return self._process_data(data)
972
~/.local/lib/python3.8/site-packages/torch/utils/data/dataloader.py in _process_data(self, data)
1013 if isinstance(data, ExceptionWrapper):
-> 1014 data.reraise()
1015 return data
~/.local/lib/python3.8/site-packages/torch/_utils.py in reraise(self)
394 msg = KeyErrorMessage(msg)
--> 395 raise self.exc_type(msg)
error: Caught error in DataLoader worker process 3.
Original Traceback (most recent call last):
File "/home/brian/.local/lib/python3.8/site-packages/torch/utils/data/_utils/worker.py", line 185, in _worker_loop
data = fetcher.fetch(index)
File "/home/brian/.local/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 34, in fetch
data = next(self.dataset_iter)
File "/usr/local/lib/python3.8/dist-packages/fastai/data/load.py", line 111, in create_batches
yield from map(self.do_batch, self.chunkify(res))
File "/usr/local/lib/python3.8/dist-packages/fastcore/utils.py", line 381, in chunked
res = list(itertools.islice(it, chunk_sz))
File "/usr/local/lib/python3.8/dist-packages/fastai/data/load.py", line 124, in do_item
try: return self.after_item(self.create_item(s))
File "/usr/local/lib/python3.8/dist-packages/fastai/data/load.py", line 130, in create_item
def create_item(self, s): return next(self.it) if s is None else self.dataset[s]
File "/usr/local/lib/python3.8/dist-packages/icevision/data/dataset.py", line 38, in __getitem__
data = self.tfm(data)
File "/usr/local/lib/python3.8/dist-packages/icevision/tfms/transform.py", line 13, in __call__
tfmed = self.apply(**data)
File "/usr/local/lib/python3.8/dist-packages/icevision/tfms/albumentations/tfms.py", line 110, in apply
d = self.tfms(**params)
File "/home/brian/.local/lib/python3.8/site-packages/albumentations/core/composition.py", line 176, in __call__
data = t(force_apply=force_apply, **data)
File "/home/brian/.local/lib/python3.8/site-packages/albumentations/core/composition.py", line 240, in __call__
return self.transforms[0](force_apply=True, **data)
File "/home/brian/.local/lib/python3.8/site-packages/albumentations/core/transforms_interface.py", line 87, in __call__
return self.apply_with_params(params, **kwargs)
File "/home/brian/.local/lib/python3.8/site-packages/albumentations/core/transforms_interface.py", line 100, in apply_with_params
res[key] = target_function(arg, **dict(params, **target_dependencies))
File "/home/brian/.local/lib/python3.8/site-packages/albumentations/augmentations/transforms.py", line 982, in apply
return F.resize(crop, self.height, self.width, interpolation)
File "/home/brian/.local/lib/python3.8/site-packages/albumentations/augmentations/functional.py", line 70, in wrapped_function
result = func(img, *args, **kwargs)
File "/home/brian/.local/lib/python3.8/site-packages/albumentations/augmentations/functional.py", line 211, in resize
return resize_fn(img)
File "/home/brian/.local/lib/python3.8/site-packages/albumentations/augmentations/functional.py", line 188, in __process_fn
img = process_fn(img, **kwargs)
cv2.error: OpenCV(4.4.0) /tmp/pip-req-build-vu_aq9yd/opencv/modules/imgproc/src/resize.cpp:3929: error: (-215:Assertion failed) !ssize.empty() in function 'resize'
During handling of the above exception, another exception occurred:
IndexError Traceback (most recent call last)
<ipython-input-18-dba7405c6bba> in <module>
1 min_lr, epochs, freeze_epochs = 5e-2, 300, 20
2 print(f"Running with image size {size} for {freeze_epochs}+{epochs} epochs at min LR {min_lr}")
----> 3 learn.fine_tune(epochs, min_lr, freeze_epochs=freeze_epochs)
/usr/local/lib/python3.8/dist-packages/fastcore/logargs.py in _f(*args, **kwargs)
54 init_args.update(log)
55 setattr(inst, 'init_args', init_args)
---> 56 return inst if to_return else f(*args, **kwargs)
57 return _f
/usr/local/lib/python3.8/dist-packages/fastai/callback/schedule.py in fine_tune(self, epochs, base_lr, freeze_epochs, lr_mult, pct_start, div, **kwargs)
159 "Fine tune with `freeze` for `freeze_epochs` then with `unfreeze` from `epochs` using discriminative LR"
160 self.freeze()
--> 161 self.fit_one_cycle(freeze_epochs, slice(base_lr), pct_start=0.99, **kwargs)
162 base_lr /= 2
163 self.unfreeze()
/usr/local/lib/python3.8/dist-packages/fastcore/logargs.py in _f(*args, **kwargs)
54 init_args.update(log)
55 setattr(inst, 'init_args', init_args)
---> 56 return inst if to_return else f(*args, **kwargs)
57 return _f
/usr/local/lib/python3.8/dist-packages/fastai/callback/schedule.py in fit_one_cycle(self, n_epoch, lr_max, div, div_final, pct_start, wd, moms, cbs, reset_opt)
111 scheds = {'lr': combined_cos(pct_start, lr_max/div, lr_max, lr_max/div_final),
112 'mom': combined_cos(pct_start, *(self.moms if moms is None else moms))}
--> 113 self.fit(n_epoch, cbs=ParamScheduler(scheds)+L(cbs), reset_opt=reset_opt, wd=wd)
114
115 # Cell
/usr/local/lib/python3.8/dist-packages/fastcore/logargs.py in _f(*args, **kwargs)
54 init_args.update(log)
55 setattr(inst, 'init_args', init_args)
---> 56 return inst if to_return else f(*args, **kwargs)
57 return _f
/usr/local/lib/python3.8/dist-packages/fastai/learner.py in fit(self, n_epoch, lr, wd, cbs, reset_opt)
205 self.opt.set_hypers(lr=self.lr if lr is None else lr)
206 self.n_epoch = n_epoch
--> 207 self._with_events(self._do_fit, 'fit', CancelFitException, self._end_cleanup)
208
209 def _end_cleanup(self): self.dl,self.xb,self.yb,self.pred,self.loss = None,(None,),(None,),None,None
/usr/local/lib/python3.8/dist-packages/fastai/learner.py in _with_events(self, f, event_type, ex, final)
153
154 def _with_events(self, f, event_type, ex, final=noop):
--> 155 try: self(f'before_{event_type}') ;f()
156 except ex: self(f'after_cancel_{event_type}')
157 finally: self(f'after_{event_type}') ;final()
/usr/local/lib/python3.8/dist-packages/fastai/learner.py in _do_fit(self)
195 for epoch in range(self.n_epoch):
196 self.epoch=epoch
--> 197 self._with_events(self._do_epoch, 'epoch', CancelEpochException)
198
199 @log_args(but='cbs')
/usr/local/lib/python3.8/dist-packages/fastai/learner.py in _with_events(self, f, event_type, ex, final)
155 try: self(f'before_{event_type}') ;f()
156 except ex: self(f'after_cancel_{event_type}')
--> 157 finally: self(f'after_{event_type}') ;final()
158
159 def all_batches(self):
/usr/local/lib/python3.8/dist-packages/fastai/learner.py in __call__(self, event_name)
131 def ordered_cbs(self, event): return [cb for cb in sort_by_run(self.cbs) if hasattr(cb, event)]
132
--> 133 def __call__(self, event_name): L(event_name).map(self._call_one)
134
135 def _call_one(self, event_name):
/usr/local/lib/python3.8/dist-packages/fastcore/foundation.py in map(self, f, *args, **kwargs)
270 else f.format if isinstance(f,str)
271 else f.__getitem__)
--> 272 return self._new(map(g, self))
273
274 def filter(self, f, negate=False, **kwargs):
/usr/local/lib/python3.8/dist-packages/fastcore/foundation.py in _new(self, items, *args, **kwargs)
216 @property
217 def _xtra(self): return None
--> 218 def _new(self, items, *args, **kwargs): return type(self)(items, *args, use_list=None, **kwargs)
219 def __getitem__(self, idx): return self._get(idx) if is_indexer(idx) else L(self._get(idx), use_list=None)
220 def copy(self): return self._new(self.items.copy())
/usr/local/lib/python3.8/dist-packages/fastcore/foundation.py in __call__(cls, x, *args, **kwargs)
197 def __call__(cls, x=None, *args, **kwargs):
198 if not args and not kwargs and x is not None and isinstance(x,cls): return x
--> 199 return super().__call__(x, *args, **kwargs)
200
201 # Cell
/usr/local/lib/python3.8/dist-packages/fastcore/foundation.py in __init__(self, items, use_list, match, *rest)
207 if items is None: items = []
208 if (use_list is not None) or not _is_array(items):
--> 209 items = list(items) if use_list else _listify(items)
210 if match is not None:
211 if is_coll(match): match = len(match)
/usr/local/lib/python3.8/dist-packages/fastcore/foundation.py in _listify(o)
114 if isinstance(o, list): return o
115 if isinstance(o, str) or _is_array(o): return [o]
--> 116 if is_iter(o): return list(o)
117 return [o]
118
/usr/local/lib/python3.8/dist-packages/fastcore/foundation.py in __call__(self, *args, **kwargs)
177 if isinstance(v,_Arg): kwargs[k] = args.pop(v.i)
178 fargs = [args[x.i] if isinstance(x, _Arg) else x for x in self.pargs] + args[self.maxi+1:]
--> 179 return self.fn(*fargs, **kwargs)
180
181 # Cell
/usr/local/lib/python3.8/dist-packages/fastai/learner.py in _call_one(self, event_name)
135 def _call_one(self, event_name):
136 assert hasattr(event, event_name), event_name
--> 137 [cb(event_name) for cb in sort_by_run(self.cbs)]
138
139 def _bn_bias_state(self, with_bias): return norm_bias_params(self.model, with_bias).map(self.opt.state)
/usr/local/lib/python3.8/dist-packages/fastai/learner.py in <listcomp>(.0)
135 def _call_one(self, event_name):
136 assert hasattr(event, event_name), event_name
--> 137 [cb(event_name) for cb in sort_by_run(self.cbs)]
138
139 def _bn_bias_state(self, with_bias): return norm_bias_params(self.model, with_bias).map(self.opt.state)
/usr/local/lib/python3.8/dist-packages/fastai/callback/core.py in __call__(self, event_name)
42 (self.run_valid and not getattr(self, 'training', False)))
43 res = None
---> 44 if self.run and _run: res = getattr(self, event_name, noop)()
45 if event_name=='after_fit': self.run=True #Reset self.run to True at each end of fit
46 return res
/usr/local/lib/python3.8/dist-packages/fastai/callback/tracker.py in after_epoch(self)
79 if self.every_epoch: self._save(f'{self.fname}_{self.epoch}')
80 else: #every improvement
---> 81 super().after_epoch()
82 if self.new_best:
83 print(f'Better model found at epoch {self.epoch} with {self.monitor} value: {self.best}.')
/usr/local/lib/python3.8/dist-packages/fastai/callback/tracker.py in after_epoch(self)
37 def after_epoch(self):
38 "Compare the last value to the best up to now"
---> 39 val = self.recorder.values[-1][self.idx]
40 if self.comp(val - self.min_delta, self.best): self.best,self.new_best = val,True
41 else: self.new_best = False
/usr/local/lib/python3.8/dist-packages/fastcore/foundation.py in __getitem__(self, idx)
217 def _xtra(self): return None
218 def _new(self, items, *args, **kwargs): return type(self)(items, *args, use_list=None, **kwargs)
--> 219 def __getitem__(self, idx): return self._get(idx) if is_indexer(idx) else L(self._get(idx), use_list=None)
220 def copy(self): return self._new(self.items.copy())
221
/usr/local/lib/python3.8/dist-packages/fastcore/foundation.py in _get(self, i)
221
222 def _get(self, i):
--> 223 if is_indexer(i) or isinstance(i,slice): return getattr(self.items,'iloc',self.items)[i]
224 i = mask2idxs(i)
225 return (self.items.iloc[list(i)] if hasattr(self.items,'iloc')
IndexError: list index out of range
Environment:
- OS: Ubuntu 20.04
- Python ver 3.8.2 [GCC 9.3.0]
- torch 1.6.0, torchvision 0.7.0
- fastai 2.0.13
- icevision 0.1.6
Additional context
If I switch to Faster-RCNN, training runs to completion.
Issue Analytics
- State:
- Created 3 years ago
- Comments:6 (3 by maintainers)
Top GitHub Comments
It turns out that IceVision relies on Albumentations' default image augmentation, which applies ShiftScaleRotate with shift_limit=0.0625 and scale_limit=0.1 probabilistically. As a result, small boxes can sometimes end up with an effective area of 0, or boxes can be pushed beyond the image borders, leading to invalid or zero-area boxes.
My quick fix is to add logic to my custom parser that filters out these risky boxes. I'm leaving my code fix here so others may find ideas for their own datasets should they encounter the same problem:
@lgvaz is working on a solution at the IceVision level that may do something similar.
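For readers who want a starting point, here is a minimal sketch of a parser-side filter for risky boxes. It is not the code from the comment above and not IceVision's API: the function names `is_risky_box` and `filter_risky_boxes`, the `(xmin, ymin, xmax, ymax)` pixel layout, and the `min_side` threshold are assumptions made for illustration. It applies a simplified worst-case check based on the shift_limit and scale_limit values mentioned above, and it ignores rotation entirely.

```python
from typing import List, Sequence, Tuple

# Assumed box layout: (xmin, ymin, xmax, ymax) in pixels.
Box = Tuple[float, float, float, float]


def is_risky_box(
    box: Box,
    img_w: float,
    img_h: float,
    shift_limit: float = 0.0625,  # value quoted in the comment above
    scale_limit: float = 0.1,     # value quoted in the comment above
    min_side: float = 1.0,        # hypothetical threshold, in pixels
) -> bool:
    """Heuristic: could this box collapse or leave the frame in the worst case?"""
    xmin, ymin, xmax, ymax = box

    # Worst-case down-scaling shrinks each side by (1 - scale_limit).
    shrink = 1.0 - scale_limit
    if (xmax - xmin) * shrink < min_side or (ymax - ymin) * shrink < min_side:
        return True

    # Worst-case shift moves the box by shift_limit * image size.
    # The box is risky if that shift alone can push it fully out of the image.
    margin_x = shift_limit * img_w
    margin_y = shift_limit * img_h
    return (
        xmax <= margin_x
        or ymax <= margin_y
        or xmin >= img_w - margin_x
        or ymin >= img_h - margin_y
    )


def filter_risky_boxes(
    boxes: Sequence[Box], labels: Sequence[str], img_w: float, img_h: float
) -> Tuple[List[Box], List[str]]:
    """Drop risky boxes together with their labels, keeping the two lists aligned."""
    kept = [(b, l) for b, l in zip(boxes, labels) if not is_risky_box(b, img_w, img_h)]
    return [b for b, _ in kept], [l for _, l in kept]
```

A parser could call `filter_risky_boxes` on each record before handing boxes to the dataset. The thresholds are deliberately conservative and would likely need tuning per dataset.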
I think it is due to Albumentations scaling again making boxes too small. Maybe you can enhance autofix by making sure boxes are at least 1x1 pixel?
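The "at least 1x1 pixel" idea could look roughly like the sketch below. This is not IceVision's autofix implementation: `clamp_box` is a hypothetical helper, and it assumes boxes are `(xmin, ymin, xmax, ymax)` tuples in pixels and that the image is at least `min_side` pixels wide and tall.

```python
from typing import Tuple

# Assumed box layout: (xmin, ymin, xmax, ymax) in pixels.
Box = Tuple[float, float, float, float]


def clamp_box(box: Box, img_w: float, img_h: float, min_side: float = 1.0) -> Box:
    """Clip a box to the image and force each side to be at least min_side long."""
    xmin, ymin, xmax, ymax = box

    # Keep the top-left corner inside the image, leaving room for min_side.
    xmin = min(max(xmin, 0.0), img_w - min_side)
    ymin = min(max(ymin, 0.0), img_h - min_side)

    # Push the bottom-right corner out to at least min_side past the top-left
    # corner, then clip it to the image bounds.
    xmax = min(max(xmax, xmin + min_side), img_w)
    ymax = min(max(ymax, ymin + min_side), img_h)
    return (xmin, ymin, xmax, ymax)
```

Note the trade-off: a box that was pushed entirely outside the image gets pinned to the nearest edge as a 1x1 box, which keeps the pipeline from crashing but produces a label that no longer matches any object. Dropping such boxes may be the better choice for some datasets.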