
pytorch 1.3 got a weird bug

See original GitHub issue

I just upgraded to PyTorch 1.3 (built from source), and training code that previously worked no longer runs.

/usr/local/lib/python3.5/dist-packages/torch/optim/lr_scheduler.py:82: UserWarning: Detected call of `lr_scheduler.step()` before `optimizer.step()`. In PyTorch 1.1.0 and later, you should call them in the opposite order: `optimizer.step()` before `lr_scheduler.step()`.  Failure to do this will result in PyTorch skipping the first value of the learning rate schedule.See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate
  "https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate", UserWarning)
/pytorch/aten/src/ATen/native/IndexingUtils.h:20: UserWarning: indexing with dtype torch.uint8 is now deprecated, please use a dtype torch.bool instead.
  " please use a dtype torch.bool instead.");
(the warning above is repeated several times in the log)




maskrcnn-benchmark_local/vendor/maskrcnn-benchmark/maskrcnn_benchmark/engine/trainer.py", line 57, in do_train
    for iteration, (images, targets, _) in enumerate(data_loader, start_iter):
  File "/usr/local/lib/python3.5/dist-packages/torch/utils/data/dataloader.py", line 819, in __next__
    return self._process_data(data)
  File "/usr/local/lib/python3.5/dist-packages/torch/utils/data/dataloader.py", line 846, in _process_data
    data.reraise()
  File "/usr/local/lib/python3.5/dist-packages/torch/_utils.py", line 369, in reraise
    raise self.exc_type(msg)
IndexError: Caught IndexError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/torch/utils/data/_utils/worker.py", line 178, in _worker_loop
    data = fetcher.fetch(index)
  File "/usr/local/lib/python3.5/dist-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/usr/local/lib/python3.5/dist-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/usr/local/lib/python3.5/dist-packages/torch/utils/data/dataset.py", line 207, in __getitem__
    return self.datasets[dataset_idx][sample_idx]
  File "//maskrcnn-benchmark_local/vendor/maskrcnn-benchmark/maskrcnn_benchmark/data/datasets/coco.py", line 94, in __getitem__
    target = target.clip_to_image(remove_empty=True)
  File "s/fagangjin/work/maskrcnn-benchmark_local/vendor/maskrcnn-benchmark/maskrcnn_benchmark/structures/bounding_box.py", line 223, in clip_to_image
    return self[keep]
  File k/maskrcnn-benchmark_local/vendor/maskrcnn-benchmark/maskrcnn_benchmark/structures/bounding_box.py", line 208, in __getitem__
    bbox.add_field(k, v[item])
  File "/maskrcnn-benchmark_local/vendor/maskrcnn-benchmark/maskrcnn_benchmark/structures/segmentation_mask.py", line 553, in __getitem__
    selected_instances = self.instances.__getitem__(item)
  File "/maskrcnn-benchmark_local/vendor/maskrcnn-benchmark/maskrcnn_benchmark/structures/segmentation_mask.py", line 462, in __getitem__
    selected_polygons.append(self.polygons[i])
IndexError: list index out of range


A similar bug has been reported before, but I am sure this one is not related to it, since I just cloned a fresh copy of maskrcnn-benchmark.

It seems to happen only on PyTorch 1.3?

To be more specific, it happens in these lines of code:

link

Issue Analytics

  • State: closed
  • Created: 4 years ago
  • Comments: 7 (2 by maintainers)

Top GitHub Comments

6 reactions
zhenglilei commented, Sep 12, 2019

@jinfagang I pasted the wrong path in my previous reply. To resolve the warning, I only changed the file below.

maskrcnn_benchmark/modeling/rpn/inference.py
191:            inds_mask = torch.zeros_like(objectness, dtype=torch.uint8)

The release notes for 1.2 state that “Masking via torch.uint8 Tensors is now deprecated in favor of masking via torch.bool Tensors.”

Therefore, the warning is shown only when a torch.uint8 tensor is used as an index or mask to select from a tensor. Other places that use dtype=torch.uint8 do not need to be changed.

You are right, thanks! The warning disappears when “uint8” is replaced with “bool” in this line.
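
For reference, here is a small standalone sketch of the fix described above (objectness here is just a stand-in tensor, not the real RPN output; the actual edit in inference.py is only the dtype swap):

import torch

objectness = torch.rand(5)  # stand-in for the RPN objectness scores

# Old: building the mask as uint8 triggers the deprecation warning
# once it is used for indexing on PyTorch >= 1.2.
inds_mask = torch.zeros_like(objectness, dtype=torch.uint8)

# Fixed: build the mask as torch.bool instead.
inds_mask = torch.zeros_like(objectness, dtype=torch.bool)
inds_mask[objectness > 0.5] = True

selected = objectness[inds_mask]  # boolean masking, no warning
print(selected)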

2 reactions
henrywang1 commented, Sep 11, 2019

I think it is because the dtype returned by comparison operations changed in PyTorch 1.2. https://github.com/pytorch/pytorch/releases

When __getitem__ is called from clip_to_image, the dtype of keep changes from torch.uint8 to torch.bool.

So you could change the dtype check in __getitem__ from item.dtype == torch.uint8 to item.dtype == torch.bool:

def clip_to_image(self, remove_empty=True):
    TO_REMOVE = 1
    self.bbox[:, 0].clamp_(min=0, max=self.size[0] - TO_REMOVE)
    self.bbox[:, 1].clamp_(min=0, max=self.size[1] - TO_REMOVE)
    self.bbox[:, 2].clamp_(min=0, max=self.size[0] - TO_REMOVE)
    self.bbox[:, 3].clamp_(min=0, max=self.size[1] - TO_REMOVE)
    if remove_empty:
        box = self.bbox
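        # on PyTorch >= 1.2, these comparisons return a torch.bool mask (previously torch.uint8)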
        keep = (box[:, 3] > box[:, 1]) & (box[:, 2] > box[:, 0])
        return self[keep]
    return self

We could also resolve the warnings by modifying the dtype in maskrcnn_benchmark/modeling/balanced_positive_negative_sampler.py.
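
As a quick standalone illustration of why the old check stops matching (this is not maskrcnn-benchmark code, just the comparison-dtype behaviour on PyTorch >= 1.2):

import torch

box = torch.tensor([[0., 0., 10., 10.],   # normal box
                    [5., 5., 4., 4.]])    # "empty" box (x2 < x1, y2 < y1)

keep = (box[:, 3] > box[:, 1]) & (box[:, 2] > box[:, 0])
print(keep.dtype)                 # torch.bool on >= 1.2, torch.uint8 before

print(keep.dtype == torch.uint8)  # False on >= 1.2, so the old check is skipped
print(keep.dtype == torch.bool)   # True on >= 1.2

print(box[keep])                  # boolean masking still keeps only the first box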
