
RuntimeError: received 0 items of ancdata


โ€œStochasticโ€ issue happening with training at some point. Training starts okay for x number of epochs and at some point this often happens with Pytorch Lightning (quite close still to the Build a segmentation workflow (with PyTorch Lightning) ) , and is probably propagating from Pytorch code? (e.g. https://github.com/fastai/fastai/issues/23)

reduction.py", line 161, in recvfds
    len(ancdata))
RuntimeError: received 0 items of ancdata
TypeError: 'NoneType' object is not iterable

I first thought this was caused by the CacheDataset, as it is quite RAM-intensive:

train_ds = CacheDataset(data=datalist_train, transform=train_trans, cache_rate=1, num_workers=4)
val_ds = CacheDataset(data=datalist_val, transform=val_trans, cache_rate=1, num_workers=4)

but the same behavior occurs with the vanilla Dataset:

train_ds = Dataset(data=datalist_train, transform=train_trans)
val_ds = Dataset(data=datalist_val, transform=val_trans)

with the following transformation

[image: transform definitions]

I guess this depends on the environment in which the code is run, but do you have any ideas on how to get rid of it?

Full trace:


MONAI version: 0.2.0
Python version: 3.6.8 |Anaconda, Inc.| (default, Dec 30 2018, 01:22:34)  [GCC 7.3.0]
Numpy version: 1.18.5
Pytorch version: 1.5.0

Optional dependencies:
Pytorch Ignite version: 0.3.0
Nibabel version: 3.1.0
scikit-image version: 0.17.2
Pillow version: 7.1.2
Tensorboard version: 2.2.2

GPU available: True, used: True
TPU available: False, using: 0 TPU cores
CUDA_VISIBLE_DEVICES: [0]

  | Name          | Type     | Params
-------------------------------------------
0 | _model        | UNet     | 4 M   
1 | loss_function | DiceLoss | 0     
Validation sanity check: 100%|████████████████████████████████████████| 1/1 [00:05<00:00,  5.89s/it]
Validation sanity check: 100%|████████████████████████████████████████| 1/1 [00:05<00:00,  5.89s/it]
current epoch: 0 current mean loss: 0.6968 best mean loss: 0.6968 (best dice at that loss 0.0061) at epoch 0
Epoch 1: 100%|████████████████████████████████████████| 222/222 [08:21<00:00,  2.26s/it, loss=0.604, v_num=0]
...
Epoch 41:  83%|█████████████████████████████████▋      | 184/222 [06:40<01:22,  2.18s/it, loss=0.281, v_num=0]
Traceback (most recent call last):                                                                                                                  
  trainer.fit(net)
  site-packages/pytorch_lightning/trainer/trainer.py", line 918, in fit
    self.single_gpu_train(model)
  site-packages/pytorch_lightning/trainer/distrib_parts.py", line 176, in single_gpu_train
    self.run_pretrain_routine(model)
  site-packages/pytorch_lightning/trainer/trainer.py", line 1093, in run_pretrain_routine
    self.train()
  site-packages/pytorch_lightning/trainer/training_loop.py", line 375, in train
    self.run_training_epoch()
  site-packages/pytorch_lightning/trainer/training_loop.py", line 445, in run_training_epoch
    enumerate(_with_is_last(train_dataloader)), "get_train_batch"
  site-packages/pytorch_lightning/profiler/profilers.py", line 64, in profile_iterable
    value = next(iterator)
  site-packages/pytorch_lightning/trainer/training_loop.py", line 844, in _with_is_last
    for val in it:
  site-packages/torch/utils/data/dataloader.py", line 345, in __next__
    data = self._next_data()
  site-packages/torch/utils/data/dataloader.py", line 841, in _next_data
    idx, data = self._get_data()
  site-packages/torch/utils/data/dataloader.py", line 808, in _get_data
    success, data = self._try_get_data()
  site-packages/torch/utils/data/dataloader.py", line 761, in _try_get_data
    data = self._data_queue.get(timeout=timeout)
  File "/home/petteri/anaconda3/lib/python3.6/multiprocessing/queues.py", line 113, in get
    return _ForkingPickler.loads(res)
  site-packages/torch/multiprocessing/reductions.py", line 294, in rebuild_storage_fd
    fd = df.detach()
  File "/home/petteri/anaconda3/lib/python3.6/multiprocessing/resource_sharer.py", line 58, in detach
    return reduction.recv_handle(conn)
  File "/home/petteri/anaconda3/lib/python3.6/multiprocessing/reduction.py", line 182, in recv_handle
    return recvfds(s, 1)[0]
  File "/home/petteri/anaconda3/lib/python3.6/multiprocessing/reduction.py", line 161, in recvfds
    len(ancdata))

RuntimeError: received 0 items of ancdata
Exception ignored in: <object repr() failed>
Traceback (most recent call last):
  site-packages/tqdm/std.py", line 1086, in __del__
  site-packages/tqdm/std.py", line 1293, in close
  site-packages/tqdm/std.py", line 1471, in display
  site-packages/tqdm/std.py", line 1089, in __repr__
  site-packages/tqdm/std.py", line 1433, in format_dict
TypeError: 'NoneType' object is not iterable

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Comments: 10 (5 by maintainers)

Top GitHub Comments

13 reactions
cuge1995 commented, Jan 26, 2021
import resource
rlimit = resource.getrlimit(resource.RLIMIT_NOFILE)
resource.setrlimit(resource.RLIMIT_NOFILE, (4096, rlimit[1]))

Thanks, it solved my problem.

8 reactions
petteriTeikari commented, Jul 25, 2020

@Nic-Ma: I came across this error again when trying a custom loss from outside MONAI and adding one more volume to the dataloader, thus giving the CPUs more to compute. I could not get past the 1st epoch.

Apparently it is multiprocessing that is causing the headaches, which makes the problem somewhat local to my machine and hard to reproduce.

I tried some of the fixes from https://github.com/pytorch/pytorch/issues/973 and got at least past the first epoch; I will see how robust these workarounds are.

https://github.com/pytorch/pytorch/issues/973#issuecomment-604473515:

torch.multiprocessing.set_sharing_strategy('file_system')

https://github.com/pytorch/pytorch/issues/973#issuecomment-310397437:

pool = torch.multiprocessing.Pool(torch.multiprocessing.cpu_count(), maxtasksperchild=1)

https://github.com/pytorch/pytorch/issues/973#issuecomment-346405667:

import resource
rlimit = resource.getrlimit(resource.RLIMIT_NOFILE)
resource.setrlimit(resource.RLIMIT_NOFILE, (4096, rlimit[1]))
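The snippet above sets the soft limit to 4096 unconditionally. `resource.setrlimit` raises `ValueError` if the new soft limit exceeds the hard limit, and the call would lower the limit if it were already higher than 4096. A minimal sketch of a more defensive variant, assuming a Linux system; the helper name `raise_nofile_limit` and its default target are illustrative and not part of PyTorch or MONAI:

```python
import resource


def raise_nofile_limit(target=4096):
    """Raise the soft RLIMIT_NOFILE towards `target` without exceeding the hard limit.

    Returns the soft limit in effect afterwards.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    inf = resource.RLIM_INFINITY

    # Never lower a limit that is already high enough (or unlimited).
    if soft == inf or soft >= target:
        return soft

    # An unprivileged process cannot exceed the hard limit, so cap the request.
    new_soft = target if hard == inf else min(target, hard)
    resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
    return resource.getrlimit(resource.RLIMIT_NOFILE)[0]
```

Calling `raise_nofile_limit()` once at the top of the training script, before any DataLoader is created, should have the same effect as the original three lines whenever the hard limit allows it.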

@sampathweb had this suggestion for debugging (https://github.com/pytorch/pytorch/issues/973#issuecomment-345089046): "If the core devs want to see the error, just reduce the ulimit to 1024 and run the code of @kamo-naoyuki above and you might see the same problem."
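Since the workarounds above all revolve around the open-file limit, it can help to check how close the process is to that limit before it fails. A rough sketch, assuming Linux (it reads `/proc/self/fd`); `open_fd_count` and `fd_headroom` are hypothetical helper names used only for illustration:

```python
import os
import resource


def open_fd_count():
    """Count file descriptors currently open in this process (Linux only)."""
    # /proc/self/fd has one entry per open descriptor. Listing the directory
    # briefly opens one extra descriptor, so the value is approximate.
    return len(os.listdir("/proc/self/fd"))


def fd_headroom():
    """Return (open descriptors, soft RLIMIT_NOFILE) for the current process."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return open_fd_count(), soft
```

Logging `fd_headroom()` every few hundred batches would show whether the descriptor count in the main process creeps towards the soft limit during an epoch, which is what the `ulimit` experiment is meant to provoke.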

