RuntimeError: received 0 items of ancdata
A "stochastic" issue that appears at some point during training: training starts okay and runs for a number of epochs, then this error often happens with PyTorch Lightning (my setup is still quite close to the "Build a segmentation workflow (with PyTorch Lightning)" tutorial), and it is probably propagating from PyTorch code itself (e.g. https://github.com/fastai/fastai/issues/23):
reduction.py", line 161, in recvfds
len(ancdata))
RuntimeError: received 0 items of ancdata
TypeError: 'NoneType' object is not iterable
I first thought this was caused by the CacheDataset, since it is quite RAM-intensive:
train_ds = CacheDataset(data=datalist_train, transform=train_trans, cache_rate=1, num_workers=4)
val_ds = CacheDataset(data=datalist_val, transform=val_trans, cache_rate=1, num_workers=4)
but the same behavior also occurs with the vanilla Dataset
train_ds = Dataset(data=datalist_train, transform=train_trans)
val_ds = Dataset(data=datalist_val, transform=val_trans)
with the same transform pipeline (train_trans / val_trans) in both cases.
I guess this depends on the environment the code is run in, but do you have any ideas how to get rid of it?
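For reference, this is roughly how the datasets end up in DataLoaders; the batch_size and num_workers values below are placeholders rather than my exact configuration, and setting num_workers=0 is the quickest way to check whether the worker processes are to blame, since the ancdata error comes from passing file descriptors between them:
from torch.utils.data import DataLoader

# Placeholder loader setup -- batch_size / num_workers are illustrative only.
# num_workers=0 keeps loading in the main process, which sidesteps the
# file-descriptor passing that triggers "received 0 items of ancdata".
train_loader = DataLoader(train_ds, batch_size=2, shuffle=True, num_workers=0)
val_loader = DataLoader(val_ds, batch_size=1, shuffle=False, num_workers=0)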
Environment and full trace:
MONAI version: 0.2.0
Python version: 3.6.8 |Anaconda, Inc.| (default, Dec 30 2018, 01:22:34) [GCC 7.3.0]
Numpy version: 1.18.5
Pytorch version: 1.5.0
Optional dependencies:
Pytorch Ignite version: 0.3.0
Nibabel version: 3.1.0
scikit-image version: 0.17.2
Pillow version: 7.1.2
Tensorboard version: 2.2.2
GPU available: True, used: True
TPU available: False, using: 0 TPU cores
CUDA_VISIBLE_DEVICES: [0]
| Name | Type | Params
-------------------------------------------
0 | _model | UNet | 4 M
1 | loss_function | DiceLoss | 0
Validation sanity check: 100%|████████████████████████████████████████████████████████████████████████| 1/1 [00:05<00:00, 5.89s/it]
Validation sanity check: 100%|████████████████████████████████████████████████████████████████████████| 1/1 [00:05<00:00, 5.89s/it]
current epoch: 0 current mean loss: 0.6968 best mean loss: 0.6968 (best dice at that loss 0.0061) at epoch 0
Epoch 1: 100%|████████████████████████████████████████████████████████████████████████| 222/222 [08:21<00:00, 2.26s/it, loss=0.604, v_num=0]
...
Epoch 41: 83%|████████████████████████████████████████████████████████████            | 184/222 [06:40<01:22, 2.18s/it, loss=0.281, v_num=0]
Traceback (most recent call last):
trainer.fit(net)
site-packages/pytorch_lightning/trainer/trainer.py", line 918, in fit
self.single_gpu_train(model)
site-packages/pytorch_lightning/trainer/distrib_parts.py", line 176, in single_gpu_train
self.run_pretrain_routine(model)
site-packages/pytorch_lightning/trainer/trainer.py", line 1093, in run_pretrain_routine
self.train()
site-packages/pytorch_lightning/trainer/training_loop.py", line 375, in train
self.run_training_epoch()
site-packages/pytorch_lightning/trainer/training_loop.py", line 445, in run_training_epoch
enumerate(_with_is_last(train_dataloader)), "get_train_batch"
site-packages/pytorch_lightning/profiler/profilers.py", line 64, in profile_iterable
value = next(iterator)
site-packages/pytorch_lightning/trainer/training_loop.py", line 844, in _with_is_last
for val in it:
site-packages/torch/utils/data/dataloader.py", line 345, in __next__
data = self._next_data()
site-packages/torch/utils/data/dataloader.py", line 841, in _next_data
idx, data = self._get_data()
site-packages/torch/utils/data/dataloader.py", line 808, in _get_data
success, data = self._try_get_data()
site-packages/torch/utils/data/dataloader.py", line 761, in _try_get_data
data = self._data_queue.get(timeout=timeout)
File "/home/petteri/anaconda3/lib/python3.6/multiprocessing/queues.py", line 113, in get
return _ForkingPickler.loads(res)
site-packages/torch/multiprocessing/reductions.py", line 294, in rebuild_storage_fd
fd = df.detach()
File "/home/petteri/anaconda3/lib/python3.6/multiprocessing/resource_sharer.py", line 58, in detach
return reduction.recv_handle(conn)
File "/home/petteri/anaconda3/lib/python3.6/multiprocessing/reduction.py", line 182, in recv_handle
return recvfds(s, 1)[0]
File "/home/petteri/anaconda3/lib/python3.6/multiprocessing/reduction.py", line 161, in recvfds
len(ancdata))
RuntimeError: received 0 items of ancdata
Exception ignored in: <object repr() failed>
Traceback (most recent call last):
site-packages/tqdm/std.py", line 1086, in __del__
site-packages/tqdm/std.py", line 1293, in close
site-packages/tqdm/std.py", line 1471, in display
site-packages/tqdm/std.py", line 1089, in __repr__
site-packages/tqdm/std.py", line 1433, in format_dict
TypeError: 'NoneType' object is not iterable
Top Results From Across the Web
RuntimeError: received 0 items of ancdata - PyTorch Forums
How to solve it? Training stops due to "Caught RuntimeError in DataLoader worker process 0" with a large dataset of files.
How to resolve the error: RuntimeError: received 0 items of ...
I have a torch.utils.data.DataLoader, created with the following code: transform_train = transforms.Compose([transforms.RandomCrop( ...
RuntimeError: received 0 items of ancdata - Part 1 (2019)
I have a trained learner which I'm trying to use to make predictions on a validation set of 100,000 samples, via ...
[Pytorch] RuntimeError: received 0 items of ancdata - how to fix (translated from Korean)
Check the limits with ulimit -a: core file size (blocks, -c) 0, data seg size (kbytes, ...
RuntimeError: received 0 items of ancdata - CSDN blog (translated from Chinese)
Fix 1: pool = torch.multiprocessing.Pool(torch.multiprocessing.cpu_count(), maxtasksperchild=1). Fix 2: switch the multiprocessing tensor sharing strategy to file_system (the default is ...
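The file_system sharing strategy mentioned in the last result above is a real torch.multiprocessing option for exactly this failure mode; a minimal sketch of enabling it, to be run once before any DataLoader workers are spawned:
import torch.multiprocessing

# Share tensors via the file system instead of passing file descriptors
# between processes; this avoids exhausting the open-file limit that
# produces "received 0 items of ancdata" in DataLoader workers.
torch.multiprocessing.set_sharing_strategy('file_system')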
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
Thanks, it solved my problem.
@Nic-Ma: I ran into this error again when trying a custom loss outside MONAI and adding one more volume to the dataloader, which gives the CPUs more to compute; I could not get past the first epoch. Apparently it is the multiprocessing that causes the headaches, which makes the problem somewhat local to my machine and hard to reproduce.
I tried some of the fixes from https://github.com/pytorch/pytorch/issues/973 and got at least past the first epoch; I will see how robust these workarounds are (one of them is sketched after the links below):
https://github.com/pytorch/pytorch/issues/973#issuecomment-604473515
https://github.com/pytorch/pytorch/issues/973#issuecomment-310397437
https://github.com/pytorch/pytorch/issues/973#issuecomment-346405667
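The exact code behind those comment links is not reproduced here; as an illustration of the other kind of workaround discussed in that thread, this is a minimal sketch of raising the per-process open-file soft limit from Python (the value 4096 is an arbitrary example and must not exceed the hard limit reported by getrlimit):
import resource

# Raise the soft limit on open file descriptors; shared-memory handles used
# by DataLoader workers count against this limit. 4096 is an illustrative
# value and must stay at or below the hard limit.
soft_limit, hard_limit = resource.getrlimit(resource.RLIMIT_NOFILE)
resource.setrlimit(resource.RLIMIT_NOFILE, (4096, hard_limit))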
@sampathweb had this suggestion for debugging (https://github.com/pytorch/pytorch/issues/973#issuecomment-345089046): if the core devs want to see the error, just reduce the ulimit to 1024, run the code of @kamo-naoyuki above, and you might see the same problem.