Run inference script crashes
Hi all,
I tried the PyTorch version after initially trying the TensorFlow version. I ran the inference script in WSI mode on an NDPI image. It starts correctly, but mid-way through the process I get this error:
Process Chunk 48/99: 61%|#############5 | 35/57 [02:19<01:11, 3.23s/it]|2021-01-06|13:06:15.182| [ERROR] Crash
Traceback (most recent call last):
File "/usr/local/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 779, in _try_get_data
data = self._data_queue.get(timeout=timeout)
File "/usr/local/lib/python3.7/multiprocessing/queues.py", line 113, in get
return _ForkingPickler.loads(res)
File "/usr/local/lib/python3.7/site-packages/torch/multiprocessing/reductions.py", line 282, in rebuild_storage_fd
fd = df.detach()
File "/usr/local/lib/python3.7/multiprocessing/resource_sharer.py", line 58, in detach
return reduction.recv_handle(conn)
File "/usr/local/lib/python3.7/multiprocessing/reduction.py", line 185, in recv_handle
return recvfds(s, 1)[0]
File "/usr/local/lib/python3.7/multiprocessing/reduction.py", line 161, in recvfds
len(ancdata))
RuntimeError: received 0 items of ancdata
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 803, in _try_get_data
fs = [tempfile.NamedTemporaryFile() for i in range(fds_limit_margin)]
File "/usr/local/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 803, in <listcomp>
fs = [tempfile.NamedTemporaryFile() for i in range(fds_limit_margin)]
File "/usr/local/lib/python3.7/tempfile.py", line 547, in NamedTemporaryFile
(fd, name) = _mkstemp_inner(dir, prefix, suffix, flags, output_type)
File "/usr/local/lib/python3.7/tempfile.py", line 258, in _mkstemp_inner
fd = _os.open(file, flags, 0o600)
OSError: [Errno 24] Too many open files: '/tmp/tmpxrmts9vn'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/mnt/netcache/pathology/projects/colon-budding-he/nuclei_detection/hover_pytorch/hover_net-master/infer/wsi.py", line 746, in process_wsi_list
self.process_single_file(wsi_path, msk_path, self.output_dir)
File "/mnt/netcache/pathology/projects/colon-budding-he/nuclei_detection/hover_pytorch/hover_net-master/infer/wsi.py", line 550, in process_single_file
self.__get_raw_prediction(chunk_info_list, patch_info_list)
File "/mnt/netcache/pathology/projects/colon-budding-he/nuclei_detection/hover_pytorch/hover_net-master/infer/wsi.py", line 374, in __get_raw_prediction
chunk_patch_info_list[:, 0, 0], pbar_desc
File "/mnt/netcache/pathology/projects/colon-budding-he/nuclei_detection/hover_pytorch/hover_net-master/infer/wsi.py", line 287, in __run_model
for batch_idx, batch_data in enumerate(dataloader):
File "/usr/local/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 363, in __next__
data = self._next_data()
File "/usr/local/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 974, in _next_data
idx, data = self._get_data()
File "/usr/local/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 941, in _get_data
success, data = self._try_get_data()
File "/usr/local/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 807, in _try_get_data
"Too many open files. Communication with the"
RuntimeError: Too many open files. Communication with the workers is no longer possible. Please increase the limit using `ulimit -n` in the shell or change the sharing strategy by calling `torch.multiprocessing.set_sharing_strategy('file_system')` at the beginning of your code
Process Chunk 48/99: 61%|#############5 | 35/57 [02:19<01:27, 4.00s/it]
/usr/local/lib/python3.7/multiprocessing/semaphore_tracker.py:144: UserWarning: semaphore_tracker: There appear to be 1 leaked semaphores to clean up at shutdown
len(cache))
Do you know why this error might occur?
I am running on an Ubuntu 20 machine with a conda environment that has the requirements installed.
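For reference, the error message itself points at one workaround: calling torch.multiprocessing.set_sharing_strategy('file_system') before any DataLoader workers are spawned. A minimal sketch of where that call would go, assuming it runs at the very top of the inference entry point (the surrounding script is only illustrative):

# Switch PyTorch's multiprocessing sharing strategy from the Linux default
# "file_descriptor" to "file_system", so DataLoader workers stop passing
# tensors through open file descriptors (which is what exhausts the limit).
# This must run before any DataLoader with num_workers > 0 is created.
import torch.multiprocessing as mp

mp.set_sharing_strategy("file_system")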
Issue Analytics
- Created 3 years ago
- Comments: 40 (7 by maintainers)
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
CPU: i7-7700K, MEM: 32 GB, GPU: 1080 Ti, HDD: 500 GB SSD, OS: Ubuntu 20.04
I run this script on a server with CPU: Intel® Xeon® Platinum 8165 @ 2.30GHz, MEM: 378 GB, GPU: Tesla K80, OS: Ubuntu 18.04.2
I also hit a crash from running out of file descriptors. The error message on the terminal was the same: “RuntimeError: Too many open files. Communication with the workers is no longer possible. Please increase the limit using `ulimit -n` in the shell or change the sharing strategy by calling `torch.multiprocessing.set_sharing_strategy('file_system')` at the beginning of your code”.
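For completeness, the other workaround from that message is to raise the per-process open-file limit. A sketch under the assumption that the hard limit allows it; the value 65535 is only an example:

# Option 1: raise the limit in the shell before launching the script
#   ulimit -n 65535
#
# Option 2: raise the soft limit from Python at the start of the script
import resource

soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
# The soft limit can never exceed the hard limit, so cap the example value.
resource.setrlimit(resource.RLIMIT_NOFILE, (min(65535, hard), hard))

Note that raising the limit may only postpone the problem if descriptors keep accumulating, whereas switching the sharing strategy changes how tensors are shared between the workers and the main process.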