[Bug] How to handle exceptions raised in parallelized function
Describe the bug I would like to know how I can handle any exception raised in the function that I'm trying to parallelize.
Minimal code to reproduce
#!/usr/bin/env python3

import pypeln as pl

def compute(x):
    if x == 3:
        raise ValueError("Value 3 is not supported")
    else:
        return x*x


data = [1, 2, 3, 4, 5]
stage = pl.process.map(compute, data, workers=4)
for x in stage:
    print(f"Result: {x}")
Results
Result: 1
Result: 4
Traceback (most recent call last):
  File "test.py", line 14, in <module>
    for x in stage:
  File "/home/wenzel/local/test_python/pypeln/venv/lib/python3.8/site-packages/pypeln/process/stage.py", line 83, in to_iterable
    for elem in main_queue:
  File "/home/wenzel/local/test_python/pypeln/venv/lib/python3.8/site-packages/pypeln/process/queue.py", line 48, in __iter__
    raise exception
ValueError:
('Value 3 is not supported',)
Traceback (most recent call last):
  File "/home/wenzel/local/test_python/pypeln/venv/lib/python3.8/site-packages/pypeln/process/worker.py", line 99, in __call__
    self.process_fn(
  File "/home/wenzel/local/test_python/pypeln/venv/lib/python3.8/site-packages/pypeln/process/worker.py", line 186, in __call__
    self.apply(worker, elem, **kwargs)
  File "/home/wenzel/local/test_python/pypeln/venv/lib/python3.8/site-packages/pypeln/process/api/map.py", line 27, in apply
    y = self.f(elem.value, **kwargs)
  File "test.py", line 7, in compute
    raise ValueError("Value 3 is not supported")
ValueError: Value 3 is not supported
Expected behavior I have no expected behavior. Instead, I was looking for a way to use the API and get some error recovery. In this situation the whole pipeline is broken, and I'm not sure how to recover.
I'm trying to see if I can switch to your library, coming from concurrent.futures.
This is the operation I would like to do (demo with concurrent.futures):
import logging
from collections import Counter
from concurrent.futures import Future, ThreadPoolExecutor
from contextlib import AbstractContextManager
from typing import Callable, Dict

# FutureData is not shown in the original post; it is assumed to hold
# `user_data` and `user_callback` attributes.


class Downloader(AbstractContextManager):

    def __init__(self):
        # let Python decide how many workers to use
        # usually the best decision for IO tasks
        self._logger = logging.getLogger(f"{self.__class__.__module__}.{self.__class__.__name__}")
        self._dl_pool = ThreadPoolExecutor()
        self._future_to_obj: Dict[Future, FutureData] = {}
        self.stats = Counter()

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        self._dl_pool.shutdown()

    def submit(self, url: str, callback: Callable):
        user_data = (url, )
        future_data = FutureData(user_data, callback)
        future = self._dl_pool.submit(self._download_url, *user_data)
        self._future_to_obj[future] = future_data
        future.add_done_callback(self._on_download_done)
        self.stats["submitted"] += 1

    def _download_url(self, url: str) -> str:
        # this function might raise multiple network errors
        # .....
        return r.read()

    def _on_download_done(self, future: Future):
        try:
            future_data: FutureData = self._future_to_obj[future]
        except KeyError:
            self._logger.debug("Failed to find obj in callback for %s", future)
            self.stats["future_fail"] += 1
            return
        else:
            # call the user callback
            url, *rest = future_data.user_data
            try:
                data = future.result()
            except Exception:  # Here we have error recovery
                self._logger.debug("Error while fetching resource: %s", url)
                self.stats["fetch_error"] += 1
            else:
                future_data.user_callback(*future_data.user_data, data)
            finally:
                self.stats["total"] += 1
⬆️ TLDR I'm using add_done_callback in order to chain my futures into the next function and create a pipeline.
But as I'm dealing with Future objects, their exception is only raised when you try to access their result() (which is not the case with pypeln).
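The Future behavior described here can be reduced to a short standard-library sketch. It reuses the `compute` function from the reproduction above; the `outcomes` dictionary is only an illustrative name. The worker's exception stays inside the Future and is re-raised only at the `result()` call, so each element can be recovered individually:

```python
from concurrent.futures import ThreadPoolExecutor


def compute(x):
    if x == 3:
        raise ValueError("Value 3 is not supported")
    return x * x


outcomes = {}
with ThreadPoolExecutor(max_workers=4) as pool:
    # Submitting never raises, even if the task itself fails later.
    futures = {pool.submit(compute, x): x for x in [1, 2, 3, 4, 5]}
    for future, x in futures.items():
        try:
            # The worker's exception is re-raised here, per element.
            outcomes[x] = future.result()
        except ValueError as exc:
            outcomes[x] = exc
```

One failing element therefore does not stop the remaining futures from being consumed.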
Library Info 0.4.6
Additional context Thanks for your library, it looks amazing!
Issue Analytics
- Created 3 years ago
- Comments: 6 (4 by maintainers)
Top GitHub Comments
I see. Looking a bit more at your problem-specific code, it seems you want to know which elements from the stage failed. You could create a decorator that turns exceptions into return values:
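A minimal sketch of such a decorator, not necessarily the one originally posted. The name `return_exceptions` is an assumption, and `compute` is the function from the reproduction above. The plain list comprehension only demonstrates the wrapper; it does not run anything in parallel:

```python
import functools


def return_exceptions(f):
    """Wrap f so that exceptions are returned instead of raised."""

    @functools.wraps(f)
    def wrapper(x):
        try:
            return f(x)
        except Exception as exc:
            # Hand the exception back as an ordinary value.
            return exc

    return wrapper


@return_exceptions
def compute(x):
    if x == 3:
        raise ValueError("Value 3 is not supported")
    return x * x


results = [compute(x) for x in [1, 2, 3, 4, 5]]
```

Because the failure travels as a regular return value, the element that failed stays identifiable and the other elements are unaffected.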
With this you can either handle them immediately in the main thread:
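A hedged sketch of that consuming loop. To keep it runnable with the standard library alone, `ThreadPoolExecutor.map` stands in for the pypeln stage; with pypeln, the loop would presumably iterate over `pl.process.map(safe_compute, data, workers=4)` in the same way. The names `safe_compute`, `successes`, and `failures` are assumptions for illustration:

```python
from concurrent.futures import ThreadPoolExecutor


def safe_compute(x):
    # Return the exception instead of raising it, so the pipeline keeps going.
    try:
        if x == 3:
            raise ValueError("Value 3 is not supported")
        return x * x
    except Exception as exc:
        return exc


data = [1, 2, 3, 4, 5]
successes, failures = [], []

# Stand-in for the stage (assumption): any iterable of results works here.
with ThreadPoolExecutor(max_workers=4) as pool:
    for result in pool.map(safe_compute, data):
        if isinstance(result, Exception):
            failures.append(result)   # error recovery happens here
        else:
            successes.append(result)
```

The `isinstance` check is the only place where failures are distinguished from results, so logging or retry logic can be attached there.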
Or even construct longer pipelines:
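One way this could look, as a sketch under assumptions: the decorator also passes through any exception it receives as input, so a failure from an early stage flows untouched to the end. The names `return_exceptions`, `square`, and `to_text` are hypothetical, and the built-in `map` stands in for pypeln stages; with pypeln, each step would presumably be `pl.process.map(fn, previous_stage, workers=4)`:

```python
import functools


def return_exceptions(f):
    """Return exceptions as values and forward failures from earlier stages."""

    @functools.wraps(f)
    def wrapper(x):
        if isinstance(x, Exception):
            return x  # an earlier stage failed: pass the error along
        try:
            return f(x)
        except Exception as exc:
            return exc

    return wrapper


@return_exceptions
def square(x):
    if x == 3:
        raise ValueError("Value 3 is not supported")
    return x * x


@return_exceptions
def to_text(x):
    return f"value={x}"


data = [1, 2, 3, 4, 5]
stage = map(square, data)     # first stage
stage = map(to_text, stage)   # second stage consumes the first
results = list(stage)
```

Only the final consumer needs to check for exceptions; intermediate stages skip failed elements automatically.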
Maybe error handling of this type could be incorporated into the library, either by providing these decorators or by adding a flag throughout the API. It would be nice to see alternative solutions before committing to something.
Being a native function, I guess it would be implemented in C and should not be a problem. There are already a bunch of instance in the codebase.