mpi4py.futures: embed stringification of remote traceback in local traceback

Tracebacks of exceptions raised remotely via Executor.map() don’t show the call stack of the worker, only the local frame that re-raises the exception:

…/concurrent/futures/_base.py", line 391, in __get_result
    raise self._exception

Python’s internal concurrent.futures shows the remote traceback through the following “hack” at https://github.com/python/cpython/blob/5bc2390229bbcb4f13359e867fd8a140a1d5496b/Lib/concurrent/futures/process.py#L116

from traceback import format_exception

# Hack to embed stringification of remote traceback in local traceback

class _RemoteTraceback(Exception):
    def __init__(self, tb):
        self.tb = tb
    def __str__(self):
        return self.tb

class _ExceptionWithTraceback:
    def __init__(self, exc, tb):
        tb = ''.join(format_exception(type(exc), exc, tb))
        self.exc = exc
        # Traceback object needs to be garbage-collected as its frames
        # contain references to all the objects in the exception scope
        self.exc.__traceback__ = None
        self.tb = '\n"""\n%s"""' % tb
    def __reduce__(self):
        return _rebuild_exc, (self.exc, self.tb)

def _rebuild_exc(exc, tb):
    exc.__cause__ = _RemoteTraceback(tb)
    return exc

I suggest a similar technique be applied at https://github.com/mpi4py/mpi4py/blob/d56ecfd3ca29ef018420c964359e759fbe230f1c/src/mpi4py/futures/_lib.py#L59

import sys

def sys_exception():
    exc = sys.exc_info()[1]
    exc.__traceback__ = None
    return exc

Issue Analytics

  • State: closed
  • Created a year ago
  • Comments: 10 (8 by maintainers)

Top GitHub Comments

2 reactions
jcphill commented, May 9, 2022

While we’re at it, it would be (sometimes very) useful to include mpi4py.MPI.Get_processor_name() and mpi4py.MPI.COMM_WORLD.Get_rank() in the traceback.

Can you elaborate? How would this be useful? The user has no control over which MPI rank executes which task. What would you do with such additional information?

@dalcinl Sometimes errors are related to problems with a specific host and having the host name (which is what I really want from Get_processor_name()) is critical to diagnosing the problem. A simple example would be failures due to /tmp being full but it could also be flaky hardware or inconsistent configuration. There could also be cases where the problem always occurs on a specific rank due to previous activity on that rank, and having both the rank and the hostname makes it possible to demonstrate that failures are on the same host regardless of its rank.
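
One way the host name and rank could be folded into the stringified traceback is sketched below. The format_remote_traceback() helper and its header format are hypothetical. In a real worker the two values would presumably come from mpi4py.MPI.Get_processor_name() and mpi4py.MPI.COMM_WORLD.Get_rank(); here socket.gethostname() and a fixed rank stand in so the sketch runs without MPI.

```python
import socket
import traceback


def format_remote_traceback(exc, hostname, rank):
    """Return the traceback text prefixed with the worker's host and rank."""
    tb = ''.join(
        traceback.format_exception(type(exc), exc, exc.__traceback__))
    header = "Remote traceback from host %r, rank %d:\n" % (hostname, rank)
    return '\n"""\n%s%s"""' % (header, tb)


try:
    1 / 0
except ZeroDivisionError as exc:
    # Placeholder values; a worker would pass its MPI processor name and rank.
    text = format_remote_traceback(exc, socket.gethostname(), 0)
```

Because the host and rank end up inside the string carried by the exception's cause, they would show up in the parent's traceback without any change to how results are returned.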

1 reaction
dalcinl commented, May 9, 2022

@jcphill @leofang The fun continues in #206
