
0.12 loky backend: can't inherit from a class that uses joblib and is defined in another module


Hello,

With joblib 0.12 and the default loky backend, I can’t inherit from a class that uses joblib and is defined in another module.

If I move that class into the same module, or if I go back to the multiprocessing backend, it works.

This may be a duplicate of https://github.com/joblib/joblib/issues/643; sorry if that’s the case.

The error is “A process in the executor was terminated abruptly while the future was running or pending.” Moreover, looking at stderr during the crash, I see (TestJoblibLog is my child class): AttributeError: Can't get attribute 'TestJoblibLog' on <module 'joblib.externals.loky.backend.popen_loky_posix' from '/home/tmp_mihoo/miniconda3/lib/python3.6/site-packages/joblib/externals/loky/backend/popen_loky_posix.py'>

So it seems that joblib’s loky workers import the module where joblib is called, but not the calling module that defines the subclass.

Please see the example code below.

Thanks!

Example and code to reproduce

Execution results

./in_modules.py 2>/dev/null

Computing chunks
Running in parallel
Hello, ['foo', 'bar', 'baz']
Extraction done
Success with default backend: ['foo', 'bar', 'baz']
Mapping foo start: 1 of 3: 33.33%
Hello, foo
Mapping foo done: 1 of 3: 33.33% in 0:00:00.180327, remaining 0:00:00.360654, end at 2018-07-25 10:52:15.640262
Mapping bar start: 2 of 3: 66.67%
Hello, bar
Mapping bar done: 2 of 3: 66.67% in 0:00:00.181148, remaining 0:00:00.090574, end at 2018-07-25 10:52:15.371003
Mapping baz start: 3 of 3: 100.00%
Hello, baz
Mapping baz done: 3 of 3: 100.00% in 0:00:00.181711, remaining 0:00:00, end at 2018-07-25 10:52:15.280992

./imported.py 2>/dev/null

Computing chunks
Running in parallel
Exception with default backend: A process in the executor was terminated abruptly while the future was running or pending.
Computing chunks
Running in parallel
Using multiprocessing joblib backend instead of the default one
Mapping foo start: 1 of 3: 33.33%
Hello, foo
Mapping foo done: 1 of 3: 33.33% in 0:00:00.013259, remaining 0:00:00.026518, end at 2018-07-25 10:56:25.080652
Mapping bar start: 2 of 3: 66.67%
Hello, bar
Mapping bar done: 2 of 3: 66.67% in 0:00:00.013529, remaining 0:00:00.006764, end at 2018-07-25 10:56:25.061168
Mapping baz start: 3 of 3: 100.00%
Hello, baz
Mapping baz done: 3 of 3: 100.00% in 0:00:00.013826, remaining 0:00:00, end at 2018-07-25 10:56:25.054701
Hello, ['foo', 'bar', 'baz']
Extraction done
Success with multiprocessing backend: ['foo', 'bar', 'baz']

Files

imported.py

#!/usr/bin/env python3

from wrapper import MapReduceWrapper


class TestJoblibLog(MapReduceWrapper):
    def __init__(self, *args, **kwargs):
        MapReduceWrapper.__init__(self, *args, **kwargs)

    def computeChunks(self):
        print("Computing chunks")
        return ['foo', 'bar', 'baz']

    def map(self, chunk):
        print(f"Hello, {chunk}")
        return chunk

    def reduce(self, mapResults):
        print(f"Hello, {str(mapResults)}")
        return mapResults


if __name__ == "__main__":
    try:
        print(
            "Success with default backend:",
            TestJoblibLog(backend=None).get()
        )
    except Exception as e:
        print(f"Exception with default backend: {e}")
        try:
            print(
                "Success with multiprocessing backend:",
                TestJoblibLog(backend='multiprocessing').get()
            )
        except Exception as e:
            print(f"Exception with multiprocessing backend: {e}")
            print("No result")

wrapper.py

#!/usr/bin/env python3

import abc
import datetime
import joblib as jl


class MapReduceWrapper(metaclass=abc.ABCMeta):
    def __init__(self, nJobs=-1, verbose=2, backend='multiprocessing'):
        """Constructor
        - nJobs: if 1 or 0, don't use parallel processing; else, use joblib
                 parallel processing and forward this value to the n_jobs
                 argument of the joblib.Parallel constructor
        - verbose: 0: don't show anything, 1: show chunk name as info message,
                   2: show chunk name and progress as info message
        - backend: the joblib backend. Useful because, since joblib 0.12, the
                   loky backend is used by default, and it causes some issues;
                   in this case, the multiprocessing backend is better. If
                   the provided value is None, joblib chooses the backend.
        Please note that, because of parallel processing, the estimated
        progress and elapsed time are not accurate
        """
        self._nJobs = nJobs
        self._verbose = verbose
        self._useParallel = (nJobs not in (0, 1))
        self._mapStart = None
        self._backend = backend
        self._chunks = self.computeChunks()

    @abc.abstractmethod
    def computeChunks(self):
        """Returns the list of the chunks
        Will typically give a list based on child's class constructor's
        arguments
        To be implemented in the child classes
        """

    @abc.abstractmethod
    def map(self, chunk):
        """Gets data from the provided chunk and returns the result
        - chunk: the chunk identifier
        To be implemented in the child classes
        """

    @abc.abstractmethod
    def reduce(self, mapResults):
        """Reduces the data retrieved from every map call and returns the result
        - mapResults: list of map results
        To be implemented in the child classes
        """

    def _loggedMap(self, i, n, x):
        """Wrapper for map, logging the ongoing chunk + progress estimation
        - i: id of the current chunk, from 1 to the number of chunks
        - n: number of chunks
        - x: chunk identifier
        """
        if self._verbose == 2:
            print("Mapping %s start: %d of %d: %.2f%%" % (
                x,
                i,
                n,
                100 * i / n,
            ))
        elif self._verbose == 1:
            print("Mapping %s" % x)
        ret = self.map(x)

        if self._verbose == 2:
            progress = 100 * i / n
            now = datetime.datetime.now()
            elapsed = now - self._mapStart
            remaining = datetime.timedelta(
                seconds=(
                    elapsed.total_seconds() * (100 - progress) / progress
                )
            )
            endAt = now + remaining
            print(
                (
                    "Mapping %s done: %d of %d: %.2f%% in %s, remaining %s, "
                    "end at %s"
                ) % (
                    x,
                    i,
                    n,
                    100 * i / n,
                    str(elapsed),
                    str(remaining),
                    str(endAt)
                )
            )
        return ret

    def get(self):
        """Computes and returns the selected data"""
        n = len(self._chunks)
        self._mapStart = datetime.datetime.now()
        if self._useParallel:
            print("Running in parallel")
            if self._nJobs != -1:
                print(
                    ("Running %d jobs in parallel, which might be different "
                     "from one per CPU") % self._nJobs
                )
            jlKwargs = dict(n_jobs=self._nJobs, verbose=0)
            if self._backend is not None:
                print(
                    f"Using {self._backend} joblib backend "
                    "instead of the default one"
                )
                jlKwargs['backend'] = self._backend
            xs = jl.Parallel(**jlKwargs)(
                jl.delayed(self._loggedMap)(
                    i + 1, n, chunk
                ) for i, chunk in enumerate(self._chunks)
            )
        else:
            print(
                "Parallel processing disabled; running in sequence"
            )
            xs = [
                self._loggedMap(
                    i + 1, n, chunk
                ) for (
                    i, chunk
                ) in enumerate(
                    self._chunks
                )
            ]
        self._mapStart = None
        ret = self.reduce(xs)
        try:
            ret = ret.copy()
        except AttributeError:
            print(
                f"Copy unavailable for return type {type(ret)}, "
                "returning the value directly instead."
            )
        print("Extraction done")
        return ret

in_modules.py

#!/usr/bin/env python3

import abc
import datetime
import joblib as jl


class MapReduceWrapper(metaclass=abc.ABCMeta):
    def __init__(self, nJobs=-1, verbose=2, backend='multiprocessing'):
        """Constructor
        - nJobs: if 1 or 0, don't use parallel processing; else, use joblib
                 parallel processing and forward this value to the n_jobs
                 argument of the joblib.Parallel constructor
        - verbose: 0: don't show anything, 1: show chunk name as info message,
                   2: show chunk name and progress as info message
        - backend: the joblib backend. Useful because, since joblib 0.12, the
                   loky backend is used by default, and it causes some issues;
                   in this case, the multiprocessing backend is better. If
                   the provided value is None, joblib chooses the backend.
        Please note that, because of parallel processing, the estimated
        progress and elapsed time are not accurate
        """
        self._nJobs = nJobs
        self._verbose = verbose
        self._useParallel = (nJobs not in (0, 1))
        self._mapStart = None
        self._backend = backend
        self._chunks = self.computeChunks()

    @abc.abstractmethod
    def computeChunks(self):
        """Returns the list of the chunks
        Will typically give a list based on child's class constructor's
        arguments
        To be implemented in the child classes
        """

    @abc.abstractmethod
    def map(self, chunk):
        """Gets data from the provided chunk and returns the result
        - chunk: the chunk identifier
        To be implemented in the child classes
        """

    @abc.abstractmethod
    def reduce(self, mapResults):
        """Reduces the data retrieved from every map call and returns the result
        - mapResults: list of map results
        To be implemented in the child classes
        """

    def _loggedMap(self, i, n, x):
        """Wrapper for map, logging the ongoing chunk + progress estimation
        - i: id of the current chunk, from 1 to the number of chunks
        - n: number of chunks
        - x: chunk identifier
        """
        if self._verbose == 2:
            print("Mapping %s start: %d of %d: %.2f%%" % (
                x,
                i,
                n,
                100 * i / n,
            ))
        elif self._verbose == 1:
            print("Mapping %s" % x)
        ret = self.map(x)

        if self._verbose == 2:
            progress = 100 * i / n
            now = datetime.datetime.now()
            elapsed = now - self._mapStart
            remaining = datetime.timedelta(
                seconds=(
                    elapsed.total_seconds() * (100 - progress) / progress
                )
            )
            endAt = now + remaining
            print(
                (
                    "Mapping %s done: %d of %d: %.2f%% in %s, remaining %s, "
                    "end at %s"
                ) % (
                    x,
                    i,
                    n,
                    100 * i / n,
                    str(elapsed),
                    str(remaining),
                    str(endAt)
                )
            )
        return ret

    def get(self):
        """Computes and returns the selected data"""
        n = len(self._chunks)
        self._mapStart = datetime.datetime.now()
        if self._useParallel:
            print("Running in parallel")
            if self._nJobs != -1:
                print(
                    ("Running %d jobs in parallel, which might be different "
                     "from one per CPU") % self._nJobs
                )
            jlKwargs = dict(n_jobs=self._nJobs, verbose=0)
            if self._backend is not None:
                print(
                    f"Using {self._backend} joblib backend "
                    "instead of the default one"
                )
                jlKwargs['backend'] = self._backend
            xs = jl.Parallel(**jlKwargs)(
                jl.delayed(self._loggedMap)(
                    i + 1, n, chunk
                ) for i, chunk in enumerate(self._chunks)
            )
        else:
            print(
                "Parallel processing disabled; running in sequence"
            )
            xs = [
                self._loggedMap(
                    i + 1, n, chunk
                ) for (
                    i, chunk
                ) in enumerate(
                    self._chunks
                )
            ]
        self._mapStart = None
        ret = self.reduce(xs)
        try:
            ret = ret.copy()
        except AttributeError:
            print(
                f"Copy unavailable for return type {type(ret)}, "
                "returning the value directly instead."
            )
        print("Extraction done")
        return ret


class TestJoblibLog(MapReduceWrapper):
    def __init__(self, *args, **kwargs):
        MapReduceWrapper.__init__(self, *args, **kwargs)

    def computeChunks(self):
        print("Computing chunks")
        return ['foo', 'bar', 'baz']

    def map(self, chunk):
        print(f"Hello, {chunk}")
        return chunk

    def reduce(self, mapResults):
        print(f"Hello, {str(mapResults)}")
        return mapResults


if __name__ == "__main__":
    try:
        print(
            "Success with default backend:",
            TestJoblibLog(backend=None).get()
        )
    except Exception as e:
        print(f"Exception with default backend: {e}")
        try:
            print(
                "Success with multiprocessing backend:",
                TestJoblibLog(backend='multiprocessing').get()
            )
        except Exception as e:
            print(f"Exception with multiprocessing backend: {e}")
            print("No result")

Issue Analytics

  • State: closed
  • Created: 5 years ago
  • Comments: 17 (11 by maintainers)

Top GitHub Comments

mhooreman commented, Aug 5, 2018 (1 reaction)

@ogrisel

I’ve checked with joblib 0.12.2 in a dedicated conda env and a manual installation of joblib.

I’ve added the following (which I had forgotten) to test.py:

if __name__ == "__main__":
    test(ConcreteWrapper)

With Python 3.6 and joblib 0.12.2: it works perfectly.

With Python 3.7 and joblib 0.12.2: not even the multiprocessing backend works (can’t pickle _abc_data objects).

ogrisel commented, Jul 26, 2018 (1 reaction)

I see no reason why _abc_data would not be picklable. I think this needs a fix in cloudpickle.
