0.12 loky backend: can't inherit from a class that uses joblib and is defined in another module
Hello,
With joblib 0.12 and the default loky backend, I can't inherit from a class that uses joblib and is defined in another module.
If I move that class into the same module, or if I go back to the multiprocessing backend, it works.
This may be a duplicate of https://github.com/joblib/joblib/issues/643; sorry if that's the case.
The error is "A process in the executor was terminated abruptly while the future was running or pending.". Moreover, looking at stderr when the crash happens, I see (TestJoblibLog is my child class):
AttributeError: Can't get attribute 'TestJoblibLog' on <module 'joblib.externals.loky.backend.popen_loky_posix' from '/home/tmp_mihoo/miniconda3/lib/python3.6/site-packages/joblib/externals/loky/backend/popen_loky_posix.py'>
So it seems that loky imports the module where joblib is called, but not the calling module that uses the results.
Please see example codes below.
Thanks!
Example and code to reproduce
Execution results
./in_modules.py 2>/dev/null
Computing chunks
Running in parallel
Hello, ['foo', 'bar', 'baz']
Extraction done
Success with default backend: ['foo', 'bar', 'baz']
Mapping foo start: 1 of 3: 33.33%
Hello, foo
Mapping foo done: 1 of 3: 33.33% in 0:00:00.180327, remaining 0:00:00.360654, end at 2018-07-25 10:52:15.640262
Mapping bar start: 2 of 3: 66.67%
Hello, bar
Mapping bar done: 2 of 3: 66.67% in 0:00:00.181148, remaining 0:00:00.090574, end at 2018-07-25 10:52:15.371003
Mapping baz start: 3 of 3: 100.00%
Hello, baz
Mapping baz done: 3 of 3: 100.00% in 0:00:00.181711, remaining 0:00:00, end at 2018-07-25 10:52:15.280992
./imported.py 2>/dev/null
Computing chunks
Running in parallel
Exception with default backend: A process in the executor was terminated abruptly while the future was running or pending.
Computing chunks
Running in parallel
Using multiprocessing joblib backend instead of the default one
Mapping foo start: 1 of 3: 33.33%
Hello, foo
Mapping foo done: 1 of 3: 33.33% in 0:00:00.013259, remaining 0:00:00.026518, end at 2018-07-25 10:56:25.080652
Mapping bar start: 2 of 3: 66.67%
Hello, bar
Mapping bar done: 2 of 3: 66.67% in 0:00:00.013529, remaining 0:00:00.006764, end at 2018-07-25 10:56:25.061168
Mapping baz start: 3 of 3: 100.00%
Hello, baz
Mapping baz done: 3 of 3: 100.00% in 0:00:00.013826, remaining 0:00:00, end at 2018-07-25 10:56:25.054701
Hello, ['foo', 'bar', 'baz']
Extraction done
Success with multiprocessing backend: ['foo', 'bar', 'baz']
Files
imported.py
#!/usr/bin/env python3
from wrapper import MapReduceWrapper


class TestJoblibLog(MapReduceWrapper):
    def __init__(self, *args, **kwargs):
        MapReduceWrapper.__init__(self, *args, **kwargs)

    def computeChunks(self):
        print("Computing chunks")
        return ['foo', 'bar', 'baz']

    def map(self, chunk):
        print(f"Hello, {chunk}")
        return chunk

    def reduce(self, mapResults):
        print(f"Hello, {str(mapResults)}")
        return mapResults


if __name__ == "__main__":
    try:
        print(
            "Success with default backend:",
            TestJoblibLog(backend=None).get()
        )
    except Exception as e:
        print(f"Exception with default backend: {e}")
    try:
        print(
            "Success with multiprocessing backend:",
            TestJoblibLog(backend='multiprocessing').get()
        )
    except Exception as e:
        print(f"Exception with multiprocessing backend: {e}")
        print("No result")
wrapper.py
#!/usr/bin/env python3
import abc
import datetime

import joblib as jl


class MapReduceWrapper(metaclass=abc.ABCMeta):
    def __init__(self, nJobs=-1, verbose=2, backend='multiprocessing'):
        """Constructor
        - nJobs: if 1 or 0, don't use parallel processing; else, use joblib
                 parallel processing and forward this value to the n_jobs
                 argument of the joblib.Parallel constructor
        - verbose: 0: don't show anything, 1: show chunk name as info message,
                   2: show chunk name and progress as info message
        - backend: the joblib backend. Useful because, since joblib 0.12, the
                   loky backend is used by default and it causes some issues;
                   in this case, the multiprocessing backend is better. If
                   the provided value is None, it lets joblib choose the
                   backend.
        Please note that, because of parallel processing, the estimated
        progress and elapsed time are not accurate
        """
        self._nJobs = nJobs
        self._verbose = verbose
        self._useParallel = (nJobs not in (0, 1))
        self._mapStart = None
        self._backend = backend
        self._chunks = self.computeChunks()

    @abc.abstractmethod
    def computeChunks(self):
        """Returns the list of the chunks
        Will typically give a list based on the child class constructor's
        arguments
        To be implemented in the child classes
        """

    @abc.abstractmethod
    def map(self, chunk):
        """Gets data from the provided chunk and returns the result
        - chunk: the chunk identifier
        To be implemented in the child classes
        """

    @abc.abstractmethod
    def reduce(self, mapResults):
        """Reduces the data retrieved from every map call and returns the
        result
        - mapResults: list of map results
        To be implemented in the child classes
        """

    def logWarning(self, msg):
        """Minimal logger stand-in so the example is self-contained"""
        print("WARNING:", msg)

    def _loggedMap(self, i, n, x):
        """Wrapper for map, logging the ongoing chunk + progress estimation
        - i: id of the current chunk, from 1 to the number of chunks
        - n: number of chunks
        - x: chunk identifier
        """
        if self._verbose == 2:
            print("Mapping %s start: %d of %d: %.2f%%" % (
                x,
                i,
                n,
                100 * i / n,
            ))
        elif self._verbose == 1:
            print("Mapping %s" % x)
        ret = self.map(x)
        if self._verbose == 2:
            progress = 100 * i / n
            now = datetime.datetime.now()
            elapsed = now - self._mapStart
            remaining = datetime.timedelta(
                seconds=(
                    elapsed.total_seconds() * (100 - progress) / progress
                )
            )
            endAt = now + remaining
            print(
                (
                    "Mapping %s done: %d of %d: %.2f%% in %s, remaining %s, "
                    "end at %s"
                ) % (
                    x,
                    i,
                    n,
                    100 * i / n,
                    str(elapsed),
                    str(remaining),
                    str(endAt)
                )
            )
        return ret

    def get(self):
        """Computes and returns the selected data"""
        n = len(self._chunks)
        self._mapStart = datetime.datetime.now()
        if self._useParallel:
            print("Running in parallel")
            if self._nJobs != -1:
                self.logWarning(
                    "Running %d jobs in parallel, which might be different "
                    "than one per CPU" % self._nJobs
                )
            jlKwargs = dict(n_jobs=self._nJobs, verbose=0)
            if self._backend is not None:
                print(
                    f"Using {self._backend} joblib backend "
                    "instead of the default one"
                )
                jlKwargs['backend'] = self._backend
            xs = jl.Parallel(**jlKwargs)(
                jl.delayed(self._loggedMap)(
                    i + 1, n, chunk
                ) for i, chunk in enumerate(self._chunks)
            )
        else:
            self.logWarning(
                "Parallel processing disabled; running in sequence"
            )
            xs = [
                self._loggedMap(i + 1, n, chunk)
                for i, chunk in enumerate(self._chunks)
            ]
        self._mapStart = None
        ret = self.reduce(xs)
        try:
            ret = ret.copy()
        except AttributeError:
            self.logWarning(
                f"Copy unavailable for return type {type(ret)}, "
                "returning direct value instead."
            )
        print("Extraction done")
        return ret
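As an aside, the remaining-time estimate in _loggedMap is a plain linear extrapolation: the chunks still to run are assumed to take, on average, as long as the ones already done. The same arithmetic as a standalone sketch (values taken from the run log above):

```python
from datetime import timedelta

def remaining_time(elapsed_seconds, i, n):
    """Linear extrapolation as in _loggedMap: elapsed time scaled by
    the ratio of remaining progress to completed progress."""
    progress = 100 * i / n
    return timedelta(seconds=elapsed_seconds * (100 - progress) / progress)

# 1 of 3 chunks done in 0.180327 s: twice that much work remains,
# so about 0.36 s are left, matching "remaining 0:00:00.360654"
# in the in_modules.py run log above
print(remaining_time(0.180327, 1, 3))
```

Because chunks run concurrently under joblib.Parallel, each worker only sees its own elapsed time, which is why the docstring warns that the estimate is not accurate in parallel runs.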
in_modules.py
#!/usr/bin/env python3
import abc
import datetime

import joblib as jl


class MapReduceWrapper(metaclass=abc.ABCMeta):
    def __init__(self, nJobs=-1, verbose=2, backend='multiprocessing'):
        """Constructor
        - nJobs: if 1 or 0, don't use parallel processing; else, use joblib
                 parallel processing and forward this value to the n_jobs
                 argument of the joblib.Parallel constructor
        - verbose: 0: don't show anything, 1: show chunk name as info message,
                   2: show chunk name and progress as info message
        - backend: the joblib backend. Useful because, since joblib 0.12, the
                   loky backend is used by default and it causes some issues;
                   in this case, the multiprocessing backend is better. If
                   the provided value is None, it lets joblib choose the
                   backend.
        Please note that, because of parallel processing, the estimated
        progress and elapsed time are not accurate
        """
        self._nJobs = nJobs
        self._verbose = verbose
        self._useParallel = (nJobs not in (0, 1))
        self._mapStart = None
        self._backend = backend
        self._chunks = self.computeChunks()

    @abc.abstractmethod
    def computeChunks(self):
        """Returns the list of the chunks
        Will typically give a list based on the child class constructor's
        arguments
        To be implemented in the child classes
        """

    @abc.abstractmethod
    def map(self, chunk):
        """Gets data from the provided chunk and returns the result
        - chunk: the chunk identifier
        To be implemented in the child classes
        """

    @abc.abstractmethod
    def reduce(self, mapResults):
        """Reduces the data retrieved from every map call and returns the
        result
        - mapResults: list of map results
        To be implemented in the child classes
        """

    def logWarning(self, msg):
        """Minimal logger stand-in so the example is self-contained"""
        print("WARNING:", msg)

    def _loggedMap(self, i, n, x):
        """Wrapper for map, logging the ongoing chunk + progress estimation
        - i: id of the current chunk, from 1 to the number of chunks
        - n: number of chunks
        - x: chunk identifier
        """
        if self._verbose == 2:
            print("Mapping %s start: %d of %d: %.2f%%" % (
                x,
                i,
                n,
                100 * i / n,
            ))
        elif self._verbose == 1:
            print("Mapping %s" % x)
        ret = self.map(x)
        if self._verbose == 2:
            progress = 100 * i / n
            now = datetime.datetime.now()
            elapsed = now - self._mapStart
            remaining = datetime.timedelta(
                seconds=(
                    elapsed.total_seconds() * (100 - progress) / progress
                )
            )
            endAt = now + remaining
            print(
                (
                    "Mapping %s done: %d of %d: %.2f%% in %s, remaining %s, "
                    "end at %s"
                ) % (
                    x,
                    i,
                    n,
                    100 * i / n,
                    str(elapsed),
                    str(remaining),
                    str(endAt)
                )
            )
        return ret

    def get(self):
        """Computes and returns the selected data"""
        n = len(self._chunks)
        self._mapStart = datetime.datetime.now()
        if self._useParallel:
            print("Running in parallel")
            if self._nJobs != -1:
                self.logWarning(
                    "Running %d jobs in parallel, which might be different "
                    "than one per CPU" % self._nJobs
                )
            jlKwargs = dict(n_jobs=self._nJobs, verbose=0)
            if self._backend is not None:
                print(
                    f"Using {self._backend} joblib backend "
                    "instead of the default one"
                )
                jlKwargs['backend'] = self._backend
            xs = jl.Parallel(**jlKwargs)(
                jl.delayed(self._loggedMap)(
                    i + 1, n, chunk
                ) for i, chunk in enumerate(self._chunks)
            )
        else:
            self.logWarning(
                "Parallel processing disabled; running in sequence"
            )
            xs = [
                self._loggedMap(i + 1, n, chunk)
                for i, chunk in enumerate(self._chunks)
            ]
        self._mapStart = None
        ret = self.reduce(xs)
        try:
            ret = ret.copy()
        except AttributeError:
            self.logWarning(
                f"Copy unavailable for return type {type(ret)}, "
                "returning direct value instead."
            )
        print("Extraction done")
        return ret


class TestJoblibLog(MapReduceWrapper):
    def __init__(self, *args, **kwargs):
        MapReduceWrapper.__init__(self, *args, **kwargs)

    def computeChunks(self):
        print("Computing chunks")
        return ['foo', 'bar', 'baz']

    def map(self, chunk):
        print(f"Hello, {chunk}")
        return chunk

    def reduce(self, mapResults):
        print(f"Hello, {str(mapResults)}")
        return mapResults


if __name__ == "__main__":
    try:
        print(
            "Success with default backend:",
            TestJoblibLog(backend=None).get()
        )
    except Exception as e:
        print(f"Exception with default backend: {e}")
    try:
        print(
            "Success with multiprocessing backend:",
            TestJoblibLog(backend='multiprocessing').get()
        )
    except Exception as e:
        print(f"Exception with multiprocessing backend: {e}")
        print("No result")
Issue Analytics
- Created 5 years ago
- Comments: 17 (11 by maintainers)
Top GitHub Comments
@ogrisel
I've checked with joblib 0.12.2 in a dedicated conda env with a manual installation of joblib.
I've added the (forgotten) following to test.py:
With Python 3.6 and joblib 0.12.2: it works perfectly.
With Python 3.7 and joblib 0.12.2: not even multiprocessing works (can't pickle _abc_data objects).
I see no reason why _abc_data would not be picklable. I think this needs a fix in cloudpickle.
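For what it's worth, the _abc_data object mentioned above is the C-level registry state that CPython 3.7+ attaches to every ABC (in 3.7 the abc module was reimplemented in C). A quick standard-library check, assuming CPython 3.7+ (the Base class is purely illustrative):

```python
import abc
import pickle

class Base(metaclass=abc.ABCMeta):
    pass

# On CPython 3.7+ the abc machinery stores its registry/cache state in
# a C-level _abc_data object, attached to the class as _abc_impl
impl = Base.__dict__["_abc_impl"]
print(type(impl).__name__)  # '_abc_data' on CPython 3.7+

# Plain pickle has no reduction defined for this type; a pickler that
# serializes classes by value (like cloudpickle) must special-case it
try:
    pickle.dumps(impl)
    print("picklable after all")
except TypeError as exc:
    print(f"plain pickle refuses it: {exc}")
```

This is consistent with the Python 3.7 failure above: a by-value class pickler walks the class dict, hits _abc_impl, and fails unless it knows to skip or reconstruct it.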