
Incompatibility with joblib

See original GitHub issue

joblib with its loky backend is widely used for distributed computing. Without going into the details, the loky backend has a lot of advantages over the multiprocessing backend.

Loguru does not seem to work with joblib/loky because of a pickling issue. I tried to apply the hints from the docs without success.

I was wondering whether it’s possible to make loguru compatible with joblib:

import sys
from joblib import Parallel, delayed
from loguru import logger


def func_async():
    logger.info("Hello")

# Hints from the Loguru docs, tried without success:
# logger.remove()
# logger.add(sys.stderr, enqueue=True)

args = [delayed(func_async)() for _ in range(100)]

p = Parallel(n_jobs=16, backend="loky")
results = p(args)

The error:

---------------------------------------------------------------------------
_RemoteTraceback                          Traceback (most recent call last)
_RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/home/hadim/local/conda/envs/circus/lib/python3.8/site-packages/joblib/externals/loky/backend/queues.py", line 153, in _feed
    obj_ = dumps(obj, reducers=reducers)
  File "/home/hadim/local/conda/envs/circus/lib/python3.8/site-packages/joblib/externals/loky/backend/reduction.py", line 271, in dumps
    dump(obj, buf, reducers=reducers, protocol=protocol)
  File "/home/hadim/local/conda/envs/circus/lib/python3.8/site-packages/joblib/externals/loky/backend/reduction.py", line 264, in dump
    _LokyPickler(file, reducers=reducers, protocol=protocol).dump(obj)
  File "/home/hadim/local/conda/envs/circus/lib/python3.8/site-packages/joblib/externals/cloudpickle/cloudpickle_fast.py", line 563, in dump
    return Pickler.dump(self, obj)
TypeError: cannot pickle '_thread.lock' object
"""

The above exception was the direct cause of the following exception:

PicklingError                             Traceback (most recent call last)
/tmp/ipykernel_2094805/2576140850.py in <module>
     13 
     14 p = Parallel(n_jobs=16, backend="loky")
---> 15 results = p(args)

~/local/conda/envs/circus/lib/python3.8/site-packages/joblib/parallel.py in __call__(self, iterable)
   1052 
   1053             with self._backend.retrieval_context():
-> 1054                 self.retrieve()
   1055             # Make sure that we get a last message telling us we are done
   1056             elapsed_time = time.time() - self._start_time

~/local/conda/envs/circus/lib/python3.8/site-packages/joblib/parallel.py in retrieve(self)
    931             try:
    932                 if getattr(self._backend, 'supports_timeout', False):
--> 933                     self._output.extend(job.get(timeout=self.timeout))
    934                 else:
    935                     self._output.extend(job.get())

~/local/conda/envs/circus/lib/python3.8/site-packages/joblib/_parallel_backends.py in wrap_future_result(future, timeout)
    540         AsyncResults.get from multiprocessing."""
    541         try:
--> 542             return future.result(timeout=timeout)
    543         except CfTimeoutError as e:
    544             raise TimeoutError from e

~/local/conda/envs/circus/lib/python3.8/concurrent/futures/_base.py in result(self, timeout)
    442                     raise CancelledError()
    443                 elif self._state == FINISHED:
--> 444                     return self.__get_result()
    445                 else:
    446                     raise TimeoutError()

~/local/conda/envs/circus/lib/python3.8/concurrent/futures/_base.py in __get_result(self)
    387         if self._exception:
    388             try:
--> 389                 raise self._exception
    390             finally:
    391                 # Break a reference cycle with the exception in self._exception

PicklingError: Could not pickle the task to send it to the workers.

Issue Analytics

  • State: open
  • Created 2 years ago
  • Comments: 9 (3 by maintainers)

Top GitHub Comments

2 reactions
Delgan commented, Sep 5, 2021

So, I was able to reproduce the issue using a Jupyter notebook.

I can’t tell if this is related to the problem you’re facing in production.

I guess this happens because of an internal lock used by sys.stderr. Here is a reproducible example that doesn’t involve Loguru:

import sys
from joblib import Parallel, delayed

output = sys.stderr

def func_async():
    # Referencing the raw stream from the task is enough to trigger the error.
    output.write("Test")

args = [delayed(func_async)() for _ in range(10)]

p = Parallel(n_jobs=16, backend="loky")
results = p(args)

The thing is that the logger is configured with the sys.stderr handler by default, and that handler can’t be pickled.
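
As a quick illustration (an assumption drawn from the traceback above rather than from Loguru internals): the "cannot pickle '_thread.lock' object" error is exactly what the standard pickle machinery raises for a threading.Lock, such as the one guarding writes to the stream.

import pickle
import threading

# A bare threading.Lock cannot be serialized, which matches the
# "cannot pickle '_thread.lock' object" error reported by loky above.
try:
    pickle.dumps(threading.Lock())
except TypeError as exc:
    print(exc)  # cannot pickle '_thread.lock' object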

I see three possible workarounds:

  • Call a setup function once each worker has started, so that you add the sys.stderr handler before running the job and thus avoid the need to pickle it. In this case each worker will have its own handler, but shared resources won’t be protected, which is not ideal.
  • Replace the default handler with a picklable one. Wrapping the output with logger.add(lambda m: sys.stderr.write(m)) after calling logger.remove() seems to do the trick (see the sketch after this list). Again, at worker initialization the handler will be deep-copied during pickling, so sys.stderr won’t be protected from parallel access.
  • Find a way to pass the logger and its handlers by making the workers inherit it instead of pickling it (just as is done with the args of Process).

I don’t know the joblib API; maybe you can find a clean way to pass the handler by inheritance without pickling?
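
For reference, here is roughly what the second workaround looks like applied to the original snippet. This is a minimal sketch of the suggestion above, not an officially supported pattern: the lambda sink survives serialization because loky ships tasks through cloudpickle, but, as noted, concurrent writes to sys.stderr are not synchronized across workers.

import sys
from joblib import Parallel, delayed
from loguru import logger

# Drop the default handler (which wraps sys.stderr directly) and add a
# picklable sink instead; cloudpickle, used by the loky backend, can
# serialize the lambda together with the task.
logger.remove()
logger.add(lambda m: sys.stderr.write(m))

def func_async():
    logger.info("Hello")

results = Parallel(n_jobs=16, backend="loky")(
    delayed(func_async)() for _ in range(100)
)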

0 reactions
jmrichardson commented, Apr 8, 2022

I’m having the same issue in scripts (not Jupyter).

Read more comments on GitHub >

Top Results From Across the Web

Persistence — joblib 1.3.0.dev0 documentation - Read the Docs
Compatibility of joblib pickles across python versions is not fully supported. Note that, for a very restricted set of objects, this may appear...
Read more >
Python 2 / 3 incompatibility when fetching joblib compressed ...
For instance when running a Python 2 script that loads the olivetti dataset when it has already been loaded with Python 3 in...
Read more >
Update scikit model so it is compatible with newest version
I have a model (saved using joblib) created in Python 3.5 from scikit-learn 0.21.2, which I then analyze with the package shap version...
Read more >
Azure-core or AzureML version packages incompatibility
import os; import numpy as np; import pandas as pd; import pickle; import sklearn; import joblib; import math; from sklearn.model_selection ...
Read more >
Release History — scikit-learn 0.21.3 documentation
The v0.20.0 release notes failed to mention a backwards incompatibility in ... Enhancement Joblib is no longer vendored in scikit-learn, and becomes a ......
Read more >
