KeyError in distributed joblib

I haven’t been able to reproduce this locally yet.

This is on the master branches of distributed, dask, and dask-ml, with the scikit-learn / joblib versions we used at the sprint.

I’m doing

%%time
with joblib.parallel_backend("dask"):
    gs.fit(X, y, classes=[0, 1])

where X and y are dask arrays (so I can’t pre-scatter them).

The behavior I observe is

  1. I make the call, and tasks show up in the dashboard
  2. A short time later, the tasks go black / gray, indicating they failed
  3. The notebook hangs
  4. I press Ctrl-C to interrupt it (that is the KeyboardInterrupt in the traceback below)

tornado.application - ERROR - Exception in callback functools.partial(<function wrap.<locals>.null_wrapper at 0x7fc37ad6fbf8>, <Future finished exception=AssertionError("yield from wasn't used with future",)>)
Traceback (most recent call last):
  File "/opt/conda/lib/python3.6/site-packages/distributed/joblib.py", line 204, in maybe_to_futures
    f = call_data_futures[arg]
  File "/opt/conda/lib/python3.6/site-packages/distributed/joblib.py", line 67, in __getitem__
    ref, val = self._data[id(obj)]
KeyError: 140477560921584

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/conda/lib/python3.6/site-packages/tornado/ioloop.py", line 759, in _run_callback
    ret = callback()
  File "/opt/conda/lib/python3.6/site-packages/tornado/stack_context.py", line 276, in null_wrapper
    return fn(*args, **kwargs)
  File "/opt/conda/lib/python3.6/site-packages/tornado/ioloop.py", line 780, in _discard_future_result
    future.result()
  File "/opt/conda/lib/python3.6/site-packages/tornado/gen.py", line 1113, in run
    yielded = self.gen.send(value)
  File "/opt/conda/lib/python3.6/site-packages/distributed/joblib.py", line 244, in callback_wrapper
    callback(result)  # gets called in separate thread
  File "/home/jovyan/src/scikit-learn/sklearn/externals/joblib/parallel.py", line 326, in __call__
    self.parallel.dispatch_next()
  File "/home/jovyan/src/scikit-learn/sklearn/externals/joblib/parallel.py", line 746, in dispatch_next
    if not self.dispatch_one_batch(self._original_iterator):
  File "/home/jovyan/src/scikit-learn/sklearn/externals/joblib/parallel.py", line 774, in dispatch_one_batch
    self._dispatch(tasks)
  File "/home/jovyan/src/scikit-learn/sklearn/externals/joblib/parallel.py", line 731, in _dispatch
    job = self._backend.apply_async(batch, callback=cb)
  File "/opt/conda/lib/python3.6/site-packages/distributed/joblib.py", line 234, in apply_async
    func, args = self._to_func_args(func)
  File "/opt/conda/lib/python3.6/site-packages/distributed/joblib.py", line 224, in _to_func_args
    args = list(maybe_to_futures(args))
  File "/opt/conda/lib/python3.6/site-packages/distributed/joblib.py", line 212, in maybe_to_futures
    [f] = self.client.scatter([arg], broadcast=3)
AssertionError: yield from wasn't used with future

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
/opt/conda/lib/python3.6/site-packages/distributed/joblib.py in maybe_to_futures(args)
    203                     try:
--> 204                         f = call_data_futures[arg]
    205                     except KeyError:

/opt/conda/lib/python3.6/site-packages/distributed/joblib.py in __getitem__(self, obj)
     66     def __getitem__(self, obj):
---> 67         ref, val = self._data[id(obj)]
     68         if ref() is not obj:

KeyError: 140477560921344

During handling of the above exception, another exception occurred:

KeyboardInterrupt                         Traceback (most recent call last)
<timed exec> in <module>()

~/src/scikit-learn/sklearn/model_selection/_search.py in fit(self, X, y, groups, **fit_params)
    658                                   error_score=self.error_score)
    659           for parameters, (train, test) in product(candidate_params,
--> 660                                                    cv.split(X, y, groups)))
    661 
    662         # if one choose to see train score, "out" will contain train score info

~/src/scikit-learn/sklearn/externals/joblib/parallel.py in __call__(self, iterable)
    943                 self._iterating = self._original_iterator is not None
    944 
--> 945             while self.dispatch_one_batch(iterator):
    946                 pass
    947 

~/src/scikit-learn/sklearn/externals/joblib/parallel.py in dispatch_one_batch(self, iterator)
    772                 return False
    773             else:
--> 774                 self._dispatch(tasks)
    775                 return True
    776 

~/src/scikit-learn/sklearn/externals/joblib/parallel.py in _dispatch(self, batch)
    729         with self._lock:
    730             job_idx = len(self._jobs)
--> 731             job = self._backend.apply_async(batch, callback=cb)
    732             # A job can complete so quickly than its callback is
    733             # called before we get here, causing self._jobs to

/opt/conda/lib/python3.6/site-packages/distributed/joblib.py in apply_async(self, func, callback)
    232     def apply_async(self, func, callback=None):
    233         key = '%s-batch-%s' % (joblib_funcname(func), uuid4().hex)
--> 234         func, args = self._to_func_args(func)
    235 
    236         future = self.client.submit(func, *args, key=key, **self.submit_kwargs)

/opt/conda/lib/python3.6/site-packages/distributed/joblib.py in _to_func_args(self, func)
    222         tasks = []
    223         for f, args, kwargs in func.items:
--> 224             args = list(maybe_to_futures(args))
    225             kwargs = dict(zip(kwargs.keys(), maybe_to_futures(kwargs.values())))
    226             tasks.append((f, args, kwargs))

/opt/conda/lib/python3.6/site-packages/distributed/joblib.py in maybe_to_futures(args)
    210                             # more workers need to reuse this data concurrently
    211                             # beyond the initial broadcast arity.
--> 212                             [f] = self.client.scatter([arg], broadcast=3)
    213                             call_data_futures[arg] = f
    214 

/opt/conda/lib/python3.6/site-packages/distributed/client.py in scatter(self, data, workers, broadcast, direct, hash, maxsize, timeout, asynchronous)
   1771                              broadcast=broadcast, direct=direct,
   1772                              local_worker=local_worker, timeout=timeout,
-> 1773                              asynchronous=asynchronous, hash=hash)
   1774 
   1775     @gen.coroutine

/opt/conda/lib/python3.6/site-packages/distributed/client.py in sync(self, func, *args, **kwargs)
    650             return future
    651         else:
--> 652             return sync(self.loop, func, *args, **kwargs)
    653 
    654     def __repr__(self):

/opt/conda/lib/python3.6/site-packages/distributed/utils.py in sync(loop, func, *args, **kwargs)
    271     else:
    272         while not e.is_set():
--> 273             e.wait(10)
    274     if error[0]:
    275         six.reraise(*error[0])

/opt/conda/lib/python3.6/threading.py in wait(self, timeout)
    549             signaled = self._flag
    550             if not signaled:
--> 551                 signaled = self._cond.wait(timeout)
    552             return signaled
    553 

/opt/conda/lib/python3.6/threading.py in wait(self, timeout)
    297             else:
    298                 if timeout > 0:
--> 299                     gotit = waiter.acquire(True, timeout)
    300                 else:
    301                     gotit = waiter.acquire(False)

KeyboardInterrupt: 
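
The bottom of the notebook traceback shows where the hang sits: `client.scatter` goes through `sync(self.loop, func, ...)`, and the calling thread loops on `e.wait(10)` until the event loop signals completion, which is why Ctrl-C surfaced inside `threading.py`. The first log, meanwhile, shows `client.scatter` also being reached from a callback running under `tornado/ioloop.py`. The sketch below reproduces the blocking pattern with the standard library only, assuming an asyncio loop in a background thread in place of Tornado; `run_on_loop`, `add`, and `calls_back_into_own_loop` are hypothetical names, not distributed's API. The guard at the top is not taken from the traceback: it illustrates that a thread which blocks waiting on its own event loop can never be woken, so the sketch raises instead.

```python
import asyncio
import threading


def run_on_loop(loop, func, *args, **kwargs):
    """Hypothetical sketch: run coroutine function `func` on `loop`, which is
    running in another thread, and block the calling thread until it is done."""
    try:
        running = asyncio.get_running_loop()
    except RuntimeError:
        running = None  # no event loop is running in this thread
    if running is loop:
        # Blocking here would stop `loop` from ever running `func`.
        raise RuntimeError("run_on_loop() called from the event loop's own thread")

    done = threading.Event()
    result = [None]
    error = [None]

    async def runner():
        try:
            result[0] = await func(*args, **kwargs)
        except Exception as exc:  # hand the error back to the waiting thread
            error[0] = exc
        finally:
            done.set()

    asyncio.run_coroutine_threadsafe(runner(), loop)
    # Wait in bounded slices, mirroring `e.wait(10)` in the traceback; a
    # Ctrl-C in the waiting (main) thread is raised from inside this wait.
    while not done.is_set():
        done.wait(10)
    if error[0] is not None:
        raise error[0]
    return result[0]


async def add(a, b):
    await asyncio.sleep(0)
    return a + b


async def calls_back_into_own_loop(loop):
    # Runs on the loop thread, so the guard refuses to block.
    return run_on_loop(loop, add, 1, 1)


loop = asyncio.new_event_loop()
worker = threading.Thread(target=loop.run_forever, daemon=True)
worker.start()

total = run_on_loop(loop, add, 2, 3)
try:
    run_on_loop(loop, calls_back_into_own_loop, loop)
    guard_message = None
except RuntimeError as exc:
    guard_message = str(exc)

loop.call_soon_threadsafe(loop.stop)
worker.join()
loop.close()
```

With the guard removed, the nested call would sit in `done.wait(10)` forever on the loop thread, because the loop can only run the scheduled `runner()` after that thread returns control to it.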

Issue Analytics

  • State: closed
  • Created 5 years ago
  • Comments: 26 (22 by maintainers)

Top GitHub Comments

1 reaction
TomAugspurger commented, Sep 19, 2018

@asifali22 what versions of distributed and joblib?

Regarding the question “Does it make any difference on performance or any other metrics?”: no, those two are aliases for the same backend.

0 reactions
ebo commented, Jan 8, 2019

OK, I might have found it. I reinstalled dask-ml with pip and it works again. My guess is that installing bokeh overwrote something dask-ml was expecting with an incompatible version. I’m not sure what that was, except that reinstalling dask-ml also uninstalled and reinstalled sklearn.
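
Tom’s question about which versions of distributed and joblib are in use, and the bokeh observation above, both come down to which package versions are installed. A minimal sketch for recording them before and after an install, assuming Python 3.8+ for `importlib.metadata`; the package list is simply the libraries mentioned in this thread, and `installed_versions` is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version

# Distributions mentioned in this thread; adjust as needed.
PACKAGES = ("dask", "distributed", "dask-ml", "joblib", "scikit-learn", "bokeh")


def installed_versions(packages=PACKAGES):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


for name, ver in installed_versions().items():
    print(f"{name:>14}: {ver or 'not installed'}")
```

Note that the traceback runs through `sklearn/externals/joblib`, scikit-learn’s bundled copy of joblib, so the standalone `joblib` version reported here may not be the one actually in use.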
