
KeyError and Worker already exists


I’m trying to set up Dask with TPOT.

My code looks like this:

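from dask.distributed import Client
from dask_jobqueue import LSFCluster
from tpot import TPOTRegressor

# One core and 3 GB per LSF job; job_extra passes extra LSF options
# (newer dask-jobqueue releases call this parameter job_extra_directives).
cluster = LSFCluster(cores=1, memory='3GB',
                     job_extra=['-R rusage[mem=2048,scratch=8000]'],
                     local_directory='$TMPDIR',
                     walltime='12:00')

client = Client(cluster)
cluster.scale(10)  # request 10 worker jobs from LSF

# X, y: training features and target (defined elsewhere, not shown)
reg = TPOTRegressor(max_time_mins=30, generations=20, population_size=96,
                    cv=5,
                    scoring='r2',
                    memory='auto', random_state=42, verbosity=10, use_dask=True)
reg.fit(X, y)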

and I keep getting those annoying errors:

distributed.scheduler - ERROR - '74905774'
Traceback (most recent call last):
  File "/cluster/home/abrahalo/.local/lib64/python3.6/site-packages/distributed/scheduler.py", line 1306, in add_worker
    plugin.add_worker(scheduler=self, worker=address)
  File "/cluster/home/abrahalo/.local/lib64/python3.6/site-packages/dask_jobqueue/core.py", line 62, in add_worker
    self.running_jobs[job_id] = self.pending_jobs.pop(job_id)
KeyError: '74905774'

distributed.utils - ERROR - Worker already exists tcp://10.205.103.50:35780
Traceback (most recent call last):
  File "/cluster/home/abrahalo/.local/lib64/python3.6/site-packages/distributed/utils.py", line 648, in log_errors
    yield
  File "/cluster/home/abrahalo/.local/lib64/python3.6/site-packages/distributed/scheduler.py", line 1261, in add_worker
    raise ValueError("Worker already exists %s" % address)
ValueError: Worker already exists tcp://10.205.103.50:35780
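Both tracebacks come from bookkeeping that is easy to reproduce outside a cluster. The toy model here is a hypothetical sketch, not the real `distributed` Scheduler or the `dask_jobqueue` plugin: `JobTrackingModel`, `outcome` and the second port are illustrative inventions, and the model only mirrors the two failing lines in the tracebacks (the duplicate-address check in `scheduler.add_worker` and `self.pending_jobs.pop(job_id)` in the plugin). It shows that `pop` without a default raises `KeyError` as soon as a job id has left `pending_jobs` for any reason (for example a second worker process from the same job, or, plausibly, a worker reconnecting after its job was already moved elsewhere), and that re-registering a known address raises `ValueError`.

```python
class JobTrackingModel:
    """Hypothetical toy model of the bookkeeping behind both tracebacks.

    Not the real distributed Scheduler or dask_jobqueue plugin: it only mirrors
    the duplicate-address check in scheduler.add_worker and the
    self.pending_jobs.pop(job_id) call in the plugin's add_worker.
    """

    def __init__(self, pending_job_ids, tolerant=False):
        self.tolerant = tolerant
        self.workers = {}  # worker address -> LSF job id
        self.pending_jobs = {job_id: {} for job_id in pending_job_ids}
        self.running_jobs = {}

    def add_worker(self, address, job_id):
        # Scheduler side: the same address may not be registered twice.
        if address in self.workers:
            if self.tolerant:
                return  # ignore the repeated registration instead of failing
            raise ValueError("Worker already exists %s" % address)
        self.workers[address] = job_id

        # Plugin side: move the job from pending to running.
        if self.tolerant:
            # pop() with a default never raises; keep an existing running entry.
            job = self.pending_jobs.pop(job_id, None)
            self.running_jobs.setdefault(job_id, job if job is not None else {})
        else:
            # pop() without a default raises KeyError once job_id has left
            # pending_jobs, e.g. when a second worker from the same job connects.
            self.running_jobs[job_id] = self.pending_jobs.pop(job_id)


def outcome(model, address, job_id):
    """Return 'ok', or the name of the exception add_worker raised."""
    try:
        model.add_worker(address, job_id)
        return "ok"
    except (KeyError, ValueError) as exc:
        return type(exc).__name__


FIRST = "tcp://10.205.103.50:35780"   # address from the traceback
SECOND = "tcp://10.205.103.50:35781"  # hypothetical second worker of the same job

strict = JobTrackingModel(["74905774"])
print(outcome(strict, FIRST, "74905774"))   # ok
print(outcome(strict, FIRST, "74905774"))   # ValueError
print(outcome(strict, SECOND, "74905774"))  # KeyError

tolerant = JobTrackingModel(["74905774"], tolerant=True)
print([outcome(tolerant, a, "74905774") for a in (FIRST, FIRST, SECOND)])  # ['ok', 'ok', 'ok']
```

The `tolerant=True` branch shows the usual defensive pattern (`dict.pop(key, None)` and ignoring repeated registrations). Whether that is the right fix for dask-jobqueue itself is an assumption: silencing the errors could also hide a genuine accounting bug.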

I suspect a problem in LSFCluster: it puts many workers into cluster.finished_jobs that are still running according to bjobs, and even according to the dask.distributed web interface.
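One way to check this observation is to compare the job ids in `cluster.finished_jobs` with what LSF reports. This is a sketch under assumptions: it parses the default `bjobs` table (JOBID, USER, STAT, ... columns), treats only `STAT == "RUN"` as running, and assumes `cluster.finished_jobs` is keyed by LSF job id, as the `KeyError: '74905774'` suggests. `running_lsf_jobs`, `wrongly_finished` and the sample text are hypothetical; the sample is made up, not real output from the cluster in this issue.

```python
def running_lsf_jobs(bjobs_output):
    """Return the ids of jobs whose STAT column is RUN in default `bjobs` output.

    Assumes the default layout (JOBID USER STAT QUEUE ...). The header and
    continuation lines that only list extra execution hosts do not start with a
    numeric job id, so they are skipped.
    """
    running = set()
    for line in bjobs_output.splitlines():
        parts = line.split()
        if len(parts) >= 3 and parts[0].isdigit() and parts[2] == "RUN":
            running.add(parts[0])
    return running


def wrongly_finished(finished_job_ids, bjobs_output):
    """Job ids the cluster object lists as finished but LSF still reports as RUN."""
    return sorted(set(finished_job_ids) & running_lsf_jobs(bjobs_output))


# Made-up sample for illustration only; not output from the cluster in this issue.
SAMPLE_BJOBS = """\
JOBID     USER      STAT  QUEUE   FROM_HOST  EXEC_HOST  JOB_NAME     SUBMIT_TIME
74905774  abrahalo  RUN   normal  login1     node17     dask-worker  Oct  6 10:00
                                             node18
74905775  abrahalo  PEND  normal  login1                dask-worker  Oct  6 10:01
"""

print(wrongly_finished(["74905774", "74905775"], SAMPLE_BJOBS))  # ['74905774']
```

On the cluster itself, the inputs would come from something like `subprocess.run(['bjobs'], stdout=subprocess.PIPE, universal_newlines=True).stdout` (a form that also works on the Python 3.6 used here) and `list(cluster.finished_jobs)`.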

Issue Analytics

  • State: closed
  • Created 5 years ago
  • Comments: 11 (11 by maintainers)

Top GitHub Comments

1 reaction
guillaumeeb commented, Oct 6, 2018

A pleasure to help!

> Is there a way to signal that there is a memory error? Not a message, but an exception or a special return type.

You should try asking this upstream in distributed; I imagine some thought has already gone into this behavior.

0 reactions
louisabraham commented, Oct 7, 2018

I think that starting dask-worker with the --memory-limit option will do the trick.

ulimit doesn’t work at all on macOS, and on Linux it doesn’t effectively limit memory.
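For reference, a worker's `--memory-limit` is the base for the fractions in distributed's `distributed.worker.memory.*` settings (`target`, `spill`, `pause`, `terminate`), whose defaults are 0.60, 0.70, 0.80 and 0.95. This sketch only does the arithmetic, in integer percentages so it is exact. `memory_thresholds` is a hypothetical helper, the defaults may be overridden in your Dask config, and the 3 GB figure assumes that the `memory='3GB'` passed to `LSFCluster` in the question becomes each worker's memory limit (decimal units, so 3 * 10**9 bytes).

```python
# Default fractions of the worker memory limit in distributed's configuration
# (distributed.worker.memory.*), written as integer percentages so the
# arithmetic below is exact. Check the config of your installed version.
DEFAULT_THRESHOLDS_PCT = {
    "target": 60,     # managed memory above this is spilled to disk
    "spill": 70,      # process memory above this triggers spilling
    "pause": 80,      # the worker stops starting new tasks
    "terminate": 95,  # the nanny terminates (and restarts) the worker
}


def memory_thresholds(memory_limit_bytes, thresholds_pct=DEFAULT_THRESHOLDS_PCT):
    """Hypothetical helper: byte thresholds implied by a worker memory limit."""
    return {name: memory_limit_bytes * pct // 100
            for name, pct in thresholds_pct.items()}


# Assumes memory='3GB' becomes a 3 * 10**9 byte limit per worker.
for name, limit in memory_thresholds(3 * 10**9).items():
    print(name, limit)
```

Because the nanny acts at the `terminate` threshold, an out-of-memory condition under a memory limit tends to show up as a lost or restarted worker rather than as a Python `MemoryError` inside the task.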


