KeyError and Worker already exists
I'm trying to set up Dask with TPOT.
My code looks like this:
from dask_jobqueue import LSFCluster
cluster = LSFCluster(cores=1, memory='3GB', job_extra=['-R rusage[mem=2048,scratch=8000]'],
                     local_directory='$TMPDIR',
                     walltime='12:00')
from dask.distributed import Client
client = Client(cluster)
cluster.scale(10)
from tpot import TPOTRegressor
reg = TPOTRegressor(max_time_mins=30, generations=20, population_size=96,
                    cv=5,
                    scoring='r2',
                    memory='auto', random_state=42, verbosity=10, use_dask=True)
reg.fit(X, y)
and I keep getting those annoying errors:
distributed.scheduler - ERROR - '74905774'
Traceback (most recent call last):
  File "/cluster/home/abrahalo/.local/lib64/python3.6/site-packages/distributed/scheduler.py", line 1306, in add_worker
    plugin.add_worker(scheduler=self, worker=address)
  File "/cluster/home/abrahalo/.local/lib64/python3.6/site-packages/dask_jobqueue/core.py", line 62, in add_worker
    self.running_jobs[job_id] = self.pending_jobs.pop(job_id)
KeyError: '74905774'
distributed.utils - ERROR - Worker already exists tcp://10.205.103.50:35780
Traceback (most recent call last):
  File "/cluster/home/abrahalo/.local/lib64/python3.6/site-packages/distributed/utils.py", line 648, in log_errors
    yield
  File "/cluster/home/abrahalo/.local/lib64/python3.6/site-packages/distributed/scheduler.py", line 1261, in add_worker
    raise ValueError("Worker already exists %s" % address)
ValueError: Worker already exists tcp://10.205.103.50:35780
I think there might be a problem with LSFCluster: it moves a lot of workers into cluster.finished_jobs even though the corresponding jobs are still running according to bjobs, and even according to the dask.distributed web interface.
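To see which bucket dask-jobqueue puts each job in, here is a minimal sketch for comparing the cluster's bookkeeping with what LSF reports; it assumes the dask-jobqueue version from the traceback above, where pending_jobs / running_jobs / finished_jobs are dicts keyed by the LSF job id:

import subprocess

# Compare dask-jobqueue's bookkeeping with what LSF itself reports.
# pending_jobs / running_jobs / finished_jobs are assumed to be dicts keyed by
# job id in this dask-jobqueue version (the traceback above pops from
# pending_jobs); newer releases may not expose them.
print("pending: ", sorted(cluster.pending_jobs))
print("running: ", sorted(cluster.running_jobs))
print("finished:", sorted(cluster.finished_jobs))

# Cross-check against the batch scheduler (stdout=PIPE keeps this Python 3.6 compatible).
print(subprocess.run(["bjobs"], stdout=subprocess.PIPE, universal_newlines=True).stdout)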
Top GitHub Comments
A pleasure to help!
You should try asking about this upstream in distributed; I imagine some thought has gone into this behavior.
I think that starting the dask-worker with the --memory-limit option will do the trick (see the sketch below). ulimit doesn't work at all on macOS and doesn't effectively limit memory on Linux.
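A minimal sketch of what that could look like with the cluster from this issue, assuming this dask-jobqueue version still forwards an extra list of arguments to the dask-worker command line (newer releases name it worker_extra_args):

from dask_jobqueue import LSFCluster

# Forward --memory-limit to each dask-worker so the worker process enforces the
# memory cap itself instead of relying on ulimit. The `extra` keyword is an
# assumption about this dask-jobqueue version; newer releases call it
# `worker_extra_args`.
cluster = LSFCluster(cores=1, memory='3GB',
                     job_extra=['-R rusage[mem=2048,scratch=8000]'],
                     local_directory='$TMPDIR',
                     walltime='12:00',
                     extra=['--memory-limit', '2GB'])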