KeyError: 'lengths'
I'm trying to find an older version of distributed that is not too buggy, and I'm having trouble.
rapids 0.14 pairs with dask/distributed 2.17.
But 2.17 hits https://github.com/dask/distributed/issues/3851
I tried 2.18 as suggested there, but 2.18 hits the error below even for the most basic test when I have 2 GPUs.
Any thoughts? I can’t go to 2.27 used by rapids 0.14 because that causes many rapids tests to fail.
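Since the question hinges on which dask/distributed versions actually sit next to RAPIDS in a given environment, a quick check along these lines may help when comparing setups. This is a minimal sketch and not part of the original report: the helper name `installed_versions` is hypothetical, and it relies on `importlib.metadata`, which assumes Python 3.8 or newer (the environment in this issue is Python 3.6, where the `importlib_metadata` backport would be needed instead).

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # The distribution is not installed in this environment.
            found[name] = None
    return found


if __name__ == "__main__":
    # Package names taken from the imports and traceback in this issue.
    names = ["dask", "distributed", "dask-cuda", "zict", "xgboost"]
    for name, ver in installed_versions(names).items():
        print(f"{name}: {ver or 'not installed'}")
```

Running this in both a working and a failing environment makes it easy to spot which package versions differ.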
from dask.distributed import Client
from dask_cuda import LocalCUDACluster
from dask import dataframe as dd
import xgboost as xgb


def main(client):
    dask_df = dd.read_csv('creditcard.csv')
    target = 'default payment next month'
    y = dask_df['default payment next month']
    X = dask_df[dask_df.columns.difference([target])]
    dtrain = xgb.dask.DaskDMatrix(client, X, y)

    output = xgb.dask.train(client,
                            # Use GPU training algorithm
                            {'tree_method': 'gpu_hist'},
                            dtrain,
                            num_boost_round=100,
                            evals=[(dtrain, 'train')])
    booster = output['booster']  # booster is the trained model
    history = output['history']  # A dictionary containing evaluation results
    # Save the model to file
    booster.save_model('xgboost-model')
    print('Training evaluation history:', history)


if __name__ == '__main__':
    # `LocalCUDACluster` is used for assigning GPU to XGBoost
    # processes. Here `n_workers` represents the number of GPUs
    # since we use one GPU per worker process.
    with LocalCUDACluster(n_workers=2) as cluster:
        with Client(cluster) as client:
            main(client)
(base) jon@mr-dl10:/data/jon/h2oai.fullcondatest3$ python dask_cudf_example.py
distributed.nanny - ERROR - Failed to start worker
Traceback (most recent call last):
  File "/home/jon/minicondadai/lib/python3.6/site-packages/distributed/nanny.py", line 758, in run
    await worker
  File "/home/jon/minicondadai/lib/python3.6/site-packages/distributed/core.py", line 236, in _
    await self.start()
  File "/home/jon/minicondadai/lib/python3.6/site-packages/distributed/worker.py", line 1085, in start
    await self._register_with_scheduler()
  File "/home/jon/minicondadai/lib/python3.6/site-packages/distributed/worker.py", line 811, in _register_with_scheduler
    types={k: typename(v) for k, v in self.data.items()},
  File "/home/jon/minicondadai/lib/python3.6/site-packages/distributed/worker.py", line 811, in <dictcomp>
    types={k: typename(v) for k, v in self.data.items()},
  File "/home/jon/minicondadai/lib/python3.6/_collections_abc.py", line 744, in __iter__
    yield (key, self._mapping[key])
  File "/home/jon/minicondadai/lib/python3.6/site-packages/dask_cuda/device_host_file.py", line 150, in __getitem__
    return self.host_buffer[key]
  File "/home/jon/minicondadai/lib/python3.6/site-packages/zict/buffer.py", line 78, in __getitem__
    return self.slow_to_fast(key)
  File "/home/jon/minicondadai/lib/python3.6/site-packages/zict/buffer.py", line 65, in slow_to_fast
    value = self.slow[key]
  File "/home/jon/minicondadai/lib/python3.6/site-packages/zict/func.py", line 38, in __getitem__
    return self.load(self.d[key])
  File "/home/jon/minicondadai/lib/python3.6/site-packages/distributed/protocol/serialize.py", line 505, in deserialize_bytes
    frames = merge_frames(header, frames)
  File "/home/jon/minicondadai/lib/python3.6/site-packages/distributed/protocol/utils.py", line 60, in merge_frames
    lengths = list(header["lengths"])
KeyError: 'lengths'
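The last frame of the traceback is a plain dictionary lookup, so the error itself only says that the header handed to `merge_frames` had no `lengths` entry. The toy function below mimics that single line to show the failure mode. It is a hypothetical stand-in (`merge_frames_sketch` and both sample headers are made up for illustration), not distributed's actual implementation.

```python
def merge_frames_sketch(header, frames):
    """Toy stand-in that performs the same lookup as the failing line."""
    lengths = list(header["lengths"])  # raises KeyError if the key is absent
    return lengths, frames


# Hypothetical headers: one carries the key, the other does not.
good_header = {"lengths": [3, 5]}
bad_header = {"count": 2}

print(merge_frames_sketch(good_header, [b"abc", b"hello"]))

try:
    merge_frames_sketch(bad_header, [b"abc", b"hello"])
except KeyError as exc:
    print("KeyError:", exc)
```

A header written by one version of a library and read by another could lack the key in this way, though nothing in the traceback alone confirms that this is the cause here.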
Issue Analytics
- State:
- Created 3 years ago
- Comments:40 (12 by maintainers)
Top GitHub Comments
I think it would be a large undertaking to patch/work around
To answer your question:
I would not expect the latest dask/distributed to work that far back, as a lot of changes occurred in the serialization layers between Dask and RAPIDS. The errors you posted are probably a result of those changes. @jakirkham do you have any thoughts here?