KeyError: 'lengths'

I'm trying to find an older version of distributed that isn't too buggy, and I'm having trouble.

RAPIDS 0.14 pairs with dask/distributed 2.17.

But 2.17 hits https://github.com/dask/distributed/issues/3851

I tried 2.18 as suggested there, but with 2 GPUs 2.18 hits the error below even on the most basic test.

Any thoughts? I can't move to 2.27 with RAPIDS 0.14 because that causes many RAPIDS tests to fail.
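Before running the reproducer, it can help to confirm which versions the environment actually has. The sketch below compares the installed versions against the intended pins. It assumes the pairing stated above, with dask and distributed from the 2.17 series. The `PINS` table and the helper names are hypothetical. The sketch reads each module's `__version__` instead of package metadata, so it should also run on the Python 3.6 environment seen in the traceback.

```python
import importlib
from typing import Dict, Optional

# Hypothetical pin table, based on the pairing stated in the post:
# RAPIDS 0.14 pairs with dask/distributed 2.17.
PINS = {"dask": "2.17", "distributed": "2.17"}


def version_matches(installed: str, pinned: str) -> bool:
    """True if `installed` belongs to the pinned release series (e.g. '2.17.2' vs '2.17')."""
    pinned_parts = pinned.split(".")
    return installed.split(".")[: len(pinned_parts)] == pinned_parts


def find_mismatches(pins: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Map each package that is missing or off its pin to its installed version (None if absent)."""
    mismatches = {}
    for package, pinned in pins.items():
        try:
            module = importlib.import_module(package)
            installed = getattr(module, "__version__", None)
        except ImportError:
            installed = None
        if installed is None or not version_matches(installed, pinned):
            mismatches[package] = installed
    return mismatches


if __name__ == "__main__":
    problems = find_mismatches(PINS)
    for package, installed in problems.items():
        print(f"{package}: expected {PINS[package]}.x, found {installed}")
    if not problems:
        print("dask/distributed match the intended pins")
```

Matching on the release-series prefix means any patch release (2.17.0, 2.17.2, ...) counts as a match. A stricter check would compare the full version string.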

from dask.distributed import Client
from dask_cuda import LocalCUDACluster
from dask import dataframe as dd
import xgboost as xgb


def main(client):
    dask_df = dd.read_csv('creditcard.csv')
    target = 'default payment next month'
    y = dask_df[target]
    X = dask_df[dask_df.columns.difference([target])]
    dtrain = xgb.dask.DaskDMatrix(client, X, y)
    output = xgb.dask.train(client,
                            # Use GPU training algorithm
                            {'tree_method': 'gpu_hist'},
                            dtrain,
                            num_boost_round=100,
                            evals=[(dtrain, 'train')])
    booster = output['booster']  # booster is the trained model
    history = output['history']  # A dictionary containing evaluation results
    # Save the model to file
    booster.save_model('xgboost-model')
    print('Training evaluation history:', history)

    
if __name__ == '__main__':
    # `LocalCUDACluster` is used for assigning GPU to XGBoost 
    # processes. Here `n_workers` represents the number of GPUs 
    # since we use one GPU per worker process.
    with LocalCUDACluster(n_workers=2) as cluster:
        with Client(cluster) as client:
            main(client)

(base) jon@mr-dl10:/data/jon/h2oai.fullcondatest3$ python dask_cudf_example.py 
distributed.nanny - ERROR - Failed to start worker
Traceback (most recent call last):
  File "/home/jon/minicondadai/lib/python3.6/site-packages/distributed/nanny.py", line 758, in run
    await worker
  File "/home/jon/minicondadai/lib/python3.6/site-packages/distributed/core.py", line 236, in _
    await self.start()
  File "/home/jon/minicondadai/lib/python3.6/site-packages/distributed/worker.py", line 1085, in start
    await self._register_with_scheduler()
  File "/home/jon/minicondadai/lib/python3.6/site-packages/distributed/worker.py", line 811, in _register_with_scheduler
    types={k: typename(v) for k, v in self.data.items()},
  File "/home/jon/minicondadai/lib/python3.6/site-packages/distributed/worker.py", line 811, in <dictcomp>
    types={k: typename(v) for k, v in self.data.items()},
  File "/home/jon/minicondadai/lib/python3.6/_collections_abc.py", line 744, in __iter__
    yield (key, self._mapping[key])
  File "/home/jon/minicondadai/lib/python3.6/site-packages/dask_cuda/device_host_file.py", line 150, in __getitem__
    return self.host_buffer[key]
  File "/home/jon/minicondadai/lib/python3.6/site-packages/zict/buffer.py", line 78, in __getitem__
    return self.slow_to_fast(key)
  File "/home/jon/minicondadai/lib/python3.6/site-packages/zict/buffer.py", line 65, in slow_to_fast
    value = self.slow[key]
  File "/home/jon/minicondadai/lib/python3.6/site-packages/zict/func.py", line 38, in __getitem__
    return self.load(self.d[key])
  File "/home/jon/minicondadai/lib/python3.6/site-packages/distributed/protocol/serialize.py", line 505, in deserialize_bytes
    frames = merge_frames(header, frames)
  File "/home/jon/minicondadai/lib/python3.6/site-packages/distributed/protocol/utils.py", line 60, in merge_frames
    lengths = list(header["lengths"])
KeyError: 'lengths'

Issue Analytics

Created 3 years ago
Comments: 40 (12 by maintainers)

Top GitHub Comments

quasiben commented, Dec 4, 2020

I think it would be a large undertaking to patch or work around.

quasiben commented, Dec 4, 2020

To answer your question:

should I be able to use latest dask/distributed with old rapids 0.14?

I would not expect the latest dask/distributed to work that far back, because a lot of changes occurred in the serialization layers between Dask and RAPIDS. The errors you posted are probably a result of those changes. @jakirkham, do you have any thoughts here?
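To make the failure mode in the comment above concrete, consider code that writes a serialized header and code that reads it coming from different releases. In that case the reader can look up a key the writer never emitted. The toy sketch below mimics the failing line from the traceback (`lengths = list(header["lengths"])`). The names `toy_serialize_old`, `toy_serialize_new` and `toy_merge_frames` are hypothetical stand-ins. They are not the actual serialization code or header format of distributed or dask_cuda, and they only illustrate one plausible way a schema change surfaces as `KeyError: 'lengths'`.

```python
from typing import Dict, List, Tuple

# Toy illustration only: these are NOT distributed's real functions or header format.


def _split(payload: bytes, chunk: int) -> List[bytes]:
    """Cut a payload into fixed-size frames (the last one may be shorter)."""
    return [payload[i:i + chunk] for i in range(0, len(payload), chunk)]


def toy_serialize_old(payload: bytes, chunk: int = 4) -> Tuple[Dict, List[bytes]]:
    """Hypothetical older writer: records only how many frames there are."""
    frames = _split(payload, chunk)
    return {"count": len(frames)}, frames  # no "lengths" key


def toy_serialize_new(payload: bytes, chunk: int = 4) -> Tuple[Dict, List[bytes]]:
    """Hypothetical newer writer: records the byte length of every frame."""
    frames = _split(payload, chunk)
    return {"lengths": [len(f) for f in frames]}, frames


def toy_merge_frames(header: Dict, frames: List[bytes]) -> List[bytes]:
    """Hypothetical newer reader that re-slices frames using header["lengths"]."""
    lengths = list(header["lengths"])  # raises KeyError if the writer omitted it
    joined = b"".join(frames)
    out, start = [], 0
    for n in lengths:
        out.append(joined[start:start + n])
        start += n
    return out


if __name__ == "__main__":
    print(toy_merge_frames(*toy_serialize_new(b"abcdefghij")))
    try:
        toy_merge_frames(*toy_serialize_old(b"abcdefghij"))
    except KeyError as err:
        print(f"KeyError: {err.args[0]!r}")  # prints: KeyError: 'lengths'
```

A reader can only tolerate this kind of mismatch if it explicitly handles the older header layout. That is one reason the reply above suggests that patching or working around it would be a large undertaking compared with keeping the Dask and RAPIDS versions aligned.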