Could not deserialize task when using `npartitions="auto"` in `DataFrame.set_index()`

What happened:

When using npartitions="auto" in DataFrame.set_index() on a local distributed cluster, a “Could not deserialize task” error occurs (see code and output below).

This happens only when:

  1. Instantiating a cluster using dask.distributed.Client() (only tested on local, but the intended use case is Dask on Kubernetes).
  2. Repartitioning using “auto” number of partitions.

What you expected to happen:

No error when using npartitions="auto". The following scenarios work, and all of them produce output like this:

Dask DataFrame Structure:
                data
npartitions=1       
               int64
                 ...
Dask Name: sort_index, 10 tasks
  1. Not using the distributed scheduler:
import dask.dataframe as dd
import pandas as pd

# No call to dask.distributed.Client()

dd.from_pandas(
    pd.DataFrame({
        "id": pd.Series(dtype="str"),
        "data": pd.Series(dtype="int")
    }),
    npartitions=1
).set_index("id", npartitions="auto")
  2. Using a fixed number of partitions:
from dask.distributed import Client
import dask.dataframe as dd
import pandas as pd

client = Client()

dd.from_pandas(
    pd.DataFrame({
        "id": pd.Series(dtype="str"),
        "data": pd.Series(dtype="int")
    }),
    npartitions=1
).set_index("id", npartitions=1)
  3. Not specifying the npartitions argument in set_index():
from dask.distributed import Client
import dask.dataframe as dd
import pandas as pd

client = Client()

dd.from_pandas(
    pd.DataFrame({
        "id": pd.Series(dtype="str"),
        "data": pd.Series(dtype="int")
    }),
    npartitions=1
).set_index("id")

Minimal Complete Verifiable Example:

from dask.distributed import Client
import dask.dataframe as dd
import pandas as pd

client = Client()

dd.from_pandas(
    pd.DataFrame({
        "id": pd.Series(dtype="str"),
        "data": pd.Series(dtype="int")
    }),
    npartitions=1
).set_index("id", npartitions="auto")

Output:

distributed.worker - ERROR - Could not deserialize task
Traceback (most recent call last):
  File "/Users/aloysius/mambaforge/envs/swarm/lib/python3.8/site-packages/distributed/worker.py", line 4044, in loads_function
    result = cache_loads[bytes_object]
  File "/Users/aloysius/mambaforge/envs/swarm/lib/python3.8/site-packages/distributed/utils.py", line 1354, in __getitem__
    value = super().__getitem__(key)
  File "/Users/aloysius/mambaforge/envs/swarm/lib/python3.8/collections/__init__.py", line 1010, in __getitem__
    raise KeyError(key)
KeyError: b'\x80\x04\x95|\x13\x00\x00\x00\x00\x00\x00\x8c\x11dask.optimization\x94\x8c\x10SubgraphCallable\x94\x93\x94(}\x94(\x8c\'sizeof-a3c10fd37d65e26bfa433c2cdb2dd83c\x94(\x8c\ndask.utils\x94\x8c\x05apply\x94\x93\x94\x8c\x13dask.dataframe.core\x94\x8c\x11apply_and_enforce\x94\x93\x94]\x94\x8c\x13__dask_blockwise__0\x94a\x8c\x08builtins\x94\x8c\x04dict\x94\x93\x94]\x94(]\x94(\x8c\x05_func\x94h\x05\x8c\x08Dispatch\x94\x93\x94)\x81\x94}\x94(\x8c\x07_lookup\x94}\x94(h\r\x8c\x06object\x94\x93\x94\x8c\x0bdask.sizeof\x94\x8c\x0esizeof_default\x94\x93\x94h\r\x8c\tbytearray\x94\x93\x94h\x1b\x8c\x0csizeof_bytes\x94\x93\x94h\r\x8c\x05bytes\x94\x93\x94h!h\r\x8c\nmemoryview\x94\x93\x94h\x1b\x8c\x11sizeof_memoryview\x94\x93\x94\x8c\x05array\x94\x8c\x05array\x94\x93\x94h\x1b\x8c\x0csizeof_array\x94\x93\x94h\r\x8c\tfrozenset\x94\x93\x94h\x1b\x8c\x18sizeof_python_collection\x94\x93\x94h\r\x8c\x03set\x94\x93\x94h0h\r\x8c\x05tuple\x94\x93\x94h0h\r\x8c\x04list\x94\x93\x94h0h\x1b\x8c\x0cSimpleSizeof\x94\x93\x94h\x1b\x8c\x0esizeof_blocked\x94\x93\x94h\x0fh\x1b\x8c\x12sizeof_python_dict\x94\x93\x94\x8c\x11pandas.core.frame\x94\x8c\tDataFrame\x94\x93\x94\x8c\x17cloudpickle.cloudpickle\x94\x8c\r_builtin_type\x94\x93\x94\x8c\nLambdaType\x94\x85\x94R\x94(hB\x8c\x08CodeType\x94\x85\x94R\x94(K\x01K\x00K\x00K\x04K\x05K\x13CPt\x00|\x00j\x01\x83\x01}\x01|\x00\xa0\x02\xa1\x00D\x00]0\\\x02}\x02}\x03|\x01|\x03j\x03d\x01d\x02\x8d\x017\x00}\x01|\x03j\x04t\x05k\x02r\x12|\x01\x88\x00|\x03j\x06\x83\x017\x00}\x01q\x12t\x07|\x01\x83\x01d\x03\x17\x00S\x00\x94(N\x89\x8c\x05index\x94\x85\x94M\xe8\x03t\x94(\x8c\x06sizeof\x94hJ\x8c\titeritems\x94\x8c\x0cmemory_usage\x94\x8c\x05dtype\x94\x8c\x06object\x94\x8c\x07_values\x94\x8c\x03int\x94t\x94(\x8c\x02df\x94\x8c\x01p\x94\x8c\x04name\x94\x8c\x03col\x94t\x94\x8cP/Users/aloysius/mambaforge/envs/swarm/lib/python3.8/site-packages/dask/sizeof.py\x94\x8c\x17sizeof_pandas_dataframe\x94K\x94C\x0c\x00\x02\n\x01\x10\x01\x10\x01\n\x01\x10\x01\x94\x8c\x0bobject_size\x94\x
85\x94)t\x94R\x94}\x94(\x8c\x0b__package__\x94\x8c\x04dask\x94\x8c\x08__name__\x94h\x1b\x8c\x08__file__\x94\x8cP/Users/aloysius/mambaforge/envs/swarm/lib/python3.8/site-packages/dask/sizeof.py\x94uNNh@\x8c\x10_make_empty_cell\x94\x93\x94)R\x94\x85\x94t\x94R\x94\x8c\x1ccloudpickle.cloudpickle_fast\x94\x8c\x12_function_setstate\x94\x93\x94hl}\x94}\x94(hdh[\x8c\x0c__qualname__\x94\x8c0register_pandas.<locals>.sizeof_pandas_dataframe\x94\x8c\x0f__annotations__\x94}\x94\x8c\x0e__kwdefaults__\x94N\x8c\x0c__defaults__\x94N\x8c\n__module__\x94h\x1b\x8c\x07__doc__\x94N\x8c\x0b__closure__\x94h@\x8c\n_make_cell\x94\x93\x94hE(hH(K\x01K\x00K\x00K\x02K\x05K\x13C@t\x00|\x00\x83\x01s\x0cd\x01S\x00\x88\x00j\x01j\x02|\x00d\x02d\x03d\x04\x8d\x03}\x01t\x03t\x04t\x05|\x01\x83\x02\x83\x01}\x01t\x06|\x01\x83\x01d\x02\x1b\x00t\x00|\x00\x83\x01\x14\x00S\x00\x94(NK\x00K\x14\x88\x8c\x04size\x94\x8c\x07replace\x94\x86\x94t\x94(\x8c\x03len\x94\x8c\x06random\x94\x8c\x06choice\x94\x8c\x04list\x94\x8c\x03map\x94hM\x8c\x03sum\x94t\x94\x8c\x01x\x94\x8c\x06sample\x94\x86\x94hZh]K\x8dC\n\x00\x01\x08\x01\x04\x01\x12\x01\x0e\x01\x94\x8c\x02np\x94\x85\x94)t\x94R\x94haNNhh)R\x94\x85\x94t\x94R\x94hoh\x94}\x94}\x94(hdh]hr\x8c$register_pandas.<locals>.object_size\x94ht}\x94hvNhwNhxh\x1bhyNhzh|h@\x8c\tsubimport\x94\x93\x94\x8c\x05numpy\x94\x85\x94R\x94\x85\x94R\x94\x85\x94\x8c\x17_cloudpickle_submodules\x94]\x94h\x9a\x8c\x0cnumpy.random\x94\x85\x94R\x94a\x8c\x0b__globals__\x94}\x94hMh\x15su\x86\x94\x86R0\x85\x94R\x94\x85\x94h\xa1]\x94h\xa6}\x94hMh\x15su\x86\x94\x86R0\x8c\x12pandas.core.series\x94\x8c\x06Series\x94\x93\x94hE(hH(K\x01K\x00K\x00K\x02K\x04K\x13CNt\x00|\x00j\x01d\x01d\x02\x8d\x01\x83\x01}\x01|\x00j\x02t\x03k\x02r(|\x01\x88\x00|\x00j\x04\x83\x017\x00}\x01|\x00j\x05j\x02t\x03k\x02rB|\x01\x88\x00|\x00j\x05\x83\x017\x00}\x01t\x00|\x01\x83\x01d\x03\x17\x00S\x00\x94(N\x88hKM\xe8\x03t\x94(hShOhPhQhRhJt\x94\x8c\x01s\x94hV\x86\x94hZ\x8c\x14sizeof_pandas_series\x94K\x9dC\x0c\x00\x02\x10\x01\n\x01\x0e\x01\x0
c\x01\x0e\x01\x94h^)t\x94R\x94haNNhh)R\x94\x85\x94t\x94R\x94hoh\xbe}\x94}\x94(hdh\xb7hr\x8c-register_pandas.<locals>.sizeof_pandas_series\x94ht}\x94hvNhwNhxh\x1bhyNhzh\xaa\x85\x94h\xa1]\x94h\xa6}\x94u\x86\x94\x86R0\x8c\x18pandas.core.indexes.base\x94\x8c\x05Index\x94\x93\x94hE(hH(K\x01K\x00K\x00K\x02K\x03K\x13C.t\x00|\x00\xa0\x01\xa1\x00\x83\x01}\x01|\x00j\x02t\x03k\x02r"|\x01\x88\x00|\x00\x83\x017\x00}\x01t\x00|\x01\x83\x01d\x01\x17\x00S\x00\x94NM\xe8\x03\x86\x94(hShOhPhQt\x94\x8c\x01i\x94hV\x86\x94hZ\x8c\x13sizeof_pandas_index\x94K\xa6C\x08\x00\x02\x0c\x01\n\x01\x0c\x01\x94h^)t\x94R\x94haNNhh)R\x94\x85\x94t\x94R\x94hoh\xd6}\x94}\x94(hdh\xcfhr\x8c,register_pandas.<locals>.sizeof_pandas_index\x94ht}\x94hvNhwNhxh\x1bhyNhzh\xaa\x85\x94h\xa1]\x94h\xa6}\x94u\x86\x94\x86R0\x8c\x19pandas.core.indexes.multi\x94\x8c\nMultiIndex\x94\x93\x94hE(hH(K\x01K\x00K\x00K\x03K\x05K\x13CNt\x00t\x01\x87\x00f\x01d\x01d\x02\x84\x08|\x00j\x02D\x00\x83\x01\x83\x01\x83\x01}\x01t\x03|\x00d\x03\x83\x02r,|\x00j\x04n\x04|\x00j\x05D\x00]\x0e}\x02|\x01|\x02j\x067\x00}\x01q2t\x00|\x01\x83\x01d\x04\x17\x00S\x00\x94(NhH(K\x01K\x00K\x00K\x02K\x03K3C\x16|\x00]\x0e}\x01\x88\x00|\x01\x83\x01V\x00\x01\x00q\x02d\x00S\x00\x94N\x85\x94)\x8c\x02.0\x94\x8c\x01l\x94\x86\x94hZ\x8c\t<genexpr>\x94K\xafC\x04\x04\x00\x02\x00\x94h^)t\x94R\x94\x8cDregister_pandas.<locals>.sizeof_pandas_multiindex.<locals>.<genexpr>\x94\x8c\x05codes\x94M\xe8\x03t\x94(hSh\x87\x8c\x06levels\x94\x8c\x07hasattr\x94h\xed\x8c\x06labels\x94\x8c\x06nbytes\x94t\x94h\xcdhV\x8c\x01c\x94\x87\x94hZ\x8c\x18sizeof_pandas_multiindex\x94K\xadC\x08\x00\x02\x1c\x01\x1a\x01\x0c\x01\x94h^)t\x94R\x94haNNhh)R\x94\x85\x94t\x94R\x94hoh\xfd}\x94}\x94(hdh\xf6hr\x8c1register_pandas.<locals>.sizeof_pandas_multiindex\x94ht}\x94hvNhwNhxh\x1bhyNhzh\xaa\x85\x94h\xa1]\x94h\xa6}\x94u\x86\x94\x86R0h\r\x8c\x03str\x94\x93\x94h\x1dh\r\x8c\x04type\x94\x93\x94N\x85\x94R\x94h\x1dh\r\x8c\x04bool\x94\x93\x94h\x1dh\x9b\x8c\x07ndarray\x94\x93\x94hE(hH(K\x01K\x00K\x00K\x02K\x04KSC2
d\x01|\x00j\x00k\x06r(|\x00t\x01d\x02d\x03\x84\x00|\x00j\x00D\x00\x83\x01\x83\x01\x19\x00}\x01|\x01j\x02S\x00t\x03|\x00j\x02\x83\x01S\x00\x94(NK\x00hH(K\x01K\x00K\x00K\x02K\x03KsC&|\x00]\x1e}\x01|\x01d\x00k\x03r\x16t\x00d\x01\x83\x01n\x06t\x00d\x02\x83\x01V\x00\x01\x00q\x02d\x01S\x00\x94K\x00NK\x01\x87\x94\x8c\x05slice\x94\x85\x94h\xe5h\xb5\x86\x94hZh\xe8K\x83C\x04\x04\x00\x02\x00\x94))t\x94R\x94\x8c?register_numpy.<locals>.sizeof_numpy_ndarray.<locals>.<genexpr>\x94t\x94(\x8c\x07strides\x94\x8c\x05tuple\x94h\xf2hSt\x94h\x89\x8c\x02xs\x94\x86\x94hZ\x8c\x14sizeof_numpy_ndarray\x94K\x80C\x08\x00\x02\n\x01\x18\x01\x06\x01\x94))t\x94R\x94haNNNt\x94R\x94hoj%\x01\x00\x00}\x94}\x94(hdj \x01\x00\x00hr\x8c,register_numpy.<locals>.sizeof_numpy_ndarray\x94ht}\x94hvNhwNhxh\x1bhyNhzNh\xa1]\x94h\xa6}\x94u\x86\x94\x86R0h@\x8c\x14_make_skeleton_class\x94\x93\x94(j.\x01\x00\x00(j\t\x01\x00\x00\x8c\n_DTypeMeta\x94j\t\x01\x00\x00\x85\x94}\x94\x8c 59f7a7c093a8408cb673ce696810d0d8\x94Nt\x94R\x94hm\x8c\x0f_class_setstate\x94\x93\x94j4\x01\x00\x00}\x94(\x8c\x08__init__\x94\x8c\x08builtins\x94\x8c\x07getattr\x94\x93\x94j4\x01\x00\x00j8\x01\x00\x00\x86\x94R\x94\x8c\x07__new__\x94j;\x01\x00\x00j4\x01\x00\x00\x8c\x07__new__\x94\x86\x94R\x94\x8c\t_abstract\x94j;\x01\x00\x00j4\x01\x00\x00jB\x01\x00\x00\x86\x94R\x94\x8c\x04type\x94j;\x01\x00\x00j4\x01\x00\x00jE\x01\x00\x00\x86\x94R\x94\x8c\x0b_parametric\x94j;\x01\x00\x00j4\x01\x00\x00jH\x01\x00\x00\x86\x94R\x94hy\x8c;Preliminary NumPy API: The Type of NumPy DTypes (metaclass)\x94u}\x94\x86\x94\x86R0\x8c\x0edtype[object_]\x94h\x9bhP\x93\x94\x85\x94}\x94\x8c 
907c0e4265ec4814bf26c2d9a242a34a\x94Nt\x94R\x94j6\x01\x00\x00jT\x01\x00\x00}\x94(j>\x01\x00\x00j;\x01\x00\x00jT\x01\x00\x00\x8c\x07__new__\x94\x86\x94R\x94hyNu}\x94\x86\x94\x86R0h\x1dh\r\x8c\x03int\x94\x93\x94h\x1du\x8c\x05_lazy\x94}\x94(\x8c\x04cupy\x94h\x1b\x8c\rregister_cupy\x94\x93\x94\x8c\x05numba\x94h\x1b\x8c\x0eregister_numba\x94\x93\x94\x8c\x03rmm\x94h\x1b\x8c\x0cregister_rmm\x94\x93\x94\x8c\x05scipy\x94h\x1b\x8c\x11register_spmatrix\x94\x93\x94\x8c\x07pyarrow\x94h\x1b\x8c\x10register_pyarrow\x94\x93\x94uhdhMube]\x94(\x8c\x05_meta\x94h\xb1)\x81\x94}\x94(\x8c\x04_mgr\x94\x8c\x1epandas.core.internals.managers\x94\x8c\x12SingleBlockManager\x94\x93\x94)\x81\x94(]\x94h\xc7\x8c\n_new_Index\x94\x93\x94\x8c\x19pandas.core.indexes.range\x94\x8c\nRangeIndex\x94\x93\x94}\x94(hWN\x8c\x05start\x94K\x00\x8c\x04stop\x94K\x00\x8c\x04step\x94K\x01u\x86\x94R\x94a]\x94\x8c\x15numpy.core.multiarray\x94\x8c\x0c_reconstruct\x94\x93\x94j\x0f\x01\x00\x00K\x00\x85\x94C\x01b\x94\x87\x94R\x94(K\x01K\x00\x85\x94jO\x01\x00\x00\x8c\x02i8\x94\x89\x88\x87\x94R\x94(K\x03\x8c\x01<\x94NNNJ\xff\xff\xff\xffJ\xff\xff\xff\xffK\x00t\x94b\x89C\x00\x94t\x94ba]\x94jy\x01\x00\x00\x8c\x1bpandas.core.indexes.numeric\x94\x8c\nInt64Index\x94\x93\x94}\x94(\x8c\x04data\x94j\x86\x01\x00\x00j\x0f\x01\x00\x00K\x00\x85\x94j\x88\x01\x00\x00\x87\x94R\x94(K\x01K\x00\x85\x94j\x8e\x01\x00\x00\x89j\x91\x01\x00\x00t\x94bhWNu\x86\x94R\x94a}\x94\x8c\x060.14.1\x94}\x94(\x8c\x04axes\x94jw\x01\x00\x00\x8c\x06blocks\x94]\x94}\x94(\x8c\x06values\x94j\x8a\x01\x00\x00\x8c\x08mgr_locs\x94j\x86\x01\x00\x00j\x0f\x01\x00\x00K\x00\x85\x94j\x88\x01\x00\x00\x87\x94R\x94(K\x01K\x00\x85\x94j\x8e\x01\x00\x00\x89j\x91\x01\x00\x00t\x94buaust\x94b\x8c\x04_typ\x94\x8c\x06series\x94\x8c\t_metadata\x94]\x94hWa\x8c\x05attrs\x94}\x94\x8c\x06_flags\x94}\x94\x8c\x17allows_duplicate_labels\x94\x88shWNubee\x86\x94t\x94\x8c\x13__dask_blockwise__1\x94\x8c,from_pandas-1f79779fe4332aa2089519b52d3e3b08\x94uh\x04\x8c\x13__dask_blockwise__0\x94\x85\x94\x8
c6subgraph_callable-240ce47e-86ed-41ab-88e2-655035bf9e42\x94t\x94R\x94.'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/aloysius/mambaforge/envs/swarm/lib/python3.8/site-packages/distributed/worker.py", line 3100, in _maybe_deserialize_task
    function, args, kwargs = _deserialize(*ts.runspec)
  File "/Users/aloysius/mambaforge/envs/swarm/lib/python3.8/site-packages/distributed/worker.py", line 4055, in _deserialize
    function = loads_function(function)
  File "/Users/aloysius/mambaforge/envs/swarm/lib/python3.8/site-packages/distributed/worker.py", line 4046, in loads_function
    result = pickle.loads(bytes_object)
  File "/Users/aloysius/mambaforge/envs/swarm/lib/python3.8/site-packages/distributed/protocol/pickle.py", line 75, in loads
    return pickle.loads(x)
AttributeError: type object '_DTypeMeta' has no attribute '_abstract'
distributed.worker - ERROR - Exception during execution of task ('sizeof-a3c10fd37d65e26bfa433c2cdb2dd83c', 0).
Traceback (most recent call last):
  File "/Users/aloysius/mambaforge/envs/swarm/lib/python3.8/site-packages/distributed/worker.py", line 4044, in loads_function
    result = cache_loads[bytes_object]
  File "/Users/aloysius/mambaforge/envs/swarm/lib/python3.8/site-packages/distributed/utils.py", line 1354, in __getitem__
    value = super().__getitem__(key)
  File "/Users/aloysius/mambaforge/envs/swarm/lib/python3.8/collections/__init__.py", line 1010, in __getitem__
    raise KeyError(key)
KeyError: b'\x80\x04\x95|\x13\x00\x00\x00\x00\x00\x00\x8c\x11dask.optimization\x94\x8c\x10SubgraphCallable\x94\x93\x94(}\x94(\x8c\'sizeof-a3c10fd37d65e26bfa433c2cdb2dd83c\x94(\x8c\ndask.utils\x94\x8c\x05apply\x94\x93\x94\x8c\x13dask.dataframe.core\x94\x8c\x11apply_and_enforce\x94\x93\x94]\x94\x8c\x13__dask_blockwise__0\x94a\x8c\x08builtins\x94\x8c\x04dict\x94\x93\x94]\x94(]\x94(\x8c\x05_func\x94h\x05\x8c\x08Dispatch\x94\x93\x94)\x81\x94}\x94(\x8c\x07_lookup\x94}\x94(h\r\x8c\x06object\x94\x93\x94\x8c\x0bdask.sizeof\x94\x8c\x0esizeof_default\x94\x93\x94h\r\x8c\tbytearray\x94\x93\x94h\x1b\x8c\x0csizeof_bytes\x94\x93\x94h\r\x8c\x05bytes\x94\x93\x94h!h\r\x8c\nmemoryview\x94\x93\x94h\x1b\x8c\x11sizeof_memoryview\x94\x93\x94\x8c\x05array\x94\x8c\x05array\x94\x93\x94h\x1b\x8c\x0csizeof_array\x94\x93\x94h\r\x8c\tfrozenset\x94\x93\x94h\x1b\x8c\x18sizeof_python_collection\x94\x93\x94h\r\x8c\x03set\x94\x93\x94h0h\r\x8c\x05tuple\x94\x93\x94h0h\r\x8c\x04list\x94\x93\x94h0h\x1b\x8c\x0cSimpleSizeof\x94\x93\x94h\x1b\x8c\x0esizeof_blocked\x94\x93\x94h\x0fh\x1b\x8c\x12sizeof_python_dict\x94\x93\x94\x8c\x11pandas.core.frame\x94\x8c\tDataFrame\x94\x93\x94\x8c\x17cloudpickle.cloudpickle\x94\x8c\r_builtin_type\x94\x93\x94\x8c\nLambdaType\x94\x85\x94R\x94(hB\x8c\x08CodeType\x94\x85\x94R\x94(K\x01K\x00K\x00K\x04K\x05K\x13CPt\x00|\x00j\x01\x83\x01}\x01|\x00\xa0\x02\xa1\x00D\x00]0\\\x02}\x02}\x03|\x01|\x03j\x03d\x01d\x02\x8d\x017\x00}\x01|\x03j\x04t\x05k\x02r\x12|\x01\x88\x00|\x03j\x06\x83\x017\x00}\x01q\x12t\x07|\x01\x83\x01d\x03\x17\x00S\x00\x94(N\x89\x8c\x05index\x94\x85\x94M\xe8\x03t\x94(\x8c\x06sizeof\x94hJ\x8c\titeritems\x94\x8c\x0cmemory_usage\x94\x8c\x05dtype\x94\x8c\x06object\x94\x8c\x07_values\x94\x8c\x03int\x94t\x94(\x8c\x02df\x94\x8c\x01p\x94\x8c\x04name\x94\x8c\x03col\x94t\x94\x8cP/Users/aloysius/mambaforge/envs/swarm/lib/python3.8/site-packages/dask/sizeof.py\x94\x8c\x17sizeof_pandas_dataframe\x94K\x94C\x0c\x00\x02\n\x01\x10\x01\x10\x01\n\x01\x10\x01\x94\x8c\x0bobject_size\x94\x
85\x94)t\x94R\x94}\x94(\x8c\x0b__package__\x94\x8c\x04dask\x94\x8c\x08__name__\x94h\x1b\x8c\x08__file__\x94\x8cP/Users/aloysius/mambaforge/envs/swarm/lib/python3.8/site-packages/dask/sizeof.py\x94uNNh@\x8c\x10_make_empty_cell\x94\x93\x94)R\x94\x85\x94t\x94R\x94\x8c\x1ccloudpickle.cloudpickle_fast\x94\x8c\x12_function_setstate\x94\x93\x94hl}\x94}\x94(hdh[\x8c\x0c__qualname__\x94\x8c0register_pandas.<locals>.sizeof_pandas_dataframe\x94\x8c\x0f__annotations__\x94}\x94\x8c\x0e__kwdefaults__\x94N\x8c\x0c__defaults__\x94N\x8c\n__module__\x94h\x1b\x8c\x07__doc__\x94N\x8c\x0b__closure__\x94h@\x8c\n_make_cell\x94\x93\x94hE(hH(K\x01K\x00K\x00K\x02K\x05K\x13C@t\x00|\x00\x83\x01s\x0cd\x01S\x00\x88\x00j\x01j\x02|\x00d\x02d\x03d\x04\x8d\x03}\x01t\x03t\x04t\x05|\x01\x83\x02\x83\x01}\x01t\x06|\x01\x83\x01d\x02\x1b\x00t\x00|\x00\x83\x01\x14\x00S\x00\x94(NK\x00K\x14\x88\x8c\x04size\x94\x8c\x07replace\x94\x86\x94t\x94(\x8c\x03len\x94\x8c\x06random\x94\x8c\x06choice\x94\x8c\x04list\x94\x8c\x03map\x94hM\x8c\x03sum\x94t\x94\x8c\x01x\x94\x8c\x06sample\x94\x86\x94hZh]K\x8dC\n\x00\x01\x08\x01\x04\x01\x12\x01\x0e\x01\x94\x8c\x02np\x94\x85\x94)t\x94R\x94haNNhh)R\x94\x85\x94t\x94R\x94hoh\x94}\x94}\x94(hdh]hr\x8c$register_pandas.<locals>.object_size\x94ht}\x94hvNhwNhxh\x1bhyNhzh|h@\x8c\tsubimport\x94\x93\x94\x8c\x05numpy\x94\x85\x94R\x94\x85\x94R\x94\x85\x94\x8c\x17_cloudpickle_submodules\x94]\x94h\x9a\x8c\x0cnumpy.random\x94\x85\x94R\x94a\x8c\x0b__globals__\x94}\x94hMh\x15su\x86\x94\x86R0\x85\x94R\x94\x85\x94h\xa1]\x94h\xa6}\x94hMh\x15su\x86\x94\x86R0\x8c\x12pandas.core.series\x94\x8c\x06Series\x94\x93\x94hE(hH(K\x01K\x00K\x00K\x02K\x04K\x13CNt\x00|\x00j\x01d\x01d\x02\x8d\x01\x83\x01}\x01|\x00j\x02t\x03k\x02r(|\x01\x88\x00|\x00j\x04\x83\x017\x00}\x01|\x00j\x05j\x02t\x03k\x02rB|\x01\x88\x00|\x00j\x05\x83\x017\x00}\x01t\x00|\x01\x83\x01d\x03\x17\x00S\x00\x94(N\x88hKM\xe8\x03t\x94(hShOhPhQhRhJt\x94\x8c\x01s\x94hV\x86\x94hZ\x8c\x14sizeof_pandas_series\x94K\x9dC\x0c\x00\x02\x10\x01\n\x01\x0e\x01\x0
c\x01\x0e\x01\x94h^)t\x94R\x94haNNhh)R\x94\x85\x94t\x94R\x94hoh\xbe}\x94}\x94(hdh\xb7hr\x8c-register_pandas.<locals>.sizeof_pandas_series\x94ht}\x94hvNhwNhxh\x1bhyNhzh\xaa\x85\x94h\xa1]\x94h\xa6}\x94u\x86\x94\x86R0\x8c\x18pandas.core.indexes.base\x94\x8c\x05Index\x94\x93\x94hE(hH(K\x01K\x00K\x00K\x02K\x03K\x13C.t\x00|\x00\xa0\x01\xa1\x00\x83\x01}\x01|\x00j\x02t\x03k\x02r"|\x01\x88\x00|\x00\x83\x017\x00}\x01t\x00|\x01\x83\x01d\x01\x17\x00S\x00\x94NM\xe8\x03\x86\x94(hShOhPhQt\x94\x8c\x01i\x94hV\x86\x94hZ\x8c\x13sizeof_pandas_index\x94K\xa6C\x08\x00\x02\x0c\x01\n\x01\x0c\x01\x94h^)t\x94R\x94haNNhh)R\x94\x85\x94t\x94R\x94hoh\xd6}\x94}\x94(hdh\xcfhr\x8c,register_pandas.<locals>.sizeof_pandas_index\x94ht}\x94hvNhwNhxh\x1bhyNhzh\xaa\x85\x94h\xa1]\x94h\xa6}\x94u\x86\x94\x86R0\x8c\x19pandas.core.indexes.multi\x94\x8c\nMultiIndex\x94\x93\x94hE(hH(K\x01K\x00K\x00K\x03K\x05K\x13CNt\x00t\x01\x87\x00f\x01d\x01d\x02\x84\x08|\x00j\x02D\x00\x83\x01\x83\x01\x83\x01}\x01t\x03|\x00d\x03\x83\x02r,|\x00j\x04n\x04|\x00j\x05D\x00]\x0e}\x02|\x01|\x02j\x067\x00}\x01q2t\x00|\x01\x83\x01d\x04\x17\x00S\x00\x94(NhH(K\x01K\x00K\x00K\x02K\x03K3C\x16|\x00]\x0e}\x01\x88\x00|\x01\x83\x01V\x00\x01\x00q\x02d\x00S\x00\x94N\x85\x94)\x8c\x02.0\x94\x8c\x01l\x94\x86\x94hZ\x8c\t<genexpr>\x94K\xafC\x04\x04\x00\x02\x00\x94h^)t\x94R\x94\x8cDregister_pandas.<locals>.sizeof_pandas_multiindex.<locals>.<genexpr>\x94\x8c\x05codes\x94M\xe8\x03t\x94(hSh\x87\x8c\x06levels\x94\x8c\x07hasattr\x94h\xed\x8c\x06labels\x94\x8c\x06nbytes\x94t\x94h\xcdhV\x8c\x01c\x94\x87\x94hZ\x8c\x18sizeof_pandas_multiindex\x94K\xadC\x08\x00\x02\x1c\x01\x1a\x01\x0c\x01\x94h^)t\x94R\x94haNNhh)R\x94\x85\x94t\x94R\x94hoh\xfd}\x94}\x94(hdh\xf6hr\x8c1register_pandas.<locals>.sizeof_pandas_multiindex\x94ht}\x94hvNhwNhxh\x1bhyNhzh\xaa\x85\x94h\xa1]\x94h\xa6}\x94u\x86\x94\x86R0h\r\x8c\x03str\x94\x93\x94h\x1dh\r\x8c\x04type\x94\x93\x94N\x85\x94R\x94h\x1dh\r\x8c\x04bool\x94\x93\x94h\x1dh\x9b\x8c\x07ndarray\x94\x93\x94hE(hH(K\x01K\x00K\x00K\x02K\x04KSC2
d\x01|\x00j\x00k\x06r(|\x00t\x01d\x02d\x03\x84\x00|\x00j\x00D\x00\x83\x01\x83\x01\x19\x00}\x01|\x01j\x02S\x00t\x03|\x00j\x02\x83\x01S\x00\x94(NK\x00hH(K\x01K\x00K\x00K\x02K\x03KsC&|\x00]\x1e}\x01|\x01d\x00k\x03r\x16t\x00d\x01\x83\x01n\x06t\x00d\x02\x83\x01V\x00\x01\x00q\x02d\x01S\x00\x94K\x00NK\x01\x87\x94\x8c\x05slice\x94\x85\x94h\xe5h\xb5\x86\x94hZh\xe8K\x83C\x04\x04\x00\x02\x00\x94))t\x94R\x94\x8c?register_numpy.<locals>.sizeof_numpy_ndarray.<locals>.<genexpr>\x94t\x94(\x8c\x07strides\x94\x8c\x05tuple\x94h\xf2hSt\x94h\x89\x8c\x02xs\x94\x86\x94hZ\x8c\x14sizeof_numpy_ndarray\x94K\x80C\x08\x00\x02\n\x01\x18\x01\x06\x01\x94))t\x94R\x94haNNNt\x94R\x94hoj%\x01\x00\x00}\x94}\x94(hdj \x01\x00\x00hr\x8c,register_numpy.<locals>.sizeof_numpy_ndarray\x94ht}\x94hvNhwNhxh\x1bhyNhzNh\xa1]\x94h\xa6}\x94u\x86\x94\x86R0h@\x8c\x14_make_skeleton_class\x94\x93\x94(j.\x01\x00\x00(j\t\x01\x00\x00\x8c\n_DTypeMeta\x94j\t\x01\x00\x00\x85\x94}\x94\x8c 59f7a7c093a8408cb673ce696810d0d8\x94Nt\x94R\x94hm\x8c\x0f_class_setstate\x94\x93\x94j4\x01\x00\x00}\x94(\x8c\x08__init__\x94\x8c\x08builtins\x94\x8c\x07getattr\x94\x93\x94j4\x01\x00\x00j8\x01\x00\x00\x86\x94R\x94\x8c\x07__new__\x94j;\x01\x00\x00j4\x01\x00\x00\x8c\x07__new__\x94\x86\x94R\x94\x8c\t_abstract\x94j;\x01\x00\x00j4\x01\x00\x00jB\x01\x00\x00\x86\x94R\x94\x8c\x04type\x94j;\x01\x00\x00j4\x01\x00\x00jE\x01\x00\x00\x86\x94R\x94\x8c\x0b_parametric\x94j;\x01\x00\x00j4\x01\x00\x00jH\x01\x00\x00\x86\x94R\x94hy\x8c;Preliminary NumPy API: The Type of NumPy DTypes (metaclass)\x94u}\x94\x86\x94\x86R0\x8c\x0edtype[object_]\x94h\x9bhP\x93\x94\x85\x94}\x94\x8c 
907c0e4265ec4814bf26c2d9a242a34a\x94Nt\x94R\x94j6\x01\x00\x00jT\x01\x00\x00}\x94(j>\x01\x00\x00j;\x01\x00\x00jT\x01\x00\x00\x8c\x07__new__\x94\x86\x94R\x94hyNu}\x94\x86\x94\x86R0h\x1dh\r\x8c\x03int\x94\x93\x94h\x1du\x8c\x05_lazy\x94}\x94(\x8c\x04cupy\x94h\x1b\x8c\rregister_cupy\x94\x93\x94\x8c\x05numba\x94h\x1b\x8c\x0eregister_numba\x94\x93\x94\x8c\x03rmm\x94h\x1b\x8c\x0cregister_rmm\x94\x93\x94\x8c\x05scipy\x94h\x1b\x8c\x11register_spmatrix\x94\x93\x94\x8c\x07pyarrow\x94h\x1b\x8c\x10register_pyarrow\x94\x93\x94uhdhMube]\x94(\x8c\x05_meta\x94h\xb1)\x81\x94}\x94(\x8c\x04_mgr\x94\x8c\x1epandas.core.internals.managers\x94\x8c\x12SingleBlockManager\x94\x93\x94)\x81\x94(]\x94h\xc7\x8c\n_new_Index\x94\x93\x94\x8c\x19pandas.core.indexes.range\x94\x8c\nRangeIndex\x94\x93\x94}\x94(hWN\x8c\x05start\x94K\x00\x8c\x04stop\x94K\x00\x8c\x04step\x94K\x01u\x86\x94R\x94a]\x94\x8c\x15numpy.core.multiarray\x94\x8c\x0c_reconstruct\x94\x93\x94j\x0f\x01\x00\x00K\x00\x85\x94C\x01b\x94\x87\x94R\x94(K\x01K\x00\x85\x94jO\x01\x00\x00\x8c\x02i8\x94\x89\x88\x87\x94R\x94(K\x03\x8c\x01<\x94NNNJ\xff\xff\xff\xffJ\xff\xff\xff\xffK\x00t\x94b\x89C\x00\x94t\x94ba]\x94jy\x01\x00\x00\x8c\x1bpandas.core.indexes.numeric\x94\x8c\nInt64Index\x94\x93\x94}\x94(\x8c\x04data\x94j\x86\x01\x00\x00j\x0f\x01\x00\x00K\x00\x85\x94j\x88\x01\x00\x00\x87\x94R\x94(K\x01K\x00\x85\x94j\x8e\x01\x00\x00\x89j\x91\x01\x00\x00t\x94bhWNu\x86\x94R\x94a}\x94\x8c\x060.14.1\x94}\x94(\x8c\x04axes\x94jw\x01\x00\x00\x8c\x06blocks\x94]\x94}\x94(\x8c\x06values\x94j\x8a\x01\x00\x00\x8c\x08mgr_locs\x94j\x86\x01\x00\x00j\x0f\x01\x00\x00K\x00\x85\x94j\x88\x01\x00\x00\x87\x94R\x94(K\x01K\x00\x85\x94j\x8e\x01\x00\x00\x89j\x91\x01\x00\x00t\x94buaust\x94b\x8c\x04_typ\x94\x8c\x06series\x94\x8c\t_metadata\x94]\x94hWa\x8c\x05attrs\x94}\x94\x8c\x06_flags\x94}\x94\x8c\x17allows_duplicate_labels\x94\x88shWNubee\x86\x94t\x94\x8c\x13__dask_blockwise__1\x94\x8c,from_pandas-1f79779fe4332aa2089519b52d3e3b08\x94uh\x04\x8c\x13__dask_blockwise__0\x94\x85\x94\x8
c6subgraph_callable-240ce47e-86ed-41ab-88e2-655035bf9e42\x94t\x94R\x94.'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/aloysius/mambaforge/envs/swarm/lib/python3.8/site-packages/distributed/worker.py", line 3181, in execute
    function, args, kwargs = await self._maybe_deserialize_task(
  File "/Users/aloysius/mambaforge/envs/swarm/lib/python3.8/site-packages/distributed/worker.py", line 3100, in _maybe_deserialize_task
    function, args, kwargs = _deserialize(*ts.runspec)
  File "/Users/aloysius/mambaforge/envs/swarm/lib/python3.8/site-packages/distributed/worker.py", line 4055, in _deserialize
    function = loads_function(function)
  File "/Users/aloysius/mambaforge/envs/swarm/lib/python3.8/site-packages/distributed/worker.py", line 4046, in loads_function
    result = pickle.loads(bytes_object)
  File "/Users/aloysius/mambaforge/envs/swarm/lib/python3.8/site-packages/distributed/protocol/pickle.py", line 75, in loads
    return pickle.loads(x)
AttributeError: type object '_DTypeMeta' has no attribute '_abstract'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/aloysius/mambaforge/envs/swarm/lib/python3.8/site-packages/dask/dataframe/core.py", line 4344, in set_index
    return set_index(
  File "/Users/aloysius/mambaforge/envs/swarm/lib/python3.8/site-packages/dask/dataframe/shuffle.py", line 191, in set_index
    divisions, mins, maxes = _calculate_divisions(
  File "/Users/aloysius/mambaforge/envs/swarm/lib/python3.8/site-packages/dask/dataframe/shuffle.py", line 40, in _calculate_divisions
    divisions, sizes, mins, maxes = base.compute(divisions, sizes, mins, maxes)
  File "/Users/aloysius/mambaforge/envs/swarm/lib/python3.8/site-packages/dask/base.py", line 570, in compute
    results = schedule(dsk, keys, **kwargs)
  File "/Users/aloysius/mambaforge/envs/swarm/lib/python3.8/site-packages/distributed/client.py", line 2693, in get
    results = self.gather(packed, asynchronous=asynchronous, direct=direct)
  File "/Users/aloysius/mambaforge/envs/swarm/lib/python3.8/site-packages/distributed/client.py", line 1969, in gather
    return self.sync(
  File "/Users/aloysius/mambaforge/envs/swarm/lib/python3.8/site-packages/distributed/client.py", line 865, in sync
    return sync(
  File "/Users/aloysius/mambaforge/envs/swarm/lib/python3.8/site-packages/distributed/utils.py", line 327, in sync
    raise exc.with_traceback(tb)
  File "/Users/aloysius/mambaforge/envs/swarm/lib/python3.8/site-packages/distributed/utils.py", line 310, in f
    result[0] = yield future
  File "/Users/aloysius/mambaforge/envs/swarm/lib/python3.8/site-packages/tornado/gen.py", line 762, in run
    value = future.result()
  File "/Users/aloysius/mambaforge/envs/swarm/lib/python3.8/site-packages/distributed/client.py", line 1834, in _gather
    raise exception.with_traceback(traceback)
  File "/Users/aloysius/mambaforge/envs/swarm/lib/python3.8/site-packages/distributed/protocol/pickle.py", line 75, in loads
    return pickle.loads(x)
AttributeError: type object '_DTypeMeta' has no attribute '_abstract'

Anything else we need to know?:

Environment:

  • Dask version: 2021.10.0
  • Distributed version: 2021.10.0
  • Python version: 3.8.12
  • Operating System: macOS 11.6
  • Install method (conda, pip, source): pip
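
The environment list above omits the NumPy version, which the maintainer comment further down singles out (numpy==1.21.4). Below is a minimal sketch for collecting the relevant versions in one place using only the standard library. The helper name `collect_versions` and the default package list are illustrative assumptions, not part of Dask.

```python
import platform
from importlib import metadata


def collect_versions(packages=("dask", "distributed", "numpy", "pandas", "cloudpickle")):
    """Return {package: version}, marking missing packages instead of failing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Report the gap rather than raising, so the listing stays complete.
            versions[name] = "not installed"
    return versions


if __name__ == "__main__":
    print("python", platform.python_version())
    for name, version in collect_versions().items():
        print(name, version)
```

This only inspects the local environment. On a running cluster, `Client.get_versions()` from distributed reports versions for the scheduler and workers as well.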

Issue Analytics

  • State: open
  • Created 2 years ago
  • Comments: 5 (1 by maintainers)

Top GitHub Comments

1 reaction
ian-r-rose commented, Nov 17, 2021

Hi @aloysius-lim, thanks for the update.

I was able to reproduce your issue using numpy==1.21.4, so it seems the underlying problem in #7170 is not fully resolved. Of note: the error seems to come from deserializing cached values, so it makes sense that it would only crop up the second time you execute set_index():

distributed.worker - ERROR - Could not deserialize task
Traceback (most recent call last):
  File "/home/ian/dask/distributed/distributed/worker.py", line 4121, in loads_function
    result = cache_loads[bytes_object]
  File "/home/ian/dask/distributed/distributed/utils.py", line 1363, in __getitem__
    value = super().__getitem__(key)
  File "/home/ian/miniconda3/envs/dask/lib/python3.8/collections/__init__.py", line 1010, in __getitem__
    raise KeyError(key)
KeyError: b'\x80\x04\x95R\x13\x00\x00\x00\x00\x00\x00\x8c\x11dask.optimization\x94\x8c\x10SubgraphCallable\
...

We’ll do some more digging to see what’s going on.
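
In the tracebacks above, the `KeyError` comes from the cache lookup in `loads_function` and the `AttributeError` from the `pickle.loads` call that follows it. The sketch below is a simplified version of that lookup-then-unpickle pattern, reconstructed from the traceback rather than copied from the distributed source. It shows why both exceptions are printed. The plain dict cache and the hand-written payload are assumptions made for illustration.

```python
import pickle

# Cache keyed by the serialized bytes. A plain dict is used here; the
# traceback suggests the worker uses a dict-like class from distributed/utils.py.
cache_loads = {}


def loads_function(bytes_object):
    """Return the deserialized object, unpickling only on a cache miss."""
    try:
        return cache_loads[bytes_object]      # raises KeyError on a miss
    except KeyError:
        result = pickle.loads(bytes_object)   # a broken payload fails here
        cache_loads[bytes_object] = result
        return result


# A hand-written protocol-0 pickle that references a missing attribute,
# standing in for the payload that fails on the worker.
bad_payload = b"cbuiltins\nno_such_attribute_for_demo\n."

try:
    loads_function(bad_payload)
except AttributeError as exc:
    # The cache-miss KeyError is attached as context, which is what produces
    # "During handling of the above exception, another exception occurred".
    print(type(exc.__context__).__name__)
```

Read this way, the long `KeyError: b'...'` line is just the missing cache key (the pickled task itself), and the `AttributeError` at the end of each traceback is the failure to investigate.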

0 reactions
carlomarxdk commented, Sep 14, 2022

A workaround: if your index is already sorted, passing sorted=True to set_index() (as done for df2 below) appears to avoid the error.

from dask.distributed import Client
import dask.dataframe as dd
import pandas as pd

client = Client()

df = pd.DataFrame({
    "id": pd.Series(dtype="str"),
    "data": pd.Series(dtype="int")
})

df2 = pd.DataFrame({
    "id": pd.Series(dtype="str"),
    "data": pd.Series(dtype="int")
})

# No error
ddf = dd.from_pandas(df, npartitions=1)
ddf.set_index("id", npartitions="auto")

# No error when sorted=True is specified
ddf2 = dd.from_pandas(df2, npartitions=1)
ddf2.set_index("id", npartitions="auto", sorted=True)
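
Another option, based on the "fixed number of partitions" scenario that the issue reports as working, is to choose the partition count on the client and pass it as an integer. The helper below is a hypothetical sketch, not Dask's implementation. The 128 MB target per partition and the cap at the current partition count are assumptions modelled on how `npartitions="auto"` appears to behave, so check them against your Dask version.

```python
import math


def explicit_npartitions(total_bytes, current_npartitions, partition_size=128e6):
    """Hypothetical helper: pick an integer partition count instead of "auto".

    total_bytes          estimated in-memory size of the data
    current_npartitions  number of partitions the Dask DataFrame has now
    partition_size       assumed target size per partition, in bytes
    """
    # At least one partition, otherwise enough to stay under the target size.
    wanted = max(math.ceil(total_bytes / partition_size), 1)
    # Assumption: never ask for more partitions than the input already has.
    return min(wanted, current_npartitions)
```

With a pandas DataFrame `df` and its Dask counterpart `ddf`, this could be used as `ddf.set_index("id", npartitions=explicit_npartitions(df.memory_usage(deep=True).sum(), ddf.npartitions))`. Because the size is measured on the client, the size-estimation tasks that fail to deserialize are presumably never sent to the workers.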