Could not deserialize task when using `npartitions="auto"` in `DataFrame.set_index()`
What happened:
When using `npartitions="auto"` in `DataFrame.set_index()` on a local distributed cluster, a "Could not deserialize task" error occurs (see code and output below).
This happens only when:
- Instantiating a cluster using `dask.distributed.Client()` (only tested locally, but the intended use case is Dask on Kubernetes).
- Repartitioning using the "auto" number of partitions.
What you expected to happen:
No error when using `npartitions="auto"`. The following scenarios work, and all of them produce output like this:
Dask DataFrame Structure:
data
npartitions=1
int64
...
Dask Name: sort_index, 10 tasks
- Not using the distributed scheduler:
import dask.dataframe as dd
import pandas as pd
# No call to dask.distributed.Client()
dd.from_pandas(
    pd.DataFrame({
        "id": pd.Series(dtype="str"),
        "data": pd.Series(dtype="int")
    }),
    npartitions=1
).set_index("id", npartitions="auto")
- Using a fixed number of partitions:
from dask.distributed import Client
import dask.dataframe as dd
import pandas as pd
client = Client()
dd.from_pandas(
    pd.DataFrame({
        "id": pd.Series(dtype="str"),
        "data": pd.Series(dtype="int")
    }),
    npartitions=1
).set_index("id", npartitions=1)
- Not specifying the `npartitions` argument in `set_index()`:
from dask.distributed import Client
import dask.dataframe as dd
import pandas as pd
client = Client()
dd.from_pandas(
    pd.DataFrame({
        "id": pd.Series(dtype="str"),
        "data": pd.Series(dtype="int")
    }),
    npartitions=1
).set_index("id")
Minimal Complete Verifiable Example:
from dask.distributed import Client
import dask.dataframe as dd
import pandas as pd
client = Client()
dd.from_pandas(
    pd.DataFrame({
        "id": pd.Series(dtype="str"),
        "data": pd.Series(dtype="int")
    }),
    npartitions=1
).set_index("id", npartitions="auto")
Output:
distributed.worker - ERROR - Could not deserialize task
Traceback (most recent call last):
File "/Users/aloysius/mambaforge/envs/swarm/lib/python3.8/site-packages/distributed/worker.py", line 4044, in loads_function
result = cache_loads[bytes_object]
File "/Users/aloysius/mambaforge/envs/swarm/lib/python3.8/site-packages/distributed/utils.py", line 1354, in __getitem__
value = super().__getitem__(key)
File "/Users/aloysius/mambaforge/envs/swarm/lib/python3.8/collections/__init__.py", line 1010, in __getitem__
raise KeyError(key)
KeyError: b'\x80\x04\x95|\x13\x00\x00\x00\x00\x00\x00\x8c\x11dask.optimization\x94\x8c\x10SubgraphCallable\x94\x93\x94(}\x94(\x8c\'sizeof-a3c10fd37d65e26bfa433c2cdb2dd83c\x94(\x8c\ndask.utils\x94\x8c\x05apply\x94\x93\x94\x8c\x13dask.dataframe.core\x94\x8c\x11apply_and_enforce\x94\x93\x94]\x94\x8c\x13__dask_blockwise__0\x94a\x8c\x08builtins\x94\x8c\x04dict\x94\x93\x94]\x94(]\x94(\x8c\x05_func\x94h\x05\x8c\x08Dispatch\x94\x93\x94)\x81\x94}\x94(\x8c\x07_lookup\x94}\x94(h\r\x8c\x06object\x94\x93\x94\x8c\x0bdask.sizeof\x94\x8c\x0esizeof_default\x94\x93\x94h\r\x8c\tbytearray\x94\x93\x94h\x1b\x8c\x0csizeof_bytes\x94\x93\x94h\r\x8c\x05bytes\x94\x93\x94h!h\r\x8c\nmemoryview\x94\x93\x94h\x1b\x8c\x11sizeof_memoryview\x94\x93\x94\x8c\x05array\x94\x8c\x05array\x94\x93\x94h\x1b\x8c\x0csizeof_array\x94\x93\x94h\r\x8c\tfrozenset\x94\x93\x94h\x1b\x8c\x18sizeof_python_collection\x94\x93\x94h\r\x8c\x03set\x94\x93\x94h0h\r\x8c\x05tuple\x94\x93\x94h0h\r\x8c\x04list\x94\x93\x94h0h\x1b\x8c\x0cSimpleSizeof\x94\x93\x94h\x1b\x8c\x0esizeof_blocked\x94\x93\x94h\x0fh\x1b\x8c\x12sizeof_python_dict\x94\x93\x94\x8c\x11pandas.core.frame\x94\x8c\tDataFrame\x94\x93\x94\x8c\x17cloudpickle.cloudpickle\x94\x8c\r_builtin_type\x94\x93\x94\x8c\nLambdaType\x94\x85\x94R\x94(hB\x8c\x08CodeType\x94\x85\x94R\x94(K\x01K\x00K\x00K\x04K\x05K\x13CPt\x00|\x00j\x01\x83\x01}\x01|\x00\xa0\x02\xa1\x00D\x00]0\\\x02}\x02}\x03|\x01|\x03j\x03d\x01d\x02\x8d\x017\x00}\x01|\x03j\x04t\x05k\x02r\x12|\x01\x88\x00|\x03j\x06\x83\x017\x00}\x01q\x12t\x07|\x01\x83\x01d\x03\x17\x00S\x00\x94(N\x89\x8c\x05index\x94\x85\x94M\xe8\x03t\x94(\x8c\x06sizeof\x94hJ\x8c\titeritems\x94\x8c\x0cmemory_usage\x94\x8c\x05dtype\x94\x8c\x06object\x94\x8c\x07_values\x94\x8c\x03int\x94t\x94(\x8c\x02df\x94\x8c\x01p\x94\x8c\x04name\x94\x8c\x03col\x94t\x94\x8cP/Users/aloysius/mambaforge/envs/swarm/lib/python3.8/site-packages/dask/sizeof.py\x94\x8c\x17sizeof_pandas_dataframe\x94K\x94C\x0c\x00\x02\n\x01\x10\x01\x10\x01\n\x01\x10\x01\x94\x8c\x0bobject_size\x94\x
85\x94)t\x94R\x94}\x94(\x8c\x0b__package__\x94\x8c\x04dask\x94\x8c\x08__name__\x94h\x1b\x8c\x08__file__\x94\x8cP/Users/aloysius/mambaforge/envs/swarm/lib/python3.8/site-packages/dask/sizeof.py\x94uNNh@\x8c\x10_make_empty_cell\x94\x93\x94)R\x94\x85\x94t\x94R\x94\x8c\x1ccloudpickle.cloudpickle_fast\x94\x8c\x12_function_setstate\x94\x93\x94hl}\x94}\x94(hdh[\x8c\x0c__qualname__\x94\x8c0register_pandas.<locals>.sizeof_pandas_dataframe\x94\x8c\x0f__annotations__\x94}\x94\x8c\x0e__kwdefaults__\x94N\x8c\x0c__defaults__\x94N\x8c\n__module__\x94h\x1b\x8c\x07__doc__\x94N\x8c\x0b__closure__\x94h@\x8c\n_make_cell\x94\x93\x94hE(hH(K\x01K\x00K\x00K\x02K\x05K\x13C@t\x00|\x00\x83\x01s\x0cd\x01S\x00\x88\x00j\x01j\x02|\x00d\x02d\x03d\x04\x8d\x03}\x01t\x03t\x04t\x05|\x01\x83\x02\x83\x01}\x01t\x06|\x01\x83\x01d\x02\x1b\x00t\x00|\x00\x83\x01\x14\x00S\x00\x94(NK\x00K\x14\x88\x8c\x04size\x94\x8c\x07replace\x94\x86\x94t\x94(\x8c\x03len\x94\x8c\x06random\x94\x8c\x06choice\x94\x8c\x04list\x94\x8c\x03map\x94hM\x8c\x03sum\x94t\x94\x8c\x01x\x94\x8c\x06sample\x94\x86\x94hZh]K\x8dC\n\x00\x01\x08\x01\x04\x01\x12\x01\x0e\x01\x94\x8c\x02np\x94\x85\x94)t\x94R\x94haNNhh)R\x94\x85\x94t\x94R\x94hoh\x94}\x94}\x94(hdh]hr\x8c$register_pandas.<locals>.object_size\x94ht}\x94hvNhwNhxh\x1bhyNhzh|h@\x8c\tsubimport\x94\x93\x94\x8c\x05numpy\x94\x85\x94R\x94\x85\x94R\x94\x85\x94\x8c\x17_cloudpickle_submodules\x94]\x94h\x9a\x8c\x0cnumpy.random\x94\x85\x94R\x94a\x8c\x0b__globals__\x94}\x94hMh\x15su\x86\x94\x86R0\x85\x94R\x94\x85\x94h\xa1]\x94h\xa6}\x94hMh\x15su\x86\x94\x86R0\x8c\x12pandas.core.series\x94\x8c\x06Series\x94\x93\x94hE(hH(K\x01K\x00K\x00K\x02K\x04K\x13CNt\x00|\x00j\x01d\x01d\x02\x8d\x01\x83\x01}\x01|\x00j\x02t\x03k\x02r(|\x01\x88\x00|\x00j\x04\x83\x017\x00}\x01|\x00j\x05j\x02t\x03k\x02rB|\x01\x88\x00|\x00j\x05\x83\x017\x00}\x01t\x00|\x01\x83\x01d\x03\x17\x00S\x00\x94(N\x88hKM\xe8\x03t\x94(hShOhPhQhRhJt\x94\x8c\x01s\x94hV\x86\x94hZ\x8c\x14sizeof_pandas_series\x94K\x9dC\x0c\x00\x02\x10\x01\n\x01\x0e\x01\x0
c\x01\x0e\x01\x94h^)t\x94R\x94haNNhh)R\x94\x85\x94t\x94R\x94hoh\xbe}\x94}\x94(hdh\xb7hr\x8c-register_pandas.<locals>.sizeof_pandas_series\x94ht}\x94hvNhwNhxh\x1bhyNhzh\xaa\x85\x94h\xa1]\x94h\xa6}\x94u\x86\x94\x86R0\x8c\x18pandas.core.indexes.base\x94\x8c\x05Index\x94\x93\x94hE(hH(K\x01K\x00K\x00K\x02K\x03K\x13C.t\x00|\x00\xa0\x01\xa1\x00\x83\x01}\x01|\x00j\x02t\x03k\x02r"|\x01\x88\x00|\x00\x83\x017\x00}\x01t\x00|\x01\x83\x01d\x01\x17\x00S\x00\x94NM\xe8\x03\x86\x94(hShOhPhQt\x94\x8c\x01i\x94hV\x86\x94hZ\x8c\x13sizeof_pandas_index\x94K\xa6C\x08\x00\x02\x0c\x01\n\x01\x0c\x01\x94h^)t\x94R\x94haNNhh)R\x94\x85\x94t\x94R\x94hoh\xd6}\x94}\x94(hdh\xcfhr\x8c,register_pandas.<locals>.sizeof_pandas_index\x94ht}\x94hvNhwNhxh\x1bhyNhzh\xaa\x85\x94h\xa1]\x94h\xa6}\x94u\x86\x94\x86R0\x8c\x19pandas.core.indexes.multi\x94\x8c\nMultiIndex\x94\x93\x94hE(hH(K\x01K\x00K\x00K\x03K\x05K\x13CNt\x00t\x01\x87\x00f\x01d\x01d\x02\x84\x08|\x00j\x02D\x00\x83\x01\x83\x01\x83\x01}\x01t\x03|\x00d\x03\x83\x02r,|\x00j\x04n\x04|\x00j\x05D\x00]\x0e}\x02|\x01|\x02j\x067\x00}\x01q2t\x00|\x01\x83\x01d\x04\x17\x00S\x00\x94(NhH(K\x01K\x00K\x00K\x02K\x03K3C\x16|\x00]\x0e}\x01\x88\x00|\x01\x83\x01V\x00\x01\x00q\x02d\x00S\x00\x94N\x85\x94)\x8c\x02.0\x94\x8c\x01l\x94\x86\x94hZ\x8c\t<genexpr>\x94K\xafC\x04\x04\x00\x02\x00\x94h^)t\x94R\x94\x8cDregister_pandas.<locals>.sizeof_pandas_multiindex.<locals>.<genexpr>\x94\x8c\x05codes\x94M\xe8\x03t\x94(hSh\x87\x8c\x06levels\x94\x8c\x07hasattr\x94h\xed\x8c\x06labels\x94\x8c\x06nbytes\x94t\x94h\xcdhV\x8c\x01c\x94\x87\x94hZ\x8c\x18sizeof_pandas_multiindex\x94K\xadC\x08\x00\x02\x1c\x01\x1a\x01\x0c\x01\x94h^)t\x94R\x94haNNhh)R\x94\x85\x94t\x94R\x94hoh\xfd}\x94}\x94(hdh\xf6hr\x8c1register_pandas.<locals>.sizeof_pandas_multiindex\x94ht}\x94hvNhwNhxh\x1bhyNhzh\xaa\x85\x94h\xa1]\x94h\xa6}\x94u\x86\x94\x86R0h\r\x8c\x03str\x94\x93\x94h\x1dh\r\x8c\x04type\x94\x93\x94N\x85\x94R\x94h\x1dh\r\x8c\x04bool\x94\x93\x94h\x1dh\x9b\x8c\x07ndarray\x94\x93\x94hE(hH(K\x01K\x00K\x00K\x02K\x04KSC2
d\x01|\x00j\x00k\x06r(|\x00t\x01d\x02d\x03\x84\x00|\x00j\x00D\x00\x83\x01\x83\x01\x19\x00}\x01|\x01j\x02S\x00t\x03|\x00j\x02\x83\x01S\x00\x94(NK\x00hH(K\x01K\x00K\x00K\x02K\x03KsC&|\x00]\x1e}\x01|\x01d\x00k\x03r\x16t\x00d\x01\x83\x01n\x06t\x00d\x02\x83\x01V\x00\x01\x00q\x02d\x01S\x00\x94K\x00NK\x01\x87\x94\x8c\x05slice\x94\x85\x94h\xe5h\xb5\x86\x94hZh\xe8K\x83C\x04\x04\x00\x02\x00\x94))t\x94R\x94\x8c?register_numpy.<locals>.sizeof_numpy_ndarray.<locals>.<genexpr>\x94t\x94(\x8c\x07strides\x94\x8c\x05tuple\x94h\xf2hSt\x94h\x89\x8c\x02xs\x94\x86\x94hZ\x8c\x14sizeof_numpy_ndarray\x94K\x80C\x08\x00\x02\n\x01\x18\x01\x06\x01\x94))t\x94R\x94haNNNt\x94R\x94hoj%\x01\x00\x00}\x94}\x94(hdj \x01\x00\x00hr\x8c,register_numpy.<locals>.sizeof_numpy_ndarray\x94ht}\x94hvNhwNhxh\x1bhyNhzNh\xa1]\x94h\xa6}\x94u\x86\x94\x86R0h@\x8c\x14_make_skeleton_class\x94\x93\x94(j.\x01\x00\x00(j\t\x01\x00\x00\x8c\n_DTypeMeta\x94j\t\x01\x00\x00\x85\x94}\x94\x8c 59f7a7c093a8408cb673ce696810d0d8\x94Nt\x94R\x94hm\x8c\x0f_class_setstate\x94\x93\x94j4\x01\x00\x00}\x94(\x8c\x08__init__\x94\x8c\x08builtins\x94\x8c\x07getattr\x94\x93\x94j4\x01\x00\x00j8\x01\x00\x00\x86\x94R\x94\x8c\x07__new__\x94j;\x01\x00\x00j4\x01\x00\x00\x8c\x07__new__\x94\x86\x94R\x94\x8c\t_abstract\x94j;\x01\x00\x00j4\x01\x00\x00jB\x01\x00\x00\x86\x94R\x94\x8c\x04type\x94j;\x01\x00\x00j4\x01\x00\x00jE\x01\x00\x00\x86\x94R\x94\x8c\x0b_parametric\x94j;\x01\x00\x00j4\x01\x00\x00jH\x01\x00\x00\x86\x94R\x94hy\x8c;Preliminary NumPy API: The Type of NumPy DTypes (metaclass)\x94u}\x94\x86\x94\x86R0\x8c\x0edtype[object_]\x94h\x9bhP\x93\x94\x85\x94}\x94\x8c 
907c0e4265ec4814bf26c2d9a242a34a\x94Nt\x94R\x94j6\x01\x00\x00jT\x01\x00\x00}\x94(j>\x01\x00\x00j;\x01\x00\x00jT\x01\x00\x00\x8c\x07__new__\x94\x86\x94R\x94hyNu}\x94\x86\x94\x86R0h\x1dh\r\x8c\x03int\x94\x93\x94h\x1du\x8c\x05_lazy\x94}\x94(\x8c\x04cupy\x94h\x1b\x8c\rregister_cupy\x94\x93\x94\x8c\x05numba\x94h\x1b\x8c\x0eregister_numba\x94\x93\x94\x8c\x03rmm\x94h\x1b\x8c\x0cregister_rmm\x94\x93\x94\x8c\x05scipy\x94h\x1b\x8c\x11register_spmatrix\x94\x93\x94\x8c\x07pyarrow\x94h\x1b\x8c\x10register_pyarrow\x94\x93\x94uhdhMube]\x94(\x8c\x05_meta\x94h\xb1)\x81\x94}\x94(\x8c\x04_mgr\x94\x8c\x1epandas.core.internals.managers\x94\x8c\x12SingleBlockManager\x94\x93\x94)\x81\x94(]\x94h\xc7\x8c\n_new_Index\x94\x93\x94\x8c\x19pandas.core.indexes.range\x94\x8c\nRangeIndex\x94\x93\x94}\x94(hWN\x8c\x05start\x94K\x00\x8c\x04stop\x94K\x00\x8c\x04step\x94K\x01u\x86\x94R\x94a]\x94\x8c\x15numpy.core.multiarray\x94\x8c\x0c_reconstruct\x94\x93\x94j\x0f\x01\x00\x00K\x00\x85\x94C\x01b\x94\x87\x94R\x94(K\x01K\x00\x85\x94jO\x01\x00\x00\x8c\x02i8\x94\x89\x88\x87\x94R\x94(K\x03\x8c\x01<\x94NNNJ\xff\xff\xff\xffJ\xff\xff\xff\xffK\x00t\x94b\x89C\x00\x94t\x94ba]\x94jy\x01\x00\x00\x8c\x1bpandas.core.indexes.numeric\x94\x8c\nInt64Index\x94\x93\x94}\x94(\x8c\x04data\x94j\x86\x01\x00\x00j\x0f\x01\x00\x00K\x00\x85\x94j\x88\x01\x00\x00\x87\x94R\x94(K\x01K\x00\x85\x94j\x8e\x01\x00\x00\x89j\x91\x01\x00\x00t\x94bhWNu\x86\x94R\x94a}\x94\x8c\x060.14.1\x94}\x94(\x8c\x04axes\x94jw\x01\x00\x00\x8c\x06blocks\x94]\x94}\x94(\x8c\x06values\x94j\x8a\x01\x00\x00\x8c\x08mgr_locs\x94j\x86\x01\x00\x00j\x0f\x01\x00\x00K\x00\x85\x94j\x88\x01\x00\x00\x87\x94R\x94(K\x01K\x00\x85\x94j\x8e\x01\x00\x00\x89j\x91\x01\x00\x00t\x94buaust\x94b\x8c\x04_typ\x94\x8c\x06series\x94\x8c\t_metadata\x94]\x94hWa\x8c\x05attrs\x94}\x94\x8c\x06_flags\x94}\x94\x8c\x17allows_duplicate_labels\x94\x88shWNubee\x86\x94t\x94\x8c\x13__dask_blockwise__1\x94\x8c,from_pandas-1f79779fe4332aa2089519b52d3e3b08\x94uh\x04\x8c\x13__dask_blockwise__0\x94\x85\x94\x8
c6subgraph_callable-240ce47e-86ed-41ab-88e2-655035bf9e42\x94t\x94R\x94.'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Users/aloysius/mambaforge/envs/swarm/lib/python3.8/site-packages/distributed/worker.py", line 3100, in _maybe_deserialize_task
function, args, kwargs = _deserialize(*ts.runspec)
File "/Users/aloysius/mambaforge/envs/swarm/lib/python3.8/site-packages/distributed/worker.py", line 4055, in _deserialize
function = loads_function(function)
File "/Users/aloysius/mambaforge/envs/swarm/lib/python3.8/site-packages/distributed/worker.py", line 4046, in loads_function
result = pickle.loads(bytes_object)
File "/Users/aloysius/mambaforge/envs/swarm/lib/python3.8/site-packages/distributed/protocol/pickle.py", line 75, in loads
return pickle.loads(x)
AttributeError: type object '_DTypeMeta' has no attribute '_abstract'
distributed.worker - ERROR - Exception during execution of task ('sizeof-a3c10fd37d65e26bfa433c2cdb2dd83c', 0).
Traceback (most recent call last):
File "/Users/aloysius/mambaforge/envs/swarm/lib/python3.8/site-packages/distributed/worker.py", line 4044, in loads_function
result = cache_loads[bytes_object]
File "/Users/aloysius/mambaforge/envs/swarm/lib/python3.8/site-packages/distributed/utils.py", line 1354, in __getitem__
value = super().__getitem__(key)
File "/Users/aloysius/mambaforge/envs/swarm/lib/python3.8/collections/__init__.py", line 1010, in __getitem__
raise KeyError(key)
KeyError: b'\x80\x04\x95|\x13\x00\x00\x00\x00\x00\x00\x8c\x11dask.optimization\x94\x8c\x10SubgraphCallable\x94\x93\x94(}\x94(\x8c\'sizeof-a3c10fd37d65e26bfa433c2cdb2dd83c\x94(\x8c\ndask.utils\x94\x8c\x05apply\x94\x93\x94\x8c\x13dask.dataframe.core\x94\x8c\x11apply_and_enforce\x94\x93\x94]\x94\x8c\x13__dask_blockwise__0\x94a\x8c\x08builtins\x94\x8c\x04dict\x94\x93\x94]\x94(]\x94(\x8c\x05_func\x94h\x05\x8c\x08Dispatch\x94\x93\x94)\x81\x94}\x94(\x8c\x07_lookup\x94}\x94(h\r\x8c\x06object\x94\x93\x94\x8c\x0bdask.sizeof\x94\x8c\x0esizeof_default\x94\x93\x94h\r\x8c\tbytearray\x94\x93\x94h\x1b\x8c\x0csizeof_bytes\x94\x93\x94h\r\x8c\x05bytes\x94\x93\x94h!h\r\x8c\nmemoryview\x94\x93\x94h\x1b\x8c\x11sizeof_memoryview\x94\x93\x94\x8c\x05array\x94\x8c\x05array\x94\x93\x94h\x1b\x8c\x0csizeof_array\x94\x93\x94h\r\x8c\tfrozenset\x94\x93\x94h\x1b\x8c\x18sizeof_python_collection\x94\x93\x94h\r\x8c\x03set\x94\x93\x94h0h\r\x8c\x05tuple\x94\x93\x94h0h\r\x8c\x04list\x94\x93\x94h0h\x1b\x8c\x0cSimpleSizeof\x94\x93\x94h\x1b\x8c\x0esizeof_blocked\x94\x93\x94h\x0fh\x1b\x8c\x12sizeof_python_dict\x94\x93\x94\x8c\x11pandas.core.frame\x94\x8c\tDataFrame\x94\x93\x94\x8c\x17cloudpickle.cloudpickle\x94\x8c\r_builtin_type\x94\x93\x94\x8c\nLambdaType\x94\x85\x94R\x94(hB\x8c\x08CodeType\x94\x85\x94R\x94(K\x01K\x00K\x00K\x04K\x05K\x13CPt\x00|\x00j\x01\x83\x01}\x01|\x00\xa0\x02\xa1\x00D\x00]0\\\x02}\x02}\x03|\x01|\x03j\x03d\x01d\x02\x8d\x017\x00}\x01|\x03j\x04t\x05k\x02r\x12|\x01\x88\x00|\x03j\x06\x83\x017\x00}\x01q\x12t\x07|\x01\x83\x01d\x03\x17\x00S\x00\x94(N\x89\x8c\x05index\x94\x85\x94M\xe8\x03t\x94(\x8c\x06sizeof\x94hJ\x8c\titeritems\x94\x8c\x0cmemory_usage\x94\x8c\x05dtype\x94\x8c\x06object\x94\x8c\x07_values\x94\x8c\x03int\x94t\x94(\x8c\x02df\x94\x8c\x01p\x94\x8c\x04name\x94\x8c\x03col\x94t\x94\x8cP/Users/aloysius/mambaforge/envs/swarm/lib/python3.8/site-packages/dask/sizeof.py\x94\x8c\x17sizeof_pandas_dataframe\x94K\x94C\x0c\x00\x02\n\x01\x10\x01\x10\x01\n\x01\x10\x01\x94\x8c\x0bobject_size\x94\x
85\x94)t\x94R\x94}\x94(\x8c\x0b__package__\x94\x8c\x04dask\x94\x8c\x08__name__\x94h\x1b\x8c\x08__file__\x94\x8cP/Users/aloysius/mambaforge/envs/swarm/lib/python3.8/site-packages/dask/sizeof.py\x94uNNh@\x8c\x10_make_empty_cell\x94\x93\x94)R\x94\x85\x94t\x94R\x94\x8c\x1ccloudpickle.cloudpickle_fast\x94\x8c\x12_function_setstate\x94\x93\x94hl}\x94}\x94(hdh[\x8c\x0c__qualname__\x94\x8c0register_pandas.<locals>.sizeof_pandas_dataframe\x94\x8c\x0f__annotations__\x94}\x94\x8c\x0e__kwdefaults__\x94N\x8c\x0c__defaults__\x94N\x8c\n__module__\x94h\x1b\x8c\x07__doc__\x94N\x8c\x0b__closure__\x94h@\x8c\n_make_cell\x94\x93\x94hE(hH(K\x01K\x00K\x00K\x02K\x05K\x13C@t\x00|\x00\x83\x01s\x0cd\x01S\x00\x88\x00j\x01j\x02|\x00d\x02d\x03d\x04\x8d\x03}\x01t\x03t\x04t\x05|\x01\x83\x02\x83\x01}\x01t\x06|\x01\x83\x01d\x02\x1b\x00t\x00|\x00\x83\x01\x14\x00S\x00\x94(NK\x00K\x14\x88\x8c\x04size\x94\x8c\x07replace\x94\x86\x94t\x94(\x8c\x03len\x94\x8c\x06random\x94\x8c\x06choice\x94\x8c\x04list\x94\x8c\x03map\x94hM\x8c\x03sum\x94t\x94\x8c\x01x\x94\x8c\x06sample\x94\x86\x94hZh]K\x8dC\n\x00\x01\x08\x01\x04\x01\x12\x01\x0e\x01\x94\x8c\x02np\x94\x85\x94)t\x94R\x94haNNhh)R\x94\x85\x94t\x94R\x94hoh\x94}\x94}\x94(hdh]hr\x8c$register_pandas.<locals>.object_size\x94ht}\x94hvNhwNhxh\x1bhyNhzh|h@\x8c\tsubimport\x94\x93\x94\x8c\x05numpy\x94\x85\x94R\x94\x85\x94R\x94\x85\x94\x8c\x17_cloudpickle_submodules\x94]\x94h\x9a\x8c\x0cnumpy.random\x94\x85\x94R\x94a\x8c\x0b__globals__\x94}\x94hMh\x15su\x86\x94\x86R0\x85\x94R\x94\x85\x94h\xa1]\x94h\xa6}\x94hMh\x15su\x86\x94\x86R0\x8c\x12pandas.core.series\x94\x8c\x06Series\x94\x93\x94hE(hH(K\x01K\x00K\x00K\x02K\x04K\x13CNt\x00|\x00j\x01d\x01d\x02\x8d\x01\x83\x01}\x01|\x00j\x02t\x03k\x02r(|\x01\x88\x00|\x00j\x04\x83\x017\x00}\x01|\x00j\x05j\x02t\x03k\x02rB|\x01\x88\x00|\x00j\x05\x83\x017\x00}\x01t\x00|\x01\x83\x01d\x03\x17\x00S\x00\x94(N\x88hKM\xe8\x03t\x94(hShOhPhQhRhJt\x94\x8c\x01s\x94hV\x86\x94hZ\x8c\x14sizeof_pandas_series\x94K\x9dC\x0c\x00\x02\x10\x01\n\x01\x0e\x01\x0
c\x01\x0e\x01\x94h^)t\x94R\x94haNNhh)R\x94\x85\x94t\x94R\x94hoh\xbe}\x94}\x94(hdh\xb7hr\x8c-register_pandas.<locals>.sizeof_pandas_series\x94ht}\x94hvNhwNhxh\x1bhyNhzh\xaa\x85\x94h\xa1]\x94h\xa6}\x94u\x86\x94\x86R0\x8c\x18pandas.core.indexes.base\x94\x8c\x05Index\x94\x93\x94hE(hH(K\x01K\x00K\x00K\x02K\x03K\x13C.t\x00|\x00\xa0\x01\xa1\x00\x83\x01}\x01|\x00j\x02t\x03k\x02r"|\x01\x88\x00|\x00\x83\x017\x00}\x01t\x00|\x01\x83\x01d\x01\x17\x00S\x00\x94NM\xe8\x03\x86\x94(hShOhPhQt\x94\x8c\x01i\x94hV\x86\x94hZ\x8c\x13sizeof_pandas_index\x94K\xa6C\x08\x00\x02\x0c\x01\n\x01\x0c\x01\x94h^)t\x94R\x94haNNhh)R\x94\x85\x94t\x94R\x94hoh\xd6}\x94}\x94(hdh\xcfhr\x8c,register_pandas.<locals>.sizeof_pandas_index\x94ht}\x94hvNhwNhxh\x1bhyNhzh\xaa\x85\x94h\xa1]\x94h\xa6}\x94u\x86\x94\x86R0\x8c\x19pandas.core.indexes.multi\x94\x8c\nMultiIndex\x94\x93\x94hE(hH(K\x01K\x00K\x00K\x03K\x05K\x13CNt\x00t\x01\x87\x00f\x01d\x01d\x02\x84\x08|\x00j\x02D\x00\x83\x01\x83\x01\x83\x01}\x01t\x03|\x00d\x03\x83\x02r,|\x00j\x04n\x04|\x00j\x05D\x00]\x0e}\x02|\x01|\x02j\x067\x00}\x01q2t\x00|\x01\x83\x01d\x04\x17\x00S\x00\x94(NhH(K\x01K\x00K\x00K\x02K\x03K3C\x16|\x00]\x0e}\x01\x88\x00|\x01\x83\x01V\x00\x01\x00q\x02d\x00S\x00\x94N\x85\x94)\x8c\x02.0\x94\x8c\x01l\x94\x86\x94hZ\x8c\t<genexpr>\x94K\xafC\x04\x04\x00\x02\x00\x94h^)t\x94R\x94\x8cDregister_pandas.<locals>.sizeof_pandas_multiindex.<locals>.<genexpr>\x94\x8c\x05codes\x94M\xe8\x03t\x94(hSh\x87\x8c\x06levels\x94\x8c\x07hasattr\x94h\xed\x8c\x06labels\x94\x8c\x06nbytes\x94t\x94h\xcdhV\x8c\x01c\x94\x87\x94hZ\x8c\x18sizeof_pandas_multiindex\x94K\xadC\x08\x00\x02\x1c\x01\x1a\x01\x0c\x01\x94h^)t\x94R\x94haNNhh)R\x94\x85\x94t\x94R\x94hoh\xfd}\x94}\x94(hdh\xf6hr\x8c1register_pandas.<locals>.sizeof_pandas_multiindex\x94ht}\x94hvNhwNhxh\x1bhyNhzh\xaa\x85\x94h\xa1]\x94h\xa6}\x94u\x86\x94\x86R0h\r\x8c\x03str\x94\x93\x94h\x1dh\r\x8c\x04type\x94\x93\x94N\x85\x94R\x94h\x1dh\r\x8c\x04bool\x94\x93\x94h\x1dh\x9b\x8c\x07ndarray\x94\x93\x94hE(hH(K\x01K\x00K\x00K\x02K\x04KSC2
d\x01|\x00j\x00k\x06r(|\x00t\x01d\x02d\x03\x84\x00|\x00j\x00D\x00\x83\x01\x83\x01\x19\x00}\x01|\x01j\x02S\x00t\x03|\x00j\x02\x83\x01S\x00\x94(NK\x00hH(K\x01K\x00K\x00K\x02K\x03KsC&|\x00]\x1e}\x01|\x01d\x00k\x03r\x16t\x00d\x01\x83\x01n\x06t\x00d\x02\x83\x01V\x00\x01\x00q\x02d\x01S\x00\x94K\x00NK\x01\x87\x94\x8c\x05slice\x94\x85\x94h\xe5h\xb5\x86\x94hZh\xe8K\x83C\x04\x04\x00\x02\x00\x94))t\x94R\x94\x8c?register_numpy.<locals>.sizeof_numpy_ndarray.<locals>.<genexpr>\x94t\x94(\x8c\x07strides\x94\x8c\x05tuple\x94h\xf2hSt\x94h\x89\x8c\x02xs\x94\x86\x94hZ\x8c\x14sizeof_numpy_ndarray\x94K\x80C\x08\x00\x02\n\x01\x18\x01\x06\x01\x94))t\x94R\x94haNNNt\x94R\x94hoj%\x01\x00\x00}\x94}\x94(hdj \x01\x00\x00hr\x8c,register_numpy.<locals>.sizeof_numpy_ndarray\x94ht}\x94hvNhwNhxh\x1bhyNhzNh\xa1]\x94h\xa6}\x94u\x86\x94\x86R0h@\x8c\x14_make_skeleton_class\x94\x93\x94(j.\x01\x00\x00(j\t\x01\x00\x00\x8c\n_DTypeMeta\x94j\t\x01\x00\x00\x85\x94}\x94\x8c 59f7a7c093a8408cb673ce696810d0d8\x94Nt\x94R\x94hm\x8c\x0f_class_setstate\x94\x93\x94j4\x01\x00\x00}\x94(\x8c\x08__init__\x94\x8c\x08builtins\x94\x8c\x07getattr\x94\x93\x94j4\x01\x00\x00j8\x01\x00\x00\x86\x94R\x94\x8c\x07__new__\x94j;\x01\x00\x00j4\x01\x00\x00\x8c\x07__new__\x94\x86\x94R\x94\x8c\t_abstract\x94j;\x01\x00\x00j4\x01\x00\x00jB\x01\x00\x00\x86\x94R\x94\x8c\x04type\x94j;\x01\x00\x00j4\x01\x00\x00jE\x01\x00\x00\x86\x94R\x94\x8c\x0b_parametric\x94j;\x01\x00\x00j4\x01\x00\x00jH\x01\x00\x00\x86\x94R\x94hy\x8c;Preliminary NumPy API: The Type of NumPy DTypes (metaclass)\x94u}\x94\x86\x94\x86R0\x8c\x0edtype[object_]\x94h\x9bhP\x93\x94\x85\x94}\x94\x8c 
907c0e4265ec4814bf26c2d9a242a34a\x94Nt\x94R\x94j6\x01\x00\x00jT\x01\x00\x00}\x94(j>\x01\x00\x00j;\x01\x00\x00jT\x01\x00\x00\x8c\x07__new__\x94\x86\x94R\x94hyNu}\x94\x86\x94\x86R0h\x1dh\r\x8c\x03int\x94\x93\x94h\x1du\x8c\x05_lazy\x94}\x94(\x8c\x04cupy\x94h\x1b\x8c\rregister_cupy\x94\x93\x94\x8c\x05numba\x94h\x1b\x8c\x0eregister_numba\x94\x93\x94\x8c\x03rmm\x94h\x1b\x8c\x0cregister_rmm\x94\x93\x94\x8c\x05scipy\x94h\x1b\x8c\x11register_spmatrix\x94\x93\x94\x8c\x07pyarrow\x94h\x1b\x8c\x10register_pyarrow\x94\x93\x94uhdhMube]\x94(\x8c\x05_meta\x94h\xb1)\x81\x94}\x94(\x8c\x04_mgr\x94\x8c\x1epandas.core.internals.managers\x94\x8c\x12SingleBlockManager\x94\x93\x94)\x81\x94(]\x94h\xc7\x8c\n_new_Index\x94\x93\x94\x8c\x19pandas.core.indexes.range\x94\x8c\nRangeIndex\x94\x93\x94}\x94(hWN\x8c\x05start\x94K\x00\x8c\x04stop\x94K\x00\x8c\x04step\x94K\x01u\x86\x94R\x94a]\x94\x8c\x15numpy.core.multiarray\x94\x8c\x0c_reconstruct\x94\x93\x94j\x0f\x01\x00\x00K\x00\x85\x94C\x01b\x94\x87\x94R\x94(K\x01K\x00\x85\x94jO\x01\x00\x00\x8c\x02i8\x94\x89\x88\x87\x94R\x94(K\x03\x8c\x01<\x94NNNJ\xff\xff\xff\xffJ\xff\xff\xff\xffK\x00t\x94b\x89C\x00\x94t\x94ba]\x94jy\x01\x00\x00\x8c\x1bpandas.core.indexes.numeric\x94\x8c\nInt64Index\x94\x93\x94}\x94(\x8c\x04data\x94j\x86\x01\x00\x00j\x0f\x01\x00\x00K\x00\x85\x94j\x88\x01\x00\x00\x87\x94R\x94(K\x01K\x00\x85\x94j\x8e\x01\x00\x00\x89j\x91\x01\x00\x00t\x94bhWNu\x86\x94R\x94a}\x94\x8c\x060.14.1\x94}\x94(\x8c\x04axes\x94jw\x01\x00\x00\x8c\x06blocks\x94]\x94}\x94(\x8c\x06values\x94j\x8a\x01\x00\x00\x8c\x08mgr_locs\x94j\x86\x01\x00\x00j\x0f\x01\x00\x00K\x00\x85\x94j\x88\x01\x00\x00\x87\x94R\x94(K\x01K\x00\x85\x94j\x8e\x01\x00\x00\x89j\x91\x01\x00\x00t\x94buaust\x94b\x8c\x04_typ\x94\x8c\x06series\x94\x8c\t_metadata\x94]\x94hWa\x8c\x05attrs\x94}\x94\x8c\x06_flags\x94}\x94\x8c\x17allows_duplicate_labels\x94\x88shWNubee\x86\x94t\x94\x8c\x13__dask_blockwise__1\x94\x8c,from_pandas-1f79779fe4332aa2089519b52d3e3b08\x94uh\x04\x8c\x13__dask_blockwise__0\x94\x85\x94\x8
c6subgraph_callable-240ce47e-86ed-41ab-88e2-655035bf9e42\x94t\x94R\x94.'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Users/aloysius/mambaforge/envs/swarm/lib/python3.8/site-packages/distributed/worker.py", line 3181, in execute
function, args, kwargs = await self._maybe_deserialize_task(
File "/Users/aloysius/mambaforge/envs/swarm/lib/python3.8/site-packages/distributed/worker.py", line 3100, in _maybe_deserialize_task
function, args, kwargs = _deserialize(*ts.runspec)
File "/Users/aloysius/mambaforge/envs/swarm/lib/python3.8/site-packages/distributed/worker.py", line 4055, in _deserialize
function = loads_function(function)
File "/Users/aloysius/mambaforge/envs/swarm/lib/python3.8/site-packages/distributed/worker.py", line 4046, in loads_function
result = pickle.loads(bytes_object)
File "/Users/aloysius/mambaforge/envs/swarm/lib/python3.8/site-packages/distributed/protocol/pickle.py", line 75, in loads
return pickle.loads(x)
AttributeError: type object '_DTypeMeta' has no attribute '_abstract'
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/aloysius/mambaforge/envs/swarm/lib/python3.8/site-packages/dask/dataframe/core.py", line 4344, in set_index
return set_index(
File "/Users/aloysius/mambaforge/envs/swarm/lib/python3.8/site-packages/dask/dataframe/shuffle.py", line 191, in set_index
divisions, mins, maxes = _calculate_divisions(
File "/Users/aloysius/mambaforge/envs/swarm/lib/python3.8/site-packages/dask/dataframe/shuffle.py", line 40, in _calculate_divisions
divisions, sizes, mins, maxes = base.compute(divisions, sizes, mins, maxes)
File "/Users/aloysius/mambaforge/envs/swarm/lib/python3.8/site-packages/dask/base.py", line 570, in compute
results = schedule(dsk, keys, **kwargs)
File "/Users/aloysius/mambaforge/envs/swarm/lib/python3.8/site-packages/distributed/client.py", line 2693, in get
results = self.gather(packed, asynchronous=asynchronous, direct=direct)
File "/Users/aloysius/mambaforge/envs/swarm/lib/python3.8/site-packages/distributed/client.py", line 1969, in gather
return self.sync(
File "/Users/aloysius/mambaforge/envs/swarm/lib/python3.8/site-packages/distributed/client.py", line 865, in sync
return sync(
File "/Users/aloysius/mambaforge/envs/swarm/lib/python3.8/site-packages/distributed/utils.py", line 327, in sync
raise exc.with_traceback(tb)
File "/Users/aloysius/mambaforge/envs/swarm/lib/python3.8/site-packages/distributed/utils.py", line 310, in f
result[0] = yield future
File "/Users/aloysius/mambaforge/envs/swarm/lib/python3.8/site-packages/tornado/gen.py", line 762, in run
value = future.result()
File "/Users/aloysius/mambaforge/envs/swarm/lib/python3.8/site-packages/distributed/client.py", line 1834, in _gather
raise exception.with_traceback(traceback)
File "/Users/aloysius/mambaforge/envs/swarm/lib/python3.8/site-packages/distributed/protocol/pickle.py", line 75, in loads
return pickle.loads(x)
AttributeError: type object '_DTypeMeta' has no attribute '_abstract'
Anything else we need to know?:
Environment:
- Dask version: 2021.10.0
- Distributed version: 2021.10.0
- Python version: 3.8.12
- Operating System: macOS 11.6
- Install method (conda, pip, source): pip
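The traceback above fails while unpickling a NumPy class (`_DTypeMeta`), and the follow-up comment mentions `numpy==1.21.4`, so the NumPy, pandas, and cloudpickle versions may be as relevant as the Dask ones listed here. A minimal sketch for collecting them, using only the standard library; the helper name `collect_versions` and the package list are assumptions for illustration, not part of the original report:

```python
from importlib.metadata import PackageNotFoundError, version


def collect_versions(packages=("dask", "distributed", "numpy", "pandas", "cloudpickle")):
    """Return a dict mapping each package name to its installed version, or None."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # Package is not installed in this environment.
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in collect_versions().items():
        print(f"{name}: {ver or 'not installed'}")
```

Running this in the same environment as the workers (for example, inside the Kubernetes image) would show whether client and workers agree on these versions.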
Hi @aloysius-lim, thanks for the update.
I was able to reproduce your issue using numpy==1.21.4, so it seems the underlying problem in #7170 is not fully resolved. Of note: the error seems to be coming from deserializing cached values, so it makes sense that it would only crop up the second time you execute `set_index()`.
We'll do some more digging to see what's going on.
The workaround: it seems that if your index is sorted (and hence you specify `sorted=True` in the case of `df2`), it works without errors.