BUG: KeyError when UDF in groupby.apply accesses data from another column partition

System information

  • OS Platform and Distribution: Ubuntu 20.04.2 LTS
  • Modin version: 0.10.2
  • Python version: 3.7.11

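import numpy as np
import modin.pandas as pd

def printcol(dataset, **kwargs):
    # Print the column named by the `column` keyword argument for this group.
    col = kwargs["column"]
    print(dataset[col])
    return dataset

df = pd.DataFrame(np.random.randn(11, 2), columns=["A", "B"])
# Derive a grouping key from the sign of column "A".
df["KEY"] = df["A"].apply(lambda x: "K1" if x < 0 else "K2")

# Raises KeyError: 'A' with Modin 0.10.2; works with plain pandas.
df = df.groupby(["KEY"]).apply(printcol, column="A")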

[snip]
  File "/home/stus/miniconda3/envs/fctk/lib/python3.7/site-packages/pandas/core/groupby/groupby.py", line 1237, in f
    return func(g, *args, **kwargs)
  File "/home/stus/miniconda3/envs/fctk/lib/python3.7/site-packages/modin/utils.py", line 500, in wrapper
    result = func(*args, **kwargs)
  File "<ipython-input-2-5658305649cc>", line 3, in printcol
  File "/home/stus/miniconda3/envs/fctk/lib/python3.7/site-packages/pandas/core/frame.py", line 3455, in __getitem__
    indexer = self.columns.get_loc(key)
  File "/home/stus/miniconda3/envs/fctk/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 3363, in get_loc
    raise KeyError(key) from err
KeyError: 'A'

Describe the problem

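A `KeyError` is raised when the function passed to `groupby(...).apply()` references a column of the group it receives by name. The same code works as expected with a standard `pandas.DataFrame`. In the output below, one Ray worker (pid 19064) prints column "A" for both groups, while another worker (pid 19062) fails with `KeyError: 'A'`.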
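The issue title and the `broadcast_apply_full_axis` / `partition_idx` frames in the trace suggest that Modin runs the UDF separately on each column partition, with the grouping key broadcast to every partition, so a partition that does not hold column "A" cannot see it. The snippet below is a toy, stdlib-only model of that hypothesis, not Modin's actual code: a "frame" is a dict of column lists, and `split_columns`, `groupby_apply`, and `first_value` are hypothetical helpers standing in for Modin's partitioning, the groupby apply, and the `printcol` UDF.

```python
# Toy model (hypothetical, not Modin's implementation): a "frame" is a dict
# mapping column name -> list of values.

def split_columns(frame, cols_per_part):
    """Split a frame into column partitions of at most cols_per_part columns."""
    names = list(frame)
    return [
        {name: frame[name] for name in names[i:i + cols_per_part]}
        for i in range(0, len(names), cols_per_part)
    ]

def groupby_apply(frame, keys, func, **kwargs):
    """Group row positions by key and call func on each group's sub-frame.

    `keys` is passed separately, mimicking a grouping column that is
    broadcast to every partition.
    """
    groups = {}
    for row, key in enumerate(keys):
        groups.setdefault(key, []).append(row)
    return {
        key: func({name: [col[r] for r in rows] for name, col in frame.items()}, **kwargs)
        for key, rows in groups.items()
    }

def first_value(group, column):
    # Like printcol, the UDF indexes a column of the group by name.
    return group[column][0]

frame = {"A": [-1.0, 2.0, -0.5], "B": [0.1, 0.2, 0.3]}
keys = ["K1" if a < 0 else "K2" for a in frame["A"]]

# Applying the UDF to the whole frame sees every column, as pandas does.
full = groupby_apply(frame, keys, first_value, column="A")

# Applying it to each column partition on its own fails on the partition
# that does not contain column "A".
errors = []
for part in split_columns(frame, cols_per_part=1):
    try:
        groupby_apply(part, keys, first_value, column="A")
    except KeyError as exc:
        errors.append(exc.args[0])

print(full)    # {'K1': -1.0, 'K2': 2.0}
print(errors)  # ['A']
```

In this model the partition holding "A" succeeds and the one holding only "B" raises, which matches the pattern in the logs below (one worker prints column "A", another raises). Whether Modin's real code path behaves exactly this way is an assumption drawn from the title and the trace.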

Source code / logs

Full stack trace
2021-09-16 10:05:17,089	ERROR worker.py:79 -- Unhandled error (suppress with RAY_IGNORE_UNHANDLED_ERRORS=1): ray::apply_func() (pid=19062, ip=172.31.12.170)
  File "python/ray/_raylet.pyx", line 460, in ray._raylet.execute_task
  File "python/ray/_raylet.pyx", line 481, in ray._raylet.execute_task
  File "python/ray/_raylet.pyx", line 351, in ray._raylet.raise_if_dependency_failed
ray.exceptions.RayTaskError: ray::deploy_ray_func() (pid=19062, ip=172.31.12.170)
  File "pandas/_libs/index.pyx", line 76, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 108, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 5198, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 5206, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'A'

The above exception was the direct cause of the following exception:

ray::deploy_ray_func() (pid=19062, ip=172.31.12.170)
  File "/home/stus/miniconda3/envs/fctk/lib/python3.7/site-packages/modin/backends/pandas/query_compiler.py", line 2834, in groupby_agg_builder
    return compute_groupby(df, drop, partition_idx)
  File "/home/stus/miniconda3/envs/fctk/lib/python3.7/site-packages/modin/backends/pandas/query_compiler.py", line 2752, in compute_groupby
    result = agg_func(grouped_df, **agg_kwargs)
  File "/home/stus/miniconda3/envs/fctk/lib/python3.7/site-packages/modin/utils.py", line 500, in wrapper
    result = func(*args, **kwargs)
  File "/home/stus/miniconda3/envs/fctk/lib/python3.7/site-packages/modin/pandas/groupby.py", line 280, in <lambda>
    return self._apply_agg_function(lambda df: df.apply(func, *args, **kwargs))
  File "/home/stus/miniconda3/envs/fctk/lib/python3.7/site-packages/pandas/core/groupby/groupby.py", line 1253, in apply
    result = self._python_apply_general(f, self._selected_obj)
  File "/home/stus/miniconda3/envs/fctk/lib/python3.7/site-packages/pandas/core/groupby/groupby.py", line 1287, in _python_apply_general
    keys, values, mutated = self.grouper.apply(f, data, self.axis)
  File "/home/stus/miniconda3/envs/fctk/lib/python3.7/site-packages/pandas/core/groupby/ops.py", line 783, in apply
    result_values, mutated = splitter.fast_apply(f, sdata, group_keys)
  File "/home/stus/miniconda3/envs/fctk/lib/python3.7/site-packages/pandas/core/groupby/ops.py", line 1328, in fast_apply
    return libreduction.apply_frame_axis0(sdata, f, names, starts, ends)
  File "pandas/_libs/reduction.pyx", line 381, in pandas._libs.reduction.apply_frame_axis0
  File "/home/stus/miniconda3/envs/fctk/lib/python3.7/site-packages/pandas/core/groupby/groupby.py", line 1237, in f
    return func(g, *args, **kwargs)
  File "/home/stus/miniconda3/envs/fctk/lib/python3.7/site-packages/modin/utils.py", line 500, in wrapper
    result = func(*args, **kwargs)
  File "<ipython-input-2-5658305649cc>", line 3, in printcol
  File "/home/stus/miniconda3/envs/fctk/lib/python3.7/site-packages/pandas/core/frame.py", line 3455, in __getitem__
    indexer = self.columns.get_loc(key)
  File "/home/stus/miniconda3/envs/fctk/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 3363, in get_loc
    raise KeyError(key) from err
KeyError: 'A'

During handling of the above exception, another exception occurred:

ray::deploy_ray_func() (pid=19062, ip=172.31.12.170)
  File "/home/stus/miniconda3/envs/fctk/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 3361, in get_loc
    return self._engine.get_loc(casted_key)
  File "pandas/_libs/index.pyx", line 76, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 108, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 5198, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 5206, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'A'

The above exception was the direct cause of the following exception:

ray::deploy_ray_func() (pid=19062, ip=172.31.12.170)
  File "python/ray/_raylet.pyx", line 490, in ray._raylet.execute_task
  File "python/ray/_raylet.pyx", line 497, in ray._raylet.execute_task
  File "python/ray/_raylet.pyx", line 501, in ray._raylet.execute_task
  File "/home/stus/miniconda3/envs/fctk/lib/python3.7/site-packages/ray/util/tracing/tracing_helper.py", line 330, in _function_with_tracing
    return function(*args, **kwargs)
  File "/home/stus/miniconda3/envs/fctk/lib/python3.7/site-packages/modin/engines/ray/pandas_on_ray/frame/axis_partition.py", line 207, in deploy_ray_func
    result = func(*args)
  File "/home/stus/miniconda3/envs/fctk/lib/python3.7/site-packages/modin/engines/base/frame/axis_partition.py", line 369, in deploy_func_between_two_axis_partitions
    result = func(lt_frame, rt_frame, **kwargs)
  File "/home/stus/miniconda3/envs/fctk/lib/python3.7/site-packages/modin/engines/base/frame/data.py", line 1153, in _map_reduce_func
    series_result = func(df, *args, **kwargs)
  File "/home/stus/miniconda3/envs/fctk/lib/python3.7/site-packages/modin/backends/pandas/query_compiler.py", line 2845, in <lambda>
    df, by, drop, partition_idx
  File "/home/stus/miniconda3/envs/fctk/lib/python3.7/site-packages/modin/backends/pandas/query_compiler.py", line 2838, in groupby_agg_builder
    return compute_groupby(df.copy(), drop, partition_idx)
  File "/home/stus/miniconda3/envs/fctk/lib/python3.7/site-packages/modin/backends/pandas/query_compiler.py", line 2752, in compute_groupby
    result = agg_func(grouped_df, **agg_kwargs)
  File "/home/stus/miniconda3/envs/fctk/lib/python3.7/site-packages/modin/utils.py", line 500, in wrapper
    result = func(*args, **kwargs)
  File "/home/stus/miniconda3/envs/fctk/lib/python3.7/site-packages/modin/pandas/groupby.py", line 280, in <lambda>
    return self._apply_agg_function(lambda df: df.apply(func, *args, **kwargs))
  File "/home/stus/miniconda3/envs/fctk/lib/python3.7/site-packages/pandas/core/groupby/groupby.py", line 1253, in apply
    result = self._python_apply_general(f, self._selected_obj)
  File "/home/stus/miniconda3/envs/fctk/lib/python3.7/site-packages/pandas/core/groupby/groupby.py", line 1287, in _python_apply_general
    keys, values, mutated = self.grouper.apply(f, data, self.axis)
  File "/home/stus/miniconda3/envs/fctk/lib/python3.7/site-packages/pandas/core/groupby/ops.py", line 783, in apply
    result_values, mutated = splitter.fast_apply(f, sdata, group_keys)
  File "/home/stus/miniconda3/envs/fctk/lib/python3.7/site-packages/pandas/core/groupby/ops.py", line 1328, in fast_apply
    return libreduction.apply_frame_axis0(sdata, f, names, starts, ends)
  File "pandas/_libs/reduction.pyx", line 381, in pandas._libs.reduction.apply_frame_axis0
  File "/home/stus/miniconda3/envs/fctk/lib/python3.7/site-packages/pandas/core/groupby/groupby.py", line 1237, in f
    return func(g, *args, **kwargs)
  File "/home/stus/miniconda3/envs/fctk/lib/python3.7/site-packages/modin/utils.py", line 500, in wrapper
    result = func(*args, **kwargs)
  File "<ipython-input-2-5658305649cc>", line 3, in printcol
  File "/home/stus/miniconda3/envs/fctk/lib/python3.7/site-packages/pandas/core/frame.py", line 3455, in __getitem__
    indexer = self.columns.get_loc(key)
  File "/home/stus/miniconda3/envs/fctk/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 3363, in get_loc
    raise KeyError(key) from err
KeyError: 'A'
(pid=19064) 1    -1.342643
(pid=19064) 3    -0.855525
(pid=19064) 5    -0.656729
(pid=19064) 6    -0.795196
(pid=19064) 8    -1.158136
(pid=19064) 9    -1.422635
(pid=19064) 10   -1.237912
(pid=19064) Name: A, dtype: float64
(pid=19064) 0    2.028227
(pid=19064) 2    0.469303
(pid=19064) 4    0.867193
(pid=19064) 7    1.239291
(pid=19064) Name: A, dtype: float64
---------------------------------------------------------------------------
RayTaskError(KeyError)                    Traceback (most recent call last)
<ipython-input-4-97c0bcb4466e> in <module>
----> 1 df = df.groupby(['KEY']).apply(printcol,column="A")

~/miniconda3/envs/fctk/lib/python3.7/site-packages/modin/pandas/groupby.py in apply(self, func, *args, **kwargs)
    278         if not isinstance(func, BuiltinFunctionType):
    279             func = wrap_udf_function(func)
--> 280         return self._apply_agg_function(lambda df: df.apply(func, *args, **kwargs))
    281 
    282     @property

~/miniconda3/envs/fctk/lib/python3.7/site-packages/modin/pandas/groupby.py in _apply_agg_function(self, f, *args, **kwargs)
    921             agg_kwargs=kwargs,
    922             groupby_kwargs=self._kwargs,
--> 923             drop=self._drop,
    924         )
    925         if self._idx_name is not None and self._as_index:

~/miniconda3/envs/fctk/lib/python3.7/site-packages/modin/backends/pandas/query_compiler.py in groupby_agg(self, by, is_multi_by, axis, agg_func, agg_args, agg_kwargs, groupby_kwargs, drop)
   2847             other=broadcastable_by,
   2848             apply_indices=apply_indices,
-> 2849             enumerate_partitions=True,
   2850         )
   2851         result = self.__constructor__(new_modin_frame)

~/miniconda3/envs/fctk/lib/python3.7/site-packages/modin/engines/base/frame/data.py in broadcast_apply_full_axis(self, axis, func, other, new_index, new_columns, apply_indices, enumerate_partitions, dtypes)
   1816             if new_axis is None
   1817             else new_axis
-> 1818             for i, new_axis in enumerate([new_index, new_columns])
   1819         ]
   1820         if dtypes == "copy":

~/miniconda3/envs/fctk/lib/python3.7/site-packages/modin/engines/base/frame/data.py in <listcomp>(.0)
   1816             if new_axis is None
   1817             else new_axis
-> 1818             for i, new_axis in enumerate([new_index, new_columns])
   1819         ]
   1820         if dtypes == "copy":

~/miniconda3/envs/fctk/lib/python3.7/site-packages/modin/engines/base/frame/data.py in _compute_axis_labels(self, axis, partitions)
    305             partitions = self._partitions
    306         return self._partition_mgr_cls.get_indices(
--> 307             axis, partitions, lambda df: df.axes[axis]
    308         )
    309 

~/miniconda3/envs/fctk/lib/python3.7/site-packages/modin/engines/ray/pandas_on_ray/frame/partition_manager.py in get_indices(cls, axis, partitions, index_func)
    135                 else []
    136             )
--> 137         new_idx = ray.get(new_idx)
    138         return new_idx[0].append(new_idx[1:]) if len(new_idx) else new_idx
    139 

~/miniconda3/envs/fctk/lib/python3.7/site-packages/ray/_private/client_mode_hook.py in wrapper(*args, **kwargs)
     60         if client_mode_should_convert():
     61             return getattr(ray, func.__name__)(*args, **kwargs)
---> 62         return func(*args, **kwargs)
     63 
     64     return wrapper

~/miniconda3/envs/fctk/lib/python3.7/site-packages/ray/worker.py in get(object_refs, timeout)
   1493                     worker.core_worker.dump_object_store_memory_usage()
   1494                 if isinstance(value, RayTaskError):
-> 1495                     raise value.as_instanceof_cause()
   1496                 else:
   1497                     raise value

RayTaskError(KeyError): ray::apply_func() (pid=19062, ip=172.31.12.170)
  File "python/ray/_raylet.pyx", line 460, in ray._raylet.execute_task
  File "python/ray/_raylet.pyx", line 481, in ray._raylet.execute_task
  File "python/ray/_raylet.pyx", line 351, in ray._raylet.raise_if_dependency_failed
ray.exceptions.RayTaskError: ray::deploy_ray_func() (pid=19062, ip=172.31.12.170)
  File "pandas/_libs/index.pyx", line 76, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 108, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 5198, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 5206, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'A'

The above exception was the direct cause of the following exception:

ray::deploy_ray_func() (pid=19062, ip=172.31.12.170)
  File "/home/stus/miniconda3/envs/fctk/lib/python3.7/site-packages/modin/backends/pandas/query_compiler.py", line 2834, in groupby_agg_builder
    return compute_groupby(df, drop, partition_idx)
  File "/home/stus/miniconda3/envs/fctk/lib/python3.7/site-packages/modin/backends/pandas/query_compiler.py", line 2752, in compute_groupby
    result = agg_func(grouped_df, **agg_kwargs)
  File "/home/stus/miniconda3/envs/fctk/lib/python3.7/site-packages/modin/utils.py", line 500, in wrapper
    result = func(*args, **kwargs)
  File "/home/stus/miniconda3/envs/fctk/lib/python3.7/site-packages/modin/pandas/groupby.py", line 280, in <lambda>
    return self._apply_agg_function(lambda df: df.apply(func, *args, **kwargs))
  File "/home/stus/miniconda3/envs/fctk/lib/python3.7/site-packages/pandas/core/groupby/groupby.py", line 1253, in apply
    result = self._python_apply_general(f, self._selected_obj)
  File "/home/stus/miniconda3/envs/fctk/lib/python3.7/site-packages/pandas/core/groupby/groupby.py", line 1287, in _python_apply_general
    keys, values, mutated = self.grouper.apply(f, data, self.axis)
  File "/home/stus/miniconda3/envs/fctk/lib/python3.7/site-packages/pandas/core/groupby/ops.py", line 783, in apply
    result_values, mutated = splitter.fast_apply(f, sdata, group_keys)
  File "/home/stus/miniconda3/envs/fctk/lib/python3.7/site-packages/pandas/core/groupby/ops.py", line 1328, in fast_apply
    return libreduction.apply_frame_axis0(sdata, f, names, starts, ends)
  File "pandas/_libs/reduction.pyx", line 381, in pandas._libs.reduction.apply_frame_axis0
  File "/home/stus/miniconda3/envs/fctk/lib/python3.7/site-packages/pandas/core/groupby/groupby.py", line 1237, in f
    return func(g, *args, **kwargs)
  File "/home/stus/miniconda3/envs/fctk/lib/python3.7/site-packages/modin/utils.py", line 500, in wrapper
    result = func(*args, **kwargs)
  File "<ipython-input-2-5658305649cc>", line 3, in printcol
  File "/home/stus/miniconda3/envs/fctk/lib/python3.7/site-packages/pandas/core/frame.py", line 3455, in __getitem__
    indexer = self.columns.get_loc(key)
  File "/home/stus/miniconda3/envs/fctk/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 3363, in get_loc
    raise KeyError(key) from err
KeyError: 'A'

During handling of the above exception, another exception occurred:

ray::deploy_ray_func() (pid=19062, ip=172.31.12.170)
  File "/home/stus/miniconda3/envs/fctk/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 3361, in get_loc
    return self._engine.get_loc(casted_key)
  File "pandas/_libs/index.pyx", line 76, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 108, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 5198, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 5206, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'A'

The above exception was the direct cause of the following exception:

ray::deploy_ray_func() (pid=19062, ip=172.31.12.170)
  File "python/ray/_raylet.pyx", line 490, in ray._raylet.execute_task
  File "python/ray/_raylet.pyx", line 497, in ray._raylet.execute_task
  File "python/ray/_raylet.pyx", line 501, in ray._raylet.execute_task
  File "/home/stus/miniconda3/envs/fctk/lib/python3.7/site-packages/ray/util/tracing/tracing_helper.py", line 330, in _function_with_tracing
    return function(*args, **kwargs)
  File "/home/stus/miniconda3/envs/fctk/lib/python3.7/site-packages/modin/engines/ray/pandas_on_ray/frame/axis_partition.py", line 207, in deploy_ray_func
    result = func(*args)
  File "/home/stus/miniconda3/envs/fctk/lib/python3.7/site-packages/modin/engines/base/frame/axis_partition.py", line 369, in deploy_func_between_two_axis_partitions
    result = func(lt_frame, rt_frame, **kwargs)
  File "/home/stus/miniconda3/envs/fctk/lib/python3.7/site-packages/modin/engines/base/frame/data.py", line 1153, in _map_reduce_func
    series_result = func(df, *args, **kwargs)
  File "/home/stus/miniconda3/envs/fctk/lib/python3.7/site-packages/modin/backends/pandas/query_compiler.py", line 2845, in <lambda>
    df, by, drop, partition_idx
  File "/home/stus/miniconda3/envs/fctk/lib/python3.7/site-packages/modin/backends/pandas/query_compiler.py", line 2838, in groupby_agg_builder
    return compute_groupby(df.copy(), drop, partition_idx)
  File "/home/stus/miniconda3/envs/fctk/lib/python3.7/site-packages/modin/backends/pandas/query_compiler.py", line 2752, in compute_groupby
    result = agg_func(grouped_df, **agg_kwargs)
  File "/home/stus/miniconda3/envs/fctk/lib/python3.7/site-packages/modin/utils.py", line 500, in wrapper
    result = func(*args, **kwargs)
  File "/home/stus/miniconda3/envs/fctk/lib/python3.7/site-packages/modin/pandas/groupby.py", line 280, in <lambda>
    return self._apply_agg_function(lambda df: df.apply(func, *args, **kwargs))
  File "/home/stus/miniconda3/envs/fctk/lib/python3.7/site-packages/pandas/core/groupby/groupby.py", line 1253, in apply
    result = self._python_apply_general(f, self._selected_obj)
  File "/home/stus/miniconda3/envs/fctk/lib/python3.7/site-packages/pandas/core/groupby/groupby.py", line 1287, in _python_apply_general
    keys, values, mutated = self.grouper.apply(f, data, self.axis)
  File "/home/stus/miniconda3/envs/fctk/lib/python3.7/site-packages/pandas/core/groupby/ops.py", line 783, in apply
    result_values, mutated = splitter.fast_apply(f, sdata, group_keys)
  File "/home/stus/miniconda3/envs/fctk/lib/python3.7/site-packages/pandas/core/groupby/ops.py", line 1328, in fast_apply
    return libreduction.apply_frame_axis0(sdata, f, names, starts, ends)
  File "pandas/_libs/reduction.pyx", line 381, in pandas._libs.reduction.apply_frame_axis0
  File "/home/stus/miniconda3/envs/fctk/lib/python3.7/site-packages/pandas/core/groupby/groupby.py", line 1237, in f
    return func(g, *args, **kwargs)
  File "/home/stus/miniconda3/envs/fctk/lib/python3.7/site-packages/modin/utils.py", line 500, in wrapper
    result = func(*args, **kwargs)
  File "<ipython-input-2-5658305649cc>", line 3, in printcol
  File "/home/stus/miniconda3/envs/fctk/lib/python3.7/site-packages/pandas/core/frame.py", line 3455, in __getitem__
    indexer = self.columns.get_loc(key)
  File "/home/stus/miniconda3/envs/fctk/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 3363, in get_loc
    raise KeyError(key) from err
KeyError: 'A'

2021-09-16 10:05:22,867	ERROR worker.py:79 -- Unhandled error (suppress with RAY_IGNORE_UNHANDLED_ERRORS=1): ray::deploy_ray_func() (pid=19062, ip=172.31.12.170)
  File "pandas/_libs/index.pyx", line 76, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 108, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 5198, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 5206, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'A'

The above exception was the direct cause of the following exception:

ray::deploy_ray_func() (pid=19062, ip=172.31.12.170)
  File "/home/stus/miniconda3/envs/fctk/lib/python3.7/site-packages/modin/backends/pandas/query_compiler.py", line 2834, in groupby_agg_builder
    return compute_groupby(df, drop, partition_idx)
  File "/home/stus/miniconda3/envs/fctk/lib/python3.7/site-packages/modin/backends/pandas/query_compiler.py", line 2752, in compute_groupby
    result = agg_func(grouped_df, **agg_kwargs)
  File "/home/stus/miniconda3/envs/fctk/lib/python3.7/site-packages/modin/utils.py", line 500, in wrapper
    result = func(*args, **kwargs)
  File "/home/stus/miniconda3/envs/fctk/lib/python3.7/site-packages/modin/pandas/groupby.py", line 280, in <lambda>
    return self._apply_agg_function(lambda df: df.apply(func, *args, **kwargs))
  File "/home/stus/miniconda3/envs/fctk/lib/python3.7/site-packages/pandas/core/groupby/groupby.py", line 1253, in apply
    result = self._python_apply_general(f, self._selected_obj)
  File "/home/stus/miniconda3/envs/fctk/lib/python3.7/site-packages/pandas/core/groupby/groupby.py", line 1287, in _python_apply_general
    keys, values, mutated = self.grouper.apply(f, data, self.axis)
  File "/home/stus/miniconda3/envs/fctk/lib/python3.7/site-packages/pandas/core/groupby/ops.py", line 783, in apply
    result_values, mutated = splitter.fast_apply(f, sdata, group_keys)
  File "/home/stus/miniconda3/envs/fctk/lib/python3.7/site-packages/pandas/core/groupby/ops.py", line 1328, in fast_apply
    return libreduction.apply_frame_axis0(sdata, f, names, starts, ends)
  File "pandas/_libs/reduction.pyx", line 381, in pandas._libs.reduction.apply_frame_axis0
  File "/home/stus/miniconda3/envs/fctk/lib/python3.7/site-packages/pandas/core/groupby/groupby.py", line 1237, in f
    return func(g, *args, **kwargs)
  File "/home/stus/miniconda3/envs/fctk/lib/python3.7/site-packages/modin/utils.py", line 500, in wrapper
    result = func(*args, **kwargs)
  File "<ipython-input-2-5658305649cc>", line 3, in printcol
  File "/home/stus/miniconda3/envs/fctk/lib/python3.7/site-packages/pandas/core/frame.py", line 3455, in __getitem__
    indexer = self.columns.get_loc(key)
  File "/home/stus/miniconda3/envs/fctk/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 3363, in get_loc
    raise KeyError(key) from err
KeyError: 'A'

During handling of the above exception, another exception occurred:

ray::deploy_ray_func() (pid=19062, ip=172.31.12.170)
  File "/home/stus/miniconda3/envs/fctk/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 3361, in get_loc
    return self._engine.get_loc(casted_key)
  File "pandas/_libs/index.pyx", line 76, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 108, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 5198, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 5206, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'A'

The above exception was the direct cause of the following exception:

ray::deploy_ray_func() (pid=19062, ip=172.31.12.170)
  File "python/ray/_raylet.pyx", line 490, in ray._raylet.execute_task
  File "python/ray/_raylet.pyx", line 497, in ray._raylet.execute_task
  File "python/ray/_raylet.pyx", line 501, in ray._raylet.execute_task
  File "/home/stus/miniconda3/envs/fctk/lib/python3.7/site-packages/ray/util/tracing/tracing_helper.py", line 330, in _function_with_tracing
    return function(*args, **kwargs)
  File "/home/stus/miniconda3/envs/fctk/lib/python3.7/site-packages/modin/engines/ray/pandas_on_ray/frame/axis_partition.py", line 207, in deploy_ray_func
    result = func(*args)
  File "/home/stus/miniconda3/envs/fctk/lib/python3.7/site-packages/modin/engines/base/frame/axis_partition.py", line 369, in deploy_func_between_two_axis_partitions
    result = func(lt_frame, rt_frame, **kwargs)
  File "/home/stus/miniconda3/envs/fctk/lib/python3.7/site-packages/modin/engines/base/frame/data.py", line 1153, in _map_reduce_func
    series_result = func(df, *args, **kwargs)
  File "/home/stus/miniconda3/envs/fctk/lib/python3.7/site-packages/modin/backends/pandas/query_compiler.py", line 2845, in <lambda>
    df, by, drop, partition_idx
  File "/home/stus/miniconda3/envs/fctk/lib/python3.7/site-packages/modin/backends/pandas/query_compiler.py", line 2838, in groupby_agg_builder
    return compute_groupby(df.copy(), drop, partition_idx)
  File "/home/stus/miniconda3/envs/fctk/lib/python3.7/site-packages/modin/backends/pandas/query_compiler.py", line 2752, in compute_groupby
    result = agg_func(grouped_df, **agg_kwargs)
  File "/home/stus/miniconda3/envs/fctk/lib/python3.7/site-packages/modin/utils.py", line 500, in wrapper
    result = func(*args, **kwargs)
  File "/home/stus/miniconda3/envs/fctk/lib/python3.7/site-packages/modin/pandas/groupby.py", line 280, in <lambda>
    return self._apply_agg_function(lambda df: df.apply(func, *args, **kwargs))
  File "/home/stus/miniconda3/envs/fctk/lib/python3.7/site-packages/pandas/core/groupby/groupby.py", line 1253, in apply
    result = self._python_apply_general(f, self._selected_obj)
  File "/home/stus/miniconda3/envs/fctk/lib/python3.7/site-packages/pandas/core/groupby/groupby.py", line 1287, in _python_apply_general
    keys, values, mutated = self.grouper.apply(f, data, self.axis)
  File "/home/stus/miniconda3/envs/fctk/lib/python3.7/site-packages/pandas/core/groupby/ops.py", line 783, in apply
    result_values, mutated = splitter.fast_apply(f, sdata, group_keys)
  File "/home/stus/miniconda3/envs/fctk/lib/python3.7/site-packages/pandas/core/groupby/ops.py", line 1328, in fast_apply
    return libreduction.apply_frame_axis0(sdata, f, names, starts, ends)
  File "pandas/_libs/reduction.pyx", line 381, in pandas._libs.reduction.apply_frame_axis0
  File "/home/stus/miniconda3/envs/fctk/lib/python3.7/site-packages/pandas/core/groupby/groupby.py", line 1237, in f
    return func(g, *args, **kwargs)
  File "/home/stus/miniconda3/envs/fctk/lib/python3.7/site-packages/modin/utils.py", line 500, in wrapper
    result = func(*args, **kwargs)
  File "<ipython-input-2-5658305649cc>", line 3, in printcol
  File "/home/stus/miniconda3/envs/fctk/lib/python3.7/site-packages/pandas/core/frame.py", line 3455, in __getitem__
    indexer = self.columns.get_loc(key)
  File "/home/stus/miniconda3/envs/fctk/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 3363, in get_loc
    raise KeyError(key) from err
KeyError: 'A'
2021-09-16 10:05:22,870	ERROR worker.py:79 -- Unhandled error (suppress with RAY_IGNORE_UNHANDLED_ERRORS=1): ray::deploy_ray_func() (pid=19062, ip=172.31.12.170)
  File "pandas/_libs/index.pyx", line 76, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 108, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 5198, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 5206, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'A'

The above exception was the direct cause of the following exception:

ray::deploy_ray_func() (pid=19062, ip=172.31.12.170)
  File "/home/stus/miniconda3/envs/fctk/lib/python3.7/site-packages/modin/backends/pandas/query_compiler.py", line 2834, in groupby_agg_builder
    return compute_groupby(df, drop, partition_idx)
  File "/home/stus/miniconda3/envs/fctk/lib/python3.7/site-packages/modin/backends/pandas/query_compiler.py", line 2752, in compute_groupby
    result = agg_func(grouped_df, **agg_kwargs)
  File "/home/stus/miniconda3/envs/fctk/lib/python3.7/site-packages/modin/utils.py", line 500, in wrapper
    result = func(*args, **kwargs)
  File "/home/stus/miniconda3/envs/fctk/lib/python3.7/site-packages/modin/pandas/groupby.py", line 280, in <lambda>
    return self._apply_agg_function(lambda df: df.apply(func, *args, **kwargs))
  File "/home/stus/miniconda3/envs/fctk/lib/python3.7/site-packages/pandas/core/groupby/groupby.py", line 1253, in apply
    result = self._python_apply_general(f, self._selected_obj)
  File "/home/stus/miniconda3/envs/fctk/lib/python3.7/site-packages/pandas/core/groupby/groupby.py", line 1287, in _python_apply_general
    keys, values, mutated = self.grouper.apply(f, data, self.axis)
  File "/home/stus/miniconda3/envs/fctk/lib/python3.7/site-packages/pandas/core/groupby/ops.py", line 783, in apply
    result_values, mutated = splitter.fast_apply(f, sdata, group_keys)
  File "/home/stus/miniconda3/envs/fctk/lib/python3.7/site-packages/pandas/core/groupby/ops.py", line 1328, in fast_apply
    return libreduction.apply_frame_axis0(sdata, f, names, starts, ends)
  File "pandas/_libs/reduction.pyx", line 381, in pandas._libs.reduction.apply_frame_axis0
  File "/home/stus/miniconda3/envs/fctk/lib/python3.7/site-packages/pandas/core/groupby/groupby.py", line 1237, in f
    return func(g, *args, **kwargs)
  File "/home/stus/miniconda3/envs/fctk/lib/python3.7/site-packages/modin/utils.py", line 500, in wrapper
    result = func(*args, **kwargs)
  File "<ipython-input-2-5658305649cc>", line 3, in printcol
  File "/home/stus/miniconda3/envs/fctk/lib/python3.7/site-packages/pandas/core/frame.py", line 3455, in __getitem__
    indexer = self.columns.get_loc(key)
  File "/home/stus/miniconda3/envs/fctk/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 3363, in get_loc
    raise KeyError(key) from err
KeyError: 'A'

During handling of the above exception, another exception occurred:

ray::deploy_ray_func() (pid=19062, ip=172.31.12.170)
  File "/home/stus/miniconda3/envs/fctk/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 3361, in get_loc
    return self._engine.get_loc(casted_key)
  File "pandas/_libs/index.pyx", line 76, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 108, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 5198, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 5206, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'A'

The above exception was the direct cause of the following exception:

ray::deploy_ray_func() (pid=19062, ip=172.31.12.170)
  File "python/ray/_raylet.pyx", line 490, in ray._raylet.execute_task
  File "python/ray/_raylet.pyx", line 497, in ray._raylet.execute_task
  File "python/ray/_raylet.pyx", line 501, in ray._raylet.execute_task
  File "/home/stus/miniconda3/envs/fctk/lib/python3.7/site-packages/ray/util/tracing/tracing_helper.py", line 330, in _function_with_tracing
    return function(*args, **kwargs)
  File "/home/stus/miniconda3/envs/fctk/lib/python3.7/site-packages/modin/engines/ray/pandas_on_ray/frame/axis_partition.py", line 207, in deploy_ray_func
    result = func(*args)
  File "/home/stus/miniconda3/envs/fctk/lib/python3.7/site-packages/modin/engines/base/frame/axis_partition.py", line 369, in deploy_func_between_two_axis_partitions
    result = func(lt_frame, rt_frame, **kwargs)
  File "/home/stus/miniconda3/envs/fctk/lib/python3.7/site-packages/modin/engines/base/frame/data.py", line 1153, in _map_reduce_func
    series_result = func(df, *args, **kwargs)
  File "/home/stus/miniconda3/envs/fctk/lib/python3.7/site-packages/modin/backends/pandas/query_compiler.py", line 2845, in <lambda>
    df, by, drop, partition_idx
  File "/home/stus/miniconda3/envs/fctk/lib/python3.7/site-packages/modin/backends/pandas/query_compiler.py", line 2838, in groupby_agg_builder
    return compute_groupby(df.copy(), drop, partition_idx)
  File "/home/stus/miniconda3/envs/fctk/lib/python3.7/site-packages/modin/backends/pandas/query_compiler.py", line 2752, in compute_groupby
    result = agg_func(grouped_df, **agg_kwargs)
  File "/home/stus/miniconda3/envs/fctk/lib/python3.7/site-packages/modin/utils.py", line 500, in wrapper
    result = func(*args, **kwargs)
  File "/home/stus/miniconda3/envs/fctk/lib/python3.7/site-packages/modin/pandas/groupby.py", line 280, in <lambda>
    return self._apply_agg_function(lambda df: df.apply(func, *args, **kwargs))
  File "/home/stus/miniconda3/envs/fctk/lib/python3.7/site-packages/pandas/core/groupby/groupby.py", line 1253, in apply
    result = self._python_apply_general(f, self._selected_obj)
  File "/home/stus/miniconda3/envs/fctk/lib/python3.7/site-packages/pandas/core/groupby/groupby.py", line 1287, in _python_apply_general
    keys, values, mutated = self.grouper.apply(f, data, self.axis)
  File "/home/stus/miniconda3/envs/fctk/lib/python3.7/site-packages/pandas/core/groupby/ops.py", line 783, in apply
    result_values, mutated = splitter.fast_apply(f, sdata, group_keys)
  File "/home/stus/miniconda3/envs/fctk/lib/python3.7/site-packages/pandas/core/groupby/ops.py", line 1328, in fast_apply
    return libreduction.apply_frame_axis0(sdata, f, names, starts, ends)
  File "pandas/_libs/reduction.pyx", line 381, in pandas._libs.reduction.apply_frame_axis0
  File "/home/stus/miniconda3/envs/fctk/lib/python3.7/site-packages/pandas/core/groupby/groupby.py", line 1237, in f
    return func(g, *args, **kwargs)
  File "/home/stus/miniconda3/envs/fctk/lib/python3.7/site-packages/modin/utils.py", line 500, in wrapper
    result = func(*args, **kwargs)
  File "<ipython-input-2-5658305649cc>", line 3, in printcol
  File "/home/stus/miniconda3/envs/fctk/lib/python3.7/site-packages/pandas/core/frame.py", line 3455, in __getitem__
    indexer = self.columns.get_loc(key)
  File "/home/stus/miniconda3/envs/fctk/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 3363, in get_loc
    raise KeyError(key) from err
KeyError: 'A'
2021-09-16 10:05:22,874	ERROR worker.py:79 -- Unhandled error (suppress with RAY_IGNORE_UNHANDLED_ERRORS=1): ray::deploy_ray_func() (pid=19062, ip=172.31.12.170)
  File "pandas/_libs/index.pyx", line 76, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 108, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 5198, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 5206, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'A'

The above exception was the direct cause of the following exception:

ray::deploy_ray_func() (pid=19062, ip=172.31.12.170)
  File "/home/stus/miniconda3/envs/fctk/lib/python3.7/site-packages/modin/backends/pandas/query_compiler.py", line 2834, in groupby_agg_builder
    return compute_groupby(df, drop, partition_idx)
  File "/home/stus/miniconda3/envs/fctk/lib/python3.7/site-packages/modin/backends/pandas/query_compiler.py", line 2752, in compute_groupby
    result = agg_func(grouped_df, **agg_kwargs)
  File "/home/stus/miniconda3/envs/fctk/lib/python3.7/site-packages/modin/utils.py", line 500, in wrapper
    result = func(*args, **kwargs)
  File "/home/stus/miniconda3/envs/fctk/lib/python3.7/site-packages/modin/pandas/groupby.py", line 280, in <lambda>
    return self._apply_agg_function(lambda df: df.apply(func, *args, **kwargs))
  File "/home/stus/miniconda3/envs/fctk/lib/python3.7/site-packages/pandas/core/groupby/groupby.py", line 1253, in apply
    result = self._python_apply_general(f, self._selected_obj)
  File "/home/stus/miniconda3/envs/fctk/lib/python3.7/site-packages/pandas/core/groupby/groupby.py", line 1287, in _python_apply_general
    keys, values, mutated = self.grouper.apply(f, data, self.axis)
  File "/home/stus/miniconda3/envs/fctk/lib/python3.7/site-packages/pandas/core/groupby/ops.py", line 783, in apply
    result_values, mutated = splitter.fast_apply(f, sdata, group_keys)
  File "/home/stus/miniconda3/envs/fctk/lib/python3.7/site-packages/pandas/core/groupby/ops.py", line 1328, in fast_apply
    return libreduction.apply_frame_axis0(sdata, f, names, starts, ends)
  File "pandas/_libs/reduction.pyx", line 381, in pandas._libs.reduction.apply_frame_axis0
  File "/home/stus/miniconda3/envs/fctk/lib/python3.7/site-packages/pandas/core/groupby/groupby.py", line 1237, in f
    return func(g, *args, **kwargs)
  File "/home/stus/miniconda3/envs/fctk/lib/python3.7/site-packages/modin/utils.py", line 500, in wrapper
    result = func(*args, **kwargs)
  File "<ipython-input-2-5658305649cc>", line 3, in printcol
  File "/home/stus/miniconda3/envs/fctk/lib/python3.7/site-packages/pandas/core/frame.py", line 3455, in __getitem__
    indexer = self.columns.get_loc(key)
  File "/home/stus/miniconda3/envs/fctk/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 3363, in get_loc
    raise KeyError(key) from err
KeyError: 'A'

During handling of the above exception, another exception occurred:

ray::deploy_ray_func() (pid=19062, ip=172.31.12.170)
  File "/home/stus/miniconda3/envs/fctk/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 3361, in get_loc
    return self._engine.get_loc(casted_key)
  File "pandas/_libs/index.pyx", line 76, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 108, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 5198, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 5206, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'A'

The above exception was the direct cause of the following exception:

ray::deploy_ray_func() (pid=19062, ip=172.31.12.170)
  File "python/ray/_raylet.pyx", line 490, in ray._raylet.execute_task
  File "python/ray/_raylet.pyx", line 497, in ray._raylet.execute_task
  File "python/ray/_raylet.pyx", line 501, in ray._raylet.execute_task
  File "/home/stus/miniconda3/envs/fctk/lib/python3.7/site-packages/ray/util/tracing/tracing_helper.py", line 330, in _function_with_tracing
    return function(*args, **kwargs)
  File "/home/stus/miniconda3/envs/fctk/lib/python3.7/site-packages/modin/engines/ray/pandas_on_ray/frame/axis_partition.py", line 207, in deploy_ray_func
    result = func(*args)
  File "/home/stus/miniconda3/envs/fctk/lib/python3.7/site-packages/modin/engines/base/frame/axis_partition.py", line 369, in deploy_func_between_two_axis_partitions
    result = func(lt_frame, rt_frame, **kwargs)
  File "/home/stus/miniconda3/envs/fctk/lib/python3.7/site-packages/modin/engines/base/frame/data.py", line 1153, in _map_reduce_func
    series_result = func(df, *args, **kwargs)
  File "/home/stus/miniconda3/envs/fctk/lib/python3.7/site-packages/modin/backends/pandas/query_compiler.py", line 2845, in <lambda>
    df, by, drop, partition_idx
  File "/home/stus/miniconda3/envs/fctk/lib/python3.7/site-packages/modin/backends/pandas/query_compiler.py", line 2838, in groupby_agg_builder
    return compute_groupby(df.copy(), drop, partition_idx)
  File "/home/stus/miniconda3/envs/fctk/lib/python3.7/site-packages/modin/backends/pandas/query_compiler.py", line 2752, in compute_groupby
    result = agg_func(grouped_df, **agg_kwargs)
  File "/home/stus/miniconda3/envs/fctk/lib/python3.7/site-packages/modin/utils.py", line 500, in wrapper
    result = func(*args, **kwargs)
  File "/home/stus/miniconda3/envs/fctk/lib/python3.7/site-packages/modin/pandas/groupby.py", line 280, in <lambda>
    return self._apply_agg_function(lambda df: df.apply(func, *args, **kwargs))
  File "/home/stus/miniconda3/envs/fctk/lib/python3.7/site-packages/pandas/core/groupby/groupby.py", line 1253, in apply
    result = self._python_apply_general(f, self._selected_obj)
  File "/home/stus/miniconda3/envs/fctk/lib/python3.7/site-packages/pandas/core/groupby/groupby.py", line 1287, in _python_apply_general
    keys, values, mutated = self.grouper.apply(f, data, self.axis)
  File "/home/stus/miniconda3/envs/fctk/lib/python3.7/site-packages/pandas/core/groupby/ops.py", line 783, in apply
    result_values, mutated = splitter.fast_apply(f, sdata, group_keys)
  File "/home/stus/miniconda3/envs/fctk/lib/python3.7/site-packages/pandas/core/groupby/ops.py", line 1328, in fast_apply
    return libreduction.apply_frame_axis0(sdata, f, names, starts, ends)
  File "pandas/_libs/reduction.pyx", line 381, in pandas._libs.reduction.apply_frame_axis0
  File "/home/stus/miniconda3/envs/fctk/lib/python3.7/site-packages/pandas/core/groupby/groupby.py", line 1237, in f
    return func(g, *args, **kwargs)
  File "/home/stus/miniconda3/envs/fctk/lib/python3.7/site-packages/modin/utils.py", line 500, in wrapper
    result = func(*args, **kwargs)
  File "<ipython-input-2-5658305649cc>", line 3, in printcol
  File "/home/stus/miniconda3/envs/fctk/lib/python3.7/site-packages/pandas/core/frame.py", line 3455, in __getitem__
    indexer = self.columns.get_loc(key)
  File "/home/stus/miniconda3/envs/fctk/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 3363, in get_loc
    raise KeyError(key) from err
KeyError: 'A'

Issue Analytics

  • State: open
  • Created 2 years ago
  • Comments: 15 (7 by maintainers)

Top GitHub Comments

2 reactions
dchigarev commented, Sep 22, 2021

It appears that the root cause of this issue is similar to the problem reported in #2511. When we apply a function in groupby, we expect the applied function to receive the whole group frame:

   A  B  KEY |
0  .  .  .   | first group
1  .  .  .   |
-------------|
2  .  .  .   |
3  .  .  .   | second group
4  .  .  .   |

In pandas, the sequence of calls to the applied function would be:

fn(first_group)
fn(second_group)
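As a point of reference, the pandas behaviour can be checked directly. The sketch below reruns the reporter's UDF on plain pandas, using fixed made-up values instead of np.random so that both groups are always present, and records which columns each call sees. Every call receives a whole group, including column "A". Some older pandas versions may call the function on the first group twice, and newer releases may warn about (or exclude) the KEY grouping column itself, so the sketch only relies on "A" being present.

```python
import pandas as pd

seen = []  # columns visible to each UDF invocation


def printcol(dataset, **kwargs):
    col = kwargs["column"]
    seen.append(list(dataset.columns))
    _ = dataset[col]  # same lookup as the original UDF; succeeds for every group
    return dataset


# Made-up deterministic data: rows 0-1 fall into K1, rows 2-4 into K2.
df = pd.DataFrame({"A": [-1.0, -0.5, 0.2, 0.7, 1.1],
                   "B": [0.3, 0.1, -0.4, 0.9, -0.2]})
df["KEY"] = df["A"].apply(lambda x: "K1" if x < 0 else "K2")

result = df.groupby("KEY").apply(printcol, column="A")
for cols in seen:
    print(cols)
```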

When a frame is converted to Modin, it is split into partitions. In the reproducer above, just before the failing groupby.apply, the frame is split like this:

  part1 | part2|
   A  B | KEY  |
0  .  . | .    | first group
1  .  . | .    |
--------|------|
2  .  . | .    |
3  .  . | .    | second group
4  .  . | .    |

As you can see, there are now two parts, and only one of them contains the “A” column. When doing groupby.apply, Modin does not concatenate the partitions along rows, so the applied fn is called 4 times here:

fn(part1_first_group)
fn(part1_second_group)
fn(part2_first_group)
fn(part2_second_group)

Since only “part1” contains the “A” column, we get a KeyError for every call on “part2”.
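To make the four calls and the two KeyErrors concrete, here is a pure-Python toy model of the partition diagram above. It is not Modin code: the dict-of-lists "partitions", the helper names (`group_rows`, `slice_rows`, `per_partition_apply`) and the assumption that the grouping keys are available to every partition are hypothetical simplifications chosen only to mirror the described call pattern.

```python
from collections import defaultdict

# Hypothetical column partitions matching the diagram: part1 holds A and B,
# part2 holds only KEY. Values are made up for illustration.
part1 = {"A": [-1.0, -0.5, 0.2, 0.7, 1.1], "B": [0.3, 0.1, -0.4, 0.9, -0.2]}
part2 = {"KEY": ["K1", "K1", "K2", "K2", "K2"]}
partitions = [part1, part2]

# Assumption: the grouping keys are shipped to every partition separately,
# so each partition can be grouped even if it lacks the KEY column.
keys = part2["KEY"]


def group_rows(keys):
    """Map each group key to the row positions belonging to it."""
    groups = defaultdict(list)
    for pos, key in enumerate(keys):
        groups[key].append(pos)
    return dict(groups)


def slice_rows(partition, rows):
    """Return a sub-frame (dict of columns) containing only the given rows."""
    return {col: [values[i] for i in rows] for col, values in partition.items()}


def printcol(dataset, column):
    # Same access pattern as the reporter's UDF.
    return dataset[column]  # raises KeyError if `column` is absent


def per_partition_apply(partitions, keys, func, **kwargs):
    """Call func once per (partition, group) pair without joining partitions."""
    calls, errors = [], []
    for p_idx, partition in enumerate(partitions, start=1):
        for key, rows in group_rows(keys).items():
            sub = slice_rows(partition, rows)
            calls.append((f"part{p_idx}", key, sorted(sub)))
            try:
                func(sub, **kwargs)
            except KeyError as err:
                errors.append((f"part{p_idx}", key, err.args[0]))
    return calls, errors


calls, errors = per_partition_apply(partitions, keys, printcol, column="A")
for call in calls:
    print(call)
print("KeyErrors:", errors)
```

The model makes the point of the comment visible: the UDF is correct for every full group, but once it is handed a column slice that lacks "A", the lookup fails regardless of which group it belongs to.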

This is certainly a bug: Modin users shouldn’t have to care about partitioning, but unfortunately, in this case, there is currently no way to avoid being exposed to it.

Unfortunately, I can’t suggest a workaround other than defaulting to pandas for this particular operation (@devin-petersohn, maybe you can?):

df = df._default_to_pandas(lambda df: df.groupby("KEY").apply(printcol, column="A"))
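The same workaround can also be written with Modin's public `modin.utils.to_pandas` helper instead of the private `_default_to_pandas` method. This is a sketch under stated assumptions, not Modin's own fix: `groupby_apply_in_pandas` is a hypothetical helper name, it materializes the entire frame in the driver's memory (so it only suits data that fits there), and it falls back to plain pandas when Modin is not installed.

```python
import pandas

try:
    import modin.pandas as mpd
    from modin.utils import to_pandas
except ImportError:  # Modin not installed: operate on plain pandas objects
    mpd = None
    to_pandas = None


def groupby_apply_in_pandas(df, by, func, **kwargs):
    """Hypothetical helper: run df.groupby(by).apply(func, **kwargs) in pandas."""
    is_modin = mpd is not None and isinstance(df, mpd.DataFrame)
    # Materialize the full frame so every group carries all of its columns.
    pdf = to_pandas(df) if is_modin else df
    result = pdf.groupby(by).apply(func, **kwargs)
    # Hand a Modin frame back to Modin users so downstream code is unchanged.
    return mpd.DataFrame(result) if is_modin else result
```

With the reporter's UDF this would be called as `df = groupby_apply_in_pandas(df, "KEY", printcol, column="A")`. Like `_default_to_pandas`, it trades Modin's parallelism for correctness on this one operation.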

@modin-project/modin-core BTW, maybe we should replace our current groupby.apply implementation with default-to-pandas, since it just doesn’t work as intended?

1 reaction
dchigarev commented, Sep 22, 2021

😄 oh, well…

But generally, both of our answers seem to be correct.

Answering the question “does it matter for the partitioning that the ‘KEY’ column was inserted after the frame’s construction?”, the answer is YES (see my comment).

Answering the question “does it matter for the ‘apply’ that the ‘KEY’ column was inserted afterwards?”, the answer is NO, because the error happens due to a bug (see Devin’s comment). (Correct me if I’m wrong.)
