df.groupby('index_column_name') results in a key error. In pandas it doesn't
System information
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): I use arch btw
- Modin version (modin.__version__): 0.8.0
- Python version: 3.8
- Code we can use to reproduce:
Pandas (works fine):
import pandas as pd
from pandas import util
df = util.testing.makeMixedDataFrame()
print(df.head())
df = df.to_numpy()
df = pd.DataFrame(df)
df.columns = ["A", "B", "C", "D"]
df = df.set_index("C")
df.groupby("C")
Modin (gives error):
import modin.pandas as pd
from pandas import util
df = util.testing.makeMixedDataFrame()
print(df.head())
df = df.to_numpy()
df = pd.DataFrame(df)
df.columns = ["A", "B", "C", "D"]
df = df.set_index("C")
df.groupby("C") # <- Key Error! (scroll below)
Describe the problem
Calling groupby with the name of the index column works fine in pandas and does not raise a KeyError. In Modin, however, the same call raises a KeyError.
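For reference, here is a minimal sketch of the pandas behaviour described above, using a small hand-built frame in place of makeMixedDataFrame() (the data values are made up for illustration). It also shows grouping by level="C", which names the index level explicitly. Whether that form sidesteps the error on Modin 0.8.0 is an assumption and has not been verified here.

```python
import pandas as pd

# Hypothetical stand-in data; the original report used makeMixedDataFrame().
df = pd.DataFrame(
    {
        "A": [0.0, 1.0, 2.0, 3.0],
        "B": [0.0, 1.0, 0.0, 1.0],
        "C": ["foo1", "foo2", "foo1", "foo2"],
        "D": [1, 2, 3, 4],
    }
)
df = df.set_index("C")

# "C" is no longer a column, so pandas resolves it as an index level name.
by_name = df.groupby("C")["D"].sum()

# Naming the level explicitly does not rely on any column lookup.
by_level = df.groupby(level="C")["D"].sum()

print(by_name)
print(by_level)
```

In pandas both calls produce the same groups, so the second form may be worth trying as a workaround while the first one fails in Modin.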
Source code / logs
KeyError Traceback (most recent call last)
in
7 df.columns = ["A", "B", "C", "D"]
8 df = df.set_index("C")
----> 9 df.groupby("C")
~/anaconda3/envs/recnn/lib/python3.8/site-packages/modin/pandas/dataframe.py in groupby(self, by, axis, level, as_index, sort, group_keys, squeeze, observed)
434 pass
435 else:
--> 436 by = self.__getitem__(by)._query_compiler
437 elif isinstance(by, Series):
438 drop = by._parent is self
~/anaconda3/envs/recnn/lib/python3.8/site-packages/modin/pandas/base.py in __getitem__(self, key)
3458 return self._getitem_slice(indexer)
3459 else:
-> 3460 return self._getitem(key)
3461
3462 def _getitem_slice(self, key):
~/anaconda3/envs/recnn/lib/python3.8/site-packages/modin/pandas/dataframe.py in _getitem(self, key)
2422 # return self._getitem_multilevel(key)
2423 else:
-> 2424 return self._getitem_column(key)
2425
2426 def _getitem_column(self, key):
~/anaconda3/envs/recnn/lib/python3.8/site-packages/modin/pandas/dataframe.py in _getitem_column(self, key)
2426 def _getitem_column(self, key):
2427 if key not in self.keys():
-> 2428 raise KeyError("{}".format(key))
2429 s = DataFrame(
2430 query_compiler=self._query_compiler.getitem_column_array([key])
KeyError: 'C'
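The traceback shows that groupby passes the string key to __getitem__, and _getitem_column only checks self.keys(), i.e. the column labels, so an index level name is never considered. The sketch below illustrates the kind of fallback that would be needed, written against plain pandas. The helper name resolve_group_key is hypothetical and is not part of pandas or Modin; it is also a simplification, since it prefers a column when a name appears both as a column and as an index level.

```python
import pandas as pd


def resolve_group_key(df, key):
    """Hypothetical helper: find the values to group by for a string key."""
    # First choice: a column with that label (what _getitem_column checks).
    if key in df.columns:
        return df[key]
    # Fallback: an index level with that name (the case the traceback misses).
    if key in df.index.names:
        return df.index.get_level_values(key)
    raise KeyError(key)


df = pd.DataFrame({"A": [1, 2, 3], "C": ["x", "y", "x"]}).set_index("C")

# "C" is only an index level here, so the fallback branch is taken.
keys = resolve_group_key(df, "C")
totals = df.groupby(keys)["A"].sum()
print(totals)
```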
Issue Analytics
- State:
- Created 3 years ago
- Reactions:1
- Comments:10 (6 by maintainers)
I reopened the issue because the fix doesn’t fully resolve the problem.
I ran into this today, and I don't think it has been completely resolved.
I'm using Modin 0.15.1.
The reproduction is basically: df = mpd.DataFrame({"A": [1, 2, 3], "B": [1, 4, 5]})
Running df.groupby("A").apply(lambda x: x.loc[:, "B"])
raises a KeyError on "B".
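For comparison, here is the same reproduction as a self-contained script on stock pandas, where the apply call completes without a KeyError. The second call, which selects "B" before applying, is only a possible workaround to try; whether it avoids the error on Modin 0.15.1 is an assumption and was not tested here.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [1, 4, 5]})

# Same call as in the comment above, run on pandas instead of Modin.
via_apply = df.groupby("A").apply(lambda x: x.loc[:, "B"])

# Selecting the column first means the function never has to look up "B".
via_select = df.groupby("A")["B"].apply(list)

print(via_apply)
print(via_select)
```

Swapping the import for modin.pandas should reproduce the reported behaviour on an affected Modin version.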