MultiIndex .loc fails on modin and works on pandas
System information
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): macOS 10.14
- Modin installed from (source or binary): binary
- Modin version: 0.4.0
- Pandas version: 0.24.1
- Python version: 3.7.0
- Exact command to reproduce:
import modin.pandas as pd  # swap for `import pandas as pd` to see the working behavior
DF2 = pd.read_csv(
'https://github.com/lemeb/Best_of_Times_study/raw/8c0c42296a6eb01e09817ad6be0ec827ee273808/blah.csv',
header=[0,1,2,3], index_col=0)
DF2.loc[1] # Will work
DF2.loc[1, 'Presidents'] # Will fail (see log 1)
DF2.loc[1, ('Presidents', 'Pure mentions', 'IND', 'all')] # Will fail (see log 2)
Describe the problem
I have a fairly large dataset whose columns form a four-level MultiIndex. I tried to use modin to speed things up, but a fairly simple .loc
lookup fails under modin, even though the exact same code works with vanilla pandas.
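For reference, here is a minimal sketch of the behavior I expect, run against vanilla pandas. The small DataFrame is a made-up stand-in for blah.csv (the values and the 'some' and 'Senators' labels are assumptions, not taken from the real file); only the shape of the lookups matches the commands above.

```python
import pandas as pd

# Hypothetical stand-in for blah.csv: four column levels, integer row index.
columns = pd.MultiIndex.from_tuples(
    [
        ("Presidents", "Pure mentions", "IND", "all"),
        ("Presidents", "Pure mentions", "IND", "some"),
        ("Senators", "Pure mentions", "IND", "all"),
    ]
)
df = pd.DataFrame([[1, 2, 3], [4, 5, 6]], index=[1, 2], columns=columns)

# Row lookup only: a Series indexed by the full four-level columns.
row = df.loc[1]

# Row plus first column level: a Series over the remaining three levels.
presidents = df.loc[1, "Presidents"]

# Row plus a full four-level column key: a single scalar value.
cell = df.loc[1, ("Presidents", "Pure mentions", "IND", "all")]
```

All three lookups succeed in vanilla pandas; under modin 0.4.0 the second and third are the ones that raise the errors shown in the logs below.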
Source code / logs
Log 1
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-7-f7a945237ab9> in <module>()
3 header=[0,1,2,3], index_col=0)
4 DF2.loc[1]
----> 5 DF2.loc[1, 'Presidents']
~/anaconda3/lib/python3.7/site-packages/modin/pandas/indexing.py in __getitem__(self, key)
232 row_lookup, col_lookup = self._compute_lookup(row_loc, col_loc)
233 ndim = self._expand_dim(row_lookup, col_lookup, ndim)
--> 234 result = super(_LocIndexer, self).__getitem__(row_lookup, col_lookup, ndim)
235 return result
236
~/anaconda3/lib/python3.7/site-packages/modin/pandas/indexing.py in __getitem__(self, row_lookup, col_lookup, ndim)
163 single_axis = 1 if self.col_scaler else 0
164 return SeriesView(
--> 165 qc_view.squeeze(ndim=1, axis=single_axis),
166 self.df,
167 (row_lookup, col_lookup),
~/anaconda3/lib/python3.7/site-packages/modin/data_management/query_compiler/pandas_query_compiler.py in squeeze(self, ndim, axis)
2781 if axis is None:
2782 axis = 0 if self.data.shape[1] > 1 else 1
-> 2783 squeezed = pandas.Series(to_squeeze.squeeze(axis))
2784 scaler_axis = self.columns if axis else self.index
2785 non_scaler_axis = self.index if axis else self.columns
~/anaconda3/lib/python3.7/site-packages/pandas/core/series.py in __init__(self, data, index, dtype, name, copy, fastpath)
260 else:
261 data = sanitize_array(data, index, dtype, copy,
--> 262 raise_cast_failure=True)
263
264 data = SingleBlockManager(data, index, fastpath=True)
~/anaconda3/lib/python3.7/site-packages/pandas/core/internals/construction.py in sanitize_array(data, index, dtype, copy, raise_cast_failure)
658 raise Exception('Data must be 1-dimensional')
659 else:
--> 660 subarr = com.asarray_tuplesafe(data, dtype=dtype)
661
662 # This is to prevent mixed-type Series getting all casted to
~/anaconda3/lib/python3.7/site-packages/pandas/core/common.py in asarray_tuplesafe(values, dtype)
238 # Avoid building an array of arrays:
239 # TODO: verify whether any path hits this except #18819 (invalid)
--> 240 values = [tuple(x) for x in values]
241 result = construct_1d_object_array_from_listlike(values)
242
~/anaconda3/lib/python3.7/site-packages/pandas/core/common.py in <listcomp>(.0)
238 # Avoid building an array of arrays:
239 # TODO: verify whether any path hits this except #18819 (invalid)
--> 240 values = [tuple(x) for x in values]
241 result = construct_1d_object_array_from_listlike(values)
242
TypeError: 'int' object is not iterable
Log 2
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-6-a1a1b4249d92> in <module>()
3 header=[0,1,2,3], index_col=0)
4 DF2.loc[1]
----> 5 DF2.loc[1, ('Presidents', 'Pure mentions', 'IND', 'all')]
~/anaconda3/lib/python3.7/site-packages/modin/pandas/indexing.py in __getitem__(self, key)
231 self._handle_enlargement(row_loc, col_loc)
232 row_lookup, col_lookup = self._compute_lookup(row_loc, col_loc)
--> 233 ndim = self._expand_dim(row_lookup, col_lookup, ndim)
234 result = super(_LocIndexer, self).__getitem__(row_lookup, col_lookup, ndim)
235 return result
~/anaconda3/lib/python3.7/site-packages/modin/pandas/indexing.py in _expand_dim(self, row_lookup, col_lookup, ndim)
299 """
300 many_rows = len(row_lookup) > 1
--> 301 many_cols = len(col_lookup) > 1
302
303 if ndim == 0 and (many_rows or many_cols):
TypeError: object of type 'builtin_function_or_method' has no len()
Issue Analytics
- State:
- Created 5 years ago
- Comments:5 (3 by maintainers)
Top GitHub Comments
Hi @lemeb, thanks for posting this!
I see what the issue is and I will get it fixed as soon as I can. We are still working through some things related to MultiIndex, and I appreciate the report.
Done. #523
Thanks!