MultiIndex .loc fails on modin and works on pandas

System information

OS Platform and Distribution (e.g., Linux Ubuntu 16.04): macOS 10.14
Modin installed from (source or binary): binary
Modin version: 0.4.0
Pandas version: 0.24.1
Python version: 3.7.0
Exact command to reproduce:

import modin.pandas as pd  # swap for `import pandas as pd` to compare with vanilla pandas
DF2 = pd.read_csv(
    'https://github.com/lemeb/Best_of_Times_study/raw/8c0c42296a6eb01e09817ad6be0ec827ee273808/blah.csv', 
    header=[0,1,2,3], index_col=0)

DF2.loc[1] # Will work
DF2.loc[1, 'Presidents'] # Will fail (see log 1)
DF2.loc[1, ('Presidents', 'Pure mentions', 'IND', 'all')] # Will fail (see log 2)

Describe the problem

I have a fairly large dataset whose columns form a four-level MultiIndex. I tried modin to speed things up, but simple .loc lookups that select columns by label fail, even though the same code works under vanilla pandas.
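
For readers who cannot fetch the CSV, the access pattern can be sketched against a small synthetic frame. This is only an illustration in vanilla pandas of the results modin would be expected to match: the names `df` and `columns`, the values, and every column label other than the tuple quoted in the report are made up.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for blah.csv: four column levels, integer row labels.
columns = pd.MultiIndex.from_tuples([
    ("Others", "Pure mentions", "IND", "all"),
    ("Presidents", "Other mentions", "IND", "all"),
    ("Presidents", "Pure mentions", "IND", "all"),
    ("Presidents", "Pure mentions", "IND", "some"),
])
df = pd.DataFrame(np.arange(12).reshape(3, 4), index=[1, 2, 3], columns=columns)

# Whole row: a Series indexed by all four column levels.
row = df.loc[1]

# Partial column key: a Series over the three remaining levels.
sub = df.loc[1, "Presidents"]

# Full column key: a single scalar value.
cell = df.loc[1, ("Presidents", "Pure mentions", "IND", "all")]

print(row.index.nlevels, sub.index.nlevels, cell)
```

The second and third lookups correspond to the two calls that fail under modin 0.4.0 in the logs below.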

Source code / logs

Log 1

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-7-f7a945237ab9> in <module>()
      3     header=[0,1,2,3], index_col=0)
      4 DF2.loc[1]
----> 5 DF2.loc[1, 'Presidents']

~/anaconda3/lib/python3.7/site-packages/modin/pandas/indexing.py in __getitem__(self, key)
    232         row_lookup, col_lookup = self._compute_lookup(row_loc, col_loc)
    233         ndim = self._expand_dim(row_lookup, col_lookup, ndim)
--> 234         result = super(_LocIndexer, self).__getitem__(row_lookup, col_lookup, ndim)
    235         return result
    236 

~/anaconda3/lib/python3.7/site-packages/modin/pandas/indexing.py in __getitem__(self, row_lookup, col_lookup, ndim)
    163             single_axis = 1 if self.col_scaler else 0
    164             return SeriesView(
--> 165                 qc_view.squeeze(ndim=1, axis=single_axis),
    166                 self.df,
    167                 (row_lookup, col_lookup),

~/anaconda3/lib/python3.7/site-packages/modin/data_management/query_compiler/pandas_query_compiler.py in squeeze(self, ndim, axis)
   2781             if axis is None:
   2782                 axis = 0 if self.data.shape[1] > 1 else 1
-> 2783             squeezed = pandas.Series(to_squeeze.squeeze(axis))
   2784             scaler_axis = self.columns if axis else self.index
   2785             non_scaler_axis = self.index if axis else self.columns

~/anaconda3/lib/python3.7/site-packages/pandas/core/series.py in __init__(self, data, index, dtype, name, copy, fastpath)
    260             else:
    261                 data = sanitize_array(data, index, dtype, copy,
--> 262                                       raise_cast_failure=True)
    263 
    264                 data = SingleBlockManager(data, index, fastpath=True)

~/anaconda3/lib/python3.7/site-packages/pandas/core/internals/construction.py in sanitize_array(data, index, dtype, copy, raise_cast_failure)
    658             raise Exception('Data must be 1-dimensional')
    659         else:
--> 660             subarr = com.asarray_tuplesafe(data, dtype=dtype)
    661 
    662     # This is to prevent mixed-type Series getting all casted to

~/anaconda3/lib/python3.7/site-packages/pandas/core/common.py in asarray_tuplesafe(values, dtype)
    238         # Avoid building an array of arrays:
    239         # TODO: verify whether any path hits this except #18819 (invalid)
--> 240         values = [tuple(x) for x in values]
    241         result = construct_1d_object_array_from_listlike(values)
    242 

~/anaconda3/lib/python3.7/site-packages/pandas/core/common.py in <listcomp>(.0)
    238         # Avoid building an array of arrays:
    239         # TODO: verify whether any path hits this except #18819 (invalid)
--> 240         values = [tuple(x) for x in values]
    241         result = construct_1d_object_array_from_listlike(values)
    242 

TypeError: 'int' object is not iterable
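
One possible reading of the frame at line 2783, offered as a sketch and not as a diagnosis confirmed in this thread: `DataFrame.squeeze(axis=...)` collapses the given axis only when that axis has length 1. A one-row, many-column selection squeezed along the column axis therefore stays two-dimensional, which is not valid input for `pandas.Series`. The frame `selection` below is a hypothetical stand-in for the one-row result of `DF2.loc[1, 'Presidents']`, with made-up labels and values.

```python
import pandas as pd

# Hypothetical one-row, three-column selection (labels and values are made up).
selection = pd.DataFrame([[10, 20, 30]], index=[1], columns=["a", "b", "c"])

# The column axis has length 3, so there is nothing to collapse: still a DataFrame.
along_columns = selection.squeeze(axis=1)

# The row axis has length 1, so it collapses: the result is a Series.
along_rows = selection.squeeze(axis=0)

print(type(along_columns).__name__, along_columns.shape)  # DataFrame (1, 3)
print(type(along_rows).__name__, along_rows.shape)        # Series (3,)
```

If this reading is right, the traceback is consistent with the squeeze axis being chosen from the scalar column key (`single_axis = 1`) while the selection actually spans several columns.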

Log 2

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-6-a1a1b4249d92> in <module>()
      3     header=[0,1,2,3], index_col=0)
      4 DF2.loc[1]
----> 5 DF2.loc[1, ('Presidents', 'Pure mentions', 'IND', 'all')]

~/anaconda3/lib/python3.7/site-packages/modin/pandas/indexing.py in __getitem__(self, key)
    231         self._handle_enlargement(row_loc, col_loc)
    232         row_lookup, col_lookup = self._compute_lookup(row_loc, col_loc)
--> 233         ndim = self._expand_dim(row_lookup, col_lookup, ndim)
    234         result = super(_LocIndexer, self).__getitem__(row_lookup, col_lookup, ndim)
    235         return result

~/anaconda3/lib/python3.7/site-packages/modin/pandas/indexing.py in _expand_dim(self, row_lookup, col_lookup, ndim)
    299         """
    300         many_rows = len(row_lookup) > 1
--> 301         many_cols = len(col_lookup) > 1
    302 
    303         if ndim == 0 and (many_rows or many_cols):

TypeError: object of type 'builtin_function_or_method' has no len()

Top GitHub Comments

devin-petersohn commented, Mar 20, 2019

Hi @lemeb, thanks for posting this!

I see what the issue is and I will get it fixed as soon as I can. We are still working through some things related to MultiIndex, and I appreciate the report.

lemeb commented, Mar 30, 2019

Done. #523

Thanks!