MultiIndex .loc fails on modin and works on pandas

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): macOS 10.14
  • Modin installed from (source or binary): binary
  • Modin version: 0.4.0
  • Pandas version: 0.24.1
  • Python version: 3.7.0
  • Exact command to reproduce:
import modin.pandas as pd  # swap for `import pandas as pd` to compare with vanilla pandas

DF2 = pd.read_csv(
    'https://github.com/lemeb/Best_of_Times_study/raw/8c0c42296a6eb01e09817ad6be0ec827ee273808/blah.csv',
    header=[0,1,2,3], index_col=0)

DF2.loc[1] # Works
DF2.loc[1, 'Presidents'] # Fails under modin (see Log 1)
DF2.loc[1, ('Presidents', 'Pure mentions', 'IND', 'all')] # Fails under modin (see Log 2)

Describe the problem

I have a fairly large dataset whose columns use a four-level MultiIndex. I tried modin to speed things up, but a fairly simple .loc call fails, even though the same code works under vanilla pandas.
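For reference, the behaviour expected from vanilla pandas can be sketched on a small synthetic frame. The data below is a hypothetical stand-in for blah.csv (only the column labels 'Presidents', 'Pure mentions', 'IND' and 'all' come from the report; 'some' and 'Senators' are made up), so treat it as an illustration rather than a reproduction of the real file.

```python
import pandas as pd

# Hypothetical stand-in for blah.csv: three columns under a four-level header.
columns = pd.MultiIndex.from_tuples(
    [
        ("Presidents", "Pure mentions", "IND", "all"),
        ("Presidents", "Pure mentions", "IND", "some"),
        ("Senators", "Pure mentions", "IND", "all"),
    ]
)
df = pd.DataFrame([[10, 11, 12], [20, 21, 22]], index=[1, 2], columns=columns)

# Row label only: a Series indexed by the full four-level column MultiIndex.
row = df.loc[1]

# Row label plus a first-level column label: a Series over the remaining levels.
partial = df.loc[1, "Presidents"]

# Row label plus a complete column tuple: a single scalar value.
scalar = df.loc[1, ("Presidents", "Pure mentions", "IND", "all")]

print(row.index.nlevels, partial.index.nlevels, scalar)
```

The second and third lookups are the ones that raise under modin 0.4.0 in the logs below.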

Source code / logs

Log 1

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-7-f7a945237ab9> in <module>()
      3     header=[0,1,2,3], index_col=0)
      4 DF2.loc[1]
----> 5 DF2.loc[1, 'Presidents']

~/anaconda3/lib/python3.7/site-packages/modin/pandas/indexing.py in __getitem__(self, key)
    232         row_lookup, col_lookup = self._compute_lookup(row_loc, col_loc)
    233         ndim = self._expand_dim(row_lookup, col_lookup, ndim)
--> 234         result = super(_LocIndexer, self).__getitem__(row_lookup, col_lookup, ndim)
    235         return result
    236 

~/anaconda3/lib/python3.7/site-packages/modin/pandas/indexing.py in __getitem__(self, row_lookup, col_lookup, ndim)
    163             single_axis = 1 if self.col_scaler else 0
    164             return SeriesView(
--> 165                 qc_view.squeeze(ndim=1, axis=single_axis),
    166                 self.df,
    167                 (row_lookup, col_lookup),

~/anaconda3/lib/python3.7/site-packages/modin/data_management/query_compiler/pandas_query_compiler.py in squeeze(self, ndim, axis)
   2781             if axis is None:
   2782                 axis = 0 if self.data.shape[1] > 1 else 1
-> 2783             squeezed = pandas.Series(to_squeeze.squeeze(axis))
   2784             scaler_axis = self.columns if axis else self.index
   2785             non_scaler_axis = self.index if axis else self.columns

~/anaconda3/lib/python3.7/site-packages/pandas/core/series.py in __init__(self, data, index, dtype, name, copy, fastpath)
    260             else:
    261                 data = sanitize_array(data, index, dtype, copy,
--> 262                                       raise_cast_failure=True)
    263 
    264                 data = SingleBlockManager(data, index, fastpath=True)

~/anaconda3/lib/python3.7/site-packages/pandas/core/internals/construction.py in sanitize_array(data, index, dtype, copy, raise_cast_failure)
    658             raise Exception('Data must be 1-dimensional')
    659         else:
--> 660             subarr = com.asarray_tuplesafe(data, dtype=dtype)
    661 
    662     # This is to prevent mixed-type Series getting all casted to

~/anaconda3/lib/python3.7/site-packages/pandas/core/common.py in asarray_tuplesafe(values, dtype)
    238         # Avoid building an array of arrays:
    239         # TODO: verify whether any path hits this except #18819 (invalid)
--> 240         values = [tuple(x) for x in values]
    241         result = construct_1d_object_array_from_listlike(values)
    242 

~/anaconda3/lib/python3.7/site-packages/pandas/core/common.py in <listcomp>(.0)
    238         # Avoid building an array of arrays:
    239         # TODO: verify whether any path hits this except #18819 (invalid)
--> 240         values = [tuple(x) for x in values]
    241         result = construct_1d_object_array_from_listlike(values)
    242 

TypeError: 'int' object is not iterable
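The last frame of Log 1 can be isolated in plain Python. This minimal sketch assumes the data reaching asarray_tuplesafe was a flat sequence of integers (the `values` list is hypothetical); it reproduces only the final list comprehension, not the modin code path that led there.

```python
# Hypothetical input: a flat sequence of ints standing in for the squeezed data.
values = [1, 2, 3]

try:
    # Same expression as line 240 of pandas/core/common.py in the traceback.
    [tuple(x) for x in values]
except TypeError as exc:
    message = str(exc)

print(message)
```

Calling tuple() on each element only works when the elements are themselves iterable, which is why integer data triggers the error.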

Log 2

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-6-a1a1b4249d92> in <module>()
      3     header=[0,1,2,3], index_col=0)
      4 DF2.loc[1]
----> 5 DF2.loc[1, ('Presidents', 'Pure mentions', 'IND', 'all')]

~/anaconda3/lib/python3.7/site-packages/modin/pandas/indexing.py in __getitem__(self, key)
    231         self._handle_enlargement(row_loc, col_loc)
    232         row_lookup, col_lookup = self._compute_lookup(row_loc, col_loc)
--> 233         ndim = self._expand_dim(row_lookup, col_lookup, ndim)
    234         result = super(_LocIndexer, self).__getitem__(row_lookup, col_lookup, ndim)
    235         return result

~/anaconda3/lib/python3.7/site-packages/modin/pandas/indexing.py in _expand_dim(self, row_lookup, col_lookup, ndim)
    299         """
    300         many_rows = len(row_lookup) > 1
--> 301         many_cols = len(col_lookup) > 1
    302 
    303         if ndim == 0 and (many_rows or many_cols):

TypeError: object of type 'builtin_function_or_method' has no len()
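The message in Log 2 means that col_lookup was a built-in method rather than a sized collection. One way this could happen, offered here purely as an assumption and not confirmed by the issue or by modin's source, is that a lookup returned a plain tuple and its `.index` attribute (the tuple.index method) was taken where a pandas Index was expected. The sketch below shows only that mechanism in plain Python.

```python
# The full column key from the report.
col_key = ("Presidents", "Pure mentions", "IND", "all")

# Assumption: `.index` is read from a plain tuple instead of a pandas object,
# so the result is the bound built-in method tuple.index.
col_lookup = col_key.index
kind = type(col_lookup).__name__

try:
    len(col_lookup)
except TypeError as exc:
    message = str(exc)

print(kind)
print(message)
```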

Issue Analytics

  • State: closed
  • Created 5 years ago
  • Comments: 5 (3 by maintainers)

Top GitHub Comments

devin-petersohn commented, Mar 20, 2019 (1 reaction)

Hi @lemeb, thanks for posting this!

I see what the issue is and I will get it fixed as soon as I can. We are still working through some things related to MultiIndex, and I appreciate the report.

lemeb commented, Mar 30, 2019 (0 reactions)

Done. #523

Thanks!
