defaulting to pandas on a reindex causes a raise

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): python 3.8.3-slim (docker)
  • Modin version (modin.__version__): 0.7.4
  • Python version: 3.8.3
  • Code we can use to reproduce:

Describe the problem

I have a specific case where I'm building a time series dataframe and then backfilling some data.

To do this, I turn two columns into a MultiIndex, then create a new index from all of the values I'd like to see backfilled.

Then I call set_index and reindex on the new "fuller" index.

I looked through the modin code, and it looks like it doesn't support reindexing on a MultiIndex, which is totally understandable. But it then appears to default to the original pandas implementation incorrectly. I think it's because some defaults are set at the top of the reindex method (as part of the check), and those defaults are then passed to baseline pandas, which would normally receive None for them (e.g. the axis or index parameters).

I tried to simulate my case with some dummy code. Not sure if it makes it clearer or more confusing 😃

Source code / logs

>>> import modin.pandas as pandas
>>> df = pandas.DataFrame({"foo": [1, 2, 3, 4], "bar": ["a", "b", "c", "d"], "waldo": [11, 12, 13, 14]})
UserWarning: Distributing <class 'dict'> object. This may take some time.
>>> df = df.set_index(["foo", "bar"])
>>> df
         waldo
foo bar       
1   a       11
2   b       12
3   c       13
4   d       14
>>> new_index = pandas.MultiIndex.from_product([["a", "b", "c"], ["d", "e", "f"]])
>>> df.reindex(new_index)
UserWarning: `DataFrame.reindex` defaulting to pandas implementation.
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/orenmazor/.pyenv/versions/3.8.2/Python.framework/Versions/3.8/lib/python3.8/site-packages/modin/pandas/base.py", line 2038, in reindex
    return self._default_to_pandas(
  File "/Users/orenmazor/.pyenv/versions/3.8.2/Python.framework/Versions/3.8/lib/python3.8/site-packages/modin/pandas/base.py", line 251, in _default_to_pandas
    result = getattr(getattr(pandas, self.__name__), op)(
  File "/Users/orenmazor/.pyenv/versions/3.8.2/Python.framework/Versions/3.8/lib/python3.8/site-packages/pandas/util/_decorators.py", line 227, in wrapper
    return func(*args, **kwargs)
  File "/Users/orenmazor/.pyenv/versions/3.8.2/Python.framework/Versions/3.8/lib/python3.8/site-packages/pandas/core/frame.py", line 3851, in reindex
    axes = validate_axis_style_args(self, args, kwargs, "labels", "reindex")
  File "/Users/orenmazor/.pyenv/versions/3.8.2/Python.framework/Versions/3.8/lib/python3.8/site-packages/pandas/util/_validators.py", line 260, in validate_axis_style_args
    raise TypeError(msg)
TypeError: Cannot specify both 'axis' and any of 'index' or 'columns'.
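The final TypeError in the traceback can be reproduced with plain pandas alone, which fits the guess that the fallback forwards both `axis` and `index`. The sketch below is only an illustration under that assumption: the two wrapper functions are hypothetical and are not modin's actual code, and the index values are chosen so that some labels match the original frame.

```python
import pandas as pd

df = pd.DataFrame(
    {"foo": [1, 2, 3, 4], "bar": ["a", "b", "c", "d"], "waldo": [11, 12, 13, 14]}
).set_index(["foo", "bar"])
new_index = pd.MultiIndex.from_product([[1, 2], ["a", "b"]], names=["foo", "bar"])


def reindex_forwarding_defaults(frame, labels=None, index=None, axis=None):
    """Hypothetical wrapper: fills in defaults first, then forwards all of them."""
    axis = 0 if axis is None else axis
    index = labels if index is None and axis == 0 else index
    # pandas now receives both `index` and `axis`, which it rejects.
    return frame.reindex(index=index, axis=axis)


def reindex_forwarding_caller_args(frame, labels=None, **kwargs):
    """Hypothetical wrapper: forwards only what the caller actually passed."""
    return frame.reindex(labels, **kwargs)


try:
    reindex_forwarding_defaults(df, new_index)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# Forwarding the untouched arguments lets pandas resolve its own defaults.
result = reindex_forwarding_caller_args(df, new_index)
print(error_message)
print(result)
```

Rows of `new_index` that have no match in the original frame come back as NaN, which is the starting point for the backfill described above.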

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 5 (3 by maintainers)

Top GitHub Comments

2 reactions
devin-petersohn commented, Jan 18, 2021

@drPytho thanks for the bump, here is the current workaround:

# some modin.pandas dataframe `df`
result = df._default_to_pandas("reindex", new_labels)

Hope this helps (in the short term). I will make sure this gets fixed for the next release.

0 reactions
dchigarev commented, Feb 15, 2021

This issue was fixed by #2660
