pd.Series.ffill() raises the error: AttributeError: 'numpy.ndarray' object has no attribute 'ffill'

System information

OS Platform and Distribution: Windows 10
Modin version: 0.15.0+7.g4ec7f634
Python version: 3.9.12
Code we can use to reproduce:

import modin
import modin.pandas as pd
from distributed import Client

import numpy as np


if __name__ == '__main__':
    if pd.__name__ == 'modin.pandas':
        client = Client(n_workers=3)
        print(modin.__version__)

    df = pd.DataFrame(
        dict(
            a=[1, 2, None, None, None, ],
            b=(1, None, 3, 4, 5,),
        )
    )

    df.a.ffill(inplace=True)
    print(df)

    df['tr_id'] = 0

    df.tr_id = np.where(
        (df.b <= 4),
        3,
        None,
    )

    df.tr_id.ffill(inplace=True)

Describe the problem

I get the following output:

0.15.0+7.g4ec7f634
UserWarning: Ray execution environment not yet initialized. Initializing...
To remove this warning, run the following python code before doing dataframe operations:

    import ray
    ray.init()

UserWarning: Distributing <class 'dict'> object. This may take some time.
     a    b
0  1.0  1.0
1  2.0  NaN
2  2.0  3.0
3  2.0  4.0
4  2.0  5.0
Traceback (most recent call last):
  File "d:\OD\OneDrive\Projects\Chud_Amaz\Soft_in_dev\moduled_way_OOP\modin_test_ffill.py", line 33, in <module>
    df.tr_id.ffill(inplace=True)
AttributeError: 'numpy.ndarray' object has no attribute 'ffill'

Problem

When a new column is filled using np.where, the ffill method does not work on it.


Top GitHub Comments

2 reactions

mvashishtha commented, Jun 16, 2022

"Maybe put the resulting numpy array in a Series on the right-hand side?"

@VasilijKolomiets unfortunately it turns out that that won’t work in Modin either 😢 Modin assigns the right-hand side as-is to the attribute instead of pointing the attribute to the new column.

import modin.pandas as pd

df = pd.DataFrame([[1]], columns=['col0'])
df.col0 = pd.Series([3])
df.iloc[0, 0] = 4
# BUG: col0 is unchanged!!!
assert df.col0.equals(df['col0'])

What you can do instead is use __setitem__ instead of __setattr__, e.g. df['tr_id'] = np.where(... instead of df.tr_id = np.where(.... This works:

import modin.pandas as pd

df = pd.DataFrame([[1]], columns=['col0'])
df['col0'] = pd.Series([3])
df.iloc[0, 0] = 4
assert df.col0.equals(df['col0'])

Meanwhile, @pyrito will work on a PR that should fix all the bugs identified here.
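
Applied to the reproduction script from the issue, the `__setitem__` workaround described above might look like the sketch below. It imports plain pandas as a stand-in so that it runs without a Modin engine; that swap is an assumption of this example, and the issue itself uses `import modin.pandas as pd`. The forward fill is also reassigned to the column instead of using `inplace=True`, which is a choice made for this sketch and not part of the original script.

```python
import numpy as np
import pandas as pd  # assumption: stand-in for `import modin.pandas as pd`

df = pd.DataFrame(
    dict(
        a=[1, 2, None, None, None],
        b=(1, None, 3, 4, 5),
    )
)

# Item assignment (__setitem__) instead of attribute assignment (df.tr_id = ...),
# so the frame stores a real column rather than a raw numpy array attribute.
df['tr_id'] = np.where(df['b'] <= 4, 3, None)

# Forward-fill the missing entries and write the result back to the column.
df['tr_id'] = df['tr_id'].ffill()

print(df['tr_id'].tolist())
```

Reading the column back with `df['tr_id']` returns a Series, so `ffill` is available on it.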

2 reactions

mvashishtha commented, Jun 15, 2022

In the snippet I posted above, the Modin dataframe’s __setattr__ calls __setitem__ to modify the col0 column in place. It then calls object.__setattr__(self, key, value), which re-assigns the col0 property to exactly the value that was passed in, i.e. the list. I think a fairly simple fix would be to call object.__setattr__(self, key, self.__getitem__(key)) in the case here where we call __setitem__.

I can’t take this on right now, so I’ll leave it unassigned.
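
The mechanism described in this comment can be illustrated with a toy model. The classes below (`ToyColumn`, `BuggyFrame`, `FixedFrame`) are hypothetical and are not Modin's implementation; they only mimic the described sequence of `__setitem__` followed by `object.__setattr__`, and the suggested fix of re-reading the column through `__getitem__`.

```python
class ToyColumn:
    """Hypothetical Series-like wrapper with a forward-fill method."""

    def __init__(self, values):
        self.values = list(values)

    def ffill(self):
        filled, last = [], None
        for v in self.values:
            if v is not None:
                last = v
            filled.append(last)
        return ToyColumn(filled)


class BuggyFrame:
    """Hypothetical frame whose __setattr__ mirrors the described bug."""

    def __init__(self):
        object.__setattr__(self, "_columns", {})

    def __getitem__(self, key):
        return self._columns[key]

    def __setitem__(self, key, value):
        # Wrap raw values so every stored column is a ToyColumn.
        self._columns[key] = value if isinstance(value, ToyColumn) else ToyColumn(value)

    def __setattr__(self, key, value):
        self[key] = value
        # Bug: the attribute is bound to the raw right-hand side, not the column.
        object.__setattr__(self, key, value)


class FixedFrame(BuggyFrame):
    def __setattr__(self, key, value):
        self[key] = value
        # Suggested fix: bind the attribute to what __getitem__ returns.
        object.__setattr__(self, key, self[key])


buggy = BuggyFrame()
buggy.tr_id = [3, None, 3]
print(type(buggy.tr_id).__name__)   # list, so buggy.tr_id.ffill() would raise AttributeError

fixed = FixedFrame()
fixed.tr_id = [3, None, 3]
print(fixed.tr_id.ffill().values)   # [3, 3, 3]
```

In the buggy variant, attribute access returns the raw right-hand side, which is the same shape of failure as the `'numpy.ndarray' object has no attribute 'ffill'` error in the report.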
