result_type behaviour in apply function is different from Pandas
System information
- OS Platform and Distribution: Linux, Ubuntu 20.04
- Modin version: Latest development master branch.
- Python version: 3.8.10
- Code we can use to reproduce:
import pandas
import modin.pandas as pd
import numpy as np
import ray
ray.init()
data = np.random.randint(0, 5, size=(5, 10))
df_modin = pd.DataFrame(data)
df_pandas = pandas.DataFrame(data)
df_new = df_modin.apply(np.square, result_type="reduce")
df_new2 = df_pandas.apply(np.square, result_type="reduce")
print(df_new)
print(type(df_new))
print(df_new2)
print(type(df_new2))
Describe the problem
Passing result_type="reduce" together with a function that returns a dataframe, e.g. np.square, has no effect in Pandas: the result is still a DataFrame. In Modin, however, the entire resulting dataframe is returned as a Series.
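A minimal pandas-only sketch of the check described above, so the pandas side can be confirmed without Ray or Modin installed. The variable names default_result and reduce_result are illustrative and not part of the original report; the behaviour shown is what I would expect from the pandas versions discussed here.

```python
import numpy as np
import pandas

data = np.random.randint(0, 5, size=(5, 10))
df_pandas = pandas.DataFrame(data)

# np.square is a NumPy ufunc, so result_type="reduce" should not
# change what pandas returns compared to the default call.
default_result = df_pandas.apply(np.square)
reduce_result = df_pandas.apply(np.square, result_type="reduce")

print(type(reduce_result))
print(reduce_result.equals(default_result))
```

Running the same two calls on a modin.pandas DataFrame is where the reported difference shows up.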
Let’s put numpy universal functions like np.sqrt aside because they seem to have special behavior with result_type (not sure whether it’s a bug): https://github.com/pandas-dev/pandas/issues/49190
For code below, dataframe is
Here’s my understanding so far:
Pandas behavior
When axis = 0
When result_type=None
When result_type=reduce
Every kind of function I can think of returns a series (including every one above). Note I have already excluded the numpy universal functions, which seem to be the only exception.
When result_type=expand
Every kind of function I can think of behaves the exact same way as when result_type is None. I filed https://github.com/pandas-dev/pandas/issues/49196 for this.
When result_type=broadcast
The result is always a dataframe, as the documentation says.
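The axis = 0 observations above can be sketched in plain pandas as follows. This is only an illustration under assumptions: the small frame and the helper double are hypothetical, chosen as an ordinary Python function (not a ufunc) that returns a Series per column.

```python
import pandas

df = pandas.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

def double(col):
    # Hypothetical helper: a plain Python function, not a NumPy ufunc.
    return col * 2

res_none = df.apply(double)                                # inferred result
res_reduce = df.apply(double, result_type="reduce")        # expected: Series
res_expand = df.apply(double, result_type="expand")        # expected: same as None
res_broadcast = df.apply(double, result_type="broadcast")  # expected: DataFrame

for name, res in [("None", res_none), ("reduce", res_reduce),
                  ("expand", res_expand), ("broadcast", res_broadcast)]:
    print(name, type(res).__name__)
```

With result_type="reduce" each element of the returned Series is itself the Series produced for one column, which is why the result does not look like a dataframe.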
When axis = 1
result_type='reduce' has no effect (see https://github.com/pandas-dev/pandas/issues/49188).
result_type=broadcast seems to be the same as axis=0 (see also https://github.com/pandas-dev/pandas/issues/49188).
result_type='expand' does seem to have an effect.
Conclusion: what to do in Modin
Also cc @dchigarev who wrote the most recent version of apply result type inference.
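For reference, the axis = 1 observations listed earlier can be sketched in plain pandas like this. The frame and the helper as_list are hypothetical, picked because a function returning a list per row is the case where 'expand' visibly differs from the default.

```python
import pandas

df = pandas.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

def as_list(row):
    # Hypothetical helper: returns a list-like result for each row.
    return [row["a"], row["b"]]

res_none = df.apply(as_list, axis=1)                           # Series of lists
res_reduce = df.apply(as_list, axis=1, result_type="reduce")   # same as None here
res_expand = df.apply(as_list, axis=1, result_type="expand")   # lists become columns

for name, res in [("None", res_none), ("reduce", res_reduce), ("expand", res_expand)]:
    print(name, type(res).__name__)
```

A Modin implementation could be compared against these types case by case, swapping pandas for modin.pandas in the import.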