DataFrame.transform operates per-element when transform function is type-annotated
DataFrame.transform is behaving differently than its pandas equivalent when its return type is annotated. Consider the example below.
>>> import databricks.koalas as ks
>>> import numpy as np
>>> import pandas as pd
>>>
>>> pdf = pd.DataFrame([[1, 2, 3, 4], [2, 3, 4, 5]], columns=("a", "b", "c", "d"))
>>> kdf = ks.DataFrame([[1, 2, 3, 4], [2, 3, 4, 5]], columns=("a", "b", "c", "d"))
>>>
>>> def normalize(v):
...     return v / sum(v)
...
>>> pdf.transform(normalize)
          a    b         c         d
0  0.333333  0.4  0.428571  0.444444
1  0.666667  0.6  0.571429  0.555556
>>> kdf.transform(normalize)
          a    b         c         d
0  0.333333  0.4  0.428571  0.444444
1  0.666667  0.6  0.571429  0.555556
>>>
>>> def typed_normalize(v) -> ks.Series[np.float64]:
...     return v / sum(v)
...
>>> pdf.transform(typed_normalize)
          a    b         c         d
0  0.333333  0.4  0.428571  0.444444
1  0.666667  0.6  0.571429  0.555556
>>> kdf.transform(typed_normalize)
     a    b    c    d
0  1.0  1.0  1.0  1.0
1  1.0  1.0  1.0  1.0
Koalas version: 0.24.0
Numpy version: 1.18.0
Pandas version: 0.25.3
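The all-1.0 result is what you would expect if the annotated function were called on one-element pieces of each column rather than on the whole column, as the issue title suggests. The pandas-only sketch below reproduces that effect by hand. It illustrates the arithmetic only and makes no claim about how Koalas dispatches the function internally; the names `whole` and `per_element` are made up for this example.

```python
import pandas as pd

def normalize(v):
    # Divide every value by the total of whatever Series was passed in.
    return v / sum(v)

pdf = pd.DataFrame([[1, 2, 3, 4], [2, 3, 4, 5]], columns=("a", "b", "c", "d"))

# Whole-column call: each value is divided by its column total.
whole = pdf.transform(normalize)

# Simulated per-element call (an assumption for illustration): every
# one-row slice of a column is normalized on its own, so each value is
# divided by itself.
per_element = pd.DataFrame({
    col: pd.concat([normalize(pdf[col].iloc[[i]]) for i in range(len(pdf))])
    for col in pdf.columns
})

print(whole)
print(per_element)
```

Any function that depends on an aggregate of the full column (a sum, mean, or max) would be affected in the same way, while purely element-wise functions would not show a difference.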
Top GitHub Comments
Thanks for the explanation Hyukjin, I understand the reasoning better now. Looking forward to a revisit in the future, I think it could be a very user friendly feature!
If we think about this case only, I think it can make sense to pass a Koalas series, but if we think about other APIs like groupby.apply(), it's difficult to pass a Koalas series as is. Probably we should revisit this later. For now, I am sure you can work around it by just looping over the Series in the Frame.
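A minimal sketch of that workaround, looping over the columns and normalizing each Series explicitly. It is written against pandas so that it runs without Spark; the assumption is that the same column-by-column loop carries over to a Koalas DataFrame. `transform_by_column` is a hypothetical helper, not part of either library, and `v.sum()` is used instead of the built-in `sum(v)` so the function does not rely on iterating over the Series.

```python
import pandas as pd

def normalize(v):
    # Use the Series' own sum() rather than the built-in sum().
    return v / v.sum()

def transform_by_column(df, func):
    """Apply func to each column Series in turn and reassemble a frame."""
    out = df.copy()
    for col in df.columns:
        out[col] = func(df[col])
    return out

pdf = pd.DataFrame([[1, 2, 3, 4], [2, 3, 4, 5]], columns=("a", "b", "c", "d"))
result = transform_by_column(pdf, normalize)
print(result)
```

Because each column is handled as a complete Series, the column total is computed once over all rows, which is the behaviour the untyped `transform` call shows in the issue.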