DataFrame.transform operates per-element when transform function is type-annotated


Koalas' DataFrame.transform behaves differently from its pandas equivalent when the return type of the transform function is annotated. Consider the example below.

>>> import databricks.koalas as ks
>>> import numpy as np
>>> import pandas as pd
>>>
>>> pdf = pd.DataFrame([[1, 2, 3, 4], [2, 3, 4, 5]], columns=("a", "b", "c", "d"))
>>> kdf = ks.DataFrame([[1, 2, 3, 4], [2, 3, 4, 5]], columns=("a", "b", "c", "d"))
>>>
>>> def normalize(v):
...     return v / sum(v)
...
>>> pdf.transform(normalize)
          a    b         c         d
0  0.333333  0.4  0.428571  0.444444
1  0.666667  0.6  0.571429  0.555556
>>> kdf.transform(normalize)
          a    b         c         d
0  0.333333  0.4  0.428571  0.444444
1  0.666667  0.6  0.571429  0.555556
>>>
>>> def typed_normalize(v) -> ks.Series[np.float64]:
...     return v / sum(v)
...
>>> pdf.transform(typed_normalize)
          a    b         c         d
0  0.333333  0.4  0.428571  0.444444
1  0.666667  0.6  0.571429  0.555556
>>> kdf.transform(typed_normalize)
     a    b    c    d
0  1.0  1.0  1.0  1.0
1  1.0  1.0  1.0  1.0

Koalas version: 0.24.0
Numpy version: 1.18.0
Pandas version: 0.25.3
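The all-1.0 output is what you would expect if the function received each value on its own rather than the whole column, as the issue title suggests. The sketch below only mimics that effect in plain pandas; it is not the Koalas code path, and the one-row slicing is an assumption made for illustration.

```python
import pandas as pd


def normalize(v):
    # Same function as in the report: divide each value by the sum of its input.
    return v / sum(v)


column = pd.Series([1, 2], name="a")

# Called once on the whole column: values are divided by the column total (3).
whole = normalize(column)

# Called on one-row pieces (illustrative assumption): each value is divided
# by itself, so every result is 1.0.
pieces = pd.concat(
    [normalize(column.iloc[i:i + 1]) for i in range(len(column))]
)

print(whole.tolist())
print(pieces.tolist())
```

The second result matches the column "a" values shown for kdf.transform(typed_normalize) above, while the first matches the pandas output.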

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Comments: 6 (3 by maintainers)

Top GitHub Comments

1 reaction
chunyang commented, Jan 9, 2020

Thanks for the explanation, Hyukjin. I understand the reasoning better now. Looking forward to a revisit in the future; I think it could be a very user-friendly feature!

1 reaction
HyukjinKwon commented, Jan 9, 2020

If we consider only this case, I think it can make sense to pass a Koalas Series, but if we think about other APIs like groupby.apply(), it is difficult to pass a Koalas Series as is.

We should probably revisit this later. For now, I am sure you can work around it by just looping over the Series in the Frame.
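A minimal sketch of that workaround, looping over the columns and normalizing each Series by its own sum. The helper name normalize_columns is hypothetical. The example runs on pandas so it is self-contained; the assumption is that a Koalas DataFrame accepts the same column indexing, Series.sum(), and column assignment, which has not been verified here against Koalas 0.24.0.

```python
import pandas as pd


def normalize_columns(df):
    """Divide every column by its own sum, one column (Series) at a time.

    Hypothetical helper: written against the pandas API, on the assumption
    that a Koalas DataFrame supports the same operations.
    """
    out = df.copy()
    for col in df.columns:
        series = df[col]
        # Series.sum() aggregates over the whole column, not per element.
        out[col] = series / series.sum()
    return out


pdf = pd.DataFrame([[1, 2, 3, 4], [2, 3, 4, 5]], columns=("a", "b", "c", "d"))
result = normalize_columns(pdf)
print(result)
```

Because the sum is taken over the full column before dividing, the result does not depend on how the function's input is split up, which is the behavior the type-annotated transform was missing.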


