DataFrame.GroupBy.apply()
I’m continuously missing this behavior when converting my pandas scripts to koalas:
import pandas as pd

df = pd.DataFrame({"timestamp": [0.0, 0.5, 1.0, 0.0, 0.5],
                   "car_id": ['A', 'A', 'A', 'B', 'B'],
                   "battery_charge": [100, 90, 80, 100, 90]
                   })
print(df)

def calc_battery_usage(x):
    # Sort once by timestamp, then compare the first and last charge readings.
    ordered = x.sort_values('timestamp')
    start_bat = ordered.iloc[0]['battery_charge']
    stop_bat = ordered.iloc[-1]['battery_charge']
    return start_bat - stop_bat

print(df.groupby('car_id').apply(calc_battery_usage))
That prints out:
   timestamp car_id  battery_charge
0        0.0      A             100
1        0.5      A              90
2        1.0      A              80
3        0.0      B             100
4        0.5      B              90
car_id
A    20
B    10
On koalas I just get:
PandasNotImplementedError: The method `pd.groupby.GroupBy.apply()` is not implemented yet.
I can take this on and start implementing it myself, but I would probably need some general guidelines on how to do it (probably using PySpark's pandas_udf).
Issue Analytics
- State:
- Created 4 years ago
- Comments:12 (11 by maintainers)
Hi @patryk-oleniuk! I think it’d be great to have groupby apply in koalas.
Spark’s groupby apply and pandas’s groupby apply are pretty similar. However, the two main differences that I can think of are:
As a start, I’d probably try something like:
And just wrap it with Spark’s groupby apply + pandas_udf.
Does that make sense?
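To make the wrapping idea a bit more concrete, here is a minimal sketch, not the koalas implementation. It assumes the main obstacle is that Spark's grouped map expects the per-group function to take a pandas DataFrame and return a pandas DataFrame, while the function in the issue returns a scalar. The helper `to_grouped_map` and its `value_name` parameter are hypothetical names invented for this illustration, and the per-group calls are emulated locally with plain pandas instead of running on Spark.

```python
import pandas as pd


def to_grouped_map(func, key, value_name="result"):
    """Hypothetical helper: adapt a scalar-returning per-group function
    so that it returns a one-row pandas DataFrame instead."""
    def wrapped(pdf):
        # Each group arrives as a pandas DataFrame; pack the scalar result
        # into a one-row DataFrame together with the group key.
        return pd.DataFrame({key: [pdf[key].iloc[0]], value_name: [func(pdf)]})
    return wrapped


def calc_battery_usage(x):
    ordered = x.sort_values("timestamp")
    return ordered["battery_charge"].iloc[0] - ordered["battery_charge"].iloc[-1]


df = pd.DataFrame({
    "timestamp": [0.0, 0.5, 1.0, 0.0, 0.5],
    "car_id": ["A", "A", "A", "B", "B"],
    "battery_charge": [100, 90, 80, 100, 90],
})

wrapped = to_grouped_map(calc_battery_usage, key="car_id", value_name="battery_usage")

# Local stand-in for what Spark would do: call the wrapped function once
# per group and concatenate the returned DataFrames.
parts = [wrapped(group) for _, group in df.groupby("car_id")]
result = pd.concat(parts, ignore_index=True)
print(result)
```

On the Spark side, a function shaped like `wrapped` is the kind of thing that could be registered as a grouped-map `pandas_udf` (or, on Spark 3.x, passed to `GroupedData.applyInPandas` with an explicit output schema). The output schema would have to be declared up front, which plain pandas does not require.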
Thank you guys for the replies. I will surely check out all of the suggestions by tomorrow evening…!!