Aggregate strings on a groupby operation
Hi.
I need to group a dataframe and aggregate the strings as a list or a set. I have many string columns in the dataframe. How can I do that in vaex? This is the functionality I’m looking for:
d
   A       B
0  1    This
1  2      is
2  3       a
3  4  random
4  1  string
5  2       !
d.groupby('A')['B'].apply(list)
A
1    [This, string]
2           [is, !]
3               [a]
4          [random]
dtype: object
Thanks!
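For reference, the requested behaviour can be sketched in plain Python, independent of pandas or vaex. This is only an illustration of the semantics (collect every B value per A key into a list or a set); the helper name aggregate_strings and its container parameter are made up for this sketch and are not part of either library.

```python
from collections import defaultdict

# Rows from the example above, as (A, B) pairs in their original order.
rows = [(1, "This"), (2, "is"), (3, "a"), (4, "random"), (1, "string"), (2, "!")]


def aggregate_strings(pairs, container=list):
    """Collect the B values for each A key.

    Values keep their row order within a group; keys are returned sorted.
    `container` (hypothetical parameter) chooses list or set output.
    """
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return {key: container(values) for key, values in sorted(groups.items())}


as_lists = aggregate_strings(rows)
as_sets = aggregate_strings(rows, container=set)

for key, values in as_lists.items():
    print(key, values)
```

With container=set, duplicates within a group collapse and order is no longer meaningful, which is the main difference between the two aggregations asked about.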
Issue Analytics
- State:
- Created 3 years ago
- Reactions:2
- Comments:18 (8 by maintainers)
Is this use case supported at this point? I don’t really see how to use AggregatorDescriptorMulti to do this task.
I understand, but the first step is creating the “dict” object, right? Only then does the aggregation begin, unless you already have data in that form (via pyarrow structs).
But please create a new issue for this, since it is moving away from the original topic of this thread.