Support for Aggregations on Array Items
Use case: I’d like to be able to map functions (sum(), avg(), etc.) across array values from multiple rows and return the results in the same order.
Feature description: There doesn’t seem to be a performant way to apply an aggregate function (for example, sum()) to each index of an array field across multiple rows while returning the results in index order. I’ve tried playing with unnest and array_agg, but that jumbles the values from all rows together, and it seems like performance would suck with larger arrays (1,000+ items in each array per row).
Example table:
event_time | int_array_col
2021-10-19 18:12:26 | [1, 0, 1, 4, 2, 1, 0, 0, 1]
2021-10-19 18:22:41 | [1, 1, 1, 1, 0, 0, 0, 0, 1]
2021-10-19 18:32:13 | [0, 0, 0, 1, 1, 1, 0, 0, 1]
2021-10-19 18:43:17 | [2, 2, 2, 2, 0, 0, 0, 0, 1]
Desired query: SELECT array_map(sum(), int_array_col) from doc.my_table;
Output:
[4, 3, 4, 8, 3, 2, 0, 0, 4]
Issue Analytics
- Created 2 years ago
- Comments: 7 (7 by maintainers)
Top GitHub Comments
If there aren’t too many rows, a UDF might be a (temporary) solution.
I suppose. A quick test with 1 million rows of 10 values each already takes a few seconds. With 100,000 rows it is still instant.
Not aware of any such functionality in place yet.