
Support for Aggregations on Array Items


Use case: I’d like to be able to apply aggregate functions (sum(), avg(), etc.) element-wise across array values from multiple rows and return the results in the same index order.

Feature description: There doesn’t seem to be a performant way to apply an aggregate function (example: sum()) to an array field index by index across multiple rows while returning the results in the same order. I’ve tried playing with unnest and array_agg, but that jumbles the values from all rows together, and it seems like performance would suck with larger arrays (1,000+ items in each array per row).

Example table:

event_time          | int_array_col
2021-10-19 18:12:26 | [1, 0, 1, 4, 2, 1, 0, 0, 1]
2021-10-19 18:22:41 | [1, 1, 1, 1, 0, 0, 0, 0, 1]
2021-10-19 18:32:13 | [0, 0, 0, 1, 1, 1, 0, 0, 1]
2021-10-19 18:43:17 | [2, 2, 2, 2, 0, 0, 0, 0, 1]

Desired query: SELECT array_map(sum(), int_array_col) from doc.my_table;

Output:

[4, 3, 4, 8, 3, 2, 0, 0, 4]
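To make the requested semantics concrete, here is a minimal sketch in plain JavaScript that runs outside the database. The helper name `sumByIndex` is hypothetical and is not a CrateDB function; it only illustrates what the desired `array_map(sum(), ...)` call is expected to compute for the example table, assuming every row has the same array length.

```javascript
// Hypothetical helper (not part of CrateDB): sums arrays index by index.
// Assumes all rows have the same length as the first row.
function sumByIndex(rows) {
  if (rows.length === 0) return [];
  const totals = new Array(rows[0].length).fill(0);
  for (const row of rows) {
    for (let j = 0; j < totals.length; j++) {
      totals[j] += row[j];
    }
  }
  return totals;
}

// The int_array_col values from the example table above.
const exampleRows = [
  [1, 0, 1, 4, 2, 1, 0, 0, 1],
  [1, 1, 1, 1, 0, 0, 0, 0, 1],
  [0, 0, 0, 1, 1, 1, 0, 0, 1],
  [2, 2, 2, 2, 0, 0, 0, 0, 1],
];

// Prints the same values as the desired Output row.
console.log(JSON.stringify(sumByIndex(exampleRows)));
```

The position of each total matches the position of its inputs, which is the ordering guarantee the feature request asks for.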

Issue Analytics

  • State: open
  • Created 2 years ago
  • Comments: 7 (7 by maintainers)

Top GitHub Comments

proddata commented, Oct 21, 2021

If there aren’t too many rows, a UDF might be a (temporary) solution:

cr> CREATE TABLE arr_sum (arr ARRAY(INTEGER));

cr> INSERT INTO arr_sum VALUES ([1,2,3]),([0,1,2]);

cr> CREATE OR REPLACE FUNCTION udf_arr_sum(arr ARRAY(ARRAY(INTEGER)))
        RETURNS ARRAY(INTEGER)
        LANGUAGE JAVASCRIPT
        AS
        'function udf_arr_sum(arr){
            // start from a copy of the first row
            let res = [...arr[0]];

            // add every remaining row element by element
            for(let i = 1; i < arr.length; i++){
                for(let j = 0; j < res.length; j++){
                    res[j] = res[j] + arr[i][j];
                }
            }
            return res;
        }';
CREATE OK, 1 row affected  (0.127 sec)

cr> SELECT udf_arr_sum(array_agg(arr)) FROM arr_sum;
+---------------------------------+
| doc.udf_arr_sum(array_agg(arr)) |
+---------------------------------+
| [1, 3, 5]                       |
+---------------------------------+
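The function body above assumes at least one row and arrays of equal length. As a sketch only, a more defensive body could look like the following. It is written as standalone JavaScript so it can be tried outside CrateDB. Treating NULL rows as skipped and missing or NULL elements as 0 are assumptions made for illustration, not behavior confirmed in the thread, and how NULLs actually reach a JavaScript UDF should be checked against the CrateDB documentation.

```javascript
// Defensive variant of the UDF body (illustrative sketch, not the maintainer's code).
// Assumptions: NULL rows are skipped; missing or NULL elements count as 0.
function udf_arr_sum(arr) {
  if (arr === null || arr === undefined) return null;
  const res = [];
  for (const row of arr) {
    if (row === null || row === undefined) continue; // skip NULL rows
    for (let j = 0; j < row.length; j++) {
      const current = res[j] == null ? 0 : res[j];
      const value = row[j] == null ? 0 : row[j];
      res[j] = current + value;
    }
  }
  return res;
}

// Same data as the arr_sum table above.
console.log(JSON.stringify(udf_arr_sum([[1, 2, 3], [0, 1, 2]])));
```

With this shape the result is as long as the longest input array, so uneven rows no longer produce NaN entries or a failure on an empty input.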
proddata commented, Oct 27, 2021

@proddata Would this still work at 100,000 rows?

I suppose so. A quick test with 1 million rows of 10 values each already takes a few seconds. With 100,000 rows it is still instant.

Is there any way to profile UDF functions to know how much memory they’re using?

I’m not aware of any such functionality being in place yet.
