
Array functions and order preservation


Arrays are an ordered data structure. For array functions that generate new arrays based on existing ones, the documentation isn’t clear on whether there’s any order that can be assumed for the result.

Functions of interest:

  • array_intersect
  • array_except
  • array_union
  • array_distinct

Based on my testing, order seems to be preserved: the output follows the order of the input array(s), with the first array's order taking priority over the second's, and so on.

For example, array_intersect(array['a', 'b', 'c'], array['d', 'c', 'b']) consistently returns ['b', 'c'], not ['c', 'b']. Can this behavior be relied upon?
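The behavior observed in testing can be modeled in a few lines of Python. This is only a sketch of what the tests suggest (results follow the first array's order, with duplicates removed), not Presto's documented semantics. The function names mirror the SQL ones but are hypothetical Python stand-ins; SQL NULL handling is ignored, and elements are assumed to be hashable.

```python
def array_distinct(a):
    # dict keys keep insertion order (Python 3.7+), so the first occurrence wins
    return list(dict.fromkeys(a))


def array_intersect(a, b):
    # Elements of a that also occur in b, in a's order, without duplicates
    in_b = set(b)
    return array_distinct([x for x in a if x in in_b])


def array_except(a, b):
    # Elements of a that do not occur in b, in a's order, without duplicates
    in_b = set(b)
    return array_distinct([x for x in a if x not in in_b])


def array_union(a, b):
    # All of a's elements first, then any new elements from b, without duplicates
    return array_distinct(list(a) + list(b))


print(array_intersect(['a', 'b', 'c'], ['d', 'c', 'b']))  # ['b', 'c']
```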

The following also lack any explicit statement in the docs, but it would be highly surprising if the order of the returned array were not based on the input array(s):

  • array_remove
  • concat
  • combinations
  • filter
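For comparison, here is a Python sketch of the order one would naturally expect from three of these functions. Again, this models the expectation stated above rather than any guarantee from the Presto docs; the names are hypothetical stand-ins (array_filter avoids shadowing Python's built-in filter), and combinations is left out.

```python
def array_remove(a, element):
    # Drop every occurrence of element; the remaining elements keep their order
    return [x for x in a if x != element]


def concat(*arrays):
    # Concatenate left to right, keeping duplicates
    out = []
    for arr in arrays:
        out.extend(arr)
    return out


def array_filter(a, predicate):
    # Keep elements for which predicate is true, in their original order
    return [x for x in a if predicate(x)]


print(array_remove(['a', 'b', 'a', 'c'], 'a'))  # ['b', 'c']
print(concat(['a', 'b'], ['b', 'c']))           # ['a', 'b', 'b', 'c']
```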

Issue Analytics

  • State: open
  • Created 2 years ago
  • Comments: 5 (2 by maintainers)

Top GitHub Comments

zeen commented, Apr 12, 2021

It’d be great if the docs explicitly stated that order is undefined.

If order is undefined but we need to preserve it in a query, then:

-- undefined order without dups:
array_intersect(a, b)
array_except(a, b)
array_union(a, b)
-- defined order with dups:
filter(a, x -> contains(array_intersect(a, b), x))  -- elements of a that are also in b, in a's order
filter(concat(a, b), x -> contains(array_union(a, b), x))  -- every element is in the union, so this equals concat(a, b)
concat(a, b)
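A Python model of the "defined order with dups" workarounds makes the difference from the deduplicating functions concrete. The names intersect_with_dups and union_with_dups are hypothetical; the sketch assumes hashable elements and ignores SQL NULL semantics. Because an element of a is in array_intersect(a, b) exactly when it is in b, the model tests membership in b directly.

```python
def intersect_with_dups(a, b):
    # Mirrors filter(a, x -> contains(array_intersect(a, b), x)):
    # every element of a that also occurs in b, duplicates kept, in a's order
    in_b = set(b)
    return [x for x in a if x in in_b]


def union_with_dups(a, b):
    # Mirrors concat(a, b): a's elements followed by b's, duplicates kept
    return list(a) + list(b)


a = ['b', 'a', 'b', 'c']
b = ['c', 'b']
print(intersect_with_dups(a, b))  # ['b', 'b', 'c']
print(union_with_dups(a, b))      # ['b', 'a', 'b', 'c', 'c', 'b']
```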

To deduplicate results while preserving order, something like this would work:

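reduce(
    a,
    null,  -- initial state; the result is NULL when a is empty
    (s, x) -> if(
        s IS NULL,
        array[x],                      -- first element starts the result
        if(contains(s, x), s, s || x)  -- append x only if it is not already in s
    ),
    (s) -> s
)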

The above is likely O(N^2) for large arrays: contains() scans the accumulated array on every step, and unless the optimizer is smart enough to avoid it, s || x also creates a brand-new array on every iteration of the reduce.
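To illustrate the complexity argument, here is a Python model of the reduce() approach next to a hash-set version. It models the cost structure only, not how Presto actually executes reduce; the function names are hypothetical, and unlike the SQL (which returns NULL for an empty array) both return an empty list. A per-element hash set is not available inside a SQL lambda, which is part of why the SQL workaround below resorts to grouping by value instead.

```python
def distinct_quadratic(a):
    # Mirrors the reduce() approach
    s = []
    for x in a:
        if x not in s:   # linear scan over s, like contains(s, x)
            s = s + [x]  # builds a new list each time, like s || x
    return s


def distinct_linear(a):
    # Same result in expected O(N) time, tracking seen elements in a set
    seen = set()
    out = []
    for x in a:
        if x not in seen:
            seen.add(x)
            out.append(x)
    return out


print(distinct_quadratic(['c', 'a', 'c', 'b', 'a']))  # ['c', 'a', 'b']
```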

The following might be more efficient for large arrays, but gets pretty verbose:

transform(
    array_sort(
        transform(
            map_entries(multimap_from_entries(
                zip(a, sequence(1, cardinality(a))) -- get (value, index) pairs
            )), -- group indices by value into (value, array[index1, index2, ...]) pairs
            x -> (array_min(x.field1), x.field0) -- turn into (lowest_index, value) pairs
        )
    ), -- sort by lowest_index
    x -> x.field1 -- extract value from tuple
)
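The steps of the query above can be traced in Python, one comment per SQL stage. This is a sketch of the same idea (group indices by value, keep each value's lowest index, sort by that index), not of Presto internals; distinct_by_first_index is a hypothetical name. Python dicts happen to preserve insertion order, but SQL maps carry no such guarantee, which is why the sort step is needed. Each value has a unique lowest index, so the sort never has to compare the values themselves.

```python
from collections import defaultdict


def distinct_by_first_index(a):
    # zip(a, sequence(1, cardinality(a))): (value, 1-based index) pairs
    pairs = zip(a, range(1, len(a) + 1))
    # multimap_from_entries: value -> list of its indices
    groups = defaultdict(list)
    for value, index in pairs:
        groups[value].append(index)
    # transform(map_entries(...), x -> ...): (lowest_index, value) pairs
    keyed = [(min(indices), value) for value, indices in groups.items()]
    # array_sort, then transform(x -> x.field1): order by lowest index, keep value
    return [value for _, value in sorted(keyed)]


print(distinct_by_first_index(['c', 'a', 'c', 'b', 'a']))  # ['c', 'a', 'b']
```

Sorting makes this O(N log N) rather than quadratic, at the cost of the extra grouping and sorting passes.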
rongrong commented, Apr 12, 2021

We’d appreciate it if you would contribute a documentation clarification! Thanks!
