Feature request numpy.unique axis=0

Feature request

I want to go from

a = np.array([[1, 1, 1, 0, 0, 0],
              [0, 1, 1, 1, 0, 0],
              [0, 1, 1, 1, 0, 0],
              [1, 1, 1, 0, 0, 0],
              [1, 1, 1, 1, 1, 0]])

to

array([[1, 1, 1, 0, 0, 0],
       [0, 1, 1, 1, 0, 0],
       [1, 1, 1, 1, 1, 0]])

Using numpy I would use

unique_rows = np.unique(a, axis=0)

however, the axis parameter is not supported by Numba yet. I tried some workarounds based on this SO question, but I haven’t figured out a solution yet.
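
For reference, here is a small sketch of how the call behaves in plain NumPy (1.13 or later, outside a jitted function). Note that np.unique returns the rows in sorted order, so reproducing the first-appearance order shown above seems to need the extra return_index step. The variable names are illustrative only.

```python
import numpy as np

a = np.array([[1, 1, 1, 0, 0, 0],
              [0, 1, 1, 1, 0, 0],
              [0, 1, 1, 1, 0, 0],
              [1, 1, 1, 0, 0, 0],
              [1, 1, 1, 1, 1, 0]])

# Plain NumPy: unique rows come back lexicographically sorted,
# together with the index of each row's first occurrence in `a`.
sorted_rows, first_idx = np.unique(a, axis=0, return_index=True)

# To keep the rows in order of first appearance instead,
# sort the first-occurrence indices and index back into `a`.
ordered_rows = a[np.sort(first_idx)]

print(sorted_rows)
print(ordered_rows)
```
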

Issue Analytics

  • State: open
  • Created 2 years ago
  • Comments: 8 (5 by maintainers)

Top GitHub Comments

3 reactions
rishi-kulkarni commented, Dec 22, 2021

I wrote this a while back, which gets the job done.


import numba as nb
import numpy as np

@nb.jit(nopython=True, cache=True)
def nb_unique(input_data, axis=0):
    """2D np.unique(a, return_index=True, return_counts=True)
    
    Parameters
    ----------
    input_data : 2D numeric array
    axis : int, optional
        axis along which to identify unique slices, by default 0
    Returns
    -------
    2D array
        unique rows (or columns) from the input array
    1D array of ints
        indices of unique rows (or columns) in input array
    1D array of ints
        number of instances of each unique row
    """

    # don't want to sort original data
    if axis == 1:
        data = input_data.T.copy()

    else:
        data = input_data.copy()

    # so we can remember the original indexes of each row
    orig_idx = np.array([i for i in range(data.shape[0])])

    # sort our data AND the original indexes
    for i in range(data.shape[1] - 1, -1, -1):
        sorter = data[:, i].argsort(kind="mergesort")

        # mergesort to keep associations
        data = data[sorter]
        orig_idx = orig_idx[sorter]
    # get original indexes
    idx = [0]

    if data.shape[1] > 1:
        bool_idx = ~np.all((data[:-1] == data[1:]), axis=1)
        additional_uniques = np.nonzero(bool_idx)[0] + 1

    else:
        additional_uniques = np.nonzero(~(data[:-1] == data[1:]))[0] + 1

    idx = np.append(idx, additional_uniques)
    # get counts for each unique row
    counts = np.append(idx[1:], data.shape[0])
    counts = counts - idx
    return data[idx], orig_idx[idx], counts

Someday, when I figure out how to avoid writing 8 different implementations for the various combinations of np.unique’s return values, I’ll contribute this back into numba.
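
The sort-then-compare-neighbours idea in the function above can also be sketched in plain NumPy without Numba, which may help when cross-checking results. This is a rough illustration, not the commenter's implementation: unique_rows_sorted is a hypothetical name, and it swaps the per-column mergesort loop for np.lexsort (which is also a stable sort).

```python
import numpy as np

def unique_rows_sorted(data):
    """Hypothetical helper: unique rows, first-occurrence indices, counts."""
    # np.lexsort uses the LAST key as the primary one, so pass the
    # columns in reverse to make column 0 the most significant.
    order = np.lexsort(data.T[::-1])
    sorted_data = data[order]

    # A row starts a new group when it differs from the row before it.
    is_new = np.ones(len(sorted_data), dtype=bool)
    is_new[1:] = np.any(sorted_data[1:] != sorted_data[:-1], axis=1)
    starts = np.flatnonzero(is_new)

    # Group sizes are the gaps between consecutive group starts.
    counts = np.diff(np.append(starts, len(sorted_data)))
    return sorted_data[starts], order[starts], counts

a = np.array([[1, 1, 1, 0, 0, 0],
              [0, 1, 1, 1, 0, 0],
              [0, 1, 1, 1, 0, 0],
              [1, 1, 1, 0, 0, 0],
              [1, 1, 1, 1, 1, 0]])

rows, idx, counts = unique_rows_sorted(a)
print(rows, idx, counts)
```

Because the sort is stable, the returned indices point at the first occurrence of each unique row in the input.
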

1 reaction
gmarkall commented, Dec 16, 2021

Many thanks for the request! A runnable MWR to test for the feature in future is:

import numpy as np
from numba import njit

a = np.array([[1, 1, 1, 0, 0, 0],
              [0, 1, 1, 1, 0, 0],
              [0, 1, 1, 1, 0, 0],
              [1, 1, 1, 0, 0, 0],
              [1, 1, 1, 1, 1, 0]])

@njit
def f(x):
    return np.unique(a, axis=0)

print(f(a))

which presently gives:

numba.core.errors.TypingError: Failed in nopython mode pipeline (step: nopython frontend)
No implementation of function Function(<function unique at 0x7fb93b057dc0>) found for signature:
 
 >>> unique(readonly array(int64, 2d, C), axis=Literal[int](0))
 
There are 2 candidate implementations:
  - Of which 2 did not match due to:
  Overload in function 'np_unique': File: numba/np/arrayobj.py: Line 2088.
    With argument(s): '(readonly array(int64, 2d, C), axis=int64)':
   Rejected as the implementation raised a specific error:
     TypingError: got an unexpected keyword argument 'axis'
  raised from /home/gmarkall/numbadev/numba/numba/core/typing/templates.py:791

During: resolving callee type: Function(<function unique at 0x7fb93b057dc0>)
During: typing of call at /home/gmarkall/numbadev/issues/7663/repro.py (13)


File "repro.py", line 13:
def f(x):
    return np.unique(a, axis=0)
    ^