Feature request numpy.unique axis=0

Feature request

I want to go from

a = np.array([[1, 1, 1, 0, 0, 0],
              [0, 1, 1, 1, 0, 0],
              [0, 1, 1, 1, 0, 0],
              [1, 1, 1, 0, 0, 0],
              [1, 1, 1, 1, 1, 0]])

to

array([[1, 1, 1, 0, 0, 0],
       [0, 1, 1, 1, 0, 0],
       [1, 1, 1, 1, 1, 0]])

Using numpy I would use

unique_rows = np.unique(a, axis=0)

However, the axis parameter is not yet supported in Numba. I tried some workarounds based on this SO question, but I haven't figured out a solution yet.
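For reference, here is a minimal sketch of how the call behaves in plain NumPy (1.13 or later), outside of any Numba-compiled function. Note that `np.unique` returns the rows in sorted order, not in order of first appearance, so the requested output above needs one extra step. The variable names `sorted_unique`, `first_idx`, and `in_order` are only illustrative.

```python
import numpy as np

a = np.array([[1, 1, 1, 0, 0, 0],
              [0, 1, 1, 1, 0, 0],
              [0, 1, 1, 1, 0, 0],
              [1, 1, 1, 0, 0, 0],
              [1, 1, 1, 1, 1, 0]])

# Plain NumPy: unique rows, sorted lexicographically, plus the index
# of the first occurrence of each unique row in `a`.
sorted_unique, first_idx = np.unique(a, axis=0, return_index=True)

# Re-order by first occurrence to match the output requested above.
in_order = a[np.sort(first_idx)]

print(sorted_unique)
print(in_order)
```

Here `in_order` reproduces the three rows shown in the request, while `sorted_unique` holds the same rows with `[0, 1, 1, 1, 0, 0]` first.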

Issue Analytics

State:
Created 2 years ago
Comments:8 (5 by maintainers)

Top GitHub Comments

3 reactions

rishi-kulkarni commented, Dec 22, 2021

I wrote this a while back, which gets the job done.


import numba as nb
import numpy as np


@nb.jit(nopython=True, cache=True)
def nb_unique(input_data, axis=0):
    """2D np.unique(a, return_index=True, return_counts=True)
    
    Parameters
    ----------
    input_data : 2D numeric array
    axis : int, optional
        axis along which to identify unique slices, by default 0
    Returns
    -------
    2D array
        unique rows of the input array (for axis=1, the unique columns,
        returned as rows of the transposed data)
    1D array of ints
        indices of unique rows (or columns) in input array
    1D array of ints
        number of instances of each unique row
    """

    # don't want to sort original data
    if axis == 1:
        data = input_data.T.copy()

    else:
        data = input_data.copy()

    # so we can remember the original indexes of each row
    orig_idx = np.array([i for i in range(data.shape[0])])

    # sort our data AND the original indexes
    for i in range(data.shape[1] - 1, -1, -1):
        sorter = data[:, i].argsort(kind="mergesort")

        # mergesort to keep associations
        data = data[sorter]
        orig_idx = orig_idx[sorter]
    # get original indexes
    idx = [0]

    if data.shape[1] > 1:
        bool_idx = ~np.all((data[:-1] == data[1:]), axis=1)
        additional_uniques = np.nonzero(bool_idx)[0] + 1

    else:
        additional_uniques = np.nonzero(~(data[:-1] == data[1:]))[0] + 1

    idx = np.append(idx, additional_uniques)
    # get counts for each unique row
    counts = np.append(idx[1:], data.shape[0])
    counts = counts - idx
    return data[idx], orig_idx[idx], counts

Someday, when I figure out how to avoid writing 8 different implementations for the various combinations of np.unique's return values, I'll contribute this back into Numba.
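The function above sorts the rows with a stable sort, then marks every row that differs from its predecessor as the start of a new group. The same idea can be sketched in plain NumPy and checked against `np.unique`. This is not the commenter's code and has not been tested under Numba; the helper name `unique_rows_sorted` is hypothetical, and `np.lexsort` is swapped in for the column-by-column mergesort loop.

```python
import numpy as np


def unique_rows_sorted(data):
    """Sort-and-compare sketch for unique rows of a 2D array (hypothetical helper)."""
    data = np.asarray(data)
    # lexsort treats the LAST key as the primary one, so pass the columns
    # in reverse order to make column 0 the primary sort key.
    order = np.lexsort(data.T[::-1])
    sorted_rows = data[order]
    # A row starts a new group when it differs from the previous row.
    is_new = np.ones(len(sorted_rows), dtype=bool)
    is_new[1:] = np.any(sorted_rows[1:] != sorted_rows[:-1], axis=1)
    starts = np.flatnonzero(is_new)
    # Group sizes are the gaps between consecutive group starts.
    counts = np.diff(np.append(starts, len(sorted_rows)))
    # lexsort is stable, so order[starts] is the first occurrence of each row.
    return sorted_rows[starts], order[starts], counts


a = np.array([[1, 1, 1, 0, 0, 0],
              [0, 1, 1, 1, 0, 0],
              [0, 1, 1, 1, 0, 0],
              [1, 1, 1, 0, 0, 0],
              [1, 1, 1, 1, 1, 0]])

rows, idx, counts = unique_rows_sorted(a)
ref_rows, ref_idx, ref_counts = np.unique(
    a, axis=0, return_index=True, return_counts=True)
```

For this input, the three results agree with what `np.unique` returns for `return_index=True` and `return_counts=True`.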

1 reaction

gmarkall commented, Dec 16, 2021

Many thanks for the request! A runnable MWR to test for the feature in future is:

import numpy as np
from numba import njit

a = np.array([[1, 1, 1, 0, 0, 0],
              [0, 1, 1, 1, 0, 0],
              [0, 1, 1, 1, 0, 0],
              [1, 1, 1, 0, 0, 0],
              [1, 1, 1, 1, 1, 0]])

@njit
def f(x):
    return np.unique(a, axis=0)

print(f(a))

which presently gives:

numba.core.errors.TypingError: Failed in nopython mode pipeline (step: nopython frontend)
No implementation of function Function(<function unique at 0x7fb93b057dc0>) found for signature:
 
 >>> unique(readonly array(int64, 2d, C), axis=Literal[int](0))
 
There are 2 candidate implementations:
  - Of which 2 did not match due to:
  Overload in function 'np_unique': File: numba/np/arrayobj.py: Line 2088.
    With argument(s): '(readonly array(int64, 2d, C), axis=int64)':
   Rejected as the implementation raised a specific error:
     TypingError: got an unexpected keyword argument 'axis'
  raised from /home/gmarkall/numbadev/numba/numba/core/typing/templates.py:791

During: resolving callee type: Function(<function unique at 0x7fb93b057dc0>)
During: typing of call at /home/gmarkall/numbadev/issues/7663/repro.py (13)


File "repro.py", line 13:
def f(x):
    return np.unique(a, axis=0)
    ^
