Convenience function to turn an Awkward Array into a NumPy array in any way that it can

See original GitHub issue

Currently it seems a bit cumbersome to create a contiguous NumPy array (after padding and filling, e.g. for input into ML models) from records with fields of different numeric types (e.g. int and float, or float and double). I’m looking for behaviour similar to .values or .to_numpy() in pandas:

>>> import pandas as pd
>>> df = pd.DataFrame({"a" : [1, 2, 3], "b" : [1.1, 2.2, 3.3]})
>>> df.dtypes
a      int64
b    float64
dtype: object
>>> df.to_numpy()
array([[1. , 1.1],
       [2. , 2.2],
       [3. , 3.3]])
>>> df.to_numpy().dtype
dtype('float64')

There are two obstacles when trying this with awkward:

When I call ak.fill_none, the result is a union type that can’t be converted to NumPy, e.g.

>>> import awkward1 as ak
>>> array = ak.zip({"a" : [[1, 2], [], [3, 4, 5]], "b" : [[1.1, 2.2], [], [3.3, 4.4, 5.5]]})
>>> ak.fill_none(ak.pad_none(array, 2, clip=True), 0)
<Array [[{a: 1, b: 1.1}, ... a: 4, b: 4.4}]] type='3 * 2 * union[{"a": int64, "b...'>
>>> padded = ak.fill_none(ak.pad_none(array, 2, clip=True), 0)
>>> padded
<Array [[{a: 1, b: 1.1}, ... a: 4, b: 4.4}]] type='3 * 2 * union[{"a": int64, "b...'>
>>> ak.type(padded)
3 * 2 * union[{"a": int64, "b": float64}, int64]

When I have a record that can be converted to NumPy, the result is a structured NumPy array, which I still have to cast to a consistent dtype for many ML applications.
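The second obstacle can be illustrated with NumPy alone. The following is a minimal sketch, assuming a small hand-built structured array stands in for the converted record array (the field names and values are made up for illustration). It uses `numpy.lib.recfunctions.structured_to_unstructured` to cast all fields to one dtype:

```python
import numpy as np
from numpy.lib import recfunctions as rfn

# Illustrative stand-in for a record array converted to NumPy:
# one int64 field and one float64 field.
structured = np.array(
    [(1, 1.1), (2, 2.2), (3, 3.3)],
    dtype=[("a", np.int64), ("b", np.float64)],
)

# Cast every field to float64 and drop the field names.
# The fields become a new trailing axis of length 2.
flat = rfn.structured_to_unstructured(structured, dtype=np.float64)
```

This only covers flat records of numeric fields; nested records would need extra handling.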

I believe @nsmith- also ran into this when trying to show the padding and filling features of awkward in his tutorial on NanoEvents yesterday.

Not sure how best to implement convenience functions for this, but maybe one could add extra options to ak.fill_none and ak.to_numpy, roughly like the following (plus figuring out how to deal with nested records):

import awkward1 as ak
import numpy as np


def new_fill_none(array, value, cast_value=False, **kwargs):
    # For record arrays, fill with a record that has the same fields
    if cast_value and len(ak.keys(array)) > 0:
        # having this as a fill value won't result in a union array
        value = {k : value for k in ak.keys(array)}
    return ak.fill_none(array, value, **kwargs)

def new_to_numpy(array, consistent_dtype=None, **kwargs):
    np_array = ak.to_numpy(array, **kwargs)
    if consistent_dtype is not None:
        if len(ak.keys(array)) == 0:
            raise ValueError("Can't use `consistent_dtype` when array has no fields")
        # Cast every field to the same dtype, then view the structured
        # array as a plain array with one extra axis for the fields
        np_array = np_array.astype(
            [(k, consistent_dtype) for k in ak.keys(array)], copy=False
        ).view((consistent_dtype, len(ak.keys(array))))
    return np_array

>>> import awkward1 as ak
>>> array = ak.zip({"a" : [[1, 2], [], [3, 4, 5]], "b" : [[1.1, 2.2], [], [3.3, 4.4, 5.5]]})
>>> new_to_numpy(new_fill_none(ak.pad_none(array, 2, clip=True), 0, cast_value=True), consistent_dtype="float64")
array([[[1. , 1.1],
        [2. , 2.2]],

       [[0. , 0. ],
        [0. , 0. ]],

       [[3. , 3.3],
        [4. , 4.4]]])

Issue Analytics

State:
Created 3 years ago
Comments: 5 (5 by maintainers)

Top GitHub Comments

1 reaction

jpivarski commented, Jul 20, 2020

Isn’t the fact that
In [9]: ak.fill_none(ak.pad_none(array.a, 2, clip=True), 0.)
Out[9]: <Array [[1, 2], [0, 0], [3, 4]] type='3 * 2 * float64'>
casts the integers in array.a into floats a bug?

@nsmith- No, that’s intentional:

>>> ak.fill_none(ak.Array([1, 2, None, 4]), 3)
<Array [1, 2, 3, 4] type='4 * int64'>
>>> ak.fill_none(ak.Array([1, 2, None, 4]), 3.0)
<Array [1, 2, 3, 4] type='4 * float64'>

What’s happening here is that Nones are first replaced by a temporary UnionArray that combines whatever is in the array with whatever the replacement value is: union[int64, int64] and union[int64, float64] in the two cases above. Then we attempt to simplify the temporary UnionArray. Unions of two numeric types can be unified to a numeric type, which is the broadest of the numeric choices: int64 and float64 in the two cases above. It is equivalent to the type unification that NumPy performs when concatenating:

>>> import numpy as np
>>> np.concatenate([np.array([1, 2, 3]), np.array([4])])
array([1, 2, 3, 4])
>>> np.concatenate([np.array([1, 2, 3]), np.array([4.0])])
array([1., 2., 3., 4.])

(In fact, ak.concatenate does this through a UnionArray simplify, too. The PR #337 that you motivated by finding NumPy dtype bugs ensures that we now use exactly the same unification rules as NumPy.)
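The NumPy side of this unification can be checked directly. This is a small sketch using only NumPy, not Awkward's internals: `np.result_type` reports the dtype that NumPy's promotion rules pick for a pair of types, and concatenation lands on the same dtype.

```python
import numpy as np

# Promotion of two identical integer types stays integer.
int_and_int = np.result_type(np.int64, np.int64)

# Promotion of an integer and a float widens to float.
int_and_float = np.result_type(np.int64, np.float64)

# Concatenating an int64 array with a float64 array applies the same rule.
merged = np.concatenate([np.array([1, 2, 3], dtype=np.int64), np.array([4.0])])
```

Here `merged.dtype` matches `int_and_float`, which mirrors the union[int64, float64] case described above.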

In @nikoladze’s case, the UnionArray of records and numbers (zero) could not be simplified.

0 reactions

jpivarski commented, Dec 12, 2022

In case you’re wondering what all of this is about, I’m going through all of our open issues from oldest to newest to decide what should be done with them, post-2.0.

In this case, @nikoladze’s array can be converted to NumPy if you pay attention to all the details of which axis needs to be padded and with some numeric fill value (i.e. don’t try to fill missing records with a number). There ought to be a function to make some reasonable choices (apply standardized rules) to turn anything rectilinear with a given fill value that is by default 0. Maybe another function argument to choose between clipping to the smallest list length versus padding to the longest (the latter is the default).

The point of this is to remember that sometimes, we don’t care about structure and don’t want to think about it: we just want a NumPy array somehow. This would be a good function to develop with ak.transform; the hardest part might be naming it…
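To make the padding-versus-clipping choice concrete, here is a rough sketch in plain NumPy. It is not an Awkward Array function: `to_rectangular` and its `clip` and `dtype` parameters are hypothetical names chosen for illustration, and it only handles one level of variable-length lists of numbers.

```python
import numpy as np


def to_rectangular(lists, fill_value=0, clip=False, dtype=np.float64):
    """Hypothetical helper: pack variable-length lists into a 2D array.

    clip=False pads every row to the longest list (the default);
    clip=True truncates every row to the shortest list.
    """
    lengths = [len(item) for item in lists]
    if not lengths:
        return np.empty((0, 0), dtype=dtype)
    width = min(lengths) if clip else max(lengths)
    # Start from an array full of the fill value, then copy data in.
    out = np.full((len(lists), width), fill_value, dtype=dtype)
    for row, item in enumerate(lists):
        n = min(len(item), width)
        if n:
            out[row, :n] = item[:n]
    return out


ragged = [[1, 2], [], [3, 4, 5]]
padded = to_rectangular(ragged)              # pad to the longest list
clipped = to_rectangular(ragged, clip=True)  # clip to the shortest list
```

With the empty middle list, padding gives a 3 × 3 array, while clipping gives a 3 × 0 array. That is one reason padding to the longest length is the more useful default.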
