Convenience function to turn an Awkward Array into a NumPy array in any way that it can
Currently it seems a bit cumbersome to create a contiguous NumPy array (after padding and filling, e.g. for input into ML models) from records with fields of different numeric types (e.g. int and float, or float and double). I’m looking for behaviour similar to .values or .to_numpy() in pandas:
>>> df = pd.DataFrame({"a" : [1, 2, 3], "b" : [1.1, 2.2, 3.3]})
>>> df.dtypes
a int64
b float64
dtype: object
>>> df.to_numpy()
array([[1. , 1.1],
       [2. , 2.2],
       [3. , 3.3]])
>>> df.to_numpy().dtype
dtype('float64')
There are two obstacles when trying this with awkward:
- When I call ak.fill_none, the result is a union type that can’t be converted to NumPy, e.g.
>>> import awkward1 as ak
>>> array = ak.zip({"a" : [[1, 2], [], [3, 4, 5]], "b" : [[1.1, 2.2], [], [3.3, 4.4, 5.5]]})
>>> ak.fill_none(ak.pad_none(array, 2, clip=True), 0)
<Array [[{a: 1, b: 1.1}, ... a: 4, b: 4.4}]] type='3 * 2 * union[{"a": int64, "b...'>
>>> padded = ak.fill_none(ak.pad_none(array, 2, clip=True), 0)
>>> padded
<Array [[{a: 1, b: 1.1}, ... a: 4, b: 4.4}]] type='3 * 2 * union[{"a": int64, "b...'>
>>> ak.type(padded)
3 * 2 * union[{"a": int64, "b": float64}, int64]
- When I have a record that can be converted to NumPy, the result is a structured NumPy array, which I still have to cast to a consistent dtype for many ML applications.
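For the second obstacle, one possible workaround on the NumPy side is `numpy.lib.recfunctions.structured_to_unstructured`. The sketch below builds a structured array by hand, assuming it has the same layout that `ak.to_numpy` would return for a flat record array with fields `a` and `b`; it is an illustration, not part of Awkward Array.

```python
import numpy as np
from numpy.lib.recfunctions import structured_to_unstructured

# Hand-built structured array, standing in for what ak.to_numpy returns
# for a flat record array (assumption: fields "a" and "b" as above).
structured = np.array(
    [(1, 1.1), (2, 2.2), (3, 3.3)],
    dtype=[("a", np.int64), ("b", np.float64)],
)

# Cast every field to one dtype and place the fields on a new last axis.
flat = structured_to_unstructured(structured, dtype=np.float64)

print(flat.shape)  # (3, 2)
print(flat.dtype)  # float64
```

This gives the same kind of result as `df.to_numpy()` in the pandas example, but the target dtype has to be chosen explicitly.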
I believe @nsmith- also ran into this when trying to show the padding and filling features of awkward in his tutorial on NanoEvents yesterday.
Not sure how best to implement convenience functions for this, but maybe one could add extra options to ak.fill_none and ak.to_numpy, roughly like the following (plus figuring out how to deal with nested records):
def new_fill_none(array, value, cast_value=False, **kwargs):
    if cast_value and len(ak.keys(array)) > 0:
        # having this as a fill value won't result in a union array
        value = {k : value for k in ak.keys(array)}
    return ak.fill_none(array, value, **kwargs)

def new_to_numpy(array, consistent_dtype=None, **kwargs):
    np_array = ak.to_numpy(array, **kwargs)
    if consistent_dtype is not None:
        if len(ak.keys(array)) == 0:
            raise ValueError("Can't use `consistent_dtype` when array has no fields")
        np_array = np_array.astype(
            [(k, consistent_dtype) for k in ak.keys(array)], copy=False
        ).view((consistent_dtype, len(ak.keys(array))))
    return np_array
>>> import awkward1 as ak
>>> array = ak.zip({"a" : [[1, 2], [], [3, 4, 5]], "b" : [[1.1, 2.2], [], [3.3, 4.4, 5.5]]})
>>> new_to_numpy(new_fill_none(ak.pad_none(array, 2, clip=True), 0, cast_value=True), consistent_dtype="float64")
array([[[1. , 1.1],
        [2. , 2.2]],
       [[0. , 0. ],
        [0. , 0. ]],
       [[3. , 3.3],
        [4. , 4.4]]])

@nsmith- No, that’s intentional:
What’s happening here is that Nones are first replaced by a temporary UnionArray that combines whatever is in the array with whatever the replacement value is: union[int64, int64] and union[int64, float64] in the two cases above. Then we attempt to simplify the temporary UnionArray. Unions of two numeric types can be unified to a numeric type, which is the broadest of the numeric choices: int64 and float64 in the two cases above. It is equivalent to the type unification that NumPy performs when concatenating arrays.

(In fact, ak.concatenate does this through a UnionArray simplify, too. The PR #337 that you motivated by finding NumPy dtype bugs ensures that we now use exactly the same unification rules as NumPy.)

In @nikoladze’s case, the UnionArray of records and numbers (zero) could not be simplified.

In case you’re wondering what all of this is about, I’m going through all of our open issues from oldest to newest to decide what should be done with them, post-2.0.

In this case, @nikoladze’s array can be converted to NumPy if you pay attention to all the details of which axis needs to be padded and with some numeric fill value (i.e. don’t try to fill missing records with a number). There ought to be a function to make some reasonable choices (apply standardized rules) to turn anything rectilinear with a given fill value that is by default 0. Maybe another function argument to choose between clipping to the smallest list length versus padding to the longest (the latter is the default).

The point of this is to remember that sometimes, we don’t care about structure and don’t want to think about it: we just want a NumPy array somehow. This would be a good function to develop with ak.transform; the hardest part might be naming it…
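To make the proposed rules concrete, here is a minimal sketch of such a function. It is not an Awkward Array API: the name `to_rectilinear` and its arguments are hypothetical, and it works on plain Python lists (one ragged list per field) rather than on an Awkward Array or through ak.transform. It pads to the longest list by default, optionally clips to the shortest, and lets NumPy pick the broadest dtype, which is the same numeric unification described earlier in this thread.

```python
import numpy as np

def to_rectilinear(fields, fill_value=0, clip=False):
    """Hypothetical helper: pad or clip ragged per-field lists into one array.

    `fields` maps a field name to a list of variable-length lists. All fields
    are assumed to have matching list lengths (as they would after ak.zip).
    """
    names = list(fields)
    lengths = [len(row) for row in fields[names[0]]]
    # Clip to the shortest list, or pad to the longest (the default).
    target = min(lengths) if clip else max(lengths)
    # Let NumPy choose the broadest dtype across all fields and the fill value.
    dtype = np.result_type(
        *(np.asarray([x for row in fields[n] for x in row]).dtype for n in names),
        np.asarray(fill_value).dtype,
    )
    out = np.full((len(lengths), target, len(names)), fill_value, dtype=dtype)
    for k, name in enumerate(names):
        for i, row in enumerate(fields[name]):
            n = min(len(row), target)
            out[i, :n, k] = row[:n]
    return out

fields = {"a": [[1, 2], [], [3, 4, 5]], "b": [[1.1, 2.2], [], [3.3, 4.4, 5.5]]}
padded = to_rectilinear(fields)              # pad to the longest list
clipped = to_rectilinear(fields, clip=True)  # clip to the shortest list

print(padded.shape, padded.dtype)  # (3, 3, 2) float64
print(clipped.shape)               # (3, 0, 2)
```

Because the shortest list in this example is empty, clipping leaves a zero-length axis, which is one reason padding to the longest list seems like the more useful default.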