Removing ufunc-broadcasting across record fields
Currently, all ufuncs are broadcast across all fields of a record:
>>> ak_array = ak.Array([{"x": 1, "y": 1.1}, {"x": 2, "y": 2.2}, {"x": 3, "y": 3.3}])
>>> ak_array
<Array [{x: 1, y: 1.1}, ... {x: 3, y: 3.3}] type='3 * {"x": int64, "y": float64}'>
>>> ak_array + 1
<Array [{x: 2, y: 2.1}, ... {x: 4, y: 4.3}] type='3 * {"x": int64, "y": float64}'>
This is causing some confusion because the fields of a record have qualitatively different meanings. Some are trigger booleans, some are momenta, some are ML-derived isolation variables, some are strings…
>>> ak.Array(["HAL"]) + 1 # should this even work?
<Array [[73, 66, 77]] type='1 * var * uint8'>
>>> [chr(x) for x in (ak.Array(["HAL"]) + 1)[0].tolist()]
['I', 'B', 'M']
Furthermore, when @henryiii is writing vector, he has to distinguish between LorentzVector + LorentzVector accidentally working because they’re Cartesian (but not preserving their Lorentzness) and getting the wrong answer because they’re not Cartesian. Even though the + behavior is defined, due to the fact that they are records, he has to be sure to override every case.
I think there would be fewer surprises for both users and developers if broadcasting a ufunc through a record were an error (with a nice error message). Custom behaviors for specialized records, like LorentzVectors, would still be possible to define, as they are now, but instead of replacing wrong behavior, they’d be replacing no behavior.
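To make the proposed rule concrete, here is a minimal pure-Python sketch of "error by default, custom behavior by record name". It is a toy model, not Awkward Array's implementation: the `Record` class, the `BEHAVIORS` registry, and `add_lorentz` are all hypothetical names invented for this illustration, and the field names `px`, `py`, `pz`, `E` are assumed.

```python
import operator

# Hypothetical registry, loosely modelled on the idea of per-record-name
# behaviors: (operation, left name, right name) -> implementation.
BEHAVIORS = {}


class Record:
    """Toy stand-in for a record; not an Awkward Array class."""

    def __init__(self, fields, name=None):
        self.fields = dict(fields)
        self.name = name  # e.g. "LorentzVector", or None for a plain record

    def __add__(self, other):
        other_name = getattr(other, "name", None)
        impl = BEHAVIORS.get((operator.add, self.name, other_name))
        if impl is None:
            # Proposed rule: no implicit broadcasting through the fields.
            raise TypeError(
                "no addition is defined for records with fields "
                f"{sorted(self.fields)}; apply it to one field at a time "
                "or register a behavior for a named record"
            )
        return impl(self, other)


def add_lorentz(left, right):
    # Assumes both operands are stored in Cartesian components.
    keys = ("px", "py", "pz", "E")
    summed = {k: left.fields[k] + right.fields[k] for k in keys}
    return Record(summed, name="LorentzVector")


BEHAVIORS[(operator.add, "LorentzVector", "LorentzVector")] = add_lorentz

# A plain record has no registered behavior, so adding to it is an error.
plain = Record({"x": 1, "y": 1.1})
try:
    plain + 1
except TypeError as err:
    plain_error = str(err)

# A named record uses its registered behavior and keeps its name.
a = Record({"px": 1.0, "py": 0.0, "pz": 0.0, "E": 2.0}, name="LorentzVector")
b = Record({"px": 0.0, "py": 1.0, "pz": 0.0, "E": 3.0}, name="LorentzVector")
total = a + b
```

In this model the override for `LorentzVector` fills a gap rather than masking a field-by-field default, which is the distinction the proposal is after.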
Note that NumPy does not define such an operation on structured arrays:
>>> np_array = np.array([(1, 1.1), (2, 2.2), (3, 3.3)], [("x", int), ("y", float)])
>>> np_array
array([(1, 1.1), (2, 2.2), (3, 3.3)], dtype=[('x', '<i8'), ('y', '<f8')])
>>> np_array + 1
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: invalid type promotion
Although it does work for Pandas:
>>> df = pd.DataFrame({"x": [1, 2, 3], "y": [1.1, 2.2, 3.3]})
>>> df
   x    y
0  1  1.1
1  2  2.2
2  3  3.3
>>> df + 1
   x    y
0  2  2.1
1  3  3.2
2  4  4.3
it is not our intention to generalize from Pandas, only NumPy.
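For reference, the explicit per-field spelling is always available on a NumPy structured array, and it is roughly what users would write if records stopped broadcasting implicitly. This is a sketch with plain NumPy only; the names `shifted` and `whole_record_add_failed` are made up for the example, and the exact exception message differs between NumPy versions, so only the exception type is checked.

```python
import numpy as np

np_array = np.array(
    [(1, 1.1), (2, 2.2), (3, 3.3)],
    dtype=[("x", int), ("y", float)],
)

# Adding to the whole structured array is rejected by NumPy.
try:
    np_array + 1
    whole_record_add_failed = False
except TypeError:
    whole_record_add_failed = True

# Applying the ufunc one field at a time is explicit about which
# fields the operation is meant for.
shifted = np.empty_like(np_array)
for name in np_array.dtype.names:
    shifted[name] = np_array[name] + 1
```

Writing the loop out makes it obvious when a field, such as a boolean flag or a string, should be left out of the operation.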
This would also affect ufuncs that return booleans, like comparison operators. For these, the argument isn’t as strong. Maybe we want
>>> ak_array > 1
<Array [{x: False, y: True}, ... y: True}] type='3 * {"x": bool, "y": bool}'>
to work, maybe we don’t.
I’m considering removing ufuncs-through-records for all ufuncs, without affecting the custom ufunc behavior that can be assigned to any record with a name. (I’m not considering ufuncs-on-strings right now, though that’s something to think about.) Does anyone have a strong argument about that?
(I suppose this needs a deprecation cycle, though it would be a little difficult getting a warning into the middle of the broadcast-and-apply. I’m tempted to remove it all at once, like a band-aid…)
I’m calling it a bug because I’ve just defined the current behavior as wrong. (Even though I’ve presented it in talks.)
(A motivator for being short with these things is that the list of issues is a lot longer than I thought, and I have only until December 1 to make this awkward==1.0.0.)
Looks like the correct behavior to me. 😃