
Removing ufunc-broadcasting across record fields


Currently, every ufunc is broadcast through all fields of a record:

>>> import awkward as ak
>>> ak_array = ak.Array([{"x": 1, "y": 1.1}, {"x": 2, "y": 2.2}, {"x": 3, "y": 3.3}])
>>> ak_array
<Array [{x: 1, y: 1.1}, ... {x: 3, y: 3.3}] type='3 * {"x": int64, "y": float64}'>

>>> ak_array + 1
<Array [{x: 2, y: 2.1}, ... {x: 4, y: 4.3}] type='3 * {"x": int64, "y": float64}'>
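To pin down what "broadcast through all fields" means, here is a minimal pure-Python model. It assumes a record array can be represented as a dict of equal-length columns, one list per field; this is a simplification, not Awkward's implementation, and the helper `broadcast_through_fields` is hypothetical. The point is that the function is applied to every field independently, whatever that field means.

```python
import operator


def broadcast_through_fields(func, record_array, scalar):
    """Toy model (not Awkward's code): apply func(value, scalar) to every field."""
    return {
        field: [func(value, scalar) for value in column]
        for field, column in record_array.items()
    }


# Same data as ak_array above, stored as one list per field.
ak_like = {"x": [1, 2, 3], "y": [1.1, 2.2, 3.3]}

# Both the integer field and the float field get +1, regardless of their meaning.
result = broadcast_through_fields(operator.add, ak_like, 1)
print(result)
```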

This is causing some confusion because the fields of a record have qualitatively different meanings. Some are trigger booleans, some are momenta, some are ML-derived isolation variables, some are strings…

>>> ak.Array(["HAL"]) + 1                      # should this even work?
<Array [[73, 66, 77]] type='1 * var * uint8'>

>>> [chr(x) for x in (ak.Array(["HAL"]) + 1)[0].tolist()]
['I', 'B', 'M']
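The string example works because, as the result type `var * uint8` shows, the characters are treated as unsigned 8-bit integers, so adding 1 shifts each character code by one. The same arithmetic with plain Python bytes:

```python
# Each byte of "HAL" is a small integer; adding 1 moves each letter forward by one.
shifted = bytes(b + 1 for b in b"HAL")
print(shifted)  # b'IBM'
```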

Furthermore, while @henryiii is writing the vector library, he has to distinguish between two cases: LorentzVector + LorentzVector accidentally working because the vectors are Cartesian (but not preserving their Lorentzness), and LorentzVector + LorentzVector giving the wrong answer because they are not Cartesian. Because + is already defined for any record, he has to be sure to override every case.

I think there would be fewer surprises for both users and developers if broadcasting a ufunc through a record were an error (with a nice error message). Custom behaviors for specialized records, like LorentzVectors, would still be possible to define, as they are now, but instead of replacing wrong behavior, they’d be replacing no behavior.
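The proposed rule can be sketched as a dispatch: look up a custom behavior by the record's name, and raise a descriptive error when there is none. This is a toy model under stated assumptions; the registry `BEHAVIOR`, the class `RecordArray`, the function `apply_binary`, and the record name "Vector2D" are all hypothetical. In Awkward itself, the corresponding mechanism is the `ak.behavior` dict (for example, `ak.behavior[np.add, "Point", "Point"] = point_add`), which this sketch only imitates.

```python
import operator

# Hypothetical registry: (function, left record name, right record name) -> implementation.
BEHAVIOR = {}


class RecordArray:
    """A record array as a dict of equal-length columns plus an optional name."""

    def __init__(self, fields, name=None):
        self.fields = fields  # field name -> list of values
        self.name = name      # e.g. "Vector2D", or None for an anonymous record


def apply_binary(func, left, right):
    """Dispatch a binary function on two record arrays under the proposed rule."""
    key = (func, left.name, right.name)
    if key in BEHAVIOR:
        return BEHAVIOR[key](left, right)
    # No registered behavior: refuse instead of broadcasting through the fields.
    raise TypeError(
        f"cannot apply {func.__name__!r} to records named {left.name!r} and "
        f"{right.name!r}; ufuncs are not broadcast through record fields "
        "unless a behavior is registered for the record name"
    )


def add_vectors(left, right):
    """Custom behavior for hypothetical 'Vector2D' records: add matching components."""
    summed = {
        field: [a + b for a, b in zip(left.fields[field], right.fields[field])]
        for field in left.fields
    }
    return RecordArray(summed, name="Vector2D")


BEHAVIOR[(operator.add, "Vector2D", "Vector2D")] = add_vectors

v = RecordArray({"x": [1.0, 2.0], "y": [3.0, 4.0]}, name="Vector2D")
w = apply_binary(operator.add, v, v)  # uses the registered behavior

anonymous = RecordArray({"x": [1, 2], "y": [1.1, 2.2]})
error_message = None
try:
    apply_binary(operator.add, anonymous, anonymous)  # no behavior registered: raises
except TypeError as err:
    error_message = str(err)
```

The design choice is that a named behavior fills a gap rather than overriding an existing default, so forgetting to define one produces an error instead of a silently wrong result.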

Note that NumPy does not define such an operation on structured arrays:

>>> import numpy as np
>>> np_array = np.array([(1, 1.1), (2, 2.2), (3, 3.3)], [("x", int), ("y", float)])
>>> np_array
array([(1, 1.1), (2, 2.2), (3, 3.3)], dtype=[('x', '<i8'), ('y', '<f8')])

>>> np_array + 1
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: invalid type promotion
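In NumPy, per-field arithmetic on a structured array has to be spelled out one field at a time, which makes the caller decide that it is meaningful for each field. A minimal sketch; the helper `add_per_field` is hypothetical, not a NumPy function, and the exact exception subclass and message raised by `np_array + 1` vary across NumPy versions (the `TypeError` shown above is from an older release).

```python
import numpy as np

np_array = np.array([(1, 1.1), (2, 2.2), (3, 3.3)], dtype=[("x", np.int64), ("y", np.float64)])

# NumPy refuses to broadcast a ufunc through the fields of a structured array.
try:
    np_array + 1
    raised = False
except TypeError:  # newer NumPy raises a TypeError subclass with a different message
    raised = True


def add_per_field(arr, value):
    """Explicitly apply np.add to each field of a structured array."""
    out = np.empty_like(arr)
    for name in arr.dtype.names:
        out[name] = np.add(arr[name], value)
    return out


shifted = add_per_field(np_array, 1)
```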

Pandas, on the other hand, does define it:

>>> import pandas as pd
>>> df = pd.DataFrame({"x": [1, 2, 3], "y": [1.1, 2.2, 3.3]})
>>> df
   x    y
0  1  1.1
1  2  2.2
2  3  3.3
>>> df + 1
   x    y
0  2  2.1
1  3  3.2
2  4  4.3

Our intention, however, is to generalize NumPy, not Pandas.

This would also affect ufuncs that return booleans, like comparison operators. For these, the argument isn’t as strong. Maybe we want

>>> ak_array > 1
<Array [{x: False, y: True}, ... y: True}] type='3 * {"x": bool, "y": bool}'>

to work; maybe we don’t.

I’m considering removing ufuncs-through-records for all ufuncs, without affecting the custom ufunc behavior that can be assigned to any record with a name. (I’m not considering ufuncs-on-strings right now, though that’s something to think about.) Does anyone have a strong argument about that?

(I suppose this needs a deprecation cycle, though it would be a little difficult getting a warning into the middle of the broadcast-and-apply. I’m tempted to remove it all at once, like a band-aid…)
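For a single flat record array, a deprecation cycle could look like the following: named behaviors are used as before, and the old per-field broadcasting remains as a fallback that emits a `DeprecationWarning`. This is a hypothetical toy model (`apply_through_record` and its arguments are illustrative only); the difficulty mentioned above lies in Awkward's real broadcast-and-apply machinery, which this sketch does not reproduce.

```python
import operator
import warnings


def apply_through_record(func, name, fields, scalar, behaviors):
    """Hypothetical transition-period dispatch for one flat (non-nested) record array.

    fields maps each field name to a list of values; behaviors maps
    (func, record name) to a custom implementation.
    """
    if (func, name) in behaviors:
        return behaviors[(func, name)](fields, scalar)
    # Deprecated fallback: keep the old per-field broadcasting, but warn the user.
    warnings.warn(
        f"broadcasting {func.__name__!r} through record fields is deprecated "
        "and will become an error",
        DeprecationWarning,
        stacklevel=2,
    )
    return {field: [func(v, scalar) for v in column] for field, column in fields.items()}


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # make sure the DeprecationWarning is recorded
    out = apply_through_record(operator.add, None, {"x": [1, 2, 3]}, 1, {})
```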

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 11 (11 by maintainers)

Top GitHub Comments

jpivarski commented, Oct 30, 2020 (1 reaction)

I’m calling it a bug because I’ve just defined the current behavior as wrong. (Even though I’ve presented it in talks.)

(A motivator for being short with these things is that the list of issues is a lot longer than I thought, and I have only until December 1 to make this awkward==1.0.0.)

lgray commented, Nov 3, 2020

Looks like the correct behavior to me. 😃

