Performance of ufuncs
Raised by @tamasgal in https://github.com/scikit-hep/uproot4/issues/90#issuecomment-689459702:
Another problem, which I will post in a future issue, is that working with these arrays is far from the numpy performance (doing things like arr > 0.5 takes ~2ms for 100 entries, while in numpy/Julia/C it should be more around a few hundred ns), but that’s another story.
Original dataset: http://131.188.167.67:8889/doubly_jagged.root
import uproot4
import awkward1 as ak
trks = uproot4.open("uproot-issue-90.root:E/Evt/trks")
array = trks["trks.rec_stages"].array()
ak.to_parquet(array, "awkward-issue-442.parquet")
array
# <Array [[[1, 2, 5, 3, 5, 4], ... 1], [1], [1]]] type='145028 * var * var * int64'>
As Parquet (faster to download and read into Awkward): https://drive.google.com/file/d/1JbiFaBaouH_amUxvGnsSHegYAQjRTJ8u/view?usp=sharing
As Pickle (larger, but retains structure: Parquet adds option-type, which complicates the performance analysis): https://drive.google.com/file/d/1KnYebahkvLK29ZggISGROHxjcpCUdO0H/view?usp=sharing
The basic idea of performance in Awkward Array is that we don’t worry about the constant-time metadata manipulation but should worry about the linear-time scaling. In particular, computing array > 3 of a doubly jagged array is pure Python: it unwraps the doubly jagged structure and calls NumPy’s own np.greater on the inner flat content.
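To make the "unwrap, then compute on the flat content" idea concrete, here is a minimal toy sketch in plain Python. It is not Awkward Array's actual implementation: the class ToyListArray and the function toy_greater are hypothetical names invented for illustration, and a list comprehension stands in for the call to np.greater. What it shows is that only the innermost flat content is visited element by element, while the offsets of each jagged level are reused unchanged.

```python
from operator import gt


class ToyListArray:
    """Hypothetical stand-in for one jagged level: offsets into a content."""

    def __init__(self, offsets, content):
        self.offsets = offsets  # list i spans content[offsets[i]:offsets[i + 1]]
        self.content = content  # a flat Python list or another ToyListArray

    def to_list(self):
        inner = self.content
        if isinstance(inner, ToyListArray):
            inner = inner.to_list()
        return [inner[start:stop]
                for start, stop in zip(self.offsets[:-1], self.offsets[1:])]


def toy_greater(array, threshold):
    # Unwrap: collect the offsets of each level; this cost depends on the
    # nesting depth, not on the number of entries.
    levels = []
    node = array
    while isinstance(node, ToyListArray):
        levels.append(node.offsets)
        node = node.content

    # Linear-time step on the flat content (np.greater in the real case).
    flat = [gt(x, threshold) for x in node]

    # Rewrap: reuse the original offsets objects, innermost level first.
    for offsets in reversed(levels):
        flat = ToyListArray(offsets, flat)
    return flat


# [[[1, 2, 5], [3]], [], [[4]]] as two levels of offsets over flat content.
inner = ToyListArray([0, 3, 4, 5], [1, 2, 5, 3, 4])
array = ToyListArray([0, 2, 2, 3], inner)
result = toy_greater(array, 3)
```

In this model the only work that grows with the number of entries is the comparison on the flat list, which is why the linear part of the timing would be expected to match NumPy's.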
Because of the constant-time unwrapping, it shouldn’t be surprising that the Awkward version doesn’t start scaling until the array is at least 1000 entries or so. What is surprising is that the linear scaling for Awkward doesn’t line up with the linear scaling for NumPy, because in theory, it isn’t doing anything other than calling NumPy.
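One way to put numbers on "constant overhead plus linear scaling" is to time the same operation at several sizes and fit t(n) = overhead + slope * n. The sketch below uses only the standard library. The helper names time_per_call and fit_overhead_and_slope are hypothetical, and the pure-Python comparison is just a placeholder for whatever operation is being measured (for example array > 0.5 on an Awkward or NumPy array).

```python
import timeit


def time_per_call(func, arg, repeat=5, number=20):
    """Best-of-`repeat` wall time, in seconds, for a single func(arg) call."""
    best = min(timeit.repeat(lambda: func(arg), repeat=repeat, number=number))
    return best / number


def fit_overhead_and_slope(sizes, times):
    """Ordinary least-squares fit of times ~ overhead + slope * sizes."""
    n = len(sizes)
    mean_x = sum(sizes) / n
    mean_y = sum(times) / n
    sxx = sum((x - mean_x) ** 2 for x in sizes)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(sizes, times))
    slope = sxy / sxx
    overhead = mean_y - slope * mean_x
    return overhead, slope


def compare(values):
    # Placeholder operation; swap in the operation under study.
    return [v > 0.5 for v in values]


sizes = [100, 1000, 10000]
times = [time_per_call(compare, [i / size for i in range(size)])
         for size in sizes]
overhead, slope = fit_overhead_and_slope(sizes, times)
```

Running the same measurement for the Awkward and the NumPy version of an operation gives two slopes that can be compared directly. Note that an unweighted fit over sizes spanning several orders of magnitude is dominated by the largest sizes, so the overhead estimate is rough.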

So this is a quandary.

Looking at your issue, I don’t see how they’re related. They might be related. I need to do a quick profile of this because there’s a lot of linear time here that isn’t due to NumPy, and I can’t imagine what it might be.
Awesome, thanks for both the implementation and the fix!
…and of course for the detailed description of the whole problem