Performance of ufuncs


Raised by @tamasgal in https://github.com/scikit-hep/uproot4/issues/90#issuecomment-689459702:

Another problem, which I will post in a future issue, is that working with these arrays is far from NumPy performance (doing things like arr > 0.5 takes ~2 ms for 100 entries, while in NumPy/Julia/C it should be closer to a few hundred ns), but that’s another story.

Original dataset: http://131.188.167.67:8889/doubly_jagged.root

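import uproot4
import awkward1 as ak

# Open the file and navigate to E/Evt/trks in one step ("file:path" syntax).
trks = uproot4.open("uproot-issue-90.root:E/Evt/trks")
# Read the doubly jagged branch into an Awkward Array.
array = trks["trks.rec_stages"].array()
# Save a Parquet copy of the array.
ak.to_parquet(array, "awkward-issue-442.parquet")
array
# <Array [[[1, 2, 5, 3, 5, 4], ... 1], [1], [1]]] type='145028 * var * var * int64'>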

As Parquet (faster to download and read into Awkward): https://drive.google.com/file/d/1JbiFaBaouH_amUxvGnsSHegYAQjRTJ8u/view?usp=sharing

As Pickle (larger, but retains structure; Parquet adds option-type, which complicates the performance analysis): https://drive.google.com/file/d/1KnYebahkvLK29ZggISGROHxjcpCUdO0H/view?usp=sharing

The basic idea of performance in Awkward Array is that we don’t worry about the constant-time metadata manipulation but should worry about the linear-time scaling. In particular, computing array > 3 of a doubly jagged array is pure Python: it unwraps the doubly jagged structure and calls NumPy’s own np.greater on the inner flat content.
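The unwrap-then-call idea can be sketched with plain NumPy. This is only an illustration, not Awkward Array's actual code: the offsets-plus-content layout is modeled by hand, and the names outer_offsets, inner_offsets, jagged_greater, and to_lists are hypothetical.

```python
import numpy as np

# Hypothetical stand-in for a doubly jagged array: two levels of offsets
# wrapping one flat content buffer.
outer_offsets = np.array([0, 2, 2, 3])   # 3 outer lists over 3 inner lists
inner_offsets = np.array([0, 3, 4, 6])   # 3 inner lists over 6 values
content = np.array([1, 2, 5, 3, 5, 4])

def jagged_greater(outer, inner, flat, threshold):
    # Constant-time part: both offset arrays are reused untouched.
    # Linear-time part: one NumPy ufunc call on the flat content.
    return outer, inner, np.greater(flat, threshold)

def to_lists(outer, inner, flat):
    # Rebuild nested Python lists, only for displaying the result.
    inner_lists = [flat[inner[i]:inner[i + 1]].tolist()
                   for i in range(len(inner) - 1)]
    return [inner_lists[outer[i]:outer[i + 1]]
            for i in range(len(outer) - 1)]

result = to_lists(*jagged_greater(outer_offsets, inner_offsets, content, 3))
print(result)
```

In this model the comparison never touches the offsets, so the only work that grows with the data is the single np.greater call.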

Because of the constant-time unwrapping, it shouldn’t be surprising that the Awkward version doesn’t start scaling until the array is at least 1000 entries or so. What is surprising is that the linear scaling for Awkward doesn’t line up with the linear scaling for NumPy, because in theory, it isn’t doing anything other than calling NumPy.
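A rough way to see the expected shape of the scaling is to time a bare np.greater call against the same call behind a small Python wrapper. This is a sketch under assumptions: wrapped_greater is a hypothetical stand-in for constant-time unwrapping, not Awkward's implementation, and the sizes and repeat counts are arbitrary.

```python
import timeit
import numpy as np

def wrapped_greater(flat, threshold):
    # Hypothetical constant-time bookkeeping before the single ufunc call.
    if not isinstance(flat, np.ndarray):
        raise TypeError("expected a NumPy array")
    meta = {"dtype": flat.dtype, "length": len(flat)}
    return meta, np.greater(flat, threshold)

def time_per_call(func, *args, repeat=5, number=200):
    # Best-of-N wall time in seconds for one call.
    best = min(timeit.repeat(lambda: func(*args), repeat=repeat, number=number))
    return best / number

sizes = [10, 100, 1_000, 10_000, 100_000]
rng = np.random.default_rng(0)
timings = []
for n in sizes:
    flat = rng.random(n)
    timings.append((n,
                    time_per_call(np.greater, flat, 0.5),
                    time_per_call(wrapped_greater, flat, 0.5)))

for n, t_numpy, t_wrapped in timings:
    print(f"{n:>7d}  numpy: {t_numpy * 1e6:8.2f} us   wrapped: {t_wrapped * 1e6:8.2f} us")
```

If a wrapper only adds constant-time work, the two columns should converge as n grows. A persistent gap at large n, as described for Awkward here, would point to extra linear-time work somewhere outside NumPy.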

[Figure: quick plot comparing Awkward and NumPy timings]

So this is a quandary.

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Comments: 5 (2 by maintainers)

Top GitHub Comments

1 reaction
jpivarski commented, Sep 9, 2020

Looking at your issue, I don’t see how they’re related. They might be related. I need to do a quick profile of this because there’s a lot of linear time here that isn’t due to NumPy, and I can’t imagine what it might be.

0 reactions
tamasgal commented, Sep 9, 2020

Awesome, thanks for both the implementation and the fix!

…and of course for the detailed description of the whole problem

