Stuck on an issue?

Lightrun Answers was designed to reduce the constant googling that comes with debugging 3rd party libraries. It collects links to all the places you might be looking at while hunting down a tough bug.

And, if you’re still stuck at the end, we’re happy to hop on a call to see how we can help out.

BUG: Large overhead in some random functions

See original GitHub issue

This issue is extracted from Gitter (by @andyfaff – I only saw it by chance, so I am creating this before we forget).

Some random generator functions such as random() have a huge overhead compared to their legacy RandomState versions:

import numpy as np

rs = np.random.RandomState()
rg = np.random.default_rng()

%timeit rs.random()
# 432 ns ± 9.18 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

%timeit rg.random()
# 5.19 µs ± 61.6 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

The reason for this is the support for the dtype= keyword argument. A secondary reason (and maybe a second speed issue) is that np.dtype.name is a very slow operation (it could plausibly be cached).
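
As a rough illustration (a minimal sketch, not the actual NumPy internals), the string comparison through np.dtype(dtype).name can be timed against a plain identity check on the default dtype:

import timeit
import numpy as np

# Slow path described above: normalise the dtype and compare its .name string.
slow = timeit.timeit("np.dtype(key).name == 'float64'",
                     globals={"np": np, "key": np.float64}, number=100_000)

# Fast-path style check: plain object identity on the default argument.
fast = timeit.timeit("key is np.float64",
                     globals={"np": np, "key": np.float64}, number=100_000)

print(f"name lookup: {slow:.4f} s   identity check: {fast:.4f} s")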

However, the solution here will be to simply avoid the whole np.dtype(dtype).name construct as much as possible, and maybe to add a fast path for the case when dtype is not passed. Checking np.dtype(dtype).type is np.float64 may be a solution, or dtype is np.float64 (or np.dtype(dtype) is np.dtype(np.float64)) to speed up the default branch.
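
A minimal sketch of what such a fast path could look like (the helper name and dtype coverage are hypothetical, not the implementation that landed in NumPy):

import numpy as np

def _resolve_double_dtype(dtype=np.float64):
    # Fast path: the caller left the default in place, so skip dtype normalisation entirely.
    if dtype is np.float64:
        return np.float64
    # Slow path: normalise whatever was passed, then use identity checks on .type
    # instead of comparing .name strings.
    dt = np.dtype(dtype)
    if dt.type is np.float64:
        return np.float64
    if dt.type is np.float32:
        return np.float32
    raise TypeError(f"Unsupported dtype {dt!r} for random()")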


Checking that we have benchmarks for this – or adding simple small-array ones in a first commit – would be good when this is fixed.
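
A small asv-style benchmark along these lines could cover the scalar and small-array cases (the class and method names here are only a sketch, not the benchmarks that ended up in the repository):

import numpy as np

class ScalarRandom:
    # Times scalar and small-array draws for the new Generator and the legacy RandomState.
    def setup(self):
        self.rg = np.random.default_rng()
        self.rs = np.random.RandomState()

    def time_generator_scalar(self):
        self.rg.random()

    def time_generator_small_array(self):
        self.rg.random(100)

    def time_randomstate_scalar(self):
        self.rs.random()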

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Comments:6 (6 by maintainers)

Top GitHub Comments

1 reaction
eric-wieser commented, Jan 28, 2020

avoid the whole np.dtype(dtype).name construct as much as possible

Agreed, hopefully the slowness is enough to motivate people not to use it 😉

0 reactions
przemb commented, Feb 5, 2020

@mattip Thank you so much 😃 I tried to follow the comments mentioned above; in case of any potential improvements, please let me know. PR: https://github.com/numpy/numpy/pull/15511

Read more comments on GitHub.

Top Results From Across the Web

102037: CPU overhead from inlists much larger in 8.0.22
Bug #102037, CPU overhead from inlists much larger in 8.0.22 ... The "random-points" test uses a large inlist with 100 and 1000 entries …

Minimizing overhead with parallel functions in R [closed]
To limit overhead for moderately large N, it is almost always better to use mc.preschedule = TRUE (i.e. split the work in as …

Overhead (computing) - Wikipedia
In computer science, overhead is any combination of excess or indirect computation time, memory, bandwidth, or other resources that are required to perform …

GWP-ASan: Sampling heap memory error detection in-the-wild
The allocator limits itself to a fixed amount of memory to control memory overhead and samples allocation to the debug allocator to reduce …

Finding and Understanding Bugs in C Compilers - CS @ Utah
generates programs that cover a large subset of C while avoiding the ... generating the types of parameters to a new function). Random …
