ENH: Reduce overhead of configurable data allocation strategy (NEP49)

Proposed new feature or change:

NEP 49 introduced a configurable allocator in NumPy (implemented in https://github.com/numpy/numpy/pull/17582). This mechanism adds some overhead to operations on small arrays and scalars. A benchmark with np.sqrt shows that the overhead can be in the 5-10% range.

Benchmark details

We compare fast_handler_test_compare (numpy main with two performance-related PRs included) with fast_handler_test (the same, but with a hard-coded allocator).

Benchmark

import time

import numpy as np
from numpy import sqrt

print(np.__version__)

# Inputs: a NumPy scalar, a plain Python float, and a small array
w = np.float64(1.1)
wf = 1.1
array = np.random.rand(2)

niter = 1_200_000

for kk in range(3):
    # sqrt of a NumPy float64 scalar
    t0 = time.perf_counter()
    for ii in range(niter):
        _ = sqrt(w)
    dt = time.perf_counter() - t0

    # sqrt of a Python float
    t0 = time.perf_counter()
    for ii in range(niter):
        _ = sqrt(wf)
    dt2 = time.perf_counter() - t0

    # sqrt of a 2-element array
    t0 = time.perf_counter()
    for ii in range(niter):
        _ = sqrt(array)
    dt3 = time.perf_counter() - t0

    print(f'loop {kk}: {dt} {dt2} {dt3}')

Results of fast_handler_test_compare

1.23.0.dev0+1185.gf16125e86
loop 0: 0.7580233269982273 0.7543466200004332 0.5045701469971391
loop 1: 0.7591422369987413 0.7547550320014125 0.5020621660005418
loop 2: 0.7476994270000432 0.7537849910004297 0.5018936799970106

Results of fast_handler_test (allocator overhead removed)

1.23.0.dev0+1186.gbb76538a1
loop 0: 0.6839246829986223 0.6962255100006587 0.4676538419989811
loop 1: 0.6820040509992396 0.6967140100023244 0.468011064996972
loop 2: 0.6811004699993646 0.6971791299984034 0.4678809920005733
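
The relative overhead can be read off the two result listings above. The following is a minimal sketch of that arithmetic. It assumes that the fastest of the three loops is the fairest summary of each case, a choice the issue does not state, and the helper names are made up for illustration.

```python
# Timings copied from the two result listings above
# (seconds per 1_200_000 calls).
# Columns: sqrt(np.float64), sqrt(Python float), sqrt(2-element array).
with_allocator_lookup = [
    (0.7580233269982273, 0.7543466200004332, 0.5045701469971391),
    (0.7591422369987413, 0.7547550320014125, 0.5020621660005418),
    (0.7476994270000432, 0.7537849910004297, 0.5018936799970106),
]
hard_coded_allocator = [
    (0.6839246829986223, 0.6962255100006587, 0.4676538419989811),
    (0.6820040509992396, 0.6967140100023244, 0.468011064996972),
    (0.6811004699993646, 0.6971791299984034, 0.4678809920005733),
]


def best_per_case(runs):
    """Return the fastest time of each column across the repeated loops."""
    return [min(column) for column in zip(*runs)]


def relative_overhead(slow, fast):
    """Extra time of `slow` relative to `fast`, as a percentage of `fast`."""
    return 100.0 * (slow - fast) / fast


labels = ["np.float64 scalar", "Python float", "2-element array"]
overheads = [
    relative_overhead(slow, fast)
    for slow, fast in zip(
        best_per_case(with_allocator_lookup),
        best_per_case(hard_coded_allocator),
    )
]

for label, pct in zip(labels, overheads):
    print(f"{label}: {pct:.1f}% slower with the configurable allocator")
```

With these numbers, all three cases land between roughly 7% and 10%, consistent with the 5-10% range quoted above.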

The allocator is retrieved for every numpy array or scalar constructed, which matters for small arrays and scalars. The overhead comes from two sources:

  1. In methods like PyDataMem_UserNEW the allocator is retrieved via a PyCapsule, which performs some run-time checks.
  2. In PyDataMem_GetHandler there is a call to PyContextVar_Get, which is expensive.

The first item can be addressed by replacing the attribute PyObject *mem_handler in PyArrayObject_fields (currently a PyCapsule) with a PyDataMem_Handler* (unless this attribute is exposed in the public API).

About the second item: PyContextVar_Get calls _PyThreadState_GET internally, so perhaps the allocator can depend on the thread? Maybe we can introduce a mechanism that skips this step if there is only a single allocator (e.g. when PyDataMem_SetHandler has never been called).

@mattip As the author of NEP49, can you comment on this?

Issue Analytics

  • State: closed
  • Created a year ago
  • Comments: 5 (5 by maintainers)

Top GitHub Comments

mattip commented, May 11, 2022 (1 reaction)
  1. The capsule is exposed via PyDataMem_GetHandler and PyDataMem_SetHandler. We could contrive a way to reduce the overhead, at the expense of making the code more complicated.

  2. The need for PyContextVar_Get was discussed on the mailing list and summarized in this comment to the PR.

seberg commented, Jun 11, 2022 (0 reactions)

Going to close this issue for now; it seems we have settled on not worrying about this. If anyone ever comes back here even though it's closed, maybe that will be a reason to reconsider 😉.
