prange generates unnecessary python interactions when indexing a 2d view
Indexing a 2d view in a prange loop generates some unnecessary Python interactions related to the GIL.
Here is a reproducing example. Let me know if I can help with anything.
#cython: boundscheck=False
#cython: language_level=3

from cython.parallel import prange


# Indexing a 2d view in a prange loop: generates lots of python interactions.
def sum_2d_a(float[:, :] view_2d):
    cdef:
        float out = 0
        int row_idx = 0

    for row_idx in prange(view_2d.shape[0], nogil=True):
        out += sum_1d_a(view_2d[row_idx, :])
        # lots of python interactions at the end of the loop

    return out


cdef float sum_1d_a(float[:] view_1d) nogil:
    return 3.4  # irrelevant code


# workaround: pass the whole 2d view and the row index. No interactions in this case.
def sum_2d_b(float[:, :] view_2d):
    cdef:
        float out = 0
        int row_idx = 0

    for row_idx in prange(view_2d.shape[0], nogil=True):
        out += sum_1d_b(view_2d, row_idx)
        # no python interaction

    return out


cdef float sum_1d_b(float[:, :] view_2d, int row_idx) nogil:
    return 3.4  # irrelevant code
Cython version: 0.29.6
Issue Analytics
- State:
- Created 4 years ago
- Comments: 29 (27 by maintainers)
Top GitHub Comments
I can well imagine that that’s a use case for something like scikit-learn, thanks for the insights.
Atomic ints are not costly per se, they are often implemented in hardware and thus fairly cheap in comparison to many other forms of locking mechanisms. But they do represent a form of locking, and that’s still not entirely for free, and (as with all such mechanisms) gets worse under congestion. With very small work packets like in your case, they’re probably getting close to saturation.
I’m wondering, #2227 has been pending for a while now. Would it be a good idea to tweak the semantics of for-loop iteration by axis to return borrowed slices, with their lifetime bound to the loop iteration? I think that would also help your use case.
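Until something like borrowed slices exists, one way to approximate a "borrowed row" is to pass a raw pointer to the start of the row plus its length, so that no 1d slice is created inside the loop. The following is only a minimal sketch, not code from the issue: the names `sum_2d_c` and `sum_1d_c` are hypothetical, and it assumes the array is C-contiguous (hence `float[:, ::1]`) so that each row is a contiguous run of floats.

```cython
#cython: boundscheck=False
#cython: language_level=3

from cython.parallel import prange


# Hypothetical variant (assumption, not from the issue): hand the callee a
# pointer to the first element of the row instead of a 1d memoryview slice.
# Requires a C-contiguous view so that a row is one contiguous block.
def sum_2d_c(float[:, ::1] view_2d):
    cdef:
        float out = 0
        int row_idx = 0
        Py_ssize_t n_cols = view_2d.shape[1]

    for row_idx in prange(view_2d.shape[0], nogil=True):
        # No new slice is created here, only an address is computed.
        out += sum_1d_c(&view_2d[row_idx, 0], n_cols)

    return out


cdef float sum_1d_c(const float* row, Py_ssize_t n) nogil:
    cdef:
        float total = 0
        Py_ssize_t i

    for i in range(n):
        total += row[i]
    return total
```

The trade-off is that the callee loses the shape and stride information a memoryview carries, so this only fits when the memory layout is known in advance. Checking the annotated output of `cython -a` is the way to confirm whether the loop body is free of Python interactions for a given Cython version.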
Yes, I’ll keep an eye on #2227 and the PR.
Thanks a lot for your time and help, @scoder.