Cython and memviews creation

Not an issue, just a Cython-related PSA that we need to keep in mind when reviewing PRs:

We shouldn’t create a 1d view for each sample; this is slow:

cdef float[:, :] X = ...  # big 2d view
for i in range(n_samples):  # same with prange, same with or without the GIL
    f(X[i])  # creates a new 1d view on every iteration

Do this instead, or use pointers, at least for now:

for i in range(n_samples):
    f(X, i)  # and work on X[i, :] in f's code

This applies to any pattern that generates lots of views, so looping over features might not be a good idea either if we expect many features. There might be a “fix” in https://github.com/cython/cython/issues/2227 / cython/cython#3617.

The reason is that creating all these 1d views carries a significant overhead, which comes from Cython's internal ref-counting (details at https://github.com/cython/cython/issues/2987). In the hist-GBDT prediction code, this overhead accounts for more than 30% of the runtime, so it’s not negligible.
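
To make the second pattern concrete, here is a minimal sketch of what "work on X[i, :] in f's code" could look like. The names `row_sum` and `sum_all_rows` are hypothetical and are not taken from the scikit-learn code base, and the compiler directives on the first line are an assumption about how such a module might be configured. The only point is that the helper receives the full 2d view plus a row index, so no 1d view is sliced out inside the loop.

```cython
# cython: boundscheck=False, wraparound=False
# Illustrative sketch only: row_sum and sum_all_rows are hypothetical names.

cdef float row_sum(float[:, :] X, Py_ssize_t i):
    # Index the 2d view element by element; X[i] is never sliced out,
    # so no per-row 1d view is created here.
    cdef Py_ssize_t j
    cdef float total = 0
    for j in range(X.shape[1]):
        total += X[i, j]
    return total


def sum_all_rows(float[:, :] X):
    # Pass the whole view and the row index to the helper on each iteration.
    cdef Py_ssize_t i
    cdef float total = 0
    for i in range(X.shape[0]):
        total += row_sum(X, i)
    return total
```

Once compiled, `sum_all_rows` could be called from Python with any 2d buffer of C `float` items, for example a float32 NumPy array. Whether this shape is actually faster in a given code path is worth measuring rather than assuming.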

Note that:

CC @scikit-learn/core-devs

Issue Analytics

  • State: open
  • Created 3 years ago
  • Reactions: 9
  • Comments:5 (4 by maintainers)

Top GitHub Comments

2 reactions
da-woods commented, Jun 6, 2020

FYI I don’t think https://github.com/cython/cython/pull/3617 will really help with speed here - it just makes

for xi in X:
    f(xi)

equivalent (speed-wise) to

for i in range(n_samples):
    f(X[i])

You’d still be better off using your second version if you really need the best performance.

1 reaction
glemaitre commented, May 27, 2020

Shall we pin this issue (at least to easily come back to the discussion)?

