Update setting data pointers for Cython 3
Need to update the following locations:
- _libs/window/aggregations.pyx: bufarr.data
- _libs/reduction.pyx: chunk.data

Setting .data at both locations just started failing.
cc @pandas-dev/pandas-core @pandas-dev/pandas-triage if anyone has insights.
Issue analytics: created 3 years ago; 34 comments (33 by maintainers).
Top GitHub Comments
Well, “rip it all out” is one way to fix it!
I spent some more time today looking at what this code is doing and came up with the following notes (apologies if I am saying really obvious things):
The Slider and BlockSlider classes implement views into a numpy array by mutating an existing array in place (repointing its data pointer) rather than creating new arrays. The mutated array is then used to update “cached objects” in the calling class by updating the pandas-side block manager details. I suspect this is the source of the statsmodels issues mentioned above, as the code aggressively changes things underneath the eventual user-exposed objects.
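To make that hazard concrete, here is a small pure-Python/NumPy sketch (not pandas code; both function names are hypothetical) contrasting two patterns: reusing one mutable buffer for every window, which mimics the observable effect of repointing a single array, versus handing out a fresh view per window. With the reused buffer, every object the caller kept ends up reflecting the last window.

```python
import numpy as np

def windows_reusing_buffer(values, starts, ends):
    """Hypothetical: mimic repointing one array per window via a shared buffer."""
    size = max((e - s for s, e in zip(starts, ends)), default=0)
    buf = np.empty(size, dtype=values.dtype)
    out = []
    for s, e in zip(starts, ends):
        n = e - s
        buf[:n] = values[s:e]   # overwrite the shared buffer in place
        out.append(buf[:n])     # caller keeps a view of the *same* buffer
    return out

def windows_as_views(values, starts, ends):
    """Hypothetical: hand out an independent view of `values` per window."""
    return [values[s:e] for s, e in zip(starts, ends)]

values = np.arange(6.0)
starts, ends = [0, 2, 4], [2, 4, 6]

reused = windows_reusing_buffer(values, starts, ends)
views = windows_as_views(values, starts, ends)
print([w.tolist() for w in reused])  # [[4.0, 5.0], [4.0, 5.0], [4.0, 5.0]]
print([w.tolist() for w in views])   # [[0.0, 1.0], [2.0, 3.0], [4.0, 5.0]]
```

Both variants avoid copying the input per window in the caller's hands, but only the second leaves earlier results intact once iteration moves on.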
The change that broke things is that Cython now disallows replacing the guts of a numpy array (which seems fair!). My guess is that the performance gains come both from avoiding memory thrashing and from not falling back to the Python layer. The Cython docs say that when you index a numpy array with [] it falls back to Python (I assume because the inputs are too variable), which is probably the source of the major performance regressions.

I am not super clear on how the numpy nditer interface works, but it looks like it is focused on getting an iterator over single elements, or at least fixed steps through the array, whereas for this code we need iteration over variable-size windows.
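As a point of comparison (an alternative NumPy technique, not something proposed in this thread), NumPy can already reduce over variable-size contiguous segments without per-element Python indexing: ufunc.reduceat takes the start offset of each segment. This sketch assumes the segments are contiguous, non-empty, and cover the array in order, which is stronger than the general overlapping-window case; reduceat does not produce empty reductions when two offsets are equal.

```python
import numpy as np

values = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
# Hypothetical segment starts: [0, 1), [1, 4), [4, 6)
starts = np.array([0, 1, 4])

# Sum each variable-size segment in one vectorized call
seg_sums = np.add.reduceat(values, starts)

# Segment lengths, then per-segment means
counts = np.diff(np.append(starts, len(values)))
seg_means = seg_sums / counts

print(seg_sums.tolist())   # [1.0, 9.0, 11.0]
print(seg_means.tolist())  # [1.0, 3.0, 5.5]
```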
It looks like the way to do this is with memoryviews ( https://cython.readthedocs.io/en/latest/src/userguide/numpy_tutorial.html#efficient-indexing-with-memoryviews ), but those seem to require knowing the element type up front.
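On the typing point: a Cython typed memoryview such as double[:] fixes the element type at compile time, and Cython's fused types are the usual way to generate one specialization per dtype. At the Python level, the buffer protocol carries the element format at runtime and slicing a memoryview is zero-copy, as this small sketch shows. It only illustrates the buffer semantics and is not a drop-in replacement for the Cython code.

```python
import numpy as np

arr = np.arange(10, dtype=np.float64)
mv = memoryview(arr)             # buffer-protocol view of arr; no copy
print(mv.format, mv.itemsize)    # element type is discovered at runtime

window = mv[3:7]                 # slicing a memoryview is also zero-copy
as_array = np.asarray(window)    # wrap the slice back into an ndarray
as_array[0] = -1.0               # writes through to the original array
print(arr[3])
```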
My suspicion is that the right solution here is to do something like what @mattip suggested above and, in def move, use the pointers we have to the underlying data to fabricate new numpy arrays covering just the sub-section that is needed. These classes appear to be used only internally by the reduction module, so I do not think there are any back-compatibility concerns with completely rewriting them.
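A Python-level sketch of "fabricate a new array over just the needed sub-section" instead of repointing an existing one: np.ndarray can be constructed directly on another object's buffer with a byte offset. In Cython the equivalent would presumably go through the NumPy C API or plain slicing; the helper name window_view is hypothetical, and the sketch assumes a 1-D, C-contiguous base array.

```python
import numpy as np

def window_view(base, start, stop):
    """Hypothetical helper: new ndarray over base[start:stop], no copy.

    Assumes `base` is 1-D and C-contiguous, so element i lives at byte
    offset i * itemsize from the start of the buffer.
    """
    if base.ndim != 1 or not base.flags.c_contiguous:
        raise ValueError("sketch only handles 1-D contiguous arrays")
    if not 0 <= start <= stop <= base.shape[0]:
        raise IndexError("window out of bounds")
    return np.ndarray(
        shape=(stop - start,),
        dtype=base.dtype,
        buffer=base,                    # reuse base's memory
        offset=start * base.itemsize,   # byte offset of the first element
    )

data = np.arange(8, dtype=np.int64)
w = window_view(data, 2, 5)
print(w)                         # [2 3 4]
print(np.shares_memory(w, data))  # True
```

Plain slicing (data[2:5]) yields an equivalent view; the explicit constructor just mirrors the pointer-plus-offset arithmetic described above, and each call returns a new array object, so previously returned windows are never mutated.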
attn @scoder for guidance on which of these methods (or one I do not see) is the best path.