Memoryviews and compression
Currently there are some issues with how we handle memoryviews.
- We assume that `len(mv) == mv.nbytes` in several places. This is not the case for non-trivial shape or itemsize.
- We slice into memoryviews in at least one place (see `distributed/protocol/utils.py`), which is also not correct for non-trivial shape or itemsize.
- To support these concerns the NumPy serialization code currently always produces memoryviews that have strides `(1,)`. This loses important information that stops intelligent compression.
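The first two points can be reproduced with a small sketch using only the standard library. The variable names are illustrative and not taken from the `distributed` code base.

```python
import array

# Four C doubles in a 1-D buffer (standard library only, no NumPy needed).
mv = memoryview(array.array("d", [1.0, 2.0, 3.0, 4.0]))

n_items = len(mv)     # length of the first dimension, counted in elements
n_bytes = mv.nbytes   # total size in bytes: n_items * mv.itemsize here

# Slicing is also element-based: this keeps two doubles, not two bytes.
first_two = mv[:2]

# Casting to unsigned bytes gives a view where len() and nbytes agree
# and slices are byte offsets. No data is copied.
flat = mv.cast("B")
first_two_bytes = flat[:2]
```

Any code that computes offsets or frame sizes from `len(mv)` is therefore only correct for byte-format, one-dimensional views such as `flat`.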
I think that ideally we would propagate itemsize and strides information on memoryviews until after we pass through compression. Then we might consider flattening memoryviews before they enter the network layer (`tornado.iostream.IOStream.write`) or perhaps earlier. https://stackoverflow.com/questions/44486048/how-to-flatten-a-memoryview
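One possible way to flatten a memoryview just before the network layer is sketched below. The helper name `flatten` is hypothetical and is not an existing function in `distributed`; the sketch assumes that a copy is acceptable for views that are not C-contiguous.

```python
import array


def flatten(buf):
    """Return a 1-D unsigned-byte memoryview over ``buf``.

    Hypothetical helper for illustration only.
    """
    mv = memoryview(buf)
    if mv.nbytes == 0:
        # cast() rejects views with a zero in their shape.
        return memoryview(b"")
    if mv.c_contiguous:
        # Zero-copy: cast() supports C-contiguous -> 1-D byte views.
        return mv.cast("B")
    # Strided views cannot be cast, so fall back to a copy.
    return memoryview(mv.tobytes())


# A 2 x 3 view of doubles built from raw bytes.
raw = array.array("d", range(6)).tobytes()
mv2d = memoryview(raw).cast("d", shape=[2, 3])
flat = flatten(mv2d)

# A non-contiguous view: every second byte of the same buffer.
strided = memoryview(raw)[::2]
flat_strided = flatten(strided)
```

After flattening, `len(flat) == flat.nbytes`, so downstream code that slices by byte offset is safe again.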
Currently things seem safe but inefficient when itemsize might be useful.
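To illustrate why itemsize matters to a compressor, here is a minimal sketch of a byte shuffle applied before compression. It uses `zlib` as a stand-in and hand-written `shuffle`/`unshuffle` helpers; both are assumptions made for illustration, not the compression path that `distributed` actually uses.

```python
import array
import zlib


def shuffle(data: bytes, itemsize: int) -> bytes:
    """Group byte 0 of every item, then byte 1, and so on.

    Assumes len(data) is a multiple of itemsize.
    """
    return b"".join(data[i::itemsize] for i in range(itemsize))


def unshuffle(data: bytes, itemsize: int) -> bytes:
    """Inverse of shuffle()."""
    n = len(data) // itemsize
    out = bytearray(len(data))
    for i in range(itemsize):
        out[i::itemsize] = data[i * n:(i + 1) * n]
    return bytes(out)


mv = memoryview(array.array("q", range(10000)))  # 64-bit integers
raw = mv.tobytes()

# The shuffle needs mv.itemsize; a view flattened to strides (1,) and
# format "B" would report itemsize 1 and the shuffle would be a no-op.
compressed = zlib.compress(shuffle(raw, mv.itemsize))
restored = unshuffle(zlib.decompress(compressed), mv.itemsize)
```

Comparing `len(compressed)` with `len(zlib.compress(raw))` on your own data shows whether the extra information pays off for a given workload.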
Issue Analytics
- State:
- Created 6 years ago
- Comments:19 (19 by maintainers)
Top GitHub Comments
This was helpful. Thank you. It looks like this problem only arises when we are moving enough data around.
On Wed, Jun 14, 2017 at 3:42 AM, Simon Perkins notifications@github.com wrote:
We now pass through dtype information. We don’t yet pass through strides. I’ll wait on strides until we have a concrete application that needs N-d compression. Closing.