Memoryviews and compression

Currently there are some issues with how we handle memoryviews.

  1. We assume that len(mv) == mv.nbytes in several places. This does not hold for memoryviews with a non-trivial shape or itemsize.
  2. We slice into memoryviews in at least one place (see distributed/protocol/utils.py), which is also incorrect for a non-trivial shape or itemsize.
  3. To accommodate these assumptions, the NumPy serialization code currently always produces memoryviews that have strides (1,). This discards information that would otherwise enable intelligent compression.
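To make the first two points concrete, here is a minimal sketch using only the standard library. The variable names are illustrative and this is not code from distributed; it only shows that len() and slicing on a memoryview count items along the first dimension rather than bytes.

```python
from array import array

# 12 double-precision items; itemsize is typically 8 bytes.
values = array("d", range(12))
mv = memoryview(values)

n_items = len(mv)      # number of items, not number of bytes
n_bytes = mv.nbytes    # n_items * mv.itemsize

# Slicing is also item-based: this takes 8 items, not 8 bytes.
head = mv[:8]
head_nbytes = head.nbytes

# With a non-trivial shape, len() only reports the first dimension.
grid = mv.cast("B").cast("d", shape=[3, 4])
grid_len = len(grid)
grid_nbytes = grid.nbytes

print(n_items, n_bytes, head_nbytes, grid_len, grid_nbytes)
```

Any code that treats len(mv) as a byte count, or slices mv[:n] expecting n bytes, is therefore only correct for one-dimensional views whose itemsize is 1.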

I think that ideally we would propagate itemsize and strides information on memoryviews until after we pass through compression. Then we might consider flattening memoryviews before they enter the network layer (tornado.iostream.IOStream.write), or perhaps earlier. See https://stackoverflow.com/questions/44486048/how-to-flatten-a-memoryview
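One possible way to do the flattening step is memoryview.cast("B"), sketched below. The helper name flatten_to_bytes is hypothetical and not part of distributed; the copy for non-contiguous input is an assumption about what a caller would want, since cast() only works on C-contiguous views.

```python
from array import array


def flatten_to_bytes(mv):
    """Hypothetical helper: return a 1-D byte view of ``mv``.

    For C-contiguous input this is zero-copy. Non-contiguous input is
    copied first, because memoryview.cast() requires contiguity.
    """
    if not mv.c_contiguous:
        mv = memoryview(mv.tobytes())
    return mv.cast("B")


typed = memoryview(array("d", range(12)))
flat = flatten_to_bytes(typed)

# After flattening, the byte-oriented assumptions hold again.
print(flat.format, flat.ndim, len(flat) == flat.nbytes)

# A strided (non-contiguous) view takes the copying path.
strided = typed[::2]
flat_strided = flatten_to_bytes(strided)
```

Flattening this late means the compression step can still see format and itemsize, while the network layer only ever receives plain byte views.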

Currently things seem safe, but inefficient in cases where itemsize information would be useful.
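As an illustration of why itemsize can matter to a compressor, the sketch below applies a simple byte-shuffle (grouping the k-th byte of every item together) before zlib. This is an assumed example technique, not what distributed does; the function names shuffle and unshuffle are hypothetical, and the point is only that the transform cannot be applied without knowing itemsize.

```python
import zlib
from array import array


def shuffle(data, itemsize):
    """Group byte 0 of every item, then byte 1, and so on."""
    return b"".join(data[p::itemsize] for p in range(itemsize))


def unshuffle(shuffled, itemsize):
    """Invert shuffle(); len(shuffled) must be a multiple of itemsize."""
    n = len(shuffled) // itemsize
    out = bytearray(len(shuffled))
    for p in range(itemsize):
        out[p::itemsize] = shuffled[p * n:(p + 1) * n]
    return bytes(out)


values = array("q", range(10000))   # signed 64-bit integers
mv = memoryview(values)
raw = mv.tobytes()

plain_size = len(zlib.compress(raw))
shuffled_size = len(zlib.compress(shuffle(raw, mv.itemsize)))

print(plain_size, shuffled_size)
```

If the view has already been flattened to strides (1,) and itemsize 1 before compression, the compressor has no way to pick the right grouping.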

Issue Analytics

  • State: closed
  • Created 6 years ago
  • Comments: 19 (19 by maintainers)

Top GitHub Comments

1 reaction
mrocklin commented, Jun 14, 2017

This was helpful. Thank you. It looks like this problem only arises when we are moving enough data around.

On Wed, Jun 14, 2017 at 3:42 AM, Simon Perkins wrote:

Note that there are some arrays with zero dimensions in them, but commenting them out did not prevent the problem occurring.

0 reactions
mrocklin commented, Jun 16, 2017

We now pass through dtype information. We don’t yet pass through strides. I’ll wait on strides until we have a concrete application that needs N-d compression. Closing.
