Adding `.view(...)`/reinterpret `dtype` method


In NumPy (and some other libraries), arrays have a view method that reinterprets the data as another dtype. This differs from astype: rather than converting values, it takes data that may not be meaningfully typed, such as bytes or bytearray, and applies different dtype metadata on top of it. Reinterpreting data this way is particularly useful in distributed settings, where data goes through serialization/deserialization steps in which metadata is extracted, sent along, and then reapplied to the data. It can come up in other situations as well.
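
The round trip described above might look roughly like the following. This is a minimal sketch assuming NumPy; `serialize` and `deserialize` are hypothetical helper names used only for illustration, and the metadata layout (a dict with `dtype` and `shape`) is an assumption, not something taken from the issue.

```python
import numpy as np


def serialize(arr):
    """Split an array into raw bytes plus the metadata needed to rebuild it.

    Hypothetical helper: the metadata format is an assumption.
    """
    arr = np.ascontiguousarray(arr)
    meta = {"dtype": arr.dtype.str, "shape": arr.shape}
    return arr.tobytes(), meta


def deserialize(payload, meta):
    """Reinterpret received bytes using the metadata that was sent along."""
    raw = np.asarray(memoryview(payload))  # the bytes arrive typed as uint8
    # view() reapplies the dtype without converting values
    return raw.view(meta["dtype"]).reshape(meta["shape"])


original = np.array([[0.0, 1.0], [2.0, 3.0]], dtype=np.float32)
payload, meta = serialize(original)
restored = deserialize(payload, meta)
```

The receiving side only ever sees a uint8 buffer, so the step that matters is the `view` call that patches the intended dtype back on.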

cc @rgommers @kgryte (since we discussed this briefly earlier)

Issue Analytics

  • State: open
  • Created 2 years ago
  • Comments: 9 (9 by maintainers)

Top GitHub Comments

rgommers commented, Sep 16, 2021

Thanks, that is helpful. The “it has the wrong dtype” has come up in at least one other place I think, using DLPack to transfer bool arrays - those weren’t supported, so it was done as uint8.

I think the next step here is to figure out how other array libraries do this (if they allow it).

jakirkham commented, Sep 16, 2021

> One could do np.asarray(memoryview(buf)).view(fmt) for example
>
> Equivalent to np.asarray(memoryview(buf), dtype=fmt)?

Not if dtype=... means .astype(...). I think this gets back into our discussion earlier.

Maybe a short example helps? Imagine b is received over the wire along with relevant metadata. The data is three float32 numbers (IOW Out[3] is what we want).

In [1]: import numpy as np

In [2]: b = b"\x00\x00\x00\x00\x00\x00\x80?\x00\x00\x00@"

In [3]: np.asarray(memoryview(b)).view(np.float32)
Out[3]: array([0., 1., 2.], dtype=float32)

In [4]: np.asarray(memoryview(b), dtype=np.float32)
Out[4]: 
array([  0.,   0.,   0.,   0.,   0.,   0., 128.,  63.,   0.,   0.,   0.,   64.], dtype=float32)

> I think I understand the use case, but there’s no way to get an array that’s untyped in the API, so the “reinterpret memory” use case seems quite niche.

In our usual case it is not so much that the data is untyped, but the type doesn’t necessarily match what it should. Taking the example above, we have…

In [6]: np.asarray(memoryview(b)).dtype
Out[6]: dtype('uint8')

IOW we often have something that is uint8 or int8.

> And I expect that there will be libraries that don’t allow this kind of thing, because memory layout is an implementation detail not exposed to the user. So I’m leaning towards “out of scope” here.

For clarity, am not looking to manipulate the underlying memory in any way and don’t really care how it is represented. Am just trying to patch on the correct formatting. Another way to think of this would be altering the dtype DLPack might use. Suppose one could hack around with the DLPack representation before it goes through the protocol, but that feels a bit clumsy.

> It seems like serialization falls under I/O, which is out of scope completely.

It is certainly useful in I/O contexts (communication, file I/O, etc.). Though am not really looking for the protocol to handle the I/O portion or even serialization. Just the ability to perform this cast.
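
The difference between this cast and a value conversion can be checked directly. A small sketch, assuming NumPy and reusing the bytes from the example in this comment; `np.frombuffer` is included only as another NumPy route to the same reinterpretation and was not mentioned in the thread.

```python
import numpy as np

b = b"\x00\x00\x00\x00\x00\x00\x80?\x00\x00\x00@"
raw = np.asarray(memoryview(b))       # uint8 array with 12 elements

viewed = raw.view(np.float32)         # reinterpret: same bytes, new dtype
converted = raw.astype(np.float32)    # convert: each uint8 value becomes a float

# The view does not copy; it shares the underlying buffer with raw.
shares_memory = np.shares_memory(raw, viewed)

# Reading the buffer with the target dtype gives the same result as the view.
from_buffer = np.frombuffer(b, dtype=np.float32)
```

Here `viewed` has 3 elements while `converted` has 12, which is the same contrast as Out[3] versus Out[4] above.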


