Copy Memory-mapped file into CuPy array


Would it be possible to copy the output of mmap.mmap into a CuPy array of type void? I want to read a binary file straight into GPU memory.

What I can do now:

  1. I can use the examples here and then use cp.asarray. That’s not optimal, so I’d like to read the binary straight into GPU memory.
  2. I’m able to do this with C++ here, but that would require Cython bindings and memory-management handling for the rest of our CuPy functions.

Would it be possible to implement something like #2 with CuPy?

Example 1. Copy data from file to mm as raw data. Then copy to CuPy array and cast?

import mmap

import cupy as cp

# filename and num_bytes are assumed defined; open in binary mode since
# the mapping is read-only.
with open(filename, "rb") as f:
    mm = mmap.mmap(
        f.fileno(),
        num_bytes,
        flags=mmap.MAP_PRIVATE,
        prot=mmap.PROT_READ,
    )

d_binary = cp.asarray(mm, dtype=cp.complex64)

Example 2. Copy data from file to mm as raw data. Then copy binary data to CuPy array? Once on the GPU, I can launch a kernel to reinterpret the data.

with open(filename, "rb") as f:
    mm = mmap.mmap(
        f.fileno(),
        num_bytes,
        flags=mmap.MAP_PRIVATE,
        prot=mmap.PROT_READ,
    )

d_binary = cp.asarray(mm)

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Comments: 17 (16 by maintainers)

Top GitHub Comments

jakirkham commented, Jun 23, 2020 (4 reactions)

It might be worth exploring different mmap flags as well, @mnicely.

In particular there are some MAP_HUGE* flags, which use larger page sizes, allowing the GPU to copy more data over to the device at once and perform fewer copies for the same total amount of data. NumPy does something similar for memory it allocates, which winds up being pretty useful.

Another interesting option is MAP_LOCKED. This would allow one to page lock all of the memory, which effectively is like making the page size the entire block of memory and not allowing the system to unpage it. Though the man page suggests using mlock if one really wants to avoid page faults (which we would). I don’t see a Python implementation of this, but it should be accessible through ctypes or Cython.

I would pick either hugepages or page locking. I don’t think these would make sense together (though anyone please feel free to correct me if I’m wrong here).

Also make sure to unmap files once done with them (by using Python context managers or explicitly calling .close()). Otherwise large amounts of page cache will remain occupied and degrade overall program and/or system performance.

Python may not include all of these flags. So it is possible one would need to look in the corresponding Linux header and figure out the values for these flags. Don’t forget to bitwise-or flags since these are just being passed straight to C, which would interpret them that way.

leofang commented, Jun 12, 2020 (2 reactions)

That’s exactly right. Zero-copy is the most common reason to use mmap. Another way to wrap a mmap with a NumPy array is to do this:

mm = mmap.mmap(...)
arr = np.ndarray(..., buffer=mm, ...)

Top Results From Across the Web

  • numpy - How to use CUDA pinned "zero-copy" memory for a ...
  • cupy.load — CuPy 11.4.0 documentation
  • The mmap() copy-on-write trick: reducing memory usage of ...
  • How to use CUDA pinned "zero-copy" memory for a ... - Reddit
  • chainer/develop-cupy - Gitter
