Chunk/memory management for CuPy-backed arrays
What happened: I’m trying to understand what the best approach to chunking is for doing Dask operations on CuPy-backed arrays, and whether the current behaviour is expected or a bug. The calculation below is much slower on the GPU than on the CPU.
To test this you’ll need a GPU and an appropriate installation of CuPy. I am using a GTX 1080 Ti with 11 GB of memory and CuPy 9.0.0.
The following is a simple example where I try to use the GPU to sum a larger-than-memory array, with a chunk size of 1 GB:
import cupy as cp
import numpy as np
import dask.array as da
huge_array = da.ones(
    (5000, 5000, 200),
    chunks=(5000, 5000, 5),
    dtype=float,
)
huge_array.nbytes / 1e9 # 40 GB in size
np.prod(huge_array.chunksize, dtype=float) * huge_array.dtype.itemsize / 1e9 # chunk size of 1 GB
huge_array = huge_array.map_blocks(cp.asarray) # make it a CuPy-backed array
array_sum = da.sum(huge_array)
array_sum.compute()
Upon compute, I get the following warning after a little while:
C:\Users\thomasaar\Miniconda3\envs\gpu2\lib\site-packages\cupy\_creation\from_data.py:66: PerformanceWarning: Using synchronous transfer as pinned memory (1000000000 bytes) could not be allocated. This generally occurs because of insufficient host memory. The original error was: cudaErrorMemoryAllocation: out of memory
  return _core.array(a, dtype, False, order)
What you expected to happen:
I expected the example above to run without memory errors. The chunks of the ones array should be discardable once they have been summed, so IMO we shouldn’t end up in a situation where the 1 GB complained about in the warning above can’t be allocated.
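One way to narrow down whether memory is genuinely exhausted or merely cached would be to inspect CuPy’s memory pools while this runs: CuPy pools both device memory and pinned host memory by default, so blocks that Dask has already released may still be held by the pools rather than returned to the driver or the OS. A small sketch using CuPy’s pool introspection (the print calls are just illustrative):

import cupy as cp

# Device memory pool: bytes currently in use vs. bytes CuPy keeps cached.
dev_pool = cp.get_default_memory_pool()
print(dev_pool.used_bytes(), dev_pool.total_bytes())

# Pinned host memory pool: the staging buffers the PerformanceWarning above
# complains about are allocated from here.
pinned_pool = cp.get_default_pinned_memory_pool()
print(pinned_pool.n_free_blocks())

# Cached-but-unused blocks can be handed back explicitly.
dev_pool.free_all_blocks()
pinned_pool.free_all_blocks()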
Some thoughts
Does Dask create the ones chunks on the GPU, or does it create them on the CPU first and then copy them to the GPU? Could either of these be the slow step? Or is CuPy not freeing up memory when it should?
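For what it’s worth, the two-step pattern in the example (da.ones building NumPy chunks on the host, then map_blocks(cp.asarray) copying each one to the device) is what needs the pinned staging buffers mentioned in the warning. A rough sketch of creating the chunks on the GPU in the first place, assuming a Dask/CuPy combination recent enough to support NEP-35’s like= argument for array creation:

import cupy as cp
import dask.array as da

# Each chunk is created directly on the GPU, so no NumPy chunk is
# materialised on the host and no pinned host buffer is needed for a copy.
huge_array = da.ones(
    (5000, 5000, 200),
    chunks=(5000, 5000, 5),
    dtype=float,
    like=cp.empty(()),  # NEP-35 dispatch: chunks become cupy.ndarray
)

array_sum = da.sum(huge_array)
array_sum.compute()

If like= isn’t available in your versions, mapping a function that calls cp.ones over the chunks also avoids the host-to-device copy, although the NumPy chunks are then still briefly created on the host.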
Environment:
- Dask version: 2021.04.1
- Python version: 3.9.4
- Operating System: Windows 10
- Install method (conda, pip, source): conda for all packages.
Comments (7, 5 by maintainers)
To add to @quasiben’s answer, we have a setup_memory_pool function in the Dask-CUDA benchmarks to do exactly that.
@astrophysaxist unfortunately, we don’t really have a good example.
It can be controlled with a combination of Dask and CuPy: https://docs.dask.org/en/latest/configuration-reference.html#rmm on the Dask side, plus CuPy’s allocator configuration.
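For what it’s worth, a rough sketch of what such a per-worker setup can look like, assuming RMM is installed; the helper below only mirrors the idea of the benchmark helper and is not the actual Dask-CUDA code, and the pool size is an arbitrary example:

import cupy as cp
import rmm
from dask.distributed import Client

def setup_memory_pool(pool_size=None):
    # Create an RMM pool on this worker and route CuPy allocations through
    # it, so device memory is reused instead of being allocated and freed
    # through the CUDA driver for every chunk.
    rmm.reinitialize(pool_allocator=True, initial_pool_size=pool_size)
    cp.cuda.set_allocator(rmm.rmm_cupy_allocator)

client = Client()                             # or Client("scheduler-address:8786")
client.run(setup_memory_pool, 4_000_000_000)  # run once on every worker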
Or with Dask-CUDA directly:
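For example (a minimal sketch; rmm_pool_size is a LocalCUDACluster argument, the "8GB" value is arbitrary, and depending on the Dask-CUDA version you may still need to point CuPy’s allocator at RMM as in the sketch above):

from dask.distributed import Client
from dask_cuda import LocalCUDACluster

# One worker per GPU, each with an RMM memory pool pre-allocated, so that
# allocations are served from the pool rather than the CUDA driver.
cluster = LocalCUDACluster(rmm_pool_size="8GB")
client = Client(cluster)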
Can I ask you to file an issue on the Dask-CUDA issue tracker (https://github.com/rapidsai/dask-cuda/)? There we can work out the best way to document how to use RMM with Dask.