
Chunk/memory management for CuPy-backed arrays


What happened: I’m trying to understand the best approach to chunking when doing Dask operations on CuPy-backed arrays, and whether the current behaviour is expected or a bug. The calculation below is much slower on the GPU than on the CPU.

To test this you’ll need a GPU and an appropriate installation of CuPy. I am using a GTX 1080TI with 11 GB of memory and CuPy 9.0.0.

The following is a simple example where I try to use the GPU to sum a larger-than-memory array, with a chunk size of 1 GB:

import cupy as cp
import numpy as np
import dask.array as da

huge_array = da.ones(
    (5000, 5000, 200), 
    chunks=(5000, 5000, 5), 
    dtype=float)

huge_array.nbytes / 1e9 # 40 GB in size

np.prod(huge_array.chunksize, dtype=float) * huge_array.dtype.itemsize / 1e9 # chunk size of 1 GB

huge_array = huge_array.map_blocks(cp.asarray) # make it a CuPy-backed array

array_sum = da.sum(huge_array)
array_sum.compute()

Upon compute, I get the following warning after a little while:

C:\Users\thomasaar\Miniconda3\envs\gpu2\lib\site-packages\cupy\_creation\from_data.py:66: PerformanceWarning: Using synchronous transfer as pinned memory (1000000000 bytes) could not be allocated. This generally occurs because of insufficient host memory. The original error was: cudaErrorMemoryAllocation: out of memory
  return _core.array(a, dtype, False, order)

What you expected to happen: I expected the above example to run without memory errors. The chunks of the ones array should be discardable after summing, IMO, so we shouldn’t end up in a situation where the 1 GB allocation complained about in the warning above fails.

Some thoughts: Does Dask create the ones chunks on the GPU, or does it create them on the CPU first and then copy them to the GPU? Could either of these be the slow step? Or is CuPy not freeing memory when it should?
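
For reference, one way to rule out the host-to-device copy as the slow step is to build the chunks as CuPy arrays from the start instead of converting them with map_blocks. A minimal sketch, assuming a Dask version new enough to support the NEP-35 like= argument on array-creation functions:

import cupy as cp
import dask.array as da

# Each chunk is created as a CuPy array directly, so no NumPy chunk is
# allocated on the host and no host-to-device transfer is needed.
huge_array = da.ones(
    (5000, 5000, 200),
    chunks=(5000, 5000, 5),
    dtype=float,
    like=cp.empty(()))

array_sum = da.sum(huge_array)
array_sum.compute()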

Environment:

  • Dask version: ‘2021.04.1’
  • Python version: 3.9.4
  • Operating System: Windows 10
  • Install method (conda, pip, source): conda for all packages.

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Comments: 7 (5 by maintainers)

Top GitHub Comments

1 reaction
pentschev commented, Sep 21, 2021

To add to @quasiben’s answer, we have a setup_memory_pool function in the Dask-CUDA benchmarks that does exactly that.
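
For context, a rough sketch of what such a pool setup typically looks like (the helper name and pool size here are illustrative, not the actual Dask-CUDA setup_memory_pool, and it assumes rmm is installed alongside CuPy):

import cupy as cp
import rmm

def setup_rmm_pool(pool_size=10 * 2**30):
    # Illustrative helper: re-initialize RMM with a pooled allocator so
    # repeated 1 GB chunk allocations reuse memory from the pool.
    rmm.reinitialize(pool_allocator=True, initial_pool_size=pool_size)
    # Route CuPy allocations through the RMM pool as well.
    cp.cuda.set_allocator(rmm.rmm_cupy_allocator)

# With a distributed client, run it on every worker so each GPU gets a pool:
# client.run(setup_rmm_pool)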

1 reaction
quasiben commented, Sep 21, 2021

@astrophysaxist unfortunately, we don’t really have a good example.

It can be controlled with a combination of Dask configuration and CuPy: https://docs.dask.org/en/latest/configuration-reference.html#rmm, plus:

client.run(cupy.cuda.set_allocator, rmm.rmm_cupy_allocator)
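
Spelled out as a full snippet (a sketch; the cluster or scheduler you connect to is whatever your deployment uses, and rmm must be available on the workers):

import cupy
import rmm
from dask.distributed import Client

client = Client()  # or Client("<scheduler-address>") for an existing cluster
# Set CuPy's allocator to RMM on every worker process.
client.run(cupy.cuda.set_allocator, rmm.rmm_cupy_allocator)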

Or with Dask-CUDA directly:

from dask_cuda import LocalCUDACluster
cluster = LocalCUDACluster(rmm_pool_size="10GB", device_memory_limit=..., )
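
And to drive the original example with such a cluster, something along these lines (a sketch; the device_memory_limit value here is just a placeholder set below the GPU’s 11 GB capacity):

from dask.distributed import Client
from dask_cuda import LocalCUDACluster

# One worker per GPU, with an RMM pool and spilling to host memory once
# device memory use passes the limit (placeholder values).
cluster = LocalCUDACluster(rmm_pool_size="10GB", device_memory_limit="8GB")
client = Client(cluster)

array_sum.compute()  # the sum from the example above now runs on this cluster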

Can I ask you to file an issue on the Dask-CUDA issue tracker: https://github.com/rapidsai/dask-cuda/

There we can figure out the best way to document how to use RMM with Dask.
