Using fft/ifft with large ndarray + managed allocator leads to "illegal memory access was encountered"
Hello. I’m trying to apply the cupyx.scipy.fft.fft/ifft routines to very large ndarrays. Here is the output of cupy.show_config():
OS : Linux-5.11.14-100.fc32.x86_64-x86_64-with-glibc2.2.5
CuPy Version : 9.0.0
NumPy Version : 1.20.2
SciPy Version : 1.6.2
Cython Build Version : 0.29.22
Cython Runtime Version : None
CUDA Root : /usr
CUDA Build Version : 11020
CUDA Driver Version : 11030
CUDA Runtime Version : 11020
cuBLAS Version : 11401
cuFFT Version : 10400
cuRAND Version : 10203
cuSOLVER Version : (11, 1, 0)
cuSPARSE Version : 11400
NVRTC Version : (11, 2)
Thrust Version : 101000
CUB Build Version : 101000
Jitify Build Version : 60e9e72
cuDNN Build Version : 8101
cuDNN Version : 8100
NCCL Build Version : None
NCCL Runtime Version : None
cuTENSOR Version : None
cuSPARSELt Build Version : None
Device 0 Name : NVIDIA GeForce RTX 2070 SUPER
Device 0 Compute Capability : 75
Device 0 PCI Bus ID : 0000:0A:00.0
My first problem was that an ndarray larger than the available GPU memory could not be copied to my device (RTX 2070S), which is expected and unsurprising. On Gitter, @leofang suggested I try managed memory to handle such a case. While experimenting with the managed memory allocator, I encountered the error CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered.
Here is some code to minimally reproduce the issue (you may need to adjust the ndarray size appropriately for your vmem): https://gist.github.com/sevagh/3017d7c2393d3fd184b290e54dcae441. Copied inline for convenience:
import numpy as np
import scipy.fft
import cupyx
import cupy

# use a managed-memory pool so large ndarrays can be streamed to CUDA
cupy.cuda.set_allocator(cupy.cuda.MemoryPool(cupy.cuda.memory.malloc_managed).malloc)

if __name__ == '__main__':
    # ~8 GB array; won't fit in my 8 GB of GPU vmem
    big_array = np.random.standard_normal((1000, 1000000))
    print(f'size: {big_array.nbytes//1e9} GB')
    print(f'shape: {big_array.shape}')
    print(f'dtype: {big_array.dtype}')
    cupy.show_config()

    # perform a giant 2D IFFT
    big_array = cupyx.scipy.fft.ifft2(cupy.array(big_array))
Errors:
- To get the OOM behavior, comment out the set_allocator line: cupy.cuda.memory.OutOfMemoryError: Out of memory allocating 8,000,000,000 bytes (allocated so far: 0 bytes). This is expected rather than surprising.
- To get the illegal-access behavior, keep the set_allocator line.
What’s interesting is that I tried a few different lines of code; some succeed and some fail (a minimal sketch follows the list):
- Applying fft2 (forward FFT) is OK
- Applying cupy.asnumpy() to the output of fft2 is not OK - leads to an illegal memory access
- Applying ifft2 (backward FFT) is not OK - leads to an illegal memory access
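For reference, here is a minimal sketch of the three variants (my reconstruction, not copied verbatim from the gist), assuming the same big_array and managed-memory allocator as in the script above:

x = cupy.array(big_array)       # copy the host array into (managed) device memory
y = cupyx.scipy.fft.fft2(x)     # forward 2D FFT: succeeds
y_host = cupy.asnumpy(y)        # copying the fft2 result back to host: illegal memory access
z = cupyx.scipy.fft.ifft2(x)    # inverse 2D FFT: illegal memory access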
ifft2 stacktrace:
File "test.py", line 22, in <module>
big_array = cupyx.scipy.fft.ifft2(cupy.array(big_array))
File "/home/sevagh/venvs/museval-optimization-2/lib/python3.8/site-packages/cupyx/scipy/fft/_fft.py", line 214, in ifft2
return ifftn(x, s, axes, norm, overwrite_x, plan=plan)
File "/home/sevagh/venvs/museval-optimization-2/lib/python3.8/site-packages/cupyx/scipy/fft/_fft.py", line 285, in ifftn
return func(x, s, axes, norm, cufft.CUFFT_INVERSE, overwrite_x=overwrite_x,
File "/home/sevagh/venvs/museval-optimization-2/lib/python3.8/site-packages/cupy/fft/_fft.py", line 614, in _fftn
a = _exec_fftn(a, direction, value_type, norm=norm, axes=axes_sorted,
File "/home/sevagh/venvs/museval-optimization-2/lib/python3.8/site-packages/cupy/fft/_fft.py", line 514, in _exec_fftn
plan = _get_cufft_plan_nd(a.shape, fft_type, axes=axes, order=order,
File "/home/sevagh/venvs/museval-optimization-2/lib/python3.8/site-packages/cupy/fft/_fft.py", line 456, in _get_cufft_plan_nd
plan = cufft.PlanNd(*keys)
File "cupy/cuda/cufft.pyx", line 790, in cupy.cuda.cufft.PlanNd.__init__
File "cupy/cuda/cufft.pyx", line 169, in cupy.cuda.cufft.check_result
cupy.cuda.cufft.CuFFTError: CUFFT_INTERNAL_ERROR
Traceback (most recent call last):
File "cupy_backends/cuda/api/driver.pyx", line 260, in cupy_backends.cuda.api.driver.moduleUnload
File "cupy_backends/cuda/api/driver.pyx", line 125, in cupy_backends.cuda.api.driver.check_status
cupy_backends.cuda.api.driver.CUDADriverError: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered
Exception ignored in: 'cupy.cuda.function.Module.__dealloc__'
Traceback (most recent call last):
File "cupy_backends/cuda/api/driver.pyx", line 260, in cupy_backends.cuda.api.driver.moduleUnload
File "cupy_backends/cuda/api/driver.pyx", line 125, in cupy_backends.cuda.api.driver.check_status
cupy_backends.cuda.api.driver.CUDADriverError: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered
Top GitHub Comments
Yeah, I removed too much info - the processing is very repetitive, so the plan cache is desired and useful (it does ~30 FFTs + IFFTs of the same size per music track).
However, I run the analysis in an outer for loop over multiple diverse music tracks, so after each track it’s necessary to clear the previous cache (or else run out of GPU memory).
For what it’s worth, I was able to work around the original issue by using smaller ndarrays (mono-channel audio instead of 2-channel) and not using the managed malloc. Instead, I regularly inspect and clear the CuPy FFT plan cache:
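The original snippet wasn’t preserved here; below is a minimal sketch of that per-track workaround using CuPy’s plan-cache API (cupy.fft.config.get_plan_cache()). The tracks, load_track and analyze_track names are placeholders, not from the original code:

import cupy

cache = cupy.fft.config.get_plan_cache()    # per-thread, per-device cuFFT plan cache

for track in tracks:                        # placeholder outer loop over music tracks
    audio = load_track(track)               # placeholder: load one track as a NumPy array
    analyze_track(cupy.array(audio))        # ~30 FFTs + IFFTs of the same size per track
    cache.clear()                           # drop cached plans before the next track
    cupy.get_default_memory_pool().free_all_blocks()  # also release pooled GPU memory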
It looks like the plan cache can grow surprisingly large if we do FFTs on many huge ndarrays of differing shapes.
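If anyone else hits that growth, a quick sketch of how to inspect and cap the cache with the PlanCache methods (the 1 GB limit below is an arbitrary example, not a recommendation):

import cupy

cache = cupy.fft.config.get_plan_cache()
cache.show_info()          # print the number of cached plans and their total size
cache.set_memsize(10**9)   # cap cached plans at ~1 GB; least-recently-used plans get evicted
# cache.set_size(0)        # or disable plan caching entirely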