
Using fft/ifft with large ndarray + managed allocator leads to "illegal memory access was encountered"


Hello. I’m trying to apply the cupyx.scipy.fft.fft/ifft routines to very large ndarrays. Here is the output of cupy.show_config():

OS                           : Linux-5.11.14-100.fc32.x86_64-x86_64-with-glibc2.2.5
CuPy Version                 : 9.0.0
NumPy Version                : 1.20.2
SciPy Version                : 1.6.2
Cython Build Version         : 0.29.22
Cython Runtime Version       : None
CUDA Root                    : /usr
CUDA Build Version           : 11020
CUDA Driver Version          : 11030
CUDA Runtime Version         : 11020
cuBLAS Version               : 11401
cuFFT Version                : 10400
cuRAND Version               : 10203
cuSOLVER Version             : (11, 1, 0)
cuSPARSE Version             : 11400
NVRTC Version                : (11, 2)
Thrust Version               : 101000
CUB Build Version            : 101000
Jitify Build Version         : 60e9e72
cuDNN Build Version          : 8101
cuDNN Version                : 8100
NCCL Build Version           : None
NCCL Runtime Version         : None
cuTENSOR Version             : None
cuSPARSELt Build Version     : None
Device 0 Name                : NVIDIA GeForce RTX 2070 SUPER
Device 0 Compute Capability  : 75
Device 0 PCI Bus ID          : 0000:0A:00.0

My first problem was that a large ndarray (bigger than the available GPU memory) could not be copied to my device (RTX 2070S), since it exceeds the available memory; that part is expected and unsurprising. On Gitter, @leofang suggested I try managed memory to handle such a case. While experimenting with the managed-memory allocator, I encountered the error CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered.

Here is some code that minimally reproduces the issue (you may need to adjust the ndarray size for your GPU memory): https://gist.github.com/sevagh/3017d7c2393d3fd184b290e54dcae441. Copied inline for convenience:

import numpy as np
import scipy.fft
import cupyx
import cupy

# use managed (unified) memory so ndarrays larger than GPU memory
# can be streamed to the device on demand
cupy.cuda.set_allocator(cupy.cuda.MemoryPool(cupy.cuda.memory.malloc_managed).malloc)


if __name__ == '__main__':
    # 8 GB float64 array; together with the FFT output it won't fit
    # in my 8 GB of GPU memory
    big_array = np.random.standard_normal((1000, 1000000))

    print(f'size: {big_array.nbytes / 1e9} GB')
    print(f'shape: {big_array.shape}')
    print(f'dtype: {big_array.dtype}')

    cupy.show_config()  # show_config() prints directly and returns None

    # perform a giant 2D IFFT
    big_array = cupyx.scipy.fft.ifft2(cupy.array(big_array))

Errors:

  1. To get the OOM behavior, comment out the set_allocator line: cupy.cuda.memory.OutOfMemoryError: Out of memory allocating 8,000,000,000 bytes (allocated so far: 0 bytes). This is expected rather than surprising.
  2. To get the illegal-memory-access behavior, keep the set_allocator line.

What’s interesting is that I tried a few different lines of code; some succeed and some fail (sketched in code after this list):

  1. Applying fft2 (forward FFT) is OK.
  2. Applying cupy.asnumpy() to the output of fft2 is not OK; it leads to an illegal memory access.
  3. Applying ifft2 (inverse FFT) is not OK; it leads to an illegal memory access.
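
Here is how those three cases map to code (a minimal sketch, assuming the same managed-memory allocator and big_array as in the repro script above):

out = cupyx.scipy.fft.fft2(cupy.array(big_array))   # 1. forward FFT: OK
host = cupy.asnumpy(out)                            # 2. copy back to host: illegal memory access
out = cupyx.scipy.fft.ifft2(cupy.array(big_array))  # 3. inverse FFT: illegal memory access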

ifft2 stacktrace:

  File "test.py", line 22, in <module>
    big_array = cupyx.scipy.fft.ifft2(cupy.array(big_array))
  File "/home/sevagh/venvs/museval-optimization-2/lib/python3.8/site-packages/cupyx/scipy/fft/_fft.py", line 214, in ifft2
    return ifftn(x, s, axes, norm, overwrite_x, plan=plan)
  File "/home/sevagh/venvs/museval-optimization-2/lib/python3.8/site-packages/cupyx/scipy/fft/_fft.py", line 285, in ifftn
    return func(x, s, axes, norm, cufft.CUFFT_INVERSE, overwrite_x=overwrite_x,
  File "/home/sevagh/venvs/museval-optimization-2/lib/python3.8/site-packages/cupy/fft/_fft.py", line 614, in _fftn
    a = _exec_fftn(a, direction, value_type, norm=norm, axes=axes_sorted,
  File "/home/sevagh/venvs/museval-optimization-2/lib/python3.8/site-packages/cupy/fft/_fft.py", line 514, in _exec_fftn
    plan = _get_cufft_plan_nd(a.shape, fft_type, axes=axes, order=order,
  File "/home/sevagh/venvs/museval-optimization-2/lib/python3.8/site-packages/cupy/fft/_fft.py", line 456, in _get_cufft_plan_nd
    plan = cufft.PlanNd(*keys)
  File "cupy/cuda/cufft.pyx", line 790, in cupy.cuda.cufft.PlanNd.__init__
  File "cupy/cuda/cufft.pyx", line 169, in cupy.cuda.cufft.check_result
cupy.cuda.cufft.CuFFTError: CUFFT_INTERNAL_ERROR
Traceback (most recent call last):
  File "cupy_backends/cuda/api/driver.pyx", line 260, in cupy_backends.cuda.api.driver.moduleUnload
  File "cupy_backends/cuda/api/driver.pyx", line 125, in cupy_backends.cuda.api.driver.check_status
cupy_backends.cuda.api.driver.CUDADriverError: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered
Exception ignored in: 'cupy.cuda.function.Module.__dealloc__'
Traceback (most recent call last):
  File "cupy_backends/cuda/api/driver.pyx", line 260, in cupy_backends.cuda.api.driver.moduleUnload
  File "cupy_backends/cuda/api/driver.pyx", line 125, in cupy_backends.cuda.api.driver.check_status
cupy_backends.cuda.api.driver.CUDADriverError: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered

Issue Analytics

  • State: open
  • Created: 2 years ago
  • Comments: 19 (8 by maintainers)

Top GitHub Comments

1 reaction
sevagh commented, May 6, 2021

Yeah, I removed too much info - the processing is very repetitive, so the cache is desired and useful (it does ~30 FFTs + IFFTs of the same size per music track).

However, I run the analysis in an outer for loop over multiple diverse music tracks, so after each track it’s necessary to clear the previous cache (or else run out of GPU memory).
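
In outline, that loop looks something like this (a hypothetical sketch; tracks and process_track are placeholder names, not the real analysis code):

import cupy
from cupy.fft.config import get_plan_cache

for track in tracks:
    # ~30 same-sized FFT + IFFT calls per track, so the plan cache pays off
    process_track(track)

    # plan sizes differ between tracks: drop the cached plans and return
    # pooled blocks to the device before starting the next track
    get_plan_cache().clear()
    cupy.get_default_memory_pool().free_all_blocks()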

1 reaction
sevagh commented, May 6, 2021

For what it’s worth, I was able to work around the original issue by using smaller ndarrays (mono audio instead of 2-channel) and by not using the managed malloc. Instead, I clear the CuPy FFT plan cache and memory pool regularly:

import cupy
from cupy.fft.config import get_plan_cache

mempool = cupy.get_default_memory_pool()

for item in items_to_process:
    # ... do some expensive cupy FFT work here ...

    # disable cupy FFT plan caching so the cached plans are released
    fft_cache = get_plan_cache()
    fft_cache.set_size(0)

    # return the pool's freed blocks to the device
    mempool.free_all_blocks()

    # re-enable FFT plan caching (16 entries, no memory limit)
    fft_cache.set_size(16)
    fft_cache.set_memsize(-1)

It looks like the plan cache can grow surprisingly large when doing FFTs on many different, huge ndarrays.
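
To check how large the cache has actually grown, CuPy can report the plan cache’s settings and contents (a quick sketch; get_plan_cache() and show_plan_cache_info() live in cupy.fft.config in CuPy 9):

import cupy
from cupy.fft.config import get_plan_cache, show_plan_cache_info

cache = get_plan_cache()
print(cache.get_size())     # maximum number of cached plans (default 16)
print(cache.get_memsize())  # memory cap in bytes (-1 means unlimited)
show_plan_cache_info()      # dump the per-device cache contents and sizes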

