Error when using XtPlanNd for FP16 R2C transformation
Description
Since XtPlanNd isn't documented beyond this sample, I might just be holding it wrong (in particular, I had to figure out what last_axis and last_size are for). I tried to modify that example to do a real-to-complex (R2C) transform; the modified code is below.
When run, it gives this output:
Traceback (most recent call last):
  File "./cupy_ft16.py", line 16, in <module>
    plan = cp.cuda.cufft.XtPlanNd(shape[1:],
  File "cupy/cuda/cufft.pyx", line 968, in cupy.cuda.cufft.XtPlanNd.__init__
  File "cupy/cuda/cufft.pyx", line 1068, in cupy.cuda.cufft.XtPlanNd._sanity_checks
ValueError: size must be power of 2
I believe the issue is this check, which is fine for C2C and C2R. For R2C, however, it fails (assuming I've supplied last_size correctly) because the last dimension of the output array is floor(n/2) + 1, which is not a power of 2 even when the problem size n is. Here n = 65536, so the output's last dimension is 32769.
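A quick NumPy check (CPU only, independent of cuFFT) illustrates the arithmetic: for a real input of length n, the R2C output keeps only the n//2 + 1 non-redundant bins, so a power-of-2 n gives an odd output length. The helper `is_power_of_two` is a hypothetical name used only for this illustration.

```python
import numpy as np

def is_power_of_two(n: int) -> bool:
    # A positive integer is a power of 2 iff exactly one bit is set.
    return n > 0 and (n & (n - 1)) == 0

n = 65536  # transformed-axis length used in the reproducer
x = np.ones(n, dtype=np.float32)
m = np.fft.rfft(x).shape[-1]  # number of complex outputs of the R2C transform

print(n, is_power_of_two(n))  # 65536 True
print(m, is_power_of_two(m))  # 32769 False
```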
On a semi-related note, those checks also don't test this restriction on FP16 transforms from the CUDA docs:
- Strides on the real part of real-to-complex and complex-to-real transforms are not supported
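A minimal sketch of what such an extra check could look like, assuming the restriction means the real-valued array must be unit-stride (istride == 1 for R2C, ostride == 1 for C2R). The function name `check_fp16_real_stride` is hypothetical and not part of CuPy's API.

```python
def check_fp16_real_stride(fft_type: str, istride: int, ostride: int) -> None:
    """Hypothetical sanity check for half-precision plans (not CuPy's code).

    Assumes "strides on the real part are not supported" means the real
    array of an R2C/C2R transform must have unit stride.
    """
    if fft_type == 'R2C' and istride != 1:
        # The real data is the input of an R2C transform.
        raise ValueError('FP16 R2C: stride on the real input must be 1')
    if fft_type == 'C2R' and ostride != 1:
        # The real data is the output of a C2R transform.
        raise ValueError('FP16 C2R: stride on the real output must be 1')
    # C2C has no real part, so this check does not apply to it.

# The reproducer uses istride=1 and ostride=1, so it would pass.
check_fp16_real_stride('R2C', istride=1, ostride=1)
```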
To Reproduce
#!/usr/bin/env python3
import cupy as cp
import numpy as np
shape = (1024, 65536) # input array shape
idtype = 'e' # numpy.float16
odtype = edtype = 'E' # = numpy.complex32 in the future
# store the output array as fp16 arrays twice as long, as complex32 is not yet available
a = cp.random.random(shape).astype(cp.float16)
out = cp.empty_like(a, shape=(shape[0], shape[1] + 2))
# FFT with cuFFT
plan = cp.cuda.cufft.XtPlanNd(shape[1:],
a.shape[1:], 1, a.shape[1], idtype,
(out.shape[1] // 2,), 1, (out.shape[1] // 2), odtype,
shape[0], edtype,
order='C', last_axis=-1, last_size=out.shape[-1] // 2)
plan.fft(a, out, cp.cuda.cufft.CUFFT_FORWARD)
# FFT with NumPy
a_np = cp.asnumpy(a).astype(np.float32) # upcast
out_np = np.fft.rfftn(a_np, axes=(-1,))
out_np = np.ascontiguousarray(out_np).astype(np.complex64) # downcast
out_np = out_np.view(np.float32)
out_np = out_np.astype(np.float16)
# don't worry about accuracy for now, as we probably lost a lot during casting
print('ok' if cp.mean(cp.abs(out - cp.asarray(out_np))) < 0.1 else 'not ok')
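The reproducer stores the R2C output as an fp16 array with two extra columns because complex32 is not available yet. The NumPy-only sketch below (CPU only, small hypothetical sizes, no cuFFT involved) shows how that interleaved (re, im) layout maps back to complex values, which can make the comparison easier to interpret.

```python
import numpy as np

# Hypothetical sizes, much smaller than the reproducer's (1024, 65536)
rows, n = 4, 16
a = np.random.random((rows, n)).astype(np.float16)

# Reference R2C result in complex64: shape (rows, n//2 + 1)
ref = np.fft.rfft(a.astype(np.float32), axis=-1).astype(np.complex64)

# Emulate the reproducer's output buffer: fp16 (re, im) pairs interleaved
# along the last axis, so its length is 2 * (n//2 + 1) = n + 2.
out = np.ascontiguousarray(ref).view(np.float32).astype(np.float16)
assert out.shape == (rows, n + 2)

# Recover complex values: upcast to float32, then reinterpret each
# adjacent (re, im) pair as a single complex64 element.
out_c = np.ascontiguousarray(out.astype(np.float32)).view(np.complex64)
print(out_c.shape)  # (4, 9)
```

To inspect the GPU result the same way, the last step could be applied to `cp.asnumpy(out)`.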
Installation
Wheel (pip install cupy-***)
Environment
OS : Linux-5.17.5-76051705-generic-x86_64-with-glibc2.29
Python Version : 3.8.10
CuPy Version : 10.5.0
CuPy Platform : NVIDIA CUDA
NumPy Version : 1.21.5
SciPy Version : 1.8.1
Cython Build Version : 0.29.24
Cython Runtime Version : 0.29.21
CUDA Root : /usr/local/cuda
nvcc PATH : /usr/local/cuda/bin/nvcc
CUDA Build Version : 11040
CUDA Driver Version : 11060
CUDA Runtime Version : 11040
cuBLAS Version : (available)
cuFFT Version : 10502
cuRAND Version : 10205
cuSOLVER Version : (11, 2, 0)
cuSPARSE Version : (available)
NVRTC Version : (11, 4)
Thrust Version : 101201
CUB Build Version : 101201
Jitify Build Version : 4a37de0
cuDNN Build Version : (not loaded; try `import cupy.cuda.cudnn` first)
cuDNN Version : (not loaded; try `import cupy.cuda.cudnn` first)
NCCL Build Version : None
NCCL Runtime Version : None
cuTENSOR Version : None
cuSPARSELt Build Version : None
Device 0 Name : NVIDIA GeForce RTX 2060
Device 0 Compute Capability : 75
Device 0 PCI Bus ID : 0000:01:00.0
Additional Information
No response
Top GitHub Comments
@kmaehashi please assign this to me and I'll try to get this addressed for CuPy v11.0.
That’s right.
That's only applicable to C2C/R2C. For C2R it's the output size that should be a power of 2; the input size is an odd number (in the transformed axis).
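Following that description, a fixed sanity check would take the power-of-2 requirement from the input length for C2C/R2C and from the output length for C2R, and check the complex side against n//2 + 1. A minimal sketch under that assumption; `check_fp16_fft_size` is a hypothetical helper, not CuPy's implementation.

```python
def is_power_of_two(n: int) -> bool:
    return n > 0 and (n & (n - 1)) == 0

def check_fp16_fft_size(fft_type: str, in_len: int, out_len: int) -> int:
    """Hypothetical check along the transformed axis; returns the FFT size n.

    C2C: n = in_len, and out_len must equal n
    R2C: n = in_len (real input),   out_len must equal n // 2 + 1
    C2R: n = out_len (real output), in_len must equal n // 2 + 1
    """
    if fft_type == 'C2C':
        n, other, expected = in_len, out_len, in_len
    elif fft_type == 'R2C':
        n, other, expected = in_len, out_len, in_len // 2 + 1
    elif fft_type == 'C2R':
        n, other, expected = out_len, in_len, out_len // 2 + 1
    else:
        raise ValueError(f'unknown FFT type: {fft_type}')
    if not is_power_of_two(n):
        raise ValueError('size must be power of 2')
    if other != expected:
        raise ValueError(f'{fft_type}: got length {other}, expected {expected}')
    return n

# The reproducer's R2C case: real length 65536, complex length 32769.
print(check_fp16_fft_size('R2C', 65536, 32769))  # 65536
```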
That's another fair point. Perhaps I should just give them default values and not show them in the examples. They're needed for integrating with the high-level APIs.