Error when using XtPlanNd for FP16 R2C transformation

Description

Since XtPlanNd isn’t documented beyond this sample, I might just be holding it wrong (in particular, I had to figure out what last_axis and last_size are for). I tried to modify that example to do a real-to-complex transform as attached below.

When run, it gives this output:

Traceback (most recent call last):
  File "./cupy_ft16.py", line 16, in <module>
    plan = cp.cuda.cufft.XtPlanNd(shape[1:],
  File "cupy/cuda/cufft.pyx", line 968, in cupy.cuda.cufft.XtPlanNd.__init__
  File "cupy/cuda/cufft.pyx", line 1068, in cupy.cuda.cufft.XtPlanNd._sanity_checks
ValueError: size must be power of 2

I believe the issue is this check, which is okay for C2C and C2R, but (assuming I’ve supplied last_size correctly) in R2C it is failing because the last dimension of the output array is floor(n/2)+1, which is not a power of 2 even though the problem size is a power of 2.
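The arithmetic behind this can be checked on the CPU. The sketch below is not CuPy code: `is_power_of_two` and `r2c_last_dim` are hypothetical helpers that only illustrate why a power-of-2 test applied to the R2C output length fails for the problem size used in the reproducer.

```python
def is_power_of_two(n: int) -> bool:
    """Return True when n is a positive power of two (1, 2, 4, ...)."""
    return n > 0 and (n & (n - 1)) == 0


def r2c_last_dim(n: int) -> int:
    """Number of complex elements along the last axis of an R2C output
    when the real input has length n along that axis."""
    return n // 2 + 1


n = 65536                # transform size along the last axis, as in the reproducer
n_out = r2c_last_dim(n)  # length of the complex output along that axis

print(n, is_power_of_two(n))
print(n_out, is_power_of_two(n_out))
```

So a check that looks at the output length rejects a plan whose actual transform size is a valid power of 2.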

On a semi-related note, those checks are also not testing this condition on FP16 transforms from the CUDA docs:

Strides on the real part of real-to-complex and complex-to-real transforms are not supported
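A check for this condition might look roughly like the sketch below. It assumes that "strides on the real part" refers to the `istride` of an R2C plan and the `ostride` of a C2R plan, which is my reading of the cuFFT sentence rather than something confirmed in this thread. `check_fp16_real_stride` is a hypothetical helper, not part of CuPy.

```python
def check_fp16_real_stride(transform: str, istride: int, ostride: int) -> None:
    """Raise ValueError if the real side of an FP16 R2C/C2R plan is strided.

    transform is one of "C2C", "R2C", "C2R" (hypothetical labels).
    Assumption: the real side is the input for R2C and the output for C2R.
    """
    if transform == "R2C" and istride != 1:
        raise ValueError("FP16 R2C: the real input must have unit stride")
    if transform == "C2R" and ostride != 1:
        raise ValueError("FP16 C2R: the real output must have unit stride")


# The reproducer passes istride=1 and ostride=1, so it would pass this check.
check_fp16_real_stride("R2C", istride=1, ostride=1)
```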

To Reproduce

#!/usr/bin/env python3

import cupy as cp
import numpy as np


shape = (1024, 65536)  # input array shape
idtype = 'e'  # numpy.float16
odtype = edtype = 'E'  # = numpy.complex32 in the future

# store the output array as fp16 arrays twice as long, as complex32 is not yet available
a = cp.random.random(shape).astype(cp.float16)
out = cp.empty_like(a, shape=(shape[0], shape[1] + 2))

# FFT with cuFFT
plan = cp.cuda.cufft.XtPlanNd(shape[1:],
                              a.shape[1:], 1, a.shape[1], idtype,
                              (out.shape[1] // 2,), 1, (out.shape[1] // 2), odtype,
                              shape[0], edtype,
                              order='C', last_axis=-1, last_size=out.shape[-1] // 2)

plan.fft(a, out, cp.cuda.cufft.CUFFT_FORWARD)

# FFT with NumPy
a_np = cp.asnumpy(a).astype(np.float32)  # upcast
out_np = np.fft.rfftn(a_np, axes=(-1,))
out_np = np.ascontiguousarray(out_np).astype(np.complex64)  # downcast
out_np = out_np.view(np.float32)
out_np = out_np.astype(np.float16)

# don't worry about accuracy for now, as we probably lost a lot during casting
print('ok' if cp.mean(cp.abs(out - cp.asarray(out_np))) < 0.1 else 'not ok')
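Because complex32 is not available, the reproducer keeps complex results as float16 arrays that are twice as long. As a minimal sketch, assuming the buffer holds interleaved (real, imaginary) pairs along the last axis, which matches how `out_np` is built above via `view(np.float32)`, such a buffer can be read back as complex numbers on the host. The helper name `interleaved_fp16_to_complex64` is hypothetical.

```python
import numpy as np


def interleaved_fp16_to_complex64(buf):
    """Interpret float16 (re, im) pairs along the last axis as complex64 values."""
    buf = np.ascontiguousarray(buf, dtype=np.float16)
    if buf.shape[-1] % 2 != 0:
        raise ValueError("last axis must hold an even number of float16 values")
    # Upcast first: NumPy has no complex32, so the pairs are viewed as complex64.
    return buf.astype(np.float32).view(np.complex64)


pairs = np.array([[1.0, 2.0, 3.0, 4.0]], dtype=np.float16)
z = interleaved_fp16_to_complex64(pairs)
print(z.shape, z.dtype)
```

With the reproducer's `out`, the input would be `cp.asnumpy(out)`, and the result would have shape `(shape[0], shape[1] // 2 + 1)`.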

Installation

Wheel (pip install cupy-***)

Environment

OS                           : Linux-5.17.5-76051705-generic-x86_64-with-glibc2.29
Python Version               : 3.8.10
CuPy Version                 : 10.5.0
CuPy Platform                : NVIDIA CUDA
NumPy Version                : 1.21.5
SciPy Version                : 1.8.1
Cython Build Version         : 0.29.24
Cython Runtime Version       : 0.29.21
CUDA Root                    : /usr/local/cuda
nvcc PATH                    : /usr/local/cuda/bin/nvcc
CUDA Build Version           : 11040
CUDA Driver Version          : 11060
CUDA Runtime Version         : 11040
cuBLAS Version               : (available)
cuFFT Version                : 10502
cuRAND Version               : 10205
cuSOLVER Version             : (11, 2, 0)
cuSPARSE Version             : (available)
NVRTC Version                : (11, 4)
Thrust Version               : 101201
CUB Build Version            : 101201
Jitify Build Version         : 4a37de0
cuDNN Build Version          : (not loaded; try `import cupy.cuda.cudnn` first)
cuDNN Version                : (not loaded; try `import cupy.cuda.cudnn` first)
NCCL Build Version           : None
NCCL Runtime Version         : None
cuTENSOR Version             : None
cuSPARSELt Build Version     : None
Device 0 Name                : NVIDIA GeForce RTX 2060
Device 0 Compute Capability  : 75
Device 0 PCI Bus ID          : 0000:01:00.0

Additional Information

No response

Top GitHub Comments

leofang commented, Jun 21, 2022

@kmaehashi plz assign this to me and I’ll try to get this addressed for CuPy v11.0.

leofang commented, Jun 10, 2022

> I’m guessing last_size is only needed to compute the shape of the output if no output is provided?

That’s right.

I think the current checks are not very appropriate to R2C/C2R as you pointed out.

> I’m actually wondering if checking only last_size is correct even for C2C. I haven’t tested it, but I would assume that all the transform dimensions (the first argument to XtPlanNd) would need to be powers of 2.

That’s only applicable to C2C/R2C. For C2R it’s the output size that should be power of 2; the input size is an odd number (in the transformed axis).

Let me further note that not much docstring was added for XtPlanNd because it was considered a low-level wrapper over cufftXtMakePlanMany and we expected advanced users to check out cuFFT documentation.

> That’s fair, but last_axis and last_size are specific to cupy rather than arguments to cufftXtMakePlanMany.

That’s another fair point. Perhaps I should just add default values to them, and do not show them in the examples. They’re needed for integrating with the high-level APIs.
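The size rule stated in this comment (check the input size for C2C/R2C, the output size for C2R) could be sketched as follows. This is not the actual CuPy fix. `check_fp16_last_axis` and the string labels are hypothetical, and only the last transformed axis is covered; other transformed axes would presumably need the same test.

```python
def check_fp16_last_axis(transform: str, in_last: int, out_last: int) -> int:
    """Return the size that must be a power of 2, or raise ValueError.

    in_last / out_last are the input and output lengths along the last
    transformed axis. Hypothetical helper following the rule quoted above.
    """
    if transform in ("C2C", "R2C"):
        size = in_last    # the input length is the transform size
    elif transform == "C2R":
        size = out_last   # the real output length is the transform size
    else:
        raise ValueError(f"unknown transform type: {transform!r}")
    if size <= 0 or size & (size - 1) != 0:
        raise ValueError("size must be power of 2")
    return size


# R2C as in the reproducer: real input of 65536, complex output of 32769.
print(check_fp16_last_axis("R2C", 65536, 65536 // 2 + 1))
# C2R in the opposite direction: complex input of 32769, real output of 65536.
print(check_fp16_last_axis("C2R", 65536 // 2 + 1, 65536))
```

Under this rule the reproducer's plan would be accepted, while a C2R plan whose real output length is not a power of 2 would still be rejected.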