Support for half-precision complex numbers?
This issue is mainly for tracking the upstream counterpart (https://github.com/numpy/numpy/issues/14753) for compatibility purposes. However, there are unique challenges and opportunities in CUDA that make this support worth considering independently of upstream (for the moment).
Specifically, many scientific computing problems contain a portion of the computation (sometimes a significant one) that is highly tolerant of low precision. For these applications, using half-precision complex numbers could further reduce both the memory footprint and the computing time. However, AFAIK none of the CUDA libraries (cuFFT, cuSPARSE, cuSOLVER, etc.) support half complex, so in CuPy a type cast to single complex is a must wherever CUDA libraries are used. Elementwise/reduction kernels are free of this problem, though, so it'd be interesting to target them as a first step.
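A minimal sketch of what that first step could look like, assuming half complex is emulated with separate real/imaginary float16 planes (no complex32 dtype exists in CuPy or NumPy today, so the kernel and variable names here are illustrative):

```python
import cupy as cp

# Hypothetical emulation: a "complex32" multiply as an elementwise kernel
# over separate real/imaginary float16 planes.
cmul_half = cp.ElementwiseKernel(
    'float16 ar, float16 ai, float16 br, float16 bi',  # inputs: a, b as planes
    'float16 cr, float16 ci',                          # output: c = a * b
    '''
    cr = ar * br - ai * bi;
    ci = ar * bi + ai * br;
    ''',
    'cmul_half')

a = (cp.random.rand(1024) + 1j * cp.random.rand(1024)).astype(cp.complex64)
b = (cp.random.rand(1024) + 1j * cp.random.rand(1024)).astype(cp.complex64)
cr, ci = cmul_half(a.real.astype(cp.float16), a.imag.astype(cp.float16),
                   b.real.astype(cp.float16), b.imag.astype(cp.float16))
```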
If I were to work on this, I would start by investigating whether `thrust::complex` can support `complex<__half>` or not, and build up the infrastructure gradually.
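One hypothetical way to probe that question from CuPy is a `RawModule` with the NVCC backend (Thrust headers generally don't compile under NVRTC). Whether the kernel below compiles at all, and for which CUDA versions and architectures, is exactly the open question; treat this as a sketch, not confirmed working code:

```python
import cupy as cp

# Try to instantiate thrust::complex<__half> and exercise its arithmetic.
# A compilation failure here would itself answer the question.
src = r'''
#include <cuda_fp16.h>
#include <thrust/complex.h>

extern "C" __global__
void probe(const __half* re, const __half* im,
           __half* out_re, __half* out_im, int n) {
    int i = blockDim.x * blockIdx.x + threadIdx.x;
    if (i < n) {
        thrust::complex<__half> a(re[i], im[i]);
        thrust::complex<__half> b = a * a;  // exercise operator*
        out_re[i] = b.real();
        out_im[i] = b.imag();
    }
}
'''

mod = cp.RawModule(code=src, backend='nvcc')
probe = mod.get_function('probe')  # compilation happens here
```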
cc: @smarkesini
Top GitHub Comments
You can get better matrix-multiply performance with Tensor Cores, but fp16 complex numbers are not supported by the CUDA libraries, except for cublasLtMatmul() in cuBLASLt. Even with cublasLtMatmul(), all matrices must be in planar complex format, so format conversions will be needed before and after the call.
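A rough sketch of those conversions, assuming the usual interleaved complex64 representation on the CuPy side (the helper names are hypothetical; cuBLASLt's planar layout keeps all real parts and all imaginary parts in separate contiguous planes):

```python
import cupy as cp

def to_planar_fp16(a):
    """Interleaved complex64 -> separate fp16 real/imaginary planes."""
    return a.real.astype(cp.float16), a.imag.astype(cp.float16)

def from_planar_fp16(re, im):
    """Separate fp16 real/imaginary planes -> interleaved complex64."""
    return (re.astype(cp.float32) + 1j * im.astype(cp.float32)).astype(cp.complex64)

a = (cp.random.rand(128, 128) + 1j * cp.random.rand(128, 128)).astype(cp.complex64)
re, im = to_planar_fp16(a)      # before calling cublasLtMatmul()
b = from_planar_fp16(re, im)    # after retrieving the planar result
```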
I should point out that Thrust will eventually be replacing its `complex` header with the libcu++ (https://github.com/NVIDIA/libcudacxx) implementation, so keep an eye on how that effort evolves.