warp-ctc make error ( identifier "__shfl_down" is undefined )
Hi, @SeanNaren
I am having trouble building warp-ctc.
After a successful cmake, when I run make,
[ 11%] Building NVCC (Device) object CMakeFiles/warpctc.dir/src/warpctc_generated_reduce.cu.o
/home/jonghu/ds2/warp-ctc/src/reduce.cu(44): error: identifier "__shfl_down" is undefined
detected during:
instantiation of "T CTAReduce<NT, T, Rop>::reduce(int, T, CTAReduce<NT, T, Rop>::Storage &, int, Rop) [with NT=128, T=float, Rop=ctc_helper::add<float, float>]"
(76): here
instantiation of "void reduce_rows<NT,Iop,Rop,T>(Iop, Rop, const T *, T *, int, int) [with NT=128, Iop=ctc_helper::negate<float, float>, Rop=ctc_helper::add<float, float>, T=float]"
(124): here
instantiation of "void ReduceHelper::impl(Iof, Rof, const T *, T *, int, int, __nv_bool, cudaStream_t) [with T=float, Iof=ctc_helper::negate<float, float>, Rof=ctc_helper::add<float, float>]"
(139): here
instantiation of "ctcStatus_t reduce(Iof, Rof, const T *, T *, int, int, __nv_bool, cudaStream_t) [with T=float, Iof=ctc_helper::negate<float, float>, Rof=ctc_helper::add<float, float>]"
(149): here
/home/jonghu/ds2/warp-ctc/src/reduce.cu(44): error: identifier "__shfl_down" is undefined
detected during:
instantiation of "T CTAReduce<NT, T, Rop>::reduce(int, T, CTAReduce<NT, T, Rop>::Storage &, int, Rop) [with NT=128, T=float, Rop=ctc_helper::maximum<float, float>]"
(76): here
instantiation of "void reduce_rows<NT,Iop,Rop,T>(Iop, Rop, const T *, T *, int, int) [with NT=128, Iop=ctc_helper::identity<float, float>, Rop=ctc_helper::maximum<float, float>, T=float]"
(124): here
instantiation of "void ReduceHelper::impl(Iof, Rof, const T *, T *, int, int, __nv_bool, cudaStream_t) [with T=float, Iof=ctc_helper::identity<float, float>, Rof=ctc_helper::maximum<float, float>]"
(139): here
instantiation of "ctcStatus_t reduce(Iof, Rof, const T *, T *, int, int, __nv_bool, cudaStream_t) [with T=float, Iof=ctc_helper::identity<float, float>, Rof=ctc_helper::maximum<float, float>]"
(157): here
2 errors detected in the compilation of "/tmp/tmpxft_0000636d_00000000-13_reduce.compute_70.cpp1.ii".
CMake Error at warpctc_generated_reduce.cu.o.cmake:279 (message):
Error generating file
/home/jonghu/ds2/warp-ctc/build/CMakeFiles/warpctc.dir/src/./warpctc_generated_reduce.cu.o
CMakeFiles/warpctc.dir/build.make:337: recipe for target 'CMakeFiles/warpctc.dir/src/warpctc_generated_reduce.cu.o' failed
make[2]: *** [CMakeFiles/warpctc.dir/src/warpctc_generated_reduce.cu.o] Error 1
CMakeFiles/Makefile2:109: recipe for target 'CMakeFiles/warpctc.dir/all' failed
make[1]: *** [CMakeFiles/warpctc.dir/all] Error 2
Makefile:129: recipe for target 'all' failed
make: *** [all] Error 2
this error occurs.
I did some research and found that __shfl_down() is deprecated and has been removed for newer devices ( link ), so it needs to be changed to __shfl_down_sync().
But when I change __shfl_down() to __shfl_down_sync() in warp-ctc/src/reduce.cu,
[ 11%] Building NVCC (Device) object CMakeFiles/warpctc.dir/src/warpctc_generated_reduce.cu.o
/home/jonghu/ds2/warp-ctc/src/reduce.cu(44): error: no instance of overloaded function "__shfl_down_sync" matches the argument list
argument types are: (float, int)
detected during:
instantiation of "T CTAReduce<NT, T, Rop>::reduce(int, T, CTAReduce<NT, T, Rop>::Storage &, int, Rop) [with NT=128, T=float, Rop=ctc_helper::add<float, float>]"
(76): here
instantiation of "void reduce_rows<NT,Iop,Rop,T>(Iop, Rop, const T *, T *, int, int) [with NT=128, Iop=ctc_helper::negate<float, float>, Rop=ctc_helper::add<float, float>, T=float]"
(124): here
instantiation of "void ReduceHelper::impl(Iof, Rof, const T *, T *, int, int, __nv_bool, cudaStream_t) [with T=float, Iof=ctc_helper::negate<float, float>, Rof=ctc_helper::add<float, float>]"
(139): here
instantiation of "ctcStatus_t reduce(Iof, Rof, const T *, T *, int, int, __nv_bool, cudaStream_t) [with T=float, Iof=ctc_helper::negate<float, float>, Rof=ctc_helper::add<float, float>]"
(149): here
/home/jonghu/ds2/warp-ctc/src/reduce.cu(44): error: no instance of overloaded function "__shfl_down_sync" matches the argument list
argument types are: (float, int)
detected during:
instantiation of "T CTAReduce<NT, T, Rop>::reduce(int, T, CTAReduce<NT, T, Rop>::Storage &, int, Rop) [with NT=128, T=float, Rop=ctc_helper::maximum<float, float>]"
(76): here
instantiation of "void reduce_rows<NT,Iop,Rop,T>(Iop, Rop, const T *, T *, int, int) [with NT=128, Iop=ctc_helper::identity<float, float>, Rop=ctc_helper::maximum<float, float>, T=float]"
(124): here
instantiation of "void ReduceHelper::impl(Iof, Rof, const T *, T *, int, int, __nv_bool, cudaStream_t) [with T=float, Iof=ctc_helper::identity<float, float>, Rof=ctc_helper::maximum<float, float>]"
(139): here
instantiation of "ctcStatus_t reduce(Iof, Rof, const T *, T *, int, int, __nv_bool, cudaStream_t) [with T=float, Iof=ctc_helper::identity<float, float>, Rof=ctc_helper::maximum<float, float>]"
(157): here
2 errors detected in the compilation of "/tmp/tmpxft_000063c5_00000000-13_reduce.compute_70.cpp1.ii".
CMake Error at warpctc_generated_reduce.cu.o.cmake:279 (message):
Error generating file
/home/jonghu/ds2/warp-ctc/build/CMakeFiles/warpctc.dir/src/./warpctc_generated_reduce.cu.o
CMakeFiles/warpctc.dir/build.make:337: recipe for target 'CMakeFiles/warpctc.dir/src/warpctc_generated_reduce.cu.o' failed
make[2]: *** [CMakeFiles/warpctc.dir/src/warpctc_generated_reduce.cu.o] Error 1
CMakeFiles/Makefile2:109: recipe for target 'CMakeFiles/warpctc.dir/all' failed
make[1]: *** [CMakeFiles/warpctc.dir/all] Error 2
Makefile:129: recipe for target 'all' failed
make: *** [all] Error 2
this error occurs.
My GPU is a GeForce RTX 2080 Ti, and the build failed with CUDA versions 9.0, 9.1, and 10.1. Is there a way to solve this issue?
Sincerely, Jonghu.
Top GitHub Comments
In src/reduce.cu, change:
- Line 44 to: shuff = __shfl_down_sync(0xFFFFFFFF, x, offset);

In include/contrib/moderngpu/include/device/intrinsics.cuh, change:
- Line 115 to: var = __shfl_up_sync(0xFFFFFFFF, var, delta, width);
- Line 125 to: p.x = __shfl_up_sync(0xFFFFFFFF, p.x, delta, width);
- Line 126 to: p.y = __shfl_up_sync(0xFFFFFFFF, p.y, delta, width);
- Line 143 to: "shfl.up.sync.b32 r0|p, %1, %2, %3, %4;"
- Line 158 to: "shfl.up.sync.b32 r0|p, %1, %2, %3, %4;"
This works fine with CUDA 10.1.
This is the correct solution as of Oct. 2019.
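For context on the second error in the question: the `_sync` shuffle intrinsics take a lane mask as their first argument, so a call with only `(float, int)` matches no overload. The sketch below is not code from warp-ctc; it is a minimal standalone illustration of the pattern, and the names `warp_sum`, `warp_sum_kernel`, and `run_warp_sum_demo` are hypothetical. It assumes a single full warp of 32 active threads, which is what makes the mask `0xFFFFFFFF` appropriate.

```cuda
#include <cuda_runtime.h>

// Sum a value across the 32 lanes of one warp; lane 0 ends up with the total.
// Assumption: all 32 lanes are active, so the full mask 0xFFFFFFFF is valid.
__device__ float warp_sum(float x) {
    for (int offset = 16; offset > 0; offset /= 2) {
        // First argument is the participation mask, then the value and the lane offset.
        x += __shfl_down_sync(0xFFFFFFFFu, x, offset);
    }
    return x;
}

// Each lane contributes its own lane index; lane 0 writes the warp-wide sum.
__global__ void warp_sum_kernel(float* out) {
    float total = warp_sum(static_cast<float>(threadIdx.x));
    if (threadIdx.x == 0) {
        *out = total;
    }
}

// Hypothetical host helper: launches one warp and returns lane 0's result,
// or -1.0f if any CUDA call fails.
float run_warp_sum_demo() {
    float* d_out = nullptr;
    float h_out = -1.0f;
    if (cudaMalloc(&d_out, sizeof(float)) != cudaSuccess) {
        return -1.0f;
    }
    warp_sum_kernel<<<1, 32>>>(d_out);
    // cudaMemcpy waits for the kernel and reports any launch or execution error.
    cudaError_t err = cudaMemcpy(&h_out, d_out, sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    return err == cudaSuccess ? h_out : -1.0f;
}
```

Calling `run_warp_sum_demo()` from a `main` on a machine with a working GPU should return the sum of the lane indices 0 through 31. For an RTX 2080 Ti, compiling with something like `nvcc -arch=sm_75` is a reasonable starting point. If a kernel can reach the shuffle with only some lanes active, the mask has to name exactly the participating lanes rather than `0xFFFFFFFF`.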