warp-ctc make error ( identifier "__shfl_down" is undefined )


Hi, @SeanNaren

I am having trouble building warp-ctc. cmake completes successfully, but when I then run make,

[ 11%] Building NVCC (Device) object CMakeFiles/warpctc.dir/src/warpctc_generated_reduce.cu.o
/home/jonghu/ds2/warp-ctc/src/reduce.cu(44): error: identifier "__shfl_down" is undefined
          detected during:
            instantiation of "T CTAReduce<NT, T, Rop>::reduce(int, T, CTAReduce<NT, T, Rop>::Storage &, int, Rop) [with NT=128, T=float, Rop=ctc_helper::add<float, float>]"
(76): here
            instantiation of "void reduce_rows<NT,Iop,Rop,T>(Iop, Rop, const T *, T *, int, int) [with NT=128, Iop=ctc_helper::negate<float, float>, Rop=ctc_helper::add<float, float>, T=float]"
(124): here
            instantiation of "void ReduceHelper::impl(Iof, Rof, const T *, T *, int, int, __nv_bool, cudaStream_t) [with T=float, Iof=ctc_helper::negate<float, float>, Rof=ctc_helper::add<float, float>]"
(139): here
            instantiation of "ctcStatus_t reduce(Iof, Rof, const T *, T *, int, int, __nv_bool, cudaStream_t) [with T=float, Iof=ctc_helper::negate<float, float>, Rof=ctc_helper::add<float, float>]"
(149): here

/home/jonghu/ds2/warp-ctc/src/reduce.cu(44): error: identifier "__shfl_down" is undefined
          detected during:
            instantiation of "T CTAReduce<NT, T, Rop>::reduce(int, T, CTAReduce<NT, T, Rop>::Storage &, int, Rop) [with NT=128, T=float, Rop=ctc_helper::maximum<float, float>]"
(76): here
            instantiation of "void reduce_rows<NT,Iop,Rop,T>(Iop, Rop, const T *, T *, int, int) [with NT=128, Iop=ctc_helper::identity<float, float>, Rop=ctc_helper::maximum<float, float>, T=float]"
(124): here
            instantiation of "void ReduceHelper::impl(Iof, Rof, const T *, T *, int, int, __nv_bool, cudaStream_t) [with T=float, Iof=ctc_helper::identity<float, float>, Rof=ctc_helper::maximum<float, float>]"
(139): here
            instantiation of "ctcStatus_t reduce(Iof, Rof, const T *, T *, int, int, __nv_bool, cudaStream_t) [with T=float, Iof=ctc_helper::identity<float, float>, Rof=ctc_helper::maximum<float, float>]"
(157): here

2 errors detected in the compilation of "/tmp/tmpxft_0000636d_00000000-13_reduce.compute_70.cpp1.ii".
CMake Error at warpctc_generated_reduce.cu.o.cmake:279 (message):
  Error generating file
  /home/jonghu/ds2/warp-ctc/build/CMakeFiles/warpctc.dir/src/./warpctc_generated_reduce.cu.o


CMakeFiles/warpctc.dir/build.make:337: recipe for target 'CMakeFiles/warpctc.dir/src/warpctc_generated_reduce.cu.o' failed
make[2]: *** [CMakeFiles/warpctc.dir/src/warpctc_generated_reduce.cu.o] Error 1
CMakeFiles/Makefile2:109: recipe for target 'CMakeFiles/warpctc.dir/all' failed
make[1]: *** [CMakeFiles/warpctc.dir/all] Error 2
Makefile:129: recipe for target 'all' failed
make: *** [all] Error 2

this error occurs.

I did some research and found that __shfl_down() is deprecated and no longer available on newer devices ( link ), so it needs to be changed to __shfl_down_sync().

But when I change __shfl_down() to __shfl_down_sync() in warp-ctc/src/reduce.cu,

[ 11%] Building NVCC (Device) object CMakeFiles/warpctc.dir/src/warpctc_generated_reduce.cu.o
/home/jonghu/ds2/warp-ctc/src/reduce.cu(44): error: no instance of overloaded function "__shfl_down_sync" matches the argument list
            argument types are: (float, int)
          detected during:
            instantiation of "T CTAReduce<NT, T, Rop>::reduce(int, T, CTAReduce<NT, T, Rop>::Storage &, int, Rop) [with NT=128, T=float, Rop=ctc_helper::add<float, float>]"
(76): here
            instantiation of "void reduce_rows<NT,Iop,Rop,T>(Iop, Rop, const T *, T *, int, int) [with NT=128, Iop=ctc_helper::negate<float, float>, Rop=ctc_helper::add<float, float>, T=float]"
(124): here
            instantiation of "void ReduceHelper::impl(Iof, Rof, const T *, T *, int, int, __nv_bool, cudaStream_t) [with T=float, Iof=ctc_helper::negate<float, float>, Rof=ctc_helper::add<float, float>]"
(139): here
            instantiation of "ctcStatus_t reduce(Iof, Rof, const T *, T *, int, int, __nv_bool, cudaStream_t) [with T=float, Iof=ctc_helper::negate<float, float>, Rof=ctc_helper::add<float, float>]"
(149): here

/home/jonghu/ds2/warp-ctc/src/reduce.cu(44): error: no instance of overloaded function "__shfl_down_sync" matches the argument list
            argument types are: (float, int)
          detected during:
            instantiation of "T CTAReduce<NT, T, Rop>::reduce(int, T, CTAReduce<NT, T, Rop>::Storage &, int, Rop) [with NT=128, T=float, Rop=ctc_helper::maximum<float, float>]"
(76): here
            instantiation of "void reduce_rows<NT,Iop,Rop,T>(Iop, Rop, const T *, T *, int, int) [with NT=128, Iop=ctc_helper::identity<float, float>, Rop=ctc_helper::maximum<float, float>, T=float]"
(124): here
            instantiation of "void ReduceHelper::impl(Iof, Rof, const T *, T *, int, int, __nv_bool, cudaStream_t) [with T=float, Iof=ctc_helper::identity<float, float>, Rof=ctc_helper::maximum<float, float>]"
(139): here
            instantiation of "ctcStatus_t reduce(Iof, Rof, const T *, T *, int, int, __nv_bool, cudaStream_t) [with T=float, Iof=ctc_helper::identity<float, float>, Rof=ctc_helper::maximum<float, float>]"
(157): here

2 errors detected in the compilation of "/tmp/tmpxft_000063c5_00000000-13_reduce.compute_70.cpp1.ii".
CMake Error at warpctc_generated_reduce.cu.o.cmake:279 (message):
  Error generating file
  /home/jonghu/ds2/warp-ctc/build/CMakeFiles/warpctc.dir/src/./warpctc_generated_reduce.cu.o


CMakeFiles/warpctc.dir/build.make:337: recipe for target 'CMakeFiles/warpctc.dir/src/warpctc_generated_reduce.cu.o' failed
make[2]: *** [CMakeFiles/warpctc.dir/src/warpctc_generated_reduce.cu.o] Error 1
CMakeFiles/Makefile2:109: recipe for target 'CMakeFiles/warpctc.dir/all' failed
make[1]: *** [CMakeFiles/warpctc.dir/all] Error 2
Makefile:129: recipe for target 'all' failed
make: *** [all] Error 2

this error occurs.

My GPU is a GeForce RTX 2080 Ti, and the build failed with CUDA versions 9.0, 9.1, and 10.1. Is there a way to solve this issue?

Sincerely, Jonghu.

Issue Analytics

  • State: closed
  • Created: 5 years ago
  • Comments: 12 (2 by maintainers)

Top GitHub Comments

62 reactions
tq09mx5 commented, Mar 8, 2019

src/reduce.cu
  Line 44 to: shuff = __shfl_down_sync(0xFFFFFFFF, x, offset);

include/contrib/moderngpu/include/device/intrinsics.cuh
  Line 115 to: var = __shfl_up_sync(0xFFFFFFFF, var, delta, width);
  Line 125 to: p.x = __shfl_up_sync(0xFFFFFFFF, p.x, delta, width);
  Line 126 to: p.y = __shfl_up_sync(0xFFFFFFFF, p.y, delta, width);
  Line 143 to: "shfl.up.sync.b32 r0|p, %1, %2, %3, %4;"
  Line 158 to: "shfl.up.sync.b32 r0|p, %1, %2, %3, %4;"

works fine with CUDA 10.1
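
The "(float, int)" mismatch in the question comes from renaming the call without adding the lane mask, which __shfl_down_sync() expects as its first argument. As a minimal sketch of the pattern (not the actual warp-ctc code), the block below sums a value across one warp with the _sync intrinsic. The names kFullMask, warp_sum, sum_ones, and warp_sum_of_ones are made up for this illustration, and it assumes all 32 lanes of the warp are active.

```cuda
#include <cuda_runtime.h>

// Full-warp mask: all 32 lanes take part in the shuffle (assumption of this sketch).
constexpr unsigned kFullMask = 0xFFFFFFFFu;

// Sum a value across the 32 lanes of a warp; lane 0 ends up with the total.
__device__ float warp_sum(float x) {
    for (int offset = 16; offset > 0; offset /= 2) {
        // The mask is the first argument; the old __shfl_down(x, offset) had none.
        x += __shfl_down_sync(kFullMask, x, offset);
    }
    return x;
}

// Every lane contributes 1.0f, so lane 0 should write 32.0f.
__global__ void sum_ones(float* out) {
    float total = warp_sum(1.0f);
    if (threadIdx.x == 0) {
        *out = total;
    }
}

// Hypothetical host helper: launches a single warp and returns lane 0's result.
// Error checking is omitted to keep the sketch short.
float warp_sum_of_ones() {
    float* d_out = nullptr;
    float h_out = 0.0f;
    cudaMalloc(&d_out, sizeof(float));
    sum_ones<<<1, 32>>>(d_out);
    // cudaMemcpy on the default stream waits for the kernel to finish.
    cudaMemcpy(&h_out, d_out, sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    return h_out;
}
```

Calling warp_sum_of_ones() from a host program compiled with nvcc should return 32.0f on a device that supports the _sync shuffles. A full mask is only appropriate when every lane in the warp reaches the call; code with divergent lanes would need a narrower mask.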

4 reactions
zhenglilei commented, Oct 15, 2019

(Quotes tq09mx5's comment above in full.)

This is the correct solution as of Oct. 2019.
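
If the same source also has to build with toolkits older than CUDA 9, where the _sync variants do not exist, one option is to pick the intrinsic at compile time. The sketch below assumes the CUDART_VERSION macro from the CUDA runtime headers; WARP_SHFL_DOWN, warp_max, max_of_lanes, and warp_max_of_lanes are hypothetical names for this example and are not part of warp-ctc.

```cuda
#include <cuda_runtime.h>

// Hypothetical compatibility macro: use the _sync intrinsic on CUDA 9.0 and later,
// and fall back to the legacy intrinsic on older toolkits.
#if defined(CUDART_VERSION) && CUDART_VERSION >= 9000
#define WARP_SHFL_DOWN(val, offset) __shfl_down_sync(0xFFFFFFFFu, (val), (offset))
#else
#define WARP_SHFL_DOWN(val, offset) __shfl_down((val), (offset))
#endif

// Maximum across the 32 lanes of a warp; lane 0 ends up with the result.
__device__ float warp_max(float x) {
    for (int offset = 16; offset > 0; offset /= 2) {
        x = fmaxf(x, WARP_SHFL_DOWN(x, offset));
    }
    return x;
}

// Each lane contributes its own lane index, so the maximum is 31.
__global__ void max_of_lanes(float* out) {
    float m = warp_max(static_cast<float>(threadIdx.x));
    if (threadIdx.x == 0) {
        *out = m;
    }
}

// Hypothetical host helper: launches a single warp and returns lane 0's result.
// Error checking is omitted to keep the sketch short.
float warp_max_of_lanes() {
    float* d_out = nullptr;
    float h_out = 0.0f;
    cudaMalloc(&d_out, sizeof(float));
    max_of_lanes<<<1, 32>>>(d_out);
    cudaMemcpy(&h_out, d_out, sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    return h_out;
}
```

The macro keeps the call sites unchanged across toolkit versions, at the cost of hard-coding a full-warp mask, so it only fits code where all 32 lanes participate in the shuffle.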


Top Results From Across the Web

identifier "__shfl_down" is undefined for cuda-7.5
Warp shuffle intrinsics are only defined (only supported on) compute capability (cc) 3.0 architectures and higher.

Implementing block reduction operations (with no warp shfl)
Yes. the _sync versions of shuffle were not made available until CUDA 9.x ... error: identifier “__shfl_down” is undefined.

Lecture 4: warp shuffles, and reduction / scan operations
Warp shuffles are a faster mechanism for moving data between threads in the same warp. There are 4 variants: shfl up sync copy...
