
Windows: Error in compiling CUB block reduction kernels

See original GitHub issue

From the new Windows CI (https://ci.preferred.jp/cupy.win.cuda100/65528/#L96449) being introduced in #4362: it looks like Windows is not happy with CUB_NS_PREFIX, which sits on the line at which the first error was raised:

cupy/cub/cub/block/block_reduce.cuh(45): error: this declaration has no storage class or type specifier

Full log:

_ TestCubReduction_param_31_{backend='block', order_and_axis=('F', None), shape=(10, 20, 30, 40)}.test_cub_argmin _

self = <<cupy.testing._bundle.TestCubReduction_param_31_{backend='block', order_and_axis=('F', None), shape=(10, 20, 30, 40)} testMethod=test_cub_argmin>  parameter: {'backend': 'block', 'order_and_axis': ('F', None), 'shape': (10, 20, 30, 40)}>
xp = <module 'cupy' from 'C:\\Windows\\Temp\\flexci\\run-00135266\\work\\src\\cupy\\__init__.py'>
dtype = <class 'numpy.int8'>

    @testing.for_dtypes('bhilBHILefdFD')
    @testing.numpy_cupy_allclose(rtol=1E-5, contiguous_check=False)
    def test_cub_argmin(self, xp, dtype):
        _skip_cuda90(dtype)
        a = testing.shaped_random(self.shape, xp, dtype)
        if self.order == 'C':
            a = xp.ascontiguousarray(a)
        else:
            a = xp.asfortranarray(a)

        if xp is numpy:
            return a.argmin(axis=self.axis)

        # xp is cupy, first ensure we really use CUB
        ret = cupy.empty(())  # Cython checks return type, need to fool it
        if self.backend == 'device':
            func_name = 'cupy.core._routines_statistics.cub.'
            func_name += 'device_reduce'
            with testing.AssertFunctionIsCalled(func_name, return_value=ret):
                a.argmin(axis=self.axis)
        elif self.backend == 'block':
            # this is the only function we can mock; the rest is cdef'd
            func_name = 'cupy.core._cub_reduction.'
            func_name += '_SimpleCubReductionKernel_get_cached_function'
            func = _cub_reduction._SimpleCubReductionKernel_get_cached_function
            if self.axis is not None and len(self.shape) > 1:
                times_called = 1  # one pass
            else:
                times_called = 2  # two passes
            with testing.AssertFunctionIsCalled(
                    func_name, wraps=func, times_called=times_called):
>               a.argmin(axis=self.axis)

tests\cupy_tests\sorting_tests\test_search.py:230:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
cupy\core\core.pyx:799: in cupy.core.core.ndarray.argmin
    cpdef ndarray argmin(self, axis=None, out=None, dtype=None,
cupy\core\core.pyx:816: in cupy.core.core.ndarray.argmin
    return _statistics._ndarray_argmin(self, axis, out, dtype, keepdims)
cupy\core\_routines_statistics.pyx:114: in cupy.core._routines_statistics._ndarray_argmin
    return _argmin(self, axis=axis, out=out, dtype=dtype, keepdims=keepdims)
cupy\core\_reduction.pyx:560: in cupy.core._reduction._SimpleReductionKernel.__call__
    return self._call(
cupy\core\_reduction.pyx:346: in cupy.core._reduction._AbstractReductionKernel._call
    cub_success = _cub_reduction._try_to_call_cub_reduction(
cupy\core\_cub_reduction.pyx:684: in cupy.core._cub_reduction._try_to_call_cub_reduction
    _launch_cub(
cupy\core\_cub_reduction.pyx:520: in cupy.core._cub_reduction._launch_cub
    _cub_two_pass_launch(
cupy\core\_cub_reduction.pyx:455: in cupy.core._cub_reduction._cub_two_pass_launch
    func = _SimpleCubReductionKernel_get_cached_function(
C:\Development\Python\Python37\lib\unittest\mock.py:951: in __call__
    return _mock_self._mock_call(*args, **kwargs)
C:\Development\Python\Python37\lib\unittest\mock.py:1026: in _mock_call
    return self._mock_wraps(*args, **kwargs)
cupy\_util.pyx:53: in cupy._util.memoize.decorator.ret
    result = f(*args, **kwargs)
cupy\core\_cub_reduction.pyx:227: in cupy.core._cub_reduction._SimpleCubReductionKernel_get_cached_function
    return _create_cub_reduction_function(
cupy\core\_cub_reduction.pyx:212: in cupy.core._cub_reduction._create_cub_reduction_function
    module = compile_with_cache(
cupy\core\core.pyx:1883: in cupy.core.core.compile_with_cache
    return cuda.compile_with_cache(
cupy\cuda\compiler.py:396: in compile_with_cache
    cache_in_memory, jitify)
cupy\cuda\compiler.py:474: in _compile_with_cache_cuda
    log_stream, cache_in_memory, jitify)
cupy\cuda\compiler.py:230: in compile_using_nvrtc
    name_expressions, log_stream, jitify)
cupy\cuda\compiler.py:206: in _compile
    source, options, cu_path)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

source = 'C:\\Windows\\TEMP\\flexci\\run-00135266\\tmp\\tmp0td0_rkn\\6d6a38c4a67aa39c2facf257f11c1901_2.cubin.cu\n#include <cup...ge\n  }\n\n  if (_tid == 0) {\n      type_mid_out& out0 = *(_out0 + blockIdx.x);\n      POST_MAP(aggregate);\n  }\n}\n'
options = ('-DFIRST_PASS=1', '--std=c++11', '-DCUPY_USE_JITIFY', '-IC:\\Windows\\Temp\\flexci\\run-00135266\\work\\src\\cupy\\co...core\\include\\cupy\\_cuda\\cuda-10.0', '-IC:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v10.0\\include', ...)
cu_path = 'C:\\Windows\\TEMP\\flexci\\run-00135266\\tmp\\tmp0td0_rkn\\6d6a38c4a67aa39c2facf257f11c1901_2.cubin.cu'

    def _jitify_prep(source, options, cu_path):
        # TODO(leofang): refactor this?
        global _jitify_header_source_map_populated
        if not _jitify_header_source_map_populated:
            from cupy.core import core
            _jitify_header_source_map = core._get_header_source_map()
            _jitify_header_source_map_populated = True
        else:
            # this is already cached at the C++ level, so don't pass in anything
            _jitify_header_source_map = None

        # jitify requires the 1st line to be the program name
        old_source = source
        source = cu_path + '\n' + source

        # Upon failure, in addition to throw an error Jitify also prints the log
        # to stdout. In principle we could intercept that by hijacking stdout's
        # file descriptor (tested locally), but the problem is pytest also does
        # the same thing internally, causing strange errors when running the tests.
        # As a result, we currently maintain Jitify's default behavior for easy
        # debugging, and wait for the upstream to address this issue
        # (NVIDIA/jitify#79).

        try:
            name, options, headers, include_names = jitify(
                source, options, _jitify_header_source_map)
        except Exception as e:  # C++ could throw all kinds of errors
            cex = CompileException(str(e), old_source, cu_path, options, 'jitify')
            dump = _get_bool_env_variable(
                'CUPY_DUMP_CUDA_SOURCE_ON_ERROR', False)
            if dump:
                cex.dump(sys.stderr)
>           raise JitifyException(str(cex))
E           AssertionError: Only cupy raises error
E
E             File "C:\Windows\Temp\flexci\run-00135266\work\src\cupy\testing\helper.py", line 40, in _call_func
E               result = impl(self, *args, **kw)
E             File "C:\Windows\Temp\flexci\run-00135266\work\src\tests\cupy_tests\sorting_tests\test_search.py", line 230, in test_cub_argmin
E               a.argmin(axis=self.axis)
E             File "cupy\core\core.pyx", line 799, in cupy.core.core.ndarray.argmin
E               cpdef ndarray argmin(self, axis=None, out=None, dtype=None,
E             File "cupy\core\core.pyx", line 816, in cupy.core.core.ndarray.argmin
E               return _statistics._ndarray_argmin(self, axis, out, dtype, keepdims)
E             File "cupy\core\_routines_statistics.pyx", line 114, in cupy.core._routines_statistics._ndarray_argmin
E               return _argmin(self, axis=axis, out=out, dtype=dtype, keepdims=keepdims)
E             File "cupy\core\_reduction.pyx", line 560, in cupy.core._reduction._SimpleReductionKernel.__call__
E               return self._call(
E             File "cupy\core\_reduction.pyx", line 346, in cupy.core._reduction._AbstractReductionKernel._call
E               cub_success = _cub_reduction._try_to_call_cub_reduction(
E             File "cupy\core\_cub_reduction.pyx", line 684, in cupy.core._cub_reduction._try_to_call_cub_reduction
E               _launch_cub(
E             File "cupy\core\_cub_reduction.pyx", line 520, in cupy.core._cub_reduction._launch_cub
E               _cub_two_pass_launch(
E             File "cupy\core\_cub_reduction.pyx", line 455, in cupy.core._cub_reduction._cub_two_pass_launch
E               func = _SimpleCubReductionKernel_get_cached_function(
E             File "C:\Development\Python\Python37\lib\unittest\mock.py", line 951, in __call__
E               return _mock_self._mock_call(*args, **kwargs)
E             File "C:\Development\Python\Python37\lib\unittest\mock.py", line 1026, in _mock_call
E               return self._mock_wraps(*args, **kwargs)
E             File "cupy\_util.pyx", line 53, in cupy._util.memoize.decorator.ret
E               result = f(*args, **kwargs)
E             File "cupy\core\_cub_reduction.pyx", line 227, in cupy.core._cub_reduction._SimpleCubReductionKernel_get_cached_function
E               return _create_cub_reduction_function(
E             File "cupy\core\_cub_reduction.pyx", line 212, in cupy.core._cub_reduction._create_cub_reduction_function
E               module = compile_with_cache(
E             File "cupy\core\core.pyx", line 1883, in cupy.core.core.compile_with_cache
E               return cuda.compile_with_cache(
E             File "C:\Windows\Temp\flexci\run-00135266\work\src\cupy\cuda\compiler.py", line 396, in compile_with_cache
E               cache_in_memory, jitify)
E             File "C:\Windows\Temp\flexci\run-00135266\work\src\cupy\cuda\compiler.py", line 474, in _compile_with_cache_cuda
E               log_stream, cache_in_memory, jitify)
E             File "C:\Windows\Temp\flexci\run-00135266\work\src\cupy\cuda\compiler.py", line 230, in compile_using_nvrtc
E               name_expressions, log_stream, jitify)
E             File "C:\Windows\Temp\flexci\run-00135266\work\src\cupy\cuda\compiler.py", line 206, in _compile
E               source, options, cu_path)
E             File "C:\Windows\Temp\flexci\run-00135266\work\src\cupy\cuda\compiler.py", line 188, in _jitify_prep
E               raise JitifyException(str(cex))
E           Runtime compilation failed

cupy\cuda\compiler.py:188: AssertionError
---------------------------- Captured stdout call -----------------------------
---------------------------------------------------
--- JIT compile log for C:\Windows\TEMP\flexci\run-00135266\tmp\tmp0td0_rkn\6d6a38c4a67aa39c2facf257f11c1901_2.cubin.cu ---
---------------------------------------------------
cupy/complex/complex.h(94): warning: __device__ annotation is ignored on a function("complex") that is explicitly defaulted on its first declaration
cupy/complex/complex.h(101): warning: __device__ annotation is ignored on a function("complex") that is explicitly defaulted on its first declaration
cupy/cub/cub/block/block_reduce.cuh(45): error: this declaration has no storage class or type specifier
cupy/cub/cub/block/block_reduce.cuh(48): error: expected a ";"
cupy/cub/cub/block/block_reduce.cuh(149): warning: parsing restarts here after previous syntax error
cupy/cub/cub/block/block_reduce.cuh(217): error: identifier "BlockReduceAlgorithm" is undefined
cupy/cub/cub/block/block_reduce.cuh(217): error: identifier "BLOCK_REDUCE_WARP_REDUCTIONS" is undefined
cupy/cub/cub/block/block_reduce.cuh(220): error: identifier "CUB_PTX_ARCH" is undefined
cupy/cub/cub/block/block_reduce.cuh(236): error: BlockReduceWarpReductions is not a template
cupy/cub/cub/block/block_reduce.cuh(237): error: BlockReduceRakingCommutativeOnly is not a template
cupy/cub/cub/block/block_reduce.cuh(238): error: BlockReduceRaking is not a template
cupy/cub/cub/block/block_reduce.cuh(241): error: If is not a template
cupy/cub/cub/block/block_reduce.cuh(241): error: identifier "BLOCK_REDUCE_WARP_REDUCTIONS" is undefined
cupy/cub/cub/block/block_reduce.cuh(243): error: If is not a template
cupy/cub/cub/block/block_reduce.cuh(243): error: identifier "BLOCK_REDUCE_RAKING_COMMUTATIVE_ONLY" is undefined
cupy/cub/cub/block/block_reduce.cuh(243): error: type name is not allowed
cupy/cub/cub/block/block_reduce.cuh(248): error: name followed by "::" must be a class or namespace name
cupy/cub/cub/block/block_reduce.cuh(277): error: Uninitialized is not a template
cupy/cub/cub/block/block_reduce.cuh(277): error: not a class or struct name
cupy/cub/cub/block/block_reduce.cuh(605): error: expected a declaration
iterator(25): warning: parsing restarts here after previous syntax error
iterator(27): error: a template argument list is not allowed in a declaration of a primary template
iterator(34): error: expected a declaration
cupy/cub/cub/block/block_load.cuh(84): warning: this pragma must immediately precede a statement
cupy/cub/cub/block/block_load.cuh(113): warning: this pragma must immediately precede a statement
cupy/cub/cub/block/block_load.cuh(145): warning: this pragma must immediately precede a statement
cupy/cub/cub/block/block_load.cuh(187): warning: parsing restarts here after previous syntax error
cupy/cub/cub/block/block_load.cuh(190): error: identifier "Vector" is undefined
cupy/cub/cub/block/block_load.cuh(190): error: identifier "Vector" is undefined
cupy/cub/cub/block/block_load.cuh(190): error: identifier "block_ptr" is undefined
cupy/cub/cub/block/block_load.cuh(190): error: identifier "linear_tid" is undefined
cupy/cub/cub/block/block_load.cuh(190): error: identifier "VECTORS_PER_THREAD" is undefined
cupy/cub/cub/block/block_load.cuh(194): error: expected a declaration
cupy/cub/cub/block/block_load.cuh(200): warning: this pragma must immediately precede a statement
cupy/cub/cub/block/block_load.cuh(265): warning: this pragma must immediately precede a statement
cupy/cub/cub/block/block_load.cuh(296): warning: this pragma must immediately precede a statement
cupy/cub/cub/block/block_load.cuh(330): warning: this pragma must immediately precede a statement
cupy/cub/cub/block/block_load.cuh(374): warning: this pragma must immediately precede a statement
cupy/cub/cub/block/block_load.cuh(411): warning: this pragma must immediately precede a statement
cupy/cub/cub/block/block_load.cuh(447): warning: this pragma must immediately precede a statement
cupy/cub/cub/block/block_load.cuh(563): warning: parsing restarts here after previous syntax error
cupy/cub/cub/block/block_load.cuh(636): error: identifier "BlockLoadAlgorithm" is undefined
cupy/cub/cub/block/block_load.cuh(636): error: identifier "BLOCK_LOAD_DIRECT" is undefined
cupy/cub/cub/block/block_load.cuh(639): error: identifier "CUB_PTX_ARCH" is undefined
cupy/cub/cub/block/block_load.cuh(661): error: identifier "BlockLoadAlgorithm" is undefined
cupy/cub/cub/block/block_load.cuh(669): error: identifier "BLOCK_LOAD_DIRECT" is undefined
cupy/cub/cub/block/block_load.cuh(672): error: identifier "NullType" is undefined
cupy/cub/cub/block/block_load.cuh(722): error: identifier "BLOCK_LOAD_VECTORIZE" is undefined
cupy/cub/cub/block/block_load.cuh(722): error: class template "BlockLoad<InputT, BLOCK_DIM_X, ITEMS_PER_THREAD, ALGORITHM, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH>::LoadInternal" has already been defined
cupy/cub/cub/block/block_load.cuh(805): error: identifier "BLOCK_LOAD_TRANSPOSE" is undefined
cupy/cub/cub/block/block_load.cuh(805): error: class template "BlockLoad<InputT, BLOCK_DIM_X, ITEMS_PER_THREAD, ALGORITHM, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH>::LoadInternal" has already been defined
cupy/cub/cub/block/block_load.cuh(872): error: identifier "BLOCK_LOAD_WARP_TRANSPOSE" is undefined
cupy/cub/cub/block/block_load.cuh(872): error: class template "BlockLoad<InputT, BLOCK_DIM_X, ITEMS_PER_THREAD, ALGORITHM, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH>::LoadInternal" has already been defined
cupy/cub/cub/block/block_load.cuh(947): error: identifier "BLOCK_LOAD_WARP_TRANSPOSE_TIMESLICED" is undefined
cupy/cub/cub/block/block_load.cuh(947): error: class template "BlockLoad<InputT, BLOCK_DIM_X, ITEMS_PER_THREAD, ALGORITHM, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH>::LoadInternal" has already been defined
cupy/cub/cub/block/block_load.cuh(1055): error: Uninitialized is not a template
cupy/cub/cub/block/block_load.cuh(1055): error: not a class or struct name
cupy/cub/cub/block/block_load.cuh(1239): error: expected a declaration
C:\Windows\TEMP\flexci\run-00135266\tmp\tmp0td0_rkn\6d6a38c4a67aa39c2facf257f11c1901_2.cubin.cu(8): warning: parsing restarts here after previous syntax error
C:\Windows\TEMP\flexci\run-00135266\tmp\tmp0td0_rkn\6d6a38c4a67aa39c2facf257f11c1901_2.cubin.cu(99): error: identifier "type_in0_raw" is undefined
C:\Windows\TEMP\flexci\run-00135266\tmp\tmp0td0_rkn\6d6a38c4a67aa39c2facf257f11c1901_2.cubin.cu(114): error: identifier "type_in0_raw" is undefined
C:\Windows\TEMP\flexci\run-00135266\tmp\tmp0td0_rkn\6d6a38c4a67aa39c2facf257f11c1901_2.cubin.cu(139): error: name followed by "::" must be a class or namespace name
C:\Windows\TEMP\flexci\run-00135266\tmp\tmp0td0_rkn\6d6a38c4a67aa39c2facf257f11c1901_2.cubin.cu(139): error: expected an identifier
C:\Windows\TEMP\flexci\run-00135266\tmp\tmp0td0_rkn\6d6a38c4a67aa39c2facf257f11c1901_2.cubin.cu(142): error: name followed by "::" must be a class or namespace name
C:\Windows\TEMP\flexci\run-00135266\tmp\tmp0td0_rkn\6d6a38c4a67aa39c2facf257f11c1901_2.cubin.cu(196): error: identifier "type_in0_raw" is undefined
C:\Windows\TEMP\flexci\run-00135266\tmp\tmp0td0_rkn\6d6a38c4a67aa39c2facf257f11c1901_2.cubin.cu(204): error: identifier "BlockReduceT" is undefined
49 errors detected in the compilation of "C:\Windows\TEMP\flexci\run-00135266\tmp\tmp0td0_rkn\6d6a38c4a67aa39c2facf257f11c1901_2.cubin.cu".
---------------------------------------------------
dtype is b

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Comments: 12 (7 by maintainers)

Top GitHub Comments

1 reaction
allisonvacanti commented, Mar 8, 2021

Thanks @leofang – I heard back over the weekend and the compiler folks are looking for a reproduction without any extra headers at all, including those from CUB.

Does this still happen if the empty macro is defined in the input source directly?

1 reaction
allisonvacanti commented, Jan 4, 2021

At the moment, we don’t have any “official” support for NVRTC / Jitify in CUB. Some folks have been making parts of it work, but we lack testing coverage and would need to make a focused push to get any reliable level of support for it. This is something I’d like to do at some point, but don’t have the cycles for right now, unfortunately.

But on a related note – I’m working on addressing NVIDIA/cub#228 by adding more strict C++ conformance testing, specifically to work around some issues on NVRTC. I don’t see how that could fix this problem, but then again I’m not sure why this problem is happening in the first place 😃 So maybe it will help?


