Windows: Error in compiling CUB block reduction kernels
From the new Windows CI (https://ci.preferred.jp/cupy.win.cuda100/65528/#L96449) being introduced in #4362, it looks like Windows is not happy with CUB_NS_PREFIX. The line at which the first error was raised:
cupy/cub/cub/block/block_reduce.cuh(45): error: this declaration has no storage class or type specifier
Full log:
01:57:31.177306 STDOUT 1912] _ TestCubReduction_param_31_{backend='block', order_and_axis=('F', None), shape=(10, 20, 30, 40)}.test_cub_argmin _
01:57:31.177306 STDOUT 1912]
01:57:31.177306 STDOUT 1912] self = <<cupy.testing._bundle.TestCubReduction_param_31_{backend='block', order_and_axis=('F', None), shape=(10, 20, 30, 40)} testMethod=test_cub_argmin> parameter: {'backend': 'block', 'order_and_axis': ('F', None), 'shape': (10, 20, 30, 40)}>
01:57:31.177306 STDOUT 1912] xp = <module 'cupy' from 'C:\\Windows\\Temp\\flexci\\run-00135266\\work\\src\\cupy\\__init__.py'>
01:57:31.177306 STDOUT 1912] dtype = <class 'numpy.int8'>
01:57:31.177306 STDOUT 1912]
01:57:31.177480 STDOUT 1912] @testing.for_dtypes('bhilBHILefdFD')
01:57:31.177480 STDOUT 1912] @testing.numpy_cupy_allclose(rtol=1E-5, contiguous_check=False)
01:57:31.177480 STDOUT 1912] def test_cub_argmin(self, xp, dtype):
01:57:31.177480 STDOUT 1912] _skip_cuda90(dtype)
01:57:31.177480 STDOUT 1912] a = testing.shaped_random(self.shape, xp, dtype)
01:57:31.177480 STDOUT 1912] if self.order == 'C':
01:57:31.177480 STDOUT 1912] a = xp.ascontiguousarray(a)
01:57:31.177480 STDOUT 1912] else:
01:57:31.177480 STDOUT 1912] a = xp.asfortranarray(a)
01:57:31.177480 STDOUT 1912]
01:57:31.177480 STDOUT 1912] if xp is numpy:
01:57:31.177480 STDOUT 1912] return a.argmin(axis=self.axis)
01:57:31.177480 STDOUT 1912]
01:57:31.177480 STDOUT 1912] # xp is cupy, first ensure we really use CUB
01:57:31.177480 STDOUT 1912] ret = cupy.empty(()) # Cython checks return type, need to fool it
01:57:31.177480 STDOUT 1912] if self.backend == 'device':
01:57:31.177480 STDOUT 1912] func_name = 'cupy.core._routines_statistics.cub.'
01:57:31.177480 STDOUT 1912] func_name += 'device_reduce'
01:57:31.177480 STDOUT 1912] with testing.AssertFunctionIsCalled(func_name, return_value=ret):
01:57:31.177480 STDOUT 1912] a.argmin(axis=self.axis)
01:57:31.177480 STDOUT 1912] elif self.backend == 'block':
01:57:31.177480 STDOUT 1912] # this is the only function we can mock; the rest is cdef'd
01:57:31.177480 STDOUT 1912] func_name = 'cupy.core._cub_reduction.'
01:57:31.177480 STDOUT 1912] func_name += '_SimpleCubReductionKernel_get_cached_function'
01:57:31.177480 STDOUT 1912] func = _cub_reduction._SimpleCubReductionKernel_get_cached_function
01:57:31.177480 STDOUT 1912] if self.axis is not None and len(self.shape) > 1:
01:57:31.177480 STDOUT 1912] times_called = 1 # one pass
01:57:31.177480 STDOUT 1912] else:
01:57:31.177480 STDOUT 1912] times_called = 2 # two passes
01:57:31.177480 STDOUT 1912] with testing.AssertFunctionIsCalled(
01:57:31.178439 STDOUT 1912] func_name, wraps=func, times_called=times_called):
01:57:31.178439 STDOUT 1912] > a.argmin(axis=self.axis)
01:57:31.178439 STDOUT 1912]
01:57:31.178439 STDOUT 1912] tests\cupy_tests\sorting_tests\test_search.py:230:
01:57:31.178439 STDOUT 1912] _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
01:57:31.178439 STDOUT 1912] cupy\core\core.pyx:799: in cupy.core.core.ndarray.argmin
01:57:31.178439 STDOUT 1912] cpdef ndarray argmin(self, axis=None, out=None, dtype=None,
01:57:31.178439 STDOUT 1912] cupy\core\core.pyx:816: in cupy.core.core.ndarray.argmin
01:57:31.178439 STDOUT 1912] return _statistics._ndarray_argmin(self, axis, out, dtype, keepdims)
01:57:31.178439 STDOUT 1912] cupy\core\_routines_statistics.pyx:114: in cupy.core._routines_statistics._ndarray_argmin
01:57:31.178439 STDOUT 1912] return _argmin(self, axis=axis, out=out, dtype=dtype, keepdims=keepdims)
01:57:31.178439 STDOUT 1912] cupy\core\_reduction.pyx:560: in cupy.core._reduction._SimpleReductionKernel.__call__
01:57:31.178439 STDOUT 1912] return self._call(
01:57:31.178439 STDOUT 1912] cupy\core\_reduction.pyx:346: in cupy.core._reduction._AbstractReductionKernel._call
01:57:31.178439 STDOUT 1912] cub_success = _cub_reduction._try_to_call_cub_reduction(
01:57:31.178439 STDOUT 1912] cupy\core\_cub_reduction.pyx:684: in cupy.core._cub_reduction._try_to_call_cub_reduction
01:57:31.178439 STDOUT 1912] _launch_cub(
01:57:31.178439 STDOUT 1912] cupy\core\_cub_reduction.pyx:520: in cupy.core._cub_reduction._launch_cub
01:57:31.178439 STDOUT 1912] _cub_two_pass_launch(
01:57:31.178439 STDOUT 1912] cupy\core\_cub_reduction.pyx:455: in cupy.core._cub_reduction._cub_two_pass_launch
01:57:31.178439 STDOUT 1912] func = _SimpleCubReductionKernel_get_cached_function(
01:57:31.178439 STDOUT 1912] C:\Development\Python\Python37\lib\unittest\mock.py:951: in __call__
01:57:31.178439 STDOUT 1912] return _mock_self._mock_call(*args, **kwargs)
01:57:31.179437 STDOUT 1912] C:\Development\Python\Python37\lib\unittest\mock.py:1026: in _mock_call
01:57:31.179437 STDOUT 1912] return self._mock_wraps(*args, **kwargs)
01:57:31.179437 STDOUT 1912] cupy\_util.pyx:53: in cupy._util.memoize.decorator.ret
01:57:31.179437 STDOUT 1912] result = f(*args, **kwargs)
01:57:31.179437 STDOUT 1912] cupy\core\_cub_reduction.pyx:227: in cupy.core._cub_reduction._SimpleCubReductionKernel_get_cached_function
01:57:31.179437 STDOUT 1912] return _create_cub_reduction_function(
01:57:31.179437 STDOUT 1912] cupy\core\_cub_reduction.pyx:212: in cupy.core._cub_reduction._create_cub_reduction_function
01:57:31.179437 STDOUT 1912] module = compile_with_cache(
01:57:31.179437 STDOUT 1912] cupy\core\core.pyx:1883: in cupy.core.core.compile_with_cache
01:57:31.179437 STDOUT 1912] return cuda.compile_with_cache(
01:57:31.179437 STDOUT 1912] cupy\cuda\compiler.py:396: in compile_with_cache
01:57:31.179437 STDOUT 1912] cache_in_memory, jitify)
01:57:31.179437 STDOUT 1912] cupy\cuda\compiler.py:474: in _compile_with_cache_cuda
01:57:31.179437 STDOUT 1912] log_stream, cache_in_memory, jitify)
01:57:31.179437 STDOUT 1912] cupy\cuda\compiler.py:230: in compile_using_nvrtc
01:57:31.179437 STDOUT 1912] name_expressions, log_stream, jitify)
01:57:31.179437 STDOUT 1912] cupy\cuda\compiler.py:206: in _compile
01:57:31.179437 STDOUT 1912] source, options, cu_path)
01:57:31.179437 STDOUT 1912] _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
01:57:31.179437 STDOUT 1912]
01:57:31.179437 STDOUT 1912] source = 'C:\\Windows\\TEMP\\flexci\\run-00135266\\tmp\\tmp0td0_rkn\\6d6a38c4a67aa39c2facf257f11c1901_2.cubin.cu\n#include <cup...ge\n }\n\n if (_tid == 0) {\n type_mid_out& out0 = *(_out0 + blockIdx.x);\n POST_MAP(aggregate);\n }\n}\n'
01:57:31.180405 STDOUT 1912] options = ('-DFIRST_PASS=1', '--std=c++11', '-DCUPY_USE_JITIFY', '-IC:\\Windows\\Temp\\flexci\\run-00135266\\work\\src\\cupy\\co...core\\include\\cupy\\_cuda\\cuda-10.0', '-IC:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v10.0\\include', ...)
01:57:31.180405 STDOUT 1912] cu_path = 'C:\\Windows\\TEMP\\flexci\\run-00135266\\tmp\\tmp0td0_rkn\\6d6a38c4a67aa39c2facf257f11c1901_2.cubin.cu'
01:57:31.180405 STDOUT 1912]
01:57:31.180405 STDOUT 1912] def _jitify_prep(source, options, cu_path):
01:57:31.180405 STDOUT 1912] # TODO(leofang): refactor this?
01:57:31.180405 STDOUT 1912] global _jitify_header_source_map_populated
01:57:31.180405 STDOUT 1912] if not _jitify_header_source_map_populated:
01:57:31.180405 STDOUT 1912] from cupy.core import core
01:57:31.180405 STDOUT 1912] _jitify_header_source_map = core._get_header_source_map()
01:57:31.180405 STDOUT 1912] _jitify_header_source_map_populated = True
01:57:31.180405 STDOUT 1912] else:
01:57:31.180405 STDOUT 1912] # this is already cached at the C++ level, so don't pass in anything
01:57:31.180405 STDOUT 1912] _jitify_header_source_map = None
01:57:31.180405 STDOUT 1912]
01:57:31.180405 STDOUT 1912] # jitify requires the 1st line to be the program name
01:57:31.180405 STDOUT 1912] old_source = source
01:57:31.180405 STDOUT 1912] source = cu_path + '\n' + source
01:57:31.180405 STDOUT 1912]
01:57:31.180405 STDOUT 1912] # Upon failure, in addition to throw an error Jitify also prints the log
01:57:31.180405 STDOUT 1912] # to stdout. In principle we could intercept that by hijacking stdout's
01:57:31.180405 STDOUT 1912] # file descriptor (tested locally), but the problem is pytest also does
01:57:31.180405 STDOUT 1912] # the same thing internally, causing strange errors when running the tests.
01:57:31.180405 STDOUT 1912] # As a result, we currently maintain Jitify's default behavior for easy
01:57:31.180405 STDOUT 1912] # debugging, and wait for the upstream to address this issue
01:57:31.180405 STDOUT 1912] # (NVIDIA/jitify#79).
01:57:31.180405 STDOUT 1912]
01:57:31.180405 STDOUT 1912] try:
01:57:31.181367 STDOUT 1912] name, options, headers, include_names = jitify(
01:57:31.181367 STDOUT 1912] source, options, _jitify_header_source_map)
01:57:31.181367 STDOUT 1912] except Exception as e: # C++ could throw all kinds of errors
01:57:31.181367 STDOUT 1912] cex = CompileException(str(e), old_source, cu_path, options, 'jitify')
01:57:31.181367 STDOUT 1912] dump = _get_bool_env_variable(
01:57:31.181367 STDOUT 1912] 'CUPY_DUMP_CUDA_SOURCE_ON_ERROR', False)
01:57:31.181367 STDOUT 1912] if dump:
01:57:31.181367 STDOUT 1912] cex.dump(sys.stderr)
01:57:31.181367 STDOUT 1912] > raise JitifyException(str(cex))
01:57:31.181367 STDOUT 1912] E AssertionError: Only cupy raises error
01:57:31.181367 STDOUT 1912] E
01:57:31.181367 STDOUT 1912] E File "C:\Windows\Temp\flexci\run-00135266\work\src\cupy\testing\helper.py", line 40, in _call_func
01:57:31.181367 STDOUT 1912] E result = impl(self, *args, **kw)
01:57:31.181367 STDOUT 1912] E File "C:\Windows\Temp\flexci\run-00135266\work\src\tests\cupy_tests\sorting_tests\test_search.py", line 230, in test_cub_argmin
01:57:31.181367 STDOUT 1912] E a.argmin(axis=self.axis)
01:57:31.181367 STDOUT 1912] E File "cupy\core\core.pyx", line 799, in cupy.core.core.ndarray.argmin
01:57:31.181367 STDOUT 1912] E cpdef ndarray argmin(self, axis=None, out=None, dtype=None,
01:57:31.181367 STDOUT 1912] E File "cupy\core\core.pyx", line 816, in cupy.core.core.ndarray.argmin
01:57:31.181367 STDOUT 1912] E return _statistics._ndarray_argmin(self, axis, out, dtype, keepdims)
01:57:31.181367 STDOUT 1912] E File "cupy\core\_routines_statistics.pyx", line 114, in cupy.core._routines_statistics._ndarray_argmin
01:57:31.181367 STDOUT 1912] E return _argmin(self, axis=axis, out=out, dtype=dtype, keepdims=keepdims)
01:57:31.181367 STDOUT 1912] E File "cupy\core\_reduction.pyx", line 560, in cupy.core._reduction._SimpleReductionKernel.__call__
01:57:31.181367 STDOUT 1912] E return self._call(
01:57:31.182344 STDOUT 1912] E File "cupy\core\_reduction.pyx", line 346, in cupy.core._reduction._AbstractReductionKernel._call
01:57:31.182344 STDOUT 1912] E cub_success = _cub_reduction._try_to_call_cub_reduction(
01:57:31.182344 STDOUT 1912] E File "cupy\core\_cub_reduction.pyx", line 684, in cupy.core._cub_reduction._try_to_call_cub_reduction
01:57:31.182344 STDOUT 1912] E _launch_cub(
01:57:31.182344 STDOUT 1912] E File "cupy\core\_cub_reduction.pyx", line 520, in cupy.core._cub_reduction._launch_cub
01:57:31.182344 STDOUT 1912] E _cub_two_pass_launch(
01:57:31.182344 STDOUT 1912] E File "cupy\core\_cub_reduction.pyx", line 455, in cupy.core._cub_reduction._cub_two_pass_launch
01:57:31.182344 STDOUT 1912] E func = _SimpleCubReductionKernel_get_cached_function(
01:57:31.182344 STDOUT 1912] E File "C:\Development\Python\Python37\lib\unittest\mock.py", line 951, in __call__
01:57:31.182344 STDOUT 1912] E return _mock_self._mock_call(*args, **kwargs)
01:57:31.182344 STDOUT 1912] E File "C:\Development\Python\Python37\lib\unittest\mock.py", line 1026, in _mock_call
01:57:31.182344 STDOUT 1912] E return self._mock_wraps(*args, **kwargs)
01:57:31.182344 STDOUT 1912] E File "cupy\_util.pyx", line 53, in cupy._util.memoize.decorator.ret
01:57:31.182344 STDOUT 1912] E result = f(*args, **kwargs)
01:57:31.182344 STDOUT 1912] E File "cupy\core\_cub_reduction.pyx", line 227, in cupy.core._cub_reduction._SimpleCubReductionKernel_get_cached_function
01:57:31.182344 STDOUT 1912] E return _create_cub_reduction_function(
01:57:31.182344 STDOUT 1912] E File "cupy\core\_cub_reduction.pyx", line 212, in cupy.core._cub_reduction._create_cub_reduction_function
01:57:31.182344 STDOUT 1912] E module = compile_with_cache(
01:57:31.182344 STDOUT 1912] E File "cupy\core\core.pyx", line 1883, in cupy.core.core.compile_with_cache
01:57:31.182344 STDOUT 1912] E return cuda.compile_with_cache(
01:57:31.182344 STDOUT 1912] E File "C:\Windows\Temp\flexci\run-00135266\work\src\cupy\cuda\compiler.py", line 396, in compile_with_cache
01:57:31.182344 STDOUT 1912] E cache_in_memory, jitify)
01:57:31.183317 STDOUT 1912] E File "C:\Windows\Temp\flexci\run-00135266\work\src\cupy\cuda\compiler.py", line 474, in _compile_with_cache_cuda
01:57:31.183317 STDOUT 1912] E log_stream, cache_in_memory, jitify)
01:57:31.183317 STDOUT 1912] E File "C:\Windows\Temp\flexci\run-00135266\work\src\cupy\cuda\compiler.py", line 230, in compile_using_nvrtc
01:57:31.183317 STDOUT 1912] E name_expressions, log_stream, jitify)
01:57:31.183317 STDOUT 1912] E File "C:\Windows\Temp\flexci\run-00135266\work\src\cupy\cuda\compiler.py", line 206, in _compile
01:57:31.183317 STDOUT 1912] E source, options, cu_path)
01:57:31.183317 STDOUT 1912] E File "C:\Windows\Temp\flexci\run-00135266\work\src\cupy\cuda\compiler.py", line 188, in _jitify_prep
01:57:31.183317 STDOUT 1912] E raise JitifyException(str(cex))
01:57:31.183317 STDOUT 1912] E Runtime compilation failed
01:57:31.183317 STDOUT 1912]
01:57:31.183317 STDOUT 1912] cupy\cuda\compiler.py:188: AssertionError
01:57:31.183317 STDOUT 1912] ---------------------------- Captured stdout call -----------------------------
01:57:31.183317 STDOUT 1912] ---------------------------------------------------
01:57:31.183317 STDOUT 1912] --- JIT compile log for C:\Windows\TEMP\flexci\run-00135266\tmp\tmp0td0_rkn\6d6a38c4a67aa39c2facf257f11c1901_2.cubin.cu ---
01:57:31.183317 STDOUT 1912] ---------------------------------------------------
01:57:31.183317 STDOUT 1912] cupy/complex/complex.h(94): warning: __device__ annotation is ignored on a function("complex") that is explicitly defaulted on its first declaration
01:57:31.183317 STDOUT 1912]
01:57:31.183317 STDOUT 1912] cupy/complex/complex.h(101): warning: __device__ annotation is ignored on a function("complex") that is explicitly defaulted on its first declaration
01:57:31.183317 STDOUT 1912]
01:57:31.183317 STDOUT 1912] cupy/cub/cub/block/block_reduce.cuh(45): error: this declaration has no storage class or type specifier
01:57:31.183317 STDOUT 1912]
01:57:31.183317 STDOUT 1912] cupy/cub/cub/block/block_reduce.cuh(48): error: expected a ";"
01:57:31.183317 STDOUT 1912]
01:57:31.183317 STDOUT 1912] cupy/cub/cub/block/block_reduce.cuh(149): warning: parsing restarts here after previous syntax error
01:57:31.183317 STDOUT 1912]
01:57:31.183317 STDOUT 1912] cupy/cub/cub/block/block_reduce.cuh(217): error: identifier "BlockReduceAlgorithm" is undefined
01:57:31.183317 STDOUT 1912]
01:57:31.183317 STDOUT 1912] cupy/cub/cub/block/block_reduce.cuh(217): error: identifier "BLOCK_REDUCE_WARP_REDUCTIONS" is undefined
01:57:31.183317 STDOUT 1912]
01:57:31.183317 STDOUT 1912] cupy/cub/cub/block/block_reduce.cuh(220): error: identifier "CUB_PTX_ARCH" is undefined
01:57:31.183317 STDOUT 1912]
01:57:31.183317 STDOUT 1912] cupy/cub/cub/block/block_reduce.cuh(236): error: BlockReduceWarpReductions is not a template
01:57:31.183317 STDOUT 1912]
01:57:31.183317 STDOUT 1912] cupy/cub/cub/block/block_reduce.cuh(237): error: BlockReduceRakingCommutativeOnly is not a template
01:57:31.183317 STDOUT 1912]
01:57:31.183317 STDOUT 1912] cupy/cub/cub/block/block_reduce.cuh(238): error: BlockReduceRaking is not a template
01:57:31.183317 STDOUT 1912]
01:57:31.183317 STDOUT 1912] cupy/cub/cub/block/block_reduce.cuh(241): error: If is not a template
01:57:31.183317 STDOUT 1912]
01:57:31.183317 STDOUT 1912] cupy/cub/cub/block/block_reduce.cuh(241): error: identifier "BLOCK_REDUCE_WARP_REDUCTIONS" is undefined
01:57:31.183317 STDOUT 1912]
01:57:31.183317 STDOUT 1912] cupy/cub/cub/block/block_reduce.cuh(243): error: If is not a template
01:57:31.183317 STDOUT 1912]
01:57:31.183317 STDOUT 1912] cupy/cub/cub/block/block_reduce.cuh(243): error: identifier "BLOCK_REDUCE_RAKING_COMMUTATIVE_ONLY" is undefined
01:57:31.183317 STDOUT 1912]
01:57:31.183317 STDOUT 1912] cupy/cub/cub/block/block_reduce.cuh(243): error: type name is not allowed
01:57:31.183317 STDOUT 1912]
01:57:31.183317 STDOUT 1912] cupy/cub/cub/block/block_reduce.cuh(248): error: name followed by "::" must be a class or namespace name
01:57:31.183317 STDOUT 1912]
01:57:31.183317 STDOUT 1912] cupy/cub/cub/block/block_reduce.cuh(277): error: Uninitialized is not a template
01:57:31.183317 STDOUT 1912]
01:57:31.183317 STDOUT 1912] cupy/cub/cub/block/block_reduce.cuh(277): error: not a class or struct name
01:57:31.183317 STDOUT 1912]
01:57:31.183317 STDOUT 1912] cupy/cub/cub/block/block_reduce.cuh(605): error: expected a declaration
01:57:31.183317 STDOUT 1912]
01:57:31.183317 STDOUT 1912] iterator(25): warning: parsing restarts here after previous syntax error
01:57:31.183317 STDOUT 1912]
01:57:31.183317 STDOUT 1912] iterator(27): error: a template argument list is not allowed in a declaration of a primary template
01:57:31.183317 STDOUT 1912]
01:57:31.183317 STDOUT 1912] iterator(34): error: expected a declaration
01:57:31.183317 STDOUT 1912]
01:57:31.183317 STDOUT 1912] cupy/cub/cub/block/block_load.cuh(84): warning: this pragma must immediately precede a statement
01:57:31.183317 STDOUT 1912]
01:57:31.183317 STDOUT 1912] cupy/cub/cub/block/block_load.cuh(113): warning: this pragma must immediately precede a statement
01:57:31.183317 STDOUT 1912]
01:57:31.183317 STDOUT 1912] cupy/cub/cub/block/block_load.cuh(145): warning: this pragma must immediately precede a statement
01:57:31.183317 STDOUT 1912]
01:57:31.183317 STDOUT 1912] cupy/cub/cub/block/block_load.cuh(187): warning: parsing restarts here after previous syntax error
01:57:31.183317 STDOUT 1912]
01:57:31.183317 STDOUT 1912] cupy/cub/cub/block/block_load.cuh(190): error: identifier "Vector" is undefined
01:57:31.183317 STDOUT 1912]
01:57:31.183317 STDOUT 1912] cupy/cub/cub/block/block_load.cuh(190): error: identifier "Vector" is undefined
01:57:31.183317 STDOUT 1912]
01:57:31.183317 STDOUT 1912] cupy/cub/cub/block/block_load.cuh(190): error: identifier "block_ptr" is undefined
01:57:31.184305 STDOUT 1912]
01:57:31.184305 STDOUT 1912] cupy/cub/cub/block/block_load.cuh(190): error: identifier "linear_tid" is undefined
01:57:31.184305 STDOUT 1912]
01:57:31.184305 STDOUT 1912] cupy/cub/cub/block/block_load.cuh(190): error: identifier "VECTORS_PER_THREAD" is undefined
01:57:31.184305 STDOUT 1912]
01:57:31.184305 STDOUT 1912] cupy/cub/cub/block/block_load.cuh(194): error: expected a declaration
01:57:31.184305 STDOUT 1912]
01:57:31.184305 STDOUT 1912] cupy/cub/cub/block/block_load.cuh(200): warning: this pragma must immediately precede a statement
01:57:31.184305 STDOUT 1912]
01:57:31.184305 STDOUT 1912] cupy/cub/cub/block/block_load.cuh(265): warning: this pragma must immediately precede a statement
01:57:31.184305 STDOUT 1912]
01:57:31.184305 STDOUT 1912] cupy/cub/cub/block/block_load.cuh(296): warning: this pragma must immediately precede a statement
01:57:31.184305 STDOUT 1912]
01:57:31.184305 STDOUT 1912] cupy/cub/cub/block/block_load.cuh(330): warning: this pragma must immediately precede a statement
01:57:31.184305 STDOUT 1912]
01:57:31.184305 STDOUT 1912] cupy/cub/cub/block/block_load.cuh(374): warning: this pragma must immediately precede a statement
01:57:31.184305 STDOUT 1912]
01:57:31.184305 STDOUT 1912] cupy/cub/cub/block/block_load.cuh(411): warning: this pragma must immediately precede a statement
01:57:31.184305 STDOUT 1912]
01:57:31.184305 STDOUT 1912] cupy/cub/cub/block/block_load.cuh(447): warning: this pragma must immediately precede a statement
01:57:31.184305 STDOUT 1912]
01:57:31.184305 STDOUT 1912] cupy/cub/cub/block/block_load.cuh(563): warning: parsing restarts here after previous syntax error
01:57:31.184305 STDOUT 1912]
01:57:31.184305 STDOUT 1912] cupy/cub/cub/block/block_load.cuh(636): error: identifier "BlockLoadAlgorithm" is undefined
01:57:31.184305 STDOUT 1912]
01:57:31.184305 STDOUT 1912] cupy/cub/cub/block/block_load.cuh(636): error: identifier "BLOCK_LOAD_DIRECT" is undefined
01:57:31.184305 STDOUT 1912]
01:57:31.184305 STDOUT 1912] cupy/cub/cub/block/block_load.cuh(639): error: identifier "CUB_PTX_ARCH" is undefined
01:57:31.184305 STDOUT 1912]
01:57:31.184305 STDOUT 1912] cupy/cub/cub/block/block_load.cuh(661): error: identifier "BlockLoadAlgorithm" is undefined
01:57:31.184305 STDOUT 1912]
01:57:31.184305 STDOUT 1912] cupy/cub/cub/block/block_load.cuh(669): error: identifier "BLOCK_LOAD_DIRECT" is undefined
01:57:31.184305 STDOUT 1912]
01:57:31.184305 STDOUT 1912] cupy/cub/cub/block/block_load.cuh(672): error: identifier "NullType" is undefined
01:57:31.184305 STDOUT 1912]
01:57:31.184305 STDOUT 1912] cupy/cub/cub/block/block_load.cuh(722): error: identifier "BLOCK_LOAD_VECTORIZE" is undefined
01:57:31.184305 STDOUT 1912]
01:57:31.184305 STDOUT 1912] cupy/cub/cub/block/block_load.cuh(722): error: class template "BlockLoad<InputT, BLOCK_DIM_X, ITEMS_PER_THREAD, ALGORITHM, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH>::LoadInternal" has already been defined
01:57:31.184305 STDOUT 1912]
01:57:31.184305 STDOUT 1912] cupy/cub/cub/block/block_load.cuh(805): error: identifier "BLOCK_LOAD_TRANSPOSE" is undefined
01:57:31.184305 STDOUT 1912]
01:57:31.184305 STDOUT 1912] cupy/cub/cub/block/block_load.cuh(805): error: class template "BlockLoad<InputT, BLOCK_DIM_X, ITEMS_PER_THREAD, ALGORITHM, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH>::LoadInternal" has already been defined
01:57:31.184305 STDOUT 1912]
01:57:31.184305 STDOUT 1912] cupy/cub/cub/block/block_load.cuh(872): error: identifier "BLOCK_LOAD_WARP_TRANSPOSE" is undefined
01:57:31.184305 STDOUT 1912]
01:57:31.184305 STDOUT 1912] cupy/cub/cub/block/block_load.cuh(872): error: class template "BlockLoad<InputT, BLOCK_DIM_X, ITEMS_PER_THREAD, ALGORITHM, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH>::LoadInternal" has already been defined
01:57:31.184305 STDOUT 1912]
01:57:31.184305 STDOUT 1912] cupy/cub/cub/block/block_load.cuh(947): error: identifier "BLOCK_LOAD_WARP_TRANSPOSE_TIMESLICED" is undefined
01:57:31.184305 STDOUT 1912]
01:57:31.184305 STDOUT 1912] cupy/cub/cub/block/block_load.cuh(947): error: class template "BlockLoad<InputT, BLOCK_DIM_X, ITEMS_PER_THREAD, ALGORITHM, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH>::LoadInternal" has already been defined
01:57:31.184305 STDOUT 1912]
01:57:31.184305 STDOUT 1912] cupy/cub/cub/block/block_load.cuh(1055): error: Uninitialized is not a template
01:57:31.184305 STDOUT 1912]
01:57:31.184305 STDOUT 1912] cupy/cub/cub/block/block_load.cuh(1055): error: not a class or struct name
01:57:31.184305 STDOUT 1912]
01:57:31.184305 STDOUT 1912] cupy/cub/cub/block/block_load.cuh(1239): error: expected a declaration
01:57:31.184305 STDOUT 1912]
01:57:31.184305 STDOUT 1912] C:\Windows\TEMP\flexci\run-00135266\tmp\tmp0td0_rkn\6d6a38c4a67aa39c2facf257f11c1901_2.cubin.cu(8): warning: parsing restarts here after previous syntax error
01:57:31.184305 STDOUT 1912]
01:57:31.184305 STDOUT 1912] C:\Windows\TEMP\flexci\run-00135266\tmp\tmp0td0_rkn\6d6a38c4a67aa39c2facf257f11c1901_2.cubin.cu(99): error: identifier "type_in0_raw" is undefined
01:57:31.184305 STDOUT 1912]
01:57:31.184305 STDOUT 1912] C:\Windows\TEMP\flexci\run-00135266\tmp\tmp0td0_rkn\6d6a38c4a67aa39c2facf257f11c1901_2.cubin.cu(114): error: identifier "type_in0_raw" is undefined
01:57:31.184305 STDOUT 1912]
01:57:31.184305 STDOUT 1912] C:\Windows\TEMP\flexci\run-00135266\tmp\tmp0td0_rkn\6d6a38c4a67aa39c2facf257f11c1901_2.cubin.cu(139): error: name followed by "::" must be a class or namespace name
01:57:31.184305 STDOUT 1912]
01:57:31.184305 STDOUT 1912] C:\Windows\TEMP\flexci\run-00135266\tmp\tmp0td0_rkn\6d6a38c4a67aa39c2facf257f11c1901_2.cubin.cu(139): error: expected an identifier
01:57:31.184305 STDOUT 1912]
01:57:31.184305 STDOUT 1912] C:\Windows\TEMP\flexci\run-00135266\tmp\tmp0td0_rkn\6d6a38c4a67aa39c2facf257f11c1901_2.cubin.cu(142): error: name followed by "::" must be a class or namespace name
01:57:31.184305 STDOUT 1912]
01:57:31.184305 STDOUT 1912] C:\Windows\TEMP\flexci\run-00135266\tmp\tmp0td0_rkn\6d6a38c4a67aa39c2facf257f11c1901_2.cubin.cu(196): error: identifier "type_in0_raw" is undefined
01:57:31.184305 STDOUT 1912]
01:57:31.184305 STDOUT 1912] C:\Windows\TEMP\flexci\run-00135266\tmp\tmp0td0_rkn\6d6a38c4a67aa39c2facf257f11c1901_2.cubin.cu(204): error: identifier "BlockReduceT" is undefined
01:57:31.184305 STDOUT 1912]
01:57:31.184305 STDOUT 1912] 49 errors detected in the compilation of "C:\Windows\TEMP\flexci\run-00135266\tmp\tmp0td0_rkn\6d6a38c4a67aa39c2facf257f11c1901_2.cubin.cu".
01:57:31.184305 STDOUT 1912]
01:57:31.184305 STDOUT 1912] ---------------------------------------------------
01:57:31.184305 STDOUT 1912] dtype is b
Issue Analytics
- Created: 3 years ago
- Comments: 12 (7 by maintainers)
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
Thanks @leofang – I heard back over the weekend and the compiler folks are looking for a reproduction without any extra headers at all, including those from CUB.
Does this still happen if the empty macro is defined in the input source directly?
At the moment, we don’t have any “official” support for NVRTC / Jitify in CUB. Some folks have been making parts of it work, but we lack testing coverage and would need to make a focused push to get any reliable level of support for it. This is something I’d like to do at some point, but don’t have the cycles for right now, unfortunately.
But on a related note – I’m working on addressing NVIDIA/cub#228 by adding more strict C++ conformance testing, specifically to work around some issues on NVRTC. I don’t see how that could fix this problem, but then again I’m not sure why this problem is happening in the first place 😃 So maybe it will help?