NVRTCError: NVRTC_ERROR_COMPILATION with `cupy.cuda.compile_with_cache`

🐛 Bug

I get the following error when I trace a model with torch.jit.trace and the model compiles its kernel through cupy.cuda.compile_with_cache.

NVRTCError: NVRTC_ERROR_COMPILATION (6)

During handling of the above exception, another exception occurred:

CompileException                          Traceback (most recent call last)

cupy/util.pyx in cupy.util.memoize.decorator.ret()

/usr/local/lib/python3.7/dist-packages/cupy/cuda/compiler.py in compile(self, options)
    440         except nvrtc.NVRTCError:
    441             log = nvrtc.getProgramLog(self.ptr)
--> 442             raise CompileException(log, self.src, self.name, options, 'nvrtc')
    443 
    444 

CompileException: /tmp/tmpan1ut480/3b7c153ce98d06488f1cbac8793f6dff_2.cubin.cu(16): error: identifier "tensor" is undefined

1 error detected in the compilation of "/tmp/tmpan1ut480/3b7c153ce98d06488f1cbac8793f6dff_2.cubin.cu".

To Reproduce

This Colab notebook reproduces the error: https://colab.research.google.com/drive/1WDRCN6wPIAsl5tBFKfne0ABN49estM9P?usp=sharing

This is a minimal example.

@cupy.util.memoize(for_each_device=True)
def cupy_launch(strFunction, strKernel):
	return cupy.cuda.compile_with_cache(strKernel).get_function(strFunction)

kernel_Correlation_rearrange = " .... "

import torch
import torch.nn as nn

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()

    def forward(self, x_warp_after, x_cond):
        cupy_launch('kernel_Correlation_rearrange', cupy_kernel('kernel_Correlation_rearrange', {
          'intStride': 1,
          'input': x_warp_after,
          'output': x_cond
        }))(
        )
        return x_warp_after, x_cond

net = Net().cuda()
input1 = torch.randn([1, 256, 8, 6]).cuda()
input2 = torch.randn([1, 256, 8, 6]).cuda()
trace_model = torch.jit.trace(net, [input1, input2])

Expected behavior

I expected torch.jit.trace to finish without a compile error. I think the error originates in the call to cupy.cuda.compile_with_cache.

Environment

CuPy version: cupy-cuda101==7.4.0
CUDA/cuDNN version: 11.0.221
PyTorch Version (e.g., 1.0): 1.8.1+cu101
OS (e.g., Linux): Ubuntu 18.04.5 LTS (x86_64)
How you installed PyTorch (conda, pip, source): pip
Build command you used (if compiling from source): no
Python version: 3.7 (64-bit runtime)
GPU models and configuration: GPU 0: Tesla T4
Any other relevant information:

Additional context

I previously opened an issue in the PyTorch repository, but then concluded that the problem lies in CuPy rather than in PyTorch.

Issue Analytics

Created 2 years ago
Comments: 7 (4 by maintainers)

Top GitHub Comments

2 reactions

leofang commented, Jun 1, 2021

This is a bug in neither PyTorch nor CuPy; the problem lies in the string processing that generates your kernel. Notice this error:

CompileException: /tmp/tmpan1ut480/3b7c153ce98d06488f1cbac8793f6dff_2.cubin.cu(16): error: identifier "tensor" is undefined

It is a common C/C++ error telling you that the definition of the identifier tensor is missing. You should check how that identifier entered the code string. CuPy provides some environment variables, and the one you need to help you debug the code generation is either CUPY_CACHE_SAVE_CUDA_SOURCE or CUPY_DUMP_CUDA_SOURCE_ON_ERROR.
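
One way to act on the advice above is sketched here. It sets both environment variables before CuPy is imported (the conservative choice; the exact point at which CuPy reads them is an assumption here) and adds a small stdlib-only helper that locates an identifier in a generated kernel string. The helper name find_identifier and the sample kernel text are hypothetical and are not part of CuPy or of the original code.

```python
import os
import re

# Ask CuPy to print the CUDA source when compilation fails, and to keep
# the source next to the cached binary. Set these before `import cupy`.
os.environ["CUPY_DUMP_CUDA_SOURCE_ON_ERROR"] = "1"
os.environ["CUPY_CACHE_SAVE_CUDA_SOURCE"] = "1"


def find_identifier(source, identifier):
    """Return (line_number, line) pairs, 1-based, where `identifier`
    occurs as a whole word in the generated kernel source."""
    pattern = re.compile(r"\b" + re.escape(identifier) + r"\b")
    return [
        (number, line)
        for number, line in enumerate(source.splitlines(), start=1)
        if pattern.search(line)
    ]


# Hypothetical generated kernel text with a stray Python repr in it.
generated = "\n".join([
    'extern "C" __global__ void demo(float* out) {',
    "  const int n = tensor(8);",
    "  out[0] = n;",
    "}",
])

hits = find_identifier(generated, "tensor")
for number, line in hits:
    print("line {}: {}".format(number, line))
```

Running the helper on the string returned by cupy_kernel, before passing it to the compiler, should point at the same line that NVRTC reports in parentheses after the file name.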

By the way, it is best not to use cupy.cuda.compile_with_cache() because it is subject to change without notice (it’s considered internal API AFAIK). There is a public API, cupy.RawModule, for exactly this need (see tutorial).
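
A minimal sketch of the cupy.RawModule route follows. The kernel (scale), the wrapper name run_scale, and the launch sizes are illustrative assumptions and have nothing to do with the correlation kernel in the issue. The CuPy import is deferred into the function so that the snippet can be loaded on a machine without a GPU; calling run_scale still needs a working CuPy installation.

```python
# Illustrative kernel: y[i] = factor * x[i]. Not the kernel from the issue.
KERNEL_SOURCE = r'''
extern "C" __global__
void scale(const float* x, float* y, float factor, int n) {
    int i = blockDim.x * blockIdx.x + threadIdx.x;
    if (i < n) {
        y[i] = factor * x[i];
    }
}
'''


def run_scale(n=1024, factor=2.0):
    """Compile KERNEL_SOURCE with cupy.RawModule and launch `scale`."""
    import cupy  # deferred: requires CuPy and a CUDA device

    module = cupy.RawModule(code=KERNEL_SOURCE)
    kernel = module.get_function("scale")

    x = cupy.arange(n, dtype=cupy.float32)
    y = cupy.empty_like(x)

    threads = 128
    blocks = (n + threads - 1) // threads  # ceiling division
    # Arguments are passed as a tuple: (grid, block, kernel_args).
    kernel((blocks,), (threads,), (x, y, cupy.float32(factor), cupy.int32(n)))
    return y
```

Scalar arguments are wrapped in explicit CuPy/NumPy types (cupy.float32, cupy.int32) so that they match the C parameter types in the kernel signature.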

1 reaction

tommy19970714 commented, Jun 2, 2021

Thanks to @kmaehashi's advice, this problem has been solved.

The problem is that during JIT tracing the int sizes become tensors, so their string form, which is pasted into the kernel source, contains the word tensor. I solved the problem by rewriting the code as follows.

import re

def cupy_kernel(strFunction, objVariables):
	strKernel = globals()[strFunction].replace('{{intStride}}', str(objVariables['intStride']))

	while True:
		objMatch = re.search(r'(SIZE_)([0-4])(\()([^\)]*)(\))', strKernel)

		if objMatch is None:
			break
		# end

		intArg = int(objMatch.group(2))

		strTensor = objMatch.group(4)
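		intSizes = objVariables[strTensor].size()

		#####
		# HERE: I changed the following lines.
		# While tracing, str(intSizes[intArg]) can yield text such as "tensor(8)";
		# replacing "tensor" with "int" turns it into a valid C++ cast, "int(8)".
		replaceStr = str(intSizes[intArg]).replace("tensor", "int")
		strKernel = strKernel.replace(objMatch.group(), replaceStr)
		# HERE
		#####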
	# end

	while True:
		objMatch = re.search(r'(VALUE_)([0-4])(\()([^\)]+)(\))', strKernel)

		if objMatch is None:
			break
		# end

		intArgs = int(objMatch.group(2))
		strArgs = objMatch.group(4).split(',')

		strTensor = strArgs[0]
		intStrides = objVariables[strTensor].stride()
		strIndex = [ '((' + strArgs[intArg + 1].replace('{', '(').replace('}', ')').strip() + ')*' + str(intStrides[intArg]) + ')' for intArg in range(intArgs) ]

		strKernel = strKernel.replace(objMatch.group(0), strTensor + '[' + str.join('+', strIndex) + ']')
	# end

	return strKernel
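
As a possible alternative to the string replacement above, the size entry could be converted with int() before it is formatted, so that no Python repr reaches the kernel source. This is only a sketch: FakeTracedDim is a hypothetical stand-in that mimics how a traced size entry prints, not a PyTorch class, and dim_to_c_literal is an illustrative helper name. With real tensors, int() on a traced value bakes the current size into the trace as a constant, and PyTorch may emit a TracerWarning about that.

```python
class FakeTracedDim:
    """Hypothetical stand-in for a traced size entry (not a PyTorch class).

    It prints like a 0-dim tensor but still converts to a Python int.
    """

    def __init__(self, value):
        self._value = value

    def __int__(self):
        return self._value

    def __str__(self):
        return "tensor({})".format(self._value)


def dim_to_c_literal(dim):
    """Format a size entry as a plain integer literal for C/CUDA source."""
    return str(int(dim))


traced_dim = FakeTracedDim(256)
print(str(traced_dim))               # the form that broke compilation
print(dim_to_c_literal(traced_dim))  # plain integer literal
print(dim_to_c_literal(8))           # ordinary ints pass through unchanged
```

In cupy_kernel this would correspond to building replaceStr from int(intSizes[intArg]) instead of rewriting the word "tensor" after the fact.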