Possible to get / set gridDim and blockDim for custom Elementwise kernel?


Thanks for the great library, it’s really a pleasure to work with!

I’ve written a custom Elementwise kernel that needs workspace memory for each thread. I’d like to allocate this beforehand and pass it through as raw parameters, as I often exceed the in-kernel dynamic memory limitations. To do so, I need to know how many threads are going to be launched.

From what I can tell, there is currently no way to set the grid dimensions or the block dimensions for an Elementwise kernel. A pull request was merged to let ElementwiseKernel set the block_size (https://github.com/cupy/cupy/pull/2914), which seemed fine, but when I double-checked master (https://github.com/cupy/cupy/blob/master/cupy/core/_kernel.pyx#L701), max_block_size still appears to be fixed at the constant 128.

At the moment, I have worked out the maximum number of threads that are going to be launched by tracing the cupy code and am using the following:

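# Constant taken from line 701 in cupy/core/_kernel.pyx
block_max_size = 128
# Grid computation taken from line 183 in cupy/cuda/function.pyx;
# `size` is the number of elements the kernel is launched over
gridx = min(0x7fffffff, (size + block_max_size - 1) // block_max_size)
# Upper bound on the number of threads launched
thread_count = gridx * block_max_size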

Obviously this is not great. Is there any way to get / set the number of threads? Have I totally missed something? Thanks!
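
The arithmetic in the snippet above can be wrapped in a small helper so that the workspace size is computed in one place. This is only a sketch: the default of 128 and the grid formula are copied from the post and reflect the CuPy revision traced there, and the function names `elementwise_thread_count` and `workspace_elements` are hypothetical, not part of CuPy.

```python
def elementwise_thread_count(size, block_max_size=128, max_grid=0x7FFFFFFF):
    """Upper bound on launched threads, using the formula quoted in the post.

    Assumption: block_max_size=128 matches the constant the post found in
    cupy/core/_kernel.pyx; other CuPy versions may differ.
    """
    # Ceiling division gives the number of blocks needed to cover `size`.
    gridx = min(max_grid, (size + block_max_size - 1) // block_max_size)
    return gridx * block_max_size


def workspace_elements(size, per_thread, block_max_size=128):
    """Total workspace elements if every launched thread needs `per_thread`."""
    return elementwise_thread_count(size, block_max_size) * per_thread


if __name__ == "__main__":
    n = 1000
    print(elementwise_thread_count(n))   # threads launched for 1000 elements
    print(workspace_elements(n, 16))     # workspace elements at 16 per thread
```

Note that the buffer is sized by the thread count rather than by `size`: under these assumptions, 1000 elements round up to 8 blocks of 128, so 1024 threads need workspace.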

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Reactions: 1
  • Comments: 5 (4 by maintainers)

Top GitHub Comments

leofang commented, Jun 6, 2020 (1 reaction)

Hi @izaid, I think you can give master or v8.0.0b3 a try. The regression was fixed.

izaid commented, Apr 7, 2020 (0 reactions)

So, I know this can be done with cupy.RawKernel, but then I’d have to add all the elementwise looping capabilities, etc. And since this feature was available previously, I was hoping we could bring back the ability to set both the block and grid dimensions. Thanks again!
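
For context, the "elementwise looping" that a cupy.RawKernel version would have to reimplement is commonly written as a grid-stride loop, where each thread starts at its global thread index and advances by the total number of threads. The pure-Python sketch below only simulates that indexing on the CPU. It assumes the grid-stride pattern and does not claim to reproduce CuPy's generated code; `grid_stride_indices` is a hypothetical helper.

```python
def grid_stride_indices(size, grid_dim, block_dim):
    """Simulate which element indices each thread visits in a grid-stride loop.

    Returns a dict mapping global thread id -> list of element indices.
    """
    total_threads = grid_dim * block_dim
    visited = {}
    for block_idx in range(grid_dim):
        for thread_idx in range(block_dim):
            # Same formula as blockIdx.x * blockDim.x + threadIdx.x in CUDA.
            tid = block_idx * block_dim + thread_idx
            # Start at tid and step by the total number of launched threads.
            visited[tid] = list(range(tid, size, total_threads))
    return visited


if __name__ == "__main__":
    for tid, indices in grid_stride_indices(10, grid_dim=2, block_dim=2).items():
        print(tid, indices)
```

Under this pattern the global thread id is a natural index into a preallocated per-thread workspace, since it stays fixed while the element index advances.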


