Possible to get / set gridDim and blockDim for custom Elementwise kernel?
Thanks for the great library, it’s really a pleasure to work with!
I’ve written a custom Elementwise kernel that needs workspace memory for each thread. I’d like to allocate this beforehand and pass it through as raw
parameters, as I often exceed the in-kernel dynamic memory limitations. To do so, I need to know how many threads are going to be launched.
From what I can tell, there is currently no way to set the grid dimensions or the block dimensions for an Elementwise kernel. A PR was merged to allow Elementwise kernels to set the block_size (https://github.com/cupy/cupy/pull/2914), which seemed fine, but when I double-checked master (https://github.com/cupy/cupy/blob/master/cupy/core/_kernel.pyx#L701), max_block_size still appears to be fixed at the constant 128.
At the moment, I have worked out the maximum number of threads that are going to be launched by tracing the cupy code and am using the following:
# line 701 in cupy/core/_kernel.pyx
block_max_size = 128
# line 183 in cupy/cuda/function.pyx
gridx = min(0x7fffffff, (size + block_max_size - 1) // block_max_size)
thread_count = gridx * block_max_size
Obviously this is not great. Is there any way to get / set the number of threads? Have I totally missed something? Thanks!
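For clarity, the estimate above can be wrapped in a small helper. This is only a sketch of the same arithmetic: the function name max_launched_threads and its default arguments are my own, and the values 128 and 0x7fffffff are copied from the traced CuPy source, so they may not hold for other versions.

```python
def max_launched_threads(size, block_max_size=128, max_grid=0x7fffffff):
    """Upper bound on the number of threads launched for `size` elements.

    Mirrors the traced computation: the grid is the element count divided
    by the block size, rounded up, and capped at `max_grid`. The defaults
    are assumptions taken from the CuPy source at the time of writing.
    """
    gridx = min(max_grid, (size + block_max_size - 1) // block_max_size)
    return gridx * block_max_size


# 1000 elements -> 8 blocks of 128 threads -> 1024 threads
thread_count = max_launched_threads(1000)

# Hypothetical per-thread workspace requirement, in elements.
workspace_per_thread = 16
workspace_elems = thread_count * workspace_per_thread
```

The workspace array would then be allocated with workspace_elems elements and passed to the kernel as a raw parameter, with each thread indexing its own slice.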
Top GitHub Comments
Hi @izaid, I think you can give master or v8.0.0b3 a try. The regression was fixed.
So, I know this can be done with cupy.RawKernel, but then I’d have to add all the elementwise looping capabilities, etc. And since this feature was available previously, I was hoping we could bring back the ability to set both the block and grid dimensions. Thanks again!
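For reference, here is a rough sketch of what the cupy.RawKernel route involves. The grid-stride loop has to be written by hand, but the grid and block sizes are chosen explicitly, so the workspace size is known before launch. The kernel name scale, the helper launch_config, the cap of 1024 blocks, and the per-thread workspace layout are all illustrative assumptions, not CuPy internals.

```python
# CUDA source for a hand-written elementwise kernel with a grid-stride loop.
KERNEL_SRC = r'''
extern "C" __global__
void scale(const float* x, float* y, float* workspace,
           int n, int ws_per_thread) {
    int tid = blockDim.x * blockIdx.x + threadIdx.x;
    int stride = blockDim.x * gridDim.x;
    // Each thread owns a private slice of the preallocated workspace.
    float* ws = workspace + (size_t)tid * ws_per_thread;
    for (int i = tid; i < n; i += stride) {
        ws[0] = x[i];          // placeholder use of the scratch space
        y[i] = 2.0f * ws[0];
    }
}
'''


def launch_config(n, block=128, max_grid=1024):
    """Pick (grid, block); the grid is capped and the loop covers the rest."""
    grid = max(1, min(max_grid, (n + block - 1) // block))
    return grid, block


def run(n=10_000, ws_per_thread=4):
    """Launch the kernel with an explicit grid and block (needs a GPU)."""
    import cupy as cp

    grid, block = launch_config(n)
    kernel = cp.RawKernel(KERNEL_SRC, "scale")
    x = cp.arange(n, dtype=cp.float32)
    y = cp.empty_like(x)
    # The thread count is known up front, so the workspace can be sized exactly.
    workspace = cp.empty(grid * block * ws_per_thread, dtype=cp.float32)
    kernel((grid,), (block,),
           (x, y, workspace, cp.int32(n), cp.int32(ws_per_thread)))
    return y
```

This works, but it gives up the broadcasting, type dispatch, and indexing conveniences of ElementwiseKernel, which is why setting the dimensions there would be preferable.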