Use cuTENSOR in reduction routines
See original GitHub issue.
For performance, _AbstractReductionKernel should use cuTENSOR by default when cupy.cuda.cutensor_enabled is True.
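The backend selection the issue asks for was later exposed to users; as a hedged sketch (assuming a machine with CuPy, a CUDA GPU, and the cuTENSOR library installed), CuPy's documented CUPY_ACCELERATORS environment variable can opt reductions into the cuTENSOR backend. It must be set before cupy is imported:

```python
# Sketch: selecting CuPy's accelerator backends via the documented
# CUPY_ACCELERATORS environment variable. This must be set before
# `import cupy`; actually exercising it requires a CUDA GPU with
# cuTENSOR installed, so the CuPy calls below are shown as comments.
import os

# Try the cuTENSOR backend first, falling back to CUB.
os.environ["CUPY_ACCELERATORS"] = "cutensor,cub"

# On a GPU machine, reductions then dispatch through the accelerator:
# import cupy as cp
# x = cp.random.rand(256, 1024, dtype=cp.float32)
# y = x.sum(axis=1)  # may run through cuTENSOR's reduction routine
```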
Issue Analytics
- State:
- Created: 4 years ago
- Comments: 11 (11 by maintainers)
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
After applying #2921, CUB is faster in all cases.
I compared the performance of CUB and cuTENSOR. Benchmark script: https://gist.github.com/asi1024/ee62c50fd1254acb0e9431473862a014. Output on a V100:
cuTENSOR is faster in batch reduction, and CUB is faster in full reduction.
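The distinction the benchmark draws can be illustrated with a NumPy stand-in (the CuPy benchmark itself needs a GPU): a full reduction collapses every axis to a scalar, while a batch (partial) reduction keeps some axes and produces one result per batch element.

```python
# Full vs. batch reduction, illustrated with NumPy (no GPU needed).
import numpy as np

x = np.arange(24, dtype=np.float32).reshape(4, 6)

full = x.sum()         # full reduction: all axes collapse to a scalar
batch = x.sum(axis=1)  # batch reduction: one sum per row, shape (4,)

print(full.shape)   # -> ()
print(batch.shape)  # -> (4,)
```

In the benchmark above, cuTENSOR wins on the `axis=...` (batch) case while CUB wins when everything is reduced to a single scalar.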