`ndarray.take` is relatively slow
ChainerX's `ndarray.take()` method is significantly slower than its counterparts in NumPy and CuPy.
Related to #6081.
Benchmarks
- Data: MNIST train images, shape (6000, 784)
- Epochs: 20
The benchmark script is linked from the original issue.
```
# numpy.ndarray.take(indices=<numpy.ndarray>) (batch_size=100)
0.5389023162424564
# cupy.ndarray.take(indices=<numpy.ndarray>) (batch_size=100)
0.864742174744606
# chainerx.ndarray.take(indices=<chainerx.ndarray>) (device=native:0, batch_size=100)
18.46649692207575
# chainerx.ndarray.take(indices=<chainerx.ndarray>) (device=cuda:0, batch_size=100)
2.645330995321274
```
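The original benchmark script is not reproduced here; as a rough sketch of the NumPy case, the following times `ndarray.take` over shuffled minibatches using the shape, batch size, and epoch count listed above (the random data and loop structure are assumptions, not the original script):

```python
import timeit

import numpy as np

# Synthetic stand-in for the training images, matching the (6000, 784) shape
# given in the issue description.
data = np.random.default_rng(0).random((6000, 784)).astype(np.float32)
batch_size = 100
epochs = 20

def run_epochs():
    # Draw shuffled minibatches with ndarray.take, as a training loop would.
    for _ in range(epochs):
        order = np.random.permutation(len(data))
        for start in range(0, len(data), batch_size):
            indices = order[start:start + batch_size]
            batch = data.take(indices, axis=0)

elapsed = timeit.timeit(run_epochs, number=1)
print(f"numpy take, {epochs} epochs: {elapsed:.3f} s")
```

Swapping `data` for a CuPy or ChainerX array (and keeping `indices` on the matching device) would give the other rows of the table, assuming the same loop shape.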
Issue Analytics
- Created: 5 years ago
- Comments: 6 (4 by maintainers)
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
- "Np. Reopen it when I hit the problem again."
- "ChainerX native backend is currently not optimized at all. The CUDA backend should be (at least almost) as fast as CuPy, though."
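Since the maintainer notes the native backend is unoptimized, one possible workaround (an assumption on my part, not an official recommendation) is to perform the gather on a NumPy view before handing the batch to ChainerX. The gather itself is just `take`, which is equivalent to fancy indexing along the same axis, as this NumPy-only sketch shows:

```python
import numpy as np

# take(indices, axis=0) is equivalent to fancy indexing with the same
# indices, so the minibatch gather can be done cheaply in NumPy and the
# resulting batch converted to a ChainerX array afterwards.
a = np.arange(12, dtype=np.float32).reshape(3, 4)
indices = np.array([2, 0])

taken = a.take(indices, axis=0)
fancy = a[indices]
assert np.array_equal(taken, fancy)
print(taken)
```

This avoids the slow native-device `take` entirely at the cost of one host-side copy per batch.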