
`ndarray.take` is relatively slow

See original GitHub issue

ChainerX's `ndarray.take()` method is noticeably slower than its counterparts in NumPy and CuPy.

Related to #6081.

Benchmarks

  • Data: MNIST train images, shape (6000, 784)
  • Epochs: 20

The benchmark script is linked in the original issue; a rough re-creation is sketched after the results below.

```
# numpy.ndarray.take(indices=<numpy.ndarray>) (batch_size=100)
0.5389023162424564

# cupy.ndarray.take(indices=<numpy.ndarray>) (batch_size=100)
0.864742174744606

# chainerx.ndarray.take(indices=<chainerx.ndarray>) (device=native:0, batch_size=100)
18.46649692207575

# chainerx.ndarray.take(indices=<chainerx.ndarray>) (device=cuda:0, batch_size=100)
2.645330995321274
```
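For context, here is a minimal sketch of the kind of timing loop that could produce numbers like these. It is not the original script (which is only linked above); the shape, batch size, and epoch count follow the setup described in the issue, and everything else is our assumption.

```python
# Hypothetical re-creation of the benchmark, not the original script.
# Times ndarray.take() used for minibatch row selection.
import timeit

import numpy as np

x = np.random.rand(6000, 784).astype(np.float32)  # stand-in for the MNIST rows
indices = np.random.randint(0, len(x), size=100)  # one minibatch of 100 rows

def bench(take):
    # 20 epochs x 60 minibatches per epoch (6000 rows / batch_size 100)
    return timeit.timeit(take, number=20 * 60)

print("numpy   :", bench(lambda: x.take(indices, axis=0)))

try:
    import chainerx as chx
    cx = chx.array(x, device="native:0")
    cidx = chx.array(indices)
    print("chainerx:", bench(lambda: cx.take(cidx, axis=0)))
except ImportError:
    pass  # chainerx not installed
```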

Issue Analytics

  • State: closed
  • Created: 5 years ago
  • Comments: 6 (4 by maintainers)

Top GitHub Comments

1 reaction
okapies commented, Jun 7, 2019

No problem. I'll reopen it when I hit the problem again.

1 reaction
niboshi commented, Jan 25, 2019

The ChainerX native backend is currently not optimized at all. The CUDA backend should be (at least almost) as fast as CuPy, though.
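Until the native backend gets an optimized kernel, one possible workaround (our suggestion, not something proposed in the thread) is to round-trip through NumPy on the native device, since `chainerx.to_numpy` can share memory with a `native:0` array:

```python
# Hypothetical workaround, not from the issue: on the native device,
# delegate to NumPy's optimized take and wrap the result back.
import chainerx as chx

def take_via_numpy(a, indices, axis=0):
    a_np = chx.to_numpy(a, copy=False)         # view, no copy on native:0
    idx_np = chx.to_numpy(indices, copy=False)
    return chx.array(a_np.take(idx_np, axis=axis), device=a.device)
```

This only makes sense on `native:0`; on `cuda:0` the device transfer would dominate, and the CUDA backend is already close to CuPy.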


Top Results From Across the Web

Why is numpy.take slow on the result of numpy.split when ...
I have noticed that the slow-down depends on the array dimension (not the number of array dimensions, but the number of array elements)....
Performance Tips of NumPy ndarray - Shih-Chin
Avoid unnecessary array copies; use views and in-place operations whenever possible. · Beware of memory access patterns and cache effects. (A short view-vs-take illustration follows at the end of this list.)
NumPy Array Processing With Cython: 5000x Faster
When the maxsize variable is set to 1 million, the Cython code runs in 0.096 seconds while Python takes 0.293 seconds (Cython is...
[question] how to reduce overhead of using jax.numpy ...
on the GPU backend, we're using a relatively slow memory allocation strategy (see Default XLA/GPU memory allocator is synchronous/slow. #417); ...
NumPy Optimization: Vectorization and Broadcasting
Compared to languages like C/C++, Python loops are relatively slow. ... Taking advantage of this fact, NumPy delegates most of the operations...
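The tip about views in the second result above is worth keeping in mind alongside `take`: in NumPy, basic slicing returns a view into the original buffer, while `take` (like any fancy indexing) always allocates a copy. A quick illustration, with variable names of our choosing:

```python
import numpy as np

x = np.arange(12).reshape(3, 4)

s = x[1:3]                   # basic slice: a view, no data copied
t = x.take([1, 2], axis=0)   # take: always allocates a new array

s[0, 0] = -1
print(x[1, 0])  # -1: the view aliases x
print(t[0, 0])  # 4: the copy is unaffected
```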
