
`ndarray.take` is relatively slow

See original GitHub issue

ChainerX's `ndarray.take()` method is noticeably slower than its counterparts in NumPy and CuPy.

Related to #6081.

Benchmarks

  • Data: MNIST train images, shape (6000, 784)
  • Epochs: 20

The benchmark script is linked in the original issue; a rough re-creation is sketched after the results below.

```
# numpy.ndarray.take(indices=<numpy.ndarray>) (batch_size=100)
0.5389023162424564

# cupy.ndarray.take(indices=<numpy.ndarray>) (batch_size=100)
0.864742174744606

# chainerx.ndarray.take(indices=<chainerx.ndarray>) (device=native:0, batch_size=100)
18.46649692207575

# chainerx.ndarray.take(indices=<chainerx.ndarray>) (device=cuda:0, batch_size=100)
2.645330995321274
```
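For context, here is a minimal sketch of the kind of timing loop that could produce numbers like these. It is not the original script (which is only linked above); the shape, batch size, and epoch count follow the setup described in the issue, and everything else is our assumption.

```python
# Hypothetical re-creation of the benchmark, not the original script.
# Times ndarray.take() used for minibatch row selection.
import timeit

import numpy as np

x = np.random.rand(6000, 784).astype(np.float32)  # stand-in for the MNIST rows
indices = np.random.randint(0, len(x), size=100)  # one minibatch of 100 rows

def bench(take):
    # 20 epochs x 60 minibatches per epoch (6000 rows / batch_size 100)
    return timeit.timeit(take, number=20 * 60)

print("numpy   :", bench(lambda: x.take(indices, axis=0)))

try:
    import chainerx as chx
    cx = chx.array(x, device="native:0")
    cidx = chx.array(indices)
    print("chainerx:", bench(lambda: cx.take(cidx, axis=0)))
except ImportError:
    pass  # chainerx not installed
```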

Issue Analytics

  • State: closed
  • Created: 5 years ago
  • Comments: 6 (4 by maintainers)

Top GitHub Comments

1 reaction
okapies commented, Jun 7, 2019

No problem. I'll reopen it when I hit the problem again.

1 reaction
niboshi commented, Jan 25, 2019

The ChainerX native backend is currently not optimized at all. The CUDA backend should be (at least almost) as fast as CuPy, though.
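Until the native backend gets an optimized kernel, one possible workaround (our suggestion, not something proposed in the thread) is to round-trip through NumPy on the native device, since `chainerx.to_numpy` can share memory with a `native:0` array:

```python
# Hypothetical workaround, not from the issue: on the native device,
# delegate to NumPy's optimized take and wrap the result back.
import chainerx as chx

def take_via_numpy(a, indices, axis=0):
    a_np = chx.to_numpy(a, copy=False)         # view, no copy on native:0
    idx_np = chx.to_numpy(indices, copy=False)
    return chx.array(a_np.take(idx_np, axis=axis), device=a.device)
```

This only makes sense on `native:0`; on `cuda:0` the device transfer would dominate, and the CUDA backend is already close to CuPy.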


Top Results From Across the Web

Why is numpy.take slow on the result of numpy.split when ...
I have noticed that the slow-down depends on the array dimension (not the number of array dimensions, but the number of array elements)....
Performance Tips of NumPy ndarray - Shih-Chin
Avoid unnecessary array copies; use views and in-place operations whenever possible. · Beware of memory access patterns and cache effects. (A short view-vs-take illustration follows at the end of this list.)
NumPy Array Processing With Cython: 5000x Faster
When the maxsize variable is set to 1 million, the Cython code runs in 0.096 seconds while Python takes 0.293 seconds (Cython is...
[question] how to reduce overhead of using jax.numpy ...
on the GPU backend, we're using a relatively slow memory allocation strategy (see Default XLA/GPU memory allocator is synchronous/slow. #417); ...
NumPy Optimization: Vectorization and Broadcasting
Compared to languages like C/C++, Python loops are relatively slow. ... Taking advantage of this fact, NumPy delegates most of the operations...
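The tip about views in the second result above is worth keeping in mind alongside `take`: in NumPy, basic slicing returns a view into the original buffer, while `take` (like any fancy indexing) always allocates a copy. A quick illustration, with variable names of our choosing:

```python
import numpy as np

x = np.arange(12).reshape(3, 4)

s = x[1:3]                   # basic slice: a view, no data copied
t = x.take([1, 2], axis=0)   # take: always allocates a new array

s[0, 0] = -1
print(x[1, 0])  # -1: the view aliases x
print(t[0, 0])  # 4: the copy is unaffected
```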
