
Indexing performance in backward


Hi,

First of all, thanks for the great library!

I’ve been experimenting with it and noticed that indexing in nn.MessagePassing is using regular indexing (i.e., x[idx]). People have suggested in this issue that either torch.index_select() or torch.nn.functional.embedding might be faster.

Do you have any thoughts on this? For the use case in MessagePassing, I think we might benefit from using the less general index_select, for instance. I’d be happy to add a PR for that, if you’re interested.
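For context, the three candidates compute the same result in the forward pass; they differ only in which backward kernel autograd dispatches. A minimal sketch (hypothetical shapes, not the actual MessagePassing code):

```python
import torch
import torch.nn.functional as F

x = torch.randn(100, 16, requires_grad=True)   # node features
idx = torch.randint(0, 100, (500,))            # e.g. one row of edge_index

a = x[idx]                         # advanced indexing -> IndexBackward
b = torch.index_select(x, 0, idx)  # specialized gather along dim 0
c = F.embedding(idx, x)            # treats x as an embedding table

# all three gather the same rows
assert torch.equal(a, b) and torch.equal(a, c)
```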

Issue Analytics

  • State: closed
  • Created: 5 years ago
  • Comments: 5 (5 by maintainers)

Top GitHub Comments

1 reaction
rusty1s commented, Feb 17, 2019

Speed-ups are really impressive 😃 Would you like to submit a PR? Thank you in advance.

1 reaction
fadel commented, Feb 16, 2019

I’ve written this (admittedly hacky) script to compare x[idx], torch.index_select, and torch.nn.functional.embedding, using your gcn.py script as a starting point. The use case is not fully general (and the model/data are small), but I think it’s enough to showcase the issue.
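The core of such a comparison can be sketched as follows. This is a hypothetical stand-in, not the actual script: it times forward plus backward for each variant on synthetic data, since the gap only shows up in the backward pass.

```python
import time
import torch
import torch.nn.functional as F

x = torch.randn(10_000, 64, requires_grad=True)   # synthetic node features
idx = torch.randint(0, 10_000, (100_000,))        # synthetic edge indices

def bench(fn, label, iters=50):
    # time forward + backward together; the difference lives in backward
    start = time.perf_counter()
    for _ in range(iters):
        fn().sum().backward()
        x.grad = None
    print(f"{label}: {(time.perf_counter() - start) / iters * 1e3:.2f} ms/iter")

bench(lambda: x[idx], "x[idx]")
bench(lambda: torch.index_select(x, 0, idx), "index_select")
bench(lambda: F.embedding(idx, x), "embedding")
```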

I ran it on my laptop’s GPU and got the following results using PyTorch’s bottleneck tool.

  • Using regular tensor indexing
--------------------------------------------------------------------------------
  Environment Summary
--------------------------------------------------------------------------------
PyTorch 1.0.1 compiled w/ CUDA 10.0.130
Running with Python 3.7 and CUDA 10.0.130

--------------------------------------------------------------------------------
  autograd profiler output (CUDA mode)
--------------------------------------------------------------------------------
        top 15 events sorted by cpu_time_total

	Because the autograd profiler uses the CUDA event API,
	the CUDA time column reports approximately max(cuda_time, cpu_time).
	Please ignore this output if your code does not use CUDA.

-----------------  ---------------  ---------------  ---------------  ---------------  ---------------
Name                      CPU time        CUDA time            Calls        CPU total       CUDA total
-----------------  ---------------  ---------------  ---------------  ---------------  ---------------
to                     10274.693us      10280.640us                1      10274.693us      10280.640us
IndexBackward           5628.419us       4430.176us                1       5628.419us       4430.176us
index_put_              5586.677us       4400.757us                1       5586.677us       4400.757us
IndexBackward           5542.723us       4390.625us                1       5542.723us       4390.625us
IndexBackward           5522.821us       4311.707us                1       5522.821us       4311.707us
IndexBackward           5507.986us       4305.908us                1       5507.986us       4305.908us
IndexBackward           5502.442us       4358.154us                1       5502.442us       4358.154us
index_put_              5500.491us       4352.295us                1       5500.491us       4352.295us
IndexBackward           5494.718us       4302.124us                1       5494.718us       4302.124us
IndexBackward           5492.988us       4294.922us                1       5492.988us       4294.922us
index_put_              5481.013us       4282.776us                1       5481.013us       4282.776us
IndexBackward           5478.108us       4280.029us                1       5478.108us       4280.029us
IndexBackward           5477.163us       4288.818us                1       5477.163us       4288.818us
IndexBackward           5477.154us       4255.371us                1       5477.154us       4255.371us
index_put_              5469.539us       4275.391us                1       5469.539us       4275.391us
  • Using torch.index_select
--------------------------------------------------------------------------------
  Environment Summary
--------------------------------------------------------------------------------
PyTorch 1.0.1 compiled w/ CUDA 10.0.130
Running with Python 3.7 and CUDA 10.0.130

--------------------------------------------------------------------------------
  autograd profiler output (CUDA mode)
--------------------------------------------------------------------------------
        top 15 events sorted by cpu_time_total

	Because the autograd profiler uses the CUDA event API,
	the CUDA time column reports approximately max(cuda_time, cpu_time).
	Please ignore this output if your code does not use CUDA.

---------  ---------------  ---------------  ---------------  ---------------  ---------------
Name              CPU time        CUDA time            Calls        CPU total       CUDA total
---------  ---------------  ---------------  ---------------  ---------------  ---------------
to             10281.322us      10281.570us                1      10281.322us      10281.570us
index           5515.740us        386.963us                1       5515.740us        386.963us
index           5444.239us         92.529us                1       5444.239us         92.529us
index           5438.014us        103.027us                1       5438.014us        103.027us
index           5437.841us         94.482us                1       5437.841us         94.482us
index           5433.881us        103.271us                1       5433.881us        103.271us
index           5414.890us        102.295us                1       5414.890us        102.295us
index           5400.339us         93.018us                1       5400.339us         93.018us
index           5399.472us        101.074us                1       5399.472us        101.074us
index           5391.510us         92.041us                1       5391.510us         92.041us
index           5389.171us        103.516us                1       5389.171us        103.516us
index           5388.936us         93.506us                1       5388.936us         93.506us
index           5385.860us         91.797us                1       5385.860us         91.797us
index           5385.458us        104.370us                1       5385.458us        104.370us
index           5384.333us         93.750us                1       5384.333us         93.750us
  • Using torch.nn.functional.embedding
--------------------------------------------------------------------------------
  Environment Summary
--------------------------------------------------------------------------------
PyTorch 1.0.1 compiled w/ CUDA 10.0.130
Running with Python 3.7 and CUDA 10.0.130

--------------------------------------------------------------------------------
  autograd profiler output (CUDA mode)
--------------------------------------------------------------------------------
        top 15 events sorted by cpu_time_total

	Because the autograd profiler uses the CUDA event API,
	the CUDA time column reports approximately max(cuda_time, cpu_time).
	Please ignore this output if your code does not use CUDA.

---------  ---------------  ---------------  ---------------  ---------------  ---------------
Name              CPU time        CUDA time            Calls        CPU total       CUDA total
---------  ---------------  ---------------  ---------------  ---------------  ---------------
to             10323.890us      10324.223us                1      10323.890us      10324.223us
index           4073.636us         95.703us                1       4073.636us         95.703us
index           4049.680us         95.825us                1       4049.680us         95.825us
index           4049.172us        104.980us                1       4049.172us        104.980us
index           4043.177us        107.910us                1       4043.177us        107.910us
index           4038.790us        147.217us                1       4038.790us        147.217us
index           4036.591us         96.069us                1       4036.591us         96.069us
index           4035.337us        104.614us                1       4035.337us        104.614us
index           4031.769us        135.803us                1       4031.769us        135.803us
index           4023.240us        120.117us                1       4023.240us        120.117us
index           4021.195us        139.404us                1       4021.195us        139.404us
index           4020.351us         95.093us                1       4020.351us         95.093us
index           4020.332us        132.324us                1       4020.332us        132.324us
index           4019.825us        113.953us                1       4019.825us        113.953us
index           4015.754us        162.109us                1       4015.754us        162.109us
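Reading the tables above: the advanced-indexing backward (IndexBackward / index_put_) sits around 4.3 ms of CUDA time per call, while the backward kernels of the other two variants stay near 0.1 ms. Since all three variants also produce identical gradients, swapping them in is behavior-preserving. A quick CPU check with hypothetical shapes, including a repeated index to exercise gradient accumulation:

```python
import torch
import torch.nn.functional as F

x = torch.randn(50, 8)
idx = torch.tensor([0, 3, 3, 7])  # row 3 gathered twice: grads must accumulate

grads = []
for fn in (lambda t: t[idx],
           lambda t: torch.index_select(t, 0, idx),
           lambda t: F.embedding(idx, t)):
    t = x.clone().requires_grad_()
    fn(t).sum().backward()
    grads.append(t.grad)

assert torch.equal(grads[0], grads[1])
assert torch.equal(grads[0], grads[2])
assert torch.all(grads[0][3] == 2)  # gathered twice -> gradient of 2 per entry
```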