
torch.split / torch.chunk cause "too complex strides" for Inductor + CUDA graph

See original GitHub issue

When a model takes the outputs of torch.split / torch.chunk ops as inputs, or uses those ops internally, torch._debug_has_internal_overlap can report that the sizes and strides of the resulting tensors make it too hard to determine whether memory overlap exists, which prevents CUDA graphs from being enabled:

Example 1: using output of torch.split as input to graph

import torch

# split sizes sum to 101168, the width of all_embs
all_embs = torch.randn(8, 101168)
emb_split = [98400, 340, 40, 328, 380, 1320, 360]
split_emb = torch.split(all_embs, emb_split, dim=1)

split_emb now contains tensors with the following sizes and strides:

torch.Size([8, 98400])
(101168, 1)
torch.Size([8, 340])
(101168, 1)
torch.Size([8, 40])
(101168, 1)
torch.Size([8, 328])
(101168, 1)
torch.Size([8, 380])
(101168, 1)
torch.Size([8, 1320])
(101168, 1)
torch.Size([8, 360])
(101168, 1)

Passing split_emb as input to the “Inductor + CUDA graph” backend causes CUDA graphs to be disabled.
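
The overlap check mentioned above can be probed directly. Below is a minimal sketch using the private torch._debug_has_internal_overlap API named in this issue; the integer return convention (0 = no overlap, 1 = overlap, 2 = too hard to tell) is an assumption and may differ across versions:

import torch

# Sketch: probe the internal overlap check on the split views.
all_embs = torch.randn(8, 101168)
split_emb = torch.split(all_embs, [98400, 340, 40, 328, 380, 1320, 360], dim=1)
for t in split_emb:
    # Every view keeps the parent's stride (101168, 1), so the check cannot
    # cheaply prove the view is free of internal overlap.
    print(t.shape, t.stride(), torch._debug_has_internal_overlap(t))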

Example 2: using torch.chunk within a graph

_2384 = torch.randn(8, 4096)
chunk_list = torch.chunk(_2384, 16, dim=1)

chunk_list now contains 16 tensors, each with the same size and stride:

torch.Size([8, 256])
(4096, 1)

Performing this torch.chunk op within the graph and applying the “Inductor + CUDA graph” backend likewise disables CUDA graphs.
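
The same stride pattern can be reproduced outside the compiled graph. The sketch below (an illustration, not a fix proposed in the issue) shows that each chunk keeps the parent's stride (4096, 1), and that cloning it to a contiguous tensor restores standard strides at the cost of an extra copy:

import torch

_2384 = torch.randn(8, 4096)
chunk_list = torch.chunk(_2384, 16, dim=1)
print(chunk_list[0].stride())                    # (4096, 1): a strided view
# Materializing contiguous copies gives simple strides, trading extra memory
# traffic for tensors the overlap check handles trivially.
dense_chunks = [c.contiguous() for c in chunk_list]
print(dense_chunks[0].stride())                  # (256, 1)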

cc @ngimel

Issue Analytics

  • State: closed
  • Created: a year ago
  • Reactions: 1
  • Comments: 25 (18 by maintainers)

Top GitHub Comments

1 reaction
ngimel commented, Oct 14, 2022

Yes, looks good, you can assume there are no negative strides.

1 reaction
jansel commented, Oct 14, 2022

We should stop using torch._debug_has_internal_overlap() and write our own check that is more selective.

Perhaps:

x.numel() != <last element ptr> - <first element ptr> + 1

Then switch our cudagraphs wrapper to just copy the underlying storage and use as_strided().
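
A rough sketch of that proposed check is shown below. The helper name is hypothetical; it treats a tensor as safe when the span from its first to its last element equals numel(), assuming no negative strides as confirmed above:

import torch

def covers_dense_range(x: torch.Tensor) -> bool:
    # Hypothetical helper sketching the proposed check: the tensor's elements
    # occupy a gap-free range iff numel() equals
    # (last element offset - first element offset + 1). Assumes no negative strides.
    if x.numel() == 0:
        return True
    last_offset = sum((size - 1) * stride for size, stride in zip(x.shape, x.stride()))
    return x.numel() == last_offset + 1

all_embs = torch.randn(8, 101168)
split_emb = torch.split(all_embs, [98400, 340, 40, 328, 380, 1320, 360], dim=1)
print(covers_dense_range(all_embs))                 # True: dense parent tensor
print([covers_dense_range(t) for t in split_emb])   # False for the strided views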

Read more comments on GitHub >

Top Results From Across the Web

torch.split — PyTorch 1.13 documentation
Each chunk is a view of the original tensor. If split_size_or_sections is an integer type, then tensor will be split into equally sized...
Read more >
PyTorch internals - ezyang's blog
Strides are actually one of the distinctive features of PyTorch, so it's worth ... At the very most abstract level, when you call...
Read more >
