torch.split / torch.chunk cause "too complex strides" for Inductor + CUDA graph
When a model takes the output of torch.split / torch.chunk ops as input, or uses those ops internally, torch._debug_has_internal_overlap can report that the sizes and strides of the resulting view tensors make it too hard to determine whether memory overlap exists, preventing CUDA graphs from being enabled:
Example 1: using the output of torch.split as input to the graph
all_embs = torch.randn(8, 101168)
emb_split = [98400, 340, 40, 328, 380, 1320, 360]
split_emb = torch.split(all_embs, emb_split, dim=1)
split_emb now has tensors with the following sizes and strides:
torch.Size([8, 98400]), stride (101168, 1)
torch.Size([8, 340]), stride (101168, 1)
torch.Size([8, 40]), stride (101168, 1)
torch.Size([8, 328]), stride (101168, 1)
torch.Size([8, 380]), stride (101168, 1)
torch.Size([8, 1320]), stride (101168, 1)
torch.Size([8, 360]), stride (101168, 1)
Passing split_emb as input to the "Inductor + CUDA graph" backend causes CUDA graphs to be disabled.
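This is reproducible outside the compiler: every view returned by torch.split keeps the parent tensor's row stride of 101168 even though the row length shrank, so the views are non-contiguous and the overlap check gives up. A minimal sketch (the return codes 0 = no overlap, 1 = overlap, 2 = too hard mirror ATen's MemOverlap enum):

```python
import torch

all_embs = torch.randn(8, 101168)
emb_split = [98400, 340, 40, 328, 380, 1320, 360]
split_emb = torch.split(all_embs, emb_split, dim=1)

for t in split_emb:
    # Each piece is a view into all_embs' storage, so its stride stays
    # (101168, 1); it is neither contiguous nor zero-strided, and the
    # debug check returns 2 ("too hard to tell").
    print(t.size(), t.stride(), torch._debug_has_internal_overlap(t))
```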
Example 2: using torch.chunk within a graph
_2384 = torch.randn(8, 4096)
chunk_list = torch.chunk(_2384, 16, dim=1)
chunk_list now has 16 tensors of identical size and stride:
torch.Size([8, 256]), stride (4096, 1)
If we run this torch.chunk op within the graph and apply the "Inductor + CUDA graph" backend, CUDA graphs will be disabled.
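As a workaround sketch (not the fix the maintainers settled on), materializing each chunk with .contiguous() produces dense tensors that the overlap check accepts, at the cost of one copy per chunk:

```python
import torch

_2384 = torch.randn(8, 4096)
chunk_list = torch.chunk(_2384, 16, dim=1)

# Each chunk is a (8, 256) view with stride (4096, 1), which the
# overlap check cannot prove safe (2 = "too hard"):
assert torch._debug_has_internal_overlap(chunk_list[0]) == 2

# Copying the views makes them contiguous, so the check passes
# (0 = "no overlap"):
dense_chunks = [c.contiguous() for c in chunk_list]
assert all(torch._debug_has_internal_overlap(c) == 0 for c in dense_chunks)
```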
cc @ngimel
Issue Analytics
- Created: a year ago
- Reactions: 1
- Comments: 25 (18 by maintainers)
Top GitHub Comments
Yes, looks good, you can assume there are no negative strides.
We should stop using torch._debug_has_internal_overlap() and write our own check that is more selective. Perhaps:
Then switch our cudagraphs wrapper to just copy the underlying storage and use as_strided().
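A more selective check along the lines suggested above might look like the sketch below. views_cannot_overlap is a hypothetical helper, not a PyTorch API, and it assumes non-negative strides, as the comment permits: with non-negative strides, if each dimension's stride is at least the index span covered by all smaller-stride dimensions, no two indices can hit the same storage element.

```python
import torch

def views_cannot_overlap(t: torch.Tensor) -> bool:
    """Hypothetical, more selective replacement for
    torch._debug_has_internal_overlap(), assuming non-negative strides."""
    # Consider only dims with size > 1, ordered by ascending stride.
    dims = sorted(
        (p for p in zip(t.size(), t.stride()) if p[0] > 1),
        key=lambda p: p[1],
    )
    span = 1  # 1 + max storage offset reachable by the dims handled so far
    for size, stride in dims:
        if stride < span:
            return False  # overlap is possible; caller should fall back to a copy
        span = stride * (size - 1) + span
    return True

# torch.split views pass this stricter check even though
# torch._debug_has_internal_overlap() reports 2 ("too hard"):
piece = torch.split(torch.randn(8, 101168), [98400, 2768], dim=1)[1]
assert views_cannot_overlap(piece)
assert torch._debug_has_internal_overlap(piece) == 2

# A genuinely overlapping as_strided view is rejected:
bad = torch.as_strided(torch.randn(10), (3, 3), (1, 1))
assert not views_cannot_overlap(bad)
```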