
Performance will be improved by setting input strides=output strides for Clip in DirectMLX


I am investigating the performance of MobileNet V2 from TFLite models (with "nhwc" layout) and MobileNet V2 from ONNX models (with "nchw" layout) on an implementation that uses the DirectML and DirectMLX APIs.

I find that the nhwc MobileNetV2 model has many Clip operators after Conv2d, and these Clips cost significant time during inference. I suspect that Clip performs a memory copy and is not optimized at the compilation stage.

I have a workaround for this problem: set Clip's input strides to be the same as its output strides by changing this line to TensorDesc outputTensor = inputTensor in DirectMLX.h. The Clip is then optimized as if it were fused into Conv2d, and the inference time drops significantly, down to about the same as the nchw MobileNetV2.

When building the nhwc MobileNetV2 model, we need to append an Identity after each Conv2d to transpose the output tensor from the default nchw to nhwc, and then transpose that tensor back from nhwc to nchw as the next Conv2d's input. In my opinion, the Identity and Reinterpret can be optimized away by DML in a chain like Conv0 -> Identity (nchw->nhwc) -> Reinterpret strides (nhwc->nchw) -> Conv1, much like transpose sinking in the OpenVINO backend.

I guess that the Identity and Reinterpret sinking may be blocked when a Clip sits in between, as in Conv0 -> Identity (nchw->nhwc) -> Clip -> Reinterpret strides (nhwc->nchw) -> Conv1. I verified that if I remove the Identity so the chain becomes Conv0 -> Reinterpret strides (nchw->nhwc) -> Clip (input strides = output strides) -> Reinterpret strides (nhwc->nchw) -> Conv1, the inference time is much lower than before.

In conclusion, I suggest setting Clip's input strides to match its output strides by changing this line to TensorDesc outputTensor = inputTensor in DirectMLX.h.

Issue Analytics

  • State: closed
  • Created: a year ago
  • Comments: 8 (8 by maintainers)

Top GitHub Comments

1 reaction
mingmingtasd commented, May 17, 2022

@mingmingtasd Is this closeable? 🤔

Yes, close it, thanks! @fdwr @adtsai

1 reaction
mingmingtasd commented, May 16, 2022

Thanks for your detailed explanation and suggestions, very helpful! @adtsai


Top Results From Across the Web

Using strides to express padding and memory layout
In this article: Two-dimensional (2D) arrays; Higher dimensions; Packed tensors; Broadcasting with strides; Padding with strides; DirectML ...

Deep Learning Controlled Temporal Upsampling - NTNU Open
One way to improve performance is to render at a lower resolution, and then upsample the image to the output resolution.

A Gentle Introduction to Padding and Stride for Convolutional ...
The convolutional layer in convolutional neural networks systematically applies filters to an input and creates output feature maps. Although ...

Neural Networks for Chess - arXiv
the first input 0 is always set to 1, as mentioned above. So variables 1 and 2 ... The step-size is referred to...

Fractional output dimensions of "sliding-windows" ...
The fraction part comes from the stride operation. Without stride, the output size should be output_no_stride = input + 2*pad - filter +...
