WebGPU Performance Issues

See original GitHub issue

I just tried the new tfjs-backend-webgpu 0.0.1-alpha.8 on tfjs 3.9.0
Environment: Chrome 96 Canary on Windows 11

First, great job on adding tons of new ops - from the perspective of supported kernel ops, WebGPU is becoming usable!

However, the switch to WGSL is anything but helpful so far - it comes with a major performance degradation.

Overall, WebGPU has become slower than WebGL
(and WebGL itself has become significantly slower since tfjs 3.4.0 - this is discussed separately in several open issues)

Not to mention that the new work that has gone into WebGL to make it manageable (enabling uniforms) has no effect on WebGPU.

Comparing warmup times
(FYI, my app by default runs 8 simple models in parallel - total model size is actually tiny, below 30 MB):

  • webgl (default settings)

    14 sec (double the value with uniforms enabled)

  • webgl with WEBGL_PACK_DEPTHWISECONV=false and WEBGL_USE_SHAPES_UNIFORMS=true

    7 sec (pretty good)

  • webgpu (default settings)

    25 sec (this is incredibly slow)

  • webgpu with WEBGPU_USE_GLSL=true

    15 sec (already slower than webgl)

  • wasm (no real warmup, included for reference only)

    2 sec
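For context, the flag combinations compared above are applied through the standard TFJS environment API before the backend initializes. A minimal configuration sketch, assuming only the flag names listed above:

```javascript
// Sketch only: setting the flags from the comparison above via the
// standard tf.env().set() API, before the backend initializes.
const tf = require('@tensorflow/tfjs');

async function initWebglWithUniforms() {
  tf.env().set('WEBGL_PACK_DEPTHWISECONV', false);
  tf.env().set('WEBGL_USE_SHAPES_UNIFORMS', true);
  await tf.setBackend('webgl');
  await tf.ready(); // warmup cost is paid on the first model execution
}
```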

IMO, when developing a new backend, the goal should be that it's better than the previous one - not just that it passes unit tests. If WebGPU is not significantly improved, it will be D.O.A. once released.

cc @qjia7 and @xhcao due to work on webgpu
cc @pyu10055 as assignee on webgl performance degradation issue

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 14 (7 by maintainers)

Top GitHub Comments

1 reaction
vladmandic commented, Oct 6, 2021

Thank you for the notes; here are the full details.
I’ve created an automated test so it’s easy to check all scenarios…

Performance Testing

Environment: tfjs 3.9.0 and tfjs-backend-webgpu 0.0.1-alpha.8
Hardware: notebook with Intel Coffee Lake i7-8750 and Nvidia GTX 1050 Ti

Notes

  • WebGPU GLSL code has been recently removed and cannot be compared with the new WGSL
  • WebGL warmup gets a massive benefit of ~80% from browser shader caching
  • WebGPU warmup gets a small benefit of ~12% from browser shader caching
  • WebGPU is much faster on inference compared to WebGL
  • WebGPU is faster to warm up than WebGL in most cases
    Except when WebGL shaders are cached cross-session in the browser and uniforms are enabled; in that scenario WebGL is 2x faster than WebGPU, showing the necessity of caching support
  • The WebGL performance benefit of uniforms is massive at 2x, and I don’t see any side effects
    Will this be enabled by default in the future?
  • WebGL packing caused a massive performance regression in TFJS 3.4.0 (3.3.0 is the last unaffected version)
    There are several open issues, but no progress?
  • Using tf.zeros as input is convenient, but does not produce realistic results
    Test using a real input image to exercise the real-world model execution path
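On the last note, a hedged sketch of the difference between a synthetic and a realistic warmup; `model` and `imageElement` are assumed to exist, and `tf.browser.fromPixels` is the standard TFJS helper for real pixel input:

```javascript
// Sketch: warming up with real pixel data instead of tf.zeros, so warmup
// exercises the same execution path as real-world inference.
// 'model' and 'imageElement' are assumptions (not defined in this issue).
const synthetic = tf.zeros([1, 224, 224, 3]);     // convenient, but unrealistic
const real = tf.browser.fromPixels(imageElement)  // real image content
  .resizeBilinear([224, 224])
  .expandDims(0)
  .toFloat();
const result = model.execute(real);               // realistic warmup run
tf.dispose([synthetic, real, result]);
```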

Test Results

{ message: 'initial', warmup: 3134, inference: 2638, tfjs: '3.9.0', backend: 'wasm', tensors: 304, agent: 'Chrome/94', env: [] }
{ message: 'cached', warmup: 3119, inference: 2618, tfjs: '3.9.0', backend: 'wasm', tensors: 304, agent: 'Chrome/94', env: [] }
{ message: 'initial', warmup: 11836, inference: 61, tfjs: '3.9.0', backend: 'webgl', tensors: 304, agent: 'Chrome/94', env: [] }
{ message: 'cached', warmup: 2665, inference: 60, tfjs: '3.9.0', backend: 'webgl', tensors: 304, agent: 'Chrome/94', env: [] }
{ message: 'initial', warmup: 6128, inference: 54, tfjs: '3.9.0', backend: 'webgl', tensors: 304, agent: 'Chrome/94', env: [ { WEBGL_PACK_DEPTHWISECONV: false }, { WEBGL_USE_SHAPES_UNIFORMS: true } ] }
{ message: 'cached', warmup: 1202, inference: 67, tfjs: '3.9.0', backend: 'webgl', tensors: 304, agent: 'Chrome/94', env: [ { WEBGL_PACK_DEPTHWISECONV: false }, { WEBGL_USE_SHAPES_UNIFORMS: true } ] }
{ message: 'initial', warmup: 5018, inference: 23, tfjs: '3.9.0', backend: 'webgpu', tensors: 304, agent: 'Chrome/94', env: [] }
{ message: 'cached', warmup: 4454, inference: 22, tfjs: '3.9.0', backend: 'webgpu', tensors: 304, agent: 'Chrome/94', env: [] }
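The shader-caching percentages quoted in the notes above can be derived directly from these logs. A small hypothetical helper (the function name is mine, not from the test harness):

```javascript
// Hypothetical helper: percentage of warmup time saved by the browser's
// shader cache, from the initial vs cached warmup numbers logged above.
function cachingBenefitPct(initialWarmupMs, cachedWarmupMs) {
  return Math.round((1 - cachedWarmupMs / initialWarmupMs) * 100);
}

console.log(cachingBenefitPct(11836, 2665)); // webgl default: 77 (~80% in the notes)
console.log(cachingBenefitPct(5018, 4454));  // webgpu: 11 (~12% in the notes)
```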

Issues

Using WebGPU backend is causing a lot of warnings although execution seems to work:

> warning Binding size bigger than maximum uniform buffer binding size: binding 0 given 146313216 bytes, maximum is 16384 bytes
    at ValidateBufferBinding (../../third_party/dawn/src/dawn_native/BindGroup.cpp:114)
    at ValidateBindGroupDescriptor (../../third_party/dawn/src/dawn_native/BindGroup.cpp:290)
    at CreateBindGroup (../../third_party/dawn/src/dawn_native/Device.cpp:1043)
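The warning is Dawn's bind-group validation rejecting a ~146 MB buffer bound where the device's default maxUniformBufferBindingSize is only 16384 bytes; data that large generally belongs in storage buffers rather than uniform buffers. A hypothetical helper mirroring just the size check:

```javascript
// Hypothetical helper mirroring Dawn's ValidateBufferBinding size check:
// a uniform-buffer binding larger than the device limit is rejected.
function uniformBindingFits(bindingBytes, maxUniformBufferBindingSize = 16384) {
  return bindingBytes <= maxUniformBufferBindingSize;
}

console.log(uniformBindingFits(146313216)); // the binding from the warning: false
console.log(uniformBindingFits(4096));      // a small uniform buffer: true
```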

Reproduction

Fully automated test in Node.js using Puppeteer, reproducible anytime
Code available at https://gist.github.com/vladmandic/fbdcaf7fe2e2add5c33b98936d4d5740

1 reaction
gyagp commented, Oct 4, 2021

@vladmandic Thanks for the good comments and data, as always!

Chrome 94 was released on Sep 21 with WebGPU Origin Trial support. This means that in addition to Chrome Canary, we may now use Chrome Stable (it still needs the --enable-unsafe-webgpu option) for WebGPU experiments. Unfortunately, Chrome decided not to support GLSL anymore for WebGPU (the change happened in master, so all release channels are impacted, including Canary and Stable), so WGSL is the only shader language that can be consumed now.

We always align closely with WebGPU development (my team also contributes heavily to the WebGPU spec, CTS, and the Chrome implementation) and started the TFJS GLSL-to-WGSL transition in June. After fixing many critical perf issues in Chrome (e.g., a workgroup memory initialization perf regression) together with Google, and working around perf issues in TFJS (e.g., hardware limits), we finished the transition after 3+ months of work.

Internally we track performance daily against almost all the workloads defined in the TFJS e2e benchmarks. Before switching to WGSL, we double-checked that there was no performance regression in warmup time or run time. For sure, due to resource constraints, we could only cover very limited platforms (actually, only Intel Coffee Lake and Tiger Lake are under daily test) and very limited workloads.

We’d like to hear more details from your side (e.g., hardware configuration) to understand the regression. We’ll investigate right after our holidays (we are off from Oct 1 to 7 for the National Day holidays). BTW,

  1. The uniform idea was already implemented in the WebGPU backend. Google thought it was great, so we’re bringing it to the WebGL backend.
  2. Compared with WebGL, WebGPU’s compiled shaders cannot be cached in Chrome. We have already raised this implementation issue with Chrome, and it’s going to take a while to implement (not easy).
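As noted above, WebGPU on Chrome 94 Stable still sits behind a flag; launching looks roughly like this (the binary name varies by platform and is an assumption here):

```shell
# Chrome 94+ Stable: WebGPU still requires the unsafe flag mentioned above.
# "google-chrome" is the typical Linux binary name; adjust for your platform.
google-chrome --enable-unsafe-webgpu
```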

Thanks again for your valuable feedback; we hope to hear more details from your side about the warmup regression (e.g., hardware configuration), and we look forward to more collaboration in the future!
