WebGPU Performance Issues
I just tried the new tfjs-backend-webgpu 0.0.1-alpha.8 on tfjs 3.9.0.
Environment: Chrome 96 Canary on Windows 11.
First, great job on adding tons of new ops: from the perspective of supported kernel ops, webgpu is becoming usable!

However, the switch to WGSL is anything but useful so far. It comes as a major performance degradation: overall, webgpu has gotten slower than webgl (and webgl itself has become significantly slower since tfjs 3.4.0; this is discussed separately in several open issues). Not to mention that the new work that has gone into webgl to make it manageable (enabling uniforms) has no effect on webgpu.
Comparing warmup times (fyi, my app by default uses 8 simple models running in parallel; total model size is actually tiny, below 30mb):

- webgl (default settings): 14 sec (double that value with uniforms enabled)
- webgl with `WEBGL_PACK_DEPTHWISECONV=false` and `WEBGL_USE_SHAPES_UNIFORMS=true`: 7 sec (pretty good)
- webgpu (default settings): 25 sec (this is incredibly slow)
- webgpu with `WEBGPU_USE_GLSL=true`: 15 sec (already slower than webgl)
- wasm (no real warmup, included for reference only): 2 sec
Imo, when developing a new backend, the goal should be that it's better than the previous one, not just that it passes unit tests. If webgpu is not significantly improved, it will be d.o.a. once released.

cc @qjia7 and @xhcao due to work on webgpu
cc @pyu10055 as assignee on the webgl performance degradation issue
Issue Analytics
- State:
- Created 2 years ago
- Comments: 14 (7 by maintainers)
Top GitHub Comments
Thank you for the notes, here are full details.
I've created an automated test so it's easy to check all scenarios…

Performance Testing

Environment: tfjs 3.9.0 and tfjs-backend-webgpu 0.0.1-alpha.8
Hardware: Notebook with Intel Coffee Lake i7-8750 and nVidia GTX 1050 Ti

Notes
- WebGPU GLSL code has been recently removed and cannot be compared with the new WGSL
- WebGL warmup has a massive benefit of ~80% from browser shader caching
- WebGPU warmup has a little benefit of ~12% from browser shader caching
- WebGPU is much faster on inference compared to WebGL
- WebGPU is faster to warm up than WebGL in most cases
  - Except when WebGL shaders are cached in the browser cross-session and uniforms are enabled: WebGL is 2x faster than WebGPU in that scenario, showing the necessity of caching support
- WebGL performance benefit of uniforms is massive at 2x and I don't see any side-effects. Will this be enabled by default in the future?
- WebGL packing caused a massive performance regression in TFJS 3.4.0 (3.3.0 is the last unaffected version). There are several open issues, but no progress?
- `tf.zeros` as input is convenient, but does not produce realistic results. Test using a real input image to exercise the real-world model execution path
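The last note, that an all-zeros input skews benchmarks, can be worked around by synthesizing pseudo-random pixel data when no real image is at hand. A hedged sketch (the helper name and the 224x224x3 shape are illustrative, not from the original test); the `Float32Array` here is a plain stand-in for a `tf.zeros([1, h, w, 3])`-shaped tensor:

```javascript
// Build a fake RGB "image" as a Float32Array of random values in [0, 1),
// instead of an all-zeros buffer. Random data forces real arithmetic
// through the model rather than trivially-zero activations.
function randomImageData(height, width, channels = 3) {
  const data = new Float32Array(height * width * channels);
  for (let i = 0; i < data.length; i++) data[i] = Math.random();
  return data;
}

const input = randomImageData(224, 224);
console.log(input.length);               // 224 * 224 * 3 = 150528
console.log(input.some((v) => v !== 0)); // not an all-zeros tensor
```

A real captured image is still better, since random noise does not exercise the same activation distributions as natural images, but it avoids the zeros short-circuit.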
Test Results

Issues

Using the WebGPU backend causes a lot of warnings, although execution seems to work:

Reproduction

Fully automated test in NodeJS using puppeteer, reproducible anytime. Code available at https://gist.github.com/vladmandic/fbdcaf7fe2e2add5c33b98936d4d5740
@vladmandic Thanks for the good comments and data, as always! Chrome 94 was released on Sep 21 with WebGPU Origin Trial support. This means that in addition to Chrome Canary, we may now use Chrome Stable (still needing the option --enable-unsafe-webgpu) for WebGPU experiments. Unfortunately, Chrome decided not to support GLSL anymore for WebGPU (the changes happened in master, so all release channels are impacted, including Canary and Stable), so WGSL is the only language that can be consumed now.

We always align closely with WebGPU development (my team also heavily contributes to the WebGPU spec, CTS, and Chrome implementation) and started the TFJS GLSL-to-WGSL transition in June. After fixing many critical perf issues in Chrome (e.g., the workgroup memory init perf regression) together with Google, and working around perf issues in TFJS (e.g., hardware limits), we finished the transition after 3+ months of work. Internally we track performance daily against almost all the workloads defined in the TFJS e2e benchmarks. Before switching to WGSL, we double-checked that there was no performance regression in warmup time or run time. Of course, due to limited resources, we could only cover very limited platforms (actually only Intel Coffee Lake and Tiger Lake are under daily test) and very limited workloads. We'd like to hear more details from your side (e.g., hardware configuration) to understand the regression. We'll investigate right after our holidays (we are off from Oct 1 to 7 for the National Day holidays). BTW,

Thanks again for your valuable feedback. We hope to hear more details from your side about the warmup regression (e.g., hardware configuration), and look forward to more collaboration in the future!