model inference causes unresponsiveness and even system crash when using `webgl` backend
while running perfectly fine with `tfjs-node`
using webgl in browser
- gpu memory usage rises to over 4GB, although the model is not that heavy at all
- inference time is measured at below 40 ms, but the actual wall time between frames is closer to 3,000 ms
- during that time the browser is completely unresponsive (not just the active tab), and overall system responsiveness is reduced
- after just several frames it results in either a browser crash or a webgl error logged in the console
- it has even resulted in a full system crash: a BSOD with stop code VIDEO_SCHEDULER_INTERNAL_ERROR
- yes, this is client-side code that can crash the whole system; it doesn't get much worse than that
- it's almost as if some webgl call is causing something really bad to happen between the browser and the gpu drivers
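for context, a minimal sketch of the browser-side pattern described above (the model URL and the [-1, 1] input normalization are assumptions, not taken from the actual repro code); the key detail is that `model.execute()` returns as soon as the kernels are queued, so the real stall only shows up when the result is downloaded:

```ts
import * as tf from '@tensorflow/tfjs';

const modelUrl = '/model/model.json'; // hypothetical path, substitute your own copy

async function run(): Promise<void> {
  await tf.setBackend('webgl');
  await tf.ready();
  const model = await tf.loadGraphModel(modelUrl);
  const video = document.getElementById('video') as HTMLVideoElement;

  const frame = async (): Promise<void> => {
    const input = tf.tidy(() =>
      tf.browser.fromPixels(video)    // grab the current video frame
        .resizeBilinear([720, 720])   // model expects 720x720 input
        .toFloat()
        .div(127.5).sub(1)            // assumed [-1, 1] normalization
        .expandDims(0));
    const t0 = performance.now();
    const output = model.execute(input) as tf.Tensor; // returns once kernels are queued
    const t1 = performance.now();
    await output.data();              // waiting on the GPU is where the stall appears
    const t2 = performance.now();
    console.log(`execute ${(t1 - t0).toFixed(0)} ms, gpu sync ${(t2 - t1).toFixed(0)} ms`);
    tf.dispose([input, output]);
    requestAnimationFrame(() => void frame());
  };
  requestAnimationFrame(() => void frame());
}

void run();
```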
using tfjs-node-gpu in node
- works without any problems
- low memory usage and inference time below 30 ms
- even a round-trip workflow (a browser client talking over websockets to a server that does the processing) achieves a pretty good frame rate with no issues overall
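the node-side equivalent is straightforward; a minimal sketch, assuming the same hypothetical model path and normalization as in the browser sketch above:

```ts
import * as fs from 'fs';
import * as tf from '@tensorflow/tfjs-node-gpu';

const modelUrl = 'file://model/model.json'; // hypothetical path
const imagePath = 'input.jpg';              // hypothetical test image

async function main(): Promise<void> {
  const model = await tf.loadGraphModel(modelUrl);
  const input = tf.tidy(() =>
    (tf.node.decodeImage(fs.readFileSync(imagePath), 3) as tf.Tensor3D)
      .resizeBilinear([720, 720])
      .toFloat()
      .div(127.5).sub(1)                    // same assumed [-1, 1] normalization
      .expandDims(0));
  const t0 = performance.now();
  const output = model.execute(input) as tf.Tensor;
  await output.data();                      // wait for the result
  console.log(`inference ${(performance.now() - t0).toFixed(0)} ms`);
  tf.dispose([input, output]);
}

void main();
```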
model
the model itself is a simple tfjs graph model with 8.8MB of weights
it takes a 720x720 image as input and produces a 720x720 image as output
it was converted from a tf saved model; the original is at https://systemerrorwang.github.io/White-box-Cartoonization/
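a quick way to sanity-check the converted model is to load it and print its declared input and output signatures (the path is again hypothetical):

```ts
import * as tf from '@tensorflow/tfjs';

async function inspect(): Promise<void> {
  const model = await tf.loadGraphModel('/model/model.json'); // hypothetical path
  for (const t of model.inputs) console.log('input:', t.name, t.shape);
  for (const t of model.outputs) console.log('output:', t.name, t.shape);
}

void inspect();
```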
reproduction
full reproduction is available at https://github.com/vladmandic/anime
- the browser code that causes problems is https://github.com/vladmandic/anime/blob/main/src/anime-clientside.ts
- the nodejs entry point is https://github.com/vladmandic/anime/blob/main/src/node.ts (which works just fine)
environment
- tensorflow/tfjs 3.19.0
- chrome 103.0.1264.71
- windows 11 build 22621.436
- nvidia drivers 516.59
Top GitHub Comments
@pyu10055 confirmed!
actually, it's not 10x, it's closer to 20x on my system, plus no crashes
great job with #6639
setInterval would cause a crash because overlapping inference requests pile up, but why would setTimeout cause any problems?
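a sketch of the difference, with `inferOnce` standing in for the real per-frame work (hypothetical helper, not from the repro code): `setInterval` keeps firing on schedule regardless of whether the previous async inference has finished, so slow frames overlap and accumulate, while a self-rescheduling `setTimeout` queues the next frame only after the current one completes:

```ts
// stand-in for the real per-frame work (model.execute + result download)
async function inferOnce(): Promise<void> { /* ... */ }

// problematic: fires every 33 ms even while earlier inferences are still
// running, so requests overlap whenever a frame takes longer than the interval
setInterval(() => void inferOnce(), 33);

// safer: the next frame is scheduled only after the current one finishes,
// so at most one inference is in flight at a time
async function loop(): Promise<void> {
  await inferOnce();
  setTimeout(() => void loop(), 33);
}
void loop();
```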