Enabling shape uniforms gives incorrect output with MatMulPackedProgram
System information
- Have I written custom code (as opposed to using a stock example script provided in TensorFlow.js): Yes
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): MacOS Monterey (12.4)
- Mobile device (e.g. iPhone 8, Pixel 2, Samsung Galaxy) if the issue happens on mobile device: Reproduces on both desktop and mobile
- TensorFlow.js installed from (npm or script link): built from source
- TensorFlow.js version (use command below): 3.16.0
- Browser version: Chrome 102.0.5005.61
- Tensorflow.js Converter Version: N/A
Describe the current behavior
- https://github.com/tensorflow/tfjs/pull/5502 fixed some issues with providing shapes as uniforms.
- However, we are still getting incorrect outputs for `MatMulPackedProgram` when setting `WEBGL_USE_SHAPES_UNIFORMS` to true.
- Upon investigation, we are getting the same shader key for 2 shaders with different shader code. The shader code differs in this if statement: https://github.com/tensorflow/tfjs/blob/8c7fd026bb9940c926a94f70d7bee5ef1f51a1ef/tfjs-backend-webgl/src/shader_compiler.ts#L1058
- In one shader the if path is taken (i.e. `texShape != null && util.arraysEqual(shape, texShape) == true`) while it is not taken in the other shader. So far, we have only reproduced this for `MatMulPackedProgram`. `MatMulPackedProgram` takes a 3d input (batch, dim0, dim1) which is then “squeezed down” to 2d when batch == 1. In https://github.com/tensorflow/tfjs/blob/8c7fd026bb9940c926a94f70d7bee5ef1f51a1ef/tfjs-backend-webgl/src/gpgpu_math.ts#L430, `x.shape` is 3-dimensional, while `xTexShape` is 2-dimensional. Hence, `isLogicalShapTexShapeEqual` is always false for `MatMulPackedProgram`, even if the input shape and texShape match exactly after dropping the first (batch) dimension.
- As a result, if we have 2 `MatMulPackedProgram`s where all the parameters for shader key generation match except for `isLogicalShapTexShapeEqual`, the programs point to the same compiled shader instead of 2 separate shaders. Depending on which shader is compiled first, the other produces incorrect outputs.
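To make the collision concrete, here is a minimal, self-contained sketch of the mechanism described above. It is not the tfjs-backend-webgl code: `arraysEqual`, `squeezeBatch`, `keyFlag`, `sourceFlag`, `getShader`, the key format, and the shape values are all hypothetical stand-ins chosen for illustration. The only assumption taken from the report is that the key records a comparison against the unsqueezed 3-D shape, while shader generation branches on the squeezed shape.

```typescript
// Simplified, hypothetical model of the collision; not the actual tfjs code.
type Shape = number[];

function arraysEqual(a: Shape, b: Shape): boolean {
  return a.length === b.length && a.every((v, i) => v === b[i]);
}

// Assumption: a batch dimension of 1 is dropped before shader generation.
function squeezeBatch(shape: Shape): Shape {
  return shape.length === 3 && shape[0] === 1 ? shape.slice(1) : shape;
}

// Flag as recorded in the shader key: compares the unsqueezed 3-D shape,
// so it is false whenever the texture shape is 2-D.
function keyFlag(logical: Shape, tex: Shape): boolean {
  return arraysEqual(logical, tex);
}

// Branch taken while generating shader source: compares the squeezed shape.
function sourceFlag(logical: Shape, tex: Shape): boolean {
  return arraysEqual(squeezeBatch(logical), tex);
}

// Cache of "compiled shaders", keyed by the (hypothetical) shader key.
const cache = new Map<string, string>();

function getShader(logical: Shape, tex: Shape): string {
  const key = `MatMulPackedProgram_${keyFlag(logical, tex)}`;
  const source = sourceFlag(logical, tex) ? "if-path" : "else-path";
  if (!cache.has(key)) {
    cache.set(key, source);
  }
  return cache.get(key)!;
}

// Illustrative shapes only: same logical shape, different texture shapes.
const first = getShader([1, 4, 8], [4, 8]);   // compiled with the if path
const second = getShader([1, 4, 8], [2, 16]); // needs the else path, reuses the cached shader
```

In this model both calls produce the same key, so `second` receives the shader that was generated for `first`, which mirrors the order-dependent wrong output reported above.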
Standalone code to reproduce the issue
- I haven’t been able to reproduce this in an existing open-source model. It only reproduces in our internal model.
Possible fix
- In https://github.com/tensorflow/tfjs/blob/8c7fd026bb9940c926a94f70d7bee5ef1f51a1ef/tfjs-backend-webgl/src/gpgpu_math.ts#L430, changing `util.arraysEqual(x.shape, xTexShape);` to `util.arraysEqual(uniformShape, xTexShape);` fixes the issue. And if I’m understanding the code correctly, we should use `uniformShape` in place of `x.shape` throughout that `if` block.
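The effect of the proposed change can be sketched in isolation. This is a hedged illustration rather than the real patch: `sameShape`, `dropUnitBatch`, `fixedKeyFlag`, and `shaderKey` are hypothetical names, and `dropUnitBatch` only approximates what the issue calls `uniformShape` by assuming it is the logical shape with a batch dimension of 1 removed.

```typescript
// Hypothetical sketch of the proposed fix; names and key format are illustrative.
type Dims = number[];

function sameShape(a: Dims, b: Dims): boolean {
  return a.length === b.length && a.every((v, i) => v === b[i]);
}

// Assumption: stands in for uniformShape, i.e. the shape after the
// batch dimension of 1 has been squeezed away.
function dropUnitBatch(shape: Dims): Dims {
  return shape.length === 3 && shape[0] === 1 ? shape.slice(1) : shape;
}

// Proposed comparison: squeezed shape vs. texture shape, so the key flag
// agrees with the branch taken when the shader source is generated.
function fixedKeyFlag(logical: Dims, tex: Dims): boolean {
  return sameShape(dropUnitBatch(logical), tex);
}

function shaderKey(logical: Dims, tex: Dims): string {
  return `MatMulPackedProgram_${fixedKeyFlag(logical, tex)}`;
}

// Illustrative shapes only.
const keyA = shaderKey([1, 4, 8], [4, 8]);  // "MatMulPackedProgram_true"
const keyB = shaderKey([1, 4, 8], [2, 16]); // "MatMulPackedProgram_false"
```

With the squeezed shape in the comparison, the two programs get distinct keys in this model, so each would be compiled and cached separately.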
Issue Analytics
- State:
- Created a year ago
- Comments:12
Top GitHub Comments
@Linchenn you’re right, this doesn’t reproduce for me either 😕 I wonder if I was on some incorrect TFJS version, but will try to reproduce it locally once again and get back to you.
@Linchenn thanks to your code snippet above, I found the following matmul shapes from our network where the output doesn’t match with shape uniforms turned on and off: