Enabling shape uniforms gives incorrect output with MatMulPackedProgram

See original GitHub issue

System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow.js): Yes
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): MacOS Monterey (12.4)
  • Mobile device (e.g. iPhone 8, Pixel 2, Samsung Galaxy) if the issue happens on mobile device: Reproduces on both desktop and mobile
  • TensorFlow.js installed from (npm or script link): built from source
  • TensorFlow.js version (use command below): 3.16.0
  • Browser version: Chrome 102.0.5005.61
  • Tensorflow.js Converter Version: N/A

Describe the current behavior

  • https://github.com/tensorflow/tfjs/pull/5502 fixed some issues with providing shapes as uniforms.
  • However, we are still getting incorrect outputs for MatMulPackedProgram when setting WEBGL_USE_SHAPES_UNIFORMS to true.
  • Upon investigation, we found that two shaders with different shader code get the same shader key. The shader code differs at this if statement: https://github.com/tensorflow/tfjs/blob/8c7fd026bb9940c926a94f70d7bee5ef1f51a1ef/tfjs-backend-webgl/src/shader_compiler.ts#L1058
  • In one shader the if path is taken (i.e. texShape != null && util.arraysEqual(shape, texShape) evaluates to true), while in the other shader it is not. So far, we have only reproduced this for MatMulPackedProgram.
  • MatMulPackedProgram takes a 3d input (batch, dim0, dim1) which is then “squeezed down” to 2d when batch == 1. In https://github.com/tensorflow/tfjs/blob/8c7fd026bb9940c926a94f70d7bee5ef1f51a1ef/tfjs-backend-webgl/src/gpgpu_math.ts#L430, x.shape is 3-dimensional, while xTexShape is 2-dimensional. Hence, isLogicalShapTexShapeEqual is always false for MatMulPackedProgram even if the input shape and texShape match exactly after dropping the first (batch) dimension.
  • As a result, if we have 2 MatMulPackedPrograms where all the parameters for shader key generation match except for isLogicalShapTexShapeEqual, both programs point to the same compiled shader instead of 2 separate shaders. Whichever shader is compiled first is reused for the other program, which then produces incorrect outputs.
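The collision described above can be sketched in plain JavaScript. This is only an illustration under stated assumptions, not the tfjs-backend-webgl code: arraysEqual, dropLeadingOnes, keyFromRawShape, keyFromSqueezedShape and compileOnce are hypothetical stand-ins, the texture shape [51200, 192] is made up, and the sketch assumes the shader generator effectively compares the squeezed shape while the key compares the raw rank-3 shape.

```javascript
// Hypothetical stand-ins for illustration; not the tfjs-backend-webgl implementation.
function arraysEqual(a, b) {
  return a.length === b.length && a.every((v, i) => v === b[i]);
}

// Drop leading dimensions of size 1, e.g. [1, 102400, 96] -> [102400, 96].
function dropLeadingOnes(shape) {
  let i = 0;
  while (i < shape.length - 1 && shape[i] === 1) i++;
  return shape.slice(i);
}

// Key flag computed from the raw rank-3 shape: always false against a rank-2 texShape.
function keyFromRawShape(name, shape, texShape) {
  return `${name}_${arraysEqual(shape, texShape)}`;
}

// Key flag computed after squeezing: distinguishes the two cases.
function keyFromSqueezedShape(name, shape, texShape) {
  return `${name}_${arraysEqual(dropLeadingOnes(shape), texShape)}`;
}

// Minimal shader cache: the first source stored under a key wins.
function compileOnce(cache, key, source) {
  if (!cache.has(key)) cache.set(key, source);
  return cache.get(key);
}

const shape = [1, 102400, 96];
const matchingTex = [102400, 96]; // equals the shape once the batch dim is dropped
const otherTex = [51200, 192];    // made-up texture layout that does not match

const rawKeyA = keyFromRawShape('MatMulPacked', shape, matchingTex);
const rawKeyB = keyFromRawShape('MatMulPacked', shape, otherTex);
const squeezedKeyA = keyFromSqueezedShape('MatMulPacked', shape, matchingTex);
const squeezedKeyB = keyFromSqueezedShape('MatMulPacked', shape, otherTex);

const cache = new Map();
const shaderA = compileOnce(cache, rawKeyA, 'path taken when shape equals texShape');
const shaderB = compileOnce(cache, rawKeyB, 'general path');

console.log(rawKeyA === rawKeyB);           // both keys end in "_false"
console.log(shaderA === shaderB);           // second program reuses the first shader
console.log(squeezedKeyA === squeezedKeyB); // squeezed keys differ
```

With the raw-shape key, the second program silently receives the first program's shader source; computing the flag on the squeezed shape keeps the two entries apart.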

Standalone code to reproduce the issue

  • I haven’t been able to reproduce this in an existing open-source model. It only reproduces in our internal model.

Possible fix

Issue Analytics

  • State: closed
  • Created a year ago
  • Comments: 12

Top GitHub Comments

shanumantesc commented, Aug 31, 2022 (1 reaction)

@Linchenn you’re right, this doesn’t reproduce for me either 😕 I wonder if I was on some incorrect TFJS version, but will try to reproduce it locally once again and get back to you.

shanumantesc commented, Aug 1, 2022 (1 reaction)

@Linchenn thanks to your code snippet above, I found the following matmul shapes from our network where the output doesn’t match with shape uniforms turned on and off:

  • 102400x96 ; 96x4
  • 102400x96 ; 96x2
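// Assumes `tf` is in scope (e.g. import * as tf from '@tensorflow/tfjs').
const shape0 = [1, 102400, 96];
const shape1 = [1, 96, 94];
const a = tf.randomNormal(shape0);
const b = tf.randomNormal(shape1);

// Reference result with shape uniforms disabled.
tf.env().set('WEBGL_USE_SHAPES_UNIFORMS', false);
const c0 = tf.matMul(a, b);

// Same matmul with shape uniforms enabled.
tf.env().set('WEBGL_USE_SHAPES_UNIFORMS', true);
const c1 = tf.matMul(a, b);

// Report a failure if any element differs by more than 1e-2.
if (tf.any(tf.greater(tf.abs(c0.sub(c1)), tf.scalar(1e-2))).dataSync()[0]) {
  console.log("Failed");
}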
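When the check above fails, it can help to know how large the mismatch is and where it occurs. The helper below is a hypothetical sketch (maxAbsDiff is not part of TensorFlow.js); it works on plain or typed arrays such as the values returned by dataSync().

```javascript
// Hypothetical helper: largest absolute element-wise difference and its index.
// Returns index -1 when the two arrays are identical.
function maxAbsDiff(a, b) {
  if (a.length !== b.length) {
    throw new Error(`length mismatch: ${a.length} vs ${b.length}`);
  }
  let max = 0;
  let index = -1;
  for (let i = 0; i < a.length; i++) {
    const d = Math.abs(a[i] - b[i]);
    if (d > max) {
      max = d;
      index = i;
    }
  }
  return {max, index};
}

const sample = maxAbsDiff(Float32Array.of(1, 2, 3), Float32Array.of(1, 2.5, 3));
console.log(sample.max, sample.index);
```

With the tensors from the snippet above, this could be called as maxAbsDiff(c0.dataSync(), c1.dataSync()), and the result compared against the 1e-2 tolerance.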