Mismatch in packed depthwise conv 2d results on Mali GPU
System information
- Have I written custom code (as opposed to using a stock example script provided in TensorFlow.js): Yes, test case shared below
- Mobile device: Pixel 6 Pro (reproduces on every Android device with a Mali GPU that I tried)
- TensorFlow.js installed from (npm or script link): 3.19.0
- Browser version: Chrome 103.0.5060.53
Describe the current behavior
Packed depthwise conv2d produces incorrect results on Mali GPUs when WEBGL_MAX_TEXTURE_SIZE
is left at its default value (4096 on most modern Android devices). In one of our networks, we end up creating a 3672x1 texture for the weights, which produces incorrect outputs (presumably some error in sampling the texture, but that is just a guess). Setting the max texture size lower than 3672 fixes the issue.
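In case it is useful, the workaround amounts to clamping that flag before the affected model runs. A minimal sketch, assuming it executes before the conv in question (any value below 3672 fixes it in our case; 2048 is just an example):

// Workaround sketch: clamp the texture size so the 3672x1 weight texture is never created.
tf.ENV.set('WEBGL_MAX_TEXTURE_SIZE', 2048);
// Presumably turning off the packed depthwise path would also sidestep the bug,
// but I have not verified that on every device:
// tf.ENV.set('WEBGL_PACK_DEPTHWISECONV', false);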
I have attached sample code below to reproduce the issue (it uses the same filter dimensions as the layer that caused the inaccuracy in our original network). The code does the following:
- First, we run the packed depthwise conv with a max texture size of 4096 (the default on every browser I tried; the value is hardcoded here for consistent results).
- Next, we re-run the same node with a max texture size of 2048.
- Finally, we set the backend to cpu to get the reference output.
- With size 2048, the outputs match the reference, but with the default size, the outputs do not match.
Note: Based on my tests, the mismatch occurs only on Android devices with Mali GPUs. iOS, Chrome on macOS, and Android devices with Adreno GPUs all produce the correct result with the default texture size of 4096.
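In case it helps with scoping a fix or a fallback, below is a purely illustrative (hypothetical, not part of the repro) check for a Mali renderer using the standard WEBGL_debug_renderer_info extension:

// Hypothetical helper: returns true if the unmasked WebGL renderer string mentions Mali.
function isMaliGpu() {
  const gl = document.createElement('canvas').getContext('webgl');
  if (!gl) return false;
  const ext = gl.getExtension('WEBGL_debug_renderer_info');
  // On some browsers the extension is unavailable; treat that as "unknown".
  if (!ext) return false;
  const renderer = gl.getParameter(ext.UNMASKED_RENDERER_WEBGL);
  return /mali/i.test(renderer);
}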
Standalone code to reproduce the issue
// Force the packed depthwise conv2d path.
tf.ENV.set('WEBGL_PACK_DEPTHWISECONV', true)

// Random weights and input; same filter dims as the problematic layer in our network.
let w = Array.from({length: 3 * 3 * 816}, () => Math.random())
let x = Array.from({length: 12 * 10 * 816}, () => Math.random())

let inputs = {
  filter: tf.tensor(w, [3, 3, 816, 1]),
  x: tf.tensor(x, [1, 12, 10, 816]),
  strides: 1,
  pad: [[0, 0], [1, 1], [1, 1], [0, 0]],
  dataFormat: "channelsLast",
  dilations: 1,
  activation: 'relu'
};

// Run on WebGL with the default max texture size (4096, hardcoded for consistent results).
tf.setBackend('webgl')
tf.ENV.set('WEBGL_MAX_TEXTURE_SIZE', 4096)
let out_4096 = tf.fused.depthwiseConv2d(inputs);

// Re-run the same node with the max texture size lowered to 2048.
tf.ENV.set('WEBGL_MAX_TEXTURE_SIZE', 2048)
inputs.x = tf.tensor(x, [1, 12, 10, 816])
inputs.filter = tf.tensor(w, [3, 3, 816, 1])
let out_2048 = tf.fused.depthwiseConv2d(inputs);

// Compute the reference output on the CPU backend.
tf.setBackend('cpu')
inputs.x = tf.tensor(x, [1, 12, 10, 816])
inputs.filter = tf.tensor(w, [3, 3, 816, 1])
let out_reference = tf.fused.depthwiseConv2d(inputs);

// True if any element differs by more than 1e-2.
const doTensorsDiffer = function(t0, t1) {
  return tf.any(tf.greater(tf.abs(tf.sub(t0, t1)), tf.scalar(1e-2))).dataSync()[0];
}
console.log("Default and 2048 differ? " + doTensorsDiffer(out_4096, out_2048));
console.log("Reference and 2048 differ? " + doTensorsDiffer(out_reference, out_2048));
console.log("Reference and 4096 differ? " + doTensorsDiffer(out_reference, out_4096));
Top GitHub Comments
Thanks for the fix @Linchenn!