`tf.keras.layers.experimental.EinsumDense` gives different results depending on batch size
Please go to TF Forum for help and support:
https://discuss.tensorflow.org/tag/keras
If you open a GitHub issue, here is our policy:
It must be a bug, a feature request, or a significant problem with the documentation (for small docs fixes please send a PR instead). The form below must be filled out.
Here's why we have that policy:
Keras developers respond to issues. We want to focus on work that benefits the whole community, e.g., fixing bugs and adding features. Support only helps individuals. GitHub also notifies thousands of people when issues are filed. We want them to see you communicating an interesting problem, rather than being redirected to Stack Overflow.
System information.
- Have I written custom code (as opposed to using a stock example script provided in TensorFlow): yes
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Linux Ubuntu 16.04
- TensorFlow installed from (source or binary):
- TensorFlow version (use command below): 2.6.0 and 2.7.0
- Python version: 3.7
- Bazel version (if compiling from source):
- GPU model and memory:
- Exact command to reproduce:
You can collect some of this information using our environment capture script:
https://github.com/tensorflow/tensorflow/tree/master/tools/tf_env_collect.sh
You can obtain the TensorFlow version with: python -c "import tensorflow as tf; print(tf.version.GIT_VERSION, tf.version.VERSION)"
Describe the problem.
The output of the EinsumDense layer depends on the number of examples in a batch.
The issue is rather important because the attention implementation uses EinsumDense layers.
Describe the current behavior
The tf.keras.layers.experimental.EinsumDense layer produces different results depending on the batch size.
Describe the expected behavior
The output of the layer should be independent of the batch dimension.
- Do you want to contribute a PR? (yes/no): no
- If yes, please read this page for instructions
- Briefly describe your candidate solution(if contributing):
Standalone code to reproduce the issue.
import tensorflow as tf

dense = tf.keras.layers.experimental.EinsumDense(
    "bac,acd->bda",
    output_shape=[64, 4],
    bias_axes="da",
)
x = tf.random.uniform((80, 4, 32))
print(dense(x)[0, :4].numpy())
print(dense(x[:1])[0, :4].numpy())
I get results that differ in the 7th-8th significant digit:
[[ 0.12284548 0.2814498 -0.34291047 0.00810905]
[ 0.23635934 -0.08506497 0.12073331 0.33535597]
[-0.30136532 0.34854767 0.41540402 0.1328382 ]
[-0.04340287 -0.07197566 0.17427945 -0.29642397]]
[[ 0.12284552 0.2814498 -0.34291047 0.00810904]
[ 0.23635934 -0.08506497 0.12073331 0.33535597]
[-0.3013654 0.3485477 0.415404 0.1328382 ]
[-0.04340289 -0.07197568 0.17427945 -0.29642397]]
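The size of the discrepancy is consistent with float32 rounding: floating-point addition is not associative, so kernels that reduce the contraction axis in a different order (as TF may choose for different batch sizes) can disagree in the last one or two significant digits. A minimal NumPy sketch of the underlying effect (the arrays and reduction orders below are illustrative, not TF's actual kernels):

```python
import numpy as np

rng = np.random.default_rng(0)
v = rng.random(32, dtype=np.float32)
w = rng.random(32, dtype=np.float32)

# The same mathematical dot product, reduced in two different orders.
forward = np.float32(0.0)
for a, b in zip(v, w):
    forward += a * b
backward = np.float32(0.0)
for a, b in zip(v[::-1], w[::-1]):
    backward += a * b

# The two orders agree only up to float32 rounding (~1e-7 relative),
# the same magnitude as the discrepancy shown above.
print(forward, backward, forward - backward)
```

Any two reduction orders are equally "correct" here; neither matches the exact real-number result.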
Stacking several EinsumDense layers leads to accumulated error and wrong predictions.
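The accumulation across a stack of layers can be illustrated outside TF by running the same float32 computation against a float64 reference through a chain of random linear maps; the per-layer rounding error of order 1e-7 grows with depth. A hedged sketch (random matrices standing in for the layers, not the model from the report):

```python
import numpy as np

rng = np.random.default_rng(1)
x32 = rng.standard_normal(64).astype(np.float32)
x64 = x32.astype(np.float64)  # float64 stands in for the exact result

for i in range(8):
    # Scale keeps activations O(1) so the error ratio is meaningful.
    w = (rng.standard_normal((64, 64)) / 8.0).astype(np.float32)
    x32 = w @ x32
    x64 = w.astype(np.float64) @ x64
    # Norm-relative drift of the float32 chain from the reference.
    rel = np.linalg.norm(x32 - x64) / np.linalg.norm(x64)
    print(f"layer {i}: relative float32 error {rel:.1e}")
```

Each layer contributes its own rounding, so the drift only grows; whether it matters depends on how sensitive downstream predictions are to differences at this scale.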
Similar behavior can be observed for tf.einsum:
import tensorflow as tf

x = tf.random.uniform((80, 4, 32))
w = tf.random.uniform((4, 32, 64))
print(tf.einsum("bac,acd->bda", x, w)[0, :4].numpy())
print()
print(tf.einsum("bac,acd->bda", x[:1], w)[0, :4].numpy())
I get results that differ in the 7th-8th significant digit:
[[8.515028 6.533536 8.1234455 6.415581 ]
[8.387905 8.020177 8.997409 6.6566358]
[7.485701 7.6112795 8.145813 7.514901 ]
[8.022375 7.008651 7.9132357 5.918109 ]]
[[8.515028 6.5335355 8.123446 6.4155803]
[8.387905 8.020177 8.997409 6.6566353]
[7.485701 7.611279 8.145813 7.5149007]
[8.022375 7.0086513 7.9132347 5.9181094]]
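Because both results sit within float32 rounding distance of the exact value, a tolerance-based comparison (rather than exact bit equality) treats them as equal. A NumPy sketch of such a check, using a float64 computation as the reference (same einsum equation and shapes as above, but NumPy arrays rather than TF tensors):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random((80, 4, 32), dtype=np.float32)
w = rng.random((4, 32, 64), dtype=np.float32)

full = np.einsum("bac,acd->bda", x, w)        # whole batch
single = np.einsum("bac,acd->bda", x[:1], w)  # batch of one
ref = np.einsum("bac,acd->bda", x.astype(np.float64), w.astype(np.float64))

# Exact equality may fail in the last bits, but both float32 results
# match the float64 reference to float32 precision.
print(np.max(np.abs(full[:1] - single)))
print(np.allclose(full[:1], single, rtol=1e-5))
```

The same np.allclose-style tolerance is how one would compare TF outputs here instead of expecting bit-identical values.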
Source code / logs.
Include any logs or source code that would be helpful to diagnose the problem. If including tracebacks, please include the full traceback. Large logs and files should be attached. Try to provide a reproducible test case that is the bare minimum necessary to generate the problem.
Issue Analytics
- Created 2 years ago
- Comments: 5 (2 by maintainers)
@Ghostvv,
Sure, I am closing this issue here and we can follow up in the TensorFlow issue. Thanks!
@Ghostvv,
Please take a look at this comment. Thanks!