
Loading tensorflow first and then loading transformers causes errors

🐛 Bug

Information

Model I am using: BERT

Language I am using the model on (English, Chinese …):

The problem arises when using:

  • the official example scripts: (give details below)
  • my own modified scripts: (give details below)

The task I am working on is:

  • an official GLUE/SQuAD task: (give the name)
  • my own task or dataset: (give details below)

To reproduce

Steps to reproduce the behavior:

Run:

import tensorflow as tf
from transformers import TFBertForSequenceClassification

# Fails with "Blas GEMM launch failed" when tensorflow is imported before transformers.
model = TFBertForSequenceClassification.from_pretrained('/path/to/my/tf/model/', from_pt=True)

This produces the following output (with the error):

>>> import tensorflow as tf
2020-02-20 09:36:51.035083: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libnvinfer.so.6
2020-02-20 09:36:51.036337: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libnvinfer_plugin.so.6
>>> from transformers import TFBertForSequenceClassification
>>> model = TFBertForSequenceClassification.from_pretrained('/path/to/my/tf/model/', from_pt = True)
2020-02-20 09:36:52.226797: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2020-02-20 09:36:52.230595: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-02-20 09:36:52.231392: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1555] Found device 0 with properties: 
pciBusID: 0000:02:02.0 name: GRID RTX6000-24Q computeCapability: 7.5
coreClock: 1.77GHz coreCount: 72 deviceMemorySize: 23.88GiB deviceMemoryBandwidth: 625.94GiB/s
2020-02-20 09:36:52.231447: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2020-02-20 09:36:52.231475: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2020-02-20 09:36:52.233199: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2020-02-20 09:36:52.233465: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2020-02-20 09:36:52.234866: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2020-02-20 09:36:52.235660: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2020-02-20 09:36:52.235707: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-02-20 09:36:52.235845: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-02-20 09:36:52.236261: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-02-20 09:36:52.236765: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1697] Adding visible gpu devices: 0
2020-02-20 09:36:52.237022: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2020-02-20 09:36:52.241987: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3192500000 Hz
2020-02-20 09:36:52.242277: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0xeb8bae0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-02-20 09:36:52.242294: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2020-02-20 09:36:52.435669: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-02-20 09:36:52.436129: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0xec01900 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2020-02-20 09:36:52.436153: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): GRID RTX6000-24Q, Compute Capability 7.5
2020-02-20 09:36:52.436350: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-02-20 09:36:52.436672: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1555] Found device 0 with properties: 
pciBusID: 0000:02:02.0 name: GRID RTX6000-24Q computeCapability: 7.5
coreClock: 1.77GHz coreCount: 72 deviceMemorySize: 23.88GiB deviceMemoryBandwidth: 625.94GiB/s
2020-02-20 09:36:52.436706: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2020-02-20 09:36:52.436716: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2020-02-20 09:36:52.436744: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2020-02-20 09:36:52.436755: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2020-02-20 09:36:52.436765: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2020-02-20 09:36:52.436774: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2020-02-20 09:36:52.436781: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-02-20 09:36:52.436861: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-02-20 09:36:52.437204: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-02-20 09:36:52.437493: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1697] Adding visible gpu devices: 0
2020-02-20 09:36:52.437528: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2020-02-20 09:36:52.936429: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1096] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-02-20 09:36:52.936466: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102]      0 
2020-02-20 09:36:52.936474: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] 0:   N 
2020-02-20 09:36:52.936737: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-02-20 09:36:52.937283: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-02-20 09:36:52.937654: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1241] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 21423 MB memory) -> physical GPU (device: 0, name: GRID RTX6000-24Q, pci bus id: 0000:02:02.0, compute capability: 7.5)
2020-02-20 09:36:54.066446: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2020-02-20 09:36:54.066688: E tensorflow/stream_executor/cuda/cuda_blas.cc:238] failed to create cublas handle: CUBLAS_STATUS_NOT_INITIALIZED
2020-02-20 09:36:54.066725: E tensorflow/stream_executor/cuda/cuda_blas.cc:238] failed to create cublas handle: CUBLAS_STATUS_NOT_INITIALIZED
2020-02-20 09:36:54.066732: W tensorflow/stream_executor/stream.cc:2041] attempting to perform BLAS operation using StreamExecutor without BLAS support
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/my_lib/python3.7/site-packages/transformers/modeling_tf_utils.py", line 345, in from_pretrained
    return load_pytorch_checkpoint_in_tf2_model(model, resolved_archive_file, allow_missing_keys=True)
  File "/my_lib/python3.7/site-packages/transformers/modeling_tf_pytorch_utils.py", line 93, in load_pytorch_checkpoint_in_tf2_model
    tf_model, pt_state_dict, tf_inputs=tf_inputs, allow_missing_keys=allow_missing_keys
  File "/my_lib/python3.7/site-packages/transformers/modeling_tf_pytorch_utils.py", line 125, in load_pytorch_weights_in_tf2_model
    tf_model(tf_inputs, training=False)  # Make sure model is built
  File "/my_lib/python3.7/site-packages/tensorflow_core/python/keras/engine/base_layer.py", line 822, in __call__
    outputs = self.call(cast_inputs, *args, **kwargs)
  File "/my_lib/python3.7/site-packages/transformers/modeling_tf_bert.py", line 916, in call
    outputs = self.bert(inputs, **kwargs)
  File "/my_lib/python3.7/site-packages/tensorflow_core/python/keras/engine/base_layer.py", line 822, in __call__
    outputs = self.call(cast_inputs, *args, **kwargs)
  File "/my_lib/python3.7/site-packages/transformers/modeling_tf_bert.py", line 567, in call
    encoder_outputs = self.encoder([embedding_output, extended_attention_mask, head_mask], training=training)
  File "/my_lib/python3.7/site-packages/tensorflow_core/python/keras/engine/base_layer.py", line 822, in __call__
    outputs = self.call(cast_inputs, *args, **kwargs)
  File "/my_lib/python3.7/site-packages/transformers/modeling_tf_bert.py", line 376, in call
    layer_outputs = layer_module([hidden_states, attention_mask, head_mask[i]], training=training)
  File "/my_lib/python3.7/site-packages/tensorflow_core/python/keras/engine/base_layer.py", line 822, in __call__
    outputs = self.call(cast_inputs, *args, **kwargs)
  File "/my_lib/python3.7/site-packages/transformers/modeling_tf_bert.py", line 352, in call
    attention_outputs = self.attention([hidden_states, attention_mask, head_mask], training=training)
  File "/my_lib/python3.7/site-packages/tensorflow_core/python/keras/engine/base_layer.py", line 822, in __call__
    outputs = self.call(cast_inputs, *args, **kwargs)
  File "/my_lib/python3.7/site-packages/transformers/modeling_tf_bert.py", line 301, in call
    self_outputs = self.self_attention([input_tensor, attention_mask, head_mask], training=training)
  File "/my_lib/python3.7/site-packages/tensorflow_core/python/keras/engine/base_layer.py", line 822, in __call__
    outputs = self.call(cast_inputs, *args, **kwargs)
  File "/my_lib/python3.7/site-packages/transformers/modeling_tf_bert.py", line 230, in call
    mixed_query_layer = self.query(hidden_states)
  File "/my_lib/python3.7/site-packages/tensorflow_core/python/keras/engine/base_layer.py", line 822, in __call__
    outputs = self.call(cast_inputs, *args, **kwargs)
  File "/my_lib/python3.7/site-packages/tensorflow_core/python/keras/layers/core.py", line 1131, in call
    outputs = standard_ops.tensordot(inputs, self.kernel, [[rank - 1], [0]])
  File "/my_lib/python3.7/site-packages/tensorflow_core/python/ops/math_ops.py", line 4106, in tensordot
    ab_matmul = matmul(a_reshape, b_reshape)
  File "/my_lib/python3.7/site-packages/tensorflow_core/python/util/dispatch.py", line 180, in wrapper
    return target(*args, **kwargs)
  File "/my_lib/python3.7/site-packages/tensorflow_core/python/ops/math_ops.py", line 2798, in matmul
    a, b, transpose_a=transpose_a, transpose_b=transpose_b, name=name)
  File "/my_lib/python3.7/site-packages/tensorflow_core/python/ops/gen_math_ops.py", line 5616, in mat_mul
    _ops.raise_from_not_ok_status(e, name)
  File "/my_lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py", line 6606, in raise_from_not_ok_status
    six.raise_from(core._status_to_exception(e.code, message), None)
  File "<string>", line 3, in raise_from
tensorflow.python.framework.errors_impl.InternalError: Blas GEMM launch failed : a.shape=(15, 768), b.shape=(768, 768), m=15, n=768, k=768 [Op:MatMul] name: tf_bert_for_sequence_classification/bert/encoder/layer_._0/attention/self/query/Tensordot/MatMul/
>>> 
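
The failure happens at the very first MatMul while the model is being built: cuBLAS cannot create a handle (CUBLAS_STATUS_NOT_INITIALIZED) and the Blas GEMM launch then fails. In TF 2.x this usually points at GPU memory allocation at first use rather than at transformers itself. A minimal sketch of the commonly suggested workaround (an assumption, not verified against this issue) is to enable memory growth before anything touches the GPU:

import tensorflow as tf

# Assumed workaround sketch: stop TensorFlow from pre-allocating the whole GPU.
# This has to run before the first op executes, i.e. before the model is built.
for gpu in tf.config.experimental.list_physical_devices('GPU'):
    tf.config.experimental.set_memory_growth(gpu, True)

from transformers import TFBertForSequenceClassification
model = TFBertForSequenceClassification.from_pretrained('/path/to/my/tf/model/', from_pt=True)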

However, if I import transformers first and then import tensorflow, there is no problem (output from the console):

>>> from transformers import TFBertForSequenceClassification
2020-02-20 09:40:54.413603: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libnvinfer.so.6
2020-02-20 09:40:54.414946: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libnvinfer_plugin.so.6
>>> import tensorflow as tf
>>> model = TFBertForSequenceClassification.from_pretrained('/path/to/my/tf/model/', from_pt = True)
2020-02-20 09:40:55.402943: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2020-02-20 09:40:55.407404: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-02-20 09:40:55.407771: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1555] Found device 0 with properties: 
pciBusID: 0000:02:02.0 name: GRID RTX6000-24Q computeCapability: 7.5
coreClock: 1.77GHz coreCount: 72 deviceMemorySize: 23.88GiB deviceMemoryBandwidth: 625.94GiB/s
2020-02-20 09:40:55.407828: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2020-02-20 09:40:55.407858: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2020-02-20 09:40:55.409288: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2020-02-20 09:40:55.409560: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2020-02-20 09:40:55.410954: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2020-02-20 09:40:55.411852: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2020-02-20 09:40:55.411906: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-02-20 09:40:55.412020: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-02-20 09:40:55.412437: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-02-20 09:40:55.412712: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1697] Adding visible gpu devices: 0
2020-02-20 09:40:55.412957: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2020-02-20 09:40:55.417720: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3192500000 Hz
2020-02-20 09:40:55.417908: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x5be91f0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-02-20 09:40:55.417927: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2020-02-20 09:40:55.604909: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-02-20 09:40:55.605396: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x5cc07b0 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2020-02-20 09:40:55.605419: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): GRID RTX6000-24Q, Compute Capability 7.5
2020-02-20 09:40:55.605632: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-02-20 09:40:55.605947: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1555] Found device 0 with properties: 
pciBusID: 0000:02:02.0 name: GRID RTX6000-24Q computeCapability: 7.5
coreClock: 1.77GHz coreCount: 72 deviceMemorySize: 23.88GiB deviceMemoryBandwidth: 625.94GiB/s
2020-02-20 09:40:55.605984: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2020-02-20 09:40:55.606000: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2020-02-20 09:40:55.606032: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2020-02-20 09:40:55.606045: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2020-02-20 09:40:55.606058: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2020-02-20 09:40:55.606070: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2020-02-20 09:40:55.606080: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-02-20 09:40:55.606159: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-02-20 09:40:55.606493: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-02-20 09:40:55.606763: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1697] Adding visible gpu devices: 0
2020-02-20 09:41:00.803464: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1096] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-02-20 09:41:00.803503: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102]      0 
2020-02-20 09:41:00.803509: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] 0:   N 
2020-02-20 09:41:00.803804: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-02-20 09:41:00.804291: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-02-20 09:41:00.804643: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1241] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 20754 MB memory) -> physical GPU (device: 0, name: GRID RTX6000-24Q, pci bus id: 0000:02:02.0, compute capability: 7.5)
>>> 
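
Note that in this second transcript the libnvinfer lines already appear while transformers is being imported: transformers appears to import TensorFlow internally (to check whether TF is available), so TensorFlow gets initialized by whichever import runs first, and the later import tensorflow as tf only rebinds the name. A tiny illustrative check (a sketch, not taken from the issue):

import sys

# transformers pulls TensorFlow in as a side effect of its own import.
from transformers import TFBertForSequenceClassification

print('tensorflow' in sys.modules)  # True even before any explicit `import tensorflow as tf`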

Expected behavior

Environment info

  • transformers version: 2.5.0
  • Platform: Linux
  • Python version: 3.7.5
  • PyTorch version (GPU?):
  • Tensorflow version (GPU?): 2.1.0
  • Using GPU in script?: Yes
  • Using distributed or parallel set-up in script?: No

Issue Analytics

  • State: closed
  • Created: 4 years ago
  • Reactions: 1
  • Comments: 5

Top GitHub Comments

1 reaction
BramVanroy commented, Feb 20, 2020

I don’t use TensorFlow daily (I use PyTorch), but my far-fetched guess would be that, because of the loading order, two TF sessions are created in one case, both of which log Created TensorFlow device (you can see that in the trace). That might then leave the device unable to distinguish the sessions, or make it run out of memory to allocate, or something along those lines.

Someone else might chip in here.
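
A sketch for probing the memory angle mentioned above (nothing from the thread, just a hypothetical experiment): cap how much of the GPU the process may claim before the model is built and check whether the first MatMul still fails.

import tensorflow as tf

# Hypothetical experiment: restrict this process to a fixed slice of the GPU
# before any op runs, to see whether allocation is what breaks the first MatMul.
gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
    tf.config.experimental.set_virtual_device_configuration(
        gpus[0],
        [tf.config.experimental.VirtualDeviceConfiguration(memory_limit=4096)])  # limit in MB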

0 reactions
stale[bot] commented, Apr 20, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
