
Loading tensorflow first and then loading transformers causes errors

🐛 Bug

Information

Model I am using: BERT

Language I am using the model on (English, Chinese …):

The problem arises when using:

  • the official example scripts: (give details below)
  • my own modified scripts: (give details below)

The task I am working on is:

  • an official GLUE/SQuAD task: (give the name)
  • my own task or dataset: (give details below)

To reproduce

Steps to reproduce the behavior:

Run:

import tensorflow as tf
from transformers import TFBertForSequenceClassification

# Fails with "Blas GEMM launch failed" when tensorflow is imported before transformers.
model = TFBertForSequenceClassification.from_pretrained('/path/to/my/tf/model/', from_pt=True)

This produces the following output (with the error):

>>> import tensorflow as tf
2020-02-20 09:36:51.035083: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libnvinfer.so.6
2020-02-20 09:36:51.036337: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libnvinfer_plugin.so.6
>>> from transformers import TFBertForSequenceClassification
>>> model = TFBertForSequenceClassification.from_pretrained('/path/to/my/tf/model/', from_pt = True)
2020-02-20 09:36:52.226797: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2020-02-20 09:36:52.230595: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-02-20 09:36:52.231392: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1555] Found device 0 with properties: 
pciBusID: 0000:02:02.0 name: GRID RTX6000-24Q computeCapability: 7.5
coreClock: 1.77GHz coreCount: 72 deviceMemorySize: 23.88GiB deviceMemoryBandwidth: 625.94GiB/s
2020-02-20 09:36:52.231447: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2020-02-20 09:36:52.231475: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2020-02-20 09:36:52.233199: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2020-02-20 09:36:52.233465: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2020-02-20 09:36:52.234866: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2020-02-20 09:36:52.235660: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2020-02-20 09:36:52.235707: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-02-20 09:36:52.235845: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-02-20 09:36:52.236261: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-02-20 09:36:52.236765: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1697] Adding visible gpu devices: 0
2020-02-20 09:36:52.237022: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2020-02-20 09:36:52.241987: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3192500000 Hz
2020-02-20 09:36:52.242277: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0xeb8bae0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-02-20 09:36:52.242294: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2020-02-20 09:36:52.435669: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-02-20 09:36:52.436129: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0xec01900 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2020-02-20 09:36:52.436153: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): GRID RTX6000-24Q, Compute Capability 7.5
2020-02-20 09:36:52.436350: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-02-20 09:36:52.436672: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1555] Found device 0 with properties: 
pciBusID: 0000:02:02.0 name: GRID RTX6000-24Q computeCapability: 7.5
coreClock: 1.77GHz coreCount: 72 deviceMemorySize: 23.88GiB deviceMemoryBandwidth: 625.94GiB/s
2020-02-20 09:36:52.436706: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2020-02-20 09:36:52.436716: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2020-02-20 09:36:52.436744: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2020-02-20 09:36:52.436755: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2020-02-20 09:36:52.436765: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2020-02-20 09:36:52.436774: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2020-02-20 09:36:52.436781: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-02-20 09:36:52.436861: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-02-20 09:36:52.437204: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-02-20 09:36:52.437493: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1697] Adding visible gpu devices: 0
2020-02-20 09:36:52.437528: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2020-02-20 09:36:52.936429: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1096] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-02-20 09:36:52.936466: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102]      0 
2020-02-20 09:36:52.936474: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] 0:   N 
2020-02-20 09:36:52.936737: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-02-20 09:36:52.937283: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-02-20 09:36:52.937654: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1241] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 21423 MB memory) -> physical GPU (device: 0, name: GRID RTX6000-24Q, pci bus id: 0000:02:02.0, compute capability: 7.5)
2020-02-20 09:36:54.066446: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2020-02-20 09:36:54.066688: E tensorflow/stream_executor/cuda/cuda_blas.cc:238] failed to create cublas handle: CUBLAS_STATUS_NOT_INITIALIZED
2020-02-20 09:36:54.066725: E tensorflow/stream_executor/cuda/cuda_blas.cc:238] failed to create cublas handle: CUBLAS_STATUS_NOT_INITIALIZED
2020-02-20 09:36:54.066732: W tensorflow/stream_executor/stream.cc:2041] attempting to perform BLAS operation using StreamExecutor without BLAS support
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/my_lib/python3.7/site-packages/transformers/modeling_tf_utils.py", line 345, in from_pretrained
    return load_pytorch_checkpoint_in_tf2_model(model, resolved_archive_file, allow_missing_keys=True)
  File "/my_lib/python3.7/site-packages/transformers/modeling_tf_pytorch_utils.py", line 93, in load_pytorch_checkpoint_in_tf2_model
    tf_model, pt_state_dict, tf_inputs=tf_inputs, allow_missing_keys=allow_missing_keys
  File "/my_lib/python3.7/site-packages/transformers/modeling_tf_pytorch_utils.py", line 125, in load_pytorch_weights_in_tf2_model
    tf_model(tf_inputs, training=False)  # Make sure model is built
  File "/my_lib/python3.7/site-packages/tensorflow_core/python/keras/engine/base_layer.py", line 822, in __call__
    outputs = self.call(cast_inputs, *args, **kwargs)
  File "/my_lib/python3.7/site-packages/transformers/modeling_tf_bert.py", line 916, in call
    outputs = self.bert(inputs, **kwargs)
  File "/my_lib/python3.7/site-packages/tensorflow_core/python/keras/engine/base_layer.py", line 822, in __call__
    outputs = self.call(cast_inputs, *args, **kwargs)
  File "/my_lib/python3.7/site-packages/transformers/modeling_tf_bert.py", line 567, in call
    encoder_outputs = self.encoder([embedding_output, extended_attention_mask, head_mask], training=training)
  File "/my_lib/python3.7/site-packages/tensorflow_core/python/keras/engine/base_layer.py", line 822, in __call__
    outputs = self.call(cast_inputs, *args, **kwargs)
  File "/my_lib/python3.7/site-packages/transformers/modeling_tf_bert.py", line 376, in call
    layer_outputs = layer_module([hidden_states, attention_mask, head_mask[i]], training=training)
  File "/my_lib/python3.7/site-packages/tensorflow_core/python/keras/engine/base_layer.py", line 822, in __call__
    outputs = self.call(cast_inputs, *args, **kwargs)
  File "/my_lib/python3.7/site-packages/transformers/modeling_tf_bert.py", line 352, in call
    attention_outputs = self.attention([hidden_states, attention_mask, head_mask], training=training)
  File "/my_lib/python3.7/site-packages/tensorflow_core/python/keras/engine/base_layer.py", line 822, in __call__
    outputs = self.call(cast_inputs, *args, **kwargs)
  File "/my_lib/python3.7/site-packages/transformers/modeling_tf_bert.py", line 301, in call
    self_outputs = self.self_attention([input_tensor, attention_mask, head_mask], training=training)
  File "/my_lib/python3.7/site-packages/tensorflow_core/python/keras/engine/base_layer.py", line 822, in __call__
    outputs = self.call(cast_inputs, *args, **kwargs)
  File "/my_lib/python3.7/site-packages/transformers/modeling_tf_bert.py", line 230, in call
    mixed_query_layer = self.query(hidden_states)
  File "/my_lib/python3.7/site-packages/tensorflow_core/python/keras/engine/base_layer.py", line 822, in __call__
    outputs = self.call(cast_inputs, *args, **kwargs)
  File "/my_lib/python3.7/site-packages/tensorflow_core/python/keras/layers/core.py", line 1131, in call
    outputs = standard_ops.tensordot(inputs, self.kernel, [[rank - 1], [0]])
  File "/my_lib/python3.7/site-packages/tensorflow_core/python/ops/math_ops.py", line 4106, in tensordot
    ab_matmul = matmul(a_reshape, b_reshape)
  File "/my_lib/python3.7/site-packages/tensorflow_core/python/util/dispatch.py", line 180, in wrapper
    return target(*args, **kwargs)
  File "/my_lib/python3.7/site-packages/tensorflow_core/python/ops/math_ops.py", line 2798, in matmul
    a, b, transpose_a=transpose_a, transpose_b=transpose_b, name=name)
  File "/my_lib/python3.7/site-packages/tensorflow_core/python/ops/gen_math_ops.py", line 5616, in mat_mul
    _ops.raise_from_not_ok_status(e, name)
  File "/my_lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py", line 6606, in raise_from_not_ok_status
    six.raise_from(core._status_to_exception(e.code, message), None)
  File "<string>", line 3, in raise_from
tensorflow.python.framework.errors_impl.InternalError: Blas GEMM launch failed : a.shape=(15, 768), b.shape=(768, 768), m=15, n=768, k=768 [Op:MatMul] name: tf_bert_for_sequence_classification/bert/encoder/layer_._0/attention/self/query/Tensordot/MatMul/
>>> 
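
The failure happens at the very first MatMul while the model is being built: cuBLAS cannot create a handle (CUBLAS_STATUS_NOT_INITIALIZED) and the Blas GEMM launch then fails. In TF 2.x this usually points at GPU memory allocation at first use rather than at transformers itself. A minimal sketch of the commonly suggested workaround (an assumption, not verified against this issue) is to enable memory growth before anything touches the GPU:

import tensorflow as tf

# Assumed workaround sketch: stop TensorFlow from pre-allocating the whole GPU.
# This has to run before the first op executes, i.e. before the model is built.
for gpu in tf.config.experimental.list_physical_devices('GPU'):
    tf.config.experimental.set_memory_growth(gpu, True)

from transformers import TFBertForSequenceClassification
model = TFBertForSequenceClassification.from_pretrained('/path/to/my/tf/model/', from_pt=True)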

However, if I import transformers first and then import tensorflow, there is no problem (output from the console):

>>> from transformers import TFBertForSequenceClassification
2020-02-20 09:40:54.413603: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libnvinfer.so.6
2020-02-20 09:40:54.414946: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libnvinfer_plugin.so.6
>>> import tensorflow as tf
>>> model = TFBertForSequenceClassification.from_pretrained('/path/to/my/tf/model/', from_pt = True)
2020-02-20 09:40:55.402943: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2020-02-20 09:40:55.407404: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-02-20 09:40:55.407771: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1555] Found device 0 with properties: 
pciBusID: 0000:02:02.0 name: GRID RTX6000-24Q computeCapability: 7.5
coreClock: 1.77GHz coreCount: 72 deviceMemorySize: 23.88GiB deviceMemoryBandwidth: 625.94GiB/s
2020-02-20 09:40:55.407828: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2020-02-20 09:40:55.407858: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2020-02-20 09:40:55.409288: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2020-02-20 09:40:55.409560: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2020-02-20 09:40:55.410954: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2020-02-20 09:40:55.411852: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2020-02-20 09:40:55.411906: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-02-20 09:40:55.412020: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-02-20 09:40:55.412437: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-02-20 09:40:55.412712: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1697] Adding visible gpu devices: 0
2020-02-20 09:40:55.412957: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2020-02-20 09:40:55.417720: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3192500000 Hz
2020-02-20 09:40:55.417908: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x5be91f0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-02-20 09:40:55.417927: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2020-02-20 09:40:55.604909: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-02-20 09:40:55.605396: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x5cc07b0 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2020-02-20 09:40:55.605419: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): GRID RTX6000-24Q, Compute Capability 7.5
2020-02-20 09:40:55.605632: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-02-20 09:40:55.605947: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1555] Found device 0 with properties: 
pciBusID: 0000:02:02.0 name: GRID RTX6000-24Q computeCapability: 7.5
coreClock: 1.77GHz coreCount: 72 deviceMemorySize: 23.88GiB deviceMemoryBandwidth: 625.94GiB/s
2020-02-20 09:40:55.605984: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2020-02-20 09:40:55.606000: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2020-02-20 09:40:55.606032: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2020-02-20 09:40:55.606045: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2020-02-20 09:40:55.606058: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2020-02-20 09:40:55.606070: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2020-02-20 09:40:55.606080: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-02-20 09:40:55.606159: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-02-20 09:40:55.606493: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-02-20 09:40:55.606763: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1697] Adding visible gpu devices: 0
2020-02-20 09:41:00.803464: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1096] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-02-20 09:41:00.803503: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102]      0 
2020-02-20 09:41:00.803509: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] 0:   N 
2020-02-20 09:41:00.803804: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-02-20 09:41:00.804291: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-02-20 09:41:00.804643: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1241] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 20754 MB memory) -> physical GPU (device: 0, name: GRID RTX6000-24Q, pci bus id: 0000:02:02.0, compute capability: 7.5)
>>> 
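
Note that in this second transcript the libnvinfer lines already appear while transformers is being imported: transformers appears to import TensorFlow internally (to check whether TF is available), so TensorFlow gets initialized by whichever import runs first, and the later import tensorflow as tf only rebinds the name. A tiny illustrative check (a sketch, not taken from the issue):

import sys

# transformers pulls TensorFlow in as a side effect of its own import.
from transformers import TFBertForSequenceClassification

print('tensorflow' in sys.modules)  # True even before any explicit `import tensorflow as tf`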

Expected behavior

Environment info

  • transformers version: 2.5.0
  • Platform: Linux
  • Python version: 3.7.5
  • PyTorch version (GPU?):
  • Tensorflow version (GPU?): 2.1.0
  • Using GPU in script?: Yes
  • Using distributed or parallel set-up in script?: No

Issue Analytics

  • State: closed
  • Created: 4 years ago
  • Reactions: 1
  • Comments: 5

Top GitHub Comments

1 reaction
BramVanroy commented, Feb 20, 2020

I don’t use TensorFlow daily (I use PyTorch), but my far-fetched guess would be that, because of the loading order, two TF sessions are created in one case, both of which log Created TensorFlow device (you can see that in the trace). That might then leave the device unable to distinguish the sessions, or make it run out of memory to allocate, or something along those lines.

Someone else might chip in here.
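
A sketch for probing the memory angle mentioned above (nothing from the thread, just a hypothetical experiment): cap how much of the GPU the process may claim before the model is built and check whether the first MatMul still fails.

import tensorflow as tf

# Hypothetical experiment: restrict this process to a fixed slice of the GPU
# before any op runs, to see whether allocation is what breaks the first MatMul.
gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
    tf.config.experimental.set_virtual_device_configuration(
        gpus[0],
        [tf.config.experimental.VirtualDeviceConfiguration(memory_limit=4096)])  # limit in MB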

0 reactions
stale[bot] commented, Apr 20, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
