Google colab tpu_driver: DEADLINE_EXCEEDED
As of this morning, this nerfies training Colab notebook was working. For the past couple of hours, however, executing this cell:
# @title Configure notebook runtime
# @markdown If you would like to use a GPU runtime instead, change the runtime type by going to `Runtime > Change runtime type`.
# @markdown You will have to use a smaller batch size on GPU.
runtime_type = 'tpu' # @param ['gpu', 'tpu']
if runtime_type == 'tpu':
  import jax.tools.colab_tpu
  jax.tools.colab_tpu.setup_tpu()
print('Detected Devices:', jax.devices())
now delivers an error:
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<ipython-input-2-4e527b212d00> in <module>()
8 jax.tools.colab_tpu.setup_tpu()
9
---> 10 print('Detected Devices:', jax.devices())
2 frames
/usr/local/lib/python3.7/dist-packages/jax/_src/lib/xla_bridge.py in devices(backend)
312 List of Device subclasses.
313 """
--> 314 return get_backend(backend).devices()
315
316
/usr/local/lib/python3.7/dist-packages/jax/_src/lib/xla_bridge.py in get_backend(platform)
256 @lru_cache(maxsize=None) # don't use util.memoize because there is no X64 dependence.
257 def get_backend(platform=None):
--> 258 return _get_backend_uncached(platform)
259
260
/usr/local/lib/python3.7/dist-packages/jax/_src/lib/xla_bridge.py in _get_backend_uncached(platform)
246 if backend is None:
247 if platform in _backends_errors:
--> 248 raise RuntimeError(f"Requested backend {platform}, but it failed "
249 f"to initialize: {_backends_errors[platform]}")
250 raise RuntimeError(f"Unknown backend {platform}")
RuntimeError: Requested backend tpu_driver, but it failed to initialize: DEADLINE_EXCEEDED: Failed to connect to remote server at address: grpc://10.113.198.178:8470. Error from gRPC: Deadline Exceeded. Details:
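One quick way to narrow down a DEADLINE_EXCEEDED like the one above is to check whether the TPU's gRPC port accepts a TCP connection at all. The following is a minimal sketch, not part of JAX: `tpu_endpoint_reachable` is a hypothetical helper name, and it assumes the Colab TPU runtime exposes the address as `host:port` in the `COLAB_TPU_ADDR` environment variable. A successful connection only shows that the port is open, not that the TPU driver behind it is healthy.

```python
import os
import socket
from urllib.parse import urlparse


def tpu_endpoint_reachable(address, timeout=5.0):
    """Return True if a plain TCP connection to the given address succeeds.

    `address` may be "grpc://host:port" or just "host:port".
    This helper is hypothetical and only probes the socket; it does not
    speak gRPC or query the TPU driver.
    """
    if "://" not in address:
        address = f"grpc://{address}"
    parsed = urlparse(address)
    if parsed.hostname is None or parsed.port is None:
        raise ValueError(f"Expected host:port, got {address!r}")
    try:
        with socket.create_connection((parsed.hostname, parsed.port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # Assumption: COLAB_TPU_ADDR is set on Colab TPU runtimes.
    addr = os.environ.get("COLAB_TPU_ADDR")
    if addr is None:
        print("COLAB_TPU_ADDR is not set; is this a TPU runtime?")
    else:
        print("TPU endpoint reachable:", tpu_endpoint_reachable(addr))
```

If this prints `False`, the problem is probably network-level (the TPU host is down or unreachable from the VM) rather than something in the notebook code.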
I tried changing the TPU driver, following the recommendation in https://github.com/google/jax/issues/4408:
import tensorflow as tf
from tensorflow.python.tpu.client.client import Client
c = Client()
c.configure_tpu_version("tpu_driver0.1-dev20200320", restart_type='ifNeeded')
c.wait_for_healthy()
to which the back end never responded:
WARNING:root:Waiting for TPU "grpc://10.36.75.242:8470" with state "None" and health "None" to become healthy
WARNING:root:Waiting for TPU "grpc://10.36.75.242:8470" with state "None" and health "None" to become healthy
WARNING:root:Waiting for TPU "grpc://10.36.75.242:8470" with state "None" and health "None" to become healthy
WARNING:root:Waiting for TPU "grpc://10.36.75.242:8470" with state "None" and health "None" to become healthy
WARNING:root:Waiting for TPU "grpc://10.36.75.242:8470" with state "None" and health "None" to become healthy
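When a health check loops like this, it can help to poll with an explicit deadline so the cell fails fast instead of hanging. The following is a generic sketch and not part of the TPU client API: `wait_until` is a hypothetical helper, and `predicate` stands for any zero-argument callable that returns True once the TPU looks healthy.

```python
import time


def wait_until(predicate, timeout_s=60.0, interval_s=5.0):
    """Poll `predicate` until it returns True or `timeout_s` elapses.

    Returns True if the predicate succeeded, False if the deadline passed.
    The predicate is always called at least once.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

A caller can then decide what to do on `False`, for example restarting the runtime or switching to a GPU runtime, rather than waiting indefinitely.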
I did not change anything between this morning and this afternoon that could have broken the code. I made sure it is running on a TPU, I factory reset the instance, and I even tried with another Google account.
Issue Analytics
- Created 2 years ago
- Comments:8
#8485 adds the ability to specify a TPU driver version within setup_tpu().
I had the same issue. In my case, I turned off my VPN (Surfshark) and re-ran the terminal. Then it worked. I'm not sure the VPN caused the errors, but it is worth considering.
import jax.tools.colab_tpu
jax.tools.colab_tpu.setup_tpu()
print("TPU: ", jax.devices())