ResourceExhaustedError when using TPU
I have a few notebooks on Colab Pro that use a TPU. They worked perfectly a day ago, but now everything crashes with:
ResourceExhaustedError: 9 root error(s) found.
(0) Resource exhausted: {{function_node __inference_train_function_453721}} Compilation failure: Ran out of memory in memory space hbm. Used 16.79G of 7.48G hbm. Exceeded hbm capacity by 9.31G.
Is there a changelog where I can see what changed in Colab, or is this something to do with the TPU infrastructure?
I can make it work by reducing the batch size, but it has to be roughly halved, which makes the models train at least 2x slower.
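For anyone trying the same workaround, here is a rough sketch of how the numbers in the error message could be used to pick a starting batch size. It assumes HBM usage scales linearly with batch size, which is only an approximation (weights and optimizer state are a fixed cost, and XLA padding can change things), so treat the result as a first guess to try, not a guarantee. The helper name `estimate_max_batch_size`, the example batch size of 1024, and the rounding to a multiple of 8 (one share per TPU core, assuming 8 cores) are all made up for illustration and are not from the original report.

```python
def estimate_max_batch_size(current_batch, used_gb, capacity_gb, multiple_of=8):
    """Rough first guess at a batch size that fits in HBM.

    ASSUMPTION: memory use is proportional to batch size. Real usage also
    includes fixed costs (weights, optimizer state), so the true limit may
    be lower than this estimate.
    """
    if current_batch <= 0 or used_gb <= 0 or capacity_gb <= 0:
        raise ValueError("all inputs must be positive")
    scaled = current_batch * capacity_gb / used_gb
    # Round down to a multiple of `multiple_of` so the global batch still
    # splits evenly across replicas (8 is an assumed core count).
    rounded = int(scaled // multiple_of) * multiple_of
    return max(multiple_of, rounded)


# Figures taken from the error message above.
used_gb, capacity_gb = 16.79, 7.48
overshoot_gb = round(used_gb - capacity_gb, 2)  # matches "Exceeded hbm capacity by 9.31G"

# 1024 is a hypothetical current global batch size.
suggested = estimate_max_batch_size(1024, used_gb, capacity_gb)
print(overshoot_gb, suggested)
```

With these figures the estimate comes out a little under half of the original batch size, which is in line with the "roughly halved" observation above.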
I’ve noticed that TensorFlow started to show strange warnings:
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow/python/data/ops/multi_device_iterator_ops.py:601: get_next_as_optional (from tensorflow.python.data.ops.iterator_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Iterator.get_next_as_optional()` instead.
WARNING:tensorflow:Callbacks method `on_train_batch_end` is slow compared to the batch time (batch time: 0.0340s vs `on_train_batch_end` time: 0.4056s). Check your callbacks.
Issue Analytics
- State:
- Created 3 years ago
- Reactions:7
- Comments:22 (4 by maintainers)
For anyone looking to help get this fixed: making comments on the upstream issue is most helpful.
For anyone looking to use TF 2.2 with a TPU for now, this should get you unblocked:
@gena Great! I am glad it is working for you now!