ResourceExhaustedError after several iterations in a grid search

See original GitHub issue

First off, make sure to check your support options.

The preferred way to resolve usage-related matters is through the docs, which are kept up to date with the latest version of Talos.

If you do end up asking for support in a new issue, make sure to follow the steps below carefully.

1) Confirm the below

  • I have looked for an answer in the Docs
  • My Python version is 3.5 or higher
  • I have searched through the Issues for a duplicate
  • I’ve tested that my Keras model works as a stand-alone

2) Include the output of:

talos.__version__ == 0.6.7

3) Explain clearly what you are trying to achieve

I am running a grid search with 36 rounds. After about 4 or 5 rounds, model.fit suddenly fails with a ResourceExhaustedError. This seems very odd, because at least 3 rounds of fitting complete on the GPU with a model and batch size that take up nearly all of the GPU memory, so there appears to be a small but significant memory leak somewhere. Any ideas what it could be?
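
One commonly suggested mitigation for memory that grows from round to round is to reset Keras global state and force garbage collection after every fit. The sketch below is a hand-rolled loop, not Talos internals, and it is not confirmed as the fix for this issue. The grid values, `fake_train`, `run_grid`, and `release_memory` are hypothetical names chosen for illustration; the grid is sized to 3 * 3 * 4 = 36 combinations only to mirror the 36 rounds mentioned above. TensorFlow is imported optionally so the loop logic can be run without it.

```python
import gc
import itertools

try:
    import tensorflow as tf  # optional, so the sketch also runs without TensorFlow
except ImportError:
    tf = None

# Hypothetical grid: 3 * 3 * 4 = 36 combinations.
param_grid = {
    "lr": [1e-2, 1e-3, 1e-4],
    "dropout": [0.0, 0.25, 0.5],
    "filters": [16, 32, 64, 128],
}


def iter_rounds(grid):
    """Yield one dict per hyperparameter combination."""
    keys = list(grid)
    for values in itertools.product(*(grid[k] for k in keys)):
        yield dict(zip(keys, values))


def release_memory():
    """Drop Keras global state (if available) and collect unreachable objects."""
    if tf is not None:
        tf.keras.backend.clear_session()
    gc.collect()


def run_grid(grid, train_fn):
    """Call train_fn once per combination and clean up after every round."""
    results = []
    for params in iter_rounds(grid):
        try:
            results.append((params, train_fn(params)))
        finally:
            # Runs even if train_fn raises, so a failed round still cleans up.
            release_memory()
    return results


def fake_train(params):
    # Placeholder for building and fitting a model; returns a made-up score.
    return params["lr"] * params["filters"]


results = run_grid(param_grid, fake_train)
print(len(results))
```

In a real run, `fake_train` would build and fit the model, and any references to the model or its history would need to be dropped before `release_memory()` is called, otherwise the garbage collector cannot free them.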

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 33 (12 by maintainers)

Top GitHub Comments

bjtho08 commented, Apr 23, 2020 (1 reaction)

Sure! I use custom keras.utils.Sequence data generators, so I have two dummy variables for my scan command as shown below:

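    import numpy as np
    import talos as ta

    # Placeholder arrays: the real data comes from the Sequence generators,
    # so these only exist to satisfy the x and y arguments of Scan.
    dummy_x = np.empty((1, BATCH_SIZE, 208, 208))
    dummy_y = np.empty((1, BATCH_SIZE))

    scan_object = ta.Scan(
        x=dummy_x,
        y=dummy_y,
        disable_progress_bar=False,
        print_params=True,
        model=talos_model,
        params=p,
        experiment_name="talos/" + date_string,
        reduction_method='gamify',
    )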

I will take a look at talos 1.0 right away!
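
For readers unfamiliar with the generator approach mentioned in this comment, here is a minimal sketch of the batch protocol that a `keras.utils.Sequence` subclass implements: `__len__` returns the number of batches per epoch and `__getitem__` returns one batch. The class `PatchSequence`, the array sizes, and the `BATCH_SIZE` value are assumptions for illustration; the commenter's actual generators are not shown in the thread. The class is written without the Keras base class so that it runs with NumPy alone; in real code it would subclass `keras.utils.Sequence`.

```python
import math

import numpy as np

BATCH_SIZE = 4  # hypothetical; the value used in the thread is not given


class PatchSequence:
    """Stand-in following the __len__/__getitem__ contract of keras.utils.Sequence."""

    def __init__(self, images, labels, batch_size):
        self.images = images
        self.labels = labels
        self.batch_size = batch_size

    def __len__(self):
        # Number of batches per epoch, counting a final partial batch.
        return math.ceil(len(self.images) / self.batch_size)

    def __getitem__(self, idx):
        # Return batch number idx as an (inputs, targets) pair.
        start = idx * self.batch_size
        stop = start + self.batch_size
        return self.images[start:stop], self.labels[start:stop]


# Made-up data: ten 208x208 images with one label each.
images = np.zeros((10, 208, 208), dtype=np.float32)
labels = np.zeros((10,), dtype=np.float32)
train_seq = PatchSequence(images, labels, BATCH_SIZE)

x_batch, y_batch = train_seq[0]
print(len(train_seq), x_batch.shape, y_batch.shape)
```

Because batches are produced on demand, only one batch needs to be held in memory at a time, which is why the arrays passed to the scan can be placeholders.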

bjtho08 commented, Apr 23, 2020 (1 reaction)

I would love to, but that option crashes my Python kernel, so it’s not really possible. This is a long-standing Keras bug, I believe.

Top Results From Across the Web

TensorFlow OOM when looping over multiple experiments
ResourceExhaustedError: OOM when allocating tensor. The models are all the same with a grid search on the learning rate.

tf.keras.backend.clear_session | TensorFlow v2.11.0
Keras starts with a blank state at each iteration # and memory consumption is constant over time. tf.keras.backend.clear_session() model = tf.keras.

OOM when allocating tensor with shape[128,8,21]....
I was on Epoch 1 / 100 and 2054 / 20736 iterations when it crashed with this message. OS: Windows 10. CUDA v10....

Your First Deep Learning Project in Python with Keras Step-by ...
The model will always have some error, but the amount of error will level out after some point for a given model configuration....

Hyperparameter Tuning - Intro to Deep Learning
There are no set rules for choosing many of these hyperparameters, ... This can be done in many ways, such as through a...