
Tuner intermittently failing

See original GitHub issue

If the bug is related to a specific library below, please raise an issue in the respective repo directly:

TensorFlow Data Validation Repo

TensorFlow Model Analysis Repo

TensorFlow Transform Repo

TensorFlow Serving Repo

System information

  • Have I specified the code to reproduce the issue (Yes, No): Yes
  • Environment in which the code is executed (e.g., Local(Linux/MacOS/Windows), Interactive Notebook, Google Cloud, etc): Kubeflow through Vertex
  • TensorFlow version: 2.7
  • TFX Version: 1.6.1
  • Python version: 3.7
  • Python dependencies (from pip freeze output):

Describe the current behavior: The Tuner is intermittently failing.

Describe the expected behavior: It shouldn't fail.

Other info / logs:

```
Error Best HyperParameters: {'space': [{'class_name': 'Choice', 'config': {'name': 'learning_rate', 'default': 0.0001, 'conditions': [], 'values': [0.0001, 0.001, 0.01, 0.1, 0.2], 'ordered': True}}], 'values': {'learning_rate': 0.01}}
Error Best Hyperparameters are written to gs://…/Tuner_Logistic_Regression_2754698840043945984/best_hyperparameters/best_hyperparameters.txt.
Error Terminating chief oracle at PID: 16
Error Terminating chief oracle at PID: 16
```

Approximately 1 in every 5 runs failed in Vertex with the logs above. Looking into the issue further, I noticed a strange setup in my code:

My Vertex Tuner had num_parallel_trials set to 3, as below:

```python
return tfx.extensions.google_cloud_ai_platform.Tuner(
    module_file=model_trainer,
    examples=transform.outputs['transformed_examples'],
    transform_graph=transform.outputs['transform_graph'],
    schema=schema,
    train_args=tfx.proto.TrainArgs(num_steps=train_num_steps),
    eval_args=tfx.proto.EvalArgs(num_steps=eval_num_steps),
    tune_args=tfx.proto.TuneArgs(
        # num_parallel_trials=3 means that 3 search loops are
        # running in parallel.
        num_parallel_trials=3),
    custom_config=custom_config).with_id(tuner_id)
```

However, because I was just trying to keep processing time to a minimum while trying out TFX and Vertex, I had set my tuner's max_trials to 2, which is less than num_parallel_trials:

```python
tuner = kt.RandomSearch(
    hypermodel=hypermodel,
    max_trials=2,
    hyperparameters=hyperparams,
    seed=123,
    allow_new_entries=False,
    objective=kt.Objective('val_binary_accuracy', 'max'),
    directory=fn_args.working_dir,
    project_name=project_name)
```
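With these settings, the two values conflict. Under the simplifying assumption that each trial is run by exactly one of the parallel search loops (this is a toy model, not KerasTuner's actual scheduling logic), 2 trials cannot cover 3 loops, so at least one loop is left with nothing to run. This thread does not confirm that the idle loop is what triggers the failure; the sketch below, built around a hypothetical helper `assign_trials_round_robin`, only illustrates the arithmetic:

```python
from typing import Dict, List


def assign_trials_round_robin(max_trials: int,
                              num_parallel_trials: int) -> Dict[int, List[int]]:
    """Toy model: hand trial ids 0..max_trials-1 to search loops in turn.

    Hypothetical helper, NOT KerasTuner's scheduler. It only assumes that each
    trial is executed by exactly one parallel search loop.
    """
    assignment: Dict[int, List[int]] = {loop: [] for loop in range(num_parallel_trials)}
    for trial_id in range(max_trials):
        assignment[trial_id % num_parallel_trials].append(trial_id)
    return assignment


# The configuration from the issue: 3 parallel search loops, but only 2 trials.
plan = assign_trials_round_robin(max_trials=2, num_parallel_trials=3)
idle_loops = [loop for loop, trials in plan.items() if not trials]
print(plan)        # {0: [0], 1: [1], 2: []}
print(idle_loops)  # [2]
```

Whatever order the trials are handed out in, 3 loops sharing 2 trials always leaves at least one loop empty; with max_trials=3 every loop gets a trial.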

I've been able to stop the issue by increasing max_trials to 3, but ideally this wouldn't be necessary, or there would at least be some kind of warning or error describing the problem with the setup.
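Until TFX or KerasTuner validates this combination itself, the check asked for above can be done in user code before the pipeline is built. The helper `check_trial_budget` below is hypothetical (it is not part of TFX or KerasTuner) and simply encodes the workaround from this issue: keep `max_trials` at least as large as `num_parallel_trials`. It is a minimal sketch, assuming both values are available as plain integers when the pipeline is defined.

```python
import warnings


def check_trial_budget(max_trials: int, num_parallel_trials: int,
                       strict: bool = True) -> None:
    """Hypothetical pre-flight check: require max_trials >= num_parallel_trials.

    Raises ValueError when strict is True, otherwise emits a UserWarning.
    """
    if max_trials < 1 or num_parallel_trials < 1:
        raise ValueError("max_trials and num_parallel_trials must both be >= 1")
    if max_trials < num_parallel_trials:
        message = (
            f"max_trials ({max_trials}) is smaller than num_parallel_trials "
            f"({num_parallel_trials}); at least {num_parallel_trials - max_trials} "
            "parallel search loop(s) would have no trial to run."
        )
        if strict:
            raise ValueError(message)
        warnings.warn(message)


# Use the same values that go into tfx.proto.TuneArgs(num_parallel_trials=...)
# and kt.RandomSearch(max_trials=...).
check_trial_budget(max_trials=3, num_parallel_trials=3)  # the working setup passes

try:
    check_trial_budget(max_trials=2, num_parallel_trials=3)  # the failing setup
except ValueError as err:
    print(err)
```

Raising by default makes the misconfiguration fail fast at pipeline-definition time instead of intermittently on Vertex; `strict=False` downgrades it to a warning for quick experiments.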

Issue Analytics

  • State: open
  • Created a year ago
  • Reactions: 1
  • Comments: 5 (1 by maintainers)

Top GitHub Comments

3 reactions
SylvainGavoille commented, May 5, 2022

To complete the issue, here is an example of code that reproduces the problem. The failure is random and occurs about once every 6 pipeline runs with the code presented in this notebook.

Thank you for your help.

0 reactions
tanguycdls commented, May 16, 2022

Hello @pindinagesh, could you take a look? @SylvainGavoille made a reproducible example that fails about once every 6 pipeline runs. Can you tell us whether it's an issue on the KerasTuner side or here? Either way, the current Python code fails on the TFX side.

Thanks,
