Katib experiments run indefinitely without completing a single trial
/kind bug
Hi, I’m setting up a Katib job through the Kale deployment panel after creating a Kale pipeline. The pipeline builds successfully, but the Katib experiments run forever and don’t complete a single trial.
I expected the Katib jobs to run successfully, but they do not.
Is there any way or suggestion to go about this?
Environment:
- Kubeflow version (kfctl version):
- Minikube version (minikube version):
- Kubernetes version (use kubectl version):
- OS (e.g. from /etc/os-release):
Issue Analytics
- State:
- Created 3 years ago
- Reactions:1
- Comments:39 (15 by maintainers)
Top GitHub Comments
@andreyvelich I’ve been able to figure out what to do for my version of Katib: using the goTemplate on my trialTemplate with apiVersion: batch/v1 and kind: Job for CatBoost and other sklearn models (unlike the recent version of Katib). Will close this issue now but may re-open it if another issue occurs with my Katib version.

@andreyvelich No, I stopped working with the Kale deployment panel after I reported the problem. I’ve been making use of yaml scripts in the Katib UI on my Kubeflow cluster. And the command gave this:
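
For reference, the goTemplate approach described in the first comment above could look roughly like the following. This is only a sketch, not the manifest actually used in this issue: it assumes the older Katib v1alpha3 API (the thread does not state the Katib version), and the experiment name, image, script path /opt/train.py, parameter name --learning_rate, and metric name accuracy are all hypothetical placeholders. It also assumes the training script prints its metric to stdout in a form the metrics collector can read, such as accuracy=0.93.

```python
import json

# Go-template body of the trial Job (apiVersion: batch/v1, kind: Job).
# Under the assumed v1alpha3 API, Katib substitutes {{.Trial}}, {{.NameSpace}}
# and the .HyperParameters list for each trial it creates.
# IMAGE_PLACEHOLDER and /opt/train.py are hypothetical placeholders.
RAW_TEMPLATE = """\
apiVersion: batch/v1
kind: Job
metadata:
  name: {{.Trial}}
  namespace: {{.NameSpace}}
spec:
  template:
    spec:
      containers:
      - name: {{.Trial}}
        image: IMAGE_PLACEHOLDER
        command:
        - "python3"
        - "/opt/train.py"
        {{- with .HyperParameters}}
        {{- range .}}
        - "{{.Name}}={{.Value}}"
        {{- end}}
        {{- end}}
      restartPolicy: Never
"""


def build_experiment(name, namespace, image):
    """Return a Katib Experiment manifest (assumed v1alpha3) as a dict."""
    return {
        "apiVersion": "kubeflow.org/v1alpha3",
        "kind": "Experiment",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "parallelTrialCount": 1,
            "maxTrialCount": 3,
            "maxFailedTrialCount": 1,
            "objective": {
                "type": "maximize",
                "goal": 0.9,
                # Hypothetical metric name; must match what the script prints.
                "objectiveMetricName": "accuracy",
            },
            "algorithm": {"algorithmName": "random"},
            "parameters": [
                {
                    # Hypothetical hyperparameter passed as a CLI argument.
                    "name": "--learning_rate",
                    "parameterType": "double",
                    "feasibleSpace": {"min": "0.01", "max": "0.3"},
                }
            ],
            "trialTemplate": {
                "goTemplate": {
                    "rawTemplate": RAW_TEMPLATE.replace("IMAGE_PLACEHOLDER", image)
                }
            },
        },
    }


if __name__ == "__main__":
    # Print the manifest as JSON; kubectl accepts JSON as well as YAML.
    manifest = build_experiment("sklearn-demo", "kubeflow", "example.com/train:latest")
    print(json.dumps(manifest, indent=2))
```

The printed JSON can be saved to a file and applied with kubectl apply -f, or converted to YAML for the Katib UI. If trials still stay in the running state, it is worth checking that the trial Job's container actually exits and that the metric name in the objective matches what the script prints.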