Error running katib on latest master (04/13)
After deploying Katib by following the getting-started guide, I've seen the following errors:
$ kubectl get pods --all-namespaces
NAMESPACE     NAME                                      READY   STATUS             RESTARTS   AGE
katib         dlk-manager-698ccb5fdc-hb7xc              0/1     CrashLoopBackOff   6          13m
katib         modeldb-backend-6855d95fb4-2sxw9          1/1     Running            0          14m
katib         modeldb-db-6cf5bb764-5s65f                1/1     Running            0          14m
katib         modeldb-frontend-5868bffc64-rhrr7         1/1     Running            0          14m
katib         vizier-core-86c5566c88-kvsp9              0/1     CrashLoopBackOff   6          13m
katib         vizier-db-64557596dc-mpgh4                1/1     Running            0          13m
katib         vizier-suggestion-random-6b4d6db6-m8l94   0/1     CrashLoopBackOff   6          13m
kube-system   kube-dns-5c6c5b55b-qmd9l                  3/3     Running            0          16m
I've managed to get it running; it turns out the container command is not correct. For example, I had to change this:
spec:
  serviceAccountName: vizier-core
  containers:
  - name: vizier-core
    image: katib/vizier-core
    args:
    - "-w"
    - "dlk"
    ports:
    - name: api
      containerPort: 6789
to
spec:
  serviceAccountName: vizier-core
  containers:
  - name: vizier-core
    image: katib/vizier-core
    args:
    - ./vizier-manager  # <-- add this line
    - "-w"
    - "dlk"
    ports:
    - name: api
      containerPort: 6789
However, according to the Dockerfile for vizier-core, vizier-manager is already set as the entrypoint:
FROM golang:alpine AS build-env
# The GOPATH in the image is /go.
ADD . /go/src/github.com/kubeflow/hp-tuning
WORKDIR /go/src/github.com/kubeflow/hp-tuning/manager
RUN go build -o vizier-manager
FROM alpine:3.7
WORKDIR /app
COPY --from=build-env /go/src/github.com/kubeflow/hp-tuning/manager/vizier-manager /app/
COPY --from=build-env /go/src/github.com/kubeflow/hp-tuning/manager/visualise /
ENTRYPOINT ["./vizier-manager"]
CMD ["-w", "dlk"]
Anything wrong with the above 👆 setup?
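For comparing the two manifests against the Dockerfile, it may help to recall the documented Kubernetes rule: a container's command overrides the image ENTRYPOINT, and its args override the image CMD. The Python sketch below models that rule. The function name effective_argv is hypothetical, and the sketch assumes exec-form ENTRYPOINT/CMD (as in the Dockerfile above) and ignores edge cases such as explicitly empty lists.

```python
from typing import List, Optional


def effective_argv(
    image_entrypoint: List[str],
    image_cmd: List[str],
    command: Optional[List[str]] = None,
    args: Optional[List[str]] = None,
) -> List[str]:
    """Hypothetical helper: the argv a container would run, following the
    Kubernetes rules for combining command/args with ENTRYPOINT/CMD."""
    if command is None and args is None:
        # Nothing set in the Pod spec: image ENTRYPOINT + CMD run unchanged.
        return image_entrypoint + image_cmd
    if command is not None and args is None:
        # command alone replaces ENTRYPOINT, and the image CMD is ignored too.
        return list(command)
    if command is None:
        # args alone: image ENTRYPOINT runs with the Pod's args (CMD dropped).
        return image_entrypoint + list(args)
    # Both set: image ENTRYPOINT and CMD are both ignored.
    return list(command) + list(args)


if __name__ == "__main__":
    # ENTRYPOINT and CMD from the vizier-core Dockerfile in this issue.
    entrypoint = ["./vizier-manager"]
    cmd = ["-w", "dlk"]

    # Original manifest (args only).
    print(effective_argv(entrypoint, cmd, args=["-w", "dlk"]))
    # ['./vizier-manager', '-w', 'dlk']

    # Workaround manifest (binary name added to args).
    print(effective_argv(entrypoint, cmd, args=["./vizier-manager", "-w", "dlk"]))
    # ['./vizier-manager', './vizier-manager', '-w', 'dlk']

    # Assumed older image with no ENTRYPOINT set.
    print(effective_argv([], cmd, args=["-w", "dlk"]))
    # ['-w', 'dlk']
```

Under these rules, the original manifest would already run ./vizier-manager -w dlk against the Dockerfile shown, while the workaround would pass the binary name to itself as an extra argument. One plausible explanation, offered as an assumption and consistent with the maintainer's reply about outdated images, is that the published katib/vizier-core image predated this ENTRYPOINT: with no entrypoint, the container would try to execute "-w", and prepending ./vizier-manager to args would appear to fix it.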
Issue Analytics
- Created 5 years ago
- Comments: 7 (7 by maintainers)
Top GitHub Comments
Hi @ddysher. Do you use the katib/~ docker images? I'm sorry, I didn't update the images to the latest version; it is not automated… I have now updated the images. Please retry. If you still have a problem, show me the log of vizier-core:
kubectl -n katib logs deploy/vizier-core
sure